[R] Data Preprocessing


eunki 2021. 6. 25. 17:54


Data Preprocessing

filter()	extract rows
select()	extract columns (variables)
arrange()	sort
mutate()	add variables
summarise()	compute summary statistics
group_by()	split into groups
left_join()	combine data (columns)
bind_rows()	combine data (rows)

library(dplyr)
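
The examples in this post operate on a data frame named exam, which the post itself does not define. A minimal sketch, assuming exam has the columns used later (id, class, math, english, science); every value here is invented for illustration:

```r
library(dplyr)

# Hypothetical stand-in for `exam`; all values are made up.
exam <- data.frame(
  id      = 1:6,
  class   = c(1, 1, 2, 2, 3, 3),
  math    = c(50, 60, 45, 30, 25, 50),
  english = c(98, 97, 86, 98, 80, 89),
  science = c(50, 60, 78, 58, 65, 98)
)

str(exam)   # column names and types
head(exam)  # first rows
```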

1. filter

%>% : pipe operator, also called the chain operator. It passes the result on its left as the first argument of the function on its right.

RStudio shortcut: [Ctrl + Shift + M]
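
To make the pipe concrete, here is a small sketch comparing a nested call with the same computation written as a pipeline. The vector scores is a made-up example:

```r
library(dplyr)

scores <- c(50, 60, 45)  # hypothetical values

# Nested form: read from the inside out
nested <- round(mean(scores), 1)

# Piped form: read from left to right
piped <- scores %>% mean() %>% round(1)

identical(nested, piped)
```

Both forms compute the same value; the pipe only changes the order in which the steps are written.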

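Comparison and logical operators		Arithmetic operators
<	less than	+	addition
<=	less than or equal to	-	subtraction
>	greater than	*	multiplication
>=	greater than or equal to	/	division
==	equal to	^, **	exponentiation
!=	not equal to	%/%	integer division (quotient)
|	or	%%	remainder
&	and
%in%	membership test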
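
A few of the less familiar operators from the table, as a quick sketch in base R (the variable names are only for illustration):

```r
quotient  <- 7 %/% 2   # integer division: 3
remainder <- 7 %% 2    # remainder: 1
power_a   <- 2 ^ 3     # 8
power_b   <- 2 ** 3    # 8; R parses `**` as `^`

both   <- (5 > 3) & (2 > 4)        # FALSE: both sides must be TRUE
either <- (5 > 3) | (2 > 4)        # TRUE: one TRUE side is enough
member <- c(1, 2, 3) %in% c(1, 3)  # TRUE FALSE TRUE
```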

# Only class 1
exam %>% filter(class == 1)

# Everything except class 3
exam %>% filter(class != 3)

# Math score above 50
exam %>% filter(math > 50)

# Math score below 50
exam %>% filter(math < 50)

# English score of 80 or higher
exam %>% filter(english >= 80)

# English score of 80 or lower
exam %>% filter(english <= 80)

%in% : match operator. It checks whether each value on the left appears in the vector on the right.

# Extract rows in class 1, 3, or 5
exam %>% filter(class == 1 | class == 3 | class == 5)
exam %>% filter(class %in% c(1, 3, 5))
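
The & and | operators from the table can combine several conditions inside one filter() call. A sketch using a made-up exam data frame (the values are assumptions, not the post's data), which also checks that the | form and the %in% form select the same rows:

```r
library(dplyr)

# Hypothetical stand-in for `exam`
exam <- data.frame(
  id      = 1:6,
  class   = c(1, 1, 2, 2, 3, 3),
  math    = c(50, 60, 45, 30, 25, 50),
  english = c(98, 97, 86, 98, 80, 89)
)

# Class 1 AND math of 60 or higher
top_class1 <- exam %>% filter(class == 1 & math >= 60)

# Math of 60 or higher OR english of 98 or higher
strong <- exam %>% filter(math >= 60 | english >= 98)

# The two ways of writing "class 1 or 3" select the same rows
by_or <- exam %>% filter(class == 1 | class == 3)
by_in <- exam %>% filter(class %in% c(1, 3))
same_rows <- isTRUE(all.equal(by_or, by_in))
```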

2. select

# Extract the math variable
exam %>% select(math)

# Extract the class, math, and english variables
exam %>% select(class, math, english)

# Drop the math variable
exam %>% select(-math)

# Drop the math and english variables
exam %>% select(-math, -english)

exam %>% 
  filter(class == 1) %>%  # keep rows where class is 1
  select(english)  # extract english

exam %>% 
  select(id, math) %>%  # extract id and math
  head(10)  # keep the first 10 rows
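
One detail worth seeing in code: select() returns a data frame even when only one column is chosen. The sketch below uses a made-up exam data frame and also shows pull() and the a:b range form, neither of which appears in the original notes:

```r
library(dplyr)

# Hypothetical stand-in for `exam`
exam <- data.frame(
  id      = 1:3,
  class   = c(1, 1, 2),
  math    = c(50, 60, 45),
  english = c(98, 97, 86),
  science = c(50, 60, 78)
)

one_col    <- exam %>% select(math)           # data frame with a single column
as_vector  <- exam %>% pull(math)             # plain numeric vector
range_cols <- exam %>% select(class:english)  # every column from class to english
```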

3. arrange

# Sort by math, ascending
exam %>% arrange(math)

# Sort by math, descending
exam %>% arrange(desc(math))

# Sort by class, then by math, both ascending
exam %>% arrange(class, math)
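
Sort directions can be mixed across keys. A sketch with a made-up exam data frame that sorts by class ascending and, within each class, by math descending:

```r
library(dplyr)

# Hypothetical stand-in for `exam`
exam <- data.frame(
  id    = 1:6,
  class = c(1, 1, 2, 2, 3, 3),
  math  = c(50, 60, 45, 30, 25, 50)
)

# class ascending, then math descending within each class
ranked <- exam %>% arrange(class, desc(math))
ranked
```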

4. mutate

exam %>% 
  mutate(total = math + english + science)  # add a total variable

exam %>% 
  mutate(total = math + english + science,  # add a total variable
         mean = (math + english + science) / 3)  # add an average variable

exam %>% 
  mutate(test = ifelse(science >= 60, "pass", "fail"))  # "pass" if science is 60 or higher
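
A variable created by mutate() can be used by later steps of the same pipeline, and the original data frame stays unchanged unless the result is assigned. A sketch with a made-up exam data frame:

```r
library(dplyr)

# Hypothetical stand-in for `exam`
exam <- data.frame(
  id      = 1:6,
  math    = c(50, 60, 45, 30, 25, 50),
  english = c(98, 97, 86, 98, 80, 89),
  science = c(50, 60, 78, 58, 65, 98)
)

# Create total, then sort by it in the same pipeline
ranked <- exam %>% 
  mutate(total = math + english + science) %>% 
  arrange(desc(total))

# exam itself has no total column; only ranked does
has_total <- "total" %in% names(exam)
```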

5. group_by, summarise

mean()	mean
sum()	sum
sd()	standard deviation
median()	median
min()	minimum
max()	maximum
n()	count

exam %>% 
  group_by(class) %>%  # split by class
  summarise(mean_math = mean(math))  # mean of math

exam %>% 
  group_by(class) %>%  # split by class
  summarise(mean_math = mean(math),  # mean of math
            sum_math = sum(math),  # sum of math
            median_math = median(math),  # median of math
            n = n())  # number of students
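
To see what group_by() contributes, the sketch below runs summarise() with and without it on a made-up exam data frame. Without grouping the result is a single row; with grouping there is one row per class:

```r
library(dplyr)

# Hypothetical stand-in for `exam`
exam <- data.frame(
  id    = 1:6,
  class = c(1, 1, 2, 2, 3, 3),
  math  = c(50, 60, 45, 30, 25, 50)
)

# No grouping: one row summarising the whole table
overall <- exam %>% 
  summarise(mean_math = mean(math), n = n())

# Grouped: one row per class
by_class <- exam %>% 
  group_by(class) %>% 
  summarise(mean_math = mean(math),
            sd_math   = sd(math),
            min_math  = min(math),
            max_math  = max(math),
            n         = n())
```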

6. left_join

# Midterm scores
test1 <- data.frame(id = c(1, 2, 3, 4, 5),
                    midterm = c(60, 80, 70, 90, 85))
# Final scores
test2 <- data.frame(id = c(1, 2, 3, 4, 5),
                    final = c(70, 83, 65, 95, 80))

# Join on id: the final column is added next to midterm
total <- left_join(test1, test2, by = "id")
total
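
In the example above every id appears in both tables. When the right-hand table is missing some ids, left_join() keeps every row of the left table and fills the gaps with NA. A sketch, where test2_partial is a hypothetical table introduced only for this illustration:

```r
library(dplyr)

test1 <- data.frame(id = c(1, 2, 3, 4, 5),
                    midterm = c(60, 80, 70, 90, 85))

# Hypothetical: only three students have a final score
test2_partial <- data.frame(id = c(1, 2, 3),
                            final = c(70, 83, 65))

joined <- left_join(test1, test2_partial, by = "id")
joined  # ids 4 and 5 are kept, with final = NA
```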

7. bind_rows

# Scores for group A
group_a <- data.frame(id = c(1, 2, 3, 4, 5),
                      test = c(60, 80, 70, 90, 85))
# Scores for group B
group_b <- data.frame(id = c(6, 7, 8, 9, 10),
                      test = c(70, 83, 65, 95, 80))

# Stack the rows: columns are matched by name
group_all <- bind_rows(group_a, group_b)
group_all
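
bind_rows() matches columns by name, so the variable names need to agree before stacking. A sketch of what happens when they differ, using a hypothetical group_c whose score column is named score instead of test:

```r
library(dplyr)

group_a <- data.frame(id = c(1, 2, 3),
                      test = c(60, 80, 70))

# Hypothetical table with a differently named score column
group_c <- data.frame(id = c(4, 5),
                      score = c(95, 80))

# Names differ: both columns are kept and the gaps become NA
mismatched <- bind_rows(group_a, group_c)

# Renaming first gives one clean test column
fixed <- bind_rows(group_a, rename(group_c, test = score))
```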
