R Language
데이터 프레임 집계
수색…
소개
집계는 R의 가장 일반적인 용도 중 하나입니다. 여기서 R을 수행하는 몇 가지 방법이 있습니다. 여기에서 설명하겠습니다.
베이스 R로 집계
이를 위해 다음과 같이 사용할 수있는 aggregate 함수를 사용합니다.
aggregate(formula,function,data)
다음 코드는 집계 함수를 사용하는 다양한 방법을 보여줍니다.
암호:
df = data.frame(group=c("Group 1","Group 1","Group 2","Group 2","Group 2"), subgroup = c("A","A","A","A","B"),value = c(2,2.5,1,2,1.5))
# sum, grouping by one column
aggregate(value~group, FUN=sum, data=df)
# mean, grouping by one column
aggregate(value~group, FUN=mean, data=df)
# sum, grouping by multiple columns
aggregate(value~group+subgroup,FUN=sum,data=df)
# custom function, grouping by one column
# in this example we want the sum of all values larger than 2 per group.
aggregate(value~group, FUN=function(x) sum(x[x>2]), data=df)
산출:
> df = data.frame(group=c("Group 1","Group 1","Group 2","Group 2","Group 2"), subgroup = c("A","A","A","A","B"),value = c(2,2.5,1,2,1.5))
> print(df)
group subgroup value
1 Group 1 A 2.0
2 Group 1 A 2.5
3 Group 2 A 1.0
4 Group 2 A 2.0
5 Group 2 B 1.5
>
> # sum, grouping by one column
> aggregate(value~group, FUN=sum, data=df)
group value
1 Group 1 4.5
2 Group 2 4.5
>
> # mean, grouping by one column
> aggregate(value~group, FUN=mean, data=df)
group value
1 Group 1 2.25
2 Group 2 1.50
>
> # sum, grouping by multiple columns
> aggregate(value~group+subgroup,FUN=sum,data=df)
group subgroup value
1 Group 1 A 4.5
2 Group 2 A 3.0
3 Group 2 B 1.5
>
> # custom function, grouping by one column
> # in this example we want the sum of all values larger than 2 per group.
> aggregate(value~group, FUN=function(x) sum(x[x>2]), data=df)
group value
1 Group 1 2.5
2 Group 2 0.0
dplyr로 집계
dplyr로 집계하는 것은 쉽습니다! 이를 위해 group_by () 및 summarize () 함수를 사용할 수 있습니다. 몇 가지 예가 아래에 나와 있습니다.
암호:
# Aggregating with dplyr
library(dplyr)
df = data.frame(group=c("Group 1","Group 1","Group 2","Group 2","Group 2"), subgroup = c("A","A","A","A","B"),value = c(2,2.5,1,2,1.5))
print(df)
# sum, grouping by one column
df %>% group_by(group) %>% summarize(value = sum(value)) %>% as.data.frame()
# mean, grouping by one column
df %>% group_by(group) %>% summarize(value = mean(value)) %>% as.data.frame()
# sum, grouping by multiple columns
df %>% group_by(group,subgroup) %>% summarize(value = sum(value)) %>% as.data.frame()
# custom function, grouping by one column
# in this example we want the sum of all values larger than 2 per group.
df %>% group_by(group) %>% summarize(value = sum(value[value>2])) %>% as.data.frame()
산출:
> library(dplyr)
>
> df = data.frame(group=c("Group 1","Group 1","Group 2","Group 2","Group 2"), subgroup = c("A","A","A","A","B"),value = c(2,2.5,1,2,1.5))
> print(df)
group subgroup value
1 Group 1 A 2.0
2 Group 1 A 2.5
3 Group 2 A 1.0
4 Group 2 A 2.0
5 Group 2 B 1.5
>
> # sum, grouping by one column
> df %>% group_by(group) %>% summarize(value = sum(value)) %>% as.data.frame()
group value
1 Group 1 4.5
2 Group 2 4.5
>
> # mean, grouping by one column
> df %>% group_by(group) %>% summarize(value = mean(value)) %>% as.data.frame()
group value
1 Group 1 2.25
2 Group 2 1.50
>
> # sum, grouping by multiple columns
> df %>% group_by(group,subgroup) %>% summarize(value = sum(value)) %>% as.data.frame()
group subgroup value
1 Group 1 A 4.5
2 Group 2 A 3.0
3 Group 2 B 1.5
>
> # custom function, grouping by one column
> # in this example we want the sum of all values larger than 2 per group.
> df %>% group_by(group) %>% summarize(value = sum(value[value>2])) %>% as.data.frame()
group value
1 Group 1 2.5
2 Group 2 0.0
data.table로 집계하기
data.table 패키지를 사용하여 그룹화하는 것은 dt[i, j, by]
구문을 사용하여 수행됩니다. " dt를 가져 와서 i를 사용하여 행을 부분 집합 한 다음 j로 그룹화합니다. "dt 문 내에서 , 여러 계산 또는 그룹을 목록에 넣어야합니다. list()
의 별칭은 .()
이므로 둘 다 서로 바꿔서 사용할 수 있습니다. 아래 예제에서는 .()
사용 .()
.
암호:
# Aggregating with data.table
library(data.table)
dt = data.table(group=c("Group 1","Group 1","Group 2","Group 2","Group 2"), subgroup = c("A","A","A","A","B"),value = c(2,2.5,1,2,1.5))
print(dt)
# sum, grouping by one column
dt[,.(value=sum(value)),group]
# mean, grouping by one column
dt[,.(value=mean(value)),group]
# sum, grouping by multiple columns
dt[,.(value=sum(value)),.(group,subgroup)]
# custom function, grouping by one column
# in this example we want the sum of all values larger than 2 per group.
dt[,.(value=sum(value[value>2])),group]
산출:
> # Aggregating with data.table
> library(data.table)
>
> dt = data.table(group=c("Group 1","Group 1","Group 2","Group 2","Group 2"), subgroup = c("A","A","A","A","B"),value = c(2,2.5,1,2,1.5))
> print(dt)
group subgroup value
1: Group 1 A 2.0
2: Group 1 A 2.5
3: Group 2 A 1.0
4: Group 2 A 2.0
5: Group 2 B 1.5
>
> # sum, grouping by one column
> dt[,.(value=sum(value)),group]
group value
1: Group 1 4.5
2: Group 2 4.5
>
> # mean, grouping by one column
> dt[,.(value=mean(value)),group]
group value
1: Group 1 2.25
2: Group 2 1.50
>
> # sum, grouping by multiple columns
> dt[,.(value=sum(value)),.(group,subgroup)]
group subgroup value
1: Group 1 A 4.5
2: Group 2 A 3.0
3: Group 2 B 1.5
>
> # custom function, grouping by one column
> # in this example we want the sum of all values larger than 2 per group.
> dt[,.(value=sum(value[value>2])),group]
group value
1: Group 1 2.5
2: Group 2 0.0
Modified text is an extract of the original Stack Overflow Documentation
아래 라이선스 CC BY-SA 3.0
와 제휴하지 않음 Stack Overflow