Python Language
Pandas Transform: Perform operations on groups and concatenate the results
Simple transform
First, let's create a dummy dataframe.
We assume that a customer can have n orders, an order can have m items, and an item can be ordered more than once.
import pandas as pd
orders_df = pd.DataFrame()
orders_df['customer_id'] = [1,1,1,1,1,2,2,3,3,3,3,3]
orders_df['order_id'] = [1,1,1,2,2,3,3,4,5,6,6,6]
orders_df['item'] = ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples',
                     'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry']
# And this is what the dataframe looks like:
print(orders_df)
#     customer_id  order_id        item
# 0             1         1      apples
# 1             1         1   chocolate
# 2             1         1   chocolate
# 3             1         2      coffee
# 4             1         2      coffee
# 5             2         3      apples
# 6             2         3     bananas
# 7             3         4      coffee
# 8             3         5   milkshake
# 9             3         6   chocolate
# 10            3         6  strawberry
# 11            3         6  strawberry
Now, we will use the pandas transform function to count the number of orders per customer.
# First, we define the function that will be applied per customer_id
count_number_of_orders = lambda x: len(x.unique())
# And now, we can transform each group using the logic defined above
orders_df['number_of_orders_per_client'] = (  # Put the results into a new column called 'number_of_orders_per_client'
    orders_df                                 # Take the original dataframe
    .groupby(['customer_id'])['order_id']     # Create a separate group for each customer_id & select the order_id
    .transform(count_number_of_orders))       # Apply the function to each group separately
# Inspecting the results ...
print(orders_df)
#     customer_id  order_id        item  number_of_orders_per_client
# 0             1         1      apples                            2
# 1             1         1   chocolate                            2
# 2             1         1   chocolate                            2
# 3             1         2      coffee                            2
# 4             1         2      coffee                            2
# 5             2         3      apples                            1
# 6             2         3     bananas                            1
# 7             3         4      coffee                            3
# 8             3         5   milkshake                            3
# 9             3         6   chocolate                            3
# 10            3         6  strawberry                            3
# 11            3         6  strawberry                            3
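As a minimal sketch of the same per-customer count: pandas also accepts the name of a built-in groupby reduction as a string, so transform('nunique') can stand in for the lambda. The names grouped, per_group and per_row are illustrative only and do not come from the original example. Comparing against a plain nunique() call shows what transform adds: the per-group result is broadcast back to every original row.

```python
import pandas as pd

# Same customer/order columns as the dummy dataframe above
orders_df = pd.DataFrame({
    'customer_id': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    'order_id':    [1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 6, 6],
})

grouped = orders_df.groupby('customer_id')['order_id']

# A plain reduction returns one value per group (indexed by customer_id)
per_group = grouped.nunique()
print(per_group.tolist())  # [2, 1, 3]

# transform broadcasts that value back to every row of the original frame
per_row = grouped.transform('nunique')
print(per_row.tolist())    # [2, 2, 2, 2, 2, 1, 1, 3, 3, 3, 3, 3]
```

Because the transformed result has the same length and index as orders_df, it can be assigned directly as a new column, which is what the example above does.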
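Multiple results per group
Using transform functions that return sub-calculations per group
In the previous example, we had one result per client. However, functions that return a different value for each row of a group can also be applied.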
import pandas as pd
# Create a dummy dataframe
orders_df = pd.DataFrame()
orders_df['customer_id'] = [1,1,1,1,1,2,2,3,3,3,3,3]
orders_df['order_id'] = [1,1,1,2,2,3,3,4,5,6,6,6]
orders_df['item'] = ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples',
                     'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry']
# Let's check whether any item was ordered more than once within each order
# First, we define a function that will be applied per group
def multiple_items_per_order(_items):
    # Apply .duplicated with keep=False, which returns True for every row whose item occurs more than once.
    multiple_item_bool = _items.duplicated(keep=False)
    return multiple_item_bool
# Then, we transform each group according to the defined function
orders_df['item_duplicated_per_order'] = (  # Put the results into a new column
    orders_df                               # Take the orders dataframe
    .groupby(['order_id'])['item']          # Create a separate group for each order_id & select the item
    .transform(multiple_items_per_order))   # Apply the defined function to each group separately
# Inspecting the results ...
print(orders_df)
#     customer_id  order_id        item  item_duplicated_per_order
# 0             1         1      apples                      False
# 1             1         1   chocolate                       True
# 2             1         1   chocolate                       True
# 3             1         2      coffee                       True
# 4             1         2      coffee                       True
# 5             2         3      apples                      False
# 6             2         3     bananas                      False
# 7             3         4      coffee                      False
# 8             3         5   milkshake                      False
# 9             3         6   chocolate                      False
# 10            3         6  strawberry                       True
# 11            3         6  strawberry                       True
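For this particular check, a groupby is not strictly required. The sketch below is an alternative rather than the approach used above: it assumes that DataFrame.duplicated with subset=['order_id', 'item'] and keep=False flags the same rows, and it compares the two results. The names via_transform and via_duplicated are illustrative only.

```python
import pandas as pd

# Same order/item columns as the dummy dataframe above
orders_df = pd.DataFrame({
    'order_id': [1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 6, 6],
    'item': ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples',
             'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry'],
})

# Group-wise version: flag items that repeat within each order
via_transform = (
    orders_df
    .groupby('order_id')['item']
    .transform(lambda items: items.duplicated(keep=False))
)

# Frame-wise version: flag rows whose (order_id, item) pair occurs more than once
via_duplicated = orders_df.duplicated(subset=['order_id', 'item'], keep=False)

# Compare element by element (cast to bool so the comparison ignores dtype differences)
print(via_transform.astype(bool).tolist() == via_duplicated.tolist())  # True
```

The transform version generalizes more easily, since any function that returns one value per row of the group can be substituted for the duplicate check.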
Modified text is an extract of the original Stack Overflow Documentation
Licensed under CC BY-SA 3.0
Not affiliated with Stack Overflow