Python Language
pandas transform: perform operations on groups and concatenate the results


Simple transform

First, let's create a dummy dataframe.

We assume that a customer can have n orders, an order can have m items, and an item can be ordered more than once.

import pandas as pd

orders_df = pd.DataFrame()
orders_df['customer_id'] = [1,1,1,1,1,2,2,3,3,3,3,3]
orders_df['order_id'] = [1,1,1,2,2,3,3,4,5,6,6,6]
orders_df['item'] = ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples', 
                     'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry']

# This is what the dataframe looks like:
print(orders_df)
#     customer_id  order_id        item
# 0             1         1      apples
# 1             1         1   chocolate
# 2             1         1   chocolate
# 3             1         2      coffee
# 4             1         2      coffee
# 5             2         3      apples
# 6             2         3     bananas
# 7             3         4      coffee
# 8             3         5   milkshake
# 9             3         6   chocolate
# 10            3         6  strawberry
# 11            3         6  strawberry


Now we use the pandas `transform` function to count the number of distinct orders per customer.

# First, we define the function that will be applied per customer_id 
count_number_of_orders = lambda x: len(x.unique())

# And now we can transform each group using the logic defined above
orders_df['number_of_orders_per_client'] = (              # Put the results into a new column called 'number_of_orders_per_client'
                     orders_df                            # Take the original dataframe
                    .groupby(['customer_id'])['order_id'] # Create a separate group for each customer_id & select the order_id
                    .transform(count_number_of_orders))   # Apply the function to each group separately

# Inspecting the results ...
print(orders_df)
#     customer_id  order_id        item  number_of_orders_per_client
# 0             1         1      apples                            2
# 1             1         1   chocolate                            2
# 2             1         1   chocolate                            2
# 3             1         2      coffee                            2
# 4             1         2      coffee                            2
# 5             2         3      apples                            1
# 6             2         3     bananas                            1
# 7             3         4      coffee                            3
# 8             3         5   milkshake                            3
# 9             3         6   chocolate                            3
# 10            3         6  strawberry                            3
# 11            3         6  strawberry                            3
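The difference between aggregating and transforming may be easier to see side by side. The following is a minimal sketch, assuming the same customer and order columns as above; it uses the built-in `'nunique'` reducer by name instead of the lambda, and the variable names `orders_per_customer` and `orders_per_row` are only illustrative.

```python
import pandas as pd

orders_df = pd.DataFrame({
    'customer_id': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    'order_id':    [1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 6, 6],
})

grouped = orders_df.groupby('customer_id')['order_id']

# agg collapses each group to a single value: one row per customer_id
orders_per_customer = grouped.agg('nunique')

# transform computes the same value per group, then broadcasts it back
# to every row of that group, so the result is aligned with orders_df
orders_per_row = grouped.transform('nunique')

print(orders_per_customer)
print(orders_per_row)
```

Because the transformed result keeps the index of the original dataframe, it can be assigned directly as a new column, which is what the example above does.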

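Multiple results per group

Using `transform` functions that return sub-calculations per group

In the previous example, we had one result per customer. However, `transform` also accepts functions that return a different value for each row of the group.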

# Create a dummy dataframe
orders_df = pd.DataFrame()
orders_df['customer_id'] = [1,1,1,1,1,2,2,3,3,3,3,3]
orders_df['order_id'] = [1,1,1,2,2,3,3,4,5,6,6,6]
orders_df['item'] = ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples', 
                     'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry']


# Let's check whether any item was ordered more than once within each order

# First, we define a function that will be applied per group
def multiple_items_per_order(_items):
    # Apply .duplicated with keep=False, which returns True for every
    # occurrence of an item that appears more than once in the group.
    multiple_item_bool = _items.duplicated(keep=False)
    return multiple_item_bool

# Then, we transform each group according to the defined function
orders_df['item_duplicated_per_order'] = (                    # Put the results into a new column 
                        orders_df                             # Take the orders dataframe
                        .groupby(['order_id'])['item']        # Create a separate group for each order_id & select the item
                        .transform(multiple_items_per_order)) # Apply the defined function to each group separately

# Inspecting the results ... 
print(orders_df)
#     customer_id  order_id        item  item_duplicated_per_order
# 0             1         1      apples                      False
# 1             1         1   chocolate                       True
# 2             1         1   chocolate                       True
# 3             1         2      coffee                       True
# 4             1         2      coffee                       True
# 5             2         3      apples                      False
# 6             2         3     bananas                      False
# 7             3         4      coffee                      False
# 8             3         5   milkshake                      False
# 9             3         6   chocolate                      False
# 10            3         6  strawberry                       True
# 11            3         6  strawberry                       True
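As a cross-check, the same flags can be obtained without a groupby. This is a sketch, not part of the original example: it assumes that "duplicated within an order" means the same `(order_id, item)` pair appears on more than one row, and the names `via_transform` and `via_duplicated` are hypothetical.

```python
import pandas as pd

orders_df = pd.DataFrame({
    'order_id': [1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 6, 6],
    'item': ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples',
             'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry'],
})

# Group-wise version: flag repeated items inside each order
via_transform = (
    orders_df
    .groupby('order_id')['item']
    .transform(lambda items: items.duplicated(keep=False))
)

# Frame-wise version: flag rows whose (order_id, item) pair is repeated
via_duplicated = orders_df.duplicated(subset=['order_id', 'item'], keep=False)

# Both approaches should mark the same rows
print(via_transform.astype(bool).tolist() == via_duplicated.tolist())
```

The groupby form generalizes more easily to other per-row calculations, while the `DataFrame.duplicated` form is shorter when a plain duplicate check is all that is needed.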

Modified text is an extract of the original Stack Overflow Documentation

Licensed under CC BY-SA 3.0

Not affiliated with Stack Overflow
