Python: Pivot Tables and Cross-Tabulations


六mo神剑 2022-07-18 15:01:00 Category: Python


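© Copyright belongs to the author. This is an original work by 51CTO blog author 六mo神剑. Contact the author for permission to reprint; unauthorized reprints may be subject to legal action.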

Pivot Tables and Cross-Tabulation

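# 10.4 Pivot Tables and Cross-Tabulation
# A pivot table is a data summarization tool commonly found in spreadsheet
# programs and other data analysis software. It aggregates data by one or
# more keys and arranges the results in a rectangle, with some of the group
# keys along the rows and some along the columns. In Python with pandas,
# pivot tables can be built with the groupby facility described in this
# chapter combined with reshape operations that use hierarchical indexing.
# DataFrame has a pivot_table method, and there is also a top-level
# pandas.pivot_table function. Besides providing a convenient interface to
# groupby, pivot_table can add partial totals, also known as margins.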
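# The relationship between pivot_table and groupby described above can be
# sketched on a small example. The DataFrame below is hypothetical toy data
# standing in for the tips dataset; only the column names day, smoker and
# tip_pct are taken from the text. Under these assumptions, the two
# summaries should hold the same values.

```python
import pandas as pd

# Hypothetical toy data; not the real tips dataset.
df = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat", "Sat", "Sat", "Sun"],
    "smoker": ["No", "Yes", "No", "Yes", "Yes", "No"],
    "tip_pct": [0.10, 0.20, 0.15, 0.25, 0.05, 0.30],
})

# pivot_table: mean tip_pct with day on the rows and smoker on the columns.
pivoted = df.pivot_table("tip_pct", index="day", columns="smoker")

# The same summary built by hand: group by both keys, take the mean,
# then move the inner index level (smoker) into the columns.
by_hand = df.groupby(["day", "smoker"])["tip_pct"].mean().unstack("smoker")

print(pivoted)
print(by_hand)
```

# Combinations with no rows (here, Sunday smokers) show up as NaN in both results.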
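# `tips` is assumed to be the tipping DataFrame from earlier in the chapter, with a tip_pct column.
# The default aggfunc is the mean. On pandas 2.x, pass values= explicitly
# if non-numeric columns such as time make this call fail.
tips.pivot_table(index=['day', 'smoker'])
# Aggregate only tip_pct and size, and spread smoker across the columns.
tips.pivot_table(['tip_pct', 'size'], index=['time', 'day'],
                 columns='smoker')
# margins=True adds the All row and column.
tips.pivot_table(['tip_pct', 'size'], index=['time', 'day'],
                 columns='smoker', margins=True)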
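# Here, the All values are means that do not distinguish smokers from
# non-smokers (the All columns) or either of the two levels of grouping on
# the rows (the All row).
# To use a different aggregation function, pass it to aggfunc. For example,
# count or len gives a cross-tabulation (count or frequency) of group sizes: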
tips.pivot_table('tip_pct', index=['time', 'smoker'], columns='day',
                 aggfunc=len, margins=True)
day             Fri   Sat   Sun  Thur    All
time   smoker
Dinner No       3.0  45.0  57.0   1.0  106.0
       Yes      9.0  42.0  19.0   NaN   70.0
Lunch  No       1.0   NaN   NaN  44.0   45.0
       Yes      6.0   NaN   NaN  17.0   23.0
All            19.0  87.0  76.0  62.0  244.0
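# To make the meaning of the All margins concrete, here is a minimal sketch
# on hypothetical toy data (not the real tips dataset). It assumes only the
# column names time, smoker and tip_pct from the text. With the default mean,
# the All value of a row is the mean over every record in that row group,
# which generally differs from the average of the per-cell means. With
# aggfunc='count', the margins are group sizes.

```python
import pandas as pd

# Hypothetical toy data; not the real tips dataset.
df = pd.DataFrame({
    "time": ["Dinner", "Dinner", "Dinner", "Lunch", "Lunch"],
    "smoker": ["No", "No", "Yes", "No", "Yes"],
    "tip_pct": [0.10, 0.20, 0.30, 0.40, 0.50],
})

# Default aggregation (mean) with margins.
means = df.pivot_table("tip_pct", index="time", columns="smoker",
                       margins=True)

# Counting records instead of averaging them.
counts = df.pivot_table("tip_pct", index="time", columns="smoker",
                        aggfunc="count", margins=True)

# Mean over all Dinner records, ignoring the smoker split.
dinner_overall = df.loc[df["time"] == "Dinner", "tip_pct"].mean()

print(means)
print(counts)
print(dinner_overall)
```

# In this toy data, the Dinner cell means are 0.15 and 0.30, while the Dinner
# All value is the mean of the three underlying records.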
# Some combinations are empty (NA); fill_value replaces them, here with 0.
tips.pivot_table('tip_pct', index=['time', 'size', 'smoker'],
                 columns='day', aggfunc='mean', fill_value=0)
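# The effect of fill_value in the call above can be seen on a small example.
# The DataFrame here is hypothetical toy data (the column names time, day and
# tip_pct come from the text, the values do not). This is a sketch of the
# behavior, not the output of the real tips dataset.

```python
import pandas as pd

# Hypothetical toy data; each (time, day) pair appears at most once.
df = pd.DataFrame({
    "time": ["Dinner", "Dinner", "Lunch"],
    "day": ["Sat", "Sun", "Thur"],
    "tip_pct": [0.15, 0.20, 0.10],
})

# Without fill_value, combinations that never occur are NaN.
sparse = df.pivot_table("tip_pct", index="time", columns="day",
                        aggfunc="mean")

# With fill_value=0, those empty cells are replaced by 0.
filled = df.pivot_table("tip_pct", index="time", columns="day",
                        aggfunc="mean", fill_value=0)

print(sparse)
print(filled)
```

# Note that a filled 0 is indistinguishable from a genuine mean of 0, so
# fill_value is best chosen with the downstream use in mind.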
Cross-Tabulations: Crosstab
# A cross-tabulation (crosstab for short) is a special case of a pivot
# table that computes group frequencies. Consider the following example:
from io import StringIO
import pandas as pd
data = """\
Sample  Nationality  Handedness
1   USA  Right-handed
2   Japan    Left-handed
3   USA  Right-handed
4   Japan    Right-handed
5   Japan    Left-handed
6   Japan    Right-handed
7   USA  Right-handed
8   USA  Left-handed
9   Japan    Right-handed
10  USA  Right-handed"""
# Columns are separated by runs of whitespace; the raw string avoids an invalid escape warning.
data = pd.read_table(StringIO(data), sep=r'\s+')
data
pd.crosstab(data.Nationality, data.Handedness, margins=True)
pd.crosstab([tips.time, tips.day], tips.smoker, margins=True)
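# Since a crosstab is a frequency count over two sets of keys, it can be
# reproduced with groupby. The sketch below uses the first five rows of the
# handedness example above, typed in directly so that the block runs on its
# own; the variable names df, freq and by_hand are illustrative.

```python
import pandas as pd

# First five rows of the handedness example, entered by hand.
df = pd.DataFrame({
    "Nationality": ["USA", "Japan", "USA", "Japan", "Japan"],
    "Handedness": ["Right-handed", "Left-handed", "Right-handed",
                   "Right-handed", "Left-handed"],
})

# Frequency table via crosstab.
freq = pd.crosstab(df["Nationality"], df["Handedness"])

# The same counts via groupby: count rows per key pair, then pivot the
# inner level into columns, filling missing combinations with 0.
by_hand = (df.groupby(["Nationality", "Handedness"])
             .size()
             .unstack(fill_value=0))

print(freq)
print(by_hand)
```

# crosstab is usually the more convenient of the two, because it also accepts
# margins=True and lists of keys, as in the calls above.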
pd.options.display.max_rows =