Python: Grouped Averages and Mean Values of a DataFrame

Reposted

mob6454cc6658d1 2023-09-15 23:27:23

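Common DataFrame methods:

Basic math operations

df.count() # number of non-null elements
df.min() # minimum
df.max() # maximum
df.idxmin() # index label of the minimum, similar to which.min in R
df.idxmax() # index label of the maximum, similar to which.max in R
df.quantile(0.1) # 10% quantile
df.sum() # sum
df.mean() # mean
df.median() # median
df.mode() # mode
df.var() # variance
df.std() # standard deviation
df.mad() # mean absolute deviation (removed in pandas 2.0)
df.skew() # skewness
df.kurt() # kurtosis
df.describe() # several descriptive statistics at once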
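
A minimal sketch of a few of these calls, assuming a small made-up DataFrame (the column names `a` and `b` and their values are illustrative only, not from the original article). Because `DataFrame.mad` is not available in pandas 2.0 and later, the mean absolute deviation is computed by hand here.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 60.0]})

col_means = df.mean()         # mean of each column
row_means = df.mean(axis=1)   # mean of each row
idx_of_max = df.idxmax()      # index label of each column's maximum

# Mean absolute deviation around the mean, computed manually because
# DataFrame.mad() is not available in pandas 2.0 and later.
mad = (df - df.mean()).abs().mean()

print(col_means)
print(mad)
```

Passing `axis=1` switches from one value per column to one value per row; the same keyword works for most of the reductions listed above.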

More advanced: grouped statistics

df.groupby('Person').sum()
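
The same pattern gives a grouped average. The sketch below reuses the sample data from the count examples in this article and assumes `Age` is the column of interest; the variable names `mean_age` and `summary` are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as the count examples in this article.
df = pd.DataFrame({
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24.0, np.nan, 21.0, 33, 26],
    "Single": [False, True, True, True, False],
})

# Mean age per person; NaN values are skipped by default.
mean_age = df.groupby("Person")["Age"].mean()

# Several aggregations on the same groups in one call.
summary = df.groupby("Person")["Age"].agg(["mean", "count"])

print(mean_age)
print(summary)
```

Selecting the column before aggregating keeps the result restricted to numeric data, which avoids surprises when the frame also holds string or boolean columns.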

pandas.DataFrame.count

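Purpose

Counts the non-NA cells for each column or row.

Parameters

1. axis: {0 or 'index', 1 or 'columns'}, default 0

If 0 or 'index', a count is generated for each column. If 1 or 'columns', a count is generated for each row.

2. level: int or str, optional

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame. A str specifies the level name. This parameter was removed in pandas 2.0; use groupby(level=...).count() instead.

3. numeric_only: bool, default False

Include only float, int, or boolean data.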
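
As a rough illustration of the `axis` and `numeric_only` parameters, the sketch below uses the same sample data as the examples in this article; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24.0, np.nan, 21.0, 33, 26],
    "Single": [False, True, True, True, False],
})

per_column = df.count()                 # axis=0: one count per column
per_row = df.count(axis=1)              # axis=1: one count per row
numeric = df.count(numeric_only=True)   # skips the string column "Person"

print(numeric)
```

With `numeric_only=True` only the float column `Age` and the boolean column `Single` remain in the result.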

Examples

1. Build a DataFrame

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", "Lewis", "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]})
>>> df
  Person   Age  Single
0   John  24.0   False
1   Myla   NaN    True
2  Lewis  21.0    True
3   John  33.0    True
4   Myla  26.0   False

2. Count the non-NA cells in each column

>>> df.count()
Person    5
Age       4
Single    5
dtype: int64

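3. Count the non-NA cells in each row

>>> df.count(axis='columns')
0    3
1    2
2    3
3    3
4    3
dtype: int64

Note: axis='columns' is equivalent to axis=1. The count runs across the columns, so one value is returned for each row. axis=0 (or 'index') returns one count per column.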

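4. Count along one level of a MultiIndex

>>> # count(level="Person") was removed in pandas 2.0; groupby(level=...) gives the same result
>>> df.set_index(["Person", "Single"]).groupby(level="Person").count()
        Age
Person
John      2
Lewis     1
Myla      1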

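Additional notes on set_index

The set_index method of a DataFrame builds a single-level or compound (multi-level) index from existing columns.

DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Parameters:

keys: label, array-like, or list of labels/arrays. The column name(s) to use as the index; either a single column name or several.
drop: bool, default True. Delete the columns that are used as the new index.
append: bool, default False. Whether to append the new columns to the existing index.
inplace: bool, default False. Whether to modify the DataFrame in place instead of returning a new one.
verify_integrity: bool, default False. Check the new index for duplicates. Otherwise the check is deferred until necessary. Leaving it False improves the performance of this method.

Note: with drop=False, the columns used as the index are also kept as regular columns. With inplace=True, the DataFrame itself is modified and the method returns None.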
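
The effect of each parameter can be sketched as follows. The data matches the official example used in this article; the variable names (`kept`, `appended`, `duplicate_error`, `df2`, `result`) are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"month": [1, 4, 7, 10],
                   "year": [2012, 2014, 2013, 2014],
                   "sale": [55, 40, 84, 31]})

# drop=False keeps "month" as a regular column as well as the index.
kept = df.set_index("month", drop=False)

# append=True adds "year" as a new level on top of the existing index.
appended = df.set_index("year", append=True)

# verify_integrity=True raises ValueError because 2014 appears twice.
try:
    df.set_index("year", verify_integrity=True)
    duplicate_error = False
except ValueError:
    duplicate_error = True

# inplace=True modifies df2 itself and returns None.
df2 = df.copy()
result = df2.set_index("month", inplace=True)

print(kept)
print(appended)
```

Working on a copy for the `inplace=True` call keeps the original `df` unchanged, so the other calls still see the `month` column.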

Examples from the official documentation:

import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})


# Use a single column as the index
df.set_index('month')
'''
       year  sale
month
1      2012    55
4      2014    40
7      2013    84
10     2014    31
'''
# Use several columns as a compound index
df.set_index(['year', 'month'])
'''
            sale
year  month
2012  1     55
2014  4     40
2013  7     84
2014  10    31
'''
# Combine a custom Index with a column to form a compound index
df.set_index([pd.Index([1, 2, 3, 4]), 'year'])
'''
         month  sale
   year
1  2012  1      55
2  2014  4      40
3  2013  7      84
4  2014  10     31
'''
# Build a compound index from two Series
s = pd.Series([1, 2, 3, 4])
df.set_index([s, s**2])
'''
      month  year  sale
1 1       1  2012    55
2 4       4  2014    40
3 9       7  2013    84
4 16     10  2014    31
'''
