How to evaluate a spreadsheet with Python

Reposted

GhostLover 2024-09-14 09:47:52


Contents

Step 1: Read the Excel sheet
Step 2: Slice the data
Step 3: Compute the statistics
Step 4: Plot with matplotlib
Full Python code

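The Excel sheet to analyze is the list of winners for Problem E of the 2021 Huawei Cup (华为杯) mathematical modeling contest. The goal is to work out how each prize level is distributed across schools. Below is part of the sheet.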

[Figure: excerpt of the award-list spreadsheet]

Step 1: Read the Excel sheet

sheet = pd.read_excel("2021E.xls", sheet_name=0)

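This function depends on the xlrd package, which pandas uses internally to read .xls files, so xlrd must be installed. The first argument is the file path; I use a relative path here, but an absolute path works too. A few other important parameters:

Parameter	Meaning
sheet_name	Which sheet to read; defaults to the first sheet (index 0)
header	Which row to use as the column names; defaults to row 0
names	Custom column names, e.g. names = ['xxx', 'xxx', 'xxx']
index_col	Which column to use as the row index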
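As a rough illustration of what header, names, and index_col do, the sketch below parses a tiny made-up table. To stay runnable without an Excel file or an Excel engine, it uses pd.read_csv on an in-memory string instead of read_excel; read_csv takes parameters of the same names, which to my knowledge behave the same way. The sample rows, the 队号 column, and the names team_id, school, and prize are invented for the example.

```python
import io
import pandas as pd

# Made-up sample data standing in for the spreadsheet (not the real award list).
raw = "队号,队长所在单位,奖项\n1001,School A,一等奖\n1002,School B,二等奖\n"

# Default header=0: the first row supplies the column names.
df_default = pd.read_csv(io.StringIO(raw))
print(df_default.columns.tolist())

# names= supplies new column names; header=0 tells pandas to drop the original header row.
df_named = pd.read_csv(io.StringIO(raw), header=0, names=["team_id", "school", "prize"])
print(df_named.columns.tolist())

# index_col= uses the given column as the row index instead of 0, 1, 2, ...
df_indexed = pd.read_csv(io.StringIO(raw), index_col="队号")
print(df_indexed.index.tolist())
```

Passing names without header=0 would keep the original header row as a data row, which is usually not what you want.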

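You can check the shape and type of sheet with the two print statements at the end of this step. The printed result is:
(2704, 10)
<class 'pandas.core.frame.DataFrame'>

This shows that the sheet holds 2704 rows of data (the header row is not counted) and 10 columns, and that the object is a DataFrame.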

print(sheet.shape)
print(type(sheet))

Step 2: Slice the data

We need to separate out the first, second, and third prizes.
First-prize rows: the printed shape shows there are 31.

sheet1 = sheet.loc[sheet['奖项'] == '一等奖']
print(sheet1.shape)

Second-prize rows: there are 336.

sheet2 = sheet.loc[sheet['奖项'] == '二等奖']
print(sheet2.shape)

Third-prize rows: there are 537.

sheet3 = sheet.loc[sheet['奖项'] == '三等奖']
print(sheet3.shape)

All prize-winning rows: there are 904 (31 + 336 + 537).

sheet4 = sheet.loc[(sheet['奖项'] == '一等奖')|(sheet['奖项'] == '二等奖')|(sheet['奖项'] == '三等奖')]
print(sheet4.shape)
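The three chained comparisons used for sheet4 can also be written as a single membership test with Series.isin. The sketch below checks on a few made-up rows that both forms select the same data; the school names and the 成功参与奖 label for a non-winning team are invented for the example.

```python
import pandas as pd

# Made-up rows; only the column names and the three prize labels come from the article.
sheet = pd.DataFrame({
    "队长所在单位": ["School A", "School B", "School A", "School C"],
    "奖项": ["一等奖", "二等奖", "成功参与奖", "三等奖"],
})

# Three comparisons chained with | ...
chained = sheet.loc[
    (sheet["奖项"] == "一等奖") | (sheet["奖项"] == "二等奖") | (sheet["奖项"] == "三等奖")
]

# ... select the same rows as one isin() membership test.
winners = sheet.loc[sheet["奖项"].isin(["一等奖", "二等奖", "三等奖"])]

print(winners.shape)
```

The isin form is easier to extend when the list of labels grows, and it avoids the parenthesis juggling that the | operator requires.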

Note: loc indexes by row and column labels (and, as above, also accepts a boolean mask), while iloc indexes by integer position.
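A small sketch of the difference, using a made-up frame whose index labels (10, 20, 30) deliberately differ from the row positions (0, 1, 2):

```python
import pandas as pd

# Made-up data with a non-default index so that labels and positions differ.
df = pd.DataFrame({"奖项": ["一等奖", "二等奖", "三等奖"]}, index=[10, 20, 30])

by_label = df.loc[20, "奖项"]                 # row labelled 20, column named '奖项'
by_position = df.iloc[1, 0]                   # second row, first column (zero-based)
mask_rows = df.loc[df["奖项"] != "一等奖"]    # loc with a boolean mask

print(by_label, by_position, len(mask_rows))
```

Both lookups return the same cell here; df.loc[1, "奖项"] would raise a KeyError because no row carries the label 1.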

Step 3: Compute the statistics

output1 = sheet1['队长所在单位'].value_counts()
print(type(output1))

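value_counts is the pandas function for counting how often each value occurs. It is available on both Series and DataFrame; here sheet1['队长所在单位'] (the team leader's institution) is a Series, and the result output1 is also a Series, sorted from the most frequent value to the least.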
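Since the stated goal is the distribution of every prize level across schools, one possible shortcut is pd.crosstab, which builds the whole school-by-prize table in one call instead of one value_counts per prize. This is a sketch on made-up rows, not the article's method; the school names are invented.

```python
import pandas as pd

# Made-up rows using the article's two column names.
sheet = pd.DataFrame({
    "队长所在单位": ["School A", "School B", "School A", "School A", "School B"],
    "奖项": ["一等奖", "二等奖", "二等奖", "二等奖", "三等奖"],
})

# Count for a single prize level, as in the article.
second = sheet.loc[sheet["奖项"] == "二等奖", "队长所在单位"].value_counts()
print(second)

# One table for all prize levels: rows are schools, columns are prizes.
table = pd.crosstab(sheet["队长所在单位"], sheet["奖项"])
print(table)
```

Combinations that never occur, such as a school with no first prize, show up as 0 in the crosstab rather than being omitted.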

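Step 4: Plot with matplotlib

The program below plots the ten schools with the most second prizes and labels each bar with its count. Here output2 is the value_counts() result for sheet2 (the second-prize rows), computed the same way as output1 above.

# Render the minus sign on axis ticks correctly
plt.rcParams['axes.unicode_minus'] = False
# Use a font with Chinese glyphs so Chinese labels render correctly
plt.rcParams['font.sans-serif'] = ['Simhei']

x = output2.index[:10]
y = output2.values[:10]
# Draw one bar per school and write its count above the bar
for i in range(len(x)):
    plt.bar(x[i], y[i])
    plt.text(x[i], y[i], str(y[i]), ha="center", va="bottom")
plt.title("二等奖获奖情况")
plt.xlabel("大学")
plt.ylabel("数量")
plt.show()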

[Figure: bar chart of the ten schools with the most second prizes]
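The same chart can probably be drawn without a per-bar loop. The sketch below assumes matplotlib 3.4 or newer (for Axes.bar_label) and uses made-up counts in place of output2; it selects the non-interactive Agg backend and writes the image to an in-memory buffer so it runs without a display. The English labels are only there to sidestep the font setup.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts standing in for output2 (school -> number of second prizes).
output2 = pd.Series({"School A": 5, "School B": 3, "School C": 1})

top = output2.head(10)  # head() returns at most 10 rows, so a short Series is safe

fig, ax = plt.subplots()
bars = ax.bar(top.index, top.values)  # one call draws every bar
labels = ax.bar_label(bars)           # writes each bar's height above it
ax.set_title("Second prizes by school")
ax.set_xlabel("School")
ax.set_ylabel("Count")

buf = io.BytesIO()
fig.savefig(buf, format="png")  # pass a file path instead to write to disk
```

One visible difference: a single bar call draws all bars in one color, whereas calling plt.bar once per school cycles through the default colors.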

Full Python code

import pandas as pd
import matplotlib.pyplot as plt

sheet = pd.read_excel("2021E.xls", sheet_name=0)
print(sheet.shape)
print(type(sheet))

sheet1 = sheet.loc[sheet['奖项'] == '一等奖']
print(sheet1.shape)


sheet2 = sheet.loc[sheet['奖项'] == '二等奖']
print(sheet2.shape)

sheet3 = sheet.loc[sheet['奖项'] == '三等奖']
print(sheet3.shape)

sheet4 = sheet.loc[(sheet['奖项'] == '一等奖')|(sheet['奖项'] == '二等奖')|(sheet['奖项'] == '三等奖')]
print(sheet4.shape)



output2 = sheet2['队长所在单位'].value_counts()
print(output2)
print(type(output2))

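# Render the minus sign on axis ticks correctly
plt.rcParams['axes.unicode_minus'] = False
# Use a font with Chinese glyphs so Chinese labels render correctly
plt.rcParams['font.sans-serif'] = ['Simhei']

x = output2.index[:10]
y = output2.values[:10]
# Draw one bar per school and write its count above the bar
for i in range(len(x)):
    plt.bar(x[i], y[i])
    plt.text(x[i], y[i], str(y[i]), ha="center", va="bottom")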


plt.title("二等奖获奖情况")
plt.xlabel("大学")
plt.ylabel("数量")
plt.show()
