Python Math Modeling Resources: Doing Mathematical Modeling with Python

Reposted

小咪咪 2023-08-21 11:46:09

Tags: Python math modeling resources, python, pandas, data analysis, code blocks. Category: Python, backend development

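Pandas Reading Notes: Data Analysis

① Series

1 Basic usage 1
2 Basic usage 2 (modifying the index)
3 Passing a dict
4 Checking for missing values with isnull()
5 The Series name attribute

② DataFrame

1 Building a DataFrame
2 Modifying index and columns
3 Column operations

① Adding a new column
② Deleting a column
③ Data that can be passed to DataFrame

4 The Index object

Index methods and attributes

5 reindex

1 Dropping entries from an axis
2 Indexing
3 The loc and iloc methods
4 Alignment in arithmetic addition
5 DataFrame + Series
6 Sorting and ranking
7 Statistical methods
8 Unique values, value counts, and membership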

Notes from 2021-08-06
Some material from the book, such as function application and mapping, has been left out.

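① Series

1 Basic usage 1

Code

# Example 1
import numpy as np
import pandas as pd
from pandas import Series

obj = Series([4, -8, 2, 3])
print(obj.values)  # the underlying NumPy array
print(obj.index)   # the default integer index
print(obj)

Output

[ 4 -8  2  3]
RangeIndex(start=0, stop=4, step=1)
0    4
1   -8
2    2
3    3
dtype: int64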

2 Basic usage 2 (modifying the index)

Code

# Give the Series a custom index
obj2 = Series([1, 3, -5, 2], index=['a', 'b', 'c', 'd'])
# Alternatively: obj2.index = ['a', 'b', 'c', 'd']
print(obj2['a'])        # select by label
print(obj2[obj2 > 0])   # boolean filtering keeps the labels
print(obj2)
obj2.index

Output

1
a    1
b    3
d    2
dtype: int64
a    1
b    3
c   -5
d    2
dtype: int64
Out[8]:
Index(['a', 'b', 'c', 'd'], dtype='object')

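3 Passing a dict

If only a dict is passed, the Series keeps the order of the dict's keys. If both a dict and an index are passed, the index decides the order, and any index label that is not found in the dict gets NaN.

Code

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj1 = Series(sdata)
print(obj1)
# 'California' is not a key of sdata, so its value becomes NaN
obj2 = Series(sdata, index=states)
print(obj2)

Output

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64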

4 Checking for missing values with isnull()

isnull() detects missing data such as NaN.

Code

# Reuses obj2 from the previous example
obj2.isnull()

Output

California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
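As a small follow-up sketch (not from the original notes), the boolean mask returned by isnull() can be counted or used for filtering. The variable names prices, n_missing and present are made up for this illustration.

```python
import pandas as pd

# Same data as the dict example above; 'California' has no entry, so it becomes NaN.
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
prices = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

n_missing = int(prices.isnull().sum())  # True counts as 1, so the sum is the number of NaNs
present = prices[prices.notnull()]      # keep only the entries that have a value

print(n_missing)
print(present)
```

notnull() is simply the negation of isnull(), so either mask can drive the selection.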

5 The Series name attribute

Code

obj2.name = 'Location Price'        # name of the Series itself
obj2.index.name = 'locationName'    # name of its index
obj2

Output

locationName
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
Name: Location Price, dtype: float64

② DataFrame

1 Building a DataFrame

A DataFrame gets a default integer index automatically, and its columns follow the order of the dict's keys.

Code

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)
frame

Output

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
5  Nevada  2003  3.2

2 Modifying index and columns

Code for changing the columns

pd.DataFrame(data, columns=['year', 'state', 'pop'])

Output

	year	state	pop
0	2000	Ohio	1.5
1	2001	Ohio	1.7
2	2002	Ohio	3.6
3	2001	Nevada	2.4
4	2002	Nevada	2.9
5	2003	Nevada	3.2

Changing both index and columns

Any column name that cannot be found in data is filled with NaN.

frame2 = pd.DataFrame(data,
                      columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])
frame2

Output

       year   state  pop debt
one    2000    Ohio  1.5  NaN
two    2001    Ohio  1.7  NaN
three  2002    Ohio  3.6  NaN
four   2001  Nevada  2.4  NaN
five   2002  Nevada  2.9  NaN
six    2003  Nevada  3.2  NaN

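3 Column operations

① Adding a new column

Assigning to a column that does not exist creates a new column.

Code

# Add a new column directly: a scalar fills every row,
# while an array must have the same length as the DataFrame
frame2['new'] = 99
frame2['new2'] = np.arange(3, 9, 1)

# Add a column from a Series: values are aligned on the index,
# and rows without a matching label get NaN
val = Series([1, 3, -2], index=['one', 'three', 'four'])
frame2['new3_Series'] = val

# Store the result of a condition on the existing DataFrame in a new column
frame2['eastern'] = frame2.state == 'Ohio'
frame2

Output

       year   state  pop debt  new  new2  new3_Series  eastern
one    2000    Ohio  1.5  NaN   99     3          1.0     True
two    2001    Ohio  1.7  NaN   99     4          NaN     True
three  2002    Ohio  3.6  NaN   99     5          3.0     True
four   2001  Nevada  2.4  NaN   99     6         -2.0    False
five   2002  Nevada  2.9  NaN   99     7          NaN    False
six    2003  Nevada  3.2  NaN   99     8          NaN    False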

② Deleting a column

Code

del frame2['new2']  # removes the column in place
frame2

Output

       year   state  pop debt  new  new3_Series  eastern
one    2000    Ohio  1.5  NaN   99          1.0     True
two    2001    Ohio  1.7  NaN   99          NaN     True
three  2002    Ohio  3.6  NaN   99          3.0     True
four   2001  Nevada  2.4  NaN   99         -2.0    False
five   2002  Nevada  2.9  NaN   99          NaN    False
six    2003  Nevada  3.2  NaN   99          NaN    False

③ Data that can be passed to DataFrame

[Image from the original post, not reproduced here]
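To make this heading concrete, here is a minimal sketch of a few input types that the DataFrame constructor accepts. It is not taken from the original post and is not an exhaustive list. The variable names from_lists, from_series, from_records and from_array are made up for this illustration.

```python
import numpy as np
import pandas as pd

# 1) dict of equal-length lists: each key becomes a column
from_lists = pd.DataFrame({'year': [2000, 2001], 'pop': [1.5, 1.7]})

# 2) dict of Series: the row labels are unioned, gaps become NaN
from_series = pd.DataFrame({
    'a': pd.Series([1, 2], index=['x', 'y']),
    'b': pd.Series([3, 4], index=['y', 'z']),
})

# 3) list of dicts: each dict is one row, missing keys become NaN
from_records = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3}])

# 4) 2D NumPy array with explicit row and column labels
from_array = pd.DataFrame(np.arange(6).reshape(2, 3),
                          index=['r1', 'r2'], columns=['c1', 'c2', 'c3'])

print(from_lists, from_series, from_records, from_array, sep='\n\n')
```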

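4 The Index object

Index objects are immutable, so users cannot modify them in place.
pandas Index objects hold the axis labels and other metadata (such as the axis name). Any array or other sequence of labels used when constructing a Series or DataFrame is converted to an Index.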
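A short sketch of what immutability means in practice (my own illustration, with the made-up names index and mutated): assigning to one element of an Index fails, while replacing the whole index of a Series is fine because it binds a new Index object.

```python
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])
index = obj.index

try:
    index[1] = 'z'      # item assignment on an Index is rejected
    mutated = True
except TypeError as exc:
    mutated = False
    print('TypeError:', exc)

# Replacing the whole index is allowed; the old Index object is left untouched
obj.index = ['x', 'y', 'z']
print(list(index), list(obj.index))
```

Because an Index cannot change underneath you, it can be shared safely between several Series and DataFrame objects.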

Index methods and attributes

[Image from the original post, not reproduced here]
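For reference, a brief sketch of some commonly used Index methods and attributes. This is my own selection, not necessarily what the original table listed, and the names left and right are made up for this illustration.

```python
import pandas as pd

left = pd.Index(['a', 'b', 'c'])
right = pd.Index(['b', 'c', 'd'])

union = left.union(right)                 # labels found in either Index
intersection = left.intersection(right)   # labels found in both
difference = left.difference(right)       # labels in left but not in right
mask = left.isin(['a', 'c'])              # boolean array, one entry per label
appended = left.append(right)             # concatenation, duplicates are kept

print(union, intersection, difference, mask, appended, sep='\n')
print(appended.is_unique)                 # False, because 'b' and 'c' repeat
print('b' in left)                        # membership test works like a set
```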

5 reindex

Calling reindex on a Series rearranges the data according to the new index. If an index label is not currently present, a missing value is introduced.

Code

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
print(obj)
# A label that does not exist yet ('e') is introduced as a missing value
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
print(obj2)

# Fill the gaps while reindexing: ffill carries the previous value forward
obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
obj4 = obj3.reindex(range(6), method='ffill')
print(obj4)

# Reindex the rows
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])
frame2 = frame.reindex(['a', 'b', 'c', 'd'])

# Reindex the columns
states = ['Texas', 'Utah', 'California']
frame3 = frame.reindex(columns=states)
# Display each frame (in a notebook, run these in separate cells)
frame
frame2
frame3

Output

d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
frame
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8
frame2
   Ohio  Texas  California
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0
frame3
   Texas  Utah  California
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8


The method (fill) option of reindex

Argument	Description
ffill or pad	Fill (carry) values forward
bfill or backfill	Fill (carry) values backward
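The two options in the table can be compared side by side. This is a small sketch using the same color data as the reindex example above. The names colors, forward and backward are made up for this illustration.

```python
import pandas as pd

colors = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# ffill: each new label takes the last known value before it
forward = colors.reindex(range(6), method='ffill')

# bfill: each new label takes the next known value after it;
# label 5 has nothing after it, so it stays NaN
backward = colors.reindex(range(6), method='bfill')

print(forward)
print(backward)
```

Both options require the original index to be sorted (monotonic), which is the case here.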

③

1 Dropping entries from an axis

Series

obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
obj.drop(['c', 'd'])  # returns a new Series without these labels

a    0.0
b    1.0
e    4.0
dtype: float64

DataFrame

# By default, drop removes rows (axis 0)
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data.drop(['Colorado', 'Ohio'])

# Specify the axis to drop columns instead
data.drop(['two', 'four'], axis='columns')
# axis=1 means the same as axis='columns', e.g. data.drop('two', axis=1)

          one  two  three  four
Utah        8    9     10    11
New York   12   13     14    15
          one  three
Ohio        0      2
Colorado    4      6
Utah        8     10
New York   12     14

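2 Indexing

Code

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])
# A Series can be indexed with its own index labels
print(obj)
print("obj['b']", obj['b'])
# Note: integer positions inside [] on a label index are deprecated in
# recent pandas; obj.iloc[1] and obj.iloc[[1, 3]] are the explicit forms
print("obj[1]", obj[1])
print("obj[2:4]", obj[2:4])
print("obj[['b', 'a', 'd']]", obj[['b', 'a', 'd']])
# Positions and conditions on the values can also be used to select
print("obj[[1, 3]]", obj[[1, 3]])
print("obj[obj < 2]", obj[obj < 2])

Output

a    0.0
b    1.0
c    2.0
d    3.0
dtype: float64
obj['b'] 1.0
obj[1] 1.0
obj[2:4] c    2.0
d    3.0
dtype: float64
obj[['b', 'a', 'd']] b    1.0
a    0.0
d    3.0
dtype: float64
obj[[1, 3]] b    1.0
d    3.0
dtype: float64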

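3 The loc and iloc methods

The ix indexer has been removed in recent versions of pandas, so there is no point in studying that part of the book. Luckily I was running the code while reading; otherwise I would have wasted the effort.

The difference between the two

① loc selects by label

② iloc selects by integer position (integer location)

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

print(data.loc[['Utah', 'Ohio'], 'one'])   # rows by label, one column by label
print(data.iloc[[0, 2], [1, 3]])           # rows and columns by position

Utah    8
Ohio    0
Name: one, dtype: int32
      two  four
Ohio    1     3
Utah    9    11

print(data)

# Converting between labels and positions
print(data.columns)                  # all column labels
print(data.columns.get_loc('two'))   # position of the column named 'two'
# -1 is the last row; the column position comes from the conversion above
print(data.iloc[-1, data.columns.get_loc('two')])

          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
Index(['one', 'two', 'three', 'four'], dtype='object')
1
13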
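One more difference that is easy to trip over, sketched here as an addition to the notes: label slices with loc include the end label, while position slices with iloc exclude the end position, like ordinary Python slices. The names by_label and by_position are made up for this illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# loc: both 'Utah' and 'two' are included in the result
by_label = data.loc['Ohio':'Utah', 'one':'two']

# iloc: rows 0 and 1, column 0 only (the end positions are excluded)
by_position = data.iloc[0:2, 0:1]

print(by_label)
print(by_position)
```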


4 Alignment in arithmetic addition

When two DataFrame objects with different row or column labels are added, every position that does not overlap is set to NaN.

df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'),
                   index=['Ohio', 'Texas', 'Colorado'])
df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                   index=['Utah', 'Ohio', 'Texas', 'Oregon'])
print(df1)
print(df2)
df1 + df2

            b    c    d
Ohio      0.0  1.0  2.0
Texas     3.0  4.0  5.0
Colorado  6.0  7.0  8.0
          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0
Out[100]:
            b   c     d   e
Colorado  NaN NaN   NaN NaN
Ohio      3.0 NaN   6.0 NaN
Oregon    NaN NaN   NaN NaN
Texas     9.0 NaN  12.0 NaN
Utah      NaN NaN   NaN NaN

To avoid this, use the add method and pass a fill_value argument.

df2.add(df1, fill_value=0)

Out[105]:
            b    c     d     e
Colorado  6.0  7.0   8.0   NaN
Ohio      3.0  1.0   6.0   5.0
Oregon    9.0  NaN  10.0  11.0
Texas     9.0  4.0  12.0   8.0
Utah      0.0  NaN   1.0   2.0

Tip: fill_value only fills positions where one of the two frames has a value and the other does not. Positions that are missing from both frames are still NaN.

To turn every NaN into 0, you can use df2[np.isnan(df2)] = 0 on the frame in question, or more idiomatically its fillna(0) method.
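The tip above can be checked with a short sketch that reuses df1 and df2 from this section. The names total and cleaned are made up for this illustration, and fillna(0) is shown as one possible way to zero out the remaining NaN values.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'),
                   index=['Ohio', 'Texas', 'Colorado'])
df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                   index=['Utah', 'Ohio', 'Texas', 'Oregon'])

# fill_value=0 stands in for a value that is missing on one side only
total = df1.add(df2, fill_value=0)

# ('Utah', 'c') exists in neither frame, so it is still NaN here
print(total.loc['Utah', 'c'])

# Replace whatever NaN is left with 0; returns a new DataFrame
cleaned = total.fillna(0)
print(cleaned)
```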


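5 DataFrame + Series

The operation is broadcast down the rows: the index of the Series is matched against the columns of the DataFrame, and the Series is then applied to every row. If a label is found in only one of the two objects, the result is reindexed to the union of the labels, and the unmatched column is filled with NaN.

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
series = frame.iloc[0]  # the first row, as a Series indexed by column label
print(frame)
print(series)
frame + series

          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0
b    0.0
d    1.0
e    2.0
Name: Utah, dtype: float64
Out[114]:
          b     d     e
Utah    0.0   2.0   4.0
Ohio    3.0   5.0   7.0
Texas   6.0   8.0  10.0
Oregon  9.0  11.0  13.0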
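Two further cases are sketched below as an addition to the notes: a Series whose labels only partly match the columns, and broadcasting along the other axis by matching on the row index instead. The names series2, by_columns and by_rows are made up for this illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

# 'd' is missing from the Series and 'f' is missing from the frame,
# so both columns come out as NaN in the result
series2 = pd.Series(range(3), index=['b', 'e', 'f'])
by_columns = frame + series2
print(by_columns)

# Match on the row index instead: subtract column 'd' from every column
by_rows = frame.sub(frame['d'], axis='index')
print(by_rows)
```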

6 Sorting and ranking

Sorting by an axis

frame = pd.DataFrame(np.arange(8).reshape((2, 4)),
                     index=['three', 'one'],
                     columns=['d', 'a', 'b', 'c'])
print(frame)
# Sort the rows by index label in ascending order
frame = frame.sort_index(axis=0, ascending=True)
print(frame)

       d  a  b  c
three  0  1  2  3
one    4  5  6  7
       d  a  b  c
one    4  5  6  7
three  0  1  2  3

Sorting by values

# sort_values returns a new object, so frame itself is unchanged
frame.sort_values(by='d')
print(frame)
# Missing values are placed at the end by default
obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])
obj.sort_values()

       d  a  b  c
one    4  5  6  7
three  0  1  2  3
Out[124]:
4   -3.0
5    2.0
0    4.0
2    7.0
1    NaN
3    NaN
dtype: float64

rank

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
# method='first' breaks ties by order of appearance in the data
obj.rank(method='first')

0    6.0
1    1.0
2    7.0
3    4.0
4    3.0
5    2.0
6    5.0
dtype: float64
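For comparison, here is a small sketch of how the tie-breaking method changes the ranks on the same data. The names average, first and lowest are made up for this illustration.

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

average = obj.rank()               # default: tied values share the mean of their ranks
first = obj.rank(method='first')   # ties ranked by order of appearance
lowest = obj.rank(method='min')    # every tied value gets the lowest rank of its group

print(pd.DataFrame({'value': obj, 'average': average,
                    'first': first, 'min': lowest}))
```

The two 7s occupy ranks 6 and 7, so the default gives both 6.5, 'first' gives 6 and 7, and 'min' gives both 6.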


7 Statistical methods

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])
print(df)
# Mean across the columns of each row; skipna=True ignores NaN values
df.mean(axis='columns', skipna=True)

    one  two
a  1.40  NaN
b  7.10 -4.5
c   NaN  NaN
d  0.75 -1.3
Out[128]:
a    1.400
b    1.300
c      NaN
d   -0.275
dtype: float64
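A short sketch of a few related calls on the same df, added for illustration: sum down each column, the same row mean with skipna=False, and describe for a quick summary. The names col_sums, strict_means and summary are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])

col_sums = df.sum()                                    # per column, NaN skipped by default
strict_means = df.mean(axis='columns', skipna=False)   # any NaN in a row makes the result NaN
summary = df.describe()                                # count, mean, std, min, quartiles, max

print(col_sums)
print(strict_means)
print(summary)
```

With skipna=False, row a becomes NaN because one of its two values is missing, whereas the default treats it as the mean of the single available value.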


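8 Unique values, value counts, and membership

obj = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])
uniques = obj.unique()                  # distinct values, in order of appearance
# The top-level pd.value_counts(obj.values, sort=False) is deprecated in recent
# pandas; the Series method is used here. With sort=False the row order can
# differ between pandas versions.
counts = obj.value_counts(sort=False)
mask = obj.isin(['b', 'c'])             # True where the value is 'b' or 'c'
print("uniques", uniques)
print("counts", counts)
print("mask", mask)
obj[mask]

uniques ['c' 'a' 'd' 'b']
counts b    2
d    1
a    3
c    3
dtype: int64
mask 0     True
1    False
2    False
3    False
4    False
5     True
6     True
7     True
8     True
dtype: bool
Out[130]:
0    c
5    b
6    b
7    c
8    c
dtype: object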
