Categorical frequency plots in Python: frequency distributions in Python

Reposted

mob6454cc63081f 2024-02-05 21:11:45


"Probability and Mathematical Statistics", Assignment 1: building frequency distribution tables in Python


5.1

[Image 5.1.PNG missing in source: the external image failed to transfer]

2:

Population: the smoking status of all adult men
Sample: all 5000 men surveyed by the 50 students
Population distribution: Bernoulli distribution

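5:

Population: all capacitors produced by a certain factory
Sample: the n items drawn from production
Sample distribution:
Assume the observations are i.i.d. and each follows an exponential distribution
[formula image 02 missing in source]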
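The formula for the sample distribution did not survive in the source. Under the stated assumption of i.i.d. exponential observations, and writing the rate parameter as $\lambda$ (the symbol is an assumption, since the original parametrization is not recoverable), the joint density would plausibly read:

```latex
f(x_1,\dots,x_n)=\prod_{i=1}^{n}\lambda e^{-\lambda x_i}
=\lambda^{n}\exp\Bigl(-\lambda\sum_{i=1}^{n}x_i\Bigr),
\qquad x_i>0,\ i=1,\dots,n.
```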

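6:

I think this conclusion is unreasonable. The population is all graduates, but the sample consists only of graduates who returned to campus. Graduates with low salaries who are doing poorly are less willing to return, so the sampling is not random. The average salary of all graduates is below 50,000 US dollars.

Sample data such as salary and age are generally skewed, so the sample mean is not a suitable representative of the typical level.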
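The argument can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the lognormal salary model, its parameters, and the attendance rule (probability of returning rises with salary rank) are all hypothetical and not taken from the problem.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Hypothetical right-skewed population of graduate salaries.
salaries = rng.lognormal(mean=10.5, sigma=0.5, size=100_000)

# Assumed attendance model: the chance of returning grows with salary rank.
ranks = salaries.argsort().argsort() / (salaries.size - 1)  # 0 = lowest, 1 = highest
attends = rng.random(salaries.size) < 0.1 + 0.8 * ranks

print("population mean:  ", salaries.mean())
print("attendee mean:    ", salaries[attends].mean())
print("population median:", np.median(salaries))
```

Under these assumptions the attendees' mean overstates the population mean, and the population mean itself sits above the median because of the right skew.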

5.2

[images 03 and 04 missing in source]

2:

3+4+8+3+2=20

The distribution function is required to be right-continuous.
[formula image 05 missing in source]
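Right-continuity can be made concrete with an empirical distribution function. The sketch below is an illustration only: the observation values 1 to 5 are hypothetical, and only the multiplicities 3, 4, 8, 3, 2 come from the text. The helper name `ecdf` is likewise made up.

```python
import numpy as np

# Hypothetical values 1..5; only the multiplicities 3, 4, 8, 3, 2 come from the text.
sample = np.repeat([1, 2, 3, 4, 5], [3, 4, 8, 3, 2])
sorted_sample = np.sort(sample)

def ecdf(x):
    """Empirical CDF: (number of observations <= x) / n."""
    # side="right" counts observations <= x, so the jump at a data point
    # is already included at that point (right-continuity).
    return np.searchsorted(sorted_sample, x, side="right") / sorted_sample.size

print(ecdf(0.999), ecdf(1), ecdf(3), ecdf(5))
```

Evaluating just below a data point and exactly at it shows the jump belonging to the point itself, which is what right-continuity demands.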

3:

# Sort the data in ascending order
import numpy as np
import pandas as pd
t2=[909,1086,1120,999,1320,1091,1071,1081,
    1130,1336,967,1572,825,914,992,1232,950,
    775,1203,1025,1096,808,1224,1044,871,1164,971,950,866,738]
t2=np.sort(t2)  # sort ascending
# Print the sample size, the sorted data, and the range divided by 6 classes
print(t2.shape,t2,(np.max(t2)-np.min(t2))/6)

(30,) [ 738  775  808  825  866  871  909  914  950  950  967  971  992  999
 1025 1044 1071 1081 1086 1091 1096 1120 1130 1164 1203 1224 1232 1320
 1336 1572] 139.0

# Frequency distribution table
# Use a class width of 140, with explicit edges so the bins match the labels
edges=[737+140*i for i in range(7)]  # 737, 877, ..., 1577
t22=pd.cut(t2,edges, labels=["(737,877]","(877,1017]","(1017,1157]","(1157,1297]","(1297,1437]","(1437,1577]"])
t22=t22.value_counts()
t22=pd.DataFrame(t22)
t22['interval'] = t22.index
t22.columns = ['count','interval']
t22.reset_index(drop=True, inplace=True)
t22['midpoint'] =[807,947,1087,1227,1367,1507]  # class midpoints
t22['rel_freq']=t22['count']/30  # relative frequency
# Cumulative relative frequency
t22['cum_rel_freq']=t22['rel_freq'].cumsum()
t22=t22[['interval','midpoint','count','rel_freq','cum_rel_freq']]
t22

	interval	midpoint	count	rel_freq	cum_rel_freq
0	(737,877]	807	6	0.200000	0.200000
1	(877,1017]	947	8	0.266667	0.466667
2	(1017,1157]	1087	9	0.300000	0.766667
3	(1157,1297]	1227	4	0.133333	0.900000
4	(1297,1437]	1367	2	0.066667	0.966667
5	(1437,1577]	1507	1	0.033333	1.000000
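As a cross-check on the table above, the counts can be recomputed with NumPy alone. This is a sketch rather than part of the original solution. Note that `np.histogram` uses half-open bins `[a, b)` (the last bin is closed), while the table uses `(a, b]`. The two agree here only because no observation coincides with an interior edge.

```python
import numpy as np

t2 = [909, 1086, 1120, 999, 1320, 1091, 1071, 1081,
      1130, 1336, 967, 1572, 825, 914, 992, 1232, 950,
      775, 1203, 1025, 1096, 808, 1224, 1044, 871, 1164, 971, 950, 866, 738]

edges = np.arange(737, 1578, 140)          # 737, 877, ..., 1577 (class width 140)
counts, _ = np.histogram(t2, bins=edges)   # class counts
rel_freq = counts / counts.sum()           # relative frequencies
cum_rel_freq = np.cumsum(rel_freq)         # cumulative relative frequencies

print(counts)
print(cum_rel_freq)
```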

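# Draw the histogram
import matplotlib.pyplot as plt
# SimHei supplies CJK glyphs; these font settings matter only when labels contain Chinese text
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.sans-serif'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False  # draw the minus sign as an ASCII hyphen

plt.hist(t2, bins=6)
plt.title('Histogram for Problem 3')

Text(0.5, 1.0, 'Histogram for Problem 3')

[Figure 06: histogram for Problem 3 (image missing in source)]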

5:

t5=[5954,5022,14667,6582,6870,1840,2662,4508,
   1208,3852,618,3008,1268,1978,7963,2048,
   3077,993,353,14263,1714,11127,6926,2047,
   714,5923,6006,14267,1697,13867,4001,2280,
   1223,12579,13588,7315,4538,13304,1615,8612]
t5=np.sort(t5)
print(t5.shape,t5)

(40,) [  353   618   714   993  1208  1223  1268  1615  1697  1714  1840  1978
  2047  2048  2280  2662  3008  3077  3852  4001  4508  4538  5022  5923
  5954  6006  6582  6870  6926  7315  7963  8612 11127 12579 13304 13588
 13867 14263 14267 14667]

(14667-353)/1700

8.42

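# Class boundaries: 10 edges starting at 352 with class width 1700
ran=[]
for i in range(10):
    ran.append(352+i*1700)

# Interval labels of the form "(a,b]" for the 9 classes
lable=[]
for i in range(9):
    lable.append('('+str(ran[i])+','+str(ran[i+1])+']')
lable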

['(352,2052]',
 '(2052,3752]',
 '(3752,5452]',
 '(5452,7152]',
 '(7152,8852]',
 '(8852,10552]',
 '(10552,12252]',
 '(12252,13952]',
 '(13952,15652]']

t55=pd.cut(t5,ran, labels=lable)
t55=t55.value_counts()
t55=pd.DataFrame(t55)
t55['interval'] = t55.index
t55.columns = ['count','interval']
t55.reset_index(drop=True, inplace=True)
# Class midpoints
zzz=[]
for i in range(9):
    zzz.append(ran[i]+1700/2)
t55['midpoint'] =zzz
t55['rel_freq']=t55['count']/40  # relative frequency
# Cumulative relative frequency
t55['cum_rel_freq']=t55['rel_freq'].cumsum()
t55=t55[['interval','midpoint','count','rel_freq','cum_rel_freq']]
t55

	interval	midpoint	count	rel_freq	cum_rel_freq
0	(352,2052]	1202.0	14	0.350	0.350
1	(2052,3752]	2902.0	4	0.100	0.450
2	(3752,5452]	4602.0	5	0.125	0.575
3	(5452,7152]	6302.0	6	0.150	0.725
4	(7152,8852]	8002.0	3	0.075	0.800
5	(8852,10552]	9702.0	0	0.000	0.800
6	(10552,12252]	11402.0	1	0.025	0.825
7	(12252,13952]	13102.0	4	0.100	0.925
8	(13952,15652]	14802.0	3	0.075	1.000

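plt.hist(t5, bins=ran)  # reuse the class boundaries of the table
plt.title('Histogram for Problem 5')

Text(0.5, 1.0, 'Histogram for Problem 5')

[Figure 07: histogram for Problem 5 (image missing in source)]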

5.3

[images 08 to 12 missing in source]

3:

[formula images 13 and 14 missing in source]

4:

Proof:
[formula image 15 missing in source]
Dividing both sides by n+1 gives the claim.

Proof:
$\begin{aligned}
ns_{n+1}^2-(n-1)s_n^2 &= \sum_{i=1}^{n+1}(x_i-\bar{x}_{n+1})^2-\sum_{i=1}^{n}(x_i-\bar{x}_n)^2 \\
&= x_{n+1}^2-2\Bigl(\sum_{i=1}^{n+1}x_i\bar{x}_{n+1}-\sum_{i=1}^{n}x_i\bar{x}_n\Bigr)+\bigl((n+1)\bar{x}_{n+1}^2-n\bar{x}_n^2\bigr) \\
&= x_{n+1}^2-2\Bigl[x_{n+1}\bar{x}_{n+1}+\sum_{i=1}^{n}x_i(\bar{x}_{n+1}-\bar{x}_n)\Bigr]+\bigl((n+1)\bar{x}_{n+1}^2-n\bar{x}_n^2\bigr) \\
&= x_{n+1}^2-2\Bigl[x_{n+1}\bar{x}_{n+1}+\frac{n}{n+1}(x_{n+1}-\bar{x}_n)\bar{x}_n\Bigr]+\bigl((n+1)\bar{x}_{n+1}^2-n\bar{x}_n^2\bigr)
\end{aligned}$
Substituting [formula image 16 missing in source] into [formula image 17 missing in source] from the previous proof
gives [formula image 18 missing in source].
Dividing both sides by n yields the result.

Remark: this problem shows that the sample mean and sample variance can be updated step by step as more observations are drawn.
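The remark can be tried out numerically. The formula images are missing, so the recurrences used below are a reconstruction consistent with the surviving derivation, not a quotation of the original: $\bar{x}_{n+1}=\bar{x}_n+\frac{1}{n+1}(x_{n+1}-\bar{x}_n)$ and $s_{n+1}^2=\frac{n-1}{n}s_n^2+\frac{1}{n+1}(x_{n+1}-\bar{x}_n)^2$. The helper name `update_mean_var` is hypothetical.

```python
import numpy as np

def update_mean_var(n, mean_n, var_n, x_new):
    """Update the mean and the (n-1)-divisor variance of n observations
    after one more observation x_new arrives. Requires n >= 1."""
    delta = x_new - mean_n
    mean_next = mean_n + delta / (n + 1)
    var_next = (n - 1) / n * var_n + delta ** 2 / (n + 1)
    return mean_next, var_next

# First five observations of Problem 3 in Section 5.2, used as sample input.
data = [909.0, 1086.0, 1120.0, 999.0, 1320.0]

# With a single observation the variance term is multiplied by 0, so 0.0 is a safe start.
mean, var = data[0], 0.0
for n, x in enumerate(data[1:], start=1):
    mean, var = update_mean_var(n, mean, var, x)

print(mean, var)                             # incremental result
print(np.mean(data), np.var(data, ddof=1))   # batch result for comparison
```

Only the running count, mean, and variance need to be stored, so the full data set never has to be kept in memory.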

5:

Proof:
[formula image 19 missing in source]

where
[formula image 20 missing in source] denotes the observed values in the sample of size n, and
[formula image 21 missing in source] denotes the observed values in the sample of size m.

Proof:

[formula images 22 to 25 missing in source]

The result follows from the expression above.
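The formulas of this proof are not recoverable from the source. Assuming the problem asks for the mean and variance of the pooled sample of size n+m in terms of the two samples' means and variances (an assumption based only on the surviving text), the standard identities are $\bar{z}=\frac{n\bar{x}+m\bar{y}}{n+m}$ and $s_z^2=\frac{(n-1)s_x^2+(m-1)s_y^2}{n+m-1}+\frac{nm(\bar{x}-\bar{y})^2}{(n+m)(n+m-1)}$. The sketch below checks them numerically. The helper name `merge_mean_var` and the split of the data are hypothetical.

```python
import numpy as np

def merge_mean_var(n, mean_x, var_x, m, mean_y, var_y):
    """Mean and (divisor N-1) variance of two samples pooled together."""
    total = n + m
    mean_z = (n * mean_x + m * mean_y) / total
    var_z = ((n - 1) * var_x + (m - 1) * var_y
             + n * m * (mean_x - mean_y) ** 2 / total) / (total - 1)
    return mean_z, var_z

# Two small illustrative samples (values borrowed from Problem 3 in Section 5.2).
x = np.array([909.0, 1086.0, 1120.0, 999.0])
y = np.array([1320.0, 1091.0, 1071.0])

mean_z, var_z = merge_mean_var(x.size, x.mean(), x.var(ddof=1),
                               y.size, y.mean(), y.var(ddof=1))
z = np.concatenate([x, y])

print(mean_z, var_z)              # from the summary statistics only
print(z.mean(), z.var(ddof=1))    # from the pooled raw data
```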

8:

[formula images 26 to 29 missing in source]

10:

[formula image 30 missing in source]

13:

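By the reproductive property of the normal distribution,
[formula image 31 missing in source]
[formula image 32 missing in source]
Let [formula image 33 missing in source] denote the distribution function of the standard normal distribution.
Solving [formula image 34 missing in source] gives [formula image 35 missing in source]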

24:

[formula images 36 and 37 missing in source]

28:

(1)

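Proof:
[formula image 38 missing in source]
[formula image 39 missing in source]
From [formula image 40 missing in source]
it follows that [formula image 41 missing in source]
This probability density function is also the density of the order statistics of n i.i.d. random variables that follow [formula image 42 missing in source].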

(2)

[formula images 43 to 45 missing in source]

(3)

Let A be the covariance matrix, where [formula image 46 missing in source]; it therefore suffices to prove [formula image 47 missing in source].
First find the joint density function of [formula image 48 missing in source]:
Without loss of generality, assume [formula image 49 missing in source]; then [formula image 50 missing in source]

[formula images 51 to 56 missing in source]

For the integral above:
[formula image 57 missing in source]
For the integral with respect to [formula image 58 missing in source]: match the integrand to the density of a known probability distribution, then evaluate the integral using the normalization property of density functions (a density integrates to 1).

Functions for building the frequency distribution table (by class width / by number of classes)

(1) By number of classes

import numpy as np
import pandas as pd

def fredistable_zushu(t,n):  # t: array of observations, n: number of classes
    t=np.sort(t)
    mi=np.min(t)
    ma=np.max(t)
    ran=[]
    # If the class width need not be an integer, use cut=(ma-mi)/n instead
    cut=int((ma-mi)/n)+1
    for i in range(n+1):
        # Start one below the minimum so the right-closed first class contains it;
        # use ran.append(mi+i*cut) to start exactly at the minimum
        ran.append(mi-1+i*cut)
    # Interval labels of the form "(a,b]"
    lable=[]
    for i in range(n):
        lable.append('('+str(ran[i])+','+str(ran[i+1])+']')
    t1=pd.cut(t,ran, labels=lable)
    t1=t1.value_counts()
    t1=pd.DataFrame(t1)
    t1['interval'] = t1.index
    t1.columns = ['count','interval']
    t1.reset_index(drop=True, inplace=True)
    # Class midpoints
    zzz=[]
    for i in range(n):
        zzz.append(ran[i]+float(cut)/2)
    t1['midpoint'] =zzz
    t1['rel_freq']=t1['count']/np.shape(t)[0]  # relative frequency
    # Cumulative relative frequency
    t1['cum_rel_freq']=t1['rel_freq'].cumsum()
    t1=t1[['interval','midpoint','count','rel_freq','cum_rel_freq']]
    return t1


t5=[5954,5022,14667,6582,6870,1840,2662,4508,
   1208,3852,618,3008,1268,1978,7963,2048,
   3077,993,353,14263,1714,11127,6926,2047,
   714,5923,6006,14267,1697,13867,4001,2280,
   1223,12579,13588,7315,4538,13304,1615,8612]
fredistable_zushu(t5,9)

	interval	midpoint	count	rel_freq	cum_rel_freq
0	(352,1943]	1147.5	11	0.275	0.275
1	(1943,3534]	2738.5	7	0.175	0.450
2	(3534,5125]	4329.5	5	0.125	0.575
3	(5125,6716]	5920.5	4	0.100	0.675
4	(6716,8307]	7511.5	4	0.100	0.775
5	(8307,9898]	9102.5	1	0.025	0.800
6	(9898,11489]	10693.5	1	0.025	0.825
7	(11489,13080]	12284.5	1	0.025	0.850
8	(13080,14671]	13875.5	6	0.150	1.000
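The function starts its first boundary one unit below the minimum because `pd.cut` builds right-closed intervals by default, so a value equal to the first edge would fall outside every class. A possible alternative, shown here only as a sketch with made-up edges, is the `include_lowest=True` argument of `pd.cut`:

```python
import pandas as pd

values = [353, 618, 14667]
edges = [353, 7510, 14667]  # hypothetical edges that start exactly at the minimum

# Default behaviour: intervals are (a, b], so the minimum 353 gets no class (NaN).
default = pd.Series(pd.cut(values, edges))
# include_lowest=True makes the first interval contain its left edge.
inclusive = pd.Series(pd.cut(values, edges, include_lowest=True))

print(default.isna().sum())     # number of unclassified values by default
print(inclusive.isna().sum())   # number of unclassified values with include_lowest
```

With `include_lowest=True` the boundaries can start at the minimum itself, at the cost of a slightly adjusted left edge in the first interval that pandas reports.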

(2) By class width

def fredistable_fenge(t,cut):  # t: array of observations, cut: class width
    t=np.sort(t)
    mi=np.min(t)
    ma=np.max(t)
    ran=[]
    n=int((ma-mi)/cut)+1  # number of classes needed to cover the range
    for i in range(n+1):
        # Start one below the minimum so the right-closed first class contains it;
        # use ran.append(mi+i*cut) to start exactly at the minimum
        ran.append(mi-1+i*cut)
    # Interval labels of the form "(a,b]"
    lable=[]
    for i in range(n):
        lable.append('('+str(ran[i])+','+str(ran[i+1])+']')
    t1=pd.cut(t,ran, labels=lable)
    t1=t1.value_counts()
    t1=pd.DataFrame(t1)
    t1['interval'] = t1.index
    t1.columns = ['count','interval']
    t1.reset_index(drop=True, inplace=True)
    # Class midpoints
    zzz=[]
    for i in range(n):
        zzz.append(ran[i]+float(cut)/2)
    t1['midpoint'] =zzz
    t1['rel_freq']=t1['count']/np.shape(t)[0]  # relative frequency
    # Cumulative relative frequency
    t1['cum_rel_freq']=t1['rel_freq'].cumsum()
    t1=t1[['interval','midpoint','count','rel_freq','cum_rel_freq']]
    return t1

fredistable_fenge(t5,1700)

	interval	midpoint	count	rel_freq	cum_rel_freq
0	(352,2052]	1202.0	14	0.350	0.350
1	(2052,3752]	2902.0	4	0.100	0.450
2	(3752,5452]	4602.0	5	0.125	0.575
3	(5452,7152]	6302.0	6	0.150	0.725
4	(7152,8852]	8002.0	3	0.075	0.800
5	(8852,10552]	9702.0	0	0.000	0.800
6	(10552,12252]	11402.0	1	0.025	0.825
7	(12252,13952]	13102.0	4	0.100	0.925
8	(13952,15652]	14802.0	3	0.075	1.000
