Bar plots

A bar plot shows the distribution of a categorical variable using vertical or horizontal bars.

barplot(height), where height is a vector or a matrix
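As a minimal sketch of the vector case, the block below passes a vector of counts to barplot. It uses the built-in mtcars dataset instead of Arthritis so that it runs without extra packages; the names cyl_counts and mids are illustrative only. barplot returns the midpoints of the bars it draws, which can be used to place labels.

```r
# Count cars per number of cylinders; a one-dimensional table
# is treated by barplot() like a named vector.
cyl_counts <- table(mtcars$cyl)

# Draw one bar per element and keep the returned bar midpoints.
mids <- barplot(cyl_counts,
                main = "Cars by Cylinder Count",
                xlab = "Cylinders", ylab = "Frequency")

# Write each count just inside the top of its bar (pos = 1 means "below").
text(x = mids, y = as.vector(cyl_counts),
     labels = as.vector(cyl_counts), pos = 1)
```

Capturing the return value is optional; calling barplot(cyl_counts) alone draws the same plot.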

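The examples that follow use the Arthritis dataset from the vcd package, which records the results of a study of a new treatment for rheumatoid arthritis.

Install the package before the first run: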

> install.packages("vcd")

The Improved variable in the Arthritis dataset records each patient's outcome under either the placebo or the treatment drug.

> library(vcd)
Loading required package: grid
> Arthritis
ID Treatment Sex Age Improved
1 57 Treated Male 27 Some
2 46 Treated Male 29 None
3 77 Treated Male 30 None
4 17 Treated Male 32 Marked
5 36 Treated Male 46 Marked
6 23 Treated Male 58 Marked
7 75 Treated Male 59 None
8 39 Treated Male 59 Marked
9 33 Treated Male 63 None
10 55 Treated Male 63 None
11 30 Treated Male 64 None
12 5 Treated Male 64 Some
13 63 Treated Male 69 None
14 83 Treated Male 70 Marked
15 66 Treated Female 23 None
16 40 Treated Female 32 None
17 6 Treated Female 37 Some
18 7 Treated Female 41 None
19 72 Treated Female 41 Marked
20 37 Treated Female 48 None
21 82 Treated Female 48 Marked
22 53 Treated Female 55 Marked
23 79 Treated Female 55 Marked
24 26 Treated Female 56 Marked
25 28 Treated Female 57 Marked
26 60 Treated Female 57 Marked
27 22 Treated Female 57 Marked
28 27 Treated Female 58 None
29 2 Treated Female 59 Marked
30 59 Treated Female 59 Marked
31 62 Treated Female 60 Marked
32 84 Treated Female 61 Marked
33 64 Treated Female 62 Some
34 34 Treated Female 62 Marked
35 58 Treated Female 66 Marked
36 13 Treated Female 67 Marked
37 61 Treated Female 68 Some
38 65 Treated Female 68 Marked
39 11 Treated Female 69 None
40 56 Treated Female 69 Some
41 43 Treated Female 70 Some
42 9 Placebo Male 37 None
43 14 Placebo Male 44 None
44 73 Placebo Male 50 None
45 74 Placebo Male 51 None
46 25 Placebo Male 52 None
47 18 Placebo Male 53 None
48 21 Placebo Male 59 None
49 52 Placebo Male 59 None
50 45 Placebo Male 62 None
51 41 Placebo Male 62 None
52 8 Placebo Male 63 Marked
53 80 Placebo Female 23 None
54 12 Placebo Female 30 None
55 29 Placebo Female 30 None
56 50 Placebo Female 31 Some
57 38 Placebo Female 32 None
58 35 Placebo Female 33 Marked
59 51 Placebo Female 37 None
60 54 Placebo Female 44 None
61 76 Placebo Female 45 None
62 16 Placebo Female 46 None
63 69 Placebo Female 48 None
64 31 Placebo Female 49 None
65 20 Placebo Female 51 None
66 68 Placebo Female 53 None
67 81 Placebo Female 54 None
68 4 Placebo Female 54 None
69 78 Placebo Female 54 Marked
70 70 Placebo Female 55 Marked
71 49 Placebo Female 57 None
72 10 Placebo Female 57 Some
73 47 Placebo Female 58 Some
74 44 Placebo Female 59 Some
75 24 Placebo Female 59 Marked
76 48 Placebo Female 61 None
77 19 Placebo Female 63 Some
78 3 Placebo Female 64 None
79 67 Placebo Female 65 Marked
80 32 Placebo Female 66 None
81 42 Placebo Female 66 None
82 15 Placebo Female 66 Some
83 71 Placebo Female 68 Some
84 1 Placebo Female 74 Marked
> counts <- table(Arthritis$Improved)
> counts
None Some Marked
42 14 28

Start with a simple bar plot. Here counts is a one-dimensional table, which barplot treats as a vector.

> barplot(counts,main = "Simple Bar Plot",xlab = "Improvement",ylab = "Frequency")
> barplot(counts,main = "Simple Bar Plot",xlab = "Frequency",ylab = "Improvement",horiz = TRUE) # bars are vertical by default; horiz = TRUE draws them horizontally, so the axis labels are swapped
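If counts is a matrix rather than a vector, the result is a stacked or a grouped bar plot.
> counts <- table(Arthritis$Improved, Arthritis$Treatment)
> counts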
Placebo Treated
None 29 13
Some 7 7
Marked 7 21
> barplot(counts,main = "Stacked Bar Plot",xlab = "Treatment",ylab = "Frequency",col = c("yellow","red","green"),legend=rownames(counts))
Grouped bar plot (the more commonly used form)
> barplot(counts,main = "Grouped Bar Plot",xlab = "Treatment",ylab = "Frequency",col = c("yellow","red","green"),legend=rownames(counts),beside = TRUE) # beside = TRUE draws a grouped bar plot; the default is a stacked bar plot
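How a matrix is laid out decides what gets stacked or grouped: each column becomes a bar (or a cluster of bars) and each row becomes a segment (or a bar within the cluster). The sketch below illustrates this with a table built from the built-in mtcars dataset, so it does not depend on vcd; the names tab, stacked, grouped and flipped are illustrative.

```r
# 2 x 3 table: rows = transmission type (am), columns = cylinders (cyl).
tab <- table(mtcars$am, mtcars$cyl)

# Stacked (default): one bar per column, one segment per row.
stacked <- barplot(tab, legend.text = rownames(tab),
                   main = "Stacked", xlab = "Cylinders")

# Grouped: one cluster per column, one bar per row inside each cluster.
grouped <- barplot(tab, beside = TRUE, legend.text = rownames(tab),
                   main = "Grouped", xlab = "Cylinders")

# Transposing the table swaps the roles of rows and columns.
flipped <- barplot(t(tab), legend.text = colnames(tab),
                   main = "Stacked, transposed", xlab = "Transmission")
```

The returned midpoints reflect the layout: a vector with one entry per bar for the stacked plots, and a matrix with one entry per individual bar for the grouped plot.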

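Pie charts

Pie charts are another way to display a categorical variable, but most statisticians recommend bar plots or dot plots instead, because people judge length more accurately than area. Pie charts are therefore skipped here.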

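Histograms

A histogram shows the distribution of a continuous variable by dividing the range of values on the x-axis into a number of bins and showing the frequency of each bin on the y-axis.

hist(x), where x is a numeric vector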
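To see the bins a histogram is built from, hist can be called with plot = FALSE, which returns the computed bins without drawing anything. The sketch below uses the built-in mtcars dataset; the name h is illustrative. Note that a single number passed to breaks is only a suggestion, since R rounds the breakpoints to "pretty" values.

```r
# Compute the histogram without plotting it.
h <- hist(mtcars$mpg, breaks = 12, plot = FALSE)

# Bin boundaries chosen by R (one more than the number of bins).
print(h$breaks)

# Number of observations that fall in each bin.
print(h$counts)

# The returned object can be drawn later with plot().
plot(h, col = "red", xlab = "Miles Per Gallon",
     main = "Histogram drawn from a precomputed object")
```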

The example below uses mtcars, a dataset built into R.

> par(mfrow=c(1,2))
> hist(mtcars$mpg)
> hist(mtcars$mpg,breaks = 12,col = "red",xlab = "Miles Per Gallon",main = "Colored histogram with 12 bins")

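Kernel density plots

Kernel density estimation is a nonparametric method for estimating the probability density function of a random variable; a kernel density plot draws that estimate.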

plot(density(x))
> plot(density(mtcars$mpg))
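The object returned by density can also be inspected before plotting. The sketch below is one way to do that, assuming the default Gaussian kernel and default bandwidth; the names d and area are illustrative. It adds a rug of the raw observations and checks numerically that the estimated curve integrates to roughly 1, as a density should.

```r
# Estimate the density with the default kernel and bandwidth.
d <- density(mtcars$mpg)

# Bandwidth chosen by the default rule.
print(d$bw)

# Plot the estimate and mark the raw observations along the x-axis.
plot(d, main = "Kernel Density of Miles Per Gallon")
rug(mtcars$mpg)

# Trapezoidal approximation of the area under the estimated curve.
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
print(area)
```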

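Box plots

A box plot describes the distribution of a continuous variable by drawing its five-number summary: minimum, lower quartile, median, upper quartile, and maximum.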

boxplot(x)
> boxplot(mtcars$mpg,main="Box plot",ylab="Miles Per Gallon")

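From bottom to top, the five lines are the minimum, lower quartile, median, upper quartile, and maximum.

By default, each whisker extends no further than 1.5 times the interquartile range beyond the edge of the box; values outside that range are outliers and are drawn as individual points. When outliers are present, the whisker ends are the most extreme non-outlying values rather than the true minimum and maximum.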
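The numbers behind a box plot can be checked with boxplot.stats, which returns the five values that are drawn together with any outliers. The sketch below recomputes the 1.5 times interquartile range fences by hand; the names s, box_height, lower_fence and upper_fence are illustrative. R's box edges are "hinges", which can differ slightly from the quartiles that quantile reports.

```r
# Statistics used to draw the box plot of mpg.
s <- boxplot.stats(mtcars$mpg)

# Lower whisker, lower hinge, median, upper hinge, upper whisker.
print(s$stats)

# Observations beyond the whiskers (an empty vector means no outliers).
print(s$out)

# Recompute the fences: 1.5 times the box height beyond each box edge.
box_height  <- s$stats[4] - s$stats[2]
lower_fence <- s$stats[2] - 1.5 * box_height
upper_fence <- s$stats[4] + 1.5 * box_height
print(c(lower_fence, upper_fence))
```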

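Side-by-side box plots

A box plot can be drawn for a single variable or for a variable split by group.

boxplot(formula,data=dataframe)
formula is a formula and dataframe is the data frame that supplies the data

For example, the formula Y ~ A produces one box plot of the numeric variable Y for each value of the categorical variable A, with Y on the y-axis and A on the x-axis. The formula Y ~ A*B produces a box plot of Y for each combination of the levels of A and B.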
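As a sketch of the Y ~ A*B form, the block below crosses the number of cylinders (cyl) with the transmission type (am) in the built-in mtcars dataset. The choice of am as the second grouping variable and the name b are assumptions made for illustration. Calling boxplot with plot = FALSE first shows which groups the formula creates.

```r
# Compute the groups without drawing: one group per cyl/am combination.
b <- boxplot(mpg ~ cyl * am, data = mtcars, plot = FALSE)

# Group labels in the form "cyl.am" and the number of cars in each group.
print(b$names)
print(b$n)

# Draw the box plots, one box per combination.
boxplot(mpg ~ cyl * am, data = mtcars,
        main = "Mileage by Cylinders and Transmission",
        xlab = "Cylinders.Transmission (0 = automatic, 1 = manual)",
        ylab = "Miles per gallon")
```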

> boxplot(mpg~cyl,data = mtcars,main="Car Mileage Data",xlab="Number of cylinders",ylab="Miles per gallon")
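The plot shows that as the number of cylinders increases, miles per gallon fall, that is, fuel consumption rises.

Dot plots

A dot plot provides a way to plot a large number of labeled values on a simple horizontal scale.

dotchart(x,labels=)
x is a numeric vector and labels is a vector of labels, one per point
The groups argument takes a factor that specifies how the elements of x are grouped
gcolor sets the color of the group labels
cex sets the size of the labels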
> dotchart(mtcars$mpg,labels = rownames(mtcars),cex = .7,main = "Gas Mileage for car models",xlab = "Miles per gallon")

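A dot plot is usually easier to read once it is sorted and grouped. Below, the rows are sorted by mpg in ascending order and grouped by the number of cylinders (cyl).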

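> x <- mtcars[order(mtcars$mpg),]
> x$cyl <- factor(x$cyl)
# create a character vector named color in x; the specific colors below are an assumed, illustrative choice
> x$color[x$cyl==4] <- "red"
> x$color[x$cyl==6] <- "blue"
> x$color[x$cyl==8] <- "darkgreen"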
> dotchart(x$mpg,labels = rownames(x),cex = .7,groups = x$cyl,gcolor = "black",color = x$color,pch = 19,main = "Gas Mileage for Car Models",xlab = "Mile Per Gallon")

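Compared with the previous plot, the pattern is much clearer: cars with fewer cylinders travel more miles per gallon, that is, they use less fuel.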
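The same trend can be checked numerically by averaging mpg within each cylinder group. The sketch below is one simple way to do so with the built-in mtcars dataset; the name mean_mpg is illustrative.

```r
# Mean miles per gallon for each number of cylinders,
# returned as a named numeric vector.
mean_mpg <- sapply(split(mtcars$mpg, mtcars$cyl), mean)
print(round(mean_mpg, 1))

# A small dot plot of the group means.
dotchart(mean_mpg, pch = 19,
         main = "Mean Gas Mileage by Cylinder Count",
         xlab = "Miles Per Gallon", ylab = "Cylinders")
```

If the means decrease from the 4-cylinder group to the 8-cylinder group, they agree with what the grouped dot plot suggests.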