R高级画图0210
上面这个图看起来很复杂,其实只要把握正确的细节,就很简单。后来我看了源码,发现就是两个图进行加在一起,上面随机点是一个图,下面四个图是一个图,然后将两个图加在一起。就成了现在这样的图
安装相关包
在之前有的包在CARN上是没有的,可以下面代码安装
install.packages("remotes")
remotes::install_github("jkaupp/jkmisc")
remotes::install_github("wilkelab/ggtext")
别的包基本上都是在CARN上的
下面就是加载包,以及读取数据
数据在我的github上,也可以看我的百度网盘。
数据链接为:
全部复制到浏览器,然后右键保存到本地
library(tidyverse)
library(ggplot2)
library(ggtext)
library(here)
library(ggforce)
library(jkmisc)
library(patchwork)
library(viridisLite)
library(colorspace)
games <- read_csv('games.csv')
more_games <- read_csv('spreadspoke_scores.csv')
接下来就是数据处理部分
一个好的图片,重要的就是图的数据的处理,接下来,就是对代码进行解释以及理解。
数据part 1
hist_games <- more_games %>%
rename(year = schedule_season) %>%
filter(year < 2000) %>%
select(year, home_team = team_home, away_team = team_away, score_home, score_away) %>%
mutate(total = score_home score_away,
pts_win = map2_dbl(score_home, score_away, ~max(.x, .y)),
pts_loss = map2_dbl(score_home, score_away, ~min(.x, .y))) %>%
select(year, home_team, away_team, total, pts_win, pts_loss)
上面代码我们一行一行的解释,首先是%>%
管道函数就不介绍了,不懂得百度一下。
rename(year = schedule_season)
这句话是:因为more_games这个数据框里面有个schedule_season
变量,将这个变量名字改为year
。感觉很怪异,是不是有一点!!!。
filter(year < 2000)
这意思就是将前面一步一步处理的数据框,然后再筛选year < 2000
的行。
select(year, home_team = team_home, away_team = team_away, score_home, score_away))
这句话意思是:再对数据框的一些列进行选择,第一个year
代表选择这一列,第二个home_team = team_home
意思是:再选择team_home
变量,再将这个变量名字改为home_team
。 同样的away_team = team_away
代表选择team_away
,再将这个名字改为away_team
。这个score_home, score_away
也要选择。
mutate(total = score_home score_away, pts_win = map2_dbl(score_home, score_away, ~max(.x, .y)), pts_loss = map2_dbl(score_home, score_away, ~min(.x, .y)))
这句话就是说,建立三个变量:第一个变量叫total
,他的意义就是直接将score_home
和score_away
相加。第二个是pts_win
,他的意义有点复杂,pts_win = map2_dbl(score_home, score_away, ~max(.x, .y))
这句话就是选择score_home
和score_away
,判断这两个变量哪一个更大,哪个大选哪个。更加细一点,就是score_home
对应的是x
的位置,score_away
对应的位置y
,然后map2_dbl
这个函数的第三部分的就是函数,前面加个~
。用来识别函数。更加相似的的就是第三行代码pts_loss = map2_dbl(score_home, score_away, ~min(.x, .y))
。就是为了选择两个中间最小的那一个。这样以看是不是很高效!!!
最后一行还是选择,然后将这系列处理的数据最终结果给hist_games
。这个就是我们的hist_game
。
数据part 2
full_games <- games %>%
select(year, home_team, away_team, pts_win, pts_loss) %>%
mutate(total = pts_win pts_loss) %>%
select(year, home_team, away_team, total, pts_win, pts_loss) %>%
bind_rows(hist_games) %>%
arrange(year)
这个代码块不介绍了,比上面那个简单多了。
areas_data <- full_games %>%
mutate(total_bin = cut(total, c(0, 20, 40, 60, 80, Inf),
labels = c("0 - 2 0 p t s", "2 1 - 4 0 p t s", "4 1 - 6 0 p t s", "6 1 - 8 0 p t s", "8 1 - 1 0 0 p t s"))) %>%
count(year, total_bin)
这个代码就是说,对dull_games
变量数据框,对这个数据框的total
变量进行分组,分组的各个标签就是上面label
对应的。然后再根据year
和total_bin
变量进行分组计数。最后得到areas_data
数据。
代码块part 1
areas <- ggplot(areas_data, aes(x = year, y = n, group = total_bin, fill = total_bin))
geom_area(show.legend = FALSE)
geom_point(data = filter(areas_data, year %in% c(1972, 1974, 1978, 1994)),
aes(fill = total_bin), color = "#ECEFF4",
show.legend = FALSE, shape = 21, stroke = 0.5)
scale_y_continuous(breaks = c(0, 25, 50, 75, 100, 125, 150))
scale_x_continuous(breaks = c(1970, 1980, 1990, 2000, 2010, 2020))
labs(x = NULL, y = "Combined Score")
facet_wrap(~total_bin, nrow = 1)
scale_fill_viridis_d(option = "plasma")
scale_color_viridis_d(option = "plasma", )
theme_jk(dark = TRUE,
grid = "XY")
theme(plot.background = element_rect(colour = NA, fill = "#2E3440", size = 0))
这行代码没什么好介绍的,为什么,因为真的很简单,有的我也看不懂,但是我教你怎么做,
就是一行一行代码运行,不断的扩大范围。geom_area(show.legend = FALSE)
这个就是画一个填充图,show.legend = FALSE
就是让图例不要显示出来,scale_y_continuous
就是让y轴的标签分隔开,各个标签就是对应的breaks。最后知道了total_bin
是分类变量了,然后按照这个变量进行分面,这样一下就将一个图分成4个图。nrow = 1
就是设置只有一行图。scale_fill_viridis_d
就是设置为一个系列的颜色。然后再对这个图,设置一个主题,也就是theme_jk
设置主题,最后再对一些细节进行更改。
数据块part 3
annotations <- full_games %>%
group_by(year) %>%
summarize(total = mean(total)) %>%
mutate(total = ifelse(year %in% c(1972, 1978), total - 2, total 2)) %>%
filter(year %in% c(1972, 1974, 1978, 1994)) %>%
mutate(year2 = if_else(year == 1978, 1984, year),
label_y = c(-2, 135, -2, 135),
arrow_y = c(-10 ,110, -10, 110),
label = c("**1 9 7 2**<br>First major rule change to boost scoring and excitement: Hashmarks set at goal-post width to widen the short-side of the field.",
"**1 9 7 4**<br>Second major rule change: Goalposts moved back and offensive penalties reduced to 10 from 15 yards.",
"**1 9 7 8**<br>Third major rule change: Introduced illegal contact rules and provided more freedom to pass-blocking linemen.",
"**1 9 9 4**<br>Fourth major rule change: Introduced two-point conversions, longer kickoffs and field goal changes."))
上面的代码还是没有第一个简单,第一个更加复杂。
这一个数据是这样的:
数据块part 4
means <- full_games %>%
group_by(year) %>%
summarize(total = mean(total)) %>%
filter(year %in% c(1972, 1974, 1978, 1994))
这个数据如下:
代码块part 2
dots <- full_games %>%
ggplot(aes(x = year, y = total, color = total))
geom_sina(aes(group = year))
scale_x_continuous(breaks = c(1966, 1970, 1980, 1990, 2000, 2010, 2020))
expand_limits(y = c(-40, 150))
scale_y_continuous(breaks = c(0, 25, 50, 75, 100, 125))
stat_summary(fun.y = "mean", geom = "line", color = "#ECEFF4")
stat_summary(fun.y = "mean", geom = "point", color = "#ECEFF4")
geom_point(data = means, fill = "#CC4678FF", color = "#ECEFF4",
show.legend = FALSE, shape = 21, stroke = 1)
geom_curve(data = filter(annotations, !year %in% c(1972, 1978)),
aes(x = year, xend = year, y = arrow_y, yend = total),
color = "#ECEFF4", arrow = arrow(type = "closed", length = unit(3, "mm")),
curvature = 0.2)
geom_curve(data = filter(annotations, year %in% c(1972, 1978)),
aes(x = year2, xend = year, y = arrow_y, yend = total),
color = "#ECEFF4", arrow = arrow(type = "closed", length = unit(3, "mm")),
curvature = -0.2)
geom_textbox(data = annotations,
aes(x = year2, y = label_y, label = label),
family = "Oswald", color = "#ECEFF4", fill = "#2E3440")
scale_color_viridis_c(option = "plasma")
labs(x = NULL, y = "Combined Score")
theme_jk(grid = "XY",
dark = TRUE)
theme(legend.position = "none")
theme(plot.background = element_rect(colour = NA, fill = "#2E3440", size = 0))
就是有时候看不懂代码的时候,就一个一个运行,就是运行加号之前的东西,一步一步增加,每一个加号就代表一部分。你只要看代码增加了,你的图怎么变得,反复看看,不停的比较,更加重要的就是看别人怎么写的,以及函数什么意思,只要数据对了,数据传递给正确的函数参数,基本上就没有问题。
代码块 part 3
plot <- wrap_plots(dots, areas, ncol = 1, heights = c(0.75, 0.25))
plot_annotation(title = "Did NFL Rule Changes to Boost Scoring and Make the Game More Exciting Actually Work?",
subtitle = "Illustrated below is a sina plot (combined strip and violin plots)
showing the combined score of each NFL game from 1966 to 2019 seasons.
The white line indicates the average combined score for each<br>season.
The area charts below show binned combined scores, circles indicating
seasons the rules were changed.
From both charts we can see that average scores have been rising,
specifically in the 41-60 and 61-80 categories rising<br>since the rule changes.",
caption = "**Data**: Pro Football Reference | 微信公众号: pypi",
theme = theme_jk(dark = TRUE))
wrap_plots这个函数就是将上面两个图进行合并,然后设置各自所占用的宽的的比例,然后再加上标题,副标题等,就可以看懂了。
代码块 part 4
保存下来:
ggsave("tw6_plot.png", plot, width = 16, height = 10)