Interaction Effects Across Groups in R: Grouped Regression and Interaction Terms

Reposted

mob64ca140beea5 2023-11-20 21:57:11

Stata Study Notes: Grouped Statistics and Grouped Regression

1. Grouped statistics

(1) Summarizing a single categorical variable

The tabulate command

tabulate oneway
//for one-way tables of frequencies

tabulate twoway 
//for two-way tables of frequencies

The graph bar command

graph bar  yvars [if] [in] [weight] [, options]

*graph bar draws vertical bar charts.
*In a vertical bar chart, the y axis is numerical,
*and the x axis is categorical.

graph hbar yvars [if] [in] [weight] [, options]

*graph hbar draws horizontal bar charts.
*In a horizontal bar chart, the numerical axis is still called the y axis,
*and the categorical axis is still
*called the x axis, but y is presented horizontally, and x vertically.

sysuse nlsw88.dta, clear
tabulate industry

graph bar (count) idcode, over(industry)

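graph hbar (count) idcode, over(industry) blabel(bar, position(outside))
// the blabel() option adds a label with the bar's value to each bar; position(outside) places it outside the bar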

sysuse citytemp
graph bar (mean) tempjuly tempjan, over(region)   ///
bargap(-30)   ///
legend( label(1 "July") label(2 "January") )   ///
ytitle("Degrees Fahrenheit")   ///
title("Average July and January temperatures")   ///
subtitle("by regions of the United States")   ///
note("Source:  U.S. Census Bureau, U.S. Dept. of Commerce")

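The graph pie command

// reload nlsw88, because citytemp (loaded above) has no industry variable
sysuse nlsw88.dta, clear
graph pie, over(industry)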

graph pie, over(industry) sort
// sort the slices by size, which makes the chart easier to read

(2) Summarizing several categorical variables

The tabulate command

sysuse nlsw88.dta, clear
tabulate industry collgrad
// number of women with and without a college degree in each industry

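tabulate industry collgrad, column nofreq
// suppress the frequencies and show column percentages: within each collgrad column (non-graduates, graduates), the share of women working in each industry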

tabulate industry collgrad, column
// show both the frequencies and the column percentages
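
Column and row percentages are easy to mix up, so a side-by-side comparison may help. The sketch below is not part of the original notes; the row option is added here only for contrast.

```stata
sysuse nlsw88.dta, clear

// column percentages: each collgrad column sums to 100 across industries
tabulate industry collgrad, column nofreq

// row percentages: each industry row sums to 100 across the two collgrad groups
tabulate industry collgrad, row nofreq
```

Read the first table as "where do graduates (or non-graduates) work?" and the second as "what share of each industry's workers are graduates?".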

sysuse nlsw88.dta, clear
tabulate industry collgrad, summarize(wage) means
// group by industry and college-graduate status, and report the mean wage in each cell

sysuse census, clear
tabulate region, gen(reg)
// creates one dummy variable per region, named reg1, reg2, reg3, reg4
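
A quick way to confirm what tabulate, generate() produced is to inspect the new variables and check that every observation falls into exactly one of them. This is a minimal sketch; the helper variable regsum is a hypothetical name introduced only for the check.

```stata
sysuse census, clear
tabulate region, generate(reg)

// inspect the four new indicator variables
describe reg1-reg4
list region reg1-reg4 in 1/5

// every state belongs to exactly one region, so the dummies sum to 1 in each row
egen byte regsum = rowtotal(reg1 reg2 reg3 reg4)
assert regsum == 1
```

When the dummies go into a regression, one of them has to be left out as the base category; factor-variable notation (i.region) handles that automatically.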

The graph bar command

sysuse nlsw88.dta, clear
graph hbar (mean) wage, over(industry) over(collgrad)
// to group graph bar by two categorical variables, use two over() options

sysuse nlsw88.dta, clear
graph hbar (mean) wage, over(industry) over(collgrad)  ///
blabel(bar, position(outside))
// Note: when a long command is split across lines with ///, at least one space must separate /// from the text before it

The external command tabplot

Description

tabplot plots a table of numerical values (for example, frequencies, fractions, or percents) in graphical form as a bar chart. It is mainly intended for representing contingency tables for one, two, or three categorical variables. It also has uses for producing multiple histograms and graphs for general one-, two-, or three-way tables.
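
External (user-written) commands must be installed before first use. The sketch below assumes that tabplot, and the quantiles and esttab commands used later in these notes, are still distributed through the SSC archive; the notes themselves do not say where they were installed from.

```stata
// one-time installation from SSC (requires an internet connection)
ssc install tabplot
ssc install quantiles
ssc install estout    // the estout package provides esttab
```

After installation, help tabplot shows the full syntax and option list.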

tabplot industry collgrad

tabplot industry collgrad, horizontal

The showval option
showval specifies that numeric values be shown beneath (or if horizontal is specified, to the left of) bars.

tabplot, showval(offset(#)) 
//specifies an offset between the base (or left-hand edge) of the bar 
//and the position of the numeric value.  Default is 0.1
//with two variables or 0.02 with one variable.

tabplot, percent()

The percent option
percent(varlist) indicates that all frequencies should be shown as percents (with sum 100) of the total frequency for each distinct category defined by the combinations of varlist. Usually, varlist will be one or more of the variables specified.

tabplot industry collgrad, horizontal   ///
percent(collgrad)  showval(offset(0.05))

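2. Converting a continuous variable into a categorical one, then running grouped statistics or regressions

(1) The group() function

a. Grouping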

sysuse nlsw88, clear
sort wage
gen g_wage = group(5)
// with the data sorted by wage, split the observations into 5 equal-sized groups
br wage g_wage
tab g_wage
// show how many observations fall into each wage group
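
group() appears to be a legacy function that recent function references no longer list (an assumption worth checking with help functions in your own version). A documented alternative is xtile, sketched below; the variable name q_wage is chosen here only for illustration. The two approaches can give slightly different groups: group() cuts the data in its current sort order, so tied wages may land in different groups, while xtile assigns groups by quantile cutpoints and keeps tied values together.

```stata
sysuse nlsw88, clear

// quintile groups of wage based on quantile cutpoints; no prior sort is needed
xtile q_wage = wage, nquantiles(5)

// group sizes may differ slightly when many observations share a cutpoint value
tabulate q_wage
```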

b. Adding value labels to the new categorical variable g_wage

Review: Stata Study Notes, Variables, Variable Labels

sysuse nlsw88, clear
sort wage
gen g_wage = group(5)
br wage g_wage
tab g_wage
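// attach value labels to the codes of g_wage (only 1 and 5 are labeled here; 2, 3, and 4 stay as numbers)
// this takes two steps:
// step 1: label define
label define la_gwage 1 "low" 5 "high"
// step 2: label values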
label values g_wage la_gwage
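
The notes label only the lowest and highest groups. If all five levels should carry text labels, one way to do it is sketched below. The label texts "mid-low", "middle", and "mid-high" are hypothetical names chosen for illustration, and the grouping variable is built with xtile so the sketch relies only on documented commands.

```stata
sysuse nlsw88, clear
xtile g_wage = wage, nquantiles(5)

// define a label for every level, then attach the definition to the variable
label define la_gwage 1 "low" 2 "mid-low" 3 "middle" 4 "mid-high" 5 "high"
label values g_wage la_gwage

// the table now shows text labels instead of the codes 1 to 5
tabulate g_wage
```

To extend a label that already exists, label define accepts the add option, for example label define la_gwage 3 "middle", add.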

c. Statistical analysis

c.1 help tabstat

The tabstat command

sysuse auto,clear
tabstat price weight mpg rep78
//Show the mean (by default) of price, weight, mpg, and rep78

sysuse auto,clear
tabstat price weight mpg rep78,by(foreign)
//Show the mean (by default) of price, weight, mpg, and rep78 by categories of foreign

sysuse auto,clear
tabstat price weight mpg rep78, by(foreign) stat(mean sd min max)
//In addition to mean, show standard deviation, minimum, and maximum

c.2 tabstat,by(g_wage)

sysuse nlsw88, clear
sort wage
gen g_wage = group(5)
br wage g_wage
tab g_wage
// attach value labels to the codes of g_wage (only 1 and 5 are labeled here)
// this takes two steps:
// step 1: label define
label define la_gwage 1 "low" 5 "high"
// step 2: label values
label values g_wage la_gwage
tabstat wage age married union collgrad south,by(g_wage)

(2) Grouping with the external command quantiles

Description

quantiles creates a new variable (newvar) that categorizes varname by its quantiles. It differs from xtile because the categories are defined by the ideal size of the quantile rather than by the cutpoints, therefore yielding less unequally sized categories when the cutpoint value is frequent, when using weights or when the number of observations in the dataset is not a product of the number of quantiles. It may not work properly on small datasets or if calculated for small groups. If the number of observations in the dataset or group is small, tabulate newvar afterwards to check the results.

quantiles varname, gencatvar(newvar)
// gencatvar(newvar) specifies the new variable to be generated

quantiles varname, nquant(#)
// specifies the number of quantiles; # must be an integer and greater than or equal to 2. If not specified default is 10 quantiles

bysort industry: quantiles wage, gen(p_wage) nq(3)
// within each industry, split wage into 3 groups of roughly equal size (terciles)
// p_wage is the resulting categorical variable; it takes the values 1, 2, 3 within every industry

3. Grouped regression

sysuse nlsw88, clear
sort wage
gen g_wage = group(5)
br wage g_wage
tab g_wage
// attach value labels to the codes of g_wage (only 1 and 5 are labeled here)
// this takes two steps:
// step 1: label define
label define la_gwage 1 "low" 5 "high"
// step 2: label values
label values g_wage la_gwage
tabstat wage age married union collgrad south,by(g_wage)

local x "age ttl_exp married union collgrad south c_city"
// regression on the observations with g_wage == 1
reg wage `x' if g_wage == 1, robust
est store Low
// regression on the observations with g_wage == 3
reg wage `x' if g_wage == 3, robust
est store Mid
// regression on the observations with g_wage == 5
reg wage `x' if g_wage == 5, robust
est store High
// regression on the full sample
reg wage `x', robust
est store Full

local m "Low Mid High Full"
esttab `m'

local m "Low Mid High Full"
esttab `m',mtitle(`m')
// use the stored names Low, Mid, High, Full as the column titles in the first row of the table
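
An alternative to separate regressions per group is a single pooled regression with interaction terms, which also allows a formal test of whether a coefficient differs between groups. The sketch below is not from the original notes. It uses collgrad as the grouping variable and ttl_exp as the regressor, both chosen here purely for illustration, and the scalar names b_nograd and b_grad are hypothetical.

```stata
sysuse nlsw88, clear

// separate regressions by group, keeping each slope on ttl_exp
regress wage ttl_exp if collgrad == 0, robust
scalar b_nograd = _b[ttl_exp]
regress wage ttl_exp if collgrad == 1, robust
scalar b_grad = _b[ttl_exp]

// pooled regression with a full interaction between group and regressor
regress wage i.collgrad##c.ttl_exp, robust

// the interaction coefficient reproduces the gap between the two group slopes
display "slope gap from grouped regressions: " b_grad - b_nograd
display "interaction coefficient:            " _b[1.collgrad#c.ttl_exp]

// Wald test of the null hypothesis that the slope is the same in both groups
test 1.collgrad#c.ttl_exp
```

Because the model is fully interacted, its point estimates match the grouped regressions exactly; what the pooled form adds is the test of the difference.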
