Python linearmodels

Reposted

AI大梦想家 2023-08-16 09:19:28


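Contents

I. Import the required libraries
II. Load the panel data
III. Fixed effects model

(1) Estimating the fixed effects model with OLS
(2) Estimating the fixed effects model with PanelOLS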

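We use Example 14.1 from Chapter 14, "Advanced Panel Data Methods", of Wooldridge's "Introductory Econometrics: A Modern Approach", and estimate a fixed effects model with the jtrain data.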

I. Import the required libraries

import wooldridge as woo
import pandas as pd
import statsmodels.formula.api as smf
from linearmodels.panel import PanelOLS

II. Load the panel data

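Before using panel data tools, we first restructure the dataset so that it reflects its panel nature, i.e. it records which unit (entity) and which time period each observation belongs to. Concretely, we set a two-level index, firm code and year, so that each pair of index values identifies exactly one row.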
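A minimal sketch of the same restructuring on a hypothetical toy panel (the firm codes, years and lscrap values below are made up, not taken from jtrain). linearmodels' PanelOLS expects a two-level index with the entity first and the time period second, which is the order used by set_index(['fcode', 'year']).

```python
import pandas as pd

# Hypothetical toy panel: 2 firms observed in 3 years (values are made up)
toy = pd.DataFrame({
    "fcode": [1, 1, 1, 2, 2, 2],
    "year": [1987, 1988, 1989, 1987, 1988, 1989],
    "lscrap": [0.5, 0.3, 0.1, 1.2, 1.0, 0.9],
})

# Entity level first, time level second
panel = toy.set_index(["fcode", "year"])

# Each (fcode, year) pair identifies exactly one row
print(panel.index.is_unique)            # True
print(panel.loc[(2, 1988), "lscrap"])   # 1.0
```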

jtrain = woo.dataWoo('jtrain')
jtrain = jtrain.set_index(['fcode', 'year'])

The resulting data look like this:

               employ       sales   avgsal  ...     d89_w   grant_w  grant_1_w
fcode    year                               ...                               
410032.0 1987   100.0  47000000.0  35000.0  ... -0.333333  0.000000        0.0
         1988   131.0  43000000.0  37000.0  ... -0.333333  0.000000        0.0
         1989   123.0  49000000.0  39000.0  ...  0.666667  0.000000        0.0
410440.0 1987    12.0   1560000.0  10500.0  ... -0.333333  0.000000        0.0
         1988    13.0   1970000.0  11000.0  ... -0.333333  0.000000        0.0
              ...         ...      ...  ...       ...       ...        ...
419483.0 1988   108.0  11500000.0  14810.0  ... -0.333333  0.000000        0.0
         1989   129.0  12000000.0  14227.0  ...  0.666667  0.000000        0.0
419486.0 1987    80.0   7000000.0  16000.0  ... -0.333333 -0.333333        0.0
         1988    90.0   8500000.0  17000.0  ... -0.333333 -0.333333        0.0
         1989   100.0   9900000.0  18000.0  ...  0.666667  0.666667        0.0

III. Fixed effects model

A fixed effect is an unobserved factor in panel data that varies across units but does not change over time.

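One way to eliminate the fixed effect is the time-demeaning transformation: for each variable, subtract from every observation that unit's average over time. The resulting deviations are called time-demeaned data, and the parameters are then estimated by regressing the time-demeaned dependent variable on the time-demeaned explanatory variables.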
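As a sketch of the demeaning step, assuming a hypothetical two-firm panel with made-up values, groupby(...).transform('mean') broadcasts each firm's time average back to every row of that firm, so the subtraction lines up row by row:

```python
import pandas as pd

# Hypothetical toy panel (made-up numbers): 2 firms x 3 years
panel = pd.DataFrame(
    {"y": [1.0, 2.0, 3.0, 10.0, 12.0, 14.0]},
    index=pd.MultiIndex.from_product(
        [[1, 2], [1987, 1988, 1989]], names=["fcode", "year"]
    ),
)

# Firm-specific time average, repeated on every row of that firm
firm_mean = panel.groupby(level="fcode")["y"].transform("mean")

# Time-demeaned variable: y_it - ybar_i
panel["y_w"] = panel["y"] - firm_mean
print(panel["y_w"].tolist())  # [-1.0, 0.0, 1.0, -2.0, 0.0, 2.0]

# Within each firm the demeaned values average to zero
print(panel.groupby(level="fcode")["y_w"].mean().tolist())  # [0.0, 0.0]
```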

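Consider a model with a single explanatory variable. For each i:
$$y_{it} = \beta_1 x_{it} + a_i + u_{it}, \quad t = 1, 2, \ldots, T$$
Average this equation over time for each i, using the time averages
$$\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}, \qquad \bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}, \qquad \bar{u}_i = \frac{1}{T}\sum_{t=1}^{T} u_{it}$$

This gives:
$$\bar{y}_i = \beta_1 \bar{x}_i + a_i + \bar{u}_i$$
Subtracting it from the original equation removes the fixed effect $a_i$ while keeping the parameter $\beta_1$:
$$y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + u_{it} - \bar{u}_i, \quad \text{i.e.} \quad \ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it}$$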
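To see why the demeaned equation is useful, here is a small simulation sketch. All settings are assumptions chosen for illustration: 500 units, 3 periods, β1 = 2, and an x_it deliberately correlated with a_i. Pooled OLS that ignores a_i is biased, while the within estimator computed from the demeaned ẍ_it and ÿ_it recovers β1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, beta = 500, 3, 2.0                      # assumed simulation settings

a = rng.normal(size=n)                        # fixed effect a_i
x = a[:, None] + rng.normal(size=(n, T))      # x_it correlated with a_i
u = rng.normal(size=(n, T))                   # idiosyncratic error u_it
y = beta * x + a[:, None] + u                 # y_it = beta*x_it + a_i + u_it

# Pooled OLS with an intercept, ignoring a_i
X_pool = np.column_stack([np.ones(n * T), x.ravel()])
b_pool = np.linalg.lstsq(X_pool, y.ravel(), rcond=None)[0][1]

# Within estimator: subtract each unit's time mean, regress without intercept
x_dd = x - x.mean(axis=1, keepdims=True)
y_dd = y - y.mean(axis=1, keepdims=True)
b_fe = (x_dd * y_dd).sum() / (x_dd ** 2).sum()

print(round(b_pool, 3), round(b_fe, 3))
```

With these settings the pooled slope converges to 2.5 rather than 2, because Cov(x_it, a_i)/Var(x_it) = 1/2 is added to β1; the within estimate stays close to 2.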
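Time-demeaning removes the fixed effect, but it also removes every explanatory variable that does not change over time, such as race or gender, so the effects of such variables cannot be estimated.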
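A minimal numpy sketch of this point, using a hypothetical gender dummy (female) and a hypothetical time-varying regressor (exper) for four units over three periods: after demeaning, the female column is identically zero, so the design matrix loses rank and its coefficient is not identified.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 4, 3

# Hypothetical regressors: female is constant over t, exper varies over t
female = np.repeat([1.0, 0.0, 1.0, 0.0], T).reshape(n, T)
exper = rng.normal(size=(n, T))

def demean(z):
    """Subtract each unit's (row's) time average."""
    return z - z.mean(axis=1, keepdims=True)

X = np.column_stack([demean(female).ravel(), demean(exper).ravel()])

print(np.abs(X[:, 0]).max())      # 0.0: female is wiped out
print(np.linalg.matrix_rank(X))   # 1: two columns, but only rank 1
```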

(1) Estimating the fixed effects model with OLS

# Time-demean the dependent and explanatory variables:
# subtract each firm's (fcode) average over time from every observation
for var in ['lscrap', 'd88', 'd89', 'grant', 'grant_1']:
    jtrain[var + '_w'] = jtrain[var] - jtrain.groupby('fcode')[var].transform('mean')

# Estimate the time-demeaned equation by OLS without an intercept ("0 +")
results_man = smf.ols(formula='lscrap_w ~ 0 + d88_w + d89_w + grant_w + grant_1_w', data=jtrain).fit()
print(results_man.summary())

Results:

OLS Regression Results                                
======================================================================================
Dep. Variable:              lscrap_w   R-squared (uncentered):                   0.201
Model:                           OLS   Adj. R-squared (uncentered):              0.181
Method:                Least Squares   F-statistic:                              9.940
Date:               Thu, 14 Jul 2022   Prob (F-statistic):                    3.36e-07
Time:                        20:30:58  Log-Likelihood:                         -80.946
No. Observations:                 162  AIC:                                      169.9
Df Residuals:                     158  BIC:                                      182.2
Df Model:                           4                                                 
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
d88_w         -0.0802      0.089     -0.903      0.368      -0.256       0.095
d89_w         -0.2472      0.108     -2.287      0.024      -0.461      -0.034
grant_w       -0.2523      0.122     -2.065      0.041      -0.494      -0.011
grant_1_w     -0.4216      0.171     -2.472      0.014      -0.758      -0.085
==============================================================================
Omnibus:                       65.791   Durbin-Watson:                   2.294
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              770.661
Skew:                          -1.074   Prob(JB):                    4.50e-168
Kurtosis:                      13.467   Cond. No.                         4.01
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
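The coefficients above are the fixed effects estimates, but the standard errors are too small: smf.ols treats the demeaned data as 162 observations with 4 parameters (Df Residuals: 158) and does not count the 54 firm means removed by demeaning. The appropriate residual degrees of freedom are 162 − 54 − 4 = 104. A quick arithmetic check, assuming the nonrobust (homoskedastic) standard errors shown above, rescales the reported d88_w standard error:

```python
import math

N, n_firms, K = 162, 54, 4        # observations, firms, slope coefficients
df_naive = N - K                  # 158, what smf.ols uses
df_fe = N - n_firms - K           # 104, also counts the 54 firm means

scale = math.sqrt(df_naive / df_fe)
se_naive = 0.089                  # d88_w std err from the OLS table above
se_fe = se_naive * scale

print(df_fe, round(scale, 4), round(se_fe, 4))
```

The rescaled value, about 0.1097, agrees with the d88 standard error of 0.1095 reported by PanelOLS later in this post, up to the rounding of 0.089; the same factor approximately reproduces the other PanelOLS standard errors as well.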

(2) Estimating the fixed effects model with PanelOLS

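The linearmodels package provides PanelOLS for estimating fixed effects models. Setting entity_effects=True includes entity (here, firm-specific) fixed effects.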

exog_vars=['d88','d89','grant','grant_1']
exog=jtrain[exog_vars]
reg_fe = PanelOLS(jtrain.lscrap, exog, entity_effects=True)
results_fe = reg_fe.fit()
print(results_fe)

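As the estimating equation of the fixed effects method shows, the time-demeaned equation has no intercept (any constant is removed together with a_i), so no constant term is added in the code above.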
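One way to see why no separate constant is needed: the within estimator is numerically identical to a dummy-variable regression with one intercept per unit (these play the role of a_i) and no common constant; a common constant on top would be perfectly collinear with the unit dummies. A sketch on simulated data, where the sizes and the coefficient 1.5 are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 6, 3
a = rng.normal(size=n)                                  # unit effects a_i
x = a[:, None] + rng.normal(size=(n, T))
y = 1.5 * x + a[:, None] + rng.normal(size=(n, T))

# (1) Within estimator: demeaned regression without a constant
xd = (x - x.mean(axis=1, keepdims=True)).ravel()
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
b_within = xd @ yd / (xd @ xd)

# (2) Dummy-variable regression: one dummy per unit, no common intercept
D = np.kron(np.eye(n), np.ones((T, 1)))                 # (n*T) x n unit dummies
X = np.column_stack([x.ravel(), D])
b_lsdv = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][0]

print(np.isclose(b_within, b_lsdv))                     # True
```

The equality follows from the Frisch-Waugh-Lovell theorem: partialling the unit dummies out of y and x is exactly time-demeaning.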

Alternatively, use the from_formula method:

reg_fe = PanelOLS.from_formula(formula='lscrap ~ d88 + d89 + grant + grant_1 + EntityEffects', data=jtrain)
results_fe = reg_fe.fit()
print(results_fe)

Results:

PanelOLS Estimation Summary                           
================================================================================
Dep. Variable:                 lscrap   R-squared:                        0.2010
Estimator:                   PanelOLS   R-squared (Between):             -0.1103
No. Observations:                 162   R-squared (Within):               0.2010
Date:                Thu, Jul 14 2022   R-squared (Overall):             -0.0839
Time:                        13:48:36   Log-likelihood                   -80.946
Cov. Estimator:            Unadjusted                                           
                                        F-statistic:                      6.5426
Entities:                          54   P-value                           0.0001
Avg Obs:                       3.0000   Distribution:                   F(4,104)
Min Obs:                       3.0000                                           
Max Obs:                       3.0000   F-statistic (robust):             6.5426
                                        P-value                           0.0001
Time periods:                       3   Distribution:                   F(4,104)
Avg Obs:                       54.000                                           
Min Obs:                       54.000                                           
Max Obs:                       54.000                                           
                                                                                
                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
d88           -0.0802     0.1095    -0.7327     0.4654     -0.2973      0.1369
d89           -0.2472     0.1332    -1.8556     0.0663     -0.5114      0.0170
grant         -0.2523     0.1506    -1.6751     0.0969     -0.5510      0.0464
grant_1       -0.4216     0.2102    -2.0057     0.0475     -0.8384     -0.0048
==============================================================================

F-test for Poolability: 24.661
P-value: 0.0000
Distribution: F(53,104)

Included effects: Entity
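The "F-test for Poolability" tests whether all firm effects a_i are equal, i.e. whether pooled OLS with a single intercept would be adequate. Below is a sketch of the textbook version of this test, comparing the residual sum of squares of pooled OLS (restricted) with that of the within regression (unrestricted). We assume, without having checked linearmodels' source, that it is computed the same way; the degrees of freedom (54 − 1, 162 − 54 − 4) = (53, 104) do match the output above. The data in the usage part are simulated, not jtrain.

```python
import numpy as np

def ssr(X, y):
    """Residual sum of squares of an OLS fit of y on X."""
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

def poolability_f(y, x):
    """y: (n, T), x: (n, T, K) balanced panel. Returns (F, df1, df2)."""
    n, T, K = x.shape
    N = n * T
    # Restricted model: pooled OLS with one common intercept
    X_pool = np.column_stack([np.ones(N), x.reshape(N, K)])
    ssr_r = ssr(X_pool, y.ravel())
    # Unrestricted model: within (fixed effects) regression on demeaned data
    xd = (x - x.mean(axis=1, keepdims=True)).reshape(N, K)
    yd = (y - y.mean(axis=1, keepdims=True)).ravel()
    ssr_u = ssr(xd, yd)
    df1, df2 = n - 1, N - n - K
    F = ((ssr_r - ssr_u) / df1) / (ssr_u / df2)
    return F, df1, df2

# Usage on simulated data with the same dimensions as the jtrain sample
rng = np.random.default_rng(3)
n, T, K = 54, 3, 4
a = rng.normal(scale=2.0, size=n)            # large, heterogeneous firm effects
x = rng.normal(size=(n, T, K))
y = x @ np.array([0.5, -0.2, 0.1, 0.3]) + a[:, None] + rng.normal(size=(n, T))

F, df1, df2 = poolability_f(y, x)
print(df1, df2)                              # 53 104
```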
