Paper Review
- 1. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests
- 1.1 Asymptotic analysis
- 1.2 Double-Sample Trees
- 2. Generalized Random Forests
- 2.1 Algorithm
- 1. Forest-based local estimation
- 2. Splitting to maximize heterogeneity
- 3. The gradient tree algorithm
- 2.2 Asymptotic analysis
- 2.3 Experiments
- 1. CAPE
- 2. Quantile Regression Forest
- 3. Orthogonal Random Forest for Causal Inference
- 3.1 Introduction
- 3.2 Algorithm
- 1. first stage
- 2. second stage
- 3.3 Experiments
- 4. Decision trees for uplift modeling with single and multiple treatments
- 4.1 Single Treatment
- 4.2 Multiple Treatments
1. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests
Estimation in the binary-treatment setting: the target is τ(x) = E[Y(1) − Y(0) | X = x].
1.1 Asymptotic analysis
- Under certain conditions, the forest estimate τ̂(x) is asymptotically normal: (τ̂(x) − τ(x)) / σ_n(x) ⇒ N(0, 1)
- The variance σ_n²(x) can be estimated with the infinitesimal jackknife,
where the finite-sample correction factor is valid only for subsampling without replacement
The proof proceeds in two steps:
- first, bound the bias;
- then, establish approximate normality.
Using the Hájek projection and k-PNN (k potential nearest neighbors) arguments, first show that T is ν-incremental.
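A quick NumPy sketch of the infinitesimal-jackknife variance estimate mentioned above. The helper name, the toy "forest" (each tree just predicts its subsample mean), and the exact form of the finite-sample correction are my assumptions for the without-replacement variant; a real implementation would use actual tree predictions.

```python
import numpy as np

def infinitesimal_jackknife(N, preds, n, s):
    """IJ variance estimate for an ensemble built on subsamples of size s.

    N     : (B, n) array, N[b, i] = 1 if training point i is in tree b's subsample
    preds : (B,) per-tree predictions at the query point x
    """
    # Cov_b[N_{bi}, t_b(x)] for every training point i
    cov = ((N - N.mean(axis=0)) * (preds - preds.mean())[:, None]).mean(axis=0)
    # finite-sample correction for subsampling *without* replacement
    return (n - 1) / n * (n / (n - s)) ** 2 * np.sum(cov ** 2)

# toy "forest": each tree predicts the mean of its subsample
rng = np.random.default_rng(0)
n, s, B = 20, 10, 500
y = rng.normal(size=n)
N = np.zeros((B, n))
for b in range(B):
    N[b, rng.choice(n, size=s, replace=False)] = 1
v = infinitesimal_jackknife(N, N @ y / s, n, s)
```

The key intuition: training points whose inclusion correlates strongly with the tree predictions contribute more to the variance.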
1.2 Double-Sample Trees
The splitting criterion of a regression tree T is to minimize the MSE;
expanding the squared error, this is equivalent to maximizing the variance of the leaf estimates across the training sample.
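The equivalence can be checked numerically on toy data: the drop in total within-node SSE caused by a split equals the size-weighted spread of the child means around the parent mean, so minimizing MSE and maximizing between-child variance select the same split.

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
left, right = y[:3], y[3:]

sse = lambda v: np.sum((v - v.mean()) ** 2)
# decrease in within-node sum of squared errors from the split
mse_drop = sse(y) - (sse(left) + sse(right))

# size-weighted between-child variance of the leaf means
between = len(left) * (left.mean() - y.mean()) ** 2 \
        + len(right) * (right.mean() - y.mean()) ** 2

assert np.isclose(mse_drop, between)
```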
2. Generalized Random Forests
2.1 Algorithm
1. Forest-based local estimation
Goal: given a query point x, estimate θ(x); for example, when estimating HTE, θ(x) is the treatment effect τ(x).
Method: solve the local estimating equation E[ψ_{θ(x),ν(x)}(O_i) | X_i = x] = 0, where θ(x) is the parameter of interest and ν(x) is the nuisance parameter.
- Weight-estimation stage: α_i(x) measures the similarity between X_i and x, taking the "co-occurrence frequency" within the same leaf as the weight: α_i(x) = (1/B) Σ_b 1{X_i ∈ L_b(x)} / |L_b(x)|,
where L_b(x) is the set of training samples that fall into the same leaf as x in the b-th tree.
- Weighted solving: (θ̂(x), ν̂(x)) solves Σ_i α_i(x) ψ_{θ,ν}(O_i) = 0
Example: to estimate the conditional mean E[Y | X = x], take ψ_θ(Y_i) = Y_i − θ; then Σ_i α_i(x)(Y_i − θ) = 0, and the solution is θ̂(x) = Σ_i α_i(x) Y_i.
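A minimal sketch of the weighting stage and the weighted solve for this conditional-mean example (the function name and the hand-built leaf assignments are mine; a real forest would supply the leaf indices):

```python
import numpy as np

def forest_weights(leaf_ids, x_leaf):
    """alpha_i(x) = average over trees of 1{i in L_b(x)} / |L_b(x)|.

    leaf_ids : (B, n) leaf index of each training point in each tree
    x_leaf   : (B,)   leaf index of the query point x in each tree
    """
    B, n = leaf_ids.shape
    alpha = np.zeros(n)
    for b in range(B):
        in_leaf = leaf_ids[b] == x_leaf[b]
        alpha += in_leaf / in_leaf.sum()
    return alpha / B

# two toy trees over four training points; x lands in leaf 0 of both
leaf_ids = np.array([[0, 0, 1, 1],
                     [0, 1, 1, 0]])
x_leaf = np.array([0, 0])
alpha = forest_weights(leaf_ids, x_leaf)

# weighted estimating equation sum_i alpha_i (Y_i - theta) = 0
y = np.array([1.0, 2.0, 3.0, 4.0])
theta_hat = alpha @ y
```

Note the weights sum to one by construction, so θ̂(x) is a convex combination of the outcomes.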
2. Splitting to maximize heterogeneity
For a node P and data J, the parameters are estimated by solving the score equation over the samples falling in P: Σ_{i∈J: X_i∈P} ψ_{θ̂_P, ν̂_P}(O_i) = 0.
Node P is split into two child nodes C_1 and C_2, with the objective of minimizing the expected squared error of the resulting estimates, err(C_1, C_2).
Under certain conditions, err(C_1, C_2) ≈ K(P) − E[Δ(C_1, C_2)], where K(P) does not depend on the split, so splitting is equivalent to maximizing the heterogeneity between the child nodes: Δ(C_1, C_2) = (n_{C_1} n_{C_2} / n_P²) (θ̂_{C_1} − θ̂_{C_2})².
3. The gradient tree algorithm
To reduce computation, the child estimates are replaced by a gradient-based approximation: θ̃_C ≈ θ̂_P − (1/|{i: X_i∈C}|) Σ_{i: X_i∈C} ξᵀ A_P⁻¹ ψ_{θ̂_P, ν̂_P}(O_i),
where A_P is the gradient of the score evaluated at the parent estimates and ξ is a vector that picks out the θ-coordinate, eliminating the nuisance parameter.
Note: when ψ is non-differentiable (e.g., in quantile regression), a suitable substitute for the gradient can still be used.
The splitting stage therefore consists of the following two steps:
- labeling step: compute the parent-node estimates (θ̂_P, ν̂_P) and each sample's pseudo-outcome ρ_i = −ξᵀ A_P⁻¹ ψ_{θ̂_P, ν̂_P}(O_i)
- regression step: maximize the approximate splitting criterion Δ̃(C_1, C_2) = Σ_{j=1,2} (Σ_{i: X_i∈C_j} ρ_i)² / |{i: X_i∈C_j}|
During the regression step, the approximation error of this splitting criterion remains bounded.
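The two steps above can be sketched for the simplest score, ψ_θ(Y) = Y − θ, where ∇ψ = −1, so the pseudo-outcomes reduce to ρ_i = Y_i − θ̂_P (function name and toy data are mine):

```python
import numpy as np

def proxy_split_score(y, mask):
    """Labeling + regression step for psi_theta(Y) = Y - theta.

    For this score the gradient is -1, so rho_i = Y_i - theta_P.
    Returns Delta~ = sum_j (sum_{i in C_j} rho_i)^2 / |C_j| for the
    split defined by `mask` (True = left child).
    """
    rho = y - y.mean()  # labeling step: theta_P is the parent mean
    score = 0.0
    for child in (rho[mask], rho[~mask]):
        score += child.sum() ** 2 / len(child)
    return score

y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
mask = np.array([True, True, True, False, False, False])
score = proxy_split_score(y, mask)
```

For the mean score, Δ̃ coincides exactly with the classic variance-reduction criterion of a regression tree, which is why it works as a proxy.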
2.2 Asymptotic analysis
Define the expected score function M_{θ,ν}(x) = E[ψ_{θ,ν}(O_i) | X_i = x]. The analysis establishes:
- consistency
- approximate normality
2.3 Experiments
1. CAPE
The goal is to estimate the conditional average partial effect θ(x); the score function is ψ_{θ(x),ν(x)}(O_i) = (Y_i − θ(x) W_i − ν(x)) (1, W_i)ᵀ.
Solving the resulting weighted estimating equation then amounts to a local weighted regression of Y on W:
- Forest
θ̂(x) = Σ_i α_i(x) (W_i − W̄_α)(Y_i − Ȳ_α) / Σ_i α_i(x) (W_i − W̄_α)², where W̄_α = Σ_i α_i(x) W_i and Ȳ_α = Σ_i α_i(x) Y_i
When implementing GRF, the weights are solved automatically, but the corresponding pseudo-outcomes must still be computed; note that only the θ-coordinate is of interest.
Comparing with the expression that is actually plugged in, the algorithm in effect centers W within the parent node.
- Local Centering
Y and W are centered in advance, i.e., replaced by residuals, which makes the estimates more accurate.
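A minimal residual-on-residual sketch of this centering idea on synthetic confounded data. Plain linear fits stand in here for the forest estimates of E[Y|X] and E[W|X], and the effect is kept homogeneous so the answer is checkable; all names are mine.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
w = 0.5 * x + rng.normal(size=n)   # treatment depends on x (confounding)
tau = 2.0                          # homogeneous effect, for the demo only
y = tau * w + x + rng.normal(size=n)

# "centering": linear fits stand in for forest estimates of E[Y|X], E[W|X]
y_hat = np.polyval(np.polyfit(x, y, 1), x)
w_hat = np.polyval(np.polyfit(x, w, 1), x)
y_res, w_res = y - y_hat, w - w_hat

# regress residual on residual
tau_hat = np.sum(w_res * y_res) / np.sum(w_res ** 2)
```

Without the centering step, a naive regression of y on w would be biased by the confounding path through x.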
2. Quantile Regression Forest
3. Orthogonal Random Forest for Causal Inference
3.1 Introduction
Advantage of DML: even if the first-stage nuisance estimates carry error, the second-stage estimate is still approximately normal; disadvantage: the HTE must be given a pre-specified parametric form. Advantage of CF: nonparametric estimation; disadvantage: it largely requires the confounders W to be low-dimensional. Building on GRF, ORF follows DML and adds an orthogonalized estimate of the nuisance parameters (first stage) to reduce the error.
At a high level, ORF can be viewed as an orthogonalized version of GRF that is more robust to the nuisance estimation error. The key modification to GRF’s tree learner is our incorporation of orthogonal nuisance estimation in the splitting criterion.
3.2 Algorithm
When building a tree, each split is a two-stage process:
1. first stage
2. second stage
- split
The splitting algorithm resembles GRF's gradient tree algorithm, but to respect honesty the sample sets are slightly modified: one set is the data used for splitting, and the nuisance parameters estimated in the first stage are plugged in.
- labeling step: compute the parent-node estimate (with the first-stage nuisance estimates plugged in) and each sample's pseudo-outcome
- regression step:maximize proxy heterogeneity score
- Predict
Prediction is likewise restricted to the estimation samples.
The following theorem guarantees that the weights are nonzero within a neighborhood of x.
3.3 Experiments
- DML Partially Linear Regression (PLR; Robinson, 1988)
The score function is ψ_θ(O; h) = (Y − E[Y|X] − θ (T − E[T|X])) (T − E[T|X]).
- ORF
Data {(T_i, Y_i, W_i, x_i)}, where T is a continuous or discrete treatment, Y is the outcome, W are potential confounders/controls, and x are the target features.
The confounders affect the outcome and the treatment through nuisance functions g and f, respectively:
Y = θ(x) · T + g(x, W) + ε,  T = f(x, W) + η,
where θ(x) is the treatment effect function; the goal is to estimate the CATE θ(x).
Following the DML idea, residualize: Y − E[Y | x, W] = θ(x) (T − E[T | x, W]) + ε.
Define q(x, W) = E[Y | x, W], Ỹ = Y − q(x, W), and T̃ = T − f(x, W); then Ỹ = θ(x) T̃ + ε.
The score function is therefore ψ(O; θ(x), h(x, W)) = (Ỹ − θ(x) T̃) T̃,
where ĥ = (q̂, f̂) is the first-stage estimate of the nuisance parameters h = (q, f).
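A two-stage, sample-split sketch of this orthogonal score: nuisances q̂ and f̂ are fit on one fold and θ is solved on the other. Linear fits stand in for the paper's flexible first-stage estimators, heterogeneity in x is dropped so θ is a single number, and all names are mine.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6000
W = rng.normal(size=n)                  # confounder
T = 0.7 * W + rng.normal(size=n)        # treatment depends on confounder
theta = 1.5
Y = theta * T + W + rng.normal(size=n)  # outcome

half = n // 2
# fit a degree-1 polynomial on fold 1, evaluate on fold 2
fit = lambda a, b: np.polyval(np.polyfit(a[:half], b[:half], 1), a[half:])

# first stage (fold 1): q(W) = E[Y|W] and f(W) = E[T|W]
q_hat, f_hat = fit(W, Y), fit(W, T)

# second stage (fold 2): solve sum_i psi = 0 with
# psi = (Y - q - theta (T - f)) (T - f)
T_res = T[half:] - f_hat
theta_hat = np.sum((Y[half:] - q_hat) * T_res) / np.sum(T_res ** 2)
```

The sample split mirrors the honesty/cross-fitting requirement: nuisance errors from fold 1 are uncorrelated with the fold-2 scores, which is what makes the second stage robust.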
4. Decision trees for uplift modeling with single and multiple treatments
4.1 Single Treatment
- Split rule:maximize the differences between class distributions
- Normalizing: C4.5 divides the gain by the split information to avoid bias; the normalization here instead mainly penalizes splits in which the treatment and control groups end up in different proportions in the two child nodes, since such imbalance contradicts the randomized-trial assumption.
In the normalizing factor, the first coefficient accounts for the treatment/control imbalance across the split, while the remaining two terms account for the relative sample sizes.
(1) D = KL divergence
(2) D = squared Euclidean / chi-squared distance
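A toy computation of the KL-based split gain: the divergence between treatment and control class distributions after the split (weighted by child sizes) minus the divergence at the parent. Binary outcome, no smoothing, and the function names are mine.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two Bernoulli distributions p and q."""
    p = np.clip(p, 1e-9, 1 - 1e-9)
    q = np.clip(q, 1e-9, 1 - 1e-9)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def kl_gain(pt_children, pc_children, n_children, pt_root, pc_root):
    """Gain = sum_a (N_a/N) * KL(P_T(Y|a) || P_C(Y|a)) - KL at the parent."""
    n = np.asarray(n_children, dtype=float)
    w = n / n.sum()
    cond = sum(wa * kl(pt, pc)
               for wa, pt, pc in zip(w, pt_children, pc_children))
    return cond - kl(pt_root, pc_root)

# parent: treatment and control both 50% positive -> zero divergence;
# the split separates points with positive uplift from negative uplift
gain = kl_gain(pt_children=[0.8, 0.2], pc_children=[0.5, 0.5],
               n_children=[50, 50], pt_root=0.5, pc_root=0.5)
```

A positive gain means the split exposes a difference between the treatment and control class distributions that the parent node hides.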
4.2 Multiple Treatments
- Split rule
- Normalizing