Overview

Two-way analysis of variance (two-factor ANOVA) comes in two types. The first is two-way ANOVA without interaction, which assumes that the effects of factor A and factor B are independent of each other. The second is two-way ANOVA with interaction, which assumes that the combination of factors A and B produces a new effect. For example, if consumers in one region are assumed to have a special preference for a certain brand that consumers in other regions do not share, that preference is a new effect produced by the combination of the two factors, so the setting calls for the interaction model; otherwise, the no-interaction model applies.


Data source

https://github.com/thomas-haslwanter/statsintro_python/tree/master/ISP/Code_Quantlets/08_TestsMeanValues/anovaTwoway


Original data


Null hypotheses (H0)


Total sum of squares = factor 1 sum of squares + factor 2 sum of squares + factor 1 × factor 2 interaction sum of squares + within-group error sum of squares
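This decomposition can be checked numerically. The sketch below uses a small hypothetical balanced design (2 × 3 cells, 2 observations per cell); all numbers are made up purely for illustration:

```python
# Verify SS_total = SS_A + SS_B + SS_AB + SS_error on hypothetical balanced data.
import numpy as np

# data[i, j, k]: level i of factor A, level j of factor B, replicate k
data = np.array([[[4., 6.], [8., 10.], [7., 9.]],
                 [[5., 7.], [9., 13.], [6., 8.]]])
a, b, n = data.shape
grand = data.mean()

mean_A = data.mean(axis=(1, 2))      # factor A level means
mean_B = data.mean(axis=(0, 2))      # factor B level means
mean_cell = data.mean(axis=2)        # cell means

ss_A = b * n * ((mean_A - grand) ** 2).sum()
ss_B = a * n * ((mean_B - grand) ** 2).sum()
ss_AB = n * ((mean_cell - mean_A[:, None] - mean_B[None, :] + grand) ** 2).sum()
ss_error = ((data - mean_cell[:, :, None]) ** 2).sum()
ss_total = ((data - grand) ** 2).sum()

print(np.isclose(ss_total, ss_A + ss_B + ss_AB + ss_error))  # True for balanced data
```

For a balanced design the four components add up to the total exactly; with unbalanced cell counts the simple decomposition above no longer holds term by term.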


Table of computed F statistics

If the F statistic falls in the red (rejection) region, reject H0.


Based on the two factors, split the raw data into six groups, compute each group's mean, and arrange the six means into a matrix.


Compute the sum of squares for the gender factor.


Compute the sum of squares for the age factor.


Compute the within-group error sum of squares.


Total sum of squares


Interaction (two-factor) sum of squares = total sum of squares − factor 1 sum of squares − factor 2 sum of squares − within-group error sum of squares

Here it works out to 7.


Compute the F statistics. Each F is a ratio of mean squares (a sum of squares divided by its degrees of freedom), not a ratio of raw sums of squares:

F_factor1 = (factor 1 SS / df_factor1) / (error SS / df_error)

F_factor2 = (factor 2 SS / df_factor2) / (error SS / df_error)

F_interaction = (interaction SS / df_interaction) / (error SS / df_error)
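As a sketch, take hypothetical sums of squares for a 2 × 3 design with 5 observations per cell, so df_error = 2 · 3 · (5 − 1) = 24; the F statistics and right-tail p-values then follow directly (all SS values below are made up):

```python
# Turn hypothetical sums of squares into F statistics and p-values.
from scipy.stats import f

a, b, n = 2, 3, 5                    # levels of factor A, factor B, obs per cell
ss = {'A': 84.0, 'B': 180.0, 'AB': 7.0, 'error': 120.0}
df = {'A': a - 1, 'B': b - 1, 'AB': (a - 1) * (b - 1), 'error': a * b * (n - 1)}

ms_error = ss['error'] / df['error']          # mean square error
for effect in ('A', 'B', 'AB'):
    F = (ss[effect] / df[effect]) / ms_error  # ratio of mean squares
    p = f.sf(F, df[effect], df['error'])      # P(F distribution >= F)
    print(effect, round(F, 2), round(p, 4))
```

`f.sf` is the survival function of the F distribution, which gives the p-value without consulting a printed F table.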


SPSS application

R² = 0.518: age and gender together explain only about half of the variation in scores; other factors account for the rest.


The Python test results agree with the SPSS output.


Sample size for ANOVA:

ANOVA assumes that the samples are normally distributed; the larger the sample, the more reliably normality can be established.

If we suppose you have k groups and N is the total sample size across all groups, then N − k should exceed zero. Beyond that there is no minimum size for each group, except that you need at least 2 elements per group to be able to calculate a variance, but this is just a theoretical criterion.

However, to use ANOVA you need to check the normal distribution of each group, so the larger the group sizes, the better the chance of establishing normality.


Is there a minimum number per group necessary for an ANOVA? Available from: https://www.researchgate.net/post/Is_there_a_minimum_number_per_group_neccessary_for_an_ANOVA [accessed Jun 2, 2017].

Because the per-group sample sizes are too small, separate pairwise tests disagree with the two-way ANOVA: tested pairwise, age shows a significant difference while gender does not.

Two-way ANOVA: both age and gender differ significantly.
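A two-way ANOVA with interaction like the age/gender analysis can be reproduced in Python with statsmodels. The scores below are hypothetical stand-ins for the article's data, included only to show the workflow:

```python
# Two-way ANOVA with interaction via statsmodels (hypothetical, balanced data).
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    'gender': ['M', 'M', 'F', 'F'] * 6,
    'age': ['young'] * 8 + ['middle'] * 8 + ['old'] * 8,
    'score': [72, 75, 80, 78, 74, 70, 82, 79,
              68, 66, 74, 71, 65, 69, 73, 75,
              60, 63, 66, 68, 62, 64, 70, 67],
})

# C(...) marks categorical factors; '*' expands to main effects plus interaction.
model = smf.ols('score ~ C(gender) * C(age)', data=df).fit()
table = anova_lm(model, typ=2)   # for balanced data, Types I-III agree
print(table)
print('R squared:', round(model.rsquared, 3))
```

The printed table lists the sum of squares, degrees of freedom, F statistic, and p-value for each main effect, the interaction, and the residual, matching the layout of SPSS output.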

Two-way ANOVA for three advertising plans and two advertising media

Data


SPSS results


The Python results agree with the SPSS results.

Advertising plan vs. sales: significant difference.

Advertising medium vs. sales: no significant difference.

The Python script applies both parametric and non-parametric tests.


# Original WeChat public account: python风控模型
import scipy.stats as stats

list_paper = [8, 12, 22, 14, 10, 18]
list_TV = [12, 8, 26, 30, 18, 14]

def Kruskawallis_test(list_group):
    # Non-parametric test across all groups at once.
    print("Use Kruskal-Wallis test:")
    h, p = stats.kruskal(*list_group)
    print("H value:", h)
    print("p:", p)
    if p < 0.05:
        print('There is a significant difference.')
        return True
    else:
        print('No significant difference.')
        return False

def Mannwhitneyu(group1, group2):
    # Non-parametric two-sample test.
    u, p_value = stats.mannwhitneyu(group1, group2, alternative='two-sided')
    print(("Mann-Whitney test", p_value))
    if p_value < 0.05:
        print("there is a significant difference")
    else:
        print("there is no significant difference")

print(stats.ttest_ind(list_paper, list_TV))
Mannwhitneyu(list_paper, list_TV)

list_adPlan1 = [8, 12, 12, 8]
list_adPlan2 = [22, 14, 26, 30]
list_adPlan3 = [10, 18, 18, 14]
list_group = [list_adPlan1, list_adPlan2, list_adPlan3]
print(Kruskawallis_test(list_group))
print(stats.f_oneway(list_adPlan1, list_adPlan2, list_adPlan3))

 

Supermarket location, number of competitors, and sales

Data


Analysis result: supermarket location, number of competitors, and their interaction are all significantly related to sales; R² = 0.78, meaning these three terms account for 78% of the variance.
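R² here is simply the share of the total sum of squares captured by the model terms. A minimal sketch, with hypothetical sums of squares chosen to land near 0.78:

```python
# R^2 = SS_model / SS_total = 1 - SS_error / SS_total (all numbers hypothetical).
ss_location = 300.0       # location main effect
ss_competitors = 250.0    # competitors main effect
ss_interaction = 100.0    # location x competitors interaction
ss_error = 183.3          # within-group error

ss_model = ss_location + ss_competitors + ss_interaction
ss_total = ss_model + ss_error
r_squared = 1 - ss_error / ss_total
print(round(r_squared, 2))  # 0.78
```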


The Python results agree with SPSS.

 variance_check.py

import itertools
import scipy.stats as stats
from statsmodels.stats.diagnostic import lilliefors

a = 0.05

def report_normality(test_name, p_value):
    # Shared reporting: print which test ran and the normality decision.
    print("use", test_name)
    print("p of normal:", p_value)
    if p_value < a:
        print("data are not normally distributed")
        return False
    print("data are normally distributed")
    return True

def check_normality(testData):
    # Pick a normality test appropriate for the sample size.
    n = len(testData)
    if 20 < n < 50:
        return report_normality("normaltest", stats.normaltest(testData)[1])
    if n < 50:
        return report_normality("shapiro", stats.shapiro(testData)[1])
    if n <= 300:
        return report_normality("lilliefors", lilliefors(testData)[1])
    return report_normality("kstest", stats.kstest(testData, 'norm')[1])

def NormalTest(list_groups):
    # Every group must pass the normality check.
    for group in list_groups:
        if check_normality(group) == False:
            return False
    return True

def Combination(list_groups):
    # Collect combinations of every size, then keep the pairwise ones.
    combination = []
    for i in range(1, len(list_groups) + 1):
        combos = itertools.combinations(list_groups, i)
        combination.append(list(combos))
    return combination[1:-1][0]

def Levene_test(group1, group2, group3):
    # Levene's test for homogeneity of variances across three groups.
    p = stats.levene(group1, group2, group3)[1]
    print("levene test:")
    if p < a:
        print("variances of groups are not equal")
        return False
    print("variances of groups are equal")
    return True

def Equal_lenth(list_groups):
    # True when the three groups have the same length after
    # dropping NaN and -inf entries.
    lengths = []
    for group in list_groups[:3]:
        cleaned = [x for x in group if str(x) != 'nan' and str(x) != '-inf']
        lengths.append(len(cleaned))
    return lengths[0] == lengths[1] == lengths[2]

Copyright notice: this article comes from the WeChat public account (python风控模型) and may not be copied without permission. It follows the CC 4.0 BY-SA license; reposts must include a link to the original and this notice.

All readers are welcome to learn more on related topics:

Python machine learning for bioinformatics, recorded by the author in 2K ultra-HD:
https://edu.51cto.com/sd/3a516