最近看完了空间计量经济学的理论部分,因此打算开始学习一下实战,实战所使用的主要是GEODA家族的软件包们,首先还是打算先学习python的pysal包,毕竟还是更喜欢代码,而且相较于GEODA和GEODASPACE,写代码还是会更灵活一点。这一部分也打算写一个系列,这是第一块,数据读取及预处理,以及权重矩阵的一些知识和代码,这个系列主要侧重于代码,理论的话基本就不涉及啦,需要的可以学习下沈体雁,于瀚辰老师写的《空间计量经济学》。主要是借助《空间计量分析软件》一本书来学习,书中的pysal的版本应该是1.x的,而现在pysal更新到2.x了,很多书中的代码不一定可以,但是为了方便学习,我还是安装了pysal1.14.4,学会了之后改到2.x想必也是轻松的。
import numpy as np
import pysal as ps
import libpysal as lps
E:\Anaconda\lib\site-packages\pysal\__init__.py:65: VisibleDeprecationWarning: PySAL's API will be changed on 2018-12-31. The last release made with this API is version 1.14.4. A preview of the next API version is provided in the `pysalnext` package. The API changes and a guide on how to change imports is provided at https://migrating.pysal.org
), VisibleDeprecationWarning)
E:\Anaconda\lib\site-packages\libpysal\examples\remotes.py:26: UserWarning: Remote data sets not available. Check connection.
warnings.warn("Remote data sets not available. Check connection.")
1.读取数据
读取数据并将数据转换成numpy格式。
db = ps.open(ps.examples.get_path('NAT.dbf'),'r')
db
DataTable: E:\Anaconda\lib\site-packages\pysal\examples\nat\NAT.dbf
print(len(db))
db.header
3085
['NAME',
'STATE_NAME',
'STATE_FIPS',
'CNTY_FIPS',
'FIPS',
'STFIPS',...]
db.by_col('HR90')
[0.0,
15.885623511,
6.4624531472,
6.9965017491,
...]
y = np.array([db.by_col('HR90')]).T
y
array([[ 0. ],
[15.88562351],
[ 6.46245315],
...,
[ 4.36732988],
[ 3.72771194],
[ 2.04885495]])
x_names = ['RD90','UE90']
x1=np.array([db.by_col(var) for var in x_names]).T
x1
array([[-0.80277374, 3.89479009],
[-0.13548304, 16.8115942 ],
[-0.27654392, 10.70079408],
...,
[-1.33977047, 4.1028878 ],
[-1.74736678, 3.35398107],
[-0.36114466, 5.48855304]])
x1.shape
(3085, 2)
2.邻接权重矩阵
下面建立的邻接权重矩阵的拓扑图如下图所示:
# 对一个3*3的方针从0开始按顺序编号,此外,加一个独立的9方块
# 邻接权重矩阵的结构,字典的形式,键是观测值id,值是与观测值id相邻接的观测值id
neighbors = {0:[3,1],1:[0,4,2],2:[1,5],3:[0,6,4],4:[1,3,7,5],5:[2,4,8],6:[3,7],7:[4,6,8],8:[5,7],9:[]}
# 与邻接权重矩阵结构相一致的权重,也是字典的形式,键是观测值id,值是和neighbors值对应的权重
weights = {0:[5,3],1:[3,2,4],2:[4,1],3:[5,7,2],4:[2,2,3,1],5:[1,1,8],6:[7,3],7:[3,3,8],8:[8,8],9:[]}
# 观测值序列的顺序
id_order = [9,0,3,6,1,4,7,2,5,8]
#创建pysal中的邻接权重矩阵
w = ps.W(neighbors=neighbors ,weights=weights,id_order=id_order)
w
E:\Anaconda\lib\site-packages\pysal\weights\weights.py:186: UserWarning: There is one disconnected observation (no neighbors)
warnings.warn("There is one disconnected observation (no neighbors)")
E:\Anaconda\lib\site-packages\pysal\weights\weights.py:187: UserWarning: Island id: 9
warnings.warn("Island id: %s" % str(self.islands[0]))
<pysal.weights.weights.W at 0x31a1e1b388>
print("邻接权重矩阵的权重:",w.weights)
print("邻接权重矩阵的观测序列:",w.id_order)
print("邻接权重矩阵的观测点个数:",w.n)
邻接权重矩阵的权重: {0: [5, 3], 1: [3, 2, 4], 2: [4, 1], 3: [5, 7, 2], 4: [2, 2, 3, 1], 5: [1, 1, 8], 6: [7, 3], 7: [3, 3, 8], 8: [8, 8], 9: []}
邻接权重矩阵的观测序列: [9, 0, 3, 6, 1, 4, 7, 2, 5, 8]
邻接权重矩阵的观测点个数: 10
# 输出字典,键值对的键按照观测序列顺序排列,键是观测序列id,值是该观测id的顺序是第几个
w.id2i
{9: 0, 0: 1, 3: 2, 6: 3, 1: 4, 4: 5, 7: 6, 2: 7, 5: 8, 8: 9}
# transform输出当前的权重值类型,默认为'O',即二进制
w.transform
'O'
w.transform = 'R' # R是行标准化
w.weights
('WARNING: ', 9, ' is an island (no neighbors)')
{0: [0.625, 0.375],
1: [0.3333333333333333, 0.2222222222222222, 0.4444444444444444],
2: [0.8, 0.2],
3: [0.35714285714285715, 0.5, 0.14285714285714285],
4: [0.25, 0.25, 0.375, 0.125],
5: [0.1, 0.1, 0.8],
6: [0.7, 0.3],
7: [0.21428571428571427, 0.21428571428571427, 0.5714285714285714],
8: [0.5, 0.5],
9: []}
w.transformations # 输出历史改动信息
{'O': {0: [5, 3],
1: [3, 2, 4],
2: [4, 1],
3: [5, 7, 2],
4: [2, 2, 3, 1],
5: [1, 1, 8],
6: [7, 3],
7: [3, 3, 8],
8: [8, 8],
9: []},
'R': {0: [0.625, 0.375],
1: [0.3333333333333333, 0.2222222222222222, 0.4444444444444444],
2: [0.8, 0.2],
3: [0.35714285714285715, 0.5, 0.14285714285714285],
4: [0.25, 0.25, 0.375, 0.125],
5: [0.1, 0.1, 0.8],
6: [0.7, 0.3],
7: [0.21428571428571427, 0.21428571428571427, 0.5714285714285714],
8: [0.5, 0.5],
9: []}}
# 创建一个完整的numpy数组,包含两个元素,第一个是空间权重矩阵,第二个是id_order信息
# 第0行就是id为9的node的邻接权重信息,第0行的顺序也是按照id_order的
w.full()
(array([[0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0.625 , 0. , 0.375 ,
0. , 0. , 0. , 0. , 0. ],
[0. , 0.35714286, 0. , 0.5 , 0. ,
0.14285714, 0. , 0. , 0. , 0. ],
[0. , 0. , 0.7 , 0. , 0. ,
0. , 0.3 , 0. , 0. , 0. ],
[0. , 0.33333333, 0. , 0. , 0. ,
0.22222222, 0. , 0.44444444, 0. , 0. ],
[0. , 0. , 0.25 , 0. , 0.25 ,
0. , 0.375 , 0. , 0.125 , 0. ],
[0. , 0. , 0. , 0.21428571, 0. ,
0.21428571, 0. , 0. , 0. , 0.57142857],
[0. , 0. , 0. , 0. , 0.8 ,
0. , 0. , 0. , 0.2 , 0. ],
[0. , 0. , 0. , 0. , 0. ,
0.1 , 0. , 0.1 , 0. , 0.8 ],
[0. , 0. , 0. , 0. , 0. ,
0. , 0.5 , 0. , 0.5 , 0. ]]),
[9, 0, 3, 6, 1, 4, 7, 2, 5, 8])
# 将字典存储的空间权重对象转换成稀疏空间权重对象(WSP类)
w.towsp()
<pysal.weights.weights.WSP at 0x31a1e44dc8>
# 直接从shape文件读入邻接权重矩阵,且读入的空间邻接是queen邻接的
wq = ps.queen_from_shapefile(ps.examples.get_path('NAT.shp'),idVariable='FIPSNO')
wq
<pysal.weights.Contiguity.Queen at 0x319b3e0908>
wq.weights
{27077: [1.0, 1.0, 1.0],
27007: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
...}
wq.transform
'O'
wq.transform='R'
wq.s0 #权重矩阵元素之和
3085.0
wq.towsp()
<pysal.weights.weights.WSP at 0x31a39d39c8>
# 生成空间滞后变量
print("使用之前生成的x1做实验:\n",x1)
vlag = ps.lag_spatial(wq , x1)
vlag
使用之前生成的x1做实验:
[[-0.80277374 3.89479009]
[-0.13548304 16.8115942 ]
[-0.27654392 10.70079408]
...
[-1.33977047 4.1028878 ]
[-1.74736678 3.35398107]
[-0.36114466 5.48855304]]
array([[-0.3140736 , 6.35152828],
[-0.18933189, 8.36436575],
[-0.24911753, 10.81645517],
...,
[-0.11334696, 5.52220645],
[-1.8902103 , 2.58195495],
[-0.60620542, 4.46367014]])
print(x1.shape)
print(vlag.shape)
(3085, 2)
(3085, 2)