Gridding Geographic Data with Python: Geographic Data Processing in Python (PDF)

Reposted

mob6454cc636c54 2023-10-24 10:32:53


Chapter 5: Python and Geographic Information Systems

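The shapefile is a foundational data format for GIS data exchange and GIS analysis. When editing or otherwise working with a shapefile, you only need to pay attention to two of its file types: the .shp file and the .dbf file.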

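The .shp file holds the geometry, and the .dbf file holds the attributes that belong to that geometry. Every geometry record in the shapefile has exactly one corresponding dbf record. The two are not linked by an ID or any other label; they are matched only by their order in the files. This means that whenever you add or delete information in a shapefile, you must make the matching change in the related file as well.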
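
Because the link between a geometry and its attribute row is purely positional, an edit has to touch both sides. The sketch below uses plain Python lists as stand-ins for the .shp and .dbf contents. The names `shapes`, `records`, and `delete_feature` are hypothetical and not part of PyShp, and the coordinates are rounded, illustrative values:

```python
def delete_feature(shapes, records, index):
    """Remove the feature at `index` from both parallel lists."""
    if len(shapes) != len(records):
        raise ValueError("geometry and attribute lists are out of sync")
    del shapes[index]
    del records[index]
    return shapes, records

# Stand-in data: three points and their attribute rows, matched by position.
shapes = [[-90.98, 31.96], [-91.39, 31.55], [-91.06, 31.27]]
records = [["Port Gibson"], ["Natchez"], ["Crosby"]]

# Deleting feature 1 removes the geometry and its attribute row together.
delete_feature(shapes, records, 1)
print(list(zip(records, shapes)))
```

Deleting from only one of the two lists would silently shift every later feature onto the wrong attributes, which is the failure mode the paragraph above warns about.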

This walkthrough uses the PyShp package.

5.5 Editing Shapefiles

5.5.1 Accessing a Shapefile

Open the shapefile with the PyShp library:

import shapefile
# Create a shapefile Reader instance and assign it to the variable shapefile_data
shapefile_data = shapefile.Reader(r"MSCities_Geo_Pts\MSCities_Geo_Pts.shp")
# The reader exposes geospatial metadata about the file:
# its bounding box, shape type, and number of records
print("Shapefile bounding box:", shapefile_data.bbox)
# 1 = point, 3 = polyline, 5 = polygon
print("Shapefile shape type:", shapefile_data.shapeType)
print("Shapefile record count:", shapefile_data.numRecords)
print("Shapefile encoding:", shapefile_data.encoding)
print("Shapefile encodingErrors:", shapefile_data.encodingErrors)
print("Shapefile fields:", shapefile_data.fields)
print("Shapefile mbox:", shapefile_data.mbox)
print("Shapefile numShapes:", shapefile_data.numShapes)
print("Shapefile name:", shapefile_data.shapeName)
print("Shapefile shpLength:", shapefile_data.shpLength)
print("Shapefile shp object:", shapefile_data.shp)
print("Shapefile shx object:", shapefile_data.shx)
print("Shapefile dbf object:", shapefile_data.dbf)

Output:
Shapefile bounding box: [-91.38804855553174, 30.29314882296931, -88.18631833931401, 34.96091138678437]
Shapefile shape type: 1
Shapefile record count: 298
Shapefile encoding: utf-8
Shapefile encodingErrors: strict
Shapefile fields: [('DeletionFlag', 'C', 1, 0), ['STATEFP10', 'C', 2, 0], ['PLACEFP10', 'C', 5, 0], ['PLACENS10', 'C', 8, 0], ['GEOID10', 'C', 7, 0], ['NAME10', 'C', 100, 0], ['NAMELSAD10', 'C', 100, 0], ['LSAD10', 'C', 2, 0], ['CLASSFP10', 'C', 2, 0], ['PCICBSA10', 'C', 1, 0], ['PCINECTA10', 'C', 1, 0], ['MTFCC10', 'C', 5, 0], ['FUNCSTAT10', 'C', 1, 0], ['ALAND10', 'N', 14, 0], ['AWATER10', 'N', 14, 0], ['INTPTLAT10', 'C', 11, 0], ['INTPTLON10', 'C', 12, 0]]
Shapefile mbox: [0.0, 0.0]
Shapefile numShapes: None
Shapefile name: MSCities_Geo_Pts\MSCities_Geo_Pts
Shapefile shpLength: 8444
Shapefile shp object: <_io.BufferedReader name='MSCities_Geo_Pts\MSCities_Geo_Pts.shp'>
Shapefile shx object: <_io.BufferedReader name='MSCities_Geo_Pts\MSCities_Geo_Pts.shx'>
Shapefile dbf object: <_io.BufferedReader name='MSCities_Geo_Pts\MSCities_Geo_Pts.dbf'>
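
The bounding box, shape type, and file length reported by the reader come from the fixed 100-byte header at the start of the .shp file. As a rough illustration of where those values live, the sketch below builds a synthetic header in memory and decodes it with the standard `struct` module. It follows the header layout in the ESRI shapefile specification; the helper names `build_header` and `parse_shp_header` are made up for this example, and this is not how PyShp itself is implemented.

```python
import struct

def build_header(shape_type, bbox, file_bytes):
    """Pack a synthetic 100-byte .shp header (for demonstration only)."""
    # File code 9994, five unused integers, file length in 16-bit words (big-endian).
    head = struct.pack(">7i", 9994, 0, 0, 0, 0, 0, file_bytes // 2)
    # Version 1000 and the shape type (little-endian).
    head += struct.pack("<2i", 1000, shape_type)
    # XY bounding box, followed by the Z and M ranges (left at zero here).
    head += struct.pack("<8d", *bbox, 0.0, 0.0, 0.0, 0.0)
    return head

def parse_shp_header(data):
    """Decode the header fields that the reader exposes as attributes."""
    file_code, = struct.unpack(">i", data[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile header")
    length_words, = struct.unpack(">i", data[24:28])
    version, shape_type = struct.unpack("<2i", data[28:36])
    bbox = list(struct.unpack("<4d", data[36:68]))  # xmin, ymin, xmax, ymax
    return {"shapeType": shape_type, "bbox": bbox, "shpLength": length_words * 2}

bbox = [-91.38804855553174, 30.29314882296931, -88.18631833931401, 34.96091138678437]
raw_header = build_header(1, bbox, 8444)
header = parse_shp_header(raw_header)
print(header)
```

The file length of 8444 bytes is consistent with the rest of the output: a 100-byte header plus 298 point records of 28 bytes each.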

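5.5.2 Reading Shapefile Attributes

The dbf file is a simple database format structured like a row-and-column spreadsheet, in which each column has a label that defines what it contains. You can inspect these definitions through the reader's fields attribute:

# Create a shapefile Reader instance and assign it to the variable shapefile_data
shapefile_data = shapefile.Reader(r"MSCities_Geo_Pts\MSCities_Geo_Pts.shp")
# Each entry in fields describes one dbf column:
# its name, type, length, and decimal length
print(shapefile_data.fields)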

Output:
[('DeletionFlag', 'C', 1, 0), ['STATEFP10', 'C', 2, 0], ['PLACEFP10', 'C', 5, 0], ['PLACENS10', 'C', 8, 0], ['GEOID10', 'C', 7, 0], ['NAME10', 'C', 100, 0], ['NAMELSAD10', 'C', 100, 0], ['LSAD10', 'C', 2, 0], ['CLASSFP10', 'C', 2, 0], ['PCICBSA10', 'C', 1, 0], ['PCINECTA10', 'C', 1, 0], ['MTFCC10', 'C', 5, 0], ['FUNCSTAT10', 'C', 1, 0], ['ALAND10', 'N', 14, 0], ['AWATER10', 'N', 14, 0], ['INTPTLAT10', 'C', 11, 0], ['INTPTLON10', 'C', 12, 0]]

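Each field descriptor carries four pieces of information:

**Field name:** The text name of the field, made up of at most 10 characters.
**Field type:** The type of the field, which can be text, number, date, floating point, or Boolean, represented by the letters C, N, D, F, and L respectively. The shapefile specification recommends dBASE III as the dbf format, but most GIS software today supports dBASE IV. In version IV, the number and float types are equivalent.
**Field length:** The length of the data, in characters or digits.
**Decimal length:** The number of decimal places in a number or float field.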
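
To make the four-part descriptor concrete, here is a small sketch that turns descriptors into readable text. The three sample descriptors are copied from the fields output above; the `FIELD_TYPES` mapping and the `describe_field` helper are this example's own names, not part of PyShp.

```python
# Type letters as listed above; the readable labels are this example's wording.
FIELD_TYPES = {"C": "text", "N": "number", "F": "float", "L": "boolean", "D": "date"}

def describe_field(descriptor):
    """Turn a (name, type, length, decimals) descriptor into a readable string."""
    name, type_code, length, decimals = descriptor
    label = FIELD_TYPES.get(type_code, "unknown")
    text = f"{name}: {label}, length {length}"
    # The decimal length only matters for numeric fields.
    if type_code in ("N", "F"):
        text += f", {decimals} decimal places"
    return text

fields = [
    ("DeletionFlag", "C", 1, 0),
    ["NAME10", "C", 100, 0],
    ["ALAND10", "N", 14, 0],
]

for descriptor in fields[1:]:  # skip the deletion flag
    print(describe_field(descriptor))
```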

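The first field descriptor, ('DeletionFlag', 'C', 1, 0), is normally hidden, because it exists as part of the dbf file format specification rather than as user data. The deletion flag lets software mark a record as deleted without actually removing it. A record marked this way is still in the file, but it no longer appears in record listings or query results.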
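
In the dBASE format, each stored record begins with a one-byte marker: a space for an active record and an asterisk for a deleted one. The following sketch applies that rule to hand-made byte strings. The sample records and the helper name `active_records` are invented for illustration; it only approximates the filtering a dbf reader performs.

```python
DELETED = b"*"  # dBASE marks a deleted record with an asterisk; active records use a space

def active_records(raw_records):
    """Yield the decoded payload of every record whose deletion flag is not set."""
    for raw in raw_records:
        flag, payload = raw[:1], raw[1:]
        if flag != DELETED:
            yield payload.decode("utf-8").rstrip()

# Three fake records: the second one is flagged as deleted but still present.
raw_records = [
    b" Port Gibson",
    b"*Natchez    ",
    b" Crosby     ",
]

visible = list(active_records(raw_records))
print(visible)
```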

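You can use a Python list comprehension to collect the first element of each descriptor (the field name) while skipping the deletion flag:

for name in [item[0] for item in shapefile_data.fields[1:]]:
    print(name)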

Output:
STATEFP10
PLACEFP10
PLACENS10
GEOID10
NAME10
NAMELSAD10
LSAD10
CLASSFP10
PCICBSA10
PCINECTA10
MTFCC10
FUNCSTAT10
ALAND10
AWATER10
INTPTLAT10
INTPTLON10

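Next, look at the records behind some of these fields. The **shapefile_data.record()** method returns a single record. This file has 298 records; take the third record as an example, whose index is 2.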

print("shapefile_data.record(2) result:", shapefile_data.record(2))
# Field names and record values are stored separately,
# so a value has to be fetched by its position in the record.
# The city name is at index 4 in every record.
print("shapefile_data.record(2)[4] result:", shapefile_data.record(2)[4])

Output:
shapefile_data.record(2) result: Record #2: ['28', '16620', '02406337', '2816620', 'Crosby', 'Crosby town', '43', 'C1', 'N', 'N', 'G4110', 'A', 5489412, 21336, '+31.2742552', '-091.0614840']
shapefile_data.record(2)[4] result: Crosby

Recommended field-access method 1: the index() method of a Python list

fieldNames = [item[0] for item in shapefile_data.fields[1:]]
name10 = fieldNames.index("NAME10")
print(name10)
print(shapefile_data.record(2)[name10])

Output:
4
Crosby

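Recommended field-access method 2: Python's built-in zip function

Python's built-in zip function can associate the field names with a record's values. It pairs up the elements of two or more lists into a sequence of tuples. You can then loop over those tuples and pick out a value by its field name: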

fieldNames = [item[0] for item in shapefile_data.fields[1:]]
rec = shapefile_data.record(2)

# Pair each field name with its value, e.g. ('NAME10', 'Crosby')
zipReclist = list(zip(fieldNames, rec))
print(zipReclist)
for name, value in zipReclist:
    if name == "NAME10":
        print(value)

Output (last line, after the printed list of pairs):
Crosby
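
When the goal is lookup by name rather than iteration, the pairs produced by zip can be passed straight to dict(). The sketch below shows this with plain lists; the values are the first six fields of record 2 copied from the output above, truncated for brevity, and `record_dict` is a name chosen for this example.

```python
# First six field names and the matching values of record 2 (truncated for brevity).
field_names = ["STATEFP10", "PLACEFP10", "PLACENS10", "GEOID10", "NAME10", "NAMELSAD10"]
rec = ["28", "16620", "02406337", "2816620", "Crosby", "Crosby town"]

# dict() consumes the (name, value) pairs that zip produces.
record_dict = dict(zip(field_names, rec))
print(record_dict["NAME10"])
```

This trades the loop for a single dictionary lookup. Recent PyShp releases may also let you access a record's values by field name directly, so it is worth checking the documentation for the version you have installed.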

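Recommended field-access method 3: the enumerate() function

The shapefile_data.records() method returns every record in the dbf file. The example below loops over the list it returns, using Python list slicing to limit the loop to the first three records.
A shapefile does not store record numbers, so the example enumerates the record list to generate numbers, which makes the records easier to refer to. The enumerate() function returns tuples that pair each record's index with the record itself.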

for number, rec in enumerate(shapefile_data.records()[:3], start=1):
    print(number, ":", rec)

Output:
1 : Record #0: ['28', '59560', '02404554', '2859560', 'Port Gibson', 'Port Gibson city', '25', 'C1', 'N', 'N', 'G4110', 'A', 4550230, 0, '+31.9558031', '-090.9834329']
2 : Record #1: ['28', '50440', '02404351', '2850440', 'Natchez', 'Natchez city', '25', 'C1', 'Y', 'N', 'G4110', 'A', 34175943, 1691489, '+31.5495016', '-091.3887298']
3 : Record #2: ['28', '16620', '02406337', '2816620', 'Crosby', 'Crosby town', '43', 'C1', 'N', 'N', 'G4110', 'A', 5489412, 21336, '+31.2742552', '-091.0614840']

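Recommended field-access method 4: the PyShp record iterator

If you need to process very large shapefiles, PyShp's iterator method lets you access the data efficiently.
The default records() method reads every record into memory at once. That is fine for small dbf files, but it becomes hard to manage for dbf files that hold many thousands of records.
Wherever you would call records(), you can call shapefile_data.iterRecords() instead.
It does not read all of the data up front; it reads records only as they are needed.

The following example uses **iterRecords()** to count the records so that the total can be checked against the record count in the file header.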

counter = 0
for rec in shapefile_data.iterRecords():
    counter += 1
    print("rec",rec)
print(counter)
#298

Output:
rec Record #0: ['28', '59560', '02404554', '2859560', 'Port Gibson', 'Port Gibson city', '25', 'C1', 'N', 'N', 'G4110', 'A', 4550230, 0, '+31.9558031', '-090.9834329']
rec Record #1: ['28', '50440', '02404351', '2850440', 'Natchez', 'Natchez city', '25', 'C1', 'Y', 'N', 'G4110', 'A', 34175943, 1691489, '+31.5495016', '-091.3887298']
rec Record #2: ['28', '16620', '02406337', '2816620', 'Crosby', 'Crosby town', '43', 'C1', 'N', 'N', 'G4110', 'A', 5489412, 21336, '+31.2742552', '-091.0614840']
…
rec Record #297: ['28', '29700', '02403771', '2829700', 'Gulfport', 'Gulfport city', '25', 'C1', 'Y', 'N', 'G4110', 'A', 143982054, 21759487, '+30.4160583', '-089.0718450']
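
The difference between reading everything at once and reading on demand can be sketched with a plain Python generator. The code below is not PyShp's implementation: it assumes a made-up fixed record width of 12 bytes and an in-memory stream standing in for a dbf body, and the name `iter_fixed_records` is hypothetical.

```python
import io

RECORD_SIZE = 12  # hypothetical fixed record width, in bytes

def iter_fixed_records(stream, size=RECORD_SIZE):
    """Yield one decoded record at a time instead of loading the whole stream."""
    while True:
        chunk = stream.read(size)
        if len(chunk) < size:
            break  # end of stream (or a truncated trailing record)
        yield chunk.decode("utf-8").rstrip()

# An in-memory stand-in for a dbf body holding three fixed-width records.
names = ["Port Gibson", "Natchez", "Crosby"]
stream = io.BytesIO("".join(n.ljust(RECORD_SIZE) for n in names).encode("utf-8"))

# Only one record is held in memory at any point in this loop.
counter = 0
for rec in iter_fixed_records(stream):
    counter += 1
print(counter)
```

Because the generator holds only the current chunk, memory use stays flat no matter how many records the stream contains, which is the property that makes an iterator attractive for large files.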

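5.5.3 Reading Shapefile Geometry

As shown earlier, the header information identifies this file as a point shapefile, so every record in it contains a single point. Take a look at the first geometry record: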

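import shapefile

# Create a shapefile Reader instance and assign it to the variable shapefile_data
shapefile_data = shapefile.Reader(r"MSCities_Geo_Pts\MSCities_Geo_Pts.shp")
# Fetch the first geometry record (index 0)
geom = shapefile_data.shape(0)

# 1 = point, 3 = polyline, 5 = polygon
print(geom.shapeType)
# For a point shape, points holds a single coordinate pair
print(geom.points)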
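
For a sense of what a point geometry record looks like on disk, the sketch below packs and unpacks one synthetic Point record with the standard `struct` module, following the record layout in the ESRI shapefile specification. The helper names are made up for this example, and the coordinates are borrowed from the INTPTLON10 and INTPTLAT10 attributes of the Crosby record purely for illustration; the geometry actually stored in the file may differ.

```python
import struct

def build_point_record(number, x, y):
    """Pack one synthetic Point record as it would appear in a .shp file."""
    # Record content: shape type 1 (point), then X and Y as little-endian doubles.
    content = struct.pack("<i2d", 1, x, y)
    # Record header: record number and content length in 16-bit words (big-endian).
    header = struct.pack(">2i", number, len(content) // 2)
    return header + content

def parse_point_record(data):
    """Decode a Point record into its number, shape type, and points list."""
    number, length_words = struct.unpack(">2i", data[:8])
    shape_type, x, y = struct.unpack("<i2d", data[8:8 + length_words * 2])
    return {"number": number, "shapeType": shape_type, "points": [[x, y]]}

# Illustrative coordinates only (taken from the Crosby attribute values).
record = build_point_record(1, -91.0614840, 31.2742552)
parsed = parse_point_record(record)
print(len(record))
print(parsed)
```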