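With the rapid development of location-based services (LBS), global positioning systems, and mobile devices, data carrying spatial location information is growing quickly and producing large volumes of spatial data. Spatial data mining aims to extract potentially useful and valuable information from massive, high-dimensional spatial data. Spatial co-location pattern mining is an important research direction within spatial data mining, with wide application in environmental protection, urban computing, public transportation, and other fields. A spatial co-location pattern is a subset of the spatial feature set whose instances frequently appear together within a neighborhood. For example, pharmacies and flower shops are often found near hospitals, and rhizobia often grow alongside leguminous plants.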
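Problem statement. Given a set of spatial features \(O\), a set of spatial instances \(S\), a membership function \(\mu\) for the fuzzy proximity relation on \(S\), a proximity threshold \(\alpha\) \((0 \le \alpha \le 1)\), a minimum fuzzy participation threshold \(min\_fprev\), and an influence ratio threshold \(min\_fir\), find all dominant features and dominant-feature patterns among the prevalent co-location patterns.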
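To make the roles of \(\mu\) and \(\alpha\) concrete, here is a minimal sketch of a piecewise-linear membership function and the resulting neighbor test. The breakpoints `d1 = 500` and `d2 = 1500` and the threshold `alpha = 0.9` are the constants that appear in `utils.py`; the function names `membership` and `is_fuzzy_neighbor` are hypothetical, and the non-strict comparison `>=` is an assumption about how "satisfying the threshold" should be read.

```python
def membership(d, d1=500.0, d2=1500.0):
    """Piecewise-linear membership degree of the fuzzy proximity relation.

    1 up to distance d1, falling linearly to 0 at d2, and 0 beyond d2.
    """
    if d <= d1:
        return 1.0
    if d <= d2:
        return 1.0 - (d - d1) / (d2 - d1)
    return 0.0


def is_fuzzy_neighbor(d, alpha=0.9, d1=500.0, d2=1500.0):
    """Assumed reading: two instances are neighbors when mu(d) >= alpha."""
    return membership(d, d1, d2) >= alpha


if __name__ == "__main__":
    for d in (300, 550, 1000, 2000):
        print(d, membership(d), is_fuzzy_neighbor(d))
```

With these constants only pairs closer than roughly 600 distance units pass the threshold, so raising `alpha` shrinks the neighborhood even though the membership function itself is unchanged.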
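Mining algorithm. The single-proximity-threshold co-location pattern mining algorithm with dominant features (ADFSPTCM).
The algorithm consists of the following four steps: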
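- Materialize the spatial instance set with the fuzzy star model. Using the given membership function of the fuzzy proximity relation and a grid partitioning technique, compute the fuzzy proximity relation of the spatial data set, keep the fuzzy neighbor pairs that satisfy the proximity threshold, and build the fuzzy star neighborhood set.
- Generate the size-2 prevalent co-location patterns. Generate size-2 candidate co-location patterns from the spatial feature set, build the table instances of each candidate from the fuzzy neighbor pairs, and keep the patterns whose fuzzy participation index is not less than the fuzzy participation threshold.
- Iteratively generate higher-size co-location patterns. Repeat the following:
  - If the set of size-\((k-1)\) prevalent co-location patterns \((k>2)\) is not empty, join the size-\((k-1)\) prevalent patterns with each other to generate the size-\(k\) candidate co-location patterns;
  - Fetch the fuzzy star instances of the size-\(k\) candidates from the fuzzy star neighborhood set;
  - Check the clique relationship of each candidate's star instances to obtain the candidate's fuzzy table instances;
  - Compute the fuzzy participation index of each candidate and keep the co-location patterns whose index is not less than the fuzzy participation threshold.
- Mine the co-location patterns with dominant features. For each pattern that meets the fuzzy participation threshold, compute the fuzzy loss index against each of its size-\((k-1)\) sub-patterns to obtain the fuzzy influence index of every feature. Then take the feature with the smallest influence index, \(f_{min}\), and put into \(DF\_set(c)\) every feature whose influence ratio relative to \(f_{min}\) reaches the feature influence ratio threshold \(min\_fir\). Finally, if \(DF\_set(c)\) is not empty, add the pattern to the \(DFCP\) set.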
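The last step can be sketched in isolation. The example below assumes the fuzzy influence index of every feature in a pattern is already known, and uses the influence ratio \(1 - FII_{min}/FII_i\), which is the formula implemented by `FIR_func` in `utils.py`. The function name `dominant_features` is hypothetical, and skipping features with a non-positive index is an assumption added to avoid division by zero.

```python
def dominant_features(fii, min_fir):
    """Return the features whose influence ratio w.r.t. the weakest feature reaches min_fir.

    fii maps each feature of one pattern to its fuzzy influence index.
    """
    fii_min = min(fii.values())
    dominant = []
    for feature, value in fii.items():
        if value <= 0:
            continue  # assumption: a non-positive index cannot be dominant
        fir = 1 - fii_min / value  # same formula as FIR_func in utils.py
        if fir >= min_fir:
            dominant.append(feature)
    return dominant


if __name__ == "__main__":
    fii = {"A": 0.8, "B": 0.4, "C": 0.5}
    print(dominant_features(fii, 0.3))
    print(dominant_features(fii, 0.1))
```

The weakest feature always has ratio 0, so it is never reported as dominant for any positive `min_fir`.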
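Algorithm ADFSPTCM
Input:
\(O\): spatial feature set, \(S\): spatial instance set, \(\mu\): membership function of the fuzzy proximity relation, \(\alpha\): proximity threshold,
\(min\_fprev\): minimum fuzzy participation threshold, \(min\_fir\): minimum feature influence ratio threshold.
Variables:
\(k\): size of a co-location pattern, \(FSN\): fuzzy star neighborhood set, \(C_k\): size-\(k\) candidate co-location patterns,
\(FS_k\): fuzzy star instances of the size-\(k\) candidates, \(FT_k\): fuzzy table instances of the size-\(k\) candidates,
\(FPR_c\): set of fuzzy participation ratios of the size-\(k\) co-location pattern \(c\), \(P_k\): set of size-\(k\) prevalent co-location patterns,
\(P\): set of prevalent co-location patterns
Output:
The set \(DFCP\_set\) of prevalent co-location patterns with dominant features, and the dominant feature set of every \(DFCP\)
Steps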
- \(FNR = get\_fuzzy\_neighbor\_relationship(S, \mu)\); // build the fuzzy proximity relation
- \(FSN = get\_star\_neighbor(O, S, FNR_\alpha)\); // build the fuzzy star neighborhood set (pairs satisfying \(\alpha\))
- \(C_2 = gen\_candidate\_co\_location(O)\); // size-2 candidate co-location patterns
- \(FT_2 = get\_star\_instances(C_2, FNR_\alpha)\); // size-2 fuzzy table instances
- \(P_2 = select\_prevalent\_co\_locations(C_2, FT_2, min\_fprev)\); // size-2 prevalent co-location patterns
- \(P = P \cup P_2;\ k = 3\);
- \(while\ (P_{k-1}\ not\ empty)\ do\)
- \(\quad C_k = gen\_candidate\_co\_location(P_{k-1});\)
- \(\quad FS_k = get\_star\_instances(C_k, FSN);\)
- \(\quad FT_k = check\_clique\_instance(C_k, FS_k)\); // keep the star instances that form cliques
- \(\quad P_k = select\_prevalent\_co\_locations(C_k, FT_k, min\_fprev)\); // size-\(k\) prevalent patterns
- \(\quad P = P \cup P_k;\)
- \(\quad for\ each\ c \in C_k\ do\)
- \(\quad \quad if\ FPI(c) \ge min\_fprev\ do\)
- \(\quad \quad \quad for\ each\ p \in P_{k-1}(c)\ do\) // the size-\((k-1)\) sub-patterns of \(c\), using \(FPR_c\)
- \(\quad \quad \quad \quad FLI(p,c) = calculate\_FLI(FPR(p), FPR(c))\); // fuzzy loss index from pattern \(p\) to pattern \(c\)
- \(\quad \quad \quad \quad FII\_set(c) \leftarrow \{1 - FLI(p,c),\ c - p\}\)
- \(\quad \quad \quad end\ do\)
- \(\quad \quad \quad o_{min} = \arg\min \{FII\_set(c)\};\)
- \(\quad \quad \quad for\ each\ o_i \in c\ do\)
- \(\quad \quad \quad \quad if\ FIR(o_i, o_{min}) \ge min\_fir\ do\)
- \(\quad \quad \quad \quad \quad DF\_set(c) \leftarrow o_i\); // add the dominant feature to the dominant feature set
- \(\quad \quad \quad \quad end\ do\)
- \(\quad \quad \quad end\ do\)
- \(\quad \quad \quad if\ DF\_set(c) \ne \varnothing\ do\)
- \(\quad \quad \quad \quad DFCP \leftarrow \{c, DF\_set(c)\}\); // prevalent patterns with dominant features
- \(\quad \quad \quad end\ do\)
- \(\quad \quad end\ do\)
- \(\quad end\ do\)
- \(\quad k = k+1;\)
- \(end\ do\)
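The inner loop over sub-patterns can be illustrated on its own. The sketch below follows the definitions used in `calculate_FLI`: the fuzzy loss index from a sub-pattern \(p\) to a pattern \(c\) is the smallest drop in participation ratio over the features of \(p\), and the influence index of the removed feature \(c - p\) is \(1 - FLI(p,c)\). The function names and the participation ratios in the example are made up for illustration.

```python
def fuzzy_loss_index(sub_fpr, full_fpr):
    """Smallest drop in fuzzy participation ratio over the features of the sub-pattern."""
    return min(sub_fpr[f] - full_fpr[f] for f in sub_fpr)


def feature_influence_indexes(pattern, fpr_by_pattern):
    """Map each feature of `pattern` to 1 - FLI(pattern without that feature, pattern).

    pattern is a tuple of features; fpr_by_pattern maps a pattern tuple to a
    dict of per-feature fuzzy participation ratios.
    """
    fii = {}
    for i, feature in enumerate(pattern):
        sub = pattern[:i] + pattern[i + 1:]  # the sub-pattern missing `feature`
        fli = fuzzy_loss_index(fpr_by_pattern[sub], fpr_by_pattern[pattern])
        fii[feature] = 1 - fli
    return fii


if __name__ == "__main__":
    # Illustrative participation ratios, not taken from any real data set.
    fpr = {
        ("A", "B"): {"A": 0.9, "B": 0.8},
        ("A", "C"): {"A": 0.8, "C": 0.9},
        ("B", "C"): {"B": 0.7, "C": 0.7},
        ("A", "B", "C"): {"A": 0.6, "B": 0.6, "C": 0.5},
    }
    print(feature_influence_indexes(("A", "B", "C"), fpr))
```

In this example feature A ends up with the largest influence index because adding it to {B, C} costs the other features the least participation.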
Implementation
Main script
import utils

# Each row of the worksheet is read as [id, feature, x, y]
sum_list = utils.load_data_set(r"05.xlsx")
FNR = utils.get_fuzzy_neighbor_relationship(sum_list)
ET, CNT, FSN, S = utils.get_star_neighbor(sum_list, FNR)
utils.S = S  # get_star_instances reads the module-level S
P = ET  # start from the size-1 patterns, i.e. the feature list
k = 2
FPR_container = []  # FPR_container[k - 2] holds the participation ratios of the size-k patterns
min_fprev = 0.1  # threshold actually used here; utils.min_fprev is not read
min_fir = 0.1
DF_set = {}
while len(P) > 0:
    candidate_list = utils.gen_candidate_co_location(P, k)  # size-k candidate co-location patterns
    FT = utils.get_star_instances(candidate_list, FSN, k)
    P, FPR_list = utils.select_prevalent_co_locations(FT, min_fprev, FNR, CNT)
    FPR_container.append(FPR_list)
    if k > 2:
        FLI = {}
        for mode in FPR_list.keys():
            FII_set = {}
            sub_modes = utils.gen_sub_mode(list(mode))
            min_FII = 100
            for sub_mode in sub_modes:
                # Fuzzy loss index from the size-(k-1) sub-pattern to the size-k pattern
                FLI[(sub_mode, mode)] = utils.calculate_FLI(FPR_container[k - 3][sub_mode], FPR_container[k - 2][mode])
                ele = utils.get_element(sub_mode, mode)  # the feature missing from sub_mode
                FII_set[ele] = 1 - FLI[(sub_mode, mode)]
                if FII_set[ele] < min_FII:
                    min_FII = FII_set[ele]
            o_min = min_FII
            for c in list(mode):
                if utils.FIR_func(FII_set[c], o_min) >= min_fir:
                    if mode in DF_set.keys():
                        DF_set[mode].append(c)
                    else:
                        DF_set[mode] = [c]
    k += 1
print(DF_set)
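The main script reads an `.xlsx` file through `xlrd`, but xlrd 2.0 and later read only `.xls` files. As a possible workaround, the sketch below loads the same row layout from CSV text using only the standard library. The row layout `[id, feature, x, y]` is inferred from how `utils.py` indexes each row; the function name `load_rows_from_csv` and the sample data are hypothetical, and ids are stored as floats because the original code converts them with `int(...)` before use.

```python
import csv
import io


def load_rows_from_csv(text):
    """Parse 'id,feature,x,y' lines into [id, feature, x, y] rows."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        inst_id, feature, x, y = record
        rows.append([float(inst_id), feature.strip(), float(x), float(y)])
    return rows


if __name__ == "__main__":
    sample = "1,A,0,0\n1,B,300,400\n2,A,5000,5000\n"
    for row in load_rows_from_csv(sample):
        print(row)
```

Rows produced this way have the same shape as those the rest of the code expects, so they could stand in for the result of `load_data_set` in a quick experiment.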
utils.py
import re
import xlrd
import numpy
d1 = 500
d2 = 1500
alpha = 0.9
min_fprev = 0.9
S = []
def load_data_set(path):
    """Load the data set from the worksheet named Sheet1.

    Note: xlrd 2.0 and later read only .xls files, so opening an .xlsx
    file requires an older xlrd release.
    :param path: full path to the data set
    :return: list of rows, each a list of cell values
    """
    sum_list = []
    workbook = xlrd.open_workbook(path)
    worksheet = workbook.sheet_by_name(u'Sheet1')
    for i in range(worksheet.nrows):
        temp_list = []
        for j in range(worksheet.ncols):
            temp_list.append(worksheet.cell_value(i, j))
        sum_list.append(temp_list)
    return sum_list
def cal_distance(x1, y1, x2, y2):
    """Compute the Euclidean distance between two points.

    :param x1: x coordinate of the first point
    :param y1: y coordinate of the first point
    :param x2: x coordinate of the second point
    :param y2: y coordinate of the second point
    :return: distance between (x1, y1) and (x2, y2)
    """
    v1 = numpy.array([x1, y1])
    v2 = numpy.array([x2, y2])
    dist = numpy.sqrt(numpy.sum(numpy.square(v1 - v2)))
    return dist
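def mu_func(d: float, d1: float, d2: float):
    """Membership function of the fuzzy proximity relation.

    :param d: distance between two instances
    :param d1: distance up to which the membership degree is 1
    :param d2: distance beyond which the membership degree is 0
    :return: membership degree in [0, 1], linear between d1 and d2
    """
    if d <= d1:
        return 1
    elif d1 < d <= d2:
        return 1 - (d - d1) / (d2 - d1)
    else:
        return 0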
def get_fuzzy_neighbor_relationship(sum_list):
    """Return {(p1, p2): membership} for pairs of different features whose membership exceeds alpha."""
    fnr = {}
    for i in range(len(sum_list)):
        feature1 = sum_list[i][1]  # feature name (letters)
        id1 = str(int(sum_list[i][0]))  # instance number
        for j in range(i + 1, len(sum_list)):
            feature2 = sum_list[j][1]  # feature name (letters)
            id2 = str(int(sum_list[j][0]))  # instance number
            if feature1 != feature2:  # only pairs of different features
                distance = cal_distance(sum_list[i][2], sum_list[i][3], sum_list[j][2], sum_list[j][3])
                a = mu_func(distance, d1, d2)
                p1 = feature1 + id1
                p2 = feature2 + id2
                if a > alpha:
                    fnr[p1, p2] = a
    return fnr
def get_star_neighbor(sum_list, fnr: dict):
    ET = []  # list of features
    CNT = {}  # feature -> number of instances
    SN = {}  # instance -> its star neighbors
    S = {}  # feature -> list of instance numbers
    for i in range(len(sum_list)):
        feature1 = sum_list[i][1]  # feature name (letters)
        id1 = str(int(sum_list[i][0]))  # instance number
        if feature1 not in ET:
            CNT[feature1] = 1  # count the instances of each feature
            ET.append(feature1)
            S[feature1] = [id1]
        else:
            CNT[feature1] += 1
            S[feature1].append(id1)
        for j in range(i + 1, len(sum_list)):
            feature2 = sum_list[j][1]  # feature name (letters)
            id2 = str(int(sum_list[j][0]))  # instance number
            p1 = feature1 + id1
            p2 = feature2 + id2
            if feature1 != feature2:  # only pairs of different features
                if (p1, p2) in fnr:
                    if p1 not in SN:
                        SN[p1] = [p2]
                    else:
                        SN[p1].append(p2)
                        SN[p1].sort()
    ET.sort()
    return ET, CNT, SN, S
def subset(featureSet, P):
    """Return 1 if every sub-pattern obtained by dropping one feature is in P, else 0."""
    for i in featureSet:
        rlist = []
        for j in featureSet:
            if j != i:
                rlist.append(j)
        if rlist not in P:
            return 0
    return 1
def gen_candidate_co_location(P, k):  # generate size-k candidates from the prevalent patterns of the previous size
    candidate_list = []
    if k == 2:
        for i in range(len(P)):
            for j in range(i + 1, len(P)):
                featureSet = []
                featureSet.append(P[i])
                featureSet.append(P[j])
                featureSet.sort()
                candidate_list.append(featureSet)
    else:
        for i in range(len(P)):
            for j in range(i + 1, len(P)):
                T1 = P[i][:k - 2]  # shared prefix of the two prevalent patterns
                T2 = P[j][:k - 2]
                if T1 == T2:
                    featureSet = list(set(P[i]).union(set(P[j])))
                    featureSet.sort()
                    if subset(featureSet, P):  # keep it only if all sub-patterns are prevalent
                        candidate_list.append(featureSet)
    return candidate_list
def get_star_instances(candidate_list, fsn, k):
    star_instances = []
    for mode in candidate_list:  # iterate over the candidate patterns
        FT = FeatureID(fsn, S, mode)  # all fuzzy table instances of the pattern mode
        if len(FT) != 0:
            star_instances.append((mode, FT))
    return star_instances
def FeatureID(FSN, S, item):
    table = []  # all row instances of the pattern
    for inst_id in S[item[0]]:  # instance numbers of the pattern's first feature
        ins = item[0] + inst_id  # instance name such as B1, B2, B3
        if ins in FSN:  # the instance has star neighbors
            star = FSN[ins]
            l = [ins]  # partial row instance starting with ins
            dfs_res = []
            dfs_item(star, item, 1, l, dfs_res, FSN)
        else:
            continue
        if len(dfs_res) != 0:
            table.extend(dfs_res)
    return table
def dfs_item(star, item, x, inst: list, dfs_res, FSN):
    if x == len(item):
        if check_clique_instance(inst, FSN):
            dfs_res.append(inst)
        return
    for instance in star:  # look for star neighbors that belong to feature item[x]
        feature = ''.join(re.findall(r'[A-Za-z]', instance))  # feature name (letters)
        if feature == item[x]:
            inst.append(instance)
            dfs_item(star, item, x + 1, inst.copy(), dfs_res, FSN)
            inst = inst[:-1]
    return
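def check_clique_instance(instance, FSN: dict):
    """Return True if every later member of the instance is a star neighbor of every earlier one."""
    if len(instance) < 3:
        return True
    for i in range(len(instance) - 1):
        if instance[i] in FSN.keys():
            neighbors = FSN[instance[i]]
        else:
            return False
        instance_neighbors = instance[i + 1:]
        # Non-strict subset: the remaining members may be exactly the star neighbors
        if not set(instance_neighbors) <= set(neighbors):
            return False
    return True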
def select_prevalent_co_locations(FT, min_fprev, FNR, CNT):
    P = []
    FPR_list = {}
    for mode, instances in FT:
        flag = True
        FPR = {}
        for i in range(len(mode)):
            FPR[mode[i]] = 0
            for instance in instances:
                p1 = i
                total = 0
                for j in range(len(mode) - 1):
                    p2 = (i + j + 1) % len(mode)
                    total += FNR[instance[min(p1, p2)], instance[max(p1, p2)]]
                total = total / (len(mode) - 1)  # mean membership between member i and the others
                FPR[mode[i]] += total
            fprev = FPR[mode[i]] / CNT[mode[i]]
            if fprev < min_fprev:
                flag = False
            FPR[mode[i]] = fprev
        if flag:
            P.append(mode)
            FPR_list[tuple(mode)] = FPR
    return P, FPR_list
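def calculate_FLI(sub_mode_FPR: dict, mode_FPR: dict):
    """Fuzzy loss index: the smallest drop in participation ratio over the sub-pattern's features."""
    min_flr = 100
    for key in sub_mode_FPR.keys():
        FLR = sub_mode_FPR[key] - mode_FPR[key]
        if FLR < min_flr:
            min_flr = FLR
    return min_flr
def gen_sub_mode(mode: list):
    """Return the sorted sub-patterns obtained by dropping one feature at a time."""
    res = []
    for i in range(len(mode)):
        temp = mode.copy()
        del temp[i]
        res.append(tuple(temp))
    res.sort()
    return res
def get_element(a: tuple, b: tuple):
    """Return the first element of b that is not in a."""
    for i in range(len(b)):
        if b[i] in a:
            continue
        else:
            return b[i]
def FIR_func(i: float, j: float):
    """Influence ratio of a feature with influence index i relative to the minimum index j."""
    return 1 - (j / i)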
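The clique test performed by `check_clique_instance` can be tried on a toy star neighborhood set. The sketch below is a standalone restatement of that check, not the function itself. The name `is_clique` and the sample neighborhoods are hypothetical. It uses a non-strict subset test (`<=`), on the assumption that a row instance whose remaining members are exactly the star neighbors of an earlier member should still count as a clique.

```python
def is_clique(instance, star_neighbors):
    """Check that every later member is a star neighbor of every earlier member.

    instance is an ordered list of instance names; star_neighbors maps an
    instance name to the names of its star neighbors.
    """
    for i, center in enumerate(instance[:-1]):
        rest = set(instance[i + 1:])
        if not rest <= set(star_neighbors.get(center, ())):
            return False
    return True


if __name__ == "__main__":
    fsn = {"A1": ["B1", "C1"], "B1": ["C1"], "A2": ["B2", "C1"]}
    print(is_clique(["A1", "B1", "C1"], fsn))  # B1 and C1 are also neighbors
    print(is_clique(["A2", "B2", "C1"], fsn))  # B2 has no star neighbors
```

With a strict subset test (`<`), the first instance would be rejected because A1's star neighbors are exactly {B1, C1}.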
References
[1] 冯时, 王丽珍, 方圆. 基于模糊邻近关系挖掘含主导特征的空间并置模式. Computer Science and Application, Vol. 11, No. 01 (2021), Article ID 40109, 19 pages. DOI: 10.12677/CSA.2021.111019