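With the rapid development of location-based services (LBS), global positioning systems, and mobile devices, the volume of data carrying spatial location information has grown sharply. Spatial data mining aims to extract potentially useful and valuable information from massive, high-dimensional spatial data. Spatial co-location pattern mining is an important research direction within spatial data mining, with broad applications in environmental protection, urban computing, public transportation, and other fields. A spatial co-location pattern is a subset of spatial features whose instances frequently appear together within a neighborhood. For example, pharmacies and flower shops are often found near hospitals, and rhizobia tend to grow alongside leguminous plants.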

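Problem statement. Given a set of spatial features \(O\), a set of spatial instances \(S\), a membership function \(\mu\) for the fuzzy neighbor relationship over \(S\), a proximity threshold \(\alpha\ (0 \le \alpha \le 1)\), a minimum fuzzy participation threshold \(min\_fprev\), and a feature influence ratio threshold \(min\_fir\), find all dominant features and dominant-feature patterns among the prevalent co-location patterns.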
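To make \(\mu\) and \(\alpha\) concrete, here is a minimal sketch of a piecewise-linear membership function with the same shape as `mu_func` in the implementation later in this post, followed by an \(\alpha\)-cut over a few pairs. The names `membership` and `alpha_cut`, the sample distances, and the use of `>=` for the threshold test are assumptions made for illustration (the script in this post compares with a strict `>`).

```python
def membership(d, d1, d2):
    """Fuzzy proximity of two instances at distance d.

    1 up to d1, 0 beyond d2, linear in between (same shape as mu_func).
    """
    if d <= d1:
        return 1.0
    if d <= d2:
        return 1.0 - (d - d1) / (d2 - d1)
    return 0.0


def alpha_cut(pairs, alpha):
    """Keep only pairs whose membership reaches the proximity threshold.

    Assumption: 'satisfies the threshold' is read as mu >= alpha.
    """
    return {pair: mu for pair, mu in pairs.items() if mu >= alpha}


# Hypothetical distances (same unit as d1 and d2).
D1, D2 = 500.0, 1500.0
pairs = {
    ("A1", "B1"): membership(300.0, D1, D2),   # inside d1
    ("A1", "C1"): membership(750.0, D1, D2),   # on the linear slope
    ("B1", "C1"): membership(1000.0, D1, D2),  # further down the slope
}
neighbors = alpha_cut(pairs, alpha=0.7)
print(neighbors)
```

With these sample distances, only the first two pairs survive a threshold of 0.7, so only they would enter the fuzzy star neighbor set.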
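Mining algorithm. The single-proximity-threshold co-location pattern mining algorithm with dominant features (ADFSPTCM).
The algorithm consists of the following four steps: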

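  1. Materialize the spatial instance set with the fuzzy star model. Using the given membership function of the fuzzy neighbor relationship and a grid-partitioning technique, compute the fuzzy neighbor relationship of the spatial dataset, keep the fuzzy neighbor pairs that satisfy the proximity threshold, and build the fuzzy star neighbor set.
  2. Generate the size-2 prevalent co-location patterns. Generate the size-2 candidate co-location patterns from the spatial feature set, build the table instances of the candidates from the fuzzy neighbor pairs, and keep the co-location patterns whose fuzzy participation is no less than the fuzzy participation threshold.
  3. Iteratively generate higher-size co-location patterns. Repeat the following:
     1. If the set of size-\((k-1)\) prevalent co-location patterns \((k>2)\) is not empty, join these patterns pairwise to generate the size-\(k\) candidate co-location patterns;
     2. Retrieve the fuzzy star instances of the size-\(k\) candidates from the fuzzy star neighbor set;
     3. Check the clique relationship of each candidate's star instances to obtain the candidate's fuzzy table instances;
     4. Compute the fuzzy participation of each candidate and keep the co-location patterns whose fuzzy participation is no less than the fuzzy participation threshold.
  4. Mine the co-location patterns with dominant features. For every pattern that meets the fuzzy participation threshold, lines 16-19 of the pseudocode compute the fuzzy loss against each of its size-\((k-1)\) sub-patterns and derive the fuzzy influence of each feature. Lines 20-24 pick the feature with the smallest influence, \(f_{min}\), and put into \(DF\_set(c)\) every feature whose influence ratio relative to \(f_{min}\) is no less than the feature influence ratio threshold \(min\_fir\). Lines 25-27 add the pattern to \(DFCP\) if \(DF\_set(c)\) is not empty.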
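Step 1 mentions grid partitioning, while `get_fuzzy_neighbor_relationship` in the implementation later in this post compares every pair of instances. The following is a minimal sketch of how a grid could limit the comparisons to adjacent cells. It is not the authors' code: the function name `fuzzy_neighbor_pairs`, the `(name, feature, x, y)` tuple layout, the cell size, and the `>=` threshold test are all assumptions, and the membership is the piecewise-linear one with parameters `d1` and `d2`.

```python
import math
from collections import defaultdict
from itertools import product


def fuzzy_neighbor_pairs(instances, d1, d2, alpha):
    """Return {(name_a, name_b): mu} for pairs of different features with mu >= alpha.

    instances: list of (name, feature, x, y) tuples (hypothetical layout).
    Requires 0 < alpha <= 1 and d1 < d2.
    """
    # Largest distance whose membership still reaches alpha; used as cell size,
    # so any qualifying pair lies in the same or an adjacent cell.
    radius = d1 + (1.0 - alpha) * (d2 - d1)

    grid = defaultdict(list)
    for inst in instances:
        _, _, x, y = inst
        grid[(int(x // radius), int(y // radius))].append(inst)

    pairs = {}
    for (cx, cy), cell in grid.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            for a in cell:
                for b in grid.get((cx + dx, cy + dy), ()):
                    # Order pairs by feature name: skips same-feature pairs
                    # and visits each unordered pair exactly once.
                    if a[1] >= b[1]:
                        continue
                    d = math.hypot(a[2] - b[2], a[3] - b[3])
                    mu = 1.0 if d <= d1 else max(0.0, 1.0 - (d - d1) / (d2 - d1))
                    if mu >= alpha:
                        pairs[(a[0], b[0])] = mu
    return pairs


# Hypothetical toy data.
instances = [
    ("A1", "A", 0.0, 0.0),
    ("A2", "A", 100.0, 100.0),
    ("B1", "B", 400.0, 0.0),
    ("B2", "B", 0.0, 1000.0),
    ("C1", "C", 5000.0, 5000.0),
]
print(fuzzy_neighbor_pairs(instances, d1=500.0, d2=1500.0, alpha=0.5))
```

The cell size is chosen so that the neighborhood of a cell never has to extend beyond its eight surrounding cells; the exact membership test still decides whether a pair is kept.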

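Algorithm ADFSPTCM

Input:
    \(O\): spatial feature set, \(S\): spatial instance set, \(\mu\): membership function of the fuzzy neighbor relationship, \(\alpha\): proximity threshold,
    \(min\_fprev\): minimum fuzzy participation threshold, \(min\_fir\): minimum feature influence ratio threshold.
Variables:
    \(k\): size of a co-location pattern, \(FSN\): fuzzy star neighbor set, \(C_k\): size-\(k\) candidate co-location patterns,
    \(FS_k\): fuzzy star instances of the size-\(k\) candidates, \(FT_k\): fuzzy table instances of the size-\(k\) candidate co-location patterns,
    \(FPR_c\): set of fuzzy participation ratios of a size-\(k\) co-location pattern \(c\), \(P_k\): set of size-\(k\) prevalent co-location patterns,
    \(P\): set of prevalent co-location patterns
Output:
    The set \(DFCP\_set\) of prevalent co-location patterns with dominant features, together with the dominant feature set of every \(DFCP\)
Steps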

  1. \(FNR = get\_fuzzy\_neighbor\_relationship(S, \mu)\); // build the fuzzy neighbor relationship
  2. \(FSN = get\_star\_neighbor(O, S, FNR_\alpha)\); // build the fuzzy star neighbor set (pairs satisfying \(\alpha\))
  3. \(C_2 = gen\_candidate\_co\_location(O)\); // size-2 candidate co-location patterns
  4. \(FT_2 = get\_star\_instances(C_2, FNR_\alpha)\); // size-2 fuzzy table instances
  5. \(P_2 = select\_prevalent\_co\_locations(C_2, FT_2, min\_fprev)\); // size-2 prevalent co-location patterns
  6. \(P = P \cup P_2;\ k=3\);
  7. \(while\ (not\ empty\ P_{k-1})\ do\)
  8. \(\quad C_k = gen\_candidate\_co\_location(P_{k-1});\)
  9. \(\quad FS_k = get\_star\_instances(C_k, FSN);\)
  10. \(\quad FT_k = check\_clique\_instance(C_k, FS_k)\); // check the clique relationship
  11. \(\quad P_k = select\_prevalent\_co\_locations(C_k, FT_k, min\_fprev)\); // size-k prevalent patterns
  12. \(\quad P = P \cup P_k;\)
  13. \(\quad k = k+1;\)
  14. \(\quad for\ each\ c \in C_k\ do\) // \(c\) ranges over the candidates generated in line 8
  15. \(\quad \quad if\ calculate\_FPI(c) \ge min\_fprev\ do\)
  16. \(\quad \quad \quad for\ each\ p \in P_{k-1}(c)\ and\ FPR_c\ do\)
  17. \(\quad \quad \quad \quad FLI(p,c) = calculate\_FLI(FPR(p), FPR(c))\); // fuzzy loss from pattern p to pattern c
  18. \(\quad \quad \quad \quad FII\_set(c) \leftarrow \{1-FLI(p,c),\ c-p\}\)
  19. \(\quad \quad \quad end\ do\)
  20. \(\quad \quad \quad o_{min} = \arg\min\{FII\_set(c)\};\)
  21. \(\quad \quad \quad for\ each\ o_i \in c\ do\)
  22. \(\quad \quad \quad \quad if\ FIR(o_i, o_{min}) \ge min\_fir\ do\)
  23. \(\quad \quad \quad \quad \quad DF\_set(c) \leftarrow o_i\); // add the dominant feature to the dominant feature set
  24. \(\quad \quad \quad \quad end\ do\)
  25. \(\quad \quad \quad \quad if\ DF\_set(c) \ne \varnothing\ do\)
  26. \(\quad \quad \quad \quad \quad DFCP \leftarrow \{c, DF\_set(c)\}\); // prevalent patterns with dominant features
  27. \(\quad \quad \quad \quad end\ do\)
  28. \(\quad \quad \quad end\ do\)
  29. \(\quad \quad end\ do\)
  30. \(\quad end\ do\)
  31. \(end\ do\)
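Lines 16-27 of the pseudocode can be illustrated on a single pattern. The sketch below mirrors the formulas as they are coded in the script in this post: the fuzzy loss is the smallest per-feature drop in participation ratio from a sub-pattern to the pattern, the influence of the removed feature is one minus that loss, and the influence ratio is \(1 - FII_{min}/FII_i\). The function name `dominant_features`, the dictionary layout, and all numbers are hypothetical; the paper's formal definitions [1] take precedence over this reading.

```python
def dominant_features(fpr_c, fpr_subs, min_fir):
    """Select dominant features of one pattern c.

    fpr_c:    {feature: participation ratio of the feature in c}
    fpr_subs: {removed_feature: {feature: participation ratio in c minus removed_feature}}
    Returns (sorted list of dominant features, {feature: influence}).
    """
    fii = {}
    for removed, fpr_p in fpr_subs.items():
        # Fuzzy loss from sub-pattern p to c: smallest per-feature drop.
        fli = min(fpr_p[f] - fpr_c[f] for f in fpr_p)
        # Influence attributed to the feature that distinguishes c from p.
        fii[removed] = 1.0 - fli

    fii_min = min(fii.values())
    dominant = sorted(
        f for f, v in fii.items()
        if v > 0 and 1.0 - fii_min / v >= min_fir  # influence ratio test
    )
    return dominant, fii


# Hypothetical participation ratios for c = (A, B, C) and its three sub-patterns.
fpr_c = {"A": 0.5, "B": 0.5, "C": 0.25}
fpr_subs = {
    "A": {"B": 0.75, "C": 0.75},  # sub-pattern (B, C)
    "B": {"A": 0.5, "C": 0.5},    # sub-pattern (A, C)
    "C": {"A": 1.0, "B": 1.0},    # sub-pattern (A, B)
}
print(dominant_features(fpr_c, fpr_subs, min_fir=0.4))
```

Because the feature with the smallest influence always has a ratio of 0, any positive `min_fir` excludes it from the dominant set.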

Implementation

Main script

import utils

sum_list = utils.load_data_set(r"05.xlsx")

FNR = utils.get_fuzzy_neighbor_relationship(sum_list)

ET, CNT, FSN, S = utils.get_star_neighbor(sum_list, FNR)
utils.S = S

P = ET
k = 2
FPR_container = []
min_fir = 0.1
DF_set = {}
while (len(P) > 0):
    candidate_list = utils.gen_candidate_co_location(P, k)  # size-k candidate co-location patterns
    FT = utils.get_star_instances(candidate_list, FSN, k)
    P, FPR_list = utils.select_prevalent_co_locations(FT, 0.1, FNR, CNT)
    FPR_container.append(FPR_list)
    if k > 2:
        FLI = {}
        for mode in FPR_list.keys():
            FII_set = {}
            sub_modes = utils.gen_sub_mode(list(mode))
            min_FII = 100
            for sub_mode in sub_modes:
                FLI[(sub_mode, mode)] = utils.calculate_FLI(FPR_container[k - 3][sub_mode], FPR_container[k - 2][mode])
                ele = utils.get_element(sub_mode, mode)
                FII_set[ele] = 1 - FLI[(sub_mode, mode)]
                if FII_set[ele] < min_FII:
                    min_FII = FII_set[ele]
            o_min = min_FII
            for c in list(mode):
                if utils.FIR_func(FII_set[c], o_min) >= min_fir:
                    if mode in DF_set.keys():
                        DF_set[mode].append(c)
                    else:
                        DF_set[mode] = [c]

    k += 1

print(DF_set)
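The prevalence filter called in the loop above can be illustrated on a toy table instance. This sketch follows the row-wise summation used by `select_prevalent_co_locations` in this post: for each feature, the average membership between its instance and the other members of each row is summed over all rows and divided by the number of instances of that feature. The function name `fuzzy_participation`, the `frozenset` keys, and the data are assumptions for illustration; the formal definition in [1] should be consulted for the exact measure.

```python
def fuzzy_participation(pattern, rows, mu, counts):
    """Per-feature fuzzy participation ratios and their minimum.

    pattern: tuple of feature names, e.g. ("A", "B")
    rows:    row instances, each a tuple of instance names aligned with pattern
    mu:      {frozenset({a, b}): membership degree of the pair}
    counts:  {feature: total number of instances of that feature}
    """
    k = len(pattern)
    fpr = {}
    for i, feature in enumerate(pattern):
        total = 0.0
        for row in rows:
            # Average membership between the i-th instance and the other row members.
            total += sum(
                mu[frozenset((row[i], row[j]))] for j in range(k) if j != i
            ) / (k - 1)
        fpr[feature] = total / counts[feature]
    # A pattern passes the filter when every ratio, hence the minimum, reaches the threshold.
    return fpr, min(fpr.values())


# Hypothetical data: two row instances of pattern (A, B).
pattern = ("A", "B")
rows = [("A1", "B1"), ("A2", "B1")]
mu = {frozenset(("A1", "B1")): 1.0, frozenset(("A2", "B1")): 0.5}
counts = {"A": 2, "B": 3}
print(fuzzy_participation(pattern, rows, mu, counts))
```

Keying the memberships by `frozenset` removes the need to know which instance of a pair was stored first, which the script handles through index ordering instead.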

utils.py

import re
import xlrd
import numpy

d1 = 500
d2 = 1500
alpha = 0.9
min_fprev = 0.9
S = []


def load_data_set(path):
    """Load the dataset from the 'Sheet1' worksheet of an Excel workbook.

    Note: xlrd 2.0 and later read only .xls files; opening an .xlsx file
    requires an older xlrd release or a different reader.

    :param path: full path to the dataset file
    :return: list of rows, each row a list of cell values
    """
    sum_list = []
    workbook = xlrd.open_workbook(path)
    worksheet = workbook.sheet_by_name(u'Sheet1')
    for i in range(worksheet.nrows):
        sum_list.append([worksheet.cell_value(i, j) for j in range(worksheet.ncols)])
    return sum_list


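def cal_distance(x1, y1, x2, y2):
    """Compute the Euclidean distance between two points.

    :param x1: x coordinate of the first point
    :param y1: y coordinate of the first point
    :param x2: x coordinate of the second point
    :param y2: y coordinate of the second point
    :return: distance between (x1,y1) and (x2,y2)
    """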
    v1 = [x1, y1]
    v2 = [x2, y2]
    v1 = numpy.array(v1)
    v2 = numpy.array(v2)
    dist = numpy.sqrt(numpy.sum(numpy.square(v1 - v2)))
    return dist


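def mu_func(d: float, d1: float, d2: float):
    """Membership function of the fuzzy neighbor relationship.

    :param d: distance between two instances
    :param d1: distance up to which the membership degree is 1
    :param d2: distance beyond which the membership degree is 0
    :return: membership degree in [0, 1], decreasing linearly between d1 and d2
    """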
    if d <= d1:
        return 1
    elif d1 < d <= d2:
        return 1 - (d - d1) / (d2 - d1)
    else:
        return 0


def get_fuzzy_neighbor_relationship(sum_list):
    fnr = {}
    for i in range(len(sum_list)):
        feature1 = sum_list[i][1]  # feature name (letters)
        id1 = str(int(sum_list[i][0]))  # instance id (digits)
        for j in range(i + 1, len(sum_list)):
            feature2 = sum_list[j][1]  # feature name (letters)
            id2 = str(int(sum_list[j][0]))  # instance id (digits)
            if feature1 != feature2:  # only pair instances of different features
                distance = cal_distance(sum_list[i][2], sum_list[i][3], sum_list[j][2], sum_list[j][3])
                a = mu_func(distance, d1, d2)
                p1 = feature1 + id1
                p2 = feature2 + id2
                if a > alpha:
                    fnr[p1, p2] = a
    return fnr


def get_star_neighbor(sum_list, fnr: dict):
    ET = []
    CNT = {}
    SN = {}
    S = {}
    for i in range(len(sum_list)):
        feature1 = sum_list[i][1]  # feature name (letters)
        id1 = str(int(sum_list[i][0]))  # instance id (digits)
        if feature1 not in ET:
            CNT[feature1] = 1  # count the instances of each feature
            ET.append(feature1)
            S[feature1] = [id1]
        else:
            CNT[feature1] += 1
            S[feature1].append(id1)
        for j in range(i + 1, len(sum_list)):
            feature2 = sum_list[j][1]  # feature name (letters)
            id2 = str(int(sum_list[j][0]))  # instance id (digits)
            p1 = feature1 + id1
            p2 = feature2 + id2
            if feature1 != feature2:  # only pair instances of different features
                if (p1, p2) in fnr:
                    if p1 not in SN:
                        SN[p1] = [p2]
                    else:
                        SN[p1].append(p2)
                        SN[p1].sort()
    ET.sort()
    return ET, CNT, SN, S


def subset(featureSet, P):
    for i in featureSet:
        rlist = []
        for j in featureSet:
            if j != i:
                rlist.append(j)
        if rlist not in P:
            return 0
    return 1


def gen_candidate_co_location(P, k):
    """Generate size-k candidates from the prevalent patterns of the previous size."""
    candidate_list = []
    if k == 2:
        for i in range(len(P)):
            for j in range(i + 1, len(P)):
                candidate_list.append(sorted([P[i], P[j]]))
    else:
        for i in range(len(P)):
            for j in range(i + 1, len(P)):
                # join two patterns that share their first k-2 features
                if P[i][:k - 2] == P[j][:k - 2]:
                    featureSet = sorted(set(P[i]).union(P[j]))
                    if subset(featureSet, P):  # keep only if every (k-1)-subset is prevalent
                        candidate_list.append(featureSet)

    return candidate_list


def get_star_instances(candidate_list, fsn, k):
    star_instances = []
    for mode in candidate_list:  # iterate over the candidate patterns
        FT = FeatureID(fsn, S, mode)  # all fuzzy table instances of pattern `mode`
        if len(FT) != 0:
            star_instances.append((mode, FT))

    return star_instances


def FeatureID(FSN, S, item):
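    table = []  # all row instances of the pattern
    for id in S[item[0]]:  # instance ids of the pattern's first feature, e.g. 1, 2, 3
        ins = item[0] + id  # instance name, e.g. B1, B2, B3
        if ins in FSN:  # the instance has star neighbors
            star = FSN[ins]
            l = [ins]  # row instance under construction, starting from e.g. B1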
            dfs_res = []
            dfs_item(star, item, 1, l, dfs_res, FSN)
        else:
            continue
        if (len(dfs_res) != 0):
            table.extend(dfs_res)

    return table


def dfs_item(star, item, x, inst: list, dfs_res, FSN):
    if x == len(item):
        if check_clique_instance(inst, FSN):
            dfs_res.append(inst)
        return
    for instance in star:  # look for instances of the next feature among the star neighbors
        feature = ''.join(re.findall(r'[A-Za-z]', instance))  # extract the feature name (letters)
        if (feature == item[x]):
            inst.append(instance)
            dfs_item(star, item, x + 1, inst.copy(), dfs_res, FSN)
            inst = inst[:-1]
    return


def check_clique_instance(instance, FSN: dict):
    if len(instance) < 3:
        return True
    for i in range(len(instance) - 1):
        if instance[i] in FSN.keys():
            neighbors = FSN[instance[i]]
        else:
            return False
        instance_neighbors = instance[i + 1:]
        # every later instance in the row must be a star neighbor of instance[i]
        if not set(instance_neighbors) <= set(neighbors):
            return False

    return True


def select_prevalent_co_locations(FT, min_fprev, FNR, CNT):
    P = []
    FPR_list = {}
    for mode, instances in FT:
        flag = True
        FPR = {}
        for i in range(len(mode)):
            FPR[mode[i]] = 0
            for instance in instances:
                p1 = i
                total = 0  # membership of instance i with the other row members
                for j in range(len(mode) - 1):
                    p2 = (i + j + 1) % len(mode)
                    total += FNR[instance[min(p1, p2)], instance[max(p1, p2)]]
                total = total / (len(mode) - 1)
                FPR[mode[i]] += total
            fprev = FPR[mode[i]] / CNT[mode[i]]
            if fprev < min_fprev:
                flag = False
            FPR[mode[i]] = fprev
        if flag:
            P.append(mode)
            FPR_list[tuple(mode)] = FPR
    return P, FPR_list


def calculate_FLI(sub_mode_FPR: dict, mode_FPR: dict):
    """Fuzzy loss from a sub-pattern to the pattern: the smallest per-feature drop in participation ratio."""
    return min(sub_mode_FPR[key] - mode_FPR[key] for key in sub_mode_FPR)


def gen_sub_mode(mode: list):
    res = []
    for i in range(len(mode)):
        temp = mode.copy()
        del temp[i]
        res.append(tuple(temp))
    res.sort()
    return res

def get_element(a:tuple,b:tuple):
    for i in range(len(b)):
        if b[i] in a:
            continue
        else:
            return b[i]

def FIR_func(i:float,j:float):
    return 1-(j/i)
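The join-and-prune logic of `gen_candidate_co_location` and `subset` above can also be expressed with tuples and a set lookup. This is a sketch of the same Apriori-style idea rather than a drop-in replacement: the name `gen_candidates` is hypothetical, and it assumes every input pattern is a sorted tuple of feature names (size-1 patterns are one-element tuples).

```python
from itertools import combinations


def gen_candidates(prevalent, k):
    """Generate sorted size-k candidates from size-(k-1) prevalent patterns.

    prevalent: iterable of sorted feature tuples of length k-1.
    """
    prev = set(prevalent)
    candidates = set()
    for p, q in combinations(sorted(prev), 2):
        # Join step: two patterns that agree on everything but the last feature.
        if p[:-1] == q[:-1]:
            cand = p + (q[-1],)
            # Prune step: every (k-1)-subset must itself be prevalent.
            if all(sub in prev for sub in combinations(cand, k - 1)):
                candidates.add(cand)
    return sorted(candidates)


# Hypothetical size-2 prevalent patterns.
print(gen_candidates([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")], 3))
```

Storing the prevalent patterns in a set makes each subset check a constant-time lookup, whereas `rlist not in P` scans the whole list.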

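References

[1] Feng Shi, Wang Lizhen, Fang Yuan. Mining spatial co-location patterns with dominant features based on fuzzy neighbor relationships (in Chinese). Computer Science and Application, Vol. 11, No. 1 (2021), Article ID 40109, 19 pages. DOI: 10.12677/CSA.2021.111019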