Implementing a Co-occurrence Matrix in Python and Visualizing It with networkx

  • Co-occurrence matrix
  • Code implementation
  • Visualization with networkx
  • Code implementation
  • Issues encountered
  • References


Co-occurrence Matrix

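Co-occurrence matrix: also called a co-word matrix. It shows how strongly two words are related, based on how often they appear together in the same sentence.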
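Written as a formula, under the counting rule this article uses (the weight of a word pair is the number of sentences in which both words appear), with $s_k$ denoting the keyword set of sentence $k$ and $S$ the number of sentences:

```latex
M_{ij} =
\begin{cases}
\displaystyle\sum_{k=1}^{S} \mathbf{1}\left[\, w_i \in s_k \ \text{and}\ w_j \in s_k \,\right], & i \neq j \\[6pt]
0, & i = j
\end{cases}
```

The condition reads the same for $(i, j)$ and $(j, i)$, so $M_{ij} = M_{ji}$. Self-pairs are not counted, so the diagonal is 0.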

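  • First, suppose we have the two sentences below. After segmenting them with jieba and filtering them with a stopword list that removes "的", the first sentence yields the keywords E, B, C and the second yields B, C, D: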
test = ["E的B的C", "B的C的D"]


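  • Next, we build the co-occurrence matrix from the keywords. B and E appear together in one sentence, so their weight is 1. B and C appear together in both sentences, so their weight is 2. The other pairs follow the same rule, and D and E never share a sentence, so their weight is 0.
  • This gives the co-occurrence matrix the following structure:
  • Cell [0][0] is empty.
  • The first row and the first column hold the keywords.
  • The diagonal (each word paired with itself) is all 0.
  • The matrix is symmetric: the weight of (B, C) equals the weight of (C, B).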
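The counting rule and the properties listed above can be checked with a small standard-library sketch. It hard-codes the keyword lists E, B, C and B, C, D. This is an assumption standing in for the jieba output, so the sketch runs without jieba or a stopword file. It also counts pairs with `itertools.combinations` and a `Counter` rather than the nested loop used later in the article.

```python
from collections import Counter
from itertools import combinations

# Keywords per sentence, hard-coded instead of running jieba
# (assumed to match the segmentation result described above)
sentences = [["E", "B", "C"], ["B", "C", "D"]]

# For every unordered word pair, count the sentences that contain both words
pair_counts = Counter()
for words in sentences:
    for a, b in combinations(sorted(set(words)), 2):
        pair_counts[(a, b)] += 1

vocab = sorted({w for words in sentences for w in words})

# Header row and header column hold the keywords, [0][0] is empty,
# the diagonal stays 0, and matrix[i][j] == matrix[j][i]
matrix = [[""] + vocab]
for a in vocab:
    row = [a]
    for b in vocab:
        row.append(0 if a == b else pair_counts[tuple(sorted((a, b)))])
    matrix.append(row)

for row in matrix:
    print(row)
```

The B-C cell comes out as 2, and the D-E cell is 0 because D and E never appear in the same sentence.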

Code implementation

# -*- coding: utf-8 -*-
import networkx as nx            # used in the networkx visualization section
import matplotlib.pyplot as plt  # used in the networkx visualization section
import jieba
import numpy as np

test = ["E的B的C", "B的C的D"]

# Load the stopword list (one word per line) into a set for fast lookups
with open('stopwords_unduplicated.txt', encoding='UTF-8') as f:
    stopwords = {line.strip() for line in f}


def extract_keywords(sentence):
    """Segment a sentence with jieba, dropping stopwords and tab characters."""
    return [word for word in jieba.cut(sentence.replace(' ', ''))
            if word not in stopwords and word != '\t']


results1 = extract_keywords(test[0])
print("result1 is :", results1)
results2 = extract_keywords(test[1])
print("result2 is :", results2)

# Merge the two keyword lists; sorting keeps the row/column order reproducible
result = sorted(set(results1).union(results2))
print("union result is :", result)
n = len(result)

# Create an (n+1) x (n+1) matrix: row 0 and column 0 hold the keywords,
# and matrix[0][0] is left as 0 (the "empty" corner)
matrix = [[0] * (n + 1) for _ in range(n + 1)]
for k, word in enumerate(result, start=1):
    matrix[0][k] = word
    matrix[k][0] = word

# print(np.array(matrix))

for i in range(1, n + 1):          # i runs from 1 to the number of words
    for j in range(1, n + 1 - i):  # j runs from 1 to (n - i), so i + j runs from i + 1 to n
        word1 = result[i - 1]
        word2 = result[i + j - 1]
        print("In %d iteration, for No.%d word pair:" % (i, j), word1, word2)
        # Weight = number of sentences that contain both words:
        # 2 if they co-occur in both sentences, 1 if in only one, 0 otherwise
        common_weight = sum(1 for sent in (results1, results2)
                            if word1 in sent and word2 in sent)
        # The matrix is symmetric, so (i, i+j) and (i+j, i) get the same value
        matrix[i][i + j] = common_weight
        matrix[i + j][i] = common_weight
        print("For (%s,  %s), the common_weight is : %d" % (word1, word2, common_weight))

print("The co-occurrence matrix is:")
# Converting to np.array prints the 2-D list one row per line
print(np.array(matrix))
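The nested loop above is written for exactly two sentences. A common alternative, which is not the article's method, works for any number of sentences. It builds a binary sentence-by-word incidence matrix `A` and computes `A.T @ A`. Entry (i, j) of that product counts the sentences containing both word i and word j, which is the same weight rule. The diagonal would count sentences containing word i alone, so it is zeroed. The keyword lists are again hard-coded as an assumption in place of the jieba output.

```python
import numpy as np

# Keyword lists per sentence (hard-coded stand-in for the jieba output)
sentences = [["E", "B", "C"], ["B", "C", "D"]]
vocab = sorted({w for words in sentences for w in words})
index = {w: k for k, w in enumerate(vocab)}

# Binary incidence matrix: A[s, k] = 1 if word k occurs in sentence s
A = np.zeros((len(sentences), len(vocab)), dtype=int)
for s, words in enumerate(sentences):
    for w in words:
        A[s, index[w]] = 1

# (A^T A)[i, j] = number of sentences containing both word i and word j
cooc = A.T @ A
# Self-pairs are not counted, so clear the diagonal
np.fill_diagonal(cooc, 0)

print(vocab)
print(cooc)
```

Because `A` only records presence, a word repeated within one sentence still counts once. This matches the "both words appear in the sentence" rule used above.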

Visualization with networkx

Code implementation

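import networkx as nx
import matplotlib.pyplot as plt

# Define an undirected graph (nx.Graph; a directed graph would be nx.DiGraph)
DG = nx.Graph()
# Add the four keyword nodes (from a list)
DG.add_nodes_from(['B', 'C', 'D', 'E'])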
print(DG.nodes())
# Add weighted edges, with weights taken from the co-occurrence matrix above.
# D and E never appear in the same sentence (weight 0), so no D-E edge is added.
DG.add_edge('B', 'D', weight=1)
DG.add_edge('B', 'C', weight=2)
DG.add_edge('B', 'E', weight=1)
DG.add_edge('C', 'D', weight=1)
DG.add_edge('C', 'E', weight=1)
# Unweighted alternative: DG.add_edges_from([('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E')])
print("The edges for this graph are: ", DG.edges())
# Drawing settings: node labels, node size, node colors (one color per node, in insertion order)
colors = ['red', 'green', 'pink', 'orange']
# Split the edges by weight into heavy edges (weight > 1.5) and light edges (weight <= 1.5)
edge_large = [(u, v) for (u, v, d) in DG.edges(data=True) if d['weight'] > 1.5]
edge_small = [(u, v) for (u, v, d) in DG.edges(data=True) if d['weight'] <= 1.5]
# Node positions from a force-directed (spring) layout
pos = nx.spring_layout(DG)  # positions for all nodes
# Draw the nodes first
nx.draw_networkx_nodes(DG, pos, node_size=500, node_color=colors)
# Draw the edges: solid lines for heavy edges, dashed lines for light edges
nx.draw_networkx_edges(DG, pos, edgelist=edge_large,
                       width=6)
nx.draw_networkx_edges(DG, pos, edgelist=edge_small,
                       width=6, alpha=0.5, edge_color='b', style='dashed')

# Draw the node labels
nx.draw_networkx_labels(DG, pos, font_size=20, font_family='sans-serif')

plt.axis('off')
plt.savefig('fig.png', bbox_inches='tight')
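The code above types each `add_edge` call by hand. The graph can also be built directly from the co-occurrence matrix, so that edge weights cannot drift from the matrix. The sketch below uses networkx's `from_numpy_array` and `relabel_nodes`. It assumes the matrix is already available as a NumPy array whose row/column order matches `vocab`; the values are copied from the counting rule in the first section. `from_numpy_array` adds one edge per non-zero entry and stores the value in the `weight` attribute. The zero diagonal therefore produces no self-loops, and the zero D-E entry produces no edge.

```python
import numpy as np
import networkx as nx

vocab = ["B", "C", "D", "E"]
# Co-occurrence weights; row/column order follows vocab
cooc = np.array([[0, 2, 1, 1],
                 [2, 0, 1, 1],
                 [1, 1, 0, 0],
                 [1, 1, 0, 0]])

# One undirected edge per non-zero entry, value stored as 'weight';
# nodes are numbered 0..n-1 at this point
G = nx.from_numpy_array(cooc)
# Rename the integer nodes to the keywords
G = nx.relabel_nodes(G, dict(enumerate(vocab)))

print(sorted(G.edges(data="weight")))
```

The resulting `G` can be passed to the same layout and drawing calls used for `DG` above.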


Issues encountered

  1. Plotting in PyCharm triggers the warning: warnings.warn("This figure includes Axes that are not compatible "
    Cause: plt.tight_layout does not work properly in some cases.
    Fix: remove plt.show() and save the figure instead with plt.savefig('fig.png', bbox_inches='tight').
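If the script only needs the image file, one further option goes beyond the fix described above. You can select matplotlib's non-interactive `Agg` backend before importing `pyplot`. Figures are then rendered only to files, and no IDE plot window is involved. The sketch draws a placeholder line plot rather than the graph, to stay self-contained.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, opens no window
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])  # placeholder content
ax.axis("off")
# Save to disk instead of calling plt.show()
fig.savefig("fig.png", bbox_inches="tight")
plt.close(fig)

print(os.path.exists("fig.png"))
```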

References

Drawing a graph according to edge weights with Python networkx