Judging the Similarity of Two Words in Python


董付国, 2023-06-10 04:42:35


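© Copyright belongs to the author: this is an original work by 董付国, a 51CTO blog author. Please contact the author for permission to repost; otherwise legal liability will be pursued.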

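The key point of this article is the design of the algorithm: if sufficiently few letters differ between two words, and several randomly chosen letters appear in the same relative order in both words, the two words are considered equivalent.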

Current limitation: the check can misjudge some word pairs.
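One way such a misjudgment could arise is sketched here; this is a minimal illustration, not the article's own code. A membership test of the form `ch not in another` ignores how many times a letter occurs, so words that share the same letters in different quantities look identical to it. The function names below are hypothetical, and the `Counter`-based variant is an assumption about how repeated letters might be taken into account, not part of the original algorithm.

```python
from collections import Counter

def letters_not_in(one, another):
    """Membership-based count: letters of `one` that never occur in `another`."""
    return sum(1 for ch in one if ch not in another)

def letters_missing_with_counts(one, another):
    """Hypothetical count-aware variant: letters of `one` left unmatched
    once each letter of `another` has been used at most once."""
    return sum((Counter(one) - Counter(another)).values())

# 'aaab' and 'abbb' share the letters a and b, but in different quantities
print(letters_not_in('aaab', 'abbb'))               # 0
print(letters_missing_with_counts('aaab', 'abbb'))  # 2
```

The membership-based count reports no difference for this pair, while the count-aware variant reports two unmatched letters.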

from random import sample, randint

def oneInAnother(one, another):
    '''Count how many letters of the word one do not occur in the word another.'''
    return sum(1 for ch in one if ch not in another)
def testPositions(one, another, positions):
    '''Test whether the letters at the given positions of the word one
       appear in the same relative order in the word another.'''
    # Letters at the given positions in one
    lettersInOne = [one[p] for p in positions]
    print(lettersInOne)
    # Positions of those letters in another, searching from index p onward;
    # letters that do not occur from index p onward are skipped
    positionsInAnother = [another.index(ch, p) for p, ch in zip(positions, lettersInOne) if ch in another[p:]]
    print(positionsInAnother)
    # True if these letters keep the same relative order in another
    return sorted(positionsInAnother) == positionsInAnother

def main(one, another, rateNumber=1.0):
    c1 = oneInAnother(one, another)
    c2 = oneInAnother(another, one)
    # Proportion of letters, counted in both directions, that are missing from the other word
    r = (c1+c2) / len(one+another)
    # Test whether letters at random positions of one keep the same relative order in another
    minLength = min(len(one), len(another))
    positions = sample(range(minLength), randint(minLength//2, minLength-1))
    positions.sort()
    flag = testPositions(one, another, positions)
    # The two words are considered highly similar
    return flag and r < rateNumber
# Try it out
print(main('beautiful', 'beaut', 0.2))
print(main('beautiful', 'beautiful', 0.2))
print(main('beautiful', 'btuaeiflu', 0.2))

The output of one run is as follows:

['a', 'u']
[2, 3]
False
['a', 'u', 'f', 'u']
[2, 3, 6, 7]
True
['b', 'e', 'a', 'u', 't', 'f']
[0, 4, 3, 8, 6]
False
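Because the positions are sampled at random, the printed lists differ from run to run. As a deterministic cross-check on results like these, the standard library's `difflib.SequenceMatcher` can score the same word pairs. This is a different technique from the random-position test above, not the article's method, and the helper name `similarity` is hypothetical.

```python
from difflib import SequenceMatcher

def similarity(one, another):
    """Return SequenceMatcher's ratio in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, one, another).ratio()

# Score the same three pairs used in the test calls above
for other in ('beaut', 'beautiful', 'btuaeiflu'):
    print(other, round(similarity('beautiful', other), 3))
```

This returns the same score on every run, so it may help spot pairs on which the sampled test gives different answers across runs.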
