Problem: given a text file, find all the words that end with "d", with no duplicates.


The text file is 1.txt, located in the directory c:/users/15011/desktop. Its content is:

Sliced noodles are known as one of the five famous noodles in China. There is an old story about sliced noodles. In the Yuan Dynasty, they confiscated all the metal from every family, and required 10 households to use a kitchen knife. One day an old man went to get a knife, but the knife was taken away by others. The whole family waited for the knife to cut noodles, but the knife didn’t come back. Suddenly, she remembered the iron sheet in her hand and said: cut noodles with this iron sheet! But the iron sheet is too thin and soft to cut the noodles. So she put the dough on a board, left hand holding board, right hand holding iron, standing at the edge of the pot to hack noodles, One by one the noodles fell into the pot , cooked and fished into the bowl, poured soup for the family to eat. That’s why sliced noodles are created.


Program:

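import os, re

os.chdir(r'c:/users/15011/desktop')  # switch to the directory that contains 1.txt

with open('1.txt', 'r', encoding='utf8') as file_project:  # open the file
    txt = file_project.read()  # read the whole file and store it as one string in txt
a = txt.split()  # split the string on whitespace and store the resulting tokens in the list a
new_words = []  # empty list that will hold the final result
for item in a:
    # Tokens from txt.split() can still have other symbols attached, e.g. 'asd@#good.',
    # so keep only the runs of letters, which are the actual words
    tmps = re.findall('[a-zA-Z]+', item)
    for tmp in tmps:
        try:
            t = re.search('[a-zA-Z]*d$', tmp).group()  # check whether the word ends with d
            new_words.append(t)
        except AttributeError:  # re.search returned None, so the word does not end with d; skip it
            pass

print(sorted(set(new_words)))  # set() removes duplicates, sorted() orders the result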

Output:

['Sliced', 'and', 'board', 'confiscated', 'cooked', 'created', 'fished', 'hand', 'old', 'poured', 'remembered', 'required', 'said', 'sliced', 'waited']
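
The same idea can probably be written more compactly. The sketch below is only one possible variant, not the program above: it runs re.findall over the whole text at once and tests the ending with str.endswith instead of a second regex. The function name words_ending_with, its suffix and ignore_case parameters, and the sample string are all made up for illustration. The ignore_case option is an assumption about what "no duplicates" might mean, since the result above lists both 'Sliced' and 'sliced'.

```python
import re


def words_ending_with(text, suffix="d", ignore_case=False):
    """Return the sorted, de-duplicated words in text that end with suffix.

    Hypothetical helper for illustration; a word is a run of ASCII letters,
    the same definition the original program uses.
    """
    words = re.findall(r"[a-zA-Z]+", text)  # strips punctuation such as 'asd@#good.'
    if ignore_case:
        words = [w.lower() for w in words]  # treat 'Sliced' and 'sliced' as one word
    return sorted({w for w in words if w.endswith(suffix)})


# Made-up sample sentence, not the full content of 1.txt
sample = "One day an old man said: cut noodles with this iron sheet! Sliced, sliced."
print(words_ending_with(sample))                    # ['Sliced', 'old', 'said', 'sliced']
print(words_ending_with(sample, ignore_case=True))  # ['old', 'said', 'sliced']
```

To use it on the file, read 1.txt into a string as in the program above and pass that string as text.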
