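I have just started learning Python, and one of the exercises asks for the number of words in a text and how often each one occurs. The code I found through Baidu all seemed to have problems, so I wrote my own and debugged it until it ran.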

Environment: Python 3.9.1 (64-bit); PyCharm 2020.2; Windows 10 (64-bit)

Test text: 70篇短文突破中考英语词汇 (70 short passages covering the English vocabulary for the high-school entrance exam)

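The approach:

1. Open the file and read its whole content into a string, s.

2. Use a regular expression to split the text on every character that is not an English letter, producing a list whose elements are words.

3. Because several non-letter characters can appear in a row, the list may contain empty strings, so these have to be removed.

4. Convert everything to lowercase and sort.

5. Go through the words in order and store them in a dict: if the word is already there, add one to its count; otherwise add it with a count of 1.

6. Print the result.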
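For comparison, the same six steps can be sketched more compactly with the standard library. This is only a minimal sketch, not the code used in this post: the function name count_words and the sample sentence are made up for illustration, re.findall replaces the split-then-filter steps, and collections.Counter replaces the hand-written dict loop.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercase word to its frequency."""
    # Steps 2-3: findall keeps only runs of English letters, so no empty
    # strings are produced and no filtering step is needed.
    words = re.findall(r'[a-zA-Z]+', text)
    # Step 4: normalise case so "The" and "the" are counted together.
    words = [w.lower() for w in words]
    # Step 5: Counter does the "add one or start at one" bookkeeping.
    return Counter(words)


if __name__ == "__main__":
    sample = "The zoo has a zebra. The zebra is in the zoo!"  # hypothetical input
    counts = count_words(sample)
    # Step 6: print the words in alphabetical order with their counts.
    for word in sorted(counts):
        print(word, counts[word])
    print("distinct words:", len(counts))
```

Sorting is done only when printing here, since the counts themselves do not depend on the order of the words.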

The code, with comments:

import re   # the re module matches and processes strings with regular expressions
import sys

def main(fileName):
    try:
        inf = open(fileName, 'r')
        s = inf.read()
        inf.close()
        words = re.split(r'[^a-zA-Z]', s)           # split on non-English letters to get a list
        realWords0 = list(filter(None, words))       # remove empty strings
        realWords1 = []
        for word in realWords0:
            realWords1.append(word.lower())          # convert everything to lowercase
        realWords1.sort()
        dict1 = dict()
        for word in realWords1:
            if word in dict1:
                dict1[word] = dict1[word] + 1        # seen before: add one to the count
            else:
                dict1[word] = 1                      # first occurrence
        for item in dict1.items():
            print(item[0], item[1])
        print("word NO: ", len(dict1))               # number of distinct words
    except IOError:
        sys.exit("That file couldn't be opened.")
    return 1

main("word.txt")

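I pasted the middle-school reading vocabulary text downloaded from the web into a txt file and named it word.txt. The first test run could not open the file. After some digging, it turned out that the file encoding has to be specified, so the line that opens the file gets an encoding argument: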

inf = open(fileName, 'r', encoding='utf-8')
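A slightly more defensive way to read the file is sketched below. It assumes the failure was a decoding problem: on Windows, open() without an encoding argument uses a locale-dependent default rather than UTF-8. The helper name read_text is hypothetical. A with block closes the file automatically, and UnicodeDecodeError is caught separately because it is not an IOError.

```python
import sys


def read_text(file_name, encoding="utf-8"):
    """Read a whole text file, exiting with a message if that fails."""
    try:
        # The with block closes the file even if read() raises.
        with open(file_name, "r", encoding=encoding) as inf:
            return inf.read()
    except OSError:
        # IOError is an alias of OSError in Python 3 (missing file, no permission, ...).
        sys.exit("That file couldn't be opened.")
    except UnicodeDecodeError:
        # The file exists, but its bytes are not valid in the given encoding.
        sys.exit("That file couldn't be decoded as " + encoding + ".")
```

With such a helper, the first two lines inside main would reduce to s = read_text(fileName).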

Running the program again gives the following output:

.......

younger 2
your 50
zarina 1
zebra 2
zoo 5
zoological 1

word NO:  2268

The program reports 2268 words, which is reasonably close to the roughly 2200 words of the middle-school vocabulary.
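Part of the remaining difference may come from how contractions are split; that is a guess, not something checked against the full text. Because the apostrophe is not a letter, the pattern [^a-zA-Z] turns "Don't" into "don" and "t". The sketch below compares the original split with an alternative pattern that keeps an ASCII apostrophe inside a word. The alternative pattern is an illustration only, and the sample sentence is adapted from the appendix.

```python
import re

sample = "Don't you know how to speak to an officer? I'll see."

# Original approach: every non-letter is a separator, so the apostrophe
# splits a contraction into two pieces.
split_words = [w.lower() for w in re.split(r'[^a-zA-Z]', sample) if w]

# Alternative (an assumption, not the method used in this post): letters,
# optionally followed by an ASCII apostrophe and more letters.
kept_words = [w.lower() for w in re.findall(r"[a-zA-Z]+(?:'[a-zA-Z]+)?", sample)]

print(split_words)
print(kept_words)
```

Whether "don't" should count as one word or as "do" plus "not" depends on the vocabulary list being compared against, so neither pattern is strictly the right one.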

 

Appendix: a sample from 70篇短文突破中考英语词汇. Each English passage is followed by its Chinese translation.

 

1. A Young Officer and an Old Soldier (1)
A very new, young officer was at a railway station He was going to visit his mother, and he wanted to telephone her to tell her the time of his train. He looked in all his pockets, but found that he did not have the coins for the telephone, so he went outside and looked around for someone to help him.
1.年轻军官与老兵(1)
一位新上任的(new)年轻军官(young officer)在火车站(railway station)候车。他要去看望(visit)他的母亲(mother)。他想打电话(want to telephone sb.)告诉(tell)母亲他的列车(train)到站的时间。但寻遍了所有的口袋(pocket),却发现(find)他没有打电话用的硬币(coin),于是他走到车站外面(outside),环顾四周(look around)想找人帮忙(help)。                 
1. A Young Officer and an Old Soldier (2)
 At last an old soldier came by, and the young officer stopped him and said,“Have you got change for ten pence?”
“Wait a moment,”the old soldier answered, beginning to put his hand in his pocket.“I'll see whether I can help you.”
“Don't you know how to speak to an officer?”the young man said angrily.“Now let's start again Have you got change for ten pence?”
“No, sir,”the old soldier answered quickly.
1.年轻军官与老兵(2)
最后(at last)有名老兵(old soldier)路过,年轻的军官拦住他道:“你有十便士(pence)的零钱(change)吗?”
“等会儿(wait a moment)。”老兵回答(answer),开始(begin)把手放(put)进口袋,“让我看看是否(whether)能帮助你。难道你不知道(know)该怎样跟一位长官说话(speak)吗?”年轻人生气地(angrily)说,“现在我们重新开始(start again),你有十美分的硬币吗?”
“没有,长官(sir)。”老兵迅速(quickly)答道。