Item frequency count in Python

Suppose I have a list of words, and I want to find the number of times each word appears in that list.

An obvious way to do this is:

words = "apple banana apple strawberry banana lemon"
uniques = set(words.split())
freqs = [(item, words.split().count(item)) for item in uniques]
print(freqs)

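But I don't find this code very good, because the program runs through the word list twice: once to build the set, and a second time to count the number of occurrences.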

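Of course, I could write a function that runs through the list and does the counting, but that wouldn't be very Pythonic. So, is there a more efficient and Pythonic way?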

11 Answers

129 votes

The Counter class in the collections module is purpose-built to solve this type of problem:

from collections import Counter
words = "apple banana apple strawberry banana lemon"
Counter(words.split())
# Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1})
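As a minimal sketch of what the resulting object offers: a Counter is a dict subclass, so a single count can be looked up by key, a word that never appeared returns 0 instead of raising KeyError, and most_common() lists entries from most to least frequent. The variable names below are only illustrative.

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon"
counts = Counter(words.split())

# Look up a single word; a word that never appeared counts as 0.
apple_count = counts["apple"]
missing_count = counts["cherry"]

# The two most frequent words together with their counts.
top_two = counts.most_common(2)
print(apple_count, missing_count, top_two)
```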
sykora answered 2020-01-01T04:35:43Z
93 votes
defaultdict to the rescue!
from collections import defaultdict
words = "apple banana apple strawberry banana lemon"
d = defaultdict(int)
for word in words.split():
    d[word] += 1
This runs in O(n).
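To make the complexity claim concrete, here is a sketch that wraps the single-pass loop in a function (count_words is a name chosen for this illustration) and checks it against the set-plus-count() approach from the question, which rescans the whole list once per distinct word.

```python
from collections import defaultdict

def count_words(word_list):
    """Count occurrences in a single pass over word_list."""
    counts = defaultdict(int)   # missing keys start at int() == 0
    for word in word_list:
        counts[word] += 1
    return dict(counts)         # plain dict, so later lookups cannot create keys

word_list = "apple banana apple strawberry banana lemon".split()
single_pass = count_words(word_list)

# The approach from the question: one full scan of the list per distinct word.
rescanning = {w: word_list.count(w) for w in set(word_list)}

print(single_pass == rescanning)
```

Both give the same counts; the difference is only in how often the list is traversed.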
Triptych answered 2020-01-01T04:35:23Z
11 votes
The standard approach:
from collections import defaultdict
words = "apple banana apple strawberry banana lemon"
words = words.split()
result = defaultdict(int)
for word in words:
    result[word] += 1
print(result)
groupby one-liner:
from itertools import groupby
words = "apple banana apple strawberry banana lemon"
words = words.split()
result = dict((key, len(list(group))) for key, group in groupby(sorted(words)))
print(result)
nosklo answered 2020-01-01T04:36:07Z
9 votes
freqs = {}
for word in words:  # assumes words is a list of words, e.g. the result of split()
    freqs[word] = freqs.get(word, 0) + 1 # fetch and increment OR initialize

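I think this gives the same result as Triptych's solution, but without importing collections. It is also a bit like Selinap's solution, but more readable in my opinion. It is almost identical to Thomas Weigel's solution, but without using exceptions.

However, this could be slower than using defaultdict() from the collections library, since the value is fetched, incremented, and then assigned again, instead of just being incremented. Then again, += might do exactly the same thing internally.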
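Whether dict.get() is actually slower can be measured rather than guessed. A rough sketch using timeit follows; the function names, the sample data, and the repetition count are arbitrary choices for illustration, and the timings will vary by machine and Python version.

```python
import timeit
from collections import defaultdict

words = "apple banana apple strawberry banana lemon".split() * 1000

def with_get(items):
    freqs = {}
    for item in items:
        freqs[item] = freqs.get(item, 0) + 1
    return freqs

def with_defaultdict(items):
    freqs = defaultdict(int)
    for item in items:
        freqs[item] += 1
    return freqs

# Both variants must agree before their speed is compared.
assert with_get(words) == with_defaultdict(words)

get_seconds = timeit.timeit(lambda: with_get(words), number=50)
dd_seconds = timeit.timeit(lambda: with_defaultdict(words), number=50)
print("dict.get:    %.4f s" % get_seconds)
print("defaultdict: %.4f s" % dd_seconds)
```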

hopla answered 2020-01-01T04:36:32Z
7 votes

If you don't want to use the standard dictionary method (looping through the list and incrementing the matching dict key), you can try this:

>>> from itertools import groupby
>>> myList = words.split() # ['apple', 'banana', 'apple', 'strawberry', 'banana', 'lemon']
>>> [(k, len(list(g))) for k, g in groupby(sorted(myList))]
[('apple', 2), ('banana', 2), ('lemon', 1), ('strawberry', 1)]

It runs in O(n log n) time.
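The sort is what costs O(n log n), and it cannot be skipped: groupby only merges adjacent equal items. A small sketch of the difference (the helper name run_lengths is made up here):

```python
from itertools import groupby

def run_lengths(items):
    """Return (key, length) for each run of adjacent equal items."""
    return [(key, len(list(group))) for key, group in groupby(items)]

my_list = "apple banana apple strawberry banana lemon".split()

unsorted_runs = run_lengths(my_list)        # 'apple' and 'banana' each appear twice
sorted_runs = run_lengths(sorted(my_list))  # one entry per distinct word
print(unsorted_runs)
print(sorted_runs)
```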

Nick Presta answered 2020-01-01T04:36:57Z
3 votes
Without defaultdict:
words = "apple banana apple strawberry banana lemon"
my_count = {}
for word in words.split():
    try: my_count[word] += 1
    except KeyError: my_count[word] = 1
Thomas Weigel answered 2020-01-01T04:37:17Z
0 votes

Can't you just use count?

words = 'the quick brown fox jumps over the lazy gray dog'
words.count('z')
#output: 1
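One caveat worth illustrating: called on a string, count() matches substrings (here, the single character 'z'), not whole words. A short sketch with a made-up sentence shows how that differs from counting in the split list:

```python
sentence = "the cat sat on the other mat"

# str.count counts non-overlapping substring matches, so "other" also matches.
substring_hits = sentence.count("the")

# list.count compares whole items, so only the standalone word matches.
word_hits = sentence.split().count("the")

print(substring_hits, word_hits)
```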
Antonio answered 2020-01-01T04:37:36Z
0 votes

I happened to be working on some Spark exercises, and here is my solution.

tokens = ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
print({n: float(tokens.count(n))/float(len(tokens)) for n in tokens})

**The output of the above**

{'brown': 0.16666666666666666, 'lazy': 0.16666666666666666, 'jumps': 0.16666666666666666, 'fox': 0.16666666666666666, 'dog': 0.16666666666666666, 'quick': 0.16666666666666666}
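Calling tokens.count(n) inside the comprehension rescans the list for every token. As an alternative sketch (not part of the Spark exercise itself), the same relative frequencies can be computed from a single Counter pass; relative_freqs is a name chosen here.

```python
from collections import Counter

tokens = ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']

counts = Counter(tokens)      # one pass over the list
total = float(len(tokens))    # float so the division is not truncated under Python 2
relative_freqs = {token: count / total for token, count in counts.items()}
print(relative_freqs)
```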

javaidiot answered 2020-01-01T04:38:01Z

0 votes

Use reduce() to convert the list into a single dict.

from functools import reduce  # built in on Python 2, must be imported on Python 3
words = "apple banana apple strawberry banana lemon"
reduce( lambda d, c: d.update([(c, d.get(c,0)+1)]) or d, words.split(), {})

returns

{'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2}
Gadi answered 2020-01-01T04:38:25Z
0 votes
words = "apple banana apple strawberry banana lemon"
w=words.split()
e=list(set(w))
for i in e:
    print(w.count(i)) #Prints frequency of every word in the list

Hope this helps!

Varun Shaandhesh answered 2020-01-01T04:38:45Z
-1 votes

The answer below takes some extra cycles, but it is another approach.

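def func(tup):
    # Sort key: the last element of a (word, count) tuple, i.e. the count
    return tup[-1]

def print_words(filename):
    with open(filename, 'r') as f:
        whole_content = f.read().lower()
    print(whole_content)
    list_content = whole_content.split()
    counts = {}
    # First pass: create an entry for every word
    for one_word in list_content:
        counts[one_word] = 0
    # Second pass: count the occurrences
    for one_word in list_content:
        counts[one_word] += 1
    print(counts.items())
    # Word/count pairs ordered by ascending count
    print(sorted(counts.items(), key=func))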
Prabhu S answered 2020-01-01T04:39:05Z