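from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# Load all 20 Newsgroups posts, stripping headers, footers, and quoted replies.
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

# Fit BERTopic with its default settings.
topic_model = BERTopic()
print('start fit transform...')
# topics holds one topic id per document (-1 marks outliers);
# probs holds the probability of each document's assigned topic.
topics, probs = topic_model.fit_transform(docs)
print('fit done')

# One row per topic: id, document count, name, and representative documents.
print(topic_model.get_topic_info())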
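The fetch_20newsgroups call above downloads the dataset from a server hosted abroad, which makes it hard to download, so the data has to be loaded manually from a local copy. For the loading method, see the referenced article.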
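As a rough illustration of offline loading, the sketch below reads posts straight from disk using only the standard library. It is not the method from the referenced article. It assumes the 20news-bydate archive has already been obtained and extracted by hand, so that each newsgroup is a sub-directory containing one file per post. The function name `load_local_newsgroups` and the path in the usage comment are hypothetical. Unlike the `remove=('headers', 'footers', 'quotes')` option used above, this sketch keeps the raw text of each post, headers included.

```python
from pathlib import Path


def load_local_newsgroups(root):
    """Read every file under root/<group>/ and return (docs, labels).

    Hypothetical helper: assumes one sub-directory per newsgroup,
    with one plain-text file per post inside it.
    """
    docs, labels = [], []
    # Sort directories and files so the result order is reproducible.
    for group_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for post in sorted(p for p in group_dir.iterdir() if p.is_file()):
            # latin-1 maps every byte to a character, so decoding never fails.
            docs.append(post.read_text(encoding="latin-1"))
            labels.append(group_dir.name)
    return docs, labels


# Hypothetical usage; the path is a placeholder for the extracted archive:
# docs, labels = load_local_newsgroups("data/20news-bydate/20news-bydate-train")
```

The returned `docs` list can be passed to `topic_model.fit_transform(docs)` in place of the downloaded data, although the topics will differ from the run shown here because headers and quoted text are still present.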
Test output of the BERTopic script:
start fit transform...
fit done
Topic ... Representative_Docs
0 -1 ... [This is a periodic posting intended to answer...
1 0 ... [I thought I'd post my predicted standings sin...
2 1 ... [\nI am not an expert in the cryptography scie...
3 2 ... [Hello,, Hello,, ites:]
4 3 ... [*********************************************...
.. ... ... ...
227 226 ... [\n\nTrue, coach Matikainen is ready to keep a...
228 227 ... [Archive-name: typing-injury-faq/software\nVer...
229 228 ... [\n\nIn this era of AIDS, isn't someone's fuck...
230 229 ... [Hi, I am doing a term paper on the syringe an...
231 230 ... [\n\n\n\n\nSounds to me like your dealer reall...
[232 rows x 5 columns]