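from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# Load all 20 Newsgroups posts, stripping headers, signature footers and quoted replies
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

# Default BERTopic settings: sentence embeddings -> UMAP -> HDBSCAN -> c-TF-IDF topic words
topic_model = BERTopic()
print('start fit transform...')
# topics: one topic id per document (-1 = outlier); probs: topic probabilities per document
topics, probs = topic_model.fit_transform(docs)
print('fit done')
# One row per topic, including the outlier topic -1
print(topic_model.get_topic_info())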

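Loading the data with fetch_20newsgroups above requires downloading it from servers outside China, which is often slow or fails, so the dataset may have to be downloaded separately and loaded offline. For the loading method, see the referenced article.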
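If the download fails, one option is to obtain the `20news-bydate.tar.gz` archive by other means, extract it, and read the posts directly from disk. The sketch below assumes the extracted directory contains `20news-bydate-train/` and `20news-bydate-test/`, each with one sub-folder per newsgroup and one file per post; the path `./20news-bydate` and all function names are hypothetical. Files are decoded as latin-1, as scikit-learn does for this dataset. The three `strip_*` helpers are modeled on the `remove=('headers', 'footers', 'quotes')` processing of `fetch_20newsgroups` but are not guaranteed to match it exactly, and unlike `fetch_20newsgroups` the documents are not shuffled. `sklearn.datasets.load_files` is an alternative for the file-reading step.

```python
import re
from pathlib import Path

# Hypothetical location of the extracted 20news-bydate.tar.gz archive.
DATA_ROOT = Path("./20news-bydate")

# Lines that look like quoted replies (approximation of sklearn's quote removal).
_QUOTE_RE = re.compile(
    r"(writes in|writes:|wrote:|says:|said:|^In article|^Quoted from|^\||^>)"
)


def strip_header(text: str) -> str:
    # Mail/news headers end at the first blank line; keep only the body.
    _, _, body = text.partition("\n\n")
    return body


def strip_footer(text: str) -> str:
    # Cut at the last line made only of dashes or whitespace (signature separator).
    lines = text.strip().split("\n")
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].strip().strip("-") == "":
            break
    return "\n".join(lines[:i]) if i > 0 else text


def strip_quotes(text: str) -> str:
    # Drop every line that looks like a quoted reply.
    return "\n".join(line for line in text.split("\n") if not _QUOTE_RE.search(line))


def load_20newsgroups_offline(root: Path, subset: str = "all") -> list[str]:
    folders = {
        "train": ["20news-bydate-train"],
        "test": ["20news-bydate-test"],
        "all": ["20news-bydate-train", "20news-bydate-test"],
    }[subset]
    docs = []
    for folder in folders:
        # <folder>/<newsgroup>/<post>; sorted for a reproducible order.
        for path in sorted((root / folder).glob("*/*")):
            if path.is_file():
                text = path.read_text(encoding="latin-1")
                docs.append(strip_quotes(strip_footer(strip_header(text))))
    return docs
```

With the archive in place, `docs = load_20newsgroups_offline(DATA_ROOT)` can stand in for the `fetch_20newsgroups(...)` line before calling `topic_model.fit_transform(docs)`.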

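Test output (the row with Topic -1 is BERTopic's outlier topic, which collects documents not assigned to any topic, so the 232 rows are 231 topics, numbered 0 to 230, plus the outlier row):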

start fit transform...
fit done
     Topic  ...                                Representative_Docs
0       -1  ...  [This is a periodic posting intended to answer...
1        0  ...  [I thought I'd post my predicted standings sin...
2        1  ...  [\nI am not an expert in the cryptography scie...
3        2  ...                            [Hello,, Hello,, ites:]
4        3  ...  [*********************************************...
..     ...  ...                                                ...
227    226  ...  [\n\nTrue, coach Matikainen is ready to keep a...
228    227  ...  [Archive-name: typing-injury-faq/software\nVer...
229    228  ...  [\n\nIn this era of AIDS, isn't someone's fuck...
230    229  ...  [Hi, I am doing a term paper on the syringe an...
231    230  ...  [\n\n\n\n\nSounds to me like your dealer reall...

[232 rows x 5 columns]
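pandas truncates the printed table, so most columns and rows are hidden behind `...`. A small helper such as the hypothetical `readable_topic_table` below drops the outlier row and keeps only the `Topic`, `Count` and `Name` columns of the DataFrame returned by `get_topic_info()`, so the largest topics fit on screen. It is a sketch that assumes those column names, which match the BERTopic output above.

```python
import pandas as pd


def readable_topic_table(info: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    # Remove the outlier topic (-1) and keep compact columns only.
    topics = info[info["Topic"] != -1]
    # Largest topics first, then keep the top_n rows.
    compact = topics.sort_values("Count", ascending=False)[["Topic", "Count", "Name"]]
    return compact.head(top_n).reset_index(drop=True)
```

For example, `print(readable_topic_table(topic_model.get_topic_info()).to_string())` prints the ten largest topics in full, and `topic_model.get_topic(0)` returns the top words of topic 0 as `(word, score)` pairs.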