Transformers Study Notes 3: The Hugging Face Pipeline Function

Original post by 编程圈子, 2022-12-24 07:29:03

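© Copyright belongs to the author. This is an original work by 编程圈子 on the 51CTO blog. Contact the author for permission before reposting; unauthorized reposting may lead to legal action.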

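I. Introduction
II. Pipeline examples

1. Sentiment analysis
2. Zero-shot text classification
3. Named entity recognition
4. Summarization
5. Text generation
6. English text generation with GPT-2
7. Masked-token filling
8. Question answering
9. Translation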

I. Introduction

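Hugging Face provides a pipeline function, Pipeline, that starts an NLP task with very little code.

A Pipeline bundles data preprocessing, model inference, and postprocessing of the model output, so you can pass in raw data and get a prediction back directly.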

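Given a task name, pipeline automatically loads a pretrained model for that task and then runs three steps on your input:

Preprocess the input text so the model can read it
Run the model
Postprocess the model output so the prediction is human-readable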
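The three steps above can be sketched with a toy example. This is only an illustration of the flow, not how transformers implements it: the vocabulary, the weights, and every function name below are made up, and the "model" is a hand-written scoring rule rather than a neural network.

```python
import math

# Toy stand-ins for the three stages; none of these names come from transformers.
TOY_VOCAB = {"i": 1, "am": 2, "happy": 3, "sad": 4}  # hypothetical vocabulary
TOY_WEIGHTS = {3: 2.0, 4: -2.0}                       # hypothetical per-token evidence
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Step 1: turn raw text into token ids the 'model' can read."""
    words = text.lower().replace(".", " ").split()
    return [TOY_VOCAB.get(word, 0) for word in words]  # 0 = unknown token


def forward(token_ids):
    """Step 2: produce one raw score (logit) per label."""
    evidence = sum(TOY_WEIGHTS.get(token_id, 0.0) for token_id in token_ids)
    return [-evidence, evidence]  # [NEGATIVE logit, POSITIVE logit]


def postprocess(logits):
    """Step 3: softmax the logits and report the best label in readable form."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": LABELS[best], "score": probs[best]}]


def toy_pipeline(text):
    """Chain the three steps, mirroring the shape of a sentiment pipeline's output."""
    return postprocess(forward(preprocess(text)))


print(toy_pipeline("I am happy."))
```

The returned list of {'label', 'score'} dictionaries deliberately has the same shape as the sentiment-analysis output shown later in this post.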

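Pipeline is simple to use, but it gives advanced users less control than working with the tokenizer and model directly.

The currently available pipelines are listed at:
https://huggingface.co/docs/transformers/main_classes/pipelines

The rest of this post shows how to use some of them.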

II. Pipeline examples

1. Sentiment analysis

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("I am happy."))

Output:

[{'label': 'POSITIVE', 'score': 0.9998760223388672}]

The classifier also accepts a list of strings and returns one result per input.
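A minimal sketch of the list form, assuming the result list comes back in the same order as the inputs. The helper label_each and the second sentence are illustrative and not part of the library. The pipeline call sits inside run_demo because it downloads the default model the first time it runs.

```python
def label_each(texts, results):
    """Pair every input text with the label and score predicted for it."""
    if len(texts) != len(results):
        raise ValueError("expected one result per input text")
    return [(text, r["label"], r["score"]) for text, r in zip(texts, results)]


def run_demo():
    # Not called automatically: the first call downloads the default sentiment model.
    from transformers import pipeline

    texts = ["I am happy.", "I am not happy at all."]
    classifier = pipeline("sentiment-analysis")
    for text, label, score in label_each(texts, classifier(texts)):
        print(f"{score:.3f}  {label:<8}  {text}")
```

Call run_demo() to try it against the real model.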

2. Zero-shot text classification

from transformers import pipeline

classifier = pipeline("zero-shot-classification")
print(classifier(
    ["This is a course about the Transformers library",
        "New policy mix to propel turnaround in China's economy"],
    candidate_labels=["education", "politics", "business"],
))

Output:

[{'sequence': 'This is a course about the Transformers library', 'labels': ['education', 'business', 'politics'], 'scores': [0.8445969820022583, 0.11197575181722641, 0.0434272475540638]}, 
{'sequence': "New policy mix to propel turnaround in China's economy", 'labels': ['business', 'politics', 'education'], 'scores': [0.6015452146530151, 0.348330557346344, 0.05012420192360878]}]
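Each result holds parallel 'labels' and 'scores' lists ordered from the highest score to the lowest, so the first entry is the prediction. The sketch below reads the output shown above; the helper top_label and its 0.5 cutoff are assumptions for illustration, not part of the library.

```python
# Output copied from the zero-shot example above.
results = [
    {'sequence': 'This is a course about the Transformers library',
     'labels': ['education', 'business', 'politics'],
     'scores': [0.8445969820022583, 0.11197575181722641, 0.0434272475540638]},
    {'sequence': "New policy mix to propel turnaround in China's economy",
     'labels': ['business', 'politics', 'education'],
     'scores': [0.6015452146530151, 0.348330557346344, 0.05012420192360878]},
]


def top_label(result, min_score=0.5):
    """Return the best label, or None when its score is below min_score."""
    label, score = result["labels"][0], result["scores"][0]
    return label if score >= min_score else None


for result in results:
    print(top_label(result), "<-", result["sequence"])
```

Raising min_score trades coverage for precision: with a cutoff of 0.7 the second sentence would be left unlabeled.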

3. Named entity recognition

from transformers import pipeline

# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("My name is Sylvain and I work at Hugging Face in Brooklyn."))

Output:

[{'entity_group': 'PER', 'score': 0.9981694, 'word': 'Sylvain', 'start': 11, 'end': 18}, 
{'entity_group': 'ORG', 'score': 0.9796019, 'word': 'Hugging Face', 'start': 33, 'end': 45}, 
{'entity_group': 'LOC', 'score': 0.9932106, 'word': 'Brooklyn', 'start': 49, 'end': 57}]
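The 'start' and 'end' fields are character offsets into the input string, which makes it easy to recover each span and group the entities by type. A small sketch using the output above; group_by_type is a hypothetical helper name.

```python
from collections import defaultdict

text = "My name is Sylvain and I work at Hugging Face in Brooklyn."

# Output copied from the NER example above.
entities = [
    {'entity_group': 'PER', 'score': 0.9981694, 'word': 'Sylvain', 'start': 11, 'end': 18},
    {'entity_group': 'ORG', 'score': 0.9796019, 'word': 'Hugging Face', 'start': 33, 'end': 45},
    {'entity_group': 'LOC', 'score': 0.9932106, 'word': 'Brooklyn', 'start': 49, 'end': 57},
]


def group_by_type(text, entities):
    """Map each entity type to the exact substrings found in the original text."""
    grouped = defaultdict(list)
    for entity in entities:
        span = text[entity["start"]:entity["end"]]  # slice by character offsets
        grouped[entity["entity_group"]].append(span)
    return dict(grouped)


print(group_by_type(text, entities))
```

Slicing the original text instead of reading 'word' keeps the original casing and spacing, which can differ from the tokenizer's reconstruction for some models.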

4. Summarization

from transformers import pipeline

# With no model specified, the default summarization model is BART-based and runs on PyTorch
summarizer = pipeline("summarization")
print(summarizer("Sam Shleifer writes the best docstring examples in the whole world.", min_length=5, max_length=8))

Output (the second result comes from rerunning the call with max_length=12):

# max_length=8
[{'summary_text': ' Sam Shleifer writes'}]
# max_length=12
[{'summary_text': ' Sam Shleifer writes the best docstring'}]

5. Text generation

from transformers import pipeline

generator = pipeline('text-generation', model='liam168/chat-DialoGPT-small-zh')
# The prompt means "Getting to the office early this morning,"
print(generator('今天早上早点到公司,', max_length=100))

6. English text generation with GPT-2

from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
print(generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
))

Output:

[{'generated_text': 'In this course, we will teach you how to write a powerful and useful resource for your students.\n\n\n\nHow can you understand your own'}, 
{'generated_text': 'In this course, we will teach you how to program a \u202an–n\u202f\u202f\u202f and learn how to use it.'}]
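By default 'generated_text' repeats the prompt before the continuation. The sketch below strips the prompt and collapses the runs of newlines seen in the first result above; continuation is a hypothetical helper name.

```python
prompt = "In this course, we will teach you how to"

# First result copied from the output above.
outputs = [
    {'generated_text': 'In this course, we will teach you how to write a powerful '
                       'and useful resource for your students.\n\n\n\nHow can you '
                       'understand your own'},
]


def continuation(prompt, generated_text):
    """Return only the newly generated part, with whitespace runs collapsed."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return " ".join(generated_text.split())


for output in outputs:
    print(continuation(prompt, output["generated_text"]))
```

The text-generation pipeline also accepts return_full_text=False, which asks it to omit the prompt itself; the helper is useful when you only have saved output to work with.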

7. Masked-token filling

from transformers import pipeline

unmasker = pipeline('fill-mask')

print(unmasker('What the <mask>?', top_k=3))

Output:

[{'score': 0.378376841545105, 'token': 17835, 'token_str': ' heck', 'sequence': 'What the heck?'}, 
{'score': 0.32931089401245117, 'token': 7105, 'token_str': ' hell', 'sequence': 'What the hell?'}, 
{'score': 0.1464540809392929, 'token': 26536, 'token_str': ' fuck', 'sequence': 'What the fuck?'}]
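Note that 'token_str' carries a leading space, because the default model's tokenizer folds the preceding space into the token. The sketch below rebuilds each 'sequence' from the prompt using the first two candidates above. The helper fill is hypothetical, and "<mask>" is the mask token of this default model; other models may use a different one, for example "[MASK]", available as unmasker.tokenizer.mask_token.

```python
prompt = "What the <mask>?"

# First two candidates copied from the output above.
candidates = [
    {'score': 0.378376841545105, 'token': 17835, 'token_str': ' heck', 'sequence': 'What the heck?'},
    {'score': 0.32931089401245117, 'token': 7105, 'token_str': ' hell', 'sequence': 'What the hell?'},
]


def fill(prompt, candidate, mask="<mask>"):
    """Substitute the predicted token for the mask, dropping its leading space."""
    return prompt.replace(mask, candidate["token_str"].strip(), 1)


for candidate in candidates:
    print(f'{candidate["score"]:.1%}  {fill(prompt, candidate)}')
```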

8. Question answering

from transformers import pipeline

question_answerer = pipeline("question-answering")
print(question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
))

Output:

{'score': 0.6949763894081116, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}
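This pipeline is extractive: the answer is a span of the context, and 'start' and 'end' are its character offsets. A short sketch that marks the span in the context and ignores low-confidence answers; highlight and the 0.5 cutoff are assumptions for illustration.

```python
context = "My name is Sylvain and I work at Hugging Face in Brooklyn"

# Output copied from the question-answering example above.
result = {'score': 0.6949763894081116, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}


def highlight(context, result, min_score=0.5):
    """Bracket the answer span in the context, or return None if the score is too low."""
    if result["score"] < min_score:
        return None
    start, end = result["start"], result["end"]
    return f"{context[:start]}[{context[start:end]}]{context[end:]}"


print(highlight(context, result))
```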

9. Translation

A Mandarin Chinese to Cantonese translator:

from transformers import pipeline

translator = pipeline("translation", model="botisan-ai/mt5-translate-zh-yue")
# The input means "Have you had breakfast today?"
print(translator("今天吃早饭没有？"))

Output:

[{'translation_text': '今日食早飯未?'}]
