Top NLP Journals
(Natural Language Processing)
Natural Language Processing has been one of the most researched fields in deep learning in 2020, mostly due to its rising popularity, future potential, and support for a wide variety of applications.
If you have played around with deep learning before, you probably know conventional deep learning frameworks such as TensorFlow, Keras, and PyTorch. Assuming you know these basic frameworks, this tutorial briefly introduces other useful NLP libraries that you can learn and use in 2020. Depending on what you want to do, you may come away with the names of a few tools that interest you, or that you didn't know existed!
(General Frameworks)
(AllenNLP)
- Popularity: ⭐⭐⭐⭐
- Official Website: https://allennlp.org/
- Github: https://github.com/allenai/allennlp
- Explanation: AllenNLP is a general deep learning framework for NLP, developed by the Allen Institute for AI (AI2). It contains state-of-the-art reference models that you can quickly start working with. It also supports a wide variety of tasks and datasets, so you are likely to find what you need. It also includes many interactive demos that you can try out to decide whether you want to learn and use this framework!
(Fairseq)
- Popularity: ⭐⭐⭐⭐
- Official Website: https://fairseq.readthedocs.io/en/latest
- Github: https://github.com/pytorch/fairseq
- Explanation: Fairseq is a popular NLP framework developed by Facebook AI Research. It is a sequence modeling toolkit for machine translation, text summarization, language modeling, text generation, and other tasks. It contains built-in implementations of classic models such as CNNs, LSTMs, and even the basic Transformer with self-attention. Its command-line interface (CLI) also comes in really handy. I sometimes use Fairseq to train baselines to compare against my own model, and I bet a lot of researchers use it for the same purpose!
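The explanation mentions the basic Transformer with self-attention. As a rough illustration of what that model computes, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. It is not Fairseq code and uses no Fairseq API; real implementations add learned query/key/value projections, multiple heads, masking, and batched tensor math, all of which are omitted here.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (one per token)."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score this query against every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# In self-attention, queries, keys, and values all come from the same tokens
# (used directly here, without the learned projections a real model applies).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(tokens, tokens, tokens)
print(out)
```

Because the softmax weights sum to 1, every output vector is a weighted average of the value vectors; the scaling by sqrt(d_k) keeps the scores from growing with the vector dimension.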
(Fast.ai)
- Popularity: ⭐⭐⭐⭐
- Official Website: http://docs.fast.ai/
- Github: https://github.com/fastai/fastai
- Explanation: Fast.ai is built to make deep learning accessible to people without technical backgrounds, through its free online courses and its easy-to-use software library. In fact, its co-founder Jeremy Howard just published (Aug. 2020) a completely new book called Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD, whose title is pretty self-explanatory. The Fast.ai library has a dedicated Text section for anything related to NLP. It offers very high-level abstractions and easy implementations for NLP data preprocessing, model construction, training, and evaluation. I really recommend Fast.ai to anyone who prefers practice over theory and wants to solve a problem fast.
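One concrete piece of fastai's NLP preprocessing is its use of special tokens: it marks the start of a text with `xxbos`, and instead of storing capitalized variants of words in the vocabulary, it lowercases them and inserts `xxmaj` (capitalized word) or `xxup` (all-caps word) in front. The sketch below imitates that idea in plain Python; the function name is hypothetical, the rules are simplified, and it is not fastai's actual tokenizer or API.

```python
BOS, MAJ, UP = "xxbos", "xxmaj", "xxup"


def add_case_tokens(text):
    """Hypothetical helper: lowercase words, recording their case with special tokens."""
    out = [BOS]  # mark the beginning of the text
    for word in text.split():
        if word.isupper() and len(word) > 1:
            out += [UP, word.lower()]  # ALL-CAPS word -> xxup + lowercase form
        elif word[0].isupper():
            out += [MAJ, word.lower()]  # Capitalized word -> xxmaj + lowercase form
        else:
            out.append(word.lower())
    return out


print(add_case_tokens("This movie is GREAT"))
# ['xxbos', 'xxmaj', 'this', 'movie', 'is', 'xxup', 'great']
```

Recording case as separate tokens keeps a single vocabulary entry for "great" while still letting the model see that the word was shouted.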
(Preprocessing)
(spaCy)
- Popularity: ⭐⭐⭐⭐⭐
- Official Website: https://spacy.io/
- Github: https://github.com/explosion/spaCy
- Explanation: spaCy is the most popular and convenient text preprocessing library you will find out there. It contains lots of easy-to-use functions for tokenization, part-of-speech tagging, named entity recognition, and much more. It also supports 59+ languages and provides several pretrained word vectors to get you started fast!
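Tokenization is the first of those steps. spaCy's tokenizer is rule-based (language-specific prefix, suffix, infix, and exception rules), and each resulting `Token` keeps its character offset in `Token.idx`. The toy regex tokenizer below only illustrates splitting text into word and punctuation tokens with offsets; the pattern is a hypothetical simplification, and unlike spaCy's English rules it keeps a contraction such as "isn't" as one token instead of splitting it into "is" and "n't".

```python
import re

# Hypothetical toy pattern (not spaCy's rules): a word, optionally joined to
# further word pieces by an apostrophe or hyphen, or any single punctuation mark.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def tokenize(text):
    """Return (token_text, start_offset) pairs, similar to a token's text and idx."""
    return [(m.group(), m.start()) for m in TOKEN_RE.finditer(text)]


print(tokenize("Apple isn't cheap, is it?"))
```

Keeping offsets alongside token text is what lets later stages (tagging, entity spans) point back into the original string.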
(NLTK)
- Popularity: ⭐⭐⭐⭐⭐
- Official Website: https://www.nltk.org/
- Github: https://github.com/nltk/nltk
- Explanation: Like spaCy, NLTK is another popular preprocessing library for modern NLP. Its functionality ranges from tokenization, stemming, and tagging to parsing and semantic reasoning. Personally, NLTK is my preprocessing library of choice because I just like how easy it is to use. It just gets the job done, and fast.
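Stemming, one of the NLTK features listed above, reduces inflected words to a shared stem by stripping suffixes. NLTK ships full algorithms such as `nltk.stem.PorterStemmer`; the sketch below is a deliberately crude suffix stripper with a hypothetical suffix list, meant only to show the idea. Its stems differ from Porter's: for example, it turns "running" into "runn", whereas Porter gives "run".

```python
# Hypothetical, heavily simplified suffix list; the Porter algorithm has many
# more rules and extra conditions on what the remaining stem must look like.
SUFFIXES = ("ational", "ization", "fulness", "ness", "ing", "ed", "ly", "es", "s")


def toy_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:  # listed longest first, so longer suffixes win
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


print([toy_stem(w) for w in ["jumped", "Boxes", "quickly", "is", "running"]])
```

The `min_stem` guard stops short words such as "is" from being stripped down to nothing.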
(TorchText)
- Popularity: ⭐⭐⭐⭐
- Official Website: https://torchtext.readthedocs.io/en/latest/
- Github: https://github.com/pytorch/text
- Explanation: TorchText is an official PyTorch library, which has helped it grow in popularity. It contains convenient utilities to process text data and prepare it in batches before you feed it into your deep learning model. I use TorchText a lot to load my training, validation, and test datasets, tokenize them, build the vocabulary, and create iterators that data loaders can consume later on. It is a really handy tool that does all the heavy lifting for you in a few simple lines. You can also easily use pretrained word embeddings, like Word2Vec or FastText, for your datasets. To see how I use TorchText in practice, check out my BERT Text Classification Using Pytorch article.
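To show what that tokenize, build-vocab, and batch workflow does under the hood, here is a plain-Python sketch. The helper names are hypothetical and this is not TorchText's API (which has changed substantially between releases); tokenization is plain whitespace splitting, and the vocabulary is sorted alphabetically for determinism, whereas libraries typically order it by frequency.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"


def build_vocab(tokenized_texts, min_freq=1):
    """Map tokens to integer ids; the special tokens get ids 0 and 1."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    # Sorted alphabetically here only to make the ids deterministic.
    itos = [PAD, UNK] + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: i for i, tok in enumerate(itos)}


def numericalize(tokens, stoi):
    # Tokens missing from the vocabulary fall back to the <unk> id.
    return [stoi.get(tok, stoi[UNK]) for tok in tokens]


def batches(tokenized_texts, stoi, batch_size):
    """Yield batches of id sequences, right-padded with <pad> to equal length."""
    for start in range(0, len(tokenized_texts), batch_size):
        chunk = [numericalize(t, stoi) for t in tokenized_texts[start:start + batch_size]]
        max_len = max(len(ids) for ids in chunk)
        yield [ids + [stoi[PAD]] * (max_len - len(ids)) for ids in chunk]


train = [s.split() for s in ["the movie was great", "awful", "the plot was thin"]]
stoi = build_vocab(train)
for batch in batches(train, stoi, batch_size=2):
    print(batch)
```

Padding each batch only to its own longest sequence, rather than to a global maximum, keeps the tensors a model receives as small as possible.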
(Transformers)
(Huggingface)
- Popularity: ⭐⭐⭐⭐⭐
- Official Website: https://huggingface.co/
- Github: https://github.com/huggingface/transformers
- Explanation: This is the most popular library out there for transformers, implementing a wide variety of them, from BERT and GPT-2 to BART and Reformer. I use it on a daily basis, and in my experience its code is very readable and its documentation is crystal clear. In the official GitHub repo, the example Python scripts are even organized by task, such as language modeling, text generation, question answering, and multiple choice. There are built-in scripts for running the baseline transformers on each of these tasks, so they are really convenient to use!
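Many of the models in this library, BERT included, split words into subword units with WordPiece. As an illustration, here is a minimal pure-Python sketch of WordPiece's greedy longest-match-first lookup, in which continuation pieces carry a `##` prefix. The tiny vocabulary is hypothetical; in practice you would load a pretrained tokenizer from the library (for example with `AutoTokenizer.from_pretrained("bert-base-uncased")`) rather than write this yourself.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # non-initial pieces are marked with ##
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matched, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


vocab = {"un", "##aff", "##able", "play", "##ing"}  # hypothetical toy vocabulary
print(wordpiece("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece("playing", vocab))    # ['play', '##ing']
print(wordpiece("xyz", vocab))        # ['[UNK]']
```

Subword splitting like this lets a fixed-size vocabulary cover rare words by composing them from frequent pieces.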
(Specific Tasks)
(Gensim)
- Popularity: ⭐⭐⭐
- Official Website: https://radimrehurek.com/gensim/
- Github: https://github.com/RaRe-Technologies/gensim
- Task: Topic Modeling, Text Summarization, Semantic Similarity
- Explanation: Gensim is high-end, industry-grade software for topic modeling of text. It is very robust, platform-independent, and scalable. I used it during my internship at an AI startup, where we wanted to judge the semantic similarity between two newspaper articles. A really simple function call does just that and returns their similarity score, so it's extremely handy!
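One common way to score the semantic similarity of two documents, and the kind of computation Gensim's similarity utilities are built around, is the cosine similarity between their TF-IDF vectors. The pure-Python sketch below runs that computation on three made-up headlines. It uses whitespace tokenization and a simple tf × log(N/df) weighting, which is not necessarily Gensim's default weighting scheme, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # idf = log(N / df): a term found in every document gets weight 0.
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "stocks rally as tech earnings beat forecasts".split(),
    "tech stocks climb after strong earnings".split(),
    "local team wins the championship game".split(),
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The two finance headlines share weighted terms and score above zero, while the sports headline shares none with the first and scores 0.0.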
(OpenNMT)
- Popularity: ⭐⭐⭐
- Official Website: https://opennmt.net/
- Github: https://github.com/OpenNMT/OpenNMT-py
- Task: Machine Translation
- Explanation: OpenNMT is a convenient and powerful tool for machine translation and other sequence learning tasks. Its models and training procedures are highly configurable, which makes it a very simple framework to use. Some of my coworkers recommend OpenNMT for different kinds of sequence learning tasks because it's open-source and simple.
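Translation toolkits like OpenNMT typically generate output with beam search rather than committing to the single most likely token at each step. The plain-Python sketch below shows the idea; `toy_model` is a hypothetical stand-in for a trained model that returns next-token log-probabilities, and refinements that real decoders apply, such as length normalization, are omitted.

```python
import math


def toy_model(prefix):
    """Hypothetical model: next-token log-probabilities given the prefix."""
    table = {
        (): {"the": 0.5, "a": 0.4, "</s>": 0.1},
        ("the",): {"cat": 0.4, "dog": 0.3, "</s>": 0.3},
        ("a",): {"cat": 0.9, "</s>": 0.1},
    }
    # Any prefix not in the table ends the sentence with certainty.
    probs = table.get(tuple(prefix), {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


def beam_search(next_log_probs, beam_size=2, max_len=10, eos="</s>"):
    """Return (tokens, total log-probability) of the best hypothesis found."""
    beams = [([], 0.0)]  # (tokens so far, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (tokens + [tok], score + lp)
            for tokens, score in beams
            for tok, lp in next_log_probs(tokens).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        # Keep only the top `beam_size`; those ending in eos are complete.
        for tokens, score in candidates[:beam_size]:
            (finished if tokens[-1] == eos else beams).append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # include hypotheses cut off by max_len
    return max(finished, key=lambda c: c[1])


print(beam_search(toy_model, beam_size=1)[0])  # greedy: ['the', 'cat', '</s>']
print(beam_search(toy_model, beam_size=2)[0])  # ['a', 'cat', '</s>']
```

With these toy probabilities, greedy decoding (beam size 1) commits to "the" and finishes with probability 0.2, while a beam of 2 keeps "a" alive and finds "a cat" with probability 0.36.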
(ParlAI)
- Popularity: ⭐⭐⭐
- Official Website: https://parl.ai/
- Github: https://github.com/facebookresearch/ParlAI
- Task: Task-Oriented Dialogue, Chit-chat Dialogue, Visual Question Answering
- Explanation: ParlAI is Facebook’s flagship framework for sharing, training, and testing dialogue models across different kinds of dialogue tasks. It provides an all-in-one environment that supports a wide variety of reference models, pretrained models, datasets, and more. Unlike most of the other tools on this list, ParlAI requires some coding and machine learning expertise if you want to customize things on your own. In other words, it's a bit more complicated to use, but it is nevertheless a great tool if you're into dialogue.
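ParlAI frames training and evaluation as agents exchanging messages: dataset "teachers" and models are both agents with `observe()` and `act()` methods, and messages are dicts with fields such as `text`, `labels`, and `episode_done`. To give a feel for that protocol, here is a dependency-free toy loop. The `HistoryAgent` class and the messages are hypothetical; a real ParlAI agent would subclass the library's own `Agent` class and be run through its scripts.

```python
class HistoryAgent:
    """Hypothetical toy agent using an observe()/act() protocol (not ParlAI's classes)."""

    def __init__(self):
        self.history = []
        self.episode_done = False

    def observe(self, message):
        # Remember the incoming turn and whether it closes the episode.
        self.history.append(message["text"])
        self.episode_done = message.get("episode_done", False)

    def act(self):
        reply = {"text": f"turn {len(self.history)}: you said '{self.history[-1]}'"}
        if self.episode_done:
            self.history = []  # the next message starts a new conversation
        return reply


# Each episode is one conversation; episode_done marks its last turn.
episodes = [
    [{"text": "hi", "episode_done": False}, {"text": "bye", "episode_done": True}],
    [{"text": "hello again", "episode_done": True}],
]
agent = HistoryAgent()
replies = []
for episode in episodes:
    for message in episode:
        agent.observe(message)
        replies.append(agent.act()["text"])
print(replies)
```

The `episode_done` flag is what lets one agent handle many independent conversations in sequence without leaking context between them.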
(DeepPavlov)
- Popularity: ⭐⭐⭐
- Official Website: http://deeppavlov.ai/
- Github: https://github.com/deepmipt/DeepPavlov
- Task: Task-Oriented Dialogue, Chit-chat Dialogue
- Explanation: DeepPavlov is an alternative to ParlAI, though I would say it is geared more toward application and deployment than research, even if you can definitely still do quite a lot of customization with it. I would argue that DeepPavlov is to ParlAI what TensorFlow is to PyTorch. DeepPavlov is mainly a framework for developing chatbots and virtual assistants, as it provides all the tools needed for a production-ready, industry-grade conversational agent. I used it once during a hackathon to fine-tune a conversational agent for the restaurant domain (so that users could check the menu and order the food they wanted), and the end result worked like a charm!
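To make the restaurant example concrete: a task-oriented agent has to recognize what the user wants (the intent, such as seeing the menu or placing an order), extract the relevant details (slots, such as which dishes), and then decide how to respond. DeepPavlov builds such pipelines from trained components configured through JSON config files; the sketch below replaces all of that with hypothetical keyword rules and a made-up menu, purely to illustrate the intent, slot, and response structure.

```python
import re

# Hypothetical menu and prices for the toy restaurant domain.
MENU = {"margherita pizza": 9.5, "caesar salad": 7.0, "tiramisu": 5.0}


def understand(utterance):
    """Toy NLU: return (intent, slots) from keyword rules instead of a trained model."""
    text = utterance.lower()
    dishes = [dish for dish in MENU if dish in text]  # slot filling by exact match
    if dishes and re.search(r"\b(order|want|get|like)\b", text):
        return "order_food", {"dishes": dishes}
    if re.search(r"\bmenu\b", text):
        return "show_menu", {}
    return "fallback", {}


def respond(utterance):
    """Toy dialogue policy: map the detected intent to a reply."""
    intent, slots = understand(utterance)
    if intent == "show_menu":
        return "We have: " + ", ".join(MENU)
    if intent == "order_food":
        total = sum(MENU[d] for d in slots["dishes"])
        return f"Ordered {', '.join(slots['dishes'])} for ${total:.2f}."
    return "Sorry, I can show you the menu or take your order."


print(respond("Can I see the menu?"))
print(respond("I'd like a caesar salad and tiramisu"))
```

Separating understanding (`understand`) from the response policy (`respond`) mirrors how production systems let you swap the keyword rules for a trained classifier without touching the dialogue logic.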
Translated from: https://towardsdatascience.com/top-nlp-libraries-to-use-2020-4f700cdb841f