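This collection gathers up-to-date 2020 resources across natural language processing: core NLP resources (conferences, classic papers, datasets, NLP tasks and recent progress), NLP news and discussion groups, notable blogs, the latest benchmarks for NLP tasks, research and industry resources, speech recognition resources, topic modeling resources, and more.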

    The resources were compiled from the web. Source:

https://github.com/ivan-bilan/The-NLP-Pandect


2020 Roundup of the Latest NLP Resources

    Compendiums and awesome lists on the topic of NLP:

    •Awesome NLP by keon [GitHub ~10k stars]

    •Speech and Natural Language Processing Awesome List by elaboshira [GitHub ~2k stars]

    •Awesome Deep Learning for Natural Language Processing (NLP) [GitHub ~1k stars]

    •Text Mining and Natural Language Processing Resources by stepthom [GitHub ~300 stars]

    •Made with ML List by madewithml.com

    •Brainsources for #NLP enthusiasts by Philip Vollet


    NLP Conferences, Paper Summaries and Paper Compendiums:

    •NLP top 10 conferences Compendium by soulbliss [GitHub ~300 stars]

    •NLP Paper Summaries by dair-ai [GitHub ~1k stars]

    •Curated collection of papers for the NLP practitioner [GitHub ~1k stars]

    •Papers on Textual Adversarial Attack and Defense [GitHub ~500 stars]

    •NLP Conferences Calendar

    •ICLR 2020 Trends

    •The Most Influential NLP Research of 2019

    •Recent Deep Learning papers in NLU and RL by Valentin Malykh [GitHub ~300 stars]


    NLP Progress and NLP Tasks:

    •NLP Progress by sebastianruder [GitHub ~16k stars]

    •NLP Tasks by Kyubyong [GitHub ~3k stars]

    •Reading list for Awesome Sentiment Analysis papers by declare-lab [GitHub ~100 stars]

    •Awesome Sentiment Analysis by xiamx [GitHub ~800 stars]


    NLP Datasets:

    •NLP Datasets by niderhoff [GitHub ~4k stars]

    •Big Bad NLP Database

    •25 Best Parallel Text Datasets for Machine Translation Training

    •UWA Unambiguous Word Annotations - Word Sense Disambiguation Dataset

    •20 Best German Language Datasets for Machine Learning


    Word and Sentence embeddings:

    •Awesome Embedding Models by Hironsan [GitHub ~1.3k stars]

    •Awesome list of Sentence Embeddings by Separius [GitHub ~1.5k stars]

    •Awesome BERT by Jiakui [GitHub ~1.5k stars]


    Notebooks, Scripts and Repositories

    •The Super Duper NLP Repo [Website, 2020]

Podcasts

    •NLP Highlights [Years: 2017 - now, Status: active]

    •TWIML AI [Years: 2016 - now, Status: active]

    •Data Hack Radio [Years: 2018 - now, Status: active]

    •The Super Data Science Podcast [Years: 2016 - now, Status: active]

    •AI Game Changers [Years: 2020 - now, Status: active]


Newsletters

    •NLP News by Sebastian Ruder

    •dair.ai Newsletter by dair.ai

    •Papers with Code

    •The Batch by deeplearning.ai

    •Paper Digest by PaperDigest

    •NLP Cypher by QuantumStat

 

Meetups

    •NLP Zurich


YouTube Channels

    •Yannic Kilcher

    •HuggingFace

    •Kaggle Reading Group

    •Rasa Paper Reading

    •Stanford CS224N: NLP with Deep Learning

    •ML Explained - A.I. Socratic Circles - AISC

    •Deeplearning.ai

    •Machine Learning Street Talk


Benchmarks

    •SQuAD - Stanford Question Answering Dataset (SQuAD)

    •GLUE - General Language Understanding Evaluation (GLUE) benchmark

    •SuperGLUE - benchmark styled after GLUE with a new set of more difficult language understanding tasks

    •XTREME - Massively Multilingual Multi-task Benchmark

    •decaNLP - The Natural Language Decathlon (decaNLP) for studying general NLP models

    •RACE - ReAding Comprehension dataset collected from English Examinations


Research Resources

    General

    •A Recipe for Training Neural Networks by Andrej Karpathy [Keywords: research, training, 2019]


    Embeddings

    Repositories

    •Pre-trained ELMo Representations for Many Languages [GitHub ~1k stars]

    •sense2vec - Contextually-keyed word vectors [GitHub ~1k stars]

    •wikipedia2vec [GitHub ~500 stars]

    •StarSpace [GitHub ~3k stars]

    •fastText [GitHub ~21k stars]


    Blogs

    •Language Models and Contextualised Word Embeddings by David S. Batista [Blog, 2018]

    •An Essential Guide to Pretrained Word Embeddings for NLP Practitioners by AnalyticsVidhya [Blog, 2020]

    •Polyglot Word Embeddings Discover Language Clusters [Blog, 2020]

    •The Illustrated Word2vec by Jay Alammar [Blog, 2019]


    Transformer-based Architectures

    General

    •The Transformer Family by Lilian Weng [Blog, 2020]

    •Keeping up with the BERTs: a review of the main NLP benchmarks by Manuel Tonneau [Blog, 2020]

    •Playing the lottery with rewards and multiple languages - about the effect of random initialization [ICLR 2020 Paper]

    •Attention? Attention! by Lilian Weng [Blog, 2018]

    •the transformer … “explained”? [Blog, 2019]

    •Attention is all you need; Attentional Neural Network Models by Łukasz Kaiser [Talk, 2017]

    •Understanding and Applying Self-Attention for NLP [Talk, 2018]


    Transformer

    •The Annotated Transformer by Harvard NLP [Blog, 2018]

    •The Illustrated Transformer by Jay Alammar [Blog, 2018]

    •Illustrated Guide to Transformers by Hong Jing [Blog, 2020]

    •Sequential Transformer with Adaptive Attention Span by Facebook [Blog, 2019]

    •Evolution of Representations in the Transformer by Lena Voita [Blog, 2019]

    •Reformer: The Efficient Transformer [Blog, 2020]

    •T5: the Text-To-Text Transfer Transformer [Blog, 2020]

    •Longformer — The Long-Document Transformer by Viktor Karlsson [Blog, 2020]

    •TRANSFORMERS FROM SCRATCH [Blog, 2019]

    •Universal Transformers by Mostafa Dehghani [Blog, 2019]


    BERT

    •A Visual Guide to Using BERT for the First Time by Jay Alammar [Blog, 2019]

    •The Dark Secrets of BERT by Anna Rogers [Blog, 2020]

    •Understanding searches better than ever before [Blog, 2019]

    •Demystifying BERT: A Comprehensive Guide to the Groundbreaking NLP Framework [Blog, 2019]

    •SemBERT - Semantics-aware BERT for Language Understanding [GitHub ~100 stars]


    GPT-family

    General

    •The Illustrated GPT-2 by Jay Alammar [Blog, 2019]

    •The Annotated GPT-2 by Aman Arora

    •OpenAI’s GPT-2: the model, the hype, and the controversy by Ryan Lowe [Blog, 2019]

    •How to generate text by Patrick von Platen [Blog, 2020]


    GPT-3

    •Zero Shot Learning for Text Classification by Amit Chaudhary [Blog, 2020]

    •GPT-3 A Brief Summary by Leo Gao [Blog, 2020]

    •GPT-3, a Giant Step for Deep Learning And NLP by Yoel Zeldes [Blog, June 2020]

    •GPT-3 Language Model: A Technical Overview by Chuan Li [Blog, June 2020]

    •OpenAI API - API Demo to use GPT-3 for commercial applications


    Other

    •What is Two-Stream Self-Attention in XLNet by Xu LIANG [Blog, 2019]

    •Visual Paper Summary: ALBERT (A Lite BERT) by Amit Chaudhary [Blog, 2020]

    •Turing NLG by Microsoft

    •Multi-Label Text Classification with XLNet by Josh Xin Jie Lee [Blog, 2019]

    •ELECTRA [GitHub ~1k stars]


    Distillation, Pruning and Quantization

    •Distilling knowledge from Neural Networks to build smaller and faster models by FloydHub [Blog, 2019]

    •David over Goliath: towards smaller models for cheaper, faster, and greener NLP by Manuel Tonneau [Blog, 2020]


    Automated Summarization

    •PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization by Google AI [Blog, June 2020]


Industry Resources

    Transformer-based Architectures

    •Why BERT Fails in Commercial Environments by Intel AI [Blog, 2020]

    •Fine Tuning BERT for Text Classification with FARM by Sebastian Guggisberg [Blog, 2020]

    •Practical NLP for the Real World [Presentation, 2019]

    •From Paper to Product – How we implemented BERT by Christoph Henkelmann [Talk, 2020]


    Embeddings as a Service

    •embedding-as-service [GitHub ~100 stars]

    •Bert-as-service [GitHub ~8k stars]

    NLP Recipes Industrial Applications:

    •NLP Recipes by microsoft [GitHub ~5k stars]

    •NLP with Python by susanli2016 [GitHub ~1.5k stars]

    •Basic Utilities for PyTorch NLP by PetrochukM [GitHub ~2k stars]


    NLP Applications in Bio, Finance, Legal and other industries

    •Blackstone - A spaCy pipeline and model for NLP on unstructured legal text [GitHub ~300 stars]

    •Sci spaCy - spaCy pipeline and models for scientific/biomedical documents [GitHub ~600 stars]

    •FinBERT: Pre-Trained on SEC Filings for Financial NLP Tasks [GitHub ~100 stars]

    •LexNLP - Information retrieval and extraction for real, unstructured legal text [GitHub ~400 stars]


Speech Recognition


    General Speech Recognition

    •wav2letter - Automatic Speech Recognition Toolkit [GitHub ~5k stars]

    •DeepSpeech - Mozilla's implementation of Baidu's DeepSpeech architecture [GitHub ~14k stars]

    •Acoustic Word Embeddings by Maria Obedkova [Blog, 2020]

    •kaldi - Kaldi is a toolkit for speech recognition [GitHub ~9k stars]

    •awesome-kaldi - resources for using Kaldi [GitHub ~300 stars]


    Text to Speech

    •FastSpeech - The Implementation of FastSpeech based on pytorch [GitHub ~500 stars]


Topic Modeling


    Blogs

    •Topic Modelling with PySpark and Spark NLP by Maria Obedkova [Spark, Blog, 2020]


    Repositories

    •Anchored Correlation Explanation Topic Modeling [GitHub ~300 stars]

    •Topic Modeling in Embedding Spaces [GitHub ~200 stars]


    Data Augmentation

    •A Visual Survey of Data Augmentation in NLP [Blog, 2020]

    •Data augmentation for NLP [GitHub ~1k stars]

    •snorkel - Framework to generate training data [GitHub ~4k stars]


    Ethics, Bias, and Equality in NLP

    •Computational Ethics for NLP - course resources from Carnegie Mellon University [Lecture Notes, Spring 2020]

    •Ethics in NLP - resources from ACL's Ethics in NLP track


    General Purpose

    •transformers by HuggingFace [GitHub ~28k stars]

    •spaCy by Explosion AI [GitHub ~17k stars]

    •flair by Zalando [GitHub ~9k stars]

    •AllenNLP by AI2 [GitHub ~9k stars]

    •stanza (formerly StanfordNLP) [GitHub ~4k stars]

    •spaCy stanza [GitHub ~400 stars]

    •nltk [GitHub ~9k stars]

    •NLP Architect - A Deep Learning NLP/NLU library by Intel® AI Lab [GitHub ~2.5k stars]

    •Kashgari - Transfer Learning with focus on Chinese [GitHub ~2k stars]

    •polyglot - Multi-lingual NLP Framework [GitHub ~2k stars]

    •FARM [GitHub ~1k stars]

    •gobbli by RTI International [GitHub ~200 stars]

    •headliner - training and deployment of seq2seq models [GitHub ~200 stars]

    •SyferText - A privacy preserving NLP framework [GitHub ~100 stars]


    Dialog Systems and Speech

    •DeepPavlov by MIPT [GitHub ~4k stars]

    •ParlAI by FAIR [GitHub ~6k stars]

    •rasa - Framework for Conversational Agents [GitHub ~9k stars]

    •wav2letter - Automatic Speech Recognition Toolkit [GitHub ~5k stars]


    Distributed NLP

    •Spark NLP [GitHub ~1k stars]


    Other NLP Topics

    General

    •NeuralCoref 4.0: Coreference Resolution in spaCy with Neural Networks by HuggingFace [GitHub ~2k stars]

    Tokenization

    •tokenizers - Fast State-of-the-Art Tokenizers optimized for Research and Production [GitHub ~3k stars]

    •SentencePiece - Unsupervised text tokenizer for Neural Network-based text generation [GitHub ~4k stars]


Learning Resources

    Books

    •Dive into Deep Learning - An interactive deep learning book with code, math, and discussions

    •Natural Language Processing and Computational Linguistics - Speech, Morphology and Syntax (Cognitive Science)


    Courses

    •Choosing the right course for a Practical NLP Engineer

    •12 Best Natural Language Processing Courses & Tutorials to Learn Online


    Tutorials

    •Hands-On NLTK Tutorial [GitHub ~300 stars]

Communities

    •r/LanguageTechnology - NLP Reddit forum

