8 Innovative BERT Knowledge Distillation Papers That Have Changed The Landscape of NLP

Contemporary state-of-the-art NLP models are difficult to deploy in production. Knowledge distillation offers a way to tackle this problem, among several others, though it comes with quirks of its own.
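
For context, the core idea behind distillation is to train a smaller student model to match the temperature-softened output distribution of a larger teacher, alongside the usual hard-label objective. Below is a minimal PyTorch sketch of that combined loss; the `temperature` and `alpha` values are illustrative defaults, not taken from any of the papers discussed here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend a soft-target (teacher-matching) loss with the usual hard-label loss."""
    # Soft targets: KL divergence between the temperature-softened teacher and
    # student distributions, scaled by T^2 to keep gradient magnitudes comparable.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss


# Example usage with random tensors standing in for real model outputs.
student_logits = torch.randn(8, 2)          # batch of 8, binary classification
teacher_logits = torch.randn(8, 2)
labels = torch.randint(0, 2, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```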

 

BERT’s inefficiency has not gone unnoticed, and many researchers have pursued ways to reduce its cost and size. Some of the most active research is in model compression techniques: smaller architectures (structured pruning), knowledge distillation, quantization, and unstructured pruning. A few of the more impactful papers include:

 

https://towardsdatascience.com/https-medium-com-chaturangarajapakshe-text-classification-with-transformer-models-d370944b50ca

 

The post linked above covers text classification with transformer models on problems where only a limited number of labeled samples is available.