LLM
some LLMs’ models and weights are not open to users; others, like Llama 2, are open
what is an LLM?
the Llama 2 70B model
- just 2 files
    - parameters file
        - the parameters, i.e. the weights of the neural network
        - each parameter is 2 bytes (a float16 number), so 70B parameters ≈ 140GB
    - run file: code that runs the parameters (inference)
        - can be written in C, Python, etc.
        - in C, ~500 lines of code with no dependencies are enough
    - together they form a self-contained package (no network needed); see the sketch below
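A minimal sketch of the two-file idea. The 140GB arithmetic is real; the file name, shapes, and single matrix multiply are toy stand-ins for the actual weights file and transformer forward pass:

```python
import numpy as np

# 70B parameters at 2 bytes each (float16) -> the ~140GB parameters file
n_params = 70_000_000_000
print(n_params * 2 / 1e9, "GB")        # -> 140.0 GB

# "Parameters file": write and reload a tiny toy weight matrix (stand-in).
rng = np.random.default_rng(0)
rng.standard_normal((4, 4)).astype(np.float16).tofile("parameters.bin")
w = np.fromfile("parameters.bin", dtype=np.float16).reshape(4, 4)

# "Run file": the inference code; this one matmul stands in for the
# ~500 lines of dependency-free C that run the real network.
x = np.ones(4, dtype=np.float16)
print(w @ x)
```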
- how to get the parameters?
    - lossy-compress a large chunk of text (~10TB) with ~6,000 GPUs for ~12 days (costing roughly $2M) down to the ~140GB parameters file, about a 70× compression; the weights are like a zip file of the text, a gestalt of it
    - what the neural network does is predict the next word in a sequence; that knowledge is dispersed throughout the parameters, with neurons connected to each other and firing in particular patterns (see the sketch after this list)
    - prediction has a strong relationship with compression: a model that predicts the next word well can also compress text well
    - an LLM produces text of the correct form and fills it in with its knowledge; it does not emit a copy of the text it was trained on
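A hedged sketch of next-word (next-token) prediction. The real model is a transformer with billions of parameters; here a dummy `model` returns pseudo-random logits, but the softmax and autoregressive sampling loop are the actual mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]

def model(context):
    # Stand-in for the neural network: a real LLM computes these logits
    # from its parameters; here they are just pseudo-random numbers.
    return np.random.default_rng(sum(context)).standard_normal(len(vocab))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Autoregressive generation: predict a distribution over the next token,
# sample one, append it to the context, repeat.
context = [0]                                   # start with "the"
for _ in range(4):
    probs = softmax(model(context))
    context.append(int(rng.choice(len(vocab), p=probs)))
print(" ".join(vocab[i] for i in context))
```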
- how does it work?
training stages
- pre-training
    - expensive
    - produces the base model, a document generator
    - it's about knowledge
    - data: internet documents
- fine-tuning
    - much cheaper
    - produces an assistant model
    - it's about alignment
    - data: Q&A documents
    - train on high-quality conversations (questions and answers); write labeling instructions that specify how the assistant should behave (example below)
    - focus on quality, not quantity
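A sketch of what one fine-tuning example might look like; the field names and schema are purely illustrative, not any lab's actual format:

```python
# One Q&A training document for fine-tuning (hypothetical schema).
example = {
    # The assistant's behavior comes from the written labeling instructions.
    "system": "You are a helpful, truthful assistant.",
    "conversation": [
        {"role": "user", "content": "What is in the parameters file of an LLM?"},
        {"role": "assistant", "content": "The trained weights of the neural "
                                         "network, e.g. ~140GB of float16 "
                                         "values for a 70B-parameter model."},
    ],
}
```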
- stage 3 (optional)
    - use comparison labels: it is easier for labelers to compare candidate answers than to write them (sketch below)
    - reinforcement learning from human feedback (RLHF)
    - labeling is a human-machine collaboration
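A sketch of a stage-3 comparison label; the record shape is invented for illustration. Rankings like this are what train a reward model for RLHF:

```python
# One comparison-label record (hypothetical schema).
comparison = {
    "prompt": "Write a haiku about paperclips.",
    "candidates": ["haiku A ...", "haiku B ...", "haiku C ..."],
    "ranking": [1, 0, 2],   # labeler's preference order over the candidates
}
```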
- rankings of LLMs: leaderboards compare models against each other
LLM scaling laws:
- more data (D) and more parameters (N) predictably give a better model; performance improves smoothly with scale
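One commonly cited parametric form of the scaling law (from the Chinchilla paper, Hoffmann et al. 2022; cited here as context, not taken from the talk) writes the loss as a smooth function of N and D:

$$
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
$$

where $N$ is the number of parameters, $D$ the number of training tokens, and $E, A, B, \alpha, \beta$ are fitted constants; loss falls predictably as either $N$ or $D$ grows, which is why more D and N reliably gives a better model.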
- tool use and multimodality: some LLMs, like GPT-4, can now use different tools to help answer questions, e.g. a browser, a calculator, or a Python interpreter (sketch below)
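A hedged sketch of tool use: the model emits a tool request instead of answering directly, the surrounding harness runs the tool, and the result is fed back into the context. The names and the `CALL` protocol string are invented for illustration:

```python
def calculator(expression: str) -> str:
    # A real harness would sandbox this; bare eval is for illustration only.
    return str(eval(expression, {"__builtins__": {}}))

def run_with_tools(model_step, question):
    context = f"Q: {question}\n"
    action = model_step(context)                  # model chooses an action
    if action.startswith("CALL calculator:"):     # tool request, not an answer
        result = calculator(action.split(":", 1)[1])
        context += f"tool result: {result}\n"     # result goes back into context
    return context

# Dummy "model" that always asks for the calculator.
print(run_with_tools(lambda ctx: "CALL calculator: 140e9 / 70e9",
                     "bytes per parameter?"))
```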
future directions of LLM development:
give LLMs system 2 ability
- LLMs currently only have system 1 (fast, instinctive) thinking
- the goal: convert time into accuracy, i.e. let the model think longer to produce a better answer
self-improvement
- in narrow domains with a clear reward signal, self-improvement is possible
customization
- customize LLMs into experts in certain domains
future of LLMs