LLM

some LLMs do not release their model and weights to users

what is an example of an open one?

the Llama 2 70B model

  • just 2 files:
  • a parameters file: the parameters (weights) of the neural network; each parameter is 2 bytes (a float16 number), so 70B parameters is about 140GB
  • a run file: code that runs the parameters (inference); it can be written in C, Python, etc.; in C, roughly 500 lines of code with no dependencies are enough
  • together they form a self-contained package (no network connection needed)
  • how do you get the parameters?
  • lossily compress a large chunk of internet text (~10TB) with ~6,000 GPUs for ~12 days (roughly $2M) into a ~140GB file; the weights/parameters are a kind of lossy "zip", a gestalt of the text
  • what the neural network does is try to predict the next word in a sequence; the parameters are dispersed throughout the network, and the neurons they connect fire in particular ways (see the sketch after this list)
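
A minimal sketch of what those two files amount to (my illustration, not Llama's actual code): the size arithmetic for the parameters file, and the core loop of the run file, which just keeps predicting the next token. The toy vocabulary and the stubbed-out `forward()` are assumptions standing in for the real Transformer.

```python
import random

# Size arithmetic from the notes above: 70B parameters at 2 bytes (float16) each.
NUM_PARAMS = 70_000_000_000
BYTES_PER_PARAM = 2
print(f"parameters file ~ {NUM_PARAMS * BYTES_PER_PARAM / 1e9:.0f} GB")  # ~140 GB

VOCAB = ["the", "cat", "sat", "on", "mat", "."]  # toy vocabulary; real models use ~32k+ tokens

def forward(context_tokens):
    """Stand-in for the Transformer forward pass: return a probability for each
    possible next token. In a real model this is where the 70B parameters are used."""
    scores = [random.random() for _ in VOCAB]
    total = sum(scores)
    return [s / total for s in scores]

def generate(prompt, steps=10):
    """The essence of the 'run' file: repeatedly predict the next word and append it."""
    tokens = prompt.split()
    for _ in range(steps):
        probs = forward(tokens)
        next_token = random.choices(VOCAB, weights=probs, k=1)[0]  # sample from the prediction
        tokens.append(next_token)
    return " ".join(tokens)

print(generate("the cat"))
```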

  • prediction has a strong relationship with compression (see the note after this list)
  • the LLM learns the correct form of text and fills it in with its own knowledge; it generates ("dreams") internet-like documents rather than reproducing copies of the text it was trained on
  • how does it work? the Transformer architecture is known in full detail, but the billions of parameters it learns are dispersed through the network and largely inscrutable
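
One standard way to make the prediction/compression connection precise (my paraphrase, not a formula from the slides): with an entropy coder such as arithmetic coding, a token the model assigns probability $p$ can be stored in about $-\log_2 p$ bits, so the better the model predicts the text, the smaller the compressed file:

$$
\text{bits}(w_t) \approx -\log_2 p_\theta(w_t \mid w_{<t}), \qquad \text{compressed size} \approx \sum_t -\log_2 p_\theta(w_t \mid w_{<t})
$$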


training stages

  • stage 1: pre-training
  • expensive
  • produces the base model: an internet-document generator
  • it's about knowledge
  • trained on internet documents

  • stage 2: fine-tuning
  • cheaper
  • produces the assistant model
  • it's about alignment
  • trained on Q&A documents
  • training on a smaller set of high-quality conversations (questions and answers); labeling instructions are written to specify how the assistant should behave (sketched after this list)
  • focus on quality, not quantity
  • stage 3 (optional)
  • uses comparison labels (it is often easier to compare candidate answers than to write one from scratch)
  • reinforcement learning from human feedback (RLHF)
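
A hedged sketch of how the two main stages differ (names like `tokenize`, `train_step`, and the chat template are placeholders, not a real training framework): the objective is the same next-token prediction in both stages; what changes is the data, from bulk internet documents to a much smaller set of carefully written conversations.

```python
def tokenize(text):
    return text.split()  # toy tokenizer; real models use subword tokenizers

# Stage 1 data: large amounts of raw internet documents (this is where knowledge comes from).
pretraining_batch = [
    "The Eiffel Tower is a wrought-iron lattice tower in Paris ...",
    "def quicksort(arr): ...",
]

# Stage 2 data: far fewer, high-quality conversations written by labelers following
# labeling instructions (this is where the assistant behaviour / alignment comes from).
finetuning_batch = [
    {"user": "What is in the parameters file?",
     "assistant": "It stores the learned weights of the neural network ..."},
]

def format_conversation(example):
    # Render a conversation as one token stream; the exact chat template varies by model.
    return f"<user> {example['user']} <assistant> {example['assistant']}"

def train_step(model, tokens):
    # In a real trainer: predict each next token, compute cross-entropy loss, backpropagate.
    pass

model = object()  # placeholder for the actual network

for doc in pretraining_batch:   # expensive: trillions of tokens, thousands of GPUs, weeks
    train_step(model, tokenize(doc))

for conv in finetuning_batch:   # cheap by comparison: ~100k conversations, roughly a day
    train_step(model, tokenize(format_conversation(conv)))
```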


  • labeling is a human-machine collaboration


  • ranking of current LLMs (leaderboard): proprietary models lead, with open-weights models such as Llama 2 behind them


LLM scaling laws:

  • more data (D) and more parameters (N) predictably give a better model; see the formula below
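
One commonly used parametric form of this relationship (the one from the Chinchilla scaling-law paper; the slide may plot a different fit): the loss falls off as a power law in both the parameter count $N$ and the number of training tokens $D$,

$$
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
$$

where $E$ is the irreducible loss and $A$, $B$, $\alpha$, $\beta$ are constants fitted to training runs; so far the trend shows no sign of topping out, which is why simply scaling up keeps working.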


  • multimodality and tool use: LLMs such as ChatGPT can now use external tools to help answer questions (a browser, a calculator, a Python interpreter) and are starting to work with images and speech; a tool-use sketch follows below
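
A hedged illustration of the tool-use loop (not ChatGPT's actual implementation; `fake_llm` and `calculator` are made-up stand-ins): instead of answering directly, the model can emit a tool request, the system runs the tool, and the result is appended to the context so the model can finish its answer.

```python
def fake_llm(context):
    """Stand-in for the language model. It 'decides' to use the calculator whenever the
    question contains digits, and answers once a tool result is available in the context."""
    if "TOOL_RESULT:" in context:
        result = context.split("TOOL_RESULT:")[-1].strip()
        return {"type": "answer", "text": f"The value is {result}."}
    if any(ch.isdigit() for ch in context):
        return {"type": "tool", "name": "calculator", "input": "130 * 540"}
    return {"type": "answer", "text": "No tool needed."}

def calculator(expression):
    return str(eval(expression, {"__builtins__": {}}))  # toy calculator, fine for this sketch

def run(question):
    context = question
    while True:
        step = fake_llm(context)
        if step["type"] == "tool":
            tool_output = calculator(step["input"])
            context += f"\nTOOL_RESULT: {tool_output}"  # feed the tool output back to the model
        else:
            return step["text"]

print(run("What is 130 * 540?"))  # -> The value is 70200.
```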
future directions of development in LLMs

give LLMs a system 2 ability


  • LLMs currently only have system 1 (fast, instinctive responses)
  • the goal is to let them convert time into accuracy: think longer, as in deliberate system 2 reasoning, to produce better answers

self-improvement


  • in narrow domains with a clear reward signal (as AlphaGo had in Go), self-improvement is possible

customization

turning LLMs into experts in specific domains (e.g. custom GPTs)

future of LLMs

the LLM as the kernel of a new kind of operating system, orchestrating tools, memory, and other resources
