Customizing LLMs
TechFrontiers AI Meetup, Nov 2, 2023
Jim Steele
Customizing Large Language Models
LLMs let humans efficiently interact with, and generate
content from, essentially “all” annotated, publicly
available human knowledge.
But what about incorporating proprietary, domain-specific,
or private data into these models?
[Diagram: a “Custom knowledge” cloud, marked with a “?”, alongside the Corpus of Human Knowledge (“The Internet”)]
Three custom options for LLMs
[Diagram: “Custom knowledge” and the Corpus of Human Knowledge (“The Internet”), with three paths:]
● Train a custom LLM
● Tune a general-purpose LLM
● Prompt a general-purpose LLM
Custom options for LLMs (1 of 3)
[Diagram as above, here highlighting “Train a custom LLM”]
Training requires immense resources, available only to a few
organizations such as OpenAI, Google, and Meta. (The resulting models are called “foundation models.”)
Closed models: ChatGPT/GPT-4 (OpenAI), Bard/PaLM 2 (Google), Claude 2 (Anthropic), …
Open models: LLaMA 2 (Meta), …
Custom options for LLMs (2 of 3)
[Diagram as above, here highlighting “Tune a general-purpose LLM”]
Fine-tuning can be done post-training to incorporate or
emphasize custom knowledge.
These techniques are used in foundation models already!
Fine-tuning for LLMs (1 of 2)
Without tuning, an LLM can misread the intended request, e.g.:
Prompt: “Write an essay about Alexander Hamilton.”
LLM: “Your essay should be at least five pages, double-spaced, and include at
least two citations.”
source
Supervised Fine-Tuning (SFT): corrects gross errors so
outputs better match expected use cases. Quality examples
are prioritized over quantity; numerous reports show that
high-quality training data improves final model performance.
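At a high level, SFT is just continued training on curated prompt/response pairs. A minimal sketch, assuming a Hugging Face causal LM (the model name and the training pair are placeholders, not the setup of any particular foundation model):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# A small set of curated, high-quality examples (hypothetical).
pairs = [("Write an essay about Alexander Hamilton.",
          "Alexander Hamilton was a founding father of the United States...")]

model.train()
for prompt, response in pairs:
    batch = tokenizer(prompt + "\n" + response, return_tensors="pt")
    # Labels equal the inputs for causal-LM loss, so the model learns to
    # emit the expected response rather than, say, assignment instructions.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()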
Fine-tuning for LLMs (2 of 2)
Prompt: What is Yann LeCun an expert in?
Possible LLM results:
● Yann has many publications in artificial intelligence, computer
vision, and mobile robotics.
● Yann has worked in artificial intelligence, machine learning, and
mobile robotics.
● Yann has researched artificial intelligence, computer vision, and
computational neuroscience.
Reinforcement Learning from Human Feedback (RLHF):
corrects nuanced errors so responses better match
expectations. Candidate outputs are collected and annotators
select their preferred one; this preference data is used to
train a reward model focused on helpfulness and safety.
source
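The reward-model step can be sketched compactly. A minimal sketch, assuming pairwise preference data where annotators marked one response “chosen” and the other “rejected” (the scoring model itself is left abstract):

import torch
import torch.nn.functional as F

def reward_loss(score_chosen, score_rejected):
    # Bradley-Terry-style objective: the annotator-preferred response
    # should score higher than the rejected one.
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Example: scores for two preference pairs from a (hypothetical) reward model.
chosen = torch.tensor([1.2, 0.3])
rejected = torch.tensor([0.4, 0.9])
print(reward_loss(chosen, rejected))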
Ways for the Rest of Us to
Customize Pre-trained LLMs
Parameter-efficient fine-tuning (PEFT): e.g., adapter
modules, prompt tuning, and sparse-update methods; provides
good accuracy at much lower compute cost
Few-shot In-context Learning (ICL): feed a small number of
training examples as part of the input (computationally
costly, since the examples are reprocessed with every prompt)
Custom Fine-tuning with LoRA
Observation: customization often changes only a
small subset of the original LLM parameters.
LoRA = Low-Rank Adaptation of Large
Language Models
Fine-tune a low-rank update to the weights, then
add it to the frozen pretrained weights
(W' = W + BA, with A and B of low rank).
Example: results showing LLaMA-2
can learn ViGGO (video-game vernacular)
Source
[Figure: LoRA reparameterization diagram, from the original LoRA research paper]
Parameter-efficient fine-tuning example
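A minimal sketch of LoRA in practice, assuming the Hugging Face peft library (the model name and target modules are illustrative):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
# Attach low-rank adapters (W' = W + BA, rank r) to the attention
# projections; the original pretrained weights stay frozen.
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of parameters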
An LLM can modify its behavior
based on previous prompts.
Exhibiting chain-of-thought
reasoning in prompts produces
better responses.
Few-shot In-context Learning example
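A minimal sketch of a few-shot prompt, here for a hypothetical sentiment-labeling task; the examples ride along in the input and no weights change:

# Few-shot examples are placed directly in the prompt.
examples = [
    ("Review: Great battery life.", "Sentiment: positive"),
    ("Review: Screen cracked within a week.", "Sentiment: negative"),
]
query = "Review: Fast shipping and works as advertised."
prompt = "\n\n".join(f"{x}\n{y}" for x, y in examples)
prompt += f"\n\n{query}\nSentiment:"
# `prompt` can now be sent to any general-purpose LLM.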
Custom options for LLMs (3 of 3)
[Diagram as above, here highlighting “Prompt a general-purpose LLM”]
Modify prompts to incorporate or emphasize custom
knowledge.
These techniques are used in foundation models already too!
How do LLMs have a conversation?
Research on LLM long-term memory has not yet been put into practice.
Currently, chat LLM programs feed the previous queries and
responses into each new prompt to provide a sense of
conversation (see the sketch below). Note:
● LLM input token limit: earlier parts of the
conversation are eventually forgotten
● Transformers do not “change their minds”: responses are
auto-regressive, so errors accumulate
● Self-attention helps: more information in the prompt leads to
more specialized responses
LLM input token limit:
ChatGPT allows ~4k tokens,
GPT-4 allows up to ~32k tokens,
Claude 2 allows ~100k tokens (~75k words)
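A minimal sketch of this replay trick; complete_fn stands in for any LLM completion API (the function and names are hypothetical):

history = []

def chat(user_message, complete_fn, max_turns=20):
    history.append(("User", user_message))
    # Replay recent turns, truncating the oldest so the prompt
    # stays under the model's input token limit.
    transcript = "\n".join(f"{role}: {text}" for role, text in history[-max_turns:])
    reply = complete_fn(transcript + "\nAssistant:")
    history.append(("Assistant", reply))
    return reply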
Prompt expansion with Retrieval
Augmented Generation (RAG)
Rather than passing large amounts of costly text with every
prompt, RAG adds an information-retrieval mechanism that
augments the user prompt with relevant context otherwise
unavailable to the LLM, e.g.:
● real-time context (weather, location, etc.)
● user-specific information (website orders,
status, etc.)
● relevant factual information (documents not in the LLM
training data, either private or updated after
the LLM was trained)
This is accomplished by building a vector-embedding
index over the input data using,
e.g., LangChain or LlamaIndex (see the sketch below and the appendix)
source
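A minimal sketch of the prompt-expansion step, with retrieve_fn standing in for a vector-index lookup and complete_fn for an LLM API (both hypothetical; the appendix shows a concrete LlamaIndex version):

def rag_answer(question, retrieve_fn, complete_fn, top_k=3):
    # Fetch the most relevant chunks from the vector index...
    context = "\n".join(retrieve_fn(question, top_k=top_k))
    # ...then prepend them so the LLM can answer from that context.
    prompt = (f"Answer using only this context:\n{context}\n\n"
              f"Question: {question}")
    return complete_fn(prompt)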
Summary
Beyond all-out retraining, there are two
main techniques for improving LLM output
relevance, summarized as [ref]:
● Fine-tune the model for form (e.g., LoRA)
● Expand the prompt for facts (e.g., RAG)
Also, more LLMs are providing tools for
custom data (see demo: NotebookLM)
Source
Additional
References
Personal LLAMA
Getting started with LlamaIndex
LlamaIndex Origin
Appendix
Example of Vector Embeddings
with LlamaIndex
from llama_index import VectorStoreIndex, SimpleDirectoryReader

# Load all documents from a local folder.
documents = SimpleDirectoryReader('my-directory-of-docs').load_data()
# Embed them into a vector index and expose a query interface.
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
# Answers are generated by the LLM from the retrieved document chunks.
results = query_engine.query("What is DeepHaiku? Be brief")
print(results)
source
