Customizing LLMs
TechFrontiers AI Meetup, Nov 2, 2023
Jim Steele
Customizing Large Language Models
LLMs let humans efficiently interact with, and generate
content from, essentially “all” annotated, publicly
available human knowledge.
But what about incorporating proprietary, domain-specific,
or private data into these models?
[Diagram: a “Custom knowledge” cloud, marked with a “?”, alongside the Corpus of Human Knowledge (“The Internet”)]
Three custom options for LLMs
[Diagram: “Custom knowledge” and the Corpus of Human Knowledge (“The Internet”), with three paths:]
● Train a custom LLM
● Tune a general-purpose LLM
● Prompt a general-purpose LLM
Custom options for LLMs (1 of 3)
[Diagram as above, here highlighting “Train a custom LLM”]
Training requires immense resources, available only to a few
organizations such as OpenAI, Google, and Meta. (The resulting models are called “foundation models.”)
Closed models: ChatGPT/GPT-4 (OpenAI), Bard/PaLM 2 (Google), Claude 2 (Anthropic), …
Open models: LLaMA 2 (Meta), …
Custom options for LLMs (2 of 3)
[Diagram as above, here highlighting “Tune a general-purpose LLM”]
Fine-tuning can be done post-training to incorporate or
emphasize custom knowledge.
These techniques are used in foundation models already!
Fine-tuning for LLMs (1 of 2)
Without tuning, an LLM can misread the intended request, e.g.:
Prompt: “Write an essay about Alexander Hamilton.”
LLM: “Your essay should be at least five pages, double-spaced, and include at
least two citations.”
source
Supervised Fine-Tuning (SFT): corrects gross errors so
outputs better match expected use cases. Quality examples
are prioritized over quantity; numerous reports show that
high-quality training data improves final model performance.
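At a high level, SFT is just continued training on curated prompt/response pairs. A minimal sketch, assuming a Hugging Face causal LM (the model name and the training pair are placeholders, not the setup of any particular foundation model):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# A small set of curated, high-quality examples (hypothetical).
pairs = [("Write an essay about Alexander Hamilton.",
          "Alexander Hamilton was a founding father of the United States...")]

model.train()
for prompt, response in pairs:
    batch = tokenizer(prompt + "\n" + response, return_tensors="pt")
    # Labels equal the inputs for causal-LM loss, so the model learns to
    # emit the expected response rather than, say, assignment instructions.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()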
Fine-tuning for LLMs (2 of 2)
Prompt: What is Yann LeCun an expert in?
Possible LLM results:
● Yann has many publications in artificial intelligence, computer
vision, and mobile robotics.
● Yann has worked in artificial intelligence, machine learning, and
mobile robotics.
● Yann has researched artificial intelligence, computer vision, and
computational neuroscience.
Reinforcement Learning from Human Feedback (RLHF):
corrects nuanced errors so responses better match
expectations. Candidate outputs are collected and annotators
select their preferred one; this preference data is used to
train a reward model focused on helpfulness and safety.
source
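The reward-model step can be sketched compactly. A minimal sketch, assuming pairwise preference data where annotators marked one response “chosen” and the other “rejected” (the scoring model itself is left abstract):

import torch
import torch.nn.functional as F

def reward_loss(score_chosen, score_rejected):
    # Bradley-Terry-style objective: the annotator-preferred response
    # should score higher than the rejected one.
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Example: scores for two preference pairs from a (hypothetical) reward model.
chosen = torch.tensor([1.2, 0.3])
rejected = torch.tensor([0.4, 0.9])
print(reward_loss(chosen, rejected))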
Ways for the Rest of Us to
Customize Pre-trained LLMs
Parameter-efficient fine-tuning (PEFT): e.g., adapter
modules, prompt tuning, and sparse-update methods; provides
good accuracy at much lower compute cost
Few-shot In-context Learning (ICL): feed a small number of
training examples as part of the input (computationally
costly, since the examples are reprocessed with every prompt)
Custom Fine-tuning with LoRA
Observation: customization often changes only a
small subset of the original LLM parameters.
LoRA = Low-Rank Adaptation of Large
Language Models
Fine-tune a low-rank update to the weights, then
add it to the frozen pretrained weights
(W' = W + BA, with A and B of low rank).
Example: results showing LLaMA-2
can learn ViGGO (video-game vernacular)
Source
[Figure: LoRA reparameterization diagram, from the original LoRA research paper]
Parameter-efficient fine-tuning example
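A minimal sketch of LoRA in practice, assuming the Hugging Face peft library (the model name and target modules are illustrative):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
# Attach low-rank adapters (W' = W + BA, rank r) to the attention
# projections; the original pretrained weights stay frozen.
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of parameters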
An LLM can modify its behavior
based on previous prompts.
Exhibiting chain-of-thought
reasoning in prompts produces
better responses.
Few-shot In-context Learning example
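A minimal sketch of a few-shot prompt, here for a hypothetical sentiment-labeling task; the examples ride along in the input and no weights change:

# Few-shot examples are placed directly in the prompt.
examples = [
    ("Review: Great battery life.", "Sentiment: positive"),
    ("Review: Screen cracked within a week.", "Sentiment: negative"),
]
query = "Review: Fast shipping and works as advertised."
prompt = "\n\n".join(f"{x}\n{y}" for x, y in examples)
prompt += f"\n\n{query}\nSentiment:"
# `prompt` can now be sent to any general-purpose LLM.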
Custom options for LLMs (3 of 3)
[Diagram as above, here highlighting “Prompt a general-purpose LLM”]
Modify prompts to incorporate or emphasize custom
knowledge.
These techniques are used in foundation models already too!
How do LLMs have a conversation?
Research on LLM long-term memory has not yet been put into practice.
Currently, chat LLM programs feed the previous queries and
responses into each new prompt to provide a sense of
conversation (see the sketch below). Note:
● LLM input token limit: earlier parts of the
conversation are eventually forgotten
● Transformers do not “change their minds”: responses are
auto-regressive, so errors accumulate
● Self-attention helps: more information in the prompt leads to
more specialized responses
LLM input token limit:
ChatGPT allows ~4k tokens,
GPT-4 allows up to ~32k tokens,
Claude 2 allows ~100k tokens (~75k words)
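A minimal sketch of this replay trick; complete_fn stands in for any LLM completion API (the function and names are hypothetical):

history = []

def chat(user_message, complete_fn, max_turns=20):
    history.append(("User", user_message))
    # Replay recent turns, truncating the oldest so the prompt
    # stays under the model's input token limit.
    transcript = "\n".join(f"{role}: {text}" for role, text in history[-max_turns:])
    reply = complete_fn(transcript + "\nAssistant:")
    history.append(("Assistant", reply))
    return reply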
Prompt expansion with Retrieval
Augmented Generation (RAG)
Rather than passing large amounts of costly text with every
prompt, RAG adds an information-retrieval mechanism that
augments the user prompt with relevant context otherwise
unavailable to the LLM, e.g.:
● real-time context (weather, location, etc.)
● user-specific information (website orders,
status, etc.)
● relevant factual information (documents not in the LLM
training data, either private or updated after
the LLM was trained)
This is accomplished by building a vector-embedding
index over the input data using,
e.g., LangChain or LlamaIndex (see the sketch below and the appendix)
source
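A minimal sketch of the prompt-expansion step, with retrieve_fn standing in for a vector-index lookup and complete_fn for an LLM API (both hypothetical; the appendix shows a concrete LlamaIndex version):

def rag_answer(question, retrieve_fn, complete_fn, top_k=3):
    # Fetch the most relevant chunks from the vector index...
    context = "\n".join(retrieve_fn(question, top_k=top_k))
    # ...then prepend them so the LLM can answer from that context.
    prompt = (f"Answer using only this context:\n{context}\n\n"
              f"Question: {question}")
    return complete_fn(prompt)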
Summary
Beyond all-out retraining, there are two
main techniques for improving LLM output
relevance, summarized as [ref]:
● Fine-tune the model for form (e.g., LoRA)
● Expand the prompt for facts (e.g., RAG)
Also, more LLMs are providing tools for
custom data (see demo: NotebookLM)
Source
Additional
References
Personal LLAMA
Getting started with LlamaIndex
LlamaIndex Origin
Appendix
Example of Vector Embeddings
with LlamaIndex
from llama_index import VectorStoreIndex, SimpleDirectoryReader

# Load all documents from a local folder.
documents = SimpleDirectoryReader('my-directory-of-docs').load_data()
# Embed them into a vector index and expose a query interface.
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
# Answers are generated by the LLM from the retrieved document chunks.
results = query_engine.query("What is DeepHaiku? Be brief")
print(results)
source
