From the course: AI Orchestration: Validation and User Feedback and Performance Metrics
BLEURT
- [Instructor] We'll now move on to the BLEURT technique for model-based evaluation. BLEURT is a learning-based evaluation metric that leverages the power of the BERT model, where BERT stands for Bidirectional Encoder Representations from Transformers. BERT is a pre-trained language model, and it can be used to assess text similarity with a deeper understanding of semantics. Unlike traditional metrics that rely on exact word matches, BLEURT considers both lexical and contextual alignment between the generated text and the reference text. Here is a summary of what BLEURT is. It is a learning-based evaluation metric that leverages BERT, a pre-trained language model, to assess text similarity with a deeper understanding of semantics. Unlike traditional metrics, BLEURT evaluates both lexical and contextual alignment. Lexical alignment, because it checks whether the words in the generated text and the reference text are similar, and contextual alignment, because it looks beyond just words and assesses if the…
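To make the contrast with exact word matching concrete, here is a minimal Python sketch. The `unigram_f1` function is a hypothetical baseline written for this illustration, not something from the course. The `bleurt_scores` helper assumes the open-source `bleurt` package is installed and that `checkpoint` points to a downloaded BLEURT checkpoint directory; it is defined but not called here, so the baseline runs without it.

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """Exact-word-match baseline: F1 over shared lowercase tokens."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    # Counter intersection counts each shared word at most as often
    # as it appears in both texts.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def bleurt_scores(references, candidates, checkpoint):
    """Score candidates with BLEURT (assumes `bleurt` and a checkpoint exist)."""
    from bleurt import score  # lazy import: only needed when this is called

    scorer = score.BleurtScorer(checkpoint)
    return scorer.score(references=references, candidates=candidates)


reference = "the doctor cured the patient"
paraphrase = "a physician healed the patient"  # same meaning, different words
reordered = "the patient cured the doctor"     # same words, different meaning

baseline_paraphrase = unigram_f1(reference, paraphrase)
baseline_reordered = unigram_f1(reference, reordered)
print(f"paraphrase: {baseline_paraphrase:.2f}, reordered: {baseline_reordered:.2f}")
```

The word-overlap baseline rates the reordered sentence as a perfect match and the paraphrase much lower, even though only the paraphrase preserves the meaning. This is the gap that a learned, context-aware metric such as BLEURT is intended to narrow; calling `bleurt_scores` on the same pairs with a real checkpoint would show how it ranks them in practice.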
Contents
- Evaluating models using metrics (1m 50s)
- Evaluating regression models (2m 48s)
- Evaluating classification models (4m 8s)
- Evaluating clustering models (1m 52s)
- Accuracy precision recall (5m 45s)
- Evaluating large language models (LLMs) (5m 3s)
- Human evaluation (2m 12s)
- Statistical methods for LLM evaluation (2m 28s)
- ROUGE scores (3m 29s)
- BLEU score (1m 13s)
- METEOR score (57s)
- Perplexity (2m 48s)
- Model-based methods for LLM evaluation (1m 53s)
- Natural language inference (3m 22s)
- BLEURT (3m 57s)
- Judge models (4m 16s)
- LLM evaluation (10m 11s)