From the course: AWS Certified AI Practitioner (AIF-C01) Cert Prep


Foundation model performance metrics and evaluation


- There are a number of different performance metrics that can be used to evaluate how a foundation model is doing, especially when it comes to accuracy. We're going to take a look at a couple of these, and brace yourselves, there are going to be some interesting acronyms along the way. We'll start with ROUGE. That stands for Recall-Oriented Understudy for Gisting Evaluation, but ROUGE is a lot easier to say. This is a set of metrics for evaluating summarization and machine translation output, primarily from LLMs. There are a couple of different components here: ROUGE-N, which works on N-grams, and ROUGE-L, which uses the longest common subsequence. We'll take a look at both of these coming up. The purpose of these metrics is to measure the overlap between the generated text and the reference texts, so a higher ROUGE score reflects a better-quality summary. The definition of an N-gram is a contiguous sequence of N items, which are usually words…
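The overlap idea behind ROUGE-N and ROUGE-L can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions (whitespace tokenization, recall only, no stemming), not the scoring used by any official library; the function names here are our own.

```python
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams in a token list, with counts.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    # ROUGE-N recall: clipped n-gram overlap divided by the number
    # of n-grams in the reference text.
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

def lcs_len(a, b):
    # Longest common subsequence length via dynamic programming.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l_recall(candidate, reference):
    # ROUGE-L recall: LCS length divided by reference length.
    ref = reference.split()
    return lcs_len(candidate.split(), ref) / len(ref) if ref else 0.0

reference = "the cat sat on the mat"
candidate = "the cat is on the mat"
print(rouge_n_recall(candidate, reference, n=1))  # ≈ 0.83 (5 of 6 reference unigrams matched)
print(rouge_l_recall(candidate, reference))       # ≈ 0.83 (LCS "the cat on the mat" has length 5)
```

Note how the two variants reward different things: ROUGE-N credits any matching word chunks, while ROUGE-L credits matches that also appear in the same order as the reference.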