From the course: GPT Foundations: Building AI-Powered Apps

What are large language models (LLMs)?

- [Host] So what is ChatGPT? It's a large language model, or LLM for short. But what does that mean? A large language model is a machine learning model, or ML model for short, that has been trained on a large amount of data and requires powerful computers to run. So you might ask, is ChatGPT the only large language model, or are there others out there? The answer is yes, there are many large language models out there, both from big providers and from the open-source community. Three popular large language models are GPT, which is from OpenAI, Claude, which is from Anthropic, and Gemini, which is from Google. Now, over the years there have been dozens and even hundreds of large language models released, and here's an overview from March 2023. Since then, there have been thousands of new models released, with some coming out every day.

Now let's go through a brief history of large language models. The origin of large language models traces back to 2017, when Google introduced the transformer architecture. On the graph here, the numbers underneath each model represent its size. For example, GPT-2 has 1.5 billion parameters, and generally, the larger the model, the more powerful it is. Many models have been released since then. GPT-3 came out in 2020, which marked the beginning of the true LLM era, and now we're here at various versions of GPT.

In the last two years, the techniques we use for training models have changed, and models getting bigger doesn't mean they're necessarily getting better. As we can see here, Google's Gemma 3 model, with 27 billion parameters, outperforms DeepSeek's V3 model, which has over 600 billion parameters. So a model getting bigger doesn't necessarily mean it's getting better, although oftentimes that's still true. For example, we have this concept called the LLM Arena Pareto frontier. What this demonstrates is a model's capability, as measured by human feedback on the LLM Arena, compared to its cost per token. As you can see here, typically the more expensive a model is, the better it is, but sometimes you don't need the most expensive model for your task. That's where the trade-offs occur.

Now, over the past few years, the progress has been immense. As we can see here, some of the earliest and biggest large language models, like Gopher at 280 billion parameters or PaLM at 540 billion, are now outperformed by much smaller models, many of which are open source. Every day and every month, the models keep getting better, so make sure to keep up with the news to see what's changing.

Now, you might be wondering: models are getting bigger and better, but how do they actually work? Large language models work by predicting the next word. It's a very simple but powerful concept. For example, a large language model might get the input "the cat wore a" and predict "hat," or it might also predict "sweater." Or if you ask it, "What is two plus two?" the LLM would answer, "Two plus two equals four." So large language models, while being generative AI models, work in a fairly simple way.
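To make this concrete, here's a minimal sketch of next-word prediction in Python. It uses the small open-source GPT-2 model through the Hugging Face transformers library, purely as an illustration (GPT-2 is not the model behind ChatGPT), and assumes the transformers and torch packages are installed. It prints the most likely next tokens for the prompt "The cat wore a".

```python
# Minimal sketch of next-word prediction, assuming `transformers`
# and `torch` are installed. GPT-2 is used only because it is small
# and open source; larger models work the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The cat wore a"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# The logits at the last position score every possible next token.
next_token_logits = logits[0, -1]
probs = torch.softmax(next_token_logits, dim=-1)

# Show the five most likely next tokens and their probabilities.
top = torch.topk(probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()]):>12}  p={prob:.3f}")
```

Running this typically surfaces plausible continuations like "hat" near the top of the list, which is exactly the behavior described above: the model assigns a probability to every possible next token and the most likely ones win.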
Now, how do we train these models, and where do we get the data? To train a large language model, you provide billions or even trillions of words and other pieces of information. From this information, the model predicts the next word and then compares its predictions to real sentences. Most of this data comes from the internet, and common data sources are Reddit, Wikipedia, and Common Crawl. You can think of Common Crawl as an archive of the internet, with a new crawl captured each month. Now, up until 2023, most models were trained purely on text data, but since then we've added multimodal capabilities, with images, video, and audio now part of the training sets. That's why models like GPT can generate nice images for us.

Now, there are some things we should keep in mind. The internet has some nasty stuff, so we need to make sure we clean the data before training our model. Otherwise, toxic content or biases might make it into our training data, which would then make it into our large language model.

Now, when you ask a large language model a question, what is its answer? Well, it's a prediction, meaning it's a guess, and not necessarily the truth. While large language models might sound incredibly confident, friendly, or helpful, we need to make sure we double-check their responses.

Now, you might be wondering, what's the difference between ChatGPT and these GPT models I've been mentioning? You can think of ChatGPT as the interface, with many models that can power it. So if we click here, we can see all the different models while we're in ChatGPT. Originally, ChatGPT was the model, but it has since evolved into a system that runs a variety of models with different purposes. There are many other large language models out there that work in similar ways, but today we'll be using ChatGPT and learning how to make the most of it.
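Since ChatGPT is an interface that many interchangeable models can power, the same idea carries over when you build apps: you pick the model by name in an API call, and the surrounding code stays the same. Here's a minimal sketch using OpenAI's Python SDK; it assumes the openai package is installed and an OPENAI_API_KEY environment variable is set, and the model name below is just an illustrative example.

```python
# Minimal sketch of calling a GPT model through the OpenAI API.
# Assumes: `pip install openai` and OPENAI_API_KEY set in the
# environment. The model name is an example; swap in whichever
# model fits your task and budget.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # only this name changes when you switch models
    messages=[{"role": "user", "content": "What is two plus two?"}],
)

print(response.choices[0].message.content)
```

This mirrors the trade-off from the Pareto frontier discussion: a cheaper, smaller model is often enough for a simple task, and switching to a more capable one is a one-line change.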
