From the course: Build with AI: Building a Project with the ChatGPT API

Explore pricing and rate limits

- [Narrator] Now that you understand what tokens are, let's talk about how they impact your wallet and your app's performance. When you're building with OpenAI's API, understanding how usage is calculated can help you make smarter decisions about how your app runs, scales, and stays within budget.

First up, pricing. OpenAI charges based on the number of tokens processed. That includes both the input tokens you send and the output tokens the model returns. Different models have different pricing tiers, so let's take a quick look at the current rates. You can find the latest pricing at openai.com/api/pricing, but as of this recording, here's a basic breakdown: GPT is more powerful but also more expensive, while GPT nano is fast and the most cost-effective for low-latency tasks. You can continue to scroll down this page to look at the different model types and how each is processed and priced. For example, there are different charges when you're fine-tuning your models, and the APIs themselves often have different price points. So this is a great page to be familiar with as you're planning your application.

I've navigated to the Jupyter Notebook for this video. Let's say you send in a prompt with 200 input tokens and receive 300 tokens back, for a total of 500 tokens. Here's the formula for the input cost and the formula for the output cost, and let's calculate the total cost. For those 500 tokens, here is the estimated cost; a worked version of this calculation appears in the first sketch after this transcript. This might seem small, but if your app is serving thousands of users or running frequent jobs, it adds up. Optimizing token usage, as we've covered, is one way to manage costs.

Now, let's talk about rate limits. OpenAI applies rate limits per organization, and they vary by model. These limits are defined in three ways: requests per minute (RPM), tokens per minute (TPM), and concurrent requests. For example, you might be allowed 3,500 RPM and 350,000 TPM for GPT-3.5 Turbo, but only 500 RPM and 50,000 TPM for GPT-4. These limits help ensure fair access across users, but they also affect how your application performs under load. If you exceed your limit, you'll get a 429 "rate limit exceeded" error. To avoid that, you can throttle requests in your code, or queue them using a retry mechanism, as in the second sketch after this transcript. In the next lesson, we'll dive even deeper, breaking down best practices that will help you get the most from the OpenAI API, while keeping your app fast and efficient.
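The cost calculation from the notebook boils down to (input tokens × input rate) + (output tokens × output rate). Here's a minimal Python sketch of that formula using the 200-in / 300-out example from the video. The per-1,000-token rates below are placeholders, not OpenAI's actual prices; substitute the current numbers from openai.com/api/pricing, which may quote rates per 1M tokens instead.

```python
# Estimate the cost of a single API call from its token counts.
# The rates below are hypothetical placeholders -- check the pricing
# page for the rates that apply to your chosen model.
INPUT_RATE_PER_1K = 0.0005   # placeholder: $ per 1,000 input tokens
OUTPUT_RATE_PER_1K = 0.0015  # placeholder: $ per 1,000 output tokens

input_tokens = 200   # tokens in the prompt you send
output_tokens = 300  # tokens in the model's response

input_cost = input_tokens / 1000 * INPUT_RATE_PER_1K
output_cost = output_tokens / 1000 * OUTPUT_RATE_PER_1K
total_cost = input_cost + output_cost

print(f"Input cost:  ${input_cost:.6f}")
print(f"Output cost: ${output_cost:.6f}")
print(f"Total for {input_tokens + output_tokens} tokens: ${total_cost:.6f}")
```

To project a budget, multiply the per-call total by your expected request volume: even a fraction of a cent per call becomes real money at thousands of calls per day.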
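And here is one way to handle the 429 error described above. This is a minimal retry sketch, assuming the official openai Python package with its v1-style client; the model name, retry count, and backoff schedule are illustrative choices, not the course's code. It retries a chat completion with exponential backoff plus jitter whenever a RateLimitError (HTTP 429) is raised.

```python
import random
import time

from openai import OpenAI, RateLimitError  # RateLimitError corresponds to HTTP 429

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def chat_with_retry(messages, model="gpt-3.5-turbo", max_retries=5):
    """Call the Chat Completions API, backing off exponentially on 429s."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries -- surface the 429 to the caller
            # Wait 1s, 2s, 4s, ... plus random jitter so queued requests
            # don't all retry at the same instant.
            time.sleep(2 ** attempt + random.random())


response = chat_with_retry([{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)
```

Note that recent versions of the client also perform a few automatic retries of their own; writing the loop yourself, as here, makes the policy visible and lets you tune it to your app's rate limits.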
