From the course: Build with AI: Building a Project with the ChatGPT API

Convert text and speech with the Transcriptions API

- [Instructor] So far you've learned how to make your app talk, but what if it could also listen? Let's explore OpenAI's Transcriptions API, powered by Whisper, to convert spoken audio into text. Whether you're building a voice-enabled assistant, transcribing meeting recordings, or enabling voice search, this API makes it easy to turn audio into usable, searchable content. Let me show you how it works. I've navigated to the Jupyter notebook. The first few lines you're familiar with: importing the necessary libraries. In this section here, I'm reading my API key from a local environment variable and setting up the OpenAI client with that key. Here in section 2 is where we upload and transcribe an audio file. On this line, I'm setting the OS path directory name and the name of the file, in this case speech.mp3. Let's open and listen to that file.

- [AI] Welcome to your AI-powered app. Let's get started.

- [Instructor] I'm assigning that file to this audio file variable and opening the file. Here, I'm using the client to call the Transcriptions API's create function, passing in that file and selecting the model, which is gpt-4o-transcribe, and the response is stored in this transcription variable. On this line, I'm printing the text from that transcription variable, and as you can see, I've already executed this code and the output is what we expect: "Welcome to your AI-powered app. Let's get started." Now let's try it with a different language. The code is very similar. This time I'm pointing to this LinkedIn Learning file that is in Italian. Let me play this file for you.

(AI speaking in Italian)

- [Instructor] Now let's see if the API can transcribe Italian. I'm opening that file and storing it in this audio file variable; the code is very similar to before. I'm using the client to call the Transcriptions API's create function with the gpt-4o-transcribe model, passing in the audio file, and the response is stored in this transcription variable.
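The notebook steps described above can be sketched roughly as follows. This is a minimal sketch, assuming the OpenAI Python SDK is installed, the OPENAI_API_KEY environment variable is set, and speech.mp3 exists locally; the helper name `transcribe` is my own, not from the course notebook.

```python
# Sketch of the notebook's transcription step (OpenAI Python SDK).
# Assumptions: OPENAI_API_KEY is set in the environment and speech.mp3
# exists locally; the helper name transcribe() is illustrative.
import os


def transcribe(path: str, model: str = "gpt-4o-transcribe") -> str:
    # Imported here so the sketch can be loaded even without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        # Call the Transcriptions API's create function with the model
        # and the opened audio file; the response carries a .text field.
        transcription = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
        )
    return transcription.text


if __name__ == "__main__":
    # Build the path relative to this script's directory, as in the notebook.
    audio_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "speech.mp3")
    print(transcribe(audio_path))
```

The same helper works for any audio file the API supports, since the model is passed as a parameter.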
And here I'm printing the text. As you can see down here, it transcribed that file and printed out the Italian. With the Transcriptions API, your app can take in speech and turn it into text automatically, reliably, and in multiple languages. Some common use cases include auto-transcribing customer support calls, enabling voice notes or journals, indexing audio content for search, and supporting multilingual apps with speech input. In the next video, we'll shift gears and explore how to use embeddings to add meaning and structure to your unstructured text.
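For multilingual input, the call is the same: Whisper-family models detect the language automatically, and the API also accepts an optional ISO-639-1 `language` hint. A minimal sketch, assuming the OpenAI Python SDK and an API key in the environment; the file name and helper name here are illustrative, not from the course.

```python
# Sketch: the same Transcriptions API call for non-English audio.
# The model detects the language automatically; an optional ISO-639-1
# "language" hint (e.g. "it" for Italian) can improve accuracy.
# File name and helper name are illustrative, not from the course.
from typing import Optional


def transcribe_audio(path: str, language: Optional[str] = None,
                     model: str = "gpt-4o-transcribe") -> str:
    # Deferred import so the sketch can be loaded without the SDK installed.
    from openai import OpenAI

    client = OpenAI()
    kwargs = {"model": model}
    if language is not None:
        kwargs["language"] = language  # optional hint; omit to auto-detect
    with open(path, "rb") as audio_file:
        return client.audio.transcriptions.create(file=audio_file, **kwargs).text


if __name__ == "__main__":
    # Hypothetical Italian file, with an explicit language hint.
    print(transcribe_audio("italian_speech.mp3", language="it"))
```

Omitting the `language` argument leaves detection entirely to the model, which is what the notebook in the video relies on.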
