From the course: Prompt Engineering with Gemini (2024)

Image recognition and augmentation with Gemini

- [Instructor] Gemini is a multimodal LLM, meaning you can input text, images, or videos. Let's learn how we can leverage images and text together. I'm going to analyze an instruction manual about making a hot beverage. We can find it under our exercise files. Let's go to chapter 04, 0401, and open up this image. I'm going to copy it and paste it into Gemini. I'm also going to grab the prompt, which is under 0401.txt. Let's copy this and paste it in. Now this is called a multimodal prompt because we have both text and an image. So let's go ahead and hit Enter. Now you can see here I've asked how hot the water needs to be and how much we need. I've also asked it to only include information from the image. All right, so based on the image, we need 200 milliliters of water, which looks right, here at the top, and the water should be heated to 92 degrees. So that's pretty good. We got the information extracted from the image. Now, Gemini can be a little overzealous with…
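The demo above uses the Gemini chat interface, but the same text-plus-image prompt can be sketched in code. The following is a minimal illustration, assuming the public Gemini REST API (`generateContent`) rather than anything shown in the course. The model name, the helper names `build_multimodal_request` and `send`, and the image filename used in the note below are all assumptions for illustration, and the prompt wording is a paraphrase of what the instructor describes, not the contents of 0401.txt.

```python
import base64
import json
import urllib.request

# Assumption: the model name is illustrative; the course does not name one.
MODEL = "gemini-1.5-flash"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)


def build_multimodal_request(prompt: str, image_bytes: bytes, mime_type: str) -> dict:
    """Build a request body with one text part and one inline image part.

    Hypothetical helper: the image is base64-encoded and sent inline,
    next to the text, inside a single "contents" entry.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


def send(payload: dict, api_key: str) -> str:
    """POST the payload and return the text of the first candidate.

    Hypothetical helper: requires network access and a valid API key,
    and does no error handling.
    """
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["candidates"][0]["content"]["parts"][0]["text"]
```

To try it, you would read the image bytes from the exercise file (for example a hypothetical `0401.png`), call `build_multimodal_request` with a prompt along the lines of "How hot does the water need to be, and how much do we need? Only include information from the image.", and pass the result to `send` together with your own API key.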
