From the course: Prompt Engineering with Gemini (2024)
Image recognition and augmentation with Gemini
- [Instructor] Gemini is a multimodal LLM, meaning you can input text, images, or videos. Let's learn how we can leverage images and text together. I'm going to analyze an instruction manual about making a hot beverage. We can find it under our exercise files. Let's go to chapter 04, 0401, and open up this image. I'm going to copy it and paste it into Gemini. I'm also going to grab the prompt, which is in 0401.txt. Let's copy this and paste it in. This is called a multimodal prompt because it has both text and an image. So let's go ahead and hit Enter. You can see here that I've asked how hot the water needs to be and how much we need. I've also asked Gemini to only include information from the image. All right, so based on the image, we need 200 milliliters of water, which looks right, here at the top, and the water should be heated to 92 degrees. So that's pretty good. We got the information extracted from the image. Now, Gemini can be a little overzealous with…
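The video pastes the image and prompt into the Gemini chat window. For readers who want to see what a multimodal prompt looks like as data, here is a minimal sketch that packs one text part and one image part into a single request body. It is an illustration, not the course's method. The function name `build_multimodal_request` is hypothetical, the prompt is a paraphrase (the exact contents of 0401.txt are not shown in the transcript), and the field names (`contents`, `parts`, `inline_data`, `mime_type`) assume the JSON shape used by the Gemini REST `generateContent` endpoint. Check them against the current API reference before relying on them. No network call is made.

```python
import base64
import json


def build_multimodal_request(prompt_text: str, image_bytes: bytes,
                             mime_type: str = "image/png") -> dict:
    """Combine a text instruction and an image into one request body.

    Hypothetical helper: the dictionary layout is an assumption based on
    the Gemini REST generateContent request format.
    """
    # Images travel inside JSON as base64 text, not raw bytes.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt_text},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


# Paraphrase of the prompt described in the transcript; the real text of
# 0401.txt is not available here.
PROMPT = (
    "How hot does the water need to be, and how much do we need? "
    "Only include information from the image."
)

if __name__ == "__main__":
    # Stand-in bytes so the sketch runs without the exercise files.
    placeholder_image = b"placeholder image bytes"
    body = build_multimodal_request(PROMPT, placeholder_image)
    print(json.dumps(body, indent=2))
```

The "only include information from the image" instruction lives in the text part, which is what keeps the answer grounded in the manual rather than in the model's general knowledge.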
Contents
- Image recognition and augmentation with Gemini (1m 41s)
- Creative image generation (1m 57s)
- Analyzing a multi-modal document with Gemini (4m 16s)
- Searching and summarizing a YouTube video with Gemini (2m 14s)
- Challenge: Comparing two world wonders (28s)
- Solution: Comparing two world wonders (5m 52s)