Wondering if your AI agent can process images? Not all models support image input, but multimodal models do! Learn how to find image-capable models using Microsoft’s AI Toolkit, test them in the Playground, and ensure your agent handles visual data effectively.

Can You Show Your AI Agent a Picture? Here’s What You Need to Know
If you’re building an AI agent, you might wonder: can it actually process images like screenshots or photos? The quick answer is yes—but only if your model supports it. Let’s break down what’s new and important in the world of multimodal AI agents.
What’s New: Image Input for AI Agents
Image input means sending a non-text file—like a PNG or JPG—into your AI prompt. The model then analyzes or interprets it. This could involve describing the image, extracting text, or even answering questions about a chart.
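Under the hood, most multimodal chat APIs accept an image as a base64-encoded data URL inside the message content. As a minimal sketch (the `image_url` content-part shape follows the common OpenAI-style chat format; your provider's exact schema may differ, so check its docs), here's how a prompt and a PNG can be packaged into one message:

```python
import base64

def build_image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Pair a text prompt with an inline image in one chat message.

    Uses the OpenAI-style "image_url" content part carrying a base64 data
    URL. This shape is an assumption based on common multimodal chat APIs,
    not a specific AI Toolkit interface.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Attach a (placeholder) PNG payload to a prompt
msg = build_image_message("Describe the contents of this image.", b"\x89PNG...")
```

If the model you selected is text-only, a request like this is typically rejected or the image part is ignored, which is exactly why the catalog filter matters.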
However, not all AI models can handle images. Most base language models are text-only. Platforms usually hide image upload options if the model doesn’t support visual data. So, if you don’t see an image upload feature, it’s likely a model limitation, not a bug.
“Not all models support image input—you’ll need a multimodal model specifically built to handle visual data.”
Major Updates: Multimodal Models and How to Find Them
Multimodal models are trained to understand both text and images. Think of them as bilingual—able to “speak” text and visual languages. These models open up exciting possibilities for AI agents that can see and interpret images.
To find these models, Microsoft’s AI Toolkit offers a handy Model Catalog. You can filter models by features like “Image Attachment” to quickly spot which ones support image input.
Here’s a quick way to find image-capable models:
- Open the Model Catalog in the AI Toolkit panel inside Visual Studio Code.
- Use the Feature filter near the search bar.
- Select “Image Attachment” to see all compatible models.
- Test your chosen model in the Playground before integrating it.
Why Testing Matters Before You Build
Before wiring an image-capable model into your AI agent, test it solo. Upload an image in the Playground and try prompts like:
- “Describe the contents of this image.”
- “Summarize what’s happening in this screenshot.”
If the model supports images, you’ll see relevant responses. If not, double-check your model selection.
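The same smoke test is easy to script once you move beyond the Playground. A minimal sketch (the `send` callable is a hypothetical stand-in for however your app posts a prompt plus image to the model and returns its text reply):

```python
def smoke_test(send, image_path: str, prompts: list[str]) -> list[str]:
    """Run each prompt against the model via `send(prompt, image_path)`.

    `send` is a caller-supplied function (an assumption, not a real SDK
    call) that returns the model's text reply. Prompts that come back
    empty are returned as failures -- a hint the model ignored the image
    or doesn't support image input at all.
    """
    failures = []
    for prompt in prompts:
        reply = send(prompt, image_path)
        if not reply or not reply.strip():
            failures.append(prompt)
    return failures

prompts = [
    "Describe the contents of this image.",
    "Summarize what's happening in this screenshot.",
]
```

Wiring `send` to your actual endpoint lets you run this check in CI before the agent ever ships.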
“Test the model in the Playground before integrating it into your agent to make sure it behaves the way you expect.”
Wrapping It Up
Not every AI model can process images. You’ll need a multimodal model designed for visual data. Use the AI Toolkit’s Model Catalog to filter and find these models. Always test first in the Playground to avoid surprises later.
Ready to dive deeper? Check out Microsoft’s Build an Agent series and Model Mondays to sharpen your AI skills. Plus, join the Azure AI Foundry Discord to chat with experts and stay updated.
With the right tools and models, showing your AI agent a picture is not just possible—it’s a game changer.
From the Microsoft Developer Community Blog