Multimodal Prompting
Learn to prompt AI models with text, images, audio, and video. Combine modalities for richer interactions and better results.
Multimodal Prompting
Multimodal prompting combines text with images, audio, or video to give AI models richer context. Modern models like GPT-4o, Claude 3.5, and Gemini can process multiple input types simultaneously, enabling more natural and capable interactions.
Text + Image Prompting
Image Analysis
[Attach image]
What objects are in this image? List them with their approximate positions.
Image Comparison
[Attach image 1]
[Attach image 2]
Compare these two designs. Identify:
1. Key differences
2. Which follows better UX principles
3. Specific improvements for each
Code from Screenshot
[Attach screenshot of code or UI]
Convert this to working code. Include:
- Exact layout structure
- All text content
- Styling details
Text + Audio Prompting
Transcription + Analysis
[Attach audio file]
1. Transcribe the audio
2. Identify key points discussed
3. Extract action items with owners
4. Note any decisions made
Voice Instructions
[Attach voice memo]
Based on these voice notes:
1. Create a structured outline
2. Fill in missing details where unclear
3. Suggest additional points to consider
Best Practices
Image Prompting
- Be specific about what you want analyzed
- Reference specific parts of the image when needed
- Provide context for ambiguous images
- Use high-quality, clear images
Audio Prompting
- Specify if you need verbatim or summary
- Note the language if not English
- Indicate speaker identification needs
- Mention background noise handling
Modality Combinations
| Combination | Use Cases |
|---|---|
| Text + Image | Design review, code conversion, visual Q&A |
| Text + Audio | Meeting notes, voice memos, transcription |
| Text + Video | Content analysis, tutorial creation |
| Image + Text + Audio | Comprehensive documentation |
Prompt Templates
Image Description:
Describe this image in detail, covering:
- Main subjects and their attributes
- Setting and background
- Colors, lighting, and mood
- Any text visible in the image
Visual Comparison:
Compare these two images focusing on:
1. Structural differences
2. Color and style variations
3. Quality and clarity
4. Which better achieves [stated goal]
Audio Summary:
From this audio recording:
1. Provide a 3-sentence summary
2. List key topics discussed
3. Extract direct quotes for important points
4. Identify any unresolved questions
Related Articles
Social Media Content Prompts for ChatGPT
Master social media content creation with ChatGPT. Platform-specific templates, engagement strategies, and proven prompts for Twitter, LinkedIn, Instagram, and more.
ChatGPT Best Practices
Comprehensive guidelines for effective prompt writing and interaction with ChatGPT.
Monochromatic & Black and White SREF Codes
Single color explorations and high-contrast black and white minimalist aesthetics.