Images, Audio & Video
Unleash creative expression with generative AI while understanding how it works and where its limits lie
What You'll Learn
- ✓ The intuition behind diffusion models and how they generate images
- ✓ Why prompts for images differ from text prompts
- ✓ How negative prompts filter out unwanted artifacts
- ✓ Techniques like style referencing, prompt composition, inpainting, and outpainting
- ✓ The strengths and limitations of Midjourney, DALL-E, and Stable Diffusion
- ✓ Combining generative tools with manual editing for polished results
- ✓ Professional applications and workflows for marketing, product visualization, and multimedia
- ✓ Legal and copyright considerations: data provenance, fair use, and licensing
Key Ideas
Diffusion models are generative algorithms that progressively add noise to training data and then learn to reverse the process to reconstruct or synthesize new data. During training, noise is added to the data in a fixed forward diffusion process, and a neural network is taught to reverse that diffusion, recovering the data step by step, so it can later generate new outputs from noise. Examples include DALL-E 2, Midjourney, and Stable Diffusion.
Examples:
- • DALL-E 2: Strong at following complex prompts and generating realistic compositions
- • Midjourney: Known for artistic and stylized output; excels at atmospheric and cinematic scenes
- • Stable Diffusion: Open-source model enabling local control, custom training, and fine-tuning
Text-to-image prompts describe visual elements (objects, composition, lighting, style) rather than specifying roles or tasks. Because diffusion models map text embeddings into visual latent space, the wording of prompts influences composition, color palette, and style. Negative prompts allow you to specify what you don't want in the image (e.g., 'blurry,' 'extra limbs,' 'low resolution'), acting as soft constraints.
Examples:
- • Positive: 'A futuristic city skyline at golden hour, ultra-wide angle, neon lights'
- • Negative: 'people, text, blur, cluttered background' (list the unwanted features themselves; writing 'no X' is unnecessary, since everything in a negative prompt is already treated as something to avoid)
- • Style: 'inspired by Syd Mead, digital art, 8K resolution'
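The positive/negative/style split above can be treated as structured input rather than one long string. The sketch below is a minimal, tool-agnostic helper of our own devising (no image tool exposes exactly this API); most tools ultimately accept the two joined strings it produces.

```python
# Illustrative sketch: assembling a text-to-image prompt from the parts
# discussed above (subject, style tags, negative tags). The function and
# its field names are our own convention, not any tool's official API.

def compose_prompt(subject, style_tags=(), negative_tags=()):
    """Join the subject and style tags into a positive prompt string,
    and the unwanted features into a negative prompt string."""
    positive = ", ".join([subject, *style_tags])
    negative = ", ".join(negative_tags)
    return positive, negative

pos, neg = compose_prompt(
    "A futuristic city skyline at golden hour",
    style_tags=["ultra-wide angle", "neon lights", "digital art"],
    negative_tags=["blurry", "extra limbs", "low resolution"],
)
print(pos)  # A futuristic city skyline at golden hour, ultra-wide angle, neon lights, digital art
print(neg)  # blurry, extra limbs, low resolution
```

Keeping the pieces separate makes it easy to swap the style tags while holding the subject fixed, which is exactly what the style exercises below ask you to do.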
Dive Deeper
Explore the mechanism, mastery techniques, and critical thinking considerations. Click to expand each layer.
How Diffusion Works
During training, random noise is gradually added to data samples, and the model learns to predict and remove this noise step by step in reverse order. This two-phase process (forward diffusion and reverse denoising) enables the model to learn the probability distribution of complex data. At inference time, the model starts from pure noise and iteratively denoises it using the learned weights to produce an image.
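The forward half of that process can be written in closed form. The toy sketch below noises a 1-D "image" under an illustrative linear noise schedule (the schedule values are ours, not from any specific paper): the signal fraction `alpha_bar[t]` shrinks toward zero, so early steps are nearly clean and late steps are nearly pure noise.

```python
# Toy sketch of forward diffusion: x_t is a mix of the clean sample x0
# and Gaussian noise, with the signal share decaying over the schedule.
import numpy as np

rng = np.random.default_rng(0)
x0 = np.linspace(-1.0, 1.0, 8)          # clean 1-D "image"
betas = np.linspace(1e-4, 0.2, 50)      # illustrative noise schedule
alpha_bar = np.cumprod(1.0 - betas)     # cumulative signal retention

def diffuse(x0, t):
    """Sample x_t ~ q(x_t | x_0) directly, without stepping through t steps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

xt = diffuse(x0, t=49)                  # heavily noised sample, almost pure noise
```

A trained denoiser learns to predict `eps` from `xt` and `t`; running that prediction backwards from pure noise is the reverse (generation) phase described above.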
Key Points:
- • Negative prompts and guidance scales: Negative prompts filter out unwanted elements by reducing attention on specified features during generation
- • Inpainting and outpainting: Inpainting fills in missing regions based on surrounding pixels and a prompt. Outpainting extends an existing image beyond its original borders
- • Guidance scale: Parameters that control how strongly the model follows the prompt versus the learned distribution
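One common mechanism behind both of those knobs is classifier-free guidance: the denoiser is run twice per step, once conditioned on the prompt and once on the negative (or empty) prompt, and the final noise estimate is extrapolated away from the unwanted direction. The sketch below shows just the guidance arithmetic; the two noise vectors are stand-ins for real model outputs.

```python
# Sketch of classifier-free guidance: eps = eps_uncond + s * (eps_cond - eps_uncond).
# 'guidance_scale' (s) controls how strongly the prompt is followed.
import numpy as np

def guided_noise(eps_cond, eps_uncond, guidance_scale):
    """Extrapolate from the unconditional toward the conditional prediction."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

eps_cond = np.array([1.0, 0.0])    # stand-in: model output given the prompt
eps_uncond = np.array([0.0, 1.0])  # stand-in: output given the negative/empty prompt

print(guided_noise(eps_cond, eps_uncond, 1.0))  # [1. 0.]  (pure conditional)
print(guided_noise(eps_cond, eps_uncond, 7.5))  # pushed well past the conditional
```

Scale 0 ignores the prompt entirely, scale 1 reproduces the conditional prediction, and larger scales (7-12 is a typical range in practice) trade diversity for prompt adherence.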
Advanced Techniques
Style references and consistency: Provide an image as a style reference along with your prompt to maintain visual consistency across multiple outputs. Control the random seed and guidance scale to reproduce similar compositions.
Techniques:
- • Prompt composition: Combine descriptive phrases, artistic styles, camera settings, and aspect ratios
- • Advanced syntax: Use '+' to join concepts ('forest + cyberpunk city'), specify negative prompts
- • Combining tools: Generate base image → upscale → refine with manual editing → AI enhancer
- • Professional applications: Marketing material creation, product visualization, multimedia storytelling
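The inpainting idea mentioned above — fill a masked region using only its surroundings — can be seen in a deliberately simplified form below. Real diffusion inpainting denoises the masked area under a prompt while keeping the rest of the image fixed; this toy version just iteratively averages neighbouring pixels, which conveys the "fill from context" intuition without any model.

```python
# Toy inpainting sketch: repeatedly replace masked pixels with the mean of
# their 4-neighbours until the fill blends into the surroundings.
import numpy as np

def inpaint(image, mask, iters=200):
    """Fill pixels where mask is True using neighbour averaging (Jacobi style)."""
    img = image.astype(float).copy()
    img[mask] = img[~mask].mean()          # rough initial guess for the hole
    for _ in range(iters):
        padded = np.pad(img, 1, mode="edge")
        neigh = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                 + padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        img[mask] = neigh[mask]            # update only the masked region
    return img

scene = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))  # smooth gradient "street"
mask = np.zeros_like(scene, dtype=bool)
mask[3:5, 3:5] = True                              # the object to remove
filled = inpaint(scene, mask)
```

On this smooth gradient the fill converges to values indistinguishable from the original background — which mirrors why diffusion inpainting works best when the surroundings give strong context, as the street-scene exercise below lets you test.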
Legal and Ethical Considerations
Diffusion models are trained on vast datasets, some of which may include copyrighted images without permission. Understand the legal landscape in your jurisdiction: fair use, transformative works, and derivative rights are complex and evolving. When using AI-generated assets commercially, verify licensing terms, credit sources when required, and consider using models trained on curated, rights-cleared datasets.
Considerations:
- • Bias and representation: Image models reflect biases in their training data and may under-represent certain cultures or reinforce stereotypes
- • Environmental impact: Training and running diffusion models require significant computational resources
- • Copyright issues: Verify licensing terms and understand data provenance
Suggested Resources
Microsoft
Midjourney
Try This Now
Put your learning into practice with these hands-on exercises. Copy the prompts and try them in your favorite AI tool.
Generate 'cat in a hat' in three styles: (1) oil painting, (2) photorealistic, (3) pixel art. Use negative prompts: 'blurred background, extra limbs, low quality'
Choose a famous painting style (e.g., Van Gogh's Starry Night) and generate modern scenes in that style
Take a street scene and remove a car. Evaluate how well the model fills in the missing area with natural surroundings
Related Prompts from the Library
Practice what you've learned with these prompts from our library.
ROLE: You are an assistant. GOAL: Generate a 7-minute video script for a YouTube video about our newest <product/service description> and <targeted audience>. CONTEXT: Input details: describe your audience, describe your product. Ask clarifying questions if any input details are missing. TASK: Generate a 7-minute video script for a YouTube video about our newest <product/service description> and <targeted audience>. Product/service description = [describe your product]. Targeted audience = [describe your audience]. OUTPUT FORMAT: Script CONSTRAINTS:
ROLE: You are a video producer. GOAL: Design a referral program that incentivizes current customers to share our <product/service> with their network. CONTEXT: Ask clarifying questions if any input details are missing. TASK: Design a referral program that incentivizes current customers to share our <product/service> with their network, and produce a video that showcases <company/product/service> in… OUTPUT FORMAT: Text CONSTRAINTS:
ROLE: You are an assistant. GOAL: Develop a video series that showcases the features and benefits of our <product/service>, while also addressing… CONTEXT: Ask clarifying questions if any input details are missing. TASK: Develop a video series that showcases the features and benefits of our <product/service>, while also addressing… OUTPUT FORMAT: Text CONSTRAINTS:
ROLE: You are an assistant. GOAL: Design an infographic that visualizes the key benefits and features of <product/service> in a simple and easy-to… CONTEXT: Ask clarifying questions if any input details are missing. TASK: Design an infographic that visualizes the key benefits and features of <product/service> in a simple and easy-to… OUTPUT FORMAT: Text CONSTRAINTS:
ROLE: You are an assistant. GOAL: Write a video script that showcases the features and benefits of our latest <product/service>, and includes cust… CONTEXT: Ask clarifying questions if any input details are missing. TASK: Write a video script that showcases the features and benefits of our latest <product/service>, and includes cust… OUTPUT FORMAT: Script CONSTRAINTS:
ROLE: You are an assistant. GOAL: Create an infographic that visually displays the key findings of our latest consumer survey, and offers insights in… CONTEXT: Ask clarifying questions if any input details are missing. TASK: Create an infographic that visually displays the key findings of our latest consumer survey, and offers insights in… OUTPUT FORMAT: Text CONSTRAINTS:
Reflection Questions
- 1. How do diffusion models differ from GANs and VAEs? What are the trade-offs in image quality and training complexity?
- 2. What ethical considerations should guide the use of AI-generated imagery in marketing and media?
- 3. How could generative audio and video tools transform your work or hobby projects?