You’ve probably had this experience. You type something into Midjourney or DALL-E, hit generate, and get back an image that looks nothing like what you had in mind. Maybe the composition is off, the style is random, or the AI decided your “portrait of a woman” should have seven fingers and a melting face.
The problem usually isn’t the model. It’s the prompt. Knowing how to write AI image prompts is the single biggest factor in whether you get usable results or spend an hour regenerating garbage. And once you learn the formula, it applies to every text-to-image tool out there.
This guide gives you a repeatable system. Not vague advice about “being more descriptive” but an actual structure you can copy, fill in, and use right now.
The prompt formula that works every time
After months of testing across Midjourney, DALL-E 3, Stable Diffusion, and Ideogram, one formula consistently produces better results than freeform writing. Here it is:
[Subject] + [Setting] + [Style/Medium] + [Lighting] + [Camera/Composition] + [Quality modifiers]
Not every prompt needs all six elements. But the more you include, the more control you have over the output. Let’s break each one down with examples you can actually use.
Subject is what the image is about. Be as specific as possible. “A dog” gives you anything from a cartoon poodle to a stock photo of a Labrador. “A scruffy border collie with one blue eye and one brown eye” gives you something you can actually use. Age, breed, material, texture, expression. These details are free and they dramatically narrow what the model produces.
Setting is where the subject exists. “In a foggy forest at dawn” is 100x better than leaving this blank and letting the AI pick a random background. Time of day matters a lot here because it directly affects the lighting and mood of the final image.
Style and medium is arguably the most powerful element. “Oil painting”, “35mm film photography”, “Studio Ghibli illustration”, “3D render in Unreal Engine 5” are all commands the model understands because they appeared alongside matching images in training data. Skip this and the model defaults to whatever it feels like.
Lighting changes everything. “Soft diffused light” produces a completely different feel than “harsh overhead fluorescent lighting” or “golden hour backlight.” Photographers spend years mastering light. You get to specify it in three words.
Camera and composition includes angle, lens type, and framing. “Close up shot”, “wide angle from below”, “shot on Hasselblad 80mm” are all valid modifiers. The model has learned what these look like from millions of real photographs.
Quality modifiers go at the end. “Highly detailed”, “8K resolution”, “award winning”, “masterpiece” might sound like fluff but they genuinely push the output toward higher fidelity. These terms were associated with high quality images in the training data, so the model treats them as quality signals.
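If you build prompts programmatically, the formula maps onto a tiny helper function. This is just a sketch; the function and field names are illustrative, not part of any tool’s API.

```python
# A minimal sketch of the six-part formula as a helper. The function
# and field names are illustrative, not part of any tool's API.
def build_prompt(subject, setting="", style="", lighting="",
                 camera="", quality=""):
    """Join the six elements, skipping any left blank."""
    parts = [subject, setting, style, lighting, camera, quality]
    return ", ".join(p for p in parts if p)

prompt = build_prompt(
    subject="a scruffy border collie with one blue eye and one brown eye",
    setting="in a foggy forest at dawn",
    style="35mm film photography",
    lighting="soft diffused light",
    camera="close up shot",
    quality="highly detailed",
)
print(prompt)
```

Leaving an argument blank simply drops that element, which mirrors the advice above: not every prompt needs all six, but each one you fill in adds control.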
Five templates you can copy and fill in
Knowing the formula is one thing. Having ready-made templates speeds everything up. Here are five you can paste directly into any image generator and swap out the bracketed parts.
Template 1: Realistic portrait
Close up portrait of [age + gender + distinguishing features], [expression], [lighting type], shot on [camera], [photography style], [color palette]
Example: Close up portrait of a woman in her 60s with deep smile lines and silver hair pulled back, warm gentle expression, soft window light from the left, shot on Canon 5D Mark IV with 85mm f/1.4, editorial portrait photography, warm muted tones
Template 2: Environment or landscape
[Type of place] at [time of day], [weather/atmosphere], [specific details], [art style or photography type], [mood descriptor]
Example: A coastal fishing village at dawn, thick morning fog rolling in from the sea, colorful wooden boats moored along a stone dock, 35mm film photography, nostalgic and quiet mood
Template 3: Product or object
[Object] on [surface/background], [lighting setup], [camera angle], [rendering style], clean composition, [quality modifiers]
Example: A ceramic coffee mug with a speckled glaze on a marble countertop, soft studio lighting from above, slight overhead angle, product photography, clean composition, highly detailed, 4K
Template 4: Fantasy or concept art
[Character or creature] in [fantasy world/setting], [action or pose], [art style], [atmosphere], [specific details]
Example: A frost giant standing at the edge of a frozen lake, holding a glowing blue war hammer, concept art style, ominous twilight atmosphere, intricate ice armor with rune engravings, highly detailed
Template 5: Abstract or artistic
[Concept or emotion] expressed through [art movement], [color palette], [texture description], [medium]
Example: Solitude expressed through abstract expressionism, deep blues and muted grays, thick impasto brush strokes, oil on canvas, gallery quality
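If you reuse these templates often, they translate naturally into format strings. A minimal sketch using the portrait template from above; the brace names are just labels for the blanks.

```python
# The bracketed templates translate naturally into format strings.
# Template text comes from Template 1 above; the brace names are
# just labels for the blanks.
PORTRAIT = (
    "Close up portrait of {subject}, {expression}, {lighting}, "
    "shot on {camera}, {style}, {palette}"
)

prompt = PORTRAIT.format(
    subject="a woman in her 60s with silver hair pulled back",
    expression="warm gentle expression",
    lighting="soft window light from the left",
    camera="Canon 5D Mark IV with 85mm f/1.4",
    style="editorial portrait photography",
    palette="warm muted tones",
)
print(prompt)
```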
Words that actually change the output
Not all words in a prompt carry equal weight. Some terms reliably shift the output in specific directions. Here are the categories worth knowing.
Lighting words have massive impact. “Rembrandt lighting” gives you dramatic shadows on one side of a face. “Rim lighting” creates a glowing outline around the subject. “Overcast diffused light” softens everything. “Neon glow” pushes toward cyberpunk aesthetics. If you only add one extra detail to your prompts, make it lighting.
Camera words work because the training data included EXIF data and photographer descriptions. “Shot on Leica M6” gives a specific film look. “Macro lens, f/2.8” gets you extreme close ups with creamy bokeh. “Drone shot, bird’s eye view” changes the entire perspective. These aren’t random words. They map directly to visual patterns the model has learned.
Art style words are your biggest creative lever. “Impressionist” gives you loose brushwork and soft edges. “Ukiyo-e” produces Japanese woodblock print aesthetics. “Bauhaus poster” gives flat geometric shapes. “Pixar 3D render” gives you that smooth animated look. The more specific you are about the style, the less the model has to guess.
Mood and atmosphere words influence color grading and composition. “Ethereal” tends to produce lighter, dreamier images. “Gritty” pushes toward darker tones and rougher textures. “Serene” usually means softer colors and calmer compositions. These are subtle but they add up.
The negative prompt: what to exclude
Most image generation tools let you specify what you don’t want. This is called a negative prompt and it’s just as important as what you include. Without it, you’re inviting every possible artifact the model can produce.
A solid default negative prompt for most use cases looks something like this:
blurry, low quality, watermark, text, extra fingers, mutated hands, deformed, ugly, duplicate, cropped, out of frame
You don’t need to memorize this. Just save it somewhere and paste it into the negative prompt field every time. In Stable Diffusion, negative prompts are practically mandatory for clean results. In Midjourney v6+, the model handles most of these issues internally, but you can still use the --no parameter to exclude specific elements, for example --no text, people, watermark.
The key insight is that negative prompts work best when they’re specific to your use case. Generating portraits? Add “cross eyed, asymmetric face” to your negatives. Generating architecture? Add “impossible geometry, floating objects.” Tailor them to the common failure modes of whatever you’re creating.
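If you run Stable Diffusion locally, the negative prompt is a first-class parameter. Here’s a minimal sketch using the Hugging Face diffusers library; it assumes a CUDA GPU, and the model id below is one common checkpoint, not the only option.

```python
# A minimal sketch with Hugging Face diffusers. Assumes the library,
# a CUDA GPU, and a Stable Diffusion checkpoint; the model id below
# is one common choice, not the only option.
import torch
from diffusers import StableDiffusionPipeline

NEGATIVE = (
    "blurry, low quality, watermark, text, extra fingers, "
    "mutated hands, deformed, ugly, duplicate, cropped, out of frame"
)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt=(
        "close up portrait of a woman in her 60s, soft window light, "
        "editorial portrait photography"
    ),
    # tailor the negatives to the use case, as described above
    negative_prompt=NEGATIVE + ", cross eyed, asymmetric face",
).images[0]
image.save("portrait.png")
```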
Mistakes that waste your time
Writing a novel instead of a prompt. More words don’t always mean better results. After about 60 to 75 words, most models start ignoring or deprioritizing the later parts of your prompt. Front-load the important stuff. Subject and style should come first, quality modifiers last.
Contradicting yourself. “A bright sunny day with dark moody shadows” confuses the model. “A minimalist room packed with detailed objects” is a contradiction. If you want contrast, be explicit: “A brightly lit room with one dark shadowy corner.”
Repeating words for emphasis. Writing “very very very detailed” does nothing extra. Each word should add new information. If you’ve already said “highly detailed,” the model got the message.
Skipping the style declaration. Without a style, the AI picks one at random. Sometimes it’s photorealistic. Sometimes it’s a weird digital painting. Sometimes it’s clip art. Always declare what you want: photography, illustration, 3D render, watercolor, whatever fits your project.
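These mistakes are mechanical enough to catch with a script. A rough sketch; the 75-word limit and the style keyword list are assumptions drawn from this guide, not documented model behavior.

```python
# A rough lint pass for the mistakes above. The 75-word limit and
# the style keyword list are assumptions drawn from this guide,
# not documented model behavior.
import re

STYLE_WORDS = {"photography", "illustration", "render", "watercolor",
               "painting", "concept art", "photo"}

def lint_prompt(prompt, max_words=75):
    words = re.findall(r"[a-z0-9']+", prompt.lower())
    issues = []
    if len(words) > max_words:
        issues.append(f"{len(words)} words; later terms may get ignored")
    for a, b in zip(words, words[1:]):
        if a == b:
            issues.append(f"repeated word: {a!r}")
    if not any(s in prompt.lower() for s in STYLE_WORDS):
        issues.append("no style declared (photography, illustration, ...)")
    return issues

print(lint_prompt("very very detailed portrait of a cat"))
# flags the repeated "very" and the missing style declaration
```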
How to get better fast
The fastest way to improve at writing AI image prompts is to change one variable at a time. Take a prompt that produced a decent result and swap just the lighting. Then swap just the style. Then just the camera angle. You’ll quickly learn which words carry the most visual weight in each model.
Keep a prompt journal. Nothing fancy. A spreadsheet with three columns works: the prompt, the model you used, and a note about what worked or didn’t. After a few dozen entries you’ll start seeing your own patterns emerge. Maybe you’ll discover that “shot on Portra 400 film” consistently gives you the vintage look you want, or that “Studio Ghibli watercolor” works beautifully in Midjourney but falls flat in DALL-E.
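Both habits are easy to script together. Here’s a sketch that sweeps one variable (lighting) while holding everything else constant and appends each attempt to a CSV journal; the file name and column layout are just suggestions.

```python
# A sketch of both habits at once: sweep one variable (lighting)
# while holding everything else constant, and log each attempt to
# a CSV journal. File name and columns are suggestions, not a standard.
import csv

BASE = "a coastal fishing village at dawn, {lighting}, 35mm film photography"
LIGHTING_VARIANTS = [
    "soft diffused light",
    "golden hour backlight",
    "harsh overhead fluorescent lighting",
]

with open("prompt_journal.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for lighting in LIGHTING_VARIANTS:
        prompt = BASE.format(lighting=lighting)
        # generate with your tool of choice, inspect the result,
        # then fill in the model name and a note on what worked
        writer.writerow([prompt, "model-name-here", "what worked / what didn't"])
```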
Pay attention to prompt-sharing communities on Reddit and Discord. People post their exact prompts alongside results, which lets you reverse-engineer what works. But don’t just copy and paste. Adapt what you find to your own projects and keep testing. That’s how you build the intuition that turns prompt writing from guesswork into a reliable skill.

