Text and images were just the beginning. With Sora, OpenAI's video generation model integrated into ChatGPT, you can now create HD video clips from text descriptions. Sora 2 generates 1080p video up to 60 seconds long: enough for social media content, product demos, explainer clips, and creative storytelling.
Today you'll learn how to describe scenes for video generation, use features like Character Cameos and Video Stitching, and understand what's possible with AI-generated video in 2026.
Sora 2 represents a major leap in AI video generation. Here's what you're working with:
Resolution and length: Full HD 1080p output. Clips can be up to 60 seconds long. You can choose landscape (16:9), portrait (9:16), or square (1:1) aspect ratios depending on your platform.
Visual quality: Sora 2 generates realistic lighting, reflections, shadows, and physics-aware motion. Objects fall naturally, water flows convincingly, and camera movements feel cinematic.
What it handles well:
- Scenic landscapes and establishing shots
- Product showcases and turntable-style rotations
- Abstract and artistic visual sequences
- Character-driven scenes with consistent faces and clothing
- Text overlays and simple title sequences
What to keep in mind:
- Very complex multi-character interactions can sometimes produce artifacts
- Extremely specific hand gestures and fine motor movements are still challenging
- The more precise your description, the better the output
Availability: Sora is available on ChatGPT Plus and Pro plans. Plus users get a monthly generation quota, while Pro users get higher limits and priority processing.
Writing prompts for video is different from writing prompts for images. With video, you need to think about motion, time, and camera behavior in addition to visual composition.
The video prompt formula:
1. Subject and action: What's happening? "A woman walks through a field of wildflowers" is better than "a woman in a field" because it includes motion.
2. Setting and time: Where and when? "A neon-lit Tokyo street at night during light rain" gives Sora environment and atmosphere context.
3. Camera movement: How does the camera behave? Options include:
- Static: locked-off, no movement
- Pan: horizontal sweep left or right
- Tilt: vertical sweep up or down
- Push-in/Pull-out: moving toward or away from the subject
- Tracking: following a moving subject
- Drone/Aerial: overhead sweeping shots
- Handheld: slight natural shake for a documentary feel
4. Pacing and mood: "Slow and contemplative" vs. "fast and energetic" changes the entire feel.
5. Duration: Specify how long you want the clip: "a 20-second clip" or "a 10-second loop."
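The five-part formula above can be treated as a simple template. The sketch below is a hypothetical Python helper, not part of any official Sora tooling; Sora accepts free-form natural language, so the function name, parameters, and comma-separated structure are illustrative choices only:

```python
def build_video_prompt(subject_action, setting, camera, pacing, duration):
    """Assemble a video prompt from the five formula components.

    Illustrative template only; the exact phrasing you send to Sora
    is up to you, and plain sentences work just as well.
    """
    parts = [
        subject_action,               # 1. subject and action
        setting,                      # 2. setting and time
        f"{camera} camera",           # 3. camera movement
        f"{pacing} pacing",           # 4. pacing and mood
        f"a {duration}-second clip",  # 5. duration
    ]
    return ", ".join(parts) + "."

prompt = build_video_prompt(
    subject_action="A woman walks through a field of wildflowers",
    setting="golden-hour light in late summer",
    camera="slow tracking",
    pacing="calm, contemplative",
    duration=20,
)
print(prompt)
```

The benefit of writing prompts this way is consistency: every prompt in a series covers the same five components, so you never forget to specify camera behavior or duration.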
One of Sora 2's standout features is Character Cameos. This solves one of the biggest problems in AI video: keeping the same character looking consistent across multiple clips.
How it works:
1. Create a character: Describe a character in detail (appearance, clothing, distinctive features). Sora generates a reference and saves the character profile.
2. Reuse in any video: Reference that character by name in future prompts. Sora will maintain the same face, body type, and clothing across all clips.
3. Change outfits and settings: You can put the same character in different scenarios, outfits, and environments while keeping their identity consistent.
Why this matters:
Before Character Cameos, every new video generation might produce a completely different-looking person, even with the same description. This made it impossible to create a series of related videos; the "main character" would look different in every clip.
Now you can create an entire ad campaign, tutorial series, or short film with the same recognizable character throughout.
Practical uses:
- Marketing campaigns with a consistent brand spokesperson
- Tutorial series with a recurring instructor character
- Social media content series with a recognizable host
- Short narrative films with returning characters
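If you manage several recurring characters across a campaign or series, it helps to keep their descriptions in one place. The sketch below is a hypothetical local registry in Python; the cameo itself is created and reused inside ChatGPT, so this code only organizes your prompt text, and the character name and helper function are invented for illustration:

```python
# Hypothetical registry of cameo descriptions. Sora's actual cameo
# feature lives inside ChatGPT; this just keeps prompts consistent.
CHARACTERS = {
    "Maya": "a woman in her 30s with short curly hair, round glasses, "
            "and a mustard-yellow jacket",
}

def cameo_prompt(name, scene):
    """Build a prompt that drops a saved character into a new scene."""
    if name not in CHARACTERS:
        raise KeyError(f"No saved character named {name!r}")
    return f"{name} ({CHARACTERS[name]}) {scene}"

print(cameo_prompt(
    "Maya",
    "demonstrates the product at a kitchen counter, "
    "static camera, a 15-second clip",
))
```

Keeping the description alongside the name means every clip in a series starts from the same identity details, which is exactly what Character Cameos needs to stay consistent.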
Video Stitching lets you connect multiple clips into a coherent narrative sequence. Instead of generating one isolated scene, you can create a series of scenes that flow together as a story.
How to use Video Stitching:
1. Plan your sequence: Outline the scenes you want, for example Scene 1 (establishing shot), Scene 2 (character introduction), Scene 3 (action), Scene 4 (resolution).
2. Generate each scene: Create each clip with prompts that reference the overall narrative context.
3. Stitch them together: Sora connects the clips with smooth transitions, matching the color grading, lighting continuity, and motion flow between scenes.
Transition types available:
- Cut: direct jump between scenes
- Dissolve: smooth blend from one scene to the next
- Fade to black: classic cinematic transition
- Match cut: a transition where a shape or motion in one scene matches the next
Example narrative workflow:
Scene 1: "Wide establishing shot of a bakery storefront at dawn, warm light glowing from inside, 5 seconds"
Scene 2: "Interior close-up of hands kneading bread dough on a flour-dusted counter, 10 seconds"
Scene 3: "Time-lapse of bread rising and baking in an oven, golden crust forming, 8 seconds"
Scene 4: "A customer smiling as they receive a fresh loaf across the counter, 7 seconds"
Stitched together, this becomes a 30-second brand story for a local bakery, all generated from text.
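When planning a stitched sequence like the bakery story, it's worth laying the scenes out as structured data first so you can check that the clip durations add up to the runtime you want. This is an illustrative Python sketch of that planning step, not Sora's actual stitching interface:

```python
# Each scene: (prompt, duration in seconds, transition into the NEXT scene).
scenes = [
    ("Wide establishing shot of a bakery storefront at dawn, "
     "warm light glowing from inside", 5, "dissolve"),
    ("Interior close-up of hands kneading bread dough on a "
     "flour-dusted counter", 10, "cut"),
    ("Time-lapse of bread rising and baking in an oven, "
     "golden crust forming", 8, "match cut"),
    ("A customer smiling as they receive a fresh loaf across "
     "the counter", 7, None),  # final scene: no outgoing transition
]

total = sum(duration for _, duration, _ in scenes)
print(f"Total runtime: {total} seconds")  # 5 + 10 + 8 + 7 = 30
```

Adjusting a single duration in the list immediately shows how it affects the total, which is handy when a platform caps you at 30 or 60 seconds.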