Hands-On with Sora 2: Why OpenAI’s Video ChatGPT Moment Changes Everything

OpenAI just announced, out of nowhere in the middle of the night, that Sora 2 is here.

image-72ed6fd2bdb5

After watching the livestream, I'm wide awake. No way I'm sleeping now.

I've always said that in our work on industrializing AI filmmaking, we have one principle: AI-generated footage doesn't make it into the final cut. It's an aid, nothing more.

Heavy sigh. The wheels of progress keep rolling forward.

Who knows what this world will look like in 2027?

Sora 2, described in OpenAI's own words:

"With Sora 2, we are jumping straight to what we think may be the GPT‑3.5 moment for video."

The ChatGPT moment for AI video has officially arrived.

The new Sora 2 is a native video and audio generation model.

Note the wording carefully: it's a video AND audio generation model. This isn't just a pure video model anymore.

It's similar to Veo 3, but judging from the demos we've seen so far, the overall quality blows Veo 3 out of the water.

Let's look at their official promo video.

This quality has me literally on my knees. I'm speechless.

GPT-5 was a letdown, but Sora 2 is the return of the king.

Classic OpenAI.

This time they launched two things: the Sora 2 model and the Sora app.

Sora 2 is basically the new SOTA, while the Sora app has bigger ambitions. They want to build the AI TikTok for the new era, and it's genuinely creative. It's socially driven, reminding me of the old days with Faceu.

Let's break this down piece by piece.

I. Sora 2

Current AI video models are all competing in the same areas: physical movement, character performance, consistency, and audio.

Physical movement is straightforward: realism. Extreme realism. As real as reality itself.

Sora 2 has made tremendous progress in this area.

We've always said that sports, especially gymnastics and ball games, are like the Turing test for AI video models. Almost no AI video model handles them well. Even Hailuo 02 and Kling 2.5 only get them partly right.

But Sora 2 is absolutely ridiculous. It can complete Olympic gymnastics routines, do backflips on paddleboards, and even play volleyball.

Prompt: A gymnast flipping on a balance beam, cinematic quality.

image-f4cee6486469

Compare that to the original Sora 1 from back in the day.

image-f66253c20724

Honestly, this improvement is so obvious it hurts. A year and a half, but it feels like forever has passed.

Prompt: A skateboarder does a backflip.

image-c0db8d62d752

Prompt: A man jumps off a diving board doing a cannonball.

image-9604fe3f1696

I've posted these as GIFs, but don't forget that the originals actually have sound.

Like this volleyball scene.

And here's the paddleboard backflip.

The audio is nearly flawless, extremely realistic, and the volleyball movement is spot on too.

This is honestly the best sports quality and physics I've seen so far.

The anime style looks pretty good too.

Now let's talk about character performance, which really needs to be discussed alongside consistency and multimodal capabilities.

For character performance without dialogue, most models are pretty much neck and neck now. But character performance WITH dialogue, what we call AI actors or digital humans, is where everyone is competing, and that requires multimodal audio capabilities.

Now you can verify your identity in Sora and generate an avatar of yourself that serves as a fixed digital ID.

You can then use this character directly to generate videos of that specific person.

With fixed characters, incredibly realistic performance, and nearly perfect audio generation all in one model, having AI create actual story films is no longer a fantasy.

This collision moment looks very TikTok-style, but also incredibly realistic.

Prompt: @daniel playing trumpet in the middle of a zebra herd.

Prompt: @daniel and @duxin having an arm wrestling match, you decide who wins.

The character performance, expressions, and cinematography are indistinguishable from real video. The audio quality is also current SOTA.

Environmental sounds, wind, collisions, even multiple people in the same clip—you can't find audio errors.

And you can see that @daniel looks virtually identical across both video clips.

ID consistency has been perfectly preserved.

From the code, it looks like Sora 2 has two models.

image-846a4b954c60

They are Sora 2 and Sora 2 Pro, probably similar to Kling's standard and high-quality versions.

Currently, the frustrating part is that while they prioritize ChatGPT Pro users, it's only available in the US and Canada, and they've implemented the dreaded invitation code system.

I managed to get an invitation code through a friend, but honestly, the barrier to entry is still high...

image-95813ca92f62

First-batch users get 4 invitation codes to share with friends, because the team believes this app works best in social settings and could even become a new way of messaging...

Honestly, the currently available version isn't very useful. It's heavily restricted: when generating, you can barely select any parameters.

image-bbd82ccba9d2

You only get landscape and portrait options and a fixed 10-second output, at a disappointingly low 360p resolution.

image-6f8bea49c7b1

We'll have to wait for OpenAI to update it later.

II. The Sora App

This time, the Sora product itself became the focus.

The web version has been updated and the iOS version is live on the US App Store, but there's no Android version yet.

image-65da1dfe6911

But as I mentioned above, it's invitation-only. Users without invitation codes can't get in.

Let me help you understand what this product actually is.

First, let's look at their video.

If I had to summarize in one sentence: this looks like AI TikTok.

Users can scroll through AI videos created by other users, like, share, follow, and do all the usual social media activities. The entire interaction and UI are exactly like TikTok.

But the most interesting feature is actually the "cameos" function.

This image shows a standard cameo interface. Think of each avatar as a cameo. When generating videos, you can @ them to have specific characters perform.

For example, the first one is me. You could @ rockhazix and have me join the second person, Sam, for an interesting dinner at a cool restaurant.

That's cameo—having your friends guest star in your videos, co-starring together.

OpenAI believes the Sora app is at its best when used with friends.

They say testers' overwhelming feedback shows that cameo is what makes this app different and fun—it's a novel and unique way to communicate with people.

But OpenAI has strict limitations here.

When creating your own cameo, you have to go through very complex identity verification.

You'll record dynamic audio prompts, complete random audio challenges, then pass liveness detection to ensure the person in front of the phone is actually you.

After recording, you can adjust how the model presents you through Cameo preference settings.

Once everything's set up, you can @ yourself when creating.

For example, I @'d myself and Altman to have dinner together.

Final Thoughts

Finally, let me spend some time discussing this AI TikTok, the Sora app.

Honestly, I can't quite figure out this product.

Every AI video feed that's been tried before, every single one, has either stayed lukewarm or been buried in the dust of history.

That's because none of them could solve a fundamental problem: creators post content hoping for traffic and positive feedback. So why would I post the same video on your platform instead of TikTok?

Plus, do regular users actually care if something's AI video? Nobody cares. Technology serves content. If your content is good, users don't care whether it's hand-drawn, CG, live-action, or AI.

So I've always thought the idea of going to a new product with barely any ecosystem just to watch AI content doesn't hold up.

But the Sora app is different this time. Because of the model's massive leap forward, they were able to create this cameo thing, turning what could have been AI TikTok into a social product.

That's right: although everyone calls the Sora app AI TikTok, it isn't really AI TikTok at all. It's a social product centered on pranking friends and being absurd.

Kind of like Snapchat or Faceu back in the day.

Many new social products share a common pattern: rapid growth followed by equally rapid decline.

There was this product called BeReal that exploded in 2022. It prompted all users to take a photo at the same time every day using both the front and back cameras. It quickly swept through young people in Europe and America and topped the download charts.

But less than a year later, once the novelty wore off, it quickly hit rock bottom, and it has now faded from mainstream view.

AI video + cameo is certainly a completely new species. Plus AI gives everyone creative power. Starting with pranking friends and remixing could create a thriving ecosystem.

But it could also lead to serious community homogenization and eventual disappearance.

Honestly, I can't see the future of this Sora product clearly.

All I can say right now is this:

Let's just start playing with it!