There's gotta be a catch here.
Can't think of one. There might be limits and weird errors, but the AI train is accelerating so fast.
I'll quote my post from the other thread:
The paper is a must read
We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and...
openai.com
"We do this by arranging patches of Gaussian noise in a spatial grid with a temporal extent of one frame. The model can generate images of variable sizes—up to 2048x2048 resolution."
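To make that quote concrete, here's a tiny numpy sketch of the idea: an image is treated as a one-frame video, and generation starts from Gaussian noise laid out as a grid of latent patches that the transformer sees as a token sequence. All the names, shapes, and patch/channel sizes here are my own illustrative assumptions, not OpenAI's actual code.

```python
import numpy as np

def make_noise_patches(height, width, frames=1, patch=16, channels=4, seed=0):
    """Illustrative sketch of 'spacetime patches': images are just
    videos with a temporal extent of one frame. Shapes are made up."""
    rng = np.random.default_rng(seed)
    # Gaussian noise arranged on a spatial grid of patches,
    # with a temporal extent of `frames` (1 for a still image)
    grid = rng.standard_normal((frames, height // patch, width // patch, channels))
    # Flatten into a token sequence: one token per spacetime patch,
    # which is what the diffusion transformer would operate on
    tokens = grid.reshape(frames * (height // patch) * (width // patch), channels)
    return grid, tokens

# An image is the frames=1 case; "up to 2048x2048" just means
# a longer token sequence, not a different model.
grid, tokens = make_noise_patches(2048, 2048, frames=1)
print(grid.shape)    # (1, 128, 128, 4)
print(tokens.shape)  # (16384, 4)
```

Variable resolutions and aspect ratios fall out for free: a different height/width just changes the grid dimensions and the sequence length.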
" Emerging simulation capabilities We find that video models exhibit a number of interesting emergent capabilities when trained at scale. These capabilities enable Sora to simulate some aspects of people, animals and environments from the physical world.
These properties emerge without any explicit inductive biases for 3D, objects, etc.—they are purely phenomena of scale."
"Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI."
I'm stealing a quote from someone on the OpenAI subreddit, but they nailed it:
I think a lot of people are looking at this the wrong way. Everyone is thinking... oh cool, it's a video gen tool.
That's not the main story though.
The real story is the fact that this thing can model the future and past and project it into pixel space using an internal world model and do it very well.
Humans have something like that too. It's called imagination. When you walk around absorbing the data from your eyeballs, you are constantly thinking of what could happen next. When you close your eyes you can imagine it.
Now we have a system that does this quite well. And this is also a key part of making things like truly autonomous cars and robotics a reality. It really is only a matter of time and getting the right hardware.
Unless governments put heavy stoppers on AI, that is, because this tech has basically just set the endpoint on the horizon for entire industries: collapse and massive layoffs, but also the creation of entirely new ones. It's scary and fascinating at the same time.
And this is the tip of the iceberg. The technology is so new and advancing at such a rapid pace that society has no time to react and integrate it before the impact is felt.
We won't even be able to tell what's real or what's AI.
This will call into question why we even bother making powerful rendering hardware with shader pipelines, RT cores, vertex processing and so on. You'd need such a massive rig, so many artists, and so much money to make what Sora will generate in a few seconds.
This is the old NeRF tech, to better understand just how insane AI is. The AI knows what a given time of day should look like, and since its references are all real life, you effectively get free path-tracing quality. Sora is eons ahead of that three-year-old demo.