Hi
I’m wondering if anyone here has experimented with creating an AI that generates video and scripts together, rather than generating one based on the other?
The current AI workflow for scripting seems to be:
- Start with a video
- Run a model that analyses the video
- The model generates a script based on what it detects
So the script is based on an interpretation of the visuals.
Instead of interpreting what’s happening in a clip after the fact, if the AI were responsible for both visual generation and script logic, it would know the intended actions, movement, and timing, rather than having to guess. This seems especially relevant for multi-axis!
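To illustrate what I mean, here's a rough sketch (not real code from any existing tool, and the `Action`/`plan_to_script` names and the `{at, pos}` point format are just my assumptions): the idea is that a joint generator would keep a single action plan with known timestamps, and both the video renderer and the script writer would read from it, so the script needs no guessing from pixels.

```python
from dataclasses import dataclass

@dataclass
class Action:
    t: float       # seconds into the clip (known, not estimated)
    axis: str      # e.g. "stroke", "twist" for multi-axis
    position: int  # target position, 0-100

def plan_to_script(plan):
    """Build script points straight from the plan: timing is
    ground truth because the same plan drives the video renderer."""
    return [{"at": int(a.t * 1000), "axis": a.axis, "pos": a.position}
            for a in plan]

# One shared plan; the video generator would consume this too,
# so video and script are synchronised by construction.
plan = [Action(0.0, "stroke", 100),
        Action(0.5, "stroke", 0),
        Action(0.5, "twist", 50)]
script = plan_to_script(plan)
```

The point is just the data flow: the script comes from the plan, not from analysing rendered frames.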
I’m sure this is something we’ll see emerging in the future.
Interested to know people’s thoughts or if this is “in the works” / already exists?