Hi
I’m wondering if anyone here has experimented with creating an AI that generates video and scripts together, rather than generating one based on the other?
The current AI workflow for scripting seems to be:
- Start with a video
- Run a model that analyses the video
- The model generates a script based on what it detects
So the script is based on an interpretation of the visuals.
Instead of interpreting what’s happening in a clip after the fact, if the AI were responsible for both visual generation and script logic, it would know the intended actions, movement, and timing, rather than having to guess. This seems especially relevant for multi-axis!
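To illustrate what I mean, here's a rough sketch (not real code from any existing tool, and the `Action`/`plan_to_script` names and the `{at, pos}` point format are just my assumptions): the idea is that a joint generator would keep a single action plan with known timestamps, and both the video renderer and the script writer would read from it, so the script needs no guessing from pixels.

```python
from dataclasses import dataclass

@dataclass
class Action:
    t: float       # seconds into the clip (known, not estimated)
    axis: str      # e.g. "stroke", "twist" for multi-axis
    position: int  # target position, 0-100

def plan_to_script(plan):
    """Build script points straight from the plan: timing is
    ground truth because the same plan drives the video renderer."""
    return [{"at": int(a.t * 1000), "axis": a.axis, "pos": a.position}
            for a in plan]

# One shared plan; the video generator would consume this too,
# so video and script are synchronised by construction.
plan = [Action(0.0, "stroke", 100),
        Action(0.5, "stroke", 0),
        Action(0.5, "twist", 50)]
script = plan_to_script(plan)
```

The point is just the data flow: the script comes from the plan, not from analysing rendered frames.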
I’m sure this is something we’ll see emerging in the future.
Interested to know people’s thoughts or if this is “in the works” / already exists?