So I’ve been putting some thought into a minimal FOSS AI scripter. There’s a real temptation to jump straight to some sort of visual transformer, but AI courses repeatedly suggest starting with something simple, so here I go. I’m going to log everything I do, and I will make all the code available on GitHub. What I won’t do is publish any trained model. I will be training on free scripts and scripts I purchased, and while I might release some free scripts of my own if it works, I want to respect the creators here and not release a model that has effectively memorized their hard work. If I succeed, hopefully we can make this a plug-in for OFS.
To that end, my initial assumptions:
- It’s a regression problem, not a classification problem.
- It’s a time series problem: adjacent frames matter, so use them.
- Start simple: the first network will be a CNN predicting the funscript position at time 0 from the surrounding video frames -n to n-1. Input data will be 2 * x * y * 2n (two motion components, x by y pixels, 2n frames) to allow a CNN to capture time as well as position.
- Videos will be processed into 2D motion vectors with optical flow then downsampled to 512x512 single .
- No SLR originals training data, just not going there.
- All code will be GPL3. Don’t even look at my code if you want to monetize.
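To make the input layout concrete, here is a rough sketch of the windowing arithmetic in plain Python. It is not the actual project code: the function names, the choice of n = 8, the float32 storage, and the decision to skip windows that run off the ends of a clip (rather than pad them) are all assumptions for illustration.

```python
from typing import Optional, Tuple


def window_indices(t: int, n: int, num_frames: int) -> Optional[range]:
    """Frame indices t-n .. t+n-1 around target frame t.

    Returns None when the window would run off either end of the clip
    (assumption: such samples are skipped rather than padded).
    """
    start, stop = t - n, t + n  # stop is exclusive, so the last frame is t+n-1
    if start < 0 or stop > num_frames:
        return None
    return range(start, stop)


def input_shape(x: int, y: int, n: int) -> Tuple[int, int, int, int]:
    """Shape of one sample: 2 flow components, x by y pixels, 2n frames."""
    return (2, x, y, 2 * n)


def sample_bytes(x: int, y: int, n: int, bytes_per_value: int = 4) -> int:
    """Memory for one sample, assuming float32 (4 bytes per value)."""
    c, h, w, d = input_shape(x, y, n)
    return c * h * w * d * bytes_per_value


if __name__ == "__main__":
    n = 8  # hypothetical half-window size
    print(list(window_indices(100, n, 1000)))    # frames 92 through 107
    print(input_shape(512, 512, n))              # (2, 512, 512, 16)
    print(sample_bytes(512, 512, n) / 2**20)     # 32.0 MiB per sample
```

At 512x512 with n = 8, one float32 sample comes to 32 MiB, so batch size and the choice of n trade off against memory fairly quickly.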
Taken together, these assumptions are more or less an end-to-end approach, modulo the optical flow preprocessing in the fourth bullet. Let’s find out what’s possible!