FOSS AI for Scripting 2024

So I’ve been putting some thought into a minimal FOSS AI scripter. There’s a real temptation to jump straight to some sort of visual transformer, but AI courses repeatedly suggest starting with something simple, so here I go. I’m going to log everything I do, and I will make all the code available on GitHub. What I won’t do is publish any trained model. I will be using free scripts and scripts I purchased for training, and while I might release some free scripts of my own if it works, I want to respect the creators here and not release a model that has effectively memorized their hard work. If I succeed, hopefully we can make this a plug-in for OFS.

To that end, my initial assumptions:

  1. It’s a regression problem, not a classification problem.
  2. It’s a time series problem, adjacent frames matter, use them.
  3. Start simple: the first network will be a CNN predicting the funscript position at time 0 from the surrounding video frames -n to n-1. Input data will be 2 * x * y * 2n (two motion components per pixel over 2n frames) to allow a CNN to capture time as well as position.
  4. Videos will be processed into 2D motion vectors with optical flow, then downsampled to 512x512 single.
  5. No SLR Originals in the training data; I’m just not going there.
  6. All code will be GPL3. Don’t even look at my code if you want to monetize.

This is more or less an end-to-end approach, modulo step 4. Let’s find out what’s possible!
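A minimal NumPy sketch of assumptions 1, 3 and 4, assuming the optical flow has already been computed (for example with OpenCV’s `calcOpticalFlowFarneback`) into per-frame arrays of shape (height, width, 2). The function names `resize_flow`, `make_window` and `funscript_target` are hypothetical. The nearest-neighbour resize that also rescales the vectors is my own choice, not part of the plan above. The target uses the funscript `actions` list of `{"at": ms, "pos": 0-100}` entries, linearly interpolated at the frame time and scaled to [0, 1] so it can be treated as a regression target.

```python
import numpy as np


def resize_flow(flow: np.ndarray, size: int = 512) -> np.ndarray:
    """Nearest-neighbour resize of one (H, W, 2) flow field to (size, size, 2).

    Assumption: flow is in pixels, so dx/dy are rescaled by the axis ratios.
    """
    h, w, _ = flow.shape
    rows = (np.arange(size) * h) // size  # source row for each output row
    cols = (np.arange(size) * w) // size  # source column for each output column
    out = flow[rows[:, None], cols[None, :]].astype(np.float32)  # copy, input untouched
    out[..., 0] *= size / w  # dx scales with the width ratio
    out[..., 1] *= size / h  # dy scales with the height ratio
    return out


def make_window(flows: np.ndarray, t: int, n: int) -> np.ndarray:
    """Stack frames t-n .. t+n-1 of a (T, H, W, 2) array into (2, H, W, 2n)."""
    if t - n < 0 or t + n > len(flows):
        raise IndexError("window runs past the start or end of the video")
    window = flows[t - n : t + n]              # (2n, H, W, 2)
    return np.transpose(window, (3, 1, 2, 0))  # (2, H, W, 2n), i.e. 2 * x * y * 2n


def funscript_target(actions: list, t_ms: float) -> float:
    """Linearly interpolate the funscript position at t_ms, scaled to [0, 1]."""
    acts = sorted(actions, key=lambda a: a["at"])  # np.interp needs increasing x
    at = np.array([a["at"] for a in acts], dtype=np.float64)
    pos = np.array([a["pos"] for a in acts], dtype=np.float64)
    return float(np.interp(t_ms, at, pos)) / 100.0


# Usage with dummy data: 10 small flow fields resized to 64x64 to keep it light.
raw = np.random.default_rng(0).normal(size=(10, 90, 160, 2)).astype(np.float32)
flows = np.stack([resize_flow(f, size=64) for f in raw])  # (10, 64, 64, 2)
x = make_window(flows, t=5, n=3)                           # (2, 64, 64, 6)
actions = [{"at": 0, "pos": 0}, {"at": 500, "pos": 100}]
y = funscript_target(actions, t_ms=250)                    # 0.5
```

In a real pipeline the frame time would come from the frame index and the video’s frame rate (t_ms = t * 1000 / fps), and each (x, y) pair becomes one training sample.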

Unless someone provides a model, this would seem kind of hard to use, which may limit its value.

Did you consider making an opt-in program for script authors that would like to support the project?

Great idea, but first I have to make something work at my end before asking for that. 2024 goals!

Best of luck. :slight_smile:

GPL3 does not forbid using the source code in a commercial product.

The article mentions failure cases. I wonder how well it works with our use cases?

Failure Cases

Like many motion estimation methods, our method struggles with rapid and highly non-rigid motion as well as thin structures. In these scenarios, pairwise correspondence methods can fail to provide enough reliable correspondences for our method to compute accurate global motion.

Thanks, I didn’t know that, but I found this because you pointed it out.

Our use case seems fairly limited. We are basically looking for one or two sets of more or less connected motion vectors and their collective motion per frame. Distractions include backgrounds, extra partners, and occasional occlusion. Anyway, the only way to understand what works and what doesn’t is to build it and find out.
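One crude way to probe this idea, assuming a single (H, W, 2) flow field per frame: keep the pixels whose flow magnitude exceeds a threshold, group them into 4-connected regions, and report the mean motion of the one or two largest regions. The threshold, the 4-connectivity and the name `dominant_motions` are illustrative assumptions, and a plain breadth-first search stands in for a proper connected-component labeller such as `scipy.ndimage.label`. A heuristic like this will happily pick up a moving background or an extra partner too, which is exactly the distraction problem described above.

```python
from collections import deque

import numpy as np


def dominant_motions(flow: np.ndarray, threshold: float = 1.0, k: int = 2):
    """Return (pixel_count, mean_dx, mean_dy) for the k largest moving regions.

    flow: (H, W, 2) array of per-pixel motion vectors (pixels per frame).
    """
    moving = np.hypot(flow[..., 0], flow[..., 1]) > threshold
    h, w = moving.shape
    seen = np.zeros_like(moving)  # boolean "already visited" mask
    regions = []
    for r0, c0 in zip(*np.nonzero(moving)):
        if seen[r0, c0]:
            continue
        # Breadth-first search over 4-connected moving pixels.
        queue = deque([(r0, c0)])
        seen[r0, c0] = True
        pixels = []
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and moving[rr, cc] and not seen[rr, cc]:
                    seen[rr, cc] = True
                    queue.append((rr, cc))
        rows, cols = zip(*pixels)
        vecs = flow[list(rows), list(cols)]  # (n_pixels, 2) vectors of this region
        regions.append((len(pixels), float(vecs[:, 0].mean()), float(vecs[:, 1].mean())))
    regions.sort(key=lambda reg: reg[0], reverse=True)  # largest regions first
    return regions[:k]


# Usage: two separate moving blobs on a still background.
flow = np.zeros((20, 20, 2))
flow[2:8, 2:8] = (0.0, 3.0)      # 36 pixels moving down 3 px/frame
flow[12:15, 12:15] = (2.0, 0.0)  # 9 pixels moving right 2 px/frame
print(dominant_motions(flow))    # [(36, 0.0, 3.0), (9, 2.0, 0.0)]
```

Tracking the mean vertical motion of the largest region over time would give a rough stroke signal to compare against a script, before any network is trained.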