Funscript AI Generation - VR (& 2D POV now?) - Join the Discord :)

k00gar · December 30, 2024, 10:37pm

This comment means a lot.

k00gar · December 30, 2024, 10:59pm

Hi there @PeeTee, and thanks again for your valuable feedback.

The latest version of the algo is now trying to track the hips, as part of a pose detection YOLO model, on top of all the other initial body parts “image detection”.

Hope this drives us to better results. Thanks again for being one of the first to provide feedback

k00gar · December 31, 2024, 8:36am

Hi all, as promised, here are the results of the batch run last night.

n-sized yolo detection model, no pose model involved.

We are down to 6375 actions vs 6630 with the pose model tracking movement of the hips, vs 6188 actions in the human reference funscript.

Just to make it very clear, the human made “reference” script is only used for post-processing comparison, it was NEVER used to train the AI model/software/whatever.

Here are the first raw results:

And now, the results with the post-processing auto enhancements I was mentioning yesterday :

Currently writing methods to deal with the shallow strokes and enhance them without breaking 0 / 100 thresholds and creating flat sections.

We are getting closer that way in terms of avg depth, peaks and lows, but still have a lot to look after.

General observations:

I did not notice the off syncing before in doggystyle
Some strokes are still missing in closeup missionary, and some are reversed

More food for thought. Makes me wonder if the use of the pose model for hips detection really is a plus or not

Might need to switch to another video and leave Blake where she is for now.

PeeTee · December 31, 2024, 12:16pm

Thank you for all the work you're doing. I am watching this project closely (but more silently, since there are more knowledgeable people than me). Looking forward to the first script you put out for testing.

k00gar · December 31, 2024, 10:30pm

Could not resist sending a couple more videos through the pipeline today.

First a JAV, which gave a mixed result, but not as bad as in the early days. Still a long way to go though; I might even need to train a specific model given the mosaic’ed stuff. Darn, I love JAV-VR though…

Anyway, the recent release of @Shayuki made me wonder how well the model would behave on such a scene.

https://discuss.eroscripts.com/t/vrconk-com-game-of-thrones-daenerys-targaryen-a-porn-parody-kiara-cole/223691

As usual, the report picks six 10s scenes at random in the video after the algorithm processed the full video, and builds a report comparing the AI generated funscript to a “reference” funscript. It does not use the reference script for training of the model, I only use the findings from the report comparison to tune, adjust weights, parameters and logic in the algorithm. The AI part is “only” a body part detection within an image.
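For anyone curious, the sampling and comparison step could look roughly like the sketch below. It assumes the usual funscript layout (a list of actions with `at` in milliseconds and `pos` from 0 to 100). The function names, the 50 ms sampling step and the mean absolute difference metric are only illustrative assumptions, not what the actual report uses.

```python
import random
from bisect import bisect_right

def position_at(actions, t_ms):
    """Linearly interpolate the script position at time t_ms."""
    times = [a["at"] for a in actions]
    i = bisect_right(times, t_ms)
    if i == 0:
        return actions[0]["pos"]       # before the first action
    if i == len(actions):
        return actions[-1]["pos"]      # at or after the last action
    a, b = actions[i - 1], actions[i]
    frac = (t_ms - a["at"]) / (b["at"] - a["at"])
    return a["pos"] + frac * (b["pos"] - a["pos"])

def pick_windows(duration_ms, count=6, length_ms=10_000, seed=None):
    """Pick `count` random windows of `length_ms` (they may overlap)."""
    rng = random.Random(seed)
    starts = sorted(rng.randrange(0, duration_ms - length_ms) for _ in range(count))
    return [(s, s + length_ms) for s in starts]

def mean_abs_diff(generated, reference, start_ms, end_ms, step_ms=50):
    """Average position gap between two scripts over one window."""
    samples = range(start_ms, end_ms, step_ms)
    total = sum(abs(position_at(generated, t) - position_at(reference, t)) for t in samples)
    return total / len(samples)
```

Each window returned by `pick_windows` can then be passed to `mean_abs_diff` to get one number per sampled scene.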

On this first report, well, there is some kind of satisfaction: it’s broken, misses a stroke here and there, lacks subtlety, but tries its best and seems to perform ok-ish.

However, in this one, there is absolutely no competition: the amount of subtlety and refinement in the human made script, versus the missed/reversed strokes and the outliers in the AI generated one, sometimes makes me feel it’s a dead end.

Anyway, all this material will be useful for further analysis of the way the algorithm behaves.

NB. : that part of @Shayuki 's script looks really nice

Thank you for scripting this scene and sharing @Shayuki (your script was not used for training anything, just for post-processing comparison, I hope this is fine with you)

Shayuki · January 1, 2025, 5:56am

Sure! Don’t mind at all. If you use them for training of any kind then go ahead.

k00gar · January 1, 2025, 1:43pm

Boy oh boy… You fix one thing, you break another one…

Now my closeup detection is messy

I changed the rendering of the comparison report for better readability.

k00gar · January 1, 2025, 3:56pm

Dealt with this issue, … till the next one…

Next one obviously already here as we can see below…

Anyway, also challenged the algorithm on this one scripted by @sentinel (thank you again!) :

https://discuss.eroscripts.com/t/orgyvr-ember-snow-linda-lan-lulu-chu-nicole-doshi-the-pussycat-girls/218150

Not so bad given the amount of information it had to go through (the tracked info dumped to a json is over 120MB, not mentioning the 1GB log file for debugging)…

Lastly, an attempt at comparing output on a video recently scripted by @bumdude (thank you again!) :

https://discuss.eroscripts.com/t/povr-originals-angelina-moon-thats-a-paddlin/223038

So much left to do to dig into the scenarios behind the broken parts and come up with fixes.

Hoping collaboration might level all this up soon when the repo is finally ready for sharing.

In the meantime, I just had an idea about something I want to experiment with to generate a second axis. I will burn some more electricity to retrain a model with a “twist” and see if this can help

sentinel · January 1, 2025, 7:54pm

I’m happy to contribute to the training indirectly with my scripts

I think the blowjob/handjob sections will be the toughest to train on, since I at least script with the intent to avoid repositioning strokes, i.e. strokes that are only there to compensate for a hand stroke to the bottom followed by a lick starting from the middle. Repositioning tends to break the immersion for me. This means that I intentionally make strokes shorter than they are, or I might limit or extend one or more strokes to avoid repositioning at the end. I also use e.g. dick swinging after the girl releases the dick for sleeve movement, to avoid repositioning later.

The trick is that the brain associates visuals with the feeling of sleeve movement as being in sync, even if it isn’t the girl doing anything. And no sleeve movement can also be an immersion breaker, so some movement, even if the length is wrong, is better than none.

I also tend to focus on adding movement even if it’s in the wrong direction when the girl focuses on the head with her lips and tongue.

This info might be good to know when deciding how to approach blow/handjobs from an AI perspective. Visual tracking probably falls short and needs other components for learning.

k00gar · January 1, 2025, 8:21pm

Wow, thank you so much for that feedback @sentinel , very enlightening,

I’ve been struggling with automation on blowjobs, specifically when hands were involved. So far, I was tracking all body parts touching the penis, measuring the one that had the biggest sum of unitary moves between frames over a tracking history window, and transitioning to it.

But what you state makes perfect sense, and might call for another layer of processing.

Raw detection data => raw tracking data => (sex) pose (-e +ition) and raw funscript pos estimation => transition layer (*new) => simplification (vw algo)
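For the last step, here is a minimal sketch of what a simplification pass could look like, assuming “vw algo” means the Visvalingam-Whyatt algorithm (my reading of the abbreviation). Points are `(at, pos)` tuples and `min_area` is a made-up threshold in ms × position units. This is the naive quadratic version for readability, not necessarily how the pipeline does it.

```python
def triangle_area(p, q, r):
    """Area of the triangle formed by three (at, pos) points."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2

def simplify_vw(points, min_area):
    """Repeatedly drop the interior point with the smallest triangle area
    until every remaining interior point contributes at least min_area."""
    pts = list(points)
    while len(pts) > 2:
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        smallest = min(range(len(areas)), key=areas.__getitem__)
        if areas[smallest] >= min_area:
            break
        del pts[smallest + 1]  # areas[k] belongs to pts[k + 1]
    return pts
```

The first and last points are never removed, so the script keeps its start and end.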

Really, you might have made my day both with your free script release and this comment itself, thank you so much.

k00gar · January 1, 2025, 8:28pm

Ok, meanwhile, I converted my previous dataset to an OBB dataset, might open new perspectives.

Currently training, burning

We’ll see where this leads us…

Zalunda · January 1, 2025, 9:24pm

Ideally, the last layer of the AI/Tools should not work linearly (i.e. not take a final decision on the first point, then the second point, etc).

It should do a pre-analysis of each frame (or maybe groups of 3-4 frames in a row) and start by ‘finalizing’ the ones that are the clearest (i.e. highest % of detection or something).

In some cases, the pre-analysis could create actions that are not fully defined. For example, if the girl is only moving her head left and right while giving a blowjob (or your lick the head example), it could create something like: change direction, then go at 50 ± 20 speed, but the direction is undefined.
Also, some actions could be set as optional or with a lower priority, like dick swinging for example. If the tool needs to reposition during that span, it could use those ‘optional’ moves to hide the repositioning.

Somewhat related to this, a while back, I worked on an algorithm to increase or decrease the intensity of a script by a specific percentage, while trying to keep near the original position. You can see an example below (top: original, 2nd line: 50% intensity, 3rd line: 80% intensity, 4th line: 150%). Source: 1, 2

In short, it works by setting a new distance for actions and then ‘nudging’ points up and down until it minimizes a computed error value (position and distance). You can see in the 50% line that this makes some actions migrate slowly to the desired position.
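To make the idea concrete, here is a toy sketch of that kind of nudging loop. It is not the actual implementation: the squared-error terms, the weights `w_pos` and `w_dist`, the step of 1 and the function name are all assumptions, and it ignores timing entirely.

```python
def rescale_intensity(positions, factor, w_pos=1.0, w_dist=4.0, max_passes=200):
    """Nudge points by +/-1 until the error stops improving.

    Error = weighted squared drift from the original positions
          + weighted squared gap between each stroke and its scaled target.
    """
    orig = list(positions)
    # Target signed travel between consecutive actions, scaled by the factor.
    targets = [(b - a) * factor for a, b in zip(orig, orig[1:])]
    cur = orig[:]

    def error(p):
        pos_err = sum((x - o) ** 2 for x, o in zip(p, orig))
        dist_err = sum(((b - a) - t) ** 2
                       for a, b, t in zip(p, p[1:], targets))
        return w_pos * pos_err + w_dist * dist_err

    best = error(cur)
    for _ in range(max_passes):
        improved = False
        for i in range(len(cur)):
            for step in (-1, 1):
                candidate = cur[i] + step
                if not 0 <= candidate <= 100:
                    continue  # never leave the 0..100 range
                old = cur[i]
                cur[i] = candidate
                e = error(cur)
                if e < best:
                    best, improved = e, True
                else:
                    cur[i] = old  # revert a nudge that did not help
        if not improved:
            break
    return cur
```

Raising `w_pos` keeps points closer to where they were, while raising `w_dist` pushes harder toward the new stroke lengths. Recomputing the full error on every nudge is slow, which is fine for a sketch only.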

This could be a starting point for the last part of the tool, where the algorithm could ‘resolve’ the final action’s positions.

In some cases, it could ignore the location when computing the error value of a point. It might also be able to replace low-priority direction changes with ‘a change of speed’ instead (which I sometimes do in script).

For example, if a dick-swinging action has no defined position (or lower priority than the handjob / lick position), it would be able to create something like this:

Anyway, food for thought.

Zalunda · January 1, 2025, 9:55pm

Also, I don’t mind at all that an AI tries to create a full script for a scene but, personally, I’m not interested in using AI generated scripts as-is. I see it more as a tool for scripters.

If it’s possible, something that might be useful is for the tool to generate a secondary script that only outputs the tool’s level of confidence.

For example (top: generated script, bottom: confidence script):

This would help a scripter know which parts might need more work.
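As a rough sketch of what I mean, assuming the tool has one confidence value between 0 and 1 per frame (the function name and the input format are hypothetical), the confidence script could simply map that value to a 0-100 position:

```python
import json

def confidence_script(frame_confidences, fps):
    """Build a funscript-style dict whose position is the confidence (0-100)."""
    actions = [
        {"at": round(i * 1000 / fps),
         "pos": round(max(0.0, min(1.0, c)) * 100)}  # clamp, then scale
        for i, c in enumerate(frame_confidences)
    ]
    return {"actions": actions}

# One action per frame; a real tool would probably thin these out.
print(json.dumps(confidence_script([0.9, 0.4, 0.1], fps=30)))
```

Loaded next to the generated script, low sections would point straight at the parts to review.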

k00gar · January 1, 2025, 10:04pm

Both comments are excellent points @Zalunda !

More food for thought, so as to adjust and / or pivot the approach. Thanks !

sentinel · January 1, 2025, 11:54pm

When both hands and mouth are involved you need to see the bigger picture. If they are almost in sync you often make one stroke for them combined. They might get out of sync at the end points, where one changes direction before the other, and in that case I just go on feeling. I try to “feel” what is in sync when I watch the segment with the simulator visible.

If both mouth and hand are out of sync a lot I usually tend to look at what is most dominant visually and use that for the script.

Another thing I often do is to look at what position things end in, and then I backtrack points to optimize so that the final stroke ends up at the right end (top or bottom typically). This is mostly for blowjobs, for example in the following situations:

The tongue is involved
The mouth goes down on the side of the dick head (i.e. not true up/down movements). The girl’s head is basically sideways sometimes. This can sometimes be combined with a pop where the dick slips out from her cheek.
The dick is moved back and forth across the face.

In most of these situations I need the sleeve to be in position when those movements end and the girl finally takes the dick in her mouth again.

It’s worth mentioning that, for sideways movements, it generally doesn’t matter whether they are scripted as up or down strokes. You just need movement.

Hip rotation is also simulated by using different speeds on the same up or down stroke. Usually the speed change occurs when the direction changes. This is also something that is done a bit on feeling and experience. No rules there.

sentinel · January 2, 2025, 12:01am

All valid points!

I leave it to the AI pros here to figure out how to train a model with that kind of thinking

k00gar · January 2, 2025, 9:30am

Thanks @sentinel, this is the type of approach I have been trying to implement, by tracking over the last frames the parts that are touching the penis, and summing their absolute movements over time, trying to determine what is the dominant one, and transitioning to it as the lead part. This is still a bit broken, but I will try to refine it.
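Roughly, the idea looks like the simplified sketch below. This is not the actual code: the class name, the 30-frame window and the choice to forget a part as soon as it stops touching are assumptions made for the illustration.

```python
from collections import deque

class DominantPartTracker:
    """Track y positions of touching parts and pick the most active one."""

    def __init__(self, window=30):
        self.window = window
        self.history = {}  # part name -> recent y positions

    def update(self, touching):
        """touching: dict of part name -> y position for the current frame.
        Returns the part with the largest summed absolute movement."""
        # Forget parts that are no longer in contact.
        for part in list(self.history):
            if part not in touching:
                del self.history[part]
        for part, y in touching.items():
            self.history.setdefault(part, deque(maxlen=self.window)).append(y)
        return max(self.history,
                   key=lambda p: self._travel(self.history[p]),
                   default=None)

    @staticmethod
    def _travel(ys):
        ys = list(ys)
        return sum(abs(b - a) for a, b in zip(ys, ys[1:]))
```

Calling `update` once per frame returns the current lead part. A real version would need some hysteresis so the lead does not flip back and forth on every small change.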

For this one, we would need some kind of flagging + raw track, and post-processing. Adding to the todo list.

Unfortunately, for this one, I would need a model trained to detect lips/mouth, I have only face for now. And glans, amongst others, but the glans could be occluded by hand and not be in mouth… Need to think about it.

This could be a game changer!

Let me explain: until now, I mainly focused on y-axis / height, and on a combination of the difference in height between parts (base of penis, hand, etc) and the absolute y-position of the part itself. I worked with Euclidean distance at some point, but was not happy with the result.

I also thought looking at the x-axis would be for when we would want to work with generating multi-axis scripts, but in fact I understand now that this x-axis movement is key even in a single axis funscript approach.

Wow, thank you for sharing part of your recipe, and thank you all for the valuable feedback!

Sidenote, as I am currently in the process of training an Oriented Bounding Box model, we could get an orientation of detected boxes, that could help for first stage multi-axis.

Quick preview, please disregard mistakes as this is very early in the training stage (epoch 30 or so out of 200+). So we would move from this:

To something like that (thus allowing to analyze angles and stuff if it ends well…):

k00gar · January 2, 2025, 2:22pm

Ok, my head hurts already…

OrgyVR - Linda Lan, Lulu Chu, Ember Snow, Nicole Doshi - The Pussycat Girls

sentinel · January 2, 2025, 2:25pm

Lol, maybe you should focus on BG to begin with.

k00gar · January 2, 2025, 2:27pm

lol, you are so right, I actually lost myself in this video, no wonder why