Update 3 of Nov. 16th:
Footage below of unsupervised funscript generation (green in the graph), compared to the human-made funscript (blue in the graph), for 4 randomly picked sections of the video:
Update 2 of Nov. 11th:
Still tweaking the BJ/HJ algo, but it now disregards scenes that it considers irrelevant.
Model trained on 4500+ hand-picked and tagged images.
edit: I was curious, so I counted… 30149 classes tagged in those images to train the model.
Introduced Kalman filtering (again…) and fallback tracking (booooobies).
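For those curious, the Kalman filtering idea can be sketched roughly like this: a minimal 1D, position-only filter smoothing the noisy vertical position coming out of detection. The noise values (`q`, `r`) and the structure are illustrative assumptions, not my actual code:

```python
class Kalman1D:
    """Minimal 1D Kalman filter sketch (position-only model)."""

    def __init__(self, q=1e-3, r=0.5):
        self.x = None  # estimated position (None until first measurement)
        self.p = 1.0   # estimate uncertainty
        self.q = q     # process noise: how much we expect the target to move
        self.r = r     # measurement noise: how much we distrust detections

    def update(self, z):
        if self.x is None:
            self.x = float(z)           # initialise on the first measurement
            return self.x
        self.p += self.q                # predict: uncertainty grows over time
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (z - self.x)      # correct: pull estimate toward z
        self.p *= (1.0 - k)             # uncertainty shrinks after correction
        return self.x

kf = Kalman1D()
smoothed = [kf.update(z) for z in [10, 10, 10, 50, 10]]  # the 50 spike gets damped
```

A constant-velocity variant tracks fast strokes better but tends to overshoot on direction changes, which is the usual tuning headache.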
Below are 10 randomly picked, unsupervised processed cuts of a video:
Another shot of the algo's capabilities and inabilities at this stage, in better resolution.
Still tweaking…
For reference, the base video is a re-encoded 60 fps VR video at 1920 x 1080, and the whole inference process runs at 90 fps (with monitoring/displaying/logging) on my Mac mini, whereas I was barely getting 8 to 20 fps with the previous trackers solution.
Update 1 of Nov. 11th:
Made some progress: got rid of all the OpenCV tracking, now fully relying on a model I trained and an algorithm to detect what is relevant to use in the frame.
The footage below is fully unsupervised and picks 4 random sections of 240 frames each from a video. I still need to fix the blowjob algorithm; I got a glitch in the matrix while fixing another section of the algorithm, but I'm almost there:
Hi all,
Dreaming of being able to enjoy scripts, even basic ones, with any VR video from my collection, I've been working in my free time for the past few weeks on (yet another) funscript generation helper using Python and Computer Vision techniques, and I am looking for constructive feedback (to tweak, adjust, or reconsider the algorithm from scratch) and motivation to keep going.
I started with no coding knowledge at all. I have been testing countless Computer Vision approaches, faced disappointment and frustration, came close to going batshit crazy at some point, and even lost countless hours on things as simple as the plain GUI…
And I won't mention the Kalman filtering (ouch, too late), nor the occlusion handling that keeps me awake at night.
So please, manage your expectations.
Anyway, at this stage (I do not even know if we can call this an alpha stage):
- the tool is mostly set up for VR videos
- it uses two different approaches depending on the type of scene (handjob/blowjob vs. the rest)
- it could be leveraged to help scripters produce some sections of scripts before refining and fine-tuning
- it needs some supervision in the making
- the process runs frame by frame
- it generates a funscript file and a project file that you can restart from if you close the app
Handjob / Blowjob scenes:
- input consists of drawing the box where the penis is located
- the system detects hands and mouth, then tracks their positions if they touch the penis box
- a dynamic weighting mechanism for hands and mouth determines the target funscript position
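To give an idea, the detect-then-weight step above could look something like this sketch. The box format, the contact test, and the fixed mouth-vs-hand weights are all illustrative assumptions (the real weighting is dynamic), not my actual code:

```python
def boxes_touch(a, b):
    """Axis-aligned boxes as (x1, y1, x2, y2): True if they overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def funscript_position(penis_box, detections, weights=None):
    """Weighted average of the touching parts' vertical positions,
    mapped onto the 0-100 funscript scale (0 = base, 100 = tip)."""
    weights = weights or {"mouth": 2.0, "hand": 1.0}  # assumed static weights
    top, bottom = penis_box[1], penis_box[3]
    num = den = 0.0
    for label, box in detections:
        if not boxes_touch(penis_box, box):
            continue                            # part not in contact: ignore
        cy = (box[1] + box[3]) / 2              # vertical centre of the part
        depth = (bottom - cy) / (bottom - top)  # 0 at base, 1 at tip
        w = weights.get(label, 0.0)
        num += w * depth
        den += w
    if den == 0.0:
        return None  # nothing touching; caller can fall back to tracking
    return round(max(0.0, min(1.0, num / den)) * 100)
```

Returning `None` when nothing touches the box is one way to trigger the fallback tracking mentioned in update 2.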
Other positions:
- input consists of drawing boxes on the male and female bodies
- these body parts are tracked and their relative distance is registered, then normalized
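The normalization step could be sketched like this; the percentile clipping is an assumed choice on my part to illustrate why raw distances need rescaling, not necessarily what the tool does:

```python
import numpy as np

def normalize_distances(distances, low_pct=5, high_pct=95):
    """Map raw pixel distances between the two tracked boxes onto 0-100.
    Percentile clipping (an assumption) stops a few outlier frames from
    squashing the useful range."""
    d = np.asarray(distances, dtype=float)
    lo, hi = np.percentile(d, [low_pct, high_pct])
    if hi <= lo:
        return np.zeros_like(d)  # degenerate section: no motion detected
    return np.clip((d - lo) / (hi - lo), 0.0, 1.0) * 100.0
```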
The process cuts sections of data every 15 seconds, auto-filters, normalizes, and exports/saves the funscript file and project.
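The export itself is the simple part: a .funscript is just JSON with an `actions` list of `{"at": time in ms, "pos": 0-100}` points. A minimal sketch (one point per frame here; a real export would simplify/filter the points first):

```python
import json

def export_funscript(positions, fps, path):
    """Write a minimal .funscript file from per-frame 0-100 positions."""
    actions = [
        {"at": int(round(i * 1000 / fps)), "pos": int(p)}
        for i, p in enumerate(positions)
    ]
    data = {"version": "1.0", "range": 100, "actions": actions}
    with open(path, "w") as f:
        json.dump(data, f)
    return data
```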
There are still many things to implement before it is fully enjoyable, and it needs to be optimized performance-wise.
All this to come to the following request: does anyone have a VR video they would like to see partly scripted, so as to test and provide feedback? Please do not expect a fully scripted vid, but rather just short section(s) of it at this stage.
Just really looking for feedback; it still needs a LOT of work, but it is already 2k+ lines of code and I kinda feel alone facing a giant mountain to hike…
I would definitely like design and coding help, but I am still too shy to share my terrible and under-documented code at this stage.