VR Funscript generation helper - Python & now CV AI

Update 3 of Nov. 16th:

Footage below of unsupervised funscript generation (green in the graph), compared to the human-made funscript (blue in the graph) for 4 randomly picked sections of the video:

Update 2 of Nov. 11th:

Still tweaking the BJ/HJ algo, but it now disregards scenes that it considers irrelevant.

Model trained on 4500+ hand-picked and tagged images :hot_face:

edit: I was curious, so I counted… 30,149 classes tagged in those images to train the model :melting_face:

Introduced Kalman filtering (again… :face_holding_back_tears:) and fallback tracking (booooobies :heart_eyes_cat:).

Below are 10 randomly picked, unsupervised, processed cuts of a video:

Another shot of the algo's capabilities and shortcomings at this stage, in better resolution.

Still tweaking…

For reference, the base video is a re-encoded 60 fps VR video at 1920 x 1080, and the whole inference process runs at 90 fps with monitoring/displaying/logging on my Mac mini, when I was barely getting 8 to 20 fps with the previous tracker solution.

Update 1 of Nov. 11th:

Made some progress: got rid of all the OpenCV tracking, now fully relying on a model I trained and an algorithm to detect what is relevant to use in the frame.

The footage below is fully unsupervised and picks 4 random sections of 240 frames from a video. I still need to fix the blowjob algorithm (got a glitch in the matrix when fixing another section of the algorithm), but almost there:


Hi all,

Dreaming of being able to enjoy scripts, even basic ones, with whichever VR video from my collection, I've been working on (yet another) funscript generation helper using Python and Computer Vision techniques for the past few weeks in my free time, and I am looking for constructive feedback (to tweak, adjust, or reconsider the algorithm from scratch) and motivation to keep going.

I started with no coding knowledge at all; I have been testing countless Computer Vision approaches, faced disappointment and frustration, came close to going batshit crazy at some point, and even lost countless hours on things as simple as the plain GUI… :face_with_symbols_over_mouth:

And I won't mention the Kalman filtering (ouch, too late), nor the occlusion handling that keeps me awake at night.

So please, manage your expectations :slight_smile:

Anyway, at this stage (I do not even know if we can call this alpha stage):

  • the tool is mostly set for VR videos
  • it uses two different approaches depending on the type of scene (handjob/blowjob vs. the rest)
  • it could be leveraged to help scripters produce some sections of scripts before refining and fine-tuning
  • it needs some supervision in the making
  • the process is run frame by frame
  • it generates a funscript file and a project file that you can restart from if you close the app

Handjob / Blowjob scenes:

  • input consists of drawing a box where the penis is located
  • the system detects hands and mouth, then tracks their position if touching the penis box
  • a dynamic weighting mechanism for hands and mouth determines the target funscript position (rough sketch below)
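
For the curious, here is roughly how that weighting could look. This is a minimal sketch with placeholder names and weights, not the actual implementation:

```python
def overlaps(box_a, box_b):
    """Axis-aligned overlap test for (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def funscript_position(penis_box, hand_boxes, mouth_box, mouth_weight=0.7):
    """Blend the vertical position of touching hands/mouth into a 0-100 value."""
    top, bottom = penis_box[1], penis_box[3]
    candidates = []  # (vertical center in pixels, weight)

    for hb in hand_boxes:
        if overlaps(hb, penis_box):
            candidates.append(((hb[1] + hb[3]) / 2, 1.0 - mouth_weight))

    if mouth_box is not None and overlaps(mouth_box, penis_box):
        candidates.append(((mouth_box[1] + mouth_box[3]) / 2, mouth_weight))

    if not candidates:
        return None  # nothing touching: let the caller hold the last value

    weighted_y = sum(y * w for y, w in candidates) / sum(w for _, w in candidates)
    # Near the tip (top of the box) -> 100, near the base (bottom) -> 0.
    ratio = (bottom - weighted_y) / max(bottom - top, 1e-6)
    return int(round(min(max(ratio, 0.0), 1.0) * 100))
```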

shocum-test

Other positions:

  • input consists of drawing boxes on the male and female bodies
  • these body parts are tracked and their relative distance is registered, then normalized (see the sketch below)
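
Something along these lines (again just an illustrative sketch, the names are placeholders):

```python
def box_center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def normalize_distances(male_boxes, female_boxes):
    """Turn raw per-frame inter-box distances into 0-100 positions (closest = 100)."""
    distances = []
    for mb, fb in zip(male_boxes, female_boxes):
        (mx, my), (fx, fy) = box_center(mb), box_center(fb)
        distances.append(((mx - fx) ** 2 + (my - fy) ** 2) ** 0.5)

    lo, hi = min(distances), max(distances)
    span = max(hi - lo, 1e-6)
    # Smallest distance (deepest) maps to 100, largest to 0.
    return [int(round((hi - d) / span * 100)) for d in distances]
```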

czechvr723-test

The process cuts the data into sections every 15 seconds, auto-filters, normalizes, and exports/saves the funscript file and the project.
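
For reference, a .funscript file is just JSON with an "actions" list of {"at": milliseconds, "pos": 0-100}; the export step could look something like this (hedged sketch, not the exact code):

```python
import json

def export_funscript(positions, fps, path):
    """positions: one 0-100 value per processed frame (None = no action)."""
    actions = [
        {"at": int(round(i * 1000 / fps)), "pos": int(pos)}
        for i, pos in enumerate(positions)
        if pos is not None
    ]
    data = {"version": "1.0", "inverted": False, "range": 100, "actions": actions}
    with open(path, "w") as f:
        json.dump(data, f)

# e.g. export_funscript(section_positions, fps=60, path="scene.funscript")
```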

There are still many things to implement before it is fully enjoyable, and it needs to be optimized performance-wise.

All this to actually come up with the following request: does anyone have a VR video they would like to see partly scripted, so as to test and provide feedback? Please do not expect a fully scripted vid, but rather just short section(s) of it at this stage.

Just really looking for feedback; it still needs a LOT of work, but it is already 2k+ lines of code and I kinda feel alone facing a giant mountain to hike…

I would definitely like design and coding help, but I am still too shy to share my terrible and under-documented code at this stage.

19 Likes

Is this OpenCV?

That's it, exactly :slight_smile:

A combination of a face detection mechanism and hand detection (mediapipe for the hands), then locked and tracked with CSRT trackers + Kalman filters and re-evaluated periodically.

Simple CSRT trackers + Kalman filtering for the body parts in the other tracking system, with moving-area assessment and locking to detect outliers.
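
In spirit, the per-box loop looks something like this (simplified sketch, requires opencv-contrib-python; the noise values and file name are placeholders, not my tuned ones):

```python
import cv2
import numpy as np

def make_kalman():
    """Constant-velocity Kalman filter over the (x, y) box center."""
    kf = cv2.KalmanFilter(4, 2)  # state: x, y, vx, vy; measurement: x, y
    kf.transitionMatrix = np.array(
        [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1
    return kf

cap = cv2.VideoCapture("scene.mp4")
ok, frame = cap.read()
tracker = cv2.TrackerCSRT_create()
tracker.init(frame, cv2.selectROI("init", frame))  # draw the initial box
kalman = make_kalman()

while True:
    ok, frame = cap.read()
    if not ok:
        break
    prediction = kalman.predict()
    found, box = tracker.update(frame)
    if found:
        x, y, w, h = box
        center = np.array([[x + w / 2], [y + h / 2]], np.float32)
        kalman.correct(center)          # smooth the raw tracker output
        smoothed = kalman.statePost[:2]
    else:
        smoothed = prediction[:2]       # occlusion: coast on the prediction
```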

You didn't want to use any of the more recent AI-specific options? There's a few on GitHub I was going to give a shot. I've also been playing around with adding control point or anchor point "nodes" to a scene. So like this, using a left, right, and center node.

or like this where I just repeat it all the way across.

This works well in VR because the camera is always steady and pointing in one direction (it should be, at least). This brings that bottom tracker's potential to fail to essentially 0%. It's not necessarily the most elegant solution, but in terms of me having to do less work when using a system to track and plot points, it does save me time. (I've been experimenting with different colors and shapes; max green or max blue work best.)

Oh wow, I had no clue about this AI approach you are mentioning; I felt like I've been alone in a dark room bumping into walls trying to find the exit. ChatGPT and others like Mistral and Deepseek helped me out of this nightmare, but still tripped me up here and there :rofl:

When I came up with this solution, I was already happy! I actually struggled to find the best face detector, as they're very sensitive to face orientation; Yunet saved me. Kept mediapipe only for the hands.
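
For anyone who wants to try it, YuNet is exposed in OpenCV (>= 4.5.4) as FaceDetectorYN; a minimal call looks like this (the .onnx file comes from the OpenCV model zoo, the path here is just an example):

```python
import cv2

detector = cv2.FaceDetectorYN.create(
    "face_detection_yunet_2023mar.onnx", "", (320, 320))

frame = cv2.imread("frame.jpg")
h, w = frame.shape[:2]
detector.setInputSize((w, h))
_, faces = detector.detect(frame)  # each row: x, y, w, h, 5 landmarks, score
if faces is not None:
    x, y, bw, bh = faces[0][:4].astype(int)
    # Landmarks 4 and 5 are the mouth corners: average them for a mouth point.
    mouth_x = int((faces[0][10] + faces[0][12]) / 2)
    mouth_y = int((faces[0][11] + faces[0][13]) / 2)
```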

As for the VR-specific area of interest, fully agreed; I actually focus only on a specific part of the frame to limit the computation.

I would gladly look into the approach you mention if you are willing to share. Happy to collaborate if we can make better progress.

Thanks again for your message!

Thanks!
I was actually looking at other Meta stuff to create passthrough videos (Segment Anything 2) :smile:

Yeah, I looked at this a while ago when it was new, and now they have a refined model, so I'm thinking it might be time to see what it can do.

1 Like

This is smart, but only valid when the « male » is not moving, right?

Still looking for testers :smiling_face_with_three_hearts:

Well yeah, but usually one party is moving and the other isn't. But yeah, it's not absolute, obviously.

To expand on the concept though, you would or could place a node on the person(s) that is a clearer tracking point for the model. But at some point you can cut out the middleman and just track the model(s) themselves.

I would be happy to work with you and test new perspectives!

Yeah, I have all these projects in mind; I need to find the time. I'm not as much a developer as I am an AI/ML engineer though. So while I can most definitely code and build things, I am better at training LLMs and working with data.

If you want to play around with it, see what it does, and see potential in it, then I can easily start fine-tuning off of their model used as a base.
I'm assuming they used a lot of data; we would want to make sure the model held on to that "knowledge", then add a dataset that was solely adult-oriented, resulting in a new model. Could maybe do a LoRA, but it depends.

You seem to be more than halfway through your OpenCV implementation though. So if you want to finish that first…

Well, I have quite a few features left on my roadmap, so if you have any clue which model to use and how to train it to detect the pose (even just handjob/blowjob vs. other), so as to position some bookmarks in the timeline and auto-switch/suggest the appropriate tracking method, I am all ears.

Also, maybe how to train a YOLO model to recognize a dick and auto-position the bounding box in the handjob/blowjob tracking method; that would be next level versus what I have now…
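
With an Ultralytics YOLO model trained on a custom class, the auto-positioning part would only be a few lines (sketch only; the weights file and class name here are made up):

```python
from ultralytics import YOLO

model = YOLO("custom_bodyparts.pt")  # hypothetical custom-trained weights

results = model("frame.jpg")[0]
penis_box = None
for box in results.boxes:
    cls_name = results.names[int(box.cls)]
    if cls_name == "penis" and float(box.conf) > 0.5:
        penis_box = box.xyxy[0].tolist()  # [x1, y1, x2, y2] to seed the HJ/BJ tracker
        break
```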

The target being less supervised (dreaming of unsupervised) processing…

Actually considering adding manual funscript editing options, beyond just the automated tracking solutions, to ease up the process for applicable parts of the video (either combined OpenCV tracking, or impact sound detection for sections without a clear view, like close-up missionary, etc.).

Not sure if this is getting interest, nor if this is worth spending my time on (beyond the personal challenge!), given the funscript editing/generation software already on offer :sweat_smile:

Honestly, if all the people working alone on their separate small non-AI tracking projects just came together and started working on a bigger editor project that would by default incorporate the tech of MTFG and Meta's CoTracker or other AI tracking pipelines, I would genuinely offer to help, but as of now everyone's kinda doing their own thing and it leads to no overall progress.

2 Likes

I actually took this as a personal challenge to spend some of my spare time on, as I had no clue about coding anything (be it Python, OpenCV, or face/hand detection) just 3 weeks ago, and had very limited knowledge of the techniques already developed here.

So if a global project starts, and if I can be of any help, I would gladly join.

Great effort, how can I get the app?

Been working a tad more in my free time with YOLO; now I have auto-detection for body parts and estimation of scene types, based on a model I trained on my video library along with cue points from timestamp.trade, so as to adapt the algorithm accordingly.
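
The scene-type switch can be as dumb as a vote over the classes detected in a section; an illustrative sketch (class and method names are placeholders, not the trained model's actual labels):

```python
from collections import Counter

def pick_tracking_method(detections_per_frame):
    """detections_per_frame: list of sets of class names seen in each frame."""
    votes = Counter()
    for classes in detections_per_frame:
        if "penis" in classes and ({"mouth", "hand"} & classes):
            votes["handjob_blowjob"] += 1
        else:
            votes["relative_distance"] += 1
    return votes.most_common(1)[0][0]
```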

If anyone with proper experience in CV is interested in contributing, please message me; feeling kinda lonely :sweat_smile:

I sometimes feel like I should just script instead… :joy:

The raw footage below is unsupervised and only relies on models I trained, but it will be used to give starting key points for actual OpenCV (or other) tracking:

progress

Pretty sure we can do something.

2 Likes

Made progress on the model today; it's getting way better and faster.
new_progress

Everything in this gif, including labelling, is unsupervised.

Training takes time, but hand-picking zones of interest and matching them to a class for learning is beyond that. I made my own software to run through my VR video library and draw boxes over 'classes' for the YOLO learning (now showing)… Went through 4000+ snapshots and hand-drew the different boxes matching the desired classes. Might need to surgery-fix my mouse-handling wrist now.
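
For anyone curious what those hand-drawn boxes turn into: YOLO training expects one .txt label per image with "class_id cx cy w h", all normalized to 0-1. A generic converter looks like this (not my labelling tool, just the format):

```python
def to_yolo_label(box, img_w, img_h, class_id):
    """box = (x1, y1, x2, y2) in pixels -> one YOLO label line."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# e.g. open("frame_000123.txt", "w").write(to_yolo_label((820, 400, 980, 610), 1920, 1080, 3))
```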

Anyway, I initially thought this could serve to anchor the CV trackers and then start the proper tracking, but this is actually now way faster, more reliable, and does not need as much supervision as CV tracker management required. Quite impressed with the result, but I am a know-nothing in that field; please, experts, pardon my insulting ignorance (readying the whip).

Still need to add smoothing, but Kalman filters and thresholding parameters should help; this is again very raw at this stage.
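
As a stopgap before the proper Kalman pass, the smoothing could be as simple as thresholding out jitter plus exponential smoothing (illustrative only, placeholder parameters):

```python
def smooth_positions(raw, jitter_threshold=3, alpha=0.4):
    """raw: list of 0-100 positions, one per frame."""
    smoothed, last = [], None
    for pos in raw:
        if last is not None and abs(pos - last) < jitter_threshold:
            pos = last                      # ignore sub-threshold jitter
        last = pos if last is None else alpha * pos + (1 - alpha) * last
        smoothed.append(int(round(last)))
    return smoothed
```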

1 Like