Yes, that part. The dots were all over her chest, which was weird to me. I would expect the dots to be on her hips (or at least in that area) like in the first and second samples you sent.
Ok, no, sorry, we have a misunderstanding here
So you are referring to the third extract I posted.
Those dots are the positions of the hips as per the base yolo11 pose model, which is nothing I have been using so far in my code, as I am using a specifically trained yolo11 detect model (image recognition, not pose recognition).
It was just to show you that the result of a pose model can be very inconsistent in some specific cases
Here is the debugging display (very low quality, for upload) of that grinding part, and indeed, there is so little visible movement between the base of the penis and the lowest part of the pussy that the algo is lost.
This definitely calls for a different approach…
Have you tried using a 2D projection instead of the VR image?
It should eliminate the "VR encoding distortion", especially the massive one at the bottom of the screen.
Something like this:
ffmpeg -i VideoFile.mp4 -filter_complex "[0:v]v360=input=he:in_stereo=sbs:pitch=-25:yaw=0:roll=0:output=flat:d_fov=120:w=2048:h=2048" Video2D.mp4
Thanks a lot @Zalunda, I was hoping I could skip any video processing, but now I will definitely try that
I would start by using ffmpeg to re-encode but, if it ends up helping, ideally the best would be to include that step inside your existing pipeline, if it's possible.
Something like: file => VR image => FLAT image => YOLO.
I just asked sonnet-3.5 and it spit out this:
import cv2
import numpy as np
from ultralytics import YOLO


def create_projection_maps(width, height, fov=120, pitch=25):
    # Create meshgrid of coordinates
    x = np.linspace(0, width - 1, width)
    y = np.linspace(0, height - 1, height)
    xv, yv = np.meshgrid(x, y)

    # Convert to normalized device coordinates (NDC)
    x_ndc = (2.0 * xv / width - 1.0) * np.tan(np.radians(fov / 2))
    y_ndc = (2.0 * yv / height - 1.0) * np.tan(np.radians(fov / 2)) * (height / width)

    # Apply pitch rotation
    pitch_rad = np.radians(pitch)
    y_rotated = y_ndc * np.cos(pitch_rad) - np.sin(pitch_rad)
    z_rotated = y_ndc * np.sin(pitch_rad) + np.cos(pitch_rad)

    # Convert to spherical coordinates
    r = np.sqrt(x_ndc**2 + y_rotated**2 + z_rotated**2)
    theta = np.arctan2(x_ndc, z_rotated)
    phi = np.arccos(y_rotated / r)

    # Convert to equirectangular coordinates
    map_x = (theta / (2 * np.pi) + 0.5) * width
    map_y = phi / np.pi * height

    # Convert to proper format for cv2.remap
    map_x = map_x.astype(np.float32)
    map_y = map_y.astype(np.float32)

    return map_x, map_y


def process_vr_video(input_path, output_path=None):
    # Load YOLO model
    model = YOLO('yolov8n.pt')

    # Open video
    cap = cv2.VideoCapture(input_path)

    # Get video properties
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))

    # Calculate dimensions for reprojected frame
    output_width = 2048
    output_height = 2048

    # Create projection maps (only need to do this once)
    map_x, map_y = create_projection_maps(output_width, output_height)

    # Initialize video writer if output path is provided
    if output_path:
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out = cv2.VideoWriter(output_path, fourcc, fps, (output_width, output_height))

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # If frame is side-by-side stereo, take only left eye (first half)
        if width > height * 2:  # Assuming side-by-side format
            frame = frame[:, :width // 2]

        # Apply reprojection
        reprojected = cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)

        # Run YOLO detection
        results = model(reprojected)
        annotated_frame = results[0].plot()

        # Save or display frame
        if output_path:
            out.write(annotated_frame)
        else:
            cv2.imshow('YOLO Detection on Reprojected VR', annotated_frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

    cap.release()
    if output_path:
        out.release()
    cv2.destroyAllWindows()


# Usage example
if __name__ == "__main__":
    input_video = "path/to/your/vr_video.mp4"
    output_video = "output_processed.mp4"
    process_vr_video(input_video, output_video)
Note: I have no idea if it works or if the performance would be good. In ffmpeg, adding a re-projection like this doesn't add much time to the overall process.
I ran the ffmpeg and generated the unwarped vid.
Downside is that the command trims the edges of the video in the process, thus losing the genitals at the very bottom of the frame in the case we were trying to troubleshoot…
Too bad…
Yes, unfortunately, it's a general problem. You must choose only one viewpoint for the projection unless you have settings for each part of the video (in my example, pitch=-25, which means "look down 25 degrees"). If you had used -30 or -35, you would probably not have cut anything for that specific video.
But that means it is hard to have a single setting, especially for some "Western VR", which is all over the place in terms of dick position relative to the head (distance and angle). They even mess that up for most AR videos, where it's, by far, for me, the most important thing for a good AR experience (I don't know why, but I don't feel immersed when I have a dick sticking out of my knee)… But, well, that's another topic entirely…
Hmm, very enlightening comment, I appreciate the guidance
I mean, a couple of weeks ago I raw-coded a video reader library relying on ffmpeg, as some video formats/encodings were giving opencv a hard time reading them properly (performance-wise). I ended up not using it, as I am now first downscaling/re-encoding videos before processing them.
I will experiment a little, and maybe the unwarping could be done on the fly through the library, based on the piece of code you shared, using an adapted pitch value when a scene requires it, based on a trigger of some kind.
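Something like this rough sketch is what I have in mind (it reuses the create_projection_maps function from the code above and leaves the actual scene trigger open; none of this is implemented yet):

import cv2

# Cache one remap table per pitch value so the unwarp can run frame by frame
# without recomputing the maps every time the pitch changes.
_map_cache = {}

def get_projection_maps(pitch, size=2048):
    # create_projection_maps() is the function from the snippet shared above
    if pitch not in _map_cache:
        _map_cache[pitch] = create_projection_maps(size, size, pitch=pitch)
    return _map_cache[pitch]

def unwarp_frame(frame, pitch=25):
    # pitch would come from whatever trigger decides a scene needs a different view
    map_x, map_y = get_projection_maps(pitch)
    return cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)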
So much to explore, so little free time to do it. I wish we guys could find a way to join forces even more.
Thank you all for the benevolent comments and the objective observations; I really like you sharing and giving objective feedback. It makes me want to go on, experiment and share. Genuinely hoping this ends up benefitting the community.
And if this works, one of my most moronic moves might have been to try and train the yolo model to detect shapes even in the "warped" zone… Not sure I have the courage to start all the image tagging and model training from the beginning, lol.
So, now, I have this against that (pitch -35 on top):
and the results of the YOLO run:
First observations:
- Even if trained on raw VR images, the results are better on unwarped images (cf. the confidence values, 0.something, as a probability…)
- The "carrot" dick is still detected properly
- We even get to detect one foot (which might trigger some footjob algo logic, though…)
Though, in that more extreme case, the results are different: we get a "pussy" class but lose the penis one…
Anyway, thank you @Zalunda, this thinking outside the box helps greatly, we need to experiment further. I think this approach could be a game changer! (Except maybe for the difficult grinding scenes.)
A few subjects:
- Now that I think about it, the best projection would probably be something like what we see in Heresphere using the following settings: "Forward" setting completely to the left (-0.750) and a slight pitch down (around -10). I just tested it on a different video and I could get a mostly unwarped view of everything (i.e. it simulates that the camera is far away from the scene and thus can see a larger section of it unwarped). Maybe the math for that projection can be found somewhere; I'll look around because I'm interested too (or maybe @Heresphere is willing to help out, unless he considers that it's his secret sauce, which would be understandable).
- As for your training, I'm not an expert but I expect that your training will "hold up". Most of the center of the VR image is pretty unwarped by default. It's only the sides that get messed up.
Also, more of a shared goal: I think it's great that we have people testing various ideas to automate part of scripting, but I wonder if the community could try to work on shared resources to help out, like:
- Set of curated video sections (sitting cowgirl JAV and not-JAV, leaning-forward cowgirl, etc.), ideally with a known script given to the community by a scripter. [I'm thinking that an OFS plugin that would allow someone to select a section of a script and extract it to a separate .mp4/.funscript might help to create that library.]
- Tool/Algorithm to compare a generated script to the baseline script and output metrics (in text and graphs): added points, missing points, the average difference in timing, etc. (see the rough sketch after this list)
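To make that second idea concrete, here is a rough sketch of what such a comparison tool could look like (assuming standard .funscript JSON with an "actions" list of {at, pos} entries; the 100 ms matching tolerance is an arbitrary choice):

import json

def load_actions(path):
    # Standard .funscript: {"actions": [{"at": ms, "pos": 0-100}, ...]}
    with open(path) as f:
        return sorted(json.load(f)["actions"], key=lambda a: a["at"])

def compare_scripts(baseline_path, generated_path, tolerance_ms=100):
    baseline = load_actions(baseline_path)
    generated = load_actions(generated_path)

    matched, timing_errors, used = 0, [], set()
    for ref in baseline:
        # Nearest not-yet-matched generated point within the tolerance window
        candidates = [(abs(g["at"] - ref["at"]), i)
                      for i, g in enumerate(generated)
                      if i not in used and abs(g["at"] - ref["at"]) <= tolerance_ms]
        if candidates:
            delta, i = min(candidates)
            used.add(i)
            matched += 1
            timing_errors.append(delta)

    return {
        "baseline_points": len(baseline),
        "generated_points": len(generated),
        "matched": matched,
        "missing": len(baseline) - matched,
        "added": len(generated) - matched,
        "avg_timing_error_ms": sum(timing_errors) / matched if matched else None,
    }

if __name__ == "__main__":
    print(compare_scripts("scripter_baseline.funscript", "generated.funscript"))

The graph part could then be built on top of the same matching (e.g. a histogram of the timing errors).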
[updated the last part comment for clarity]
So, I have been toying around with both ffmpeg and a derivative of the python ClaudeAI code, and unfortunately the YOLO detection is not better
Example #1:
Original:
Unwarped through ffmpeg from source:
Undistorted through python code from source:
Same sequence of images for Example #2 :
Original:
FFmpeg:
Python:
Any thoughts on multi-axis?
Definitely at some point, but right now I am struggling with some specific parts of videos and need to come up with a solution for the one-axis approach first.
Then, if we manage to overcome this challenge, weâll have time to deal with multi-axis for sure.
That would be awesome indeed, and we could definitely use those results to fine tune weights, parameters and stuff.
Same with the curated library of video sections along with test funscripts to run automated tests and comparisons.
I actually started a low-tech approach to check that the algorithm was picking the right lows and highs, compared to a human-made script, for various random sections of videos: VR Funscript generation helper - Python & now CV AI - #44 by k00gar
With this kind of automated output for analysis (in green, the human-made funscript; the rest was generated):
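For reference, an overlay like that can be produced with a few lines of matplotlib; just a minimal sketch, assuming the two .funscript files are already cut to the same section:

import json
import matplotlib.pyplot as plt

def plot_overlay(human_path, generated_path, title="Section comparison"):
    # Overlay two funscripts: human-made in green, generated in red
    for path, label, color in [(human_path, "human", "green"),
                               (generated_path, "generated", "red")]:
        with open(path) as f:
            actions = sorted(json.load(f)["actions"], key=lambda a: a["at"])
        t = [a["at"] / 1000.0 for a in actions]  # funscript timestamps are in ms
        pos = [a["pos"] for a in actions]        # positions are 0-100
        plt.plot(t, pos, label=label, color=color)
    plt.xlabel("time (s)")
    plt.ylabel("position (0-100)")
    plt.title(title)
    plt.legend()
    plt.show()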
Just want to say this is Amazing work here!
Please keep it up… if a working AI Scripter program that anyone can use is the result of this effort… You Are A HERO!!!
Still another artistic expression killed by AI
just joking
Sorry guys, not much news to share lately.
Still reworking code whenever I have free time, trying to envision a better and more reliable approach.
In this case, I was comparing a pose detection model vs the image recognition one, and I actually think I need to combine both for better results.
I mean, it could still be useful in that case:
But even more useful in this case, where we struggle to get proper movement between pussy and penis on the right panel, while on the left panel we can detect proper movement of the hips:
Of course, this is way more costly computation-wise, but it seems interesting enough to explore.
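A rough sketch of the kind of combination I have in mind (the model file names and the fallback rule are placeholders, not what I actually run):

from ultralytics import YOLO

# Hypothetical model files: "my_custom_detect.pt" stands in for the custom-trained detect model
pose_model = YOLO("yolo11n-pose.pt")
detect_model = YOLO("my_custom_detect.pt")

def frame_signal(frame):
    # Prefer the custom detect model; fall back to hip keypoints from the pose model
    det = detect_model(frame, verbose=False)[0]
    if len(det.boxes) > 0:
        # Track the vertical center of the highest-confidence box
        idx = int(det.boxes.conf.argmax())
        x1, y1, x2, y2 = det.boxes.xyxy[idx].tolist()
        return (y1 + y2) / 2.0

    pose = pose_model(frame, verbose=False)[0]
    kpts = pose.keypoints.xy if pose.keypoints is not None else None
    if kpts is not None and kpts.shape[0] > 0 and kpts.shape[1] > 12:
        hips = kpts[0][[11, 12]]  # COCO keypoints 11 and 12 are the left/right hips
        return float(hips[:, 1].mean())

    return None  # nothing usable in this frame

To limit the extra cost, the pose model could also be run only when the detect signal stalls, instead of on every frame.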
Also, even though we are not there yet AT ALL, I kinda like seeing this type of funscript heat map.
At this stage, disregard speed and the number of actions (basically, disregard the color and height of the strokes, as those are a matter of tweaking, parametrization, etc.) and please have a look at the detection of the NO-action zone (close-up pussy, no penis touching, etc.) in the middle of the video.
The current algorithm is "almost" there
Still a (big big) LOT to do anyway, but this is kind of interesting.
Man-made:
Very famous website AI made:
Current algorithm shared here:
This is phenomenal work, man. And a lot of hours going into this too, I'm sure.