Funscript AI Generation - VR (& 2D POV now?) - Join the Discord :)

Yes, that part. The dots were all over her chest, which was weird to me. I would expect the dots to be on her hips (or at least in that area), like in the first and second samples you sent.

Ok, no, sorry, we have a misunderstanding here :slight_smile:

So you are referring to the third extract I posted.

Those dots are the hip positions as given by the base YOLO11 pose model, which is nothing I have been using so far in my code, as I am using a specifically trained YOLO11 detect model (image recognition, not pose recognition).

It was just to show you that the results of a pose model can be very inconsistent in some specific cases.

Here is the debugging display (very low quality for upload) of that grinding part, and indeed, there is so little visible movement between the base of the penis and the lowest part of the pussy that the algorithm gets lost.

This definitely calls for a different approach…

Have you tried using a 2D projection instead of the VR image?

It should eliminate the “VR encoding distortion”, especially the massive one at the bottom of the screen.

Something like this:
ffmpeg -i VideoFile.mp4 -filter_complex "[0:v]v360=input=he:in_stereo=sbs:pitch=-25:yaw=0:roll=0:output=flat:d_fov=120:w=2048:h=2048" Video2D.mp4

Thanks a lot @Zalunda, I was hoping I could skip any video processing, but now I will definitely try that :slight_smile:

I would start by using ffmpeg to re-encode but, if it ends up helping, ideally the best would be to include that step inside your existing pipeline, if it's possible.

Something like: file => VR image => FLAT image => YOLO.

I just asked sonnet-3.5 and it spit out this:

import cv2
import numpy as np
from ultralytics import YOLO

def create_projection_maps(out_width, out_height, src_width, src_height, fov=120, pitch=25):
    # Maps each pixel of the flat output image to a source pixel of one eye of a
    # 180-degree half-equirectangular VR frame (the "he" input of the ffmpeg command above).
    # pitch > 0 looks down, i.e. the equivalent of pitch=-25 in the ffmpeg example.

    # Create meshgrid of output pixel coordinates
    x = np.linspace(0, out_width - 1, out_width)
    y = np.linspace(0, out_height - 1, out_height)
    xv, yv = np.meshgrid(x, y)

    # Convert to normalized camera-ray coordinates (camera looks along +z, y points down)
    tan_half_fov = np.tan(np.radians(fov / 2))
    x_ndc = (2.0 * xv / out_width - 1.0) * tan_half_fov
    y_ndc = (2.0 * yv / out_height - 1.0) * tan_half_fov * (out_height / out_width)
    z_ndc = np.ones_like(x_ndc)

    # Apply pitch rotation around the x axis (positive pitch = look down)
    pitch_rad = np.radians(pitch)
    y_rotated = y_ndc * np.cos(pitch_rad) + z_ndc * np.sin(pitch_rad)
    z_rotated = -y_ndc * np.sin(pitch_rad) + z_ndc * np.cos(pitch_rad)

    # Convert the rays to longitude/latitude angles
    theta = np.arctan2(x_ndc, z_rotated)                           # horizontal, 0 = straight ahead
    phi = np.arctan2(y_rotated, np.sqrt(x_ndc**2 + z_rotated**2))  # vertical, > 0 = down

    # The 180-degree source spans [-pi/2, +pi/2] in both directions
    map_x = (theta / np.pi + 0.5) * src_width
    map_y = (phi / np.pi + 0.5) * src_height

    # Convert to proper format for cv2.remap
    map_x = map_x.astype(np.float32)
    map_y = map_y.astype(np.float32)

    return map_x, map_y

def process_vr_video(input_path, output_path=None):
    # Load YOLO model
    model = YOLO('yolov8n.pt')

    # Open video
    cap = cv2.VideoCapture(input_path)

    # Get video properties
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0

    # If the frame is side-by-side stereo, only the left eye (first half) is used
    is_sbs = width > height * 2  # assuming side-by-side format
    src_width = width // 2 if is_sbs else width
    src_height = height

    # Calculate dimensions for reprojected frame
    output_width = 2048
    output_height = 2048

    # Create projection maps (only need to do this once)
    map_x, map_y = create_projection_maps(output_width, output_height,
                                          src_width, src_height)

    # Initialize video writer if output path is provided
    out = None
    if output_path:
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out = cv2.VideoWriter(output_path, fourcc, fps, (output_width, output_height))

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # Keep only the left eye of a side-by-side stereo frame
        if is_sbs:
            frame = frame[:, :src_width]

        # Apply reprojection
        reprojected = cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)

        # Run YOLO detection
        results = model(reprojected)
        annotated_frame = results[0].plot()

        # Save or display frame
        if out is not None:
            out.write(annotated_frame)
        else:
            cv2.imshow('YOLO Detection on Reprojected VR', annotated_frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

    cap.release()
    if out is not None:
        out.release()
    cv2.destroyAllWindows()

# Usage example
if __name__ == "__main__":
    input_video = "path/to/your/vr_video.mp4"
    output_video = "output_processed.mp4"
    process_vr_video(input_video, output_video)

Note: I have no idea if it works or if the performance would be good. In ffmpeg, adding a re-projection like this doesn’t add much time to the overall process.

I ran the ffmpeg and generated the unwarped vid.

Downside is that the command trims the edges of the video in the process, thus losing the genitals at the very bottom of the frame in the case we were trying to troubleshoot… :confused:

Too bad…

1 Like

Yes, unfortunately, it’s a general problem. You must choose only one viewpoint for the projection unless you have settings for each part of the video (in my example, pitch=-25, which means “look down 25 degrees”). If you had used -30 or -35, you would probably not have cut off anything for that specific video.

But that means it is hard to have a single setting, especially for some “Western VR” titles, which are all over the place in terms of dick position relative to the head (distance and angle). They even mess that up in most AR videos, where it’s, by far, the most important thing for me for a good AR experience (I don’t know why, but I don’t feel immersed when I have a dick sticking out of my knee)… But, well, that’s another topic entirely…

1 Like

Hmm, very enlightening comment, I appreciate the guidance :slight_smile:

I mean, a couple of weeks ago I hand-coded a video reader library relying on ffmpeg, as some video formats/encodings were giving OpenCV a hard time reading them properly (performance-wise). I ended up not using it, as I am now downscaling/re-encoding videos before processing them.

I will experiment a little bit; maybe the unwarping could be done on the fly through that library, based on the piece of code you shared, using an adapted pitch value when a scene requires it, based on some trigger of some kind.
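
For instance, a minimal sketch of that on-the-fly idea, reusing the create_projection_maps() function from the code above (the per-eye source dimensions, the candidate pitch values and the scene_pitch trigger are placeholders for illustration, not something that exists in my code yet):

import cv2

# Placeholder per-eye dimensions of the (already downscaled) source frames
src_width, src_height = 2048, 2048

# Precompute one pair of remap tables per candidate "look down" angle (done once, reusable)
PITCHES = [15, 25, 35]
MAPS = {p: create_projection_maps(2048, 2048, src_width, src_height, pitch=p)
        for p in PITCHES}

def unwarp_frame(frame, scene_pitch):
    # Pick the precomputed maps closest to the pitch requested by the scene trigger
    pitch = min(PITCHES, key=lambda p: abs(p - scene_pitch))
    map_x, map_y = MAPS[pitch]
    return cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)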

So much to explore, so little free time to do it; I wish we could find a way to join forces even more :crazy_face:

Thank you all for the kind comments and the objective observations; I really appreciate you sharing and giving honest feedback. It makes me want to go on, experiment and share. I genuinely hope this ends up benefiting the community.

And if this works, one of my most moronic moves might have been to try and train the YOLO model to detect shapes even in the “warped” zone… Not sure I have the courage to restart all the image tagging and model training from the beginning lol

So, now, I have this against that (pitch -35 on top):

and the results of the YOLO run:

First observations:

  • Even if trained on raw VR images, those results are better on the unwarped images (cf. the confidence values, 0.something, as a probability…)
  • The “carrot” dick is still detected properly
  • We even get to detect one foot (which might trigger some footjob algo logic, though…)

Though, in that more extreme case, results are different: we get a “pussy” class but lose the “penis” one…

Anyway, thank you @Zalunda, this out-of-the-box thinking helps greatly; we need to experiment further :slight_smile: I think this approach could be a game changer! (Except maybe for the difficult grinding scenes :rofl:)

A few subjects:

  • Now that I think about it, the best projection would probably be something like what we see in Heresphere using the following settings: “Forward” setting completely to the left (-0.750) and a slight pitch down (around -10). I just tested it on a different video and I could get a mostly unwarped view of everything (i.e. it simulates that the camera is far away from the scene and thus can see a larger section of it unwarped). Maybe the math for that projection can be found somewhere, I’ll look around because I’m interested too (or maybe @Heresphere is willing to help out unless he considers that it’s his secret sauce, which would be understandable).
  • As for your training, I’m not an expert but I expect that your training will ‘hold up’. Most of the center of the VR image is pretty unwarped by default. It’s only the sides that get messed up.

Also, more of a shared goal: I think it’s great that we have people testing various ideas to automate part of scripting, but I wonder if the community could try to work on shared resources to help out, like:

  1. Set of curated video sections (sitting cowgirl JAV and not-JAV, leaning-forward cowgirl, etc.), ideally with a known script given to the community by a scripter. [I’m thinking that an OFS plugin that would allow someone to select a section of a script and extract it to a separate .mp4/.funscript might help to create that library]
  2. Tool/algorithm to compare a generated script to the baseline script and output metrics (in text and graphs): added points, missing points, the average difference in timing, etc. (see the sketch after this list).
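
For item 2, here is a rough sketch of what such a comparison tool could start from, assuming standard .funscript files (the 100 ms matching tolerance is an arbitrary choice for illustration):

import json

def load_actions(path):
    # A .funscript is JSON with an "actions" list of {"at": ms, "pos": 0-100} points
    with open(path) as f:
        return json.load(f)["actions"]

def compare_scripts(baseline_path, generated_path, tolerance_ms=100):
    baseline = load_actions(baseline_path)
    generated = load_actions(generated_path)

    matched, timing_errors, used = 0, [], set()
    for ref in baseline:
        # Find the closest unmatched generated point within the tolerance window
        candidates = [(abs(g["at"] - ref["at"]), i) for i, g in enumerate(generated)
                      if i not in used and abs(g["at"] - ref["at"]) <= tolerance_ms]
        if candidates:
            delta, i = min(candidates)
            used.add(i)
            matched += 1
            timing_errors.append(delta)

    return {
        "baseline_points": len(baseline),
        "generated_points": len(generated),
        "matched": matched,
        "missing": len(baseline) - matched,
        "added": len(generated) - matched,
        "avg_timing_error_ms": sum(timing_errors) / matched if matched else None,
    }

print(compare_scripts("baseline.funscript", "generated.funscript"))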

[updated the last part comment for clarity]

1 Like

So, I have been toying around with both ffmpeg and a derivative of the Claude AI Python code, and unfortunately the YOLO detection is not better :confused:

Example #1:

Original:

Unwarped through ffmpeg from source:

Undistorted through python code from source:

Same sequence of images for Example #2:

Original:

FFmpeg:

Python:

Any thoughts on multi-axis?

1 Like

Definitely at some point, but right now I am struggling with some specific parts of videos and need to first come up with a solution for the one-axis approach.

Then, if we manage to overcome this challenge, we’ll have time to deal with multi-axis for sure.

That would be awesome indeed, and we could definitely use those results to fine tune weights, parameters and stuff.

Same with the curated library of video sections along with test funscripts to run automated tests and comparisons.

I actually started a low-tech approach to check that the algorithm was picking the right lows and highs, compared to a human-made script, for various random sections of videos: VR Funscript generation helper - Python & now CV AI - #44 by k00gar

With this kind of automated output for analysis (in green, the human-made funscript; the rest was generated):
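
This is not the actual tool used for that output, but a minimal sketch of how a similar overlay could be produced from two funscripts (file names are placeholders):

import json
import matplotlib.pyplot as plt

def load_actions(path):
    # A .funscript is JSON with an "actions" list of {"at": ms, "pos": 0-100} points
    with open(path) as f:
        return json.load(f)["actions"]

human = load_actions("human_made.funscript")      # placeholder file names
generated = load_actions("generated.funscript")

plt.figure(figsize=(14, 4))
plt.plot([a["at"] / 1000 for a in human], [a["pos"] for a in human],
         color="green", label="human-made")
plt.plot([a["at"] / 1000 for a in generated], [a["pos"] for a in generated],
         color="red", alpha=0.7, label="generated")
plt.xlabel("time (s)")
plt.ylabel("position")
plt.legend()
plt.show()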

2 Likes

Just want to say this is Amazing work here!
Please keep it up… if a working AI Scripter program that anyone can use is the result of this effort… You Are A HERO!!!

3 Likes

Yet another form of artistic expression killed by AI :crazy_face:

just joking

Sorry guys, not much news to share lately.

Still reworking code whenever I have free time, trying to envision a better and more reliable approach.

In this case, I was comparing a pose detection model vs. the image recognition one, and I actually think I need to combine both for better results.

I mean, it could still be useful in this case:

But it would be even more useful in a case like this, where we struggle to get proper movement between pussy and penis on the right panel, while on the left panel we can detect proper movement of the hips:

Of course, this is way more costly computationally, but it seems interesting enough to explore.
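
A rough sketch of what combining the two could look like per frame, using the ultralytics API (the model file names and the idea of averaging the COCO hip keypoints are illustrative assumptions, not the actual implementation):

import cv2
from ultralytics import YOLO

pose_model = YOLO("yolo11n-pose.pt")      # base pose model (hips, knees, ...)
detect_model = YOLO("custom_detect.pt")   # placeholder for the custom-trained detect model

def analyze_frame(frame):
    # Body keypoints: COCO indices 11 and 12 are the left/right hips
    pose_res = pose_model(frame, verbose=False)[0]
    hips = []
    if pose_res.keypoints is not None:
        for person in pose_res.keypoints.xy:
            hips.append(((person[11] + person[12]) / 2).tolist())

    # Class boxes (penis, pussy, ...) from the custom detect model
    det_res = detect_model(frame, verbose=False)[0]
    boxes = [(det_res.names[int(c)], b.tolist(), float(conf))
             for b, c, conf in zip(det_res.boxes.xyxy, det_res.boxes.cls, det_res.boxes.conf)]

    # The hip positions could drive the signal when the detect boxes barely move (grinding),
    # while the detect boxes stay the reference for contact/no-contact decisions
    return hips, boxes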

Also, even though we are not there yet AT ALL, I kinda like seeing this type of funscript heat map.

At this stage, disregard speed and number of actions (basically, disregard the color and height of the strokes, as that is a matter of tweaking, parametrization, etc.) and please have a look at the detection of the no-action zone (close-up pussy, no penis touching, etc.) in the middle of the video.

The current algorithm is “almost” there :slight_smile:
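
For what it's worth, a simplified sketch of one way such a no-action gate could work, assuming per-frame YOLO detections with a 'penis' class; the frame rate and the 2-second threshold are illustrative, not the actual algorithm:

def find_no_action_zones(frame_detections, fps=30, min_gap_s=2.0):
    """frame_detections: list of per-frame sets of detected class names.
    Returns (start_s, end_s) spans where no 'penis' detection occurs for long enough
    to be treated as a no-action section (flat line in the funscript)."""
    zones, gap_start = [], None
    for i, classes in enumerate(frame_detections):
        if "penis" not in classes:
            if gap_start is None:
                gap_start = i
        else:
            if gap_start is not None and (i - gap_start) / fps >= min_gap_s:
                zones.append((gap_start / fps, i / fps))
            gap_start = None
    if gap_start is not None and (len(frame_detections) - gap_start) / fps >= min_gap_s:
        zones.append((gap_start / fps, len(frame_detections) / fps))
    return zones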

Still a (big big) LOT to do anyway, but this is kind of interesting.

Man-made:

Very famous website AI made:

Current algorithm shared here:

7 Likes

This is phenomenal work, man. And a lot of hours going into this too, I’m sure.

1 Like