Yes, that part. The dots were all over her chest, which was weird to me. I would expect the dots to be on her hips (or at least in that area) like in the first and second samples you sent.
Ok, no, sorry, we have a misunderstanding here
So you are referring to the third extract I posted.
Those dots are the positions of the hips as per the base yolo11 pose model, which is nothing I have been using so far in my code, as I am using a specifically trained yolo11 detect model (image recognition, not pose recognition).
It was just to show you that the result of a pose model can be very inconsistent in some specific cases
Here is the debugging display (very low quality, for upload) of that grinding part, and indeed, there is so little visible movement between the base of the penis and the lowest part of the pussy that the algo is lost.
This definitely calls for a different approach…
Have you tried using a 2D projection instead of the VR image?
It should eliminate the "VR encoding distortion", especially the massive one at the bottom of the screen.
Something like this:
ffmpeg -i VideoFile.mp4 -filter_complex "[0:v]v360=input=he:in_stereo=sbs:pitch=-25:yaw=0:roll=0:output=flat:d_fov=120:w=2048:h=2048" Video2D.mp4
Thanks a lot @Zalunda, I was hoping I could skip any video processing, but now I will definitely try that
I would start by using ffmpeg to re-encode but, if it ends up helping, ideally the best would be to include that step inside your existing pipeline, if it's possible.
Something like: file => VR image => FLAT image => YOLO.
I just asked sonnet-3.5 and it spit out this:
import cv2
import numpy as np
from ultralytics import YOLO


def create_projection_maps(width, height, fov=120, pitch=25):
    # Create meshgrid of coordinates
    x = np.linspace(0, width - 1, width)
    y = np.linspace(0, height - 1, height)
    xv, yv = np.meshgrid(x, y)

    # Convert to normalized device coordinates (NDC)
    x_ndc = (2.0 * xv / width - 1.0) * np.tan(np.radians(fov / 2))
    y_ndc = (2.0 * yv / height - 1.0) * np.tan(np.radians(fov / 2)) * (height / width)

    # Apply pitch rotation
    pitch_rad = np.radians(pitch)
    y_rotated = y_ndc * np.cos(pitch_rad) - np.sin(pitch_rad)
    z_rotated = y_ndc * np.sin(pitch_rad) + np.cos(pitch_rad)

    # Convert to spherical coordinates
    r = np.sqrt(x_ndc**2 + y_rotated**2 + z_rotated**2)
    theta = np.arctan2(x_ndc, z_rotated)
    phi = np.arccos(y_rotated / r)

    # Convert to equirectangular coordinates
    map_x = (theta / (2 * np.pi) + 0.5) * width
    map_y = phi / np.pi * height

    # Convert to proper format for cv2.remap
    map_x = map_x.astype(np.float32)
    map_y = map_y.astype(np.float32)

    return map_x, map_y


def process_vr_video(input_path, output_path=None):
    # Load YOLO model
    model = YOLO('yolov8n.pt')

    # Open video
    cap = cv2.VideoCapture(input_path)

    # Get video properties
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))

    # Calculate dimensions for reprojected frame
    output_width = 2048
    output_height = 2048

    # Create projection maps (only need to do this once)
    map_x, map_y = create_projection_maps(output_width, output_height)

    # Initialize video writer if output path is provided
    if output_path:
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out = cv2.VideoWriter(output_path, fourcc, fps, (output_width, output_height))

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # If frame is side-by-side stereo, take only left eye (first half)
        if width > height * 2:  # Assuming side-by-side format
            frame = frame[:, :width // 2]

        # Apply reprojection
        reprojected = cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)

        # Run YOLO detection
        results = model(reprojected)
        annotated_frame = results[0].plot()

        # Save or display frame
        if output_path:
            out.write(annotated_frame)
        else:
            cv2.imshow('YOLO Detection on Reprojected VR', annotated_frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

    cap.release()
    if output_path:
        out.release()
    cv2.destroyAllWindows()


# Usage example
if __name__ == "__main__":
    input_video = "path/to/your/vr_video.mp4"
    output_video = "output_processed.mp4"
    process_vr_video(input_video, output_video)
Note: I have no idea if it works or if the performance would be good. In ffmpeg, adding a re-projection like this doesn't add much time to the overall process.
I ran the ffmpeg and generated the unwarped vid.
Downside is that the command trims the edges of the video in the process, thus losing the genitals at the very bottom of the frame in the case we were trying to troubleshoot…
Too bad…
Yes, unfortunately, it's a general problem. You must choose only one viewpoint for the projection unless you have settings for each part of the video (in my example, pitch=-25, which means "look down 25 degrees"). If you had used -30 or -35, you would probably not have cut anything for that specific video.
But that means it is hard to have a single setting, especially for some "Western VR", which is all over the place in terms of dick position relative to the head (distance and angle). They even mess that up for most AR videos, where it's, by far, for me, the most important thing for a good AR experience (I don't know why, but I don't feel immersed when I have a dick sticking out of my knee)… But, well, that's another topic entirely…
Hmm, very enlightening comment, I appreciate the guidance
I mean, a couple of weeks ago I raw-coded a video reader library relying on ffmpeg, as some video formats/encodings were giving opencv a hard time reading them properly (performance-wise). I ended up not using it, as I am now first downscaling/re-encoding videos before processing them.
I will experiment a little, and maybe the unwarping could be done on the fly through the library, based on the piece of code you shared, using an adapted pitch value when a scene requires it, based on a trigger of some kind.
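Something like this rough sketch is what I have in mind (it reuses the create_projection_maps function from the code above and leaves the actual scene trigger open; none of this is implemented yet):

import cv2

# Cache one remap table per pitch value so the unwarp can run frame by frame
# without recomputing the maps every time the pitch changes.
_map_cache = {}

def get_projection_maps(pitch, size=2048):
    # create_projection_maps() is the function from the snippet shared above
    if pitch not in _map_cache:
        _map_cache[pitch] = create_projection_maps(size, size, pitch=pitch)
    return _map_cache[pitch]

def unwarp_frame(frame, pitch=25):
    # pitch would come from whatever trigger decides a scene needs a different view
    map_x, map_y = get_projection_maps(pitch)
    return cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)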
So much to explore, so little free time to do it. I wish we guys could find a way to join forces even more.
Thank you all for the benevolent comments and the objective observations; I really like you sharing and giving objective feedback. It makes me want to go on, experiment and share. Genuinely hoping this ends up benefitting the community.
And if this works, one of my most moronic moves might have been to try and train the yolo model to detect shapes even in the "warped" zone… Not sure I have the courage to start all the image tagging and model training from the beginning, lol.
So, now, I have this against that (pitch -35 on top):
and the results of the YOLO run:
First observations:
- Even if trained on raw VR images, the results are better on unwarped images (cf. the confidence values, 0.something, as a probability…)
- The "carrot" dick is still detected properly
- We even get to detect one foot (which might trigger some footjob algo logic, though…)
Though, in that more extreme case, the results are different: we get a "pussy" class but lose the penis one…
Anyway, thank you @Zalunda, this thinking outside the box helps greatly, we need to experiment further. I think this approach could be a game changer! (Except maybe for the difficult grinding scenes.)
A few subjects:
- Now that I think about it, the best projection would probably be something like what we see in Heresphere using the following settings: "Forward" setting completely to the left (-0.750) and a slight pitch down (around -10). I just tested it on a different video and I could get a mostly unwarped view of everything (i.e. it simulates that the camera is far away from the scene and thus can see a larger section of it unwarped). Maybe the math for that projection can be found somewhere; I'll look around because I'm interested too (or maybe @Heresphere is willing to help out, unless he considers that it's his secret sauce, which would be understandable).
- As for your training, I'm not an expert but I expect that your training will "hold up". Most of the center of the VR image is pretty unwarped by default. It's only the sides that get messed up.
Also, more of a shared goal: I think it's great that we have people testing various ideas to automate part of scripting, but I wonder if the community could try to work on shared resources to help out, like:
- Set of curated video sections (sitting cowgirl JAV and not-JAV, leaning-forward cowgirl, etc.), ideally with a known script given to the community by a scripter. [I'm thinking that an OFS plugin that would allow someone to select a section of a script and extract it to a separate .mp4/.funscript might help to create that library.]
- Tool/Algorithm to compare a generated script to the baseline script and output metrics (in text and graphs): added points, missing points, the average difference in timing, etc. (see the rough sketch after this list)
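To make that second idea concrete, here is a rough sketch of what such a comparison tool could look like (assuming standard .funscript JSON with an "actions" list of {at, pos} entries; the 100 ms matching tolerance is an arbitrary choice):

import json

def load_actions(path):
    # Standard .funscript: {"actions": [{"at": ms, "pos": 0-100}, ...]}
    with open(path) as f:
        return sorted(json.load(f)["actions"], key=lambda a: a["at"])

def compare_scripts(baseline_path, generated_path, tolerance_ms=100):
    baseline = load_actions(baseline_path)
    generated = load_actions(generated_path)

    matched, timing_errors, used = 0, [], set()
    for ref in baseline:
        # Nearest not-yet-matched generated point within the tolerance window
        candidates = [(abs(g["at"] - ref["at"]), i)
                      for i, g in enumerate(generated)
                      if i not in used and abs(g["at"] - ref["at"]) <= tolerance_ms]
        if candidates:
            delta, i = min(candidates)
            used.add(i)
            matched += 1
            timing_errors.append(delta)

    return {
        "baseline_points": len(baseline),
        "generated_points": len(generated),
        "matched": matched,
        "missing": len(baseline) - matched,
        "added": len(generated) - matched,
        "avg_timing_error_ms": sum(timing_errors) / matched if matched else None,
    }

if __name__ == "__main__":
    print(compare_scripts("scripter_baseline.funscript", "generated.funscript"))

The graph part could then be built on top of the same matching (e.g. a histogram of the timing errors).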
[updated the last part comment for clarity]
So, I have been toying around with both ffmpeg and a derivative of the python ClaudeAI code, and unfortunately the YOLO detection is not better
Example #1:
Original:
Unwarped through ffmpeg from source:
Undistorted through python code from source:
Same sequence of images for Example #2 :
Original:
FFmpeg:
Python:
Any thoughts on multi-axis?
Definitely at some point, but right now I am struggling with some specific parts of videos and need to come up with a solution for the one-axis approach first.
Then, if we manage to overcome this challenge, weâll have time to deal with multi-axis for sure.
That would be awesome indeed, and we could definitely use those results to fine tune weights, parameters and stuff.
Same with the curated library of video sections along with test funscripts to run automated tests and comparisons.
I actually started a low-tech approach to check that the algorithm was picking the right lows and highs, compared to a human-made script, for various random sections of videos: VR Funscript generation helper - Python & now CV AI - #44 by k00gar
With this kind of automated output for analysis (in green, the human-made funscript; the rest was generated):
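For reference, an overlay like that can be produced with a few lines of matplotlib; just a minimal sketch, assuming the two .funscript files are already cut to the same section:

import json
import matplotlib.pyplot as plt

def plot_overlay(human_path, generated_path, title="Section comparison"):
    # Overlay two funscripts: human-made in green, generated in red
    for path, label, color in [(human_path, "human", "green"),
                               (generated_path, "generated", "red")]:
        with open(path) as f:
            actions = sorted(json.load(f)["actions"], key=lambda a: a["at"])
        t = [a["at"] / 1000.0 for a in actions]  # funscript timestamps are in ms
        pos = [a["pos"] for a in actions]        # positions are 0-100
        plt.plot(t, pos, label=label, color=color)
    plt.xlabel("time (s)")
    plt.ylabel("position (0-100)")
    plt.title(title)
    plt.legend()
    plt.show()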
Just want to say this is Amazing work here!
Please keep it up… if a working AI Scripter program that anyone can use is the result of this effort… You Are A HERO!!!
Still another artistic expression killed by AI
just joking
Sorry guys, not much news to share lately.
Still reworking code whenever I have free time, trying to envision a better and more reliable approach.
In this case, I was comparing a pose detection model vs the image recognition one, and I actually think I need to combine both for better results.
I mean, it could still be useful in that case:
But even more useful in this case, where we struggle to get proper movement between pussy and penis on the right panel, while on the left panel we can detect proper movement of the hips:
Of course, this is way more costly computation-wise, but it seems interesting enough to explore.
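A rough sketch of the kind of combination I have in mind (the model file names and the fallback rule are placeholders, not what I actually run):

from ultralytics import YOLO

# Hypothetical model files: "my_custom_detect.pt" stands in for the custom-trained detect model
pose_model = YOLO("yolo11n-pose.pt")
detect_model = YOLO("my_custom_detect.pt")

def frame_signal(frame):
    # Prefer the custom detect model; fall back to hip keypoints from the pose model
    det = detect_model(frame, verbose=False)[0]
    if len(det.boxes) > 0:
        # Track the vertical center of the highest-confidence box
        idx = int(det.boxes.conf.argmax())
        x1, y1, x2, y2 = det.boxes.xyxy[idx].tolist()
        return (y1 + y2) / 2.0

    pose = pose_model(frame, verbose=False)[0]
    kpts = pose.keypoints.xy if pose.keypoints is not None else None
    if kpts is not None and kpts.shape[0] > 0 and kpts.shape[1] > 12:
        hips = kpts[0][[11, 12]]  # COCO keypoints 11 and 12 are the left/right hips
        return float(hips[:, 1].mean())

    return None  # nothing usable in this frame

To limit the extra cost, the pose model could also be run only when the detect signal stalls, instead of on every frame.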
Also, even though we are not there yet AT ALL, I kinda like seeing this type of funscript heat map.
At this stage, disregard speed and the number of actions (basically, disregard the color and height of the strokes, as those are a matter of tweaking, parametrization, etc.) and please have a look at the detection of the NO-action zone (close-up pussy, no penis touching, etc.) in the middle of the video.
The current algorithm is "almost" there
Still a (big big) LOT to do anyway, but this is kind of interesting.
Man-made:
Very famous website AI made:
Current algorithm shared here:
This is phenomenal work, man. And a lot of hours going into this too, I'm sure.