Funscript AI Generation - VR (& 2D POV now?) - Join the Discord :)

Today’s attempt at fixing the handjob sections.

Will need to decide between a “realistic scale / depth of stroke” approach or some “emphasized / enhanced depths” one… We can look into this later on, as this would be a matter of parameters more than algorithm logic.
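Either way, the “emphasized depth” flavour could probably just be a post-processing parameter applied to the generated actions. A rough sketch of the idea (illustration only, not the actual code):

def emphasize_depth(actions, factor=1.0):
    # actions: funscript points as {'at': milliseconds, 'pos': 0-100}.
    # factor = 1.0 keeps the realistic scale, > 1.0 emphasizes the stroke depth.
    if not actions:
        return []
    mid = sum(a['pos'] for a in actions) / len(actions)
    return [{'at': a['at'],
             'pos': int(max(0, min(100, round(mid + (a['pos'] - mid) * factor))))}
            for a in actions]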

Still a couple outliers, but way more manageable than the previous mess.

Extract slowed down to 0.5x:


Wow, that looks awesome!
For info, it seems that stereographic output preserves the center (i.e. the point of focus) of the 3D-to-2D image the best.

Here’s an example at 200° FOV


Thanks @jambavant, it seems promising. I never thought about changing the output format.

Did you use something like this?

-filter_complex "[0:v]v360=input=he:in_stereo=sbs:output=sg:w=2048:h=2048"

So far, for my purpose, I like this ‘zoomed in’ view (which I had trouble doing with ‘flat’ output):
-filter_complex "[0:v]v360=input=he:in_stereo=sbs:pitch=-35:v_fov=90:h_fov=90:output=sg:w=2048:h=2048"


Really nice, happy to test if you are willing to share the formula :smile:


This is seriously cool and looks super promising! That’s some true dedication there especially with the tagging… :slight_smile: Wouldn’t mind contributing when the time comes. Keep up the good work!


Happy that you like it :smile:
The main issue is that converting a 3D video into a 2D video requires some camera angle changes: you need to crop the video, and the action does not consistently happen in the same region (the center of the image). You need to control pitch, yaw, roll and FoV to get a good picture.

There are a few good programs for 3D-to-2D conversion, but the programmers are quite opinionated about how their programs should be used (e.g. they restrict the max FoV).

The v360 filter I use looks like this: hequirect:sg:in_stereo=2d:out_stereo=2d
(In VR2Normal, you can copy-paste it into the “v360 custom filter”)
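If you prefer to run it directly through ffmpeg instead of VR2Normal, that custom string should drop into a regular command. Something along these lines (untested sketch; file names, pitch and FoV values are only placeholders):

ffmpeg -i input_vr.mp4 -vf "crop=iw/2:ih:0:0,v360=hequirect:sg:in_stereo=2d:out_stereo=2d:pitch=-35:d_fov=110:w=1440:h=1440" output_2d.mp4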


Here’s an example at an exceptional 180° FoV, and I think it still looks quite good.

You can play a little with this and see what would yield the best results. This would allow zooming in on the action (e.g. the penis) in order to keep only the relevant zone of action in the video fed to the AI script program.

Let me know if you want me to prepare a file for you.

Looking at the video of Katrina Jade, here’s what the result looks like:
00:00:00-00:32:35 => Pitch -50, Yaw 0, Roll 0, FoV 100
00:32:35-00:44:55 => Pitch -20, Yaw 0, Roll 0, FoV 100
It literally took 2 min to configure, preview and mark, then 15 min of video conversion.

Here’s a preview of what it looks like between 00:32:30 and 00:32:40 (the transition from -50 to -20 pitch).

The resulting video would have the penis always zoomed in and centered, which I think can very much help the scripting process.

namkatrinadamon_vrdesktophd_PrevieFragment1940000-1970000

And the missionary position may be much more manageable like this:

namkatrinadamon_vrdesktophd_PrevieFragment1940000-1970000

vs original cropped
katrina

And adding an example for IPVR-283 too
IPVR-283_Transition_600000

vs original cropped
IPVR-283

Thank you @jambavant !

I really need to look into that, even though I was initially looking for it to be done on the fly using an ffmpeg filter.

But if pre-processing allows unlocking this kind of view, with the stunt cock at the very bottom of the frame, why not…

Aside from that, I have been making a little bit of progress today; here is the output vs a reference from a world-famous human script-maker :slight_smile:

“State of the art” hand-made reference for comparison:

  • 6160 actions rendered vs 6189 in the reference “state of the art” script
  • The no-action zones (close-ups) are (almost) respected
  • Not good yet at the initial rubbing, when the cock is not visible

I added a speed limiter option to accommodate devices like the Handy, but now I need to work on the stroke depths, which are way too shallow in many cases.
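For reference, the limiter is conceptually just a cap on how many position units per second a move may cover between two consecutive actions. A stripped-down sketch of the idea (not the exact code; the 400 is a placeholder value):

def limit_speed(actions, max_units_per_s=400):
    # actions: funscript points as {'at': milliseconds, 'pos': 0-100}, sorted by time.
    # If a move exceeds the maximum speed, shrink it toward the previous position.
    if not actions:
        return []
    out = [dict(actions[0])]
    for a in actions[1:]:
        prev = out[-1]
        dt_s = (a['at'] - prev['at']) / 1000.0
        max_delta = max_units_per_s * dt_s
        delta = a['pos'] - prev['pos']
        if abs(delta) > max_delta:
            delta = max_delta if delta > 0 else -max_delta
        out.append({'at': a['at'], 'pos': int(round(prev['pos'] + delta))})
    return out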

I use this one, which still has a flat output but also a single “d_fov”; not sure how ffmpeg translates it to vertical/horizontal:

ffmpeg -ss 00:10:00 -i "testin.mp4" -t 30 -filter_complex "fps=30,crop=in_w/2:in_h:0:0,scale=-1:1440:flags=lanczos,v360=input=hequirect:output=flat:d_fov=140:pitch=-30,crop=in_w/2:in_h:in_w/4:in_h" -c:a copy -c:v libx264 -preset ultrafast -tune zerolatency -y testout.mp4

I guess you can use it as an ffmpeg filter. Here’s the command generated by VR2Normal (after I marked the video). You can try it with the video filter and see if that works.

ffmpeg -hide_banner -loglevel error -stats -y \
-i "SLR_SLR Originals_Vote for me_1920p_51071_FISHEYE190_alpha.mp4"
-map 0:v:0 \
-vf "sendcmd=f='SLR_SLR Originals_Vote for me_1920p_51071_FISHEYE190_alpha.cmd',crop=w=iw/2:h=ih:x=0:y=0,v360=fisheye:sg:iv_fov=190:ih_fov=190:d_fov=100:pitch=-20:yaw=0:roll=0:w=854:h=480:interp=lanczos:reset_rot=1" \
-c:v libx264 -preset fast -crf 24 -tune film -x264-params keyint=600:bframes=7 \
"SLR_SLR Originals_Vote for me_1920p_51071_FISHEYE190_alpha_VR2Normal.mkv"

Where “SLR_SLR Originals_Vote for me_1920p_51071_FISHEYE190_alpha.cmd” is a text file containing:

#START
	0ms-1181517ms [enter] v360 pitch '-20', [enter] v360 d_fov '100';

	1181517ms-1182517ms [expr] v360 pitch 'lerp(-20,-65,(cos(lerp(PI,0,TI))+1)/2)';
	1182517ms-1460850ms [enter] v360 pitch '-65';

	1460850ms-1461850ms [expr] v360 pitch 'lerp(-65,-10,(cos(lerp(PI,0,TI))+1)/2)';
	1461850ms-2240700ms [enter] v360 pitch '-10';

	2240700ms-2241700ms [expr] v360 pitch 'lerp(-10,-60,(cos(lerp(PI,0,TI))+1)/2)', [expr] v360 yaw 'lerp(0,5,(cos(lerp(PI,0,TI))+1)/2)';
	2241700ms-2778069ms [enter] v360 pitch '-60', [enter] v360 yaw '5';
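In case the [expr] lines look cryptic: lerp(-20,-65,(cos(lerp(PI,0,TI))+1)/2) is just a cosine ease from pitch -20 to -65 over the 1-second interval (sendcmd’s TI goes from 0 to 1 within the interval). My reading of it in Python terms, for illustration only:

import math

def eased_value(start, end, ti):
    # ti: 0.0 at the start of the sendcmd interval, 1.0 at the end.
    # cos(lerp(PI, 0, ti)) sweeps from -1 to +1, so the weight goes smoothly from 0 to 1.
    weight = (math.cos(math.pi * (1 - ti)) + 1) / 2
    return start + (end - start) * weight

# eased_value(-20, -65, 0.5) -> -42.5 (halfway through the pitch transition)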


Thanks for the explanation @jambavant. Looking at the terms used in the application UI, I’m pretty sure the application uses ffmpeg in the background.

v360 is also the name of the filter in ffmpeg.

ffmpeg -i video.mp4 -filter_complex "[0:v]v360=input=he:in_stereo=sbs:pitch=-35:v_fov=90:h_fov=90:d_fov=180:output=sg:w=2048:h=2048" output.mp4

(note: input=he => hequirect like the app’s UI, output=sg => stereographic like @jambavant suggested)

Also, if the quality of the image is important, it’s possible to add this at the end to get a better interpolation of the pixels, but it takes longer to process:
:interp=lanczos

In the UI, I’m pretty sure that FOV is the same as d_fov (i.e. the diagonal FOV).

One of the advantages of the application, from what I can see, besides being able to ‘queue’ different settings for different parts of the video, is that it can automatically compute v_fov & h_fov relative to the width and height that you choose (I see 87.2 HFOV, 49 VFOV in one of the images). If, like me, you are fine with a square image that shows most of the action, you can use 90 / 90 and only play with pitch & d_fov to get a different level of ‘zoom’. As long as we use the “sg” output, the image seems to be relatively ‘unwarped’.
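For what it’s worth, those two numbers look like a simple proportional split of d_fov along the diagonal of the output frame (87.2 and 49 are what you get for d_fov=100 on a 16:9 frame). That’s only my guess at what the tool does, I haven’t checked its source:

import math

def split_d_fov(d_fov, width, height):
    # Assumption: h_fov and v_fov are d_fov shared proportionally along the diagonal.
    diag = math.hypot(width, height)
    return d_fov * width / diag, d_fov * height / diag

# split_d_fov(100, 854, 480) -> (87.2, 49.0) (rounded), matching the values in the UI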

Thanks @SlowTap, your command seems to be similar to what I was using before: a flat output and ‘cheating’ on d_fov to try to get a larger view of the scene, but that also warps the image a little bit. With the sg output, we get an unwarped image even when zooming in and out.

Note: the example above is using v360=fisheye:sg:iv_fov=190:ih_fov=190 which wouldn’t work with Ass Zapped.

Yes, you’re absolutely right.
I posted an example above of the ffmpeg command generated by the tool, plus the .cmd command file.

Thank you guys! @jambavant, @zalunda and @SlowTap

Here’s what I got with the following parameters in the ffmpeg reader command:

  • self.type = "fisheye"
  • self.iv_fov = 120
  • self.ih_fov = 190
  • self.d_fov = 110

Original on the left, with the new parameters on the right.

Looks like we recovered a penis instance this way (but lost the pussy box along the way…).

I will re-run a full detection on the video, and parse the results so as to recreate the funscript.

In the meantime, I reverted some changes I made earlier, and I am getting less shallow (yet still too shallow) actions. Lost some actions too, but upon a couple of checks, it looks like those were unwanted outliers:

Before:

Now:

See this unwanted dip during the 100 plateau just before 860s (before):

Now:

I know what you mean now… I tried VR2Normal and it limits d_fov to 110. I’m guessing you compiled your own version without the limitation. I just wanted to see if it could compute v_fov / h_fov for me, but I don’t think it takes the ‘sg’ output into account, so it’s not really useful anyway. Using 90/90 in all cases seems to work pretty well already.

Ideally, you should not ‘mess’ with the input type & FoVs (type=fisheye, iv_fov=190, ih_fov=190, or type=he) since the video has been encoded like that. From my understanding, the input parameters let ffmpeg map the original video onto a sphere (always the case, for all types of projection), and then the output takes the image from the sphere and remaps it into a different projection.
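To make that concrete: in the two commands below, only the output side of v360 (the virtual camera) changes; the input side stays fixed because it describes how the source was encoded. File names are placeholders:

ffmpeg -i left_eye.mp4 -vf "v360=fisheye:sg:iv_fov=190:ih_fov=190:d_fov=110:pitch=-20:w=1024:h=1024" zoomed_out.mp4
ffmpeg -i left_eye.mp4 -vf "v360=fisheye:sg:iv_fov=190:ih_fov=190:d_fov=60:pitch=-35:w=1024:h=1024" zoomed_in.mp4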

Which wrapper are you using?

IMO, you should try something like this (if you have those parameters in your ffmpeg wrapper):
– Input

  • self.type = "fisheye"
  • self.iv_fov = 190
  • self.ih_fov = 190

– Output

  • self.output = "sg"
  • self.pitch = -35
  • self.v_fov = 90
  • self.h_fov = 90
  • self.d_fov = 180
  • self.width = 2048
  • self.height = 2048

You would get this:

A homemade Python library mimicking OpenCV’s cv2, but relying on an ffmpeg subprocess…

I struggled to read H.265 with OpenCV at some point, so I tried to put together an alternative with my best buddies (ChatGPT and DeepSeek…).

All my code initially relied on OpenCV for frame reading/parsing, so, in order to minimize the rework, I created an ffmpeg-based library that could swiftly replace it in my use case… (might sound moronic, once again, I am not a coder :sweat_smile:).
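Roughly the idea, stripped down to the bare minimum (illustration only, with made-up names; the real thing handles seeking, fps, errors, etc.):

import subprocess
import numpy as np

class FFmpegReader:
    # Minimal cv2.VideoCapture-like reader: ffmpeg decodes the video (optionally
    # applying a filter such as v360) and pipes raw BGR frames to Python.
    # width/height must match what comes out of the filter chain.
    def __init__(self, path, width, height, vf=None):
        self.width, self.height = width, height
        cmd = ["ffmpeg", "-i", path]
        if vf:
            cmd += ["-vf", vf]
        cmd += ["-f", "rawvideo", "-pix_fmt", "bgr24", "pipe:1"]
        self.proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

    def read(self):
        # Same contract as cv2.VideoCapture.read(): returns (ok, frame).
        size = self.width * self.height * 3
        raw = self.proc.stdout.read(size)
        if len(raw) < size:
            return False, None
        return True, np.frombuffer(raw, np.uint8).reshape(self.height, self.width, 3)

    def release(self):
        self.proc.stdout.close()
        self.proc.terminate()

# e.g. FFmpegReader("video.mp4", 1024, 1024,
#                   vf="crop=iw/2:ih:0:0,v360=he:sg:pitch=-35:v_fov=90:h_fov=90:w=1024:h=1024")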

I initially tried with 190/190, but got weird/unexpected results. Will try again after this round of YOLO detection is done…

Is this frame #79000?

In the “Config” tab, you can set the maximum FoV to 180°, which I think is a hard limit. Same thing in VR Reversal: you can edit the Lua script to change the max FoV.

What settings give the best results with detection? Is a zoomed in view better or zoomed out?

Around 21:58.1, or approximately frame #79006.

If the goal is to analyze the whole video without having to set different settings for different parts of the video, IMO zooming out is better since it allows grabbing all types of actions. The only ‘price’ to pay would be using a higher resolution so that the AI has enough pixels to recognize stuff.

The model was trained on the left panel of VR videos, resized to 640x640.

So, at first, I would say unzoomed for best results.

However, in the process, I focus on the center third (horizontally) of the frame.

So maybe we could replace that step with a zoomed-in view.
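(That “center third” step is literally just a horizontal slice of the decoded frame, something like this on a numpy array:)

def center_third(frame):
    # Keep only the middle third of the frame, horizontally.
    h, w = frame.shape[:2]
    return frame[:, w // 3 : 2 * w // 3]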

In the end, I think what matters most is how warped/unwarped the image is after processing vs the source…