Absolutely, with just one comment from my side: most of the relevant info I need to grab (genitals and a few other key points) sits roughly within the center third of the frame horizontally, and in the lower half of the frame vertically (if that makes sense).
That is the region I want to focus on, since it is where the penis is located most of the time in VR POV videos.
If I may add, the really tricky part is the lower fifteenth of the frame: during missionary, for instance, the penis sits there, and the image in that band is so warped and compressed that image recognition becomes terrible.
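A minimal sketch of that region of interest in pixel terms, assuming a single 1920x1920 eye panel (the thread's source videos are 3840x1920 side-by-side, so the panel size here is just an illustrative assumption):

```python
# Hypothetical ROI for one 1920x1920 eye panel:
# center third horizontally, lower half vertically.
W = H = 1920
roi_x0, roi_x1 = W // 3, 2 * W // 3   # center third: x in [640, 1280)
roi_y0, roi_y1 = H // 2, H            # lower half:   y in [960, 1920)
print(f"ROI: x {roi_x0}-{roi_x1}, y {roi_y0}-{roi_y1}")
```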
Yes, I get it. That's the beauty of the sg output projection: it keeps the proportions intact when changing d_fov, which was not the case with the "flat" projection:
I work with these models, among several of our own, at my day job. Glad to take a look. I have a new studio on the way; if my pro isn't getting beaten up by what I run, I ought to be able to reduce the time with the gear I have. My code ain't pretty either, but it is literally used around the globe in our products.
It looks like it helped improve detection in this shadowy/dark area without altering the recognition confidences on the rest of the frame.
But there could be a better approach; if anyone has a clue, feel free to share.
Once again, I'm trying to keep all that processing at the ffmpeg level.
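As a rough illustration of that kind of ffmpeg-level preprocessing, here is a hedged sketch (not the thread's exact command): crop the left eye of a 3840x1920 SBS clip, unwarp it with v360's sg output tilted toward the lower part of the frame, and slightly lift the shadows. The projection, d_fov, pitch, and eq values are all assumptions to be tuned against the actual source.

```python
import subprocess

SRC = "input_vr_3840x1920.mp4"   # hypothetical file names
DST = "unwarped_1920x1920.mp4"

vf = (
    "crop=1920:1920:0:0,"                    # keep the left-eye 1920x1920 panel only
    "v360=input=hequirect:output=sg:"        # half-equirect in, stereographic out
    "d_fov=120:pitch=-25:w=1920:h=1920,"     # look down toward the lower half of the frame
    "eq=gamma=1.3:brightness=0.03"           # mild lift for the shadowy/dark areas
)

subprocess.run(
    ["ffmpeg", "-y", "-i", SRC, "-vf", vf, "-c:v", "libx264", "-crf", "18", DST],
    check=True,
)
```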
Also, now that I added a pose recognition task to get the hips into the evaluation process, the YOLO stage is wayyyy longer (close to 4 hours…) for this 3840x1920 video (focusing on a 1920x1920 panel only).
I decided to go with the largest ("x") YOLO 11 pose recognition model, as I was getting jittery results for the hip location. That might be mitigated by some moving average or Kalman filtering if we try to proceed with a leaner version of the model…
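For what it's worth, a minimal sketch of that idea (not the actual pipeline): run a leaner Ultralytics pose model and smooth the mid-hip point with a short moving average. The model name, window size, and "first detected person" assumption are placeholders.

```python
from collections import deque
import numpy as np
from ultralytics import YOLO  # pip install ultralytics

model = YOLO("yolo11n-pose.pt")     # leaner than the x-large model
L_HIP, R_HIP = 11, 12               # COCO keypoint indices for the hips
window = deque(maxlen=5)            # ~5-frame moving average

def hip_center(frame):
    """Return the smoothed (x, y) mid-hip point for the first detected person, or None."""
    res = model(frame, verbose=False)[0]
    if res.keypoints is None or res.keypoints.xy.shape[0] == 0:
        return None
    kpts = res.keypoints.xy[0].cpu().numpy()   # (17, 2) keypoints of person 0
    raw = (kpts[L_HIP] + kpts[R_HIP]) / 2.0    # raw mid-hip position this frame
    window.append(raw)
    return np.mean(window, axis=0)             # moving-average smoothed position
```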
OK, so here are the results of today's experiments.
I actually made a mistake and ran all the detections (detection + pose detection) on a regular (not "unwarped") video… Boy, I need to start again.
Anyway, I now have this automated report that compares a reference funscript (in red) with what the algorithm produced (in blue), over 10-second sections randomly picked within the video.
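Something along these lines, as a hedged sketch of that kind of report (file names, section count, and figure layout are assumptions, not the actual reporting code):

```python
import json, random
import matplotlib.pyplot as plt

def load_actions(path):
    """Load a .funscript and return (seconds, pos 0-100) pairs."""
    with open(path) as f:
        return [(a["at"] / 1000.0, a["pos"]) for a in json.load(f)["actions"]]

ref = load_actions("reference.funscript")   # hypothetical paths
gen = load_actions("generated.funscript")
duration = max(ref[-1][0], gen[-1][0])

fig, axes = plt.subplots(4, 1, figsize=(10, 12), sharey=True)
for ax in axes:
    start = random.uniform(0, duration - 10)   # random 10-second window
    for actions, color, label in ((ref, "red", "reference"), (gen, "blue", "generated")):
        pts = [(t, p) for t, p in actions if start <= t <= start + 10]
        if pts:
            ax.plot(*zip(*pts), color=color, label=label)
    ax.set_xlim(start, start + 10)
    ax.set_title(f"Section starting at {start:.1f} s")
    ax.legend(loc="upper right")
plt.tight_layout()
plt.savefig("comparison_report.png")
```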
Quite disappointed that I messed something up in the process; the closeup should show no action. (Now that I think of it, this might have been induced by the hip movement in the pose detection, which I should have filtered out when there is no penetration… will check the debug logs… makes me want to bang my head against the wall sometimes lol.)
I also need to check whether the shallow strokes are being limited by a mistake in the speed limit I set with the Handy in mind; I might have miscomputed it or made another dumb move there.
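A quick sanity check for that suspicion might look like this sketch: flag consecutive funscript actions whose implied speed exceeds a cap. The MAX_SPEED value and file name are placeholders, not a confirmed Handy specification or the script's actual limit.

```python
import json

MAX_SPEED = 400.0   # assumed cap, in pos units per second

with open("generated.funscript") as f:
    actions = json.load(f)["actions"]

for prev, cur in zip(actions, actions[1:]):
    dt = (cur["at"] - prev["at"]) / 1000.0        # seconds between actions
    if dt <= 0:
        continue
    speed = abs(cur["pos"] - prev["pos"]) / dt    # implied stroke speed
    if speed > MAX_SPEED:
        print(f"{prev['at']}ms -> {cur['at']}ms: {speed:.0f} units/s exceeds the cap")
```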
Will try and process it again based on the "image undistorting" strategy we discussed, and will also try with pose detection (first the x-large model, then a smaller one for comparison, and lastly without it).
Thank you very much for the kind words, I really appreciate it.
I did not do that for the money, but more as a personal challenge and to give back to the community.
However, I wouldn't mind a coin or two for the time and energy spent, and to keep the motivation up, but I have no clue how to proceed without making people feel they would have to pay for it.
Anyway, so much still to do… I won't forget you on the multi-axis if we ever get there!
@k00gar I'm joining this topic late, but it looks like you've made a ton of progress on this whole thing. Is the source code on GitHub or anything like that? I bet this could really take off if you let some of us collaborate with you on it.
I saw in an early post of the thread that you are shy about sharing the source code, and that's totally understandable, but trust me, the results you've shown so far are more than enough to show how well you've done with all of this.
I also agree with the notion of a Patreon or something like that if you want to monetize this work. All the more reason to open up collaboration, if you haven't already.
Thank you very much @jcwicked, the code is on GitHub but as a private repo for now, and I would definitely need another device to parallelize some of this activity (training models, detecting, comparing, debugging, troubleshooting), but this is more a dream than anything else.
Anyway, I promised to share the code by the end of January if not earlier, but I was hoping to make more progress than I actually did. Will need help for sure.
Actually, I did some housekeeping and commenting yesterday and might be able to share earlier than expected, even if not at the maturity level I was targeting.
As a results follow-up, this is what I got with the n-detect and n-pose models plus some post-automation adjustment of the funscript:
Messy during section 5, need to debug/troubleshoot.
Weirdly reversed during section 6, investigation needed too.
Every time you lift a new stone, there's more mess under it…
I have a job and a family that might not allow me to keep this effort going much longer; I will do my best to dedicate as much time as I can to reach an acceptable state.
Now running the n-sized detect model only, without the pose model; we will see if we get any significant difference, whether any of it explains the reversed strokes, or whether I need to look deeper into the code to fix the reversal.
Just wanted to say that this is really interesting to follow, even if I don't know much about it. Looking forward to seeing how all this turns out in the long run.