How to make your own Passthrough videos!

I’m trying, but I don’t know how to solve the freezes.
The curious thing is that different videos fail at the same point (in 3 tests): the images speed up from minute 8, and 11 seconds later the video freezes for a few seconds, as if it were re-synchronizing.

I think it’s a problem with the ffmpeg concat, because all the matted segments are correct on their own.

hmm I think the best solution would probably be a combination of Background Matting V2 and Robust Video Matting. For the cowgirl scene, the male actor probably ain’t gonna be going anywhere, so he’d be relatively stationary; in that case, Background Matting V2 is a better fit.

RVM is just a good all-rounder for most cases, but it will have some trouble if there’s not enough color separation between the moving actors and the background.
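If you want to try RVM quickly, the repo exposes torch hub entry points; a minimal sketch following the PeterL1n/RobustVideoMatting README pattern (the filenames are placeholders):

```python
import torch

# Load the matting model and the video converter from RVM's torch hub entry points.
model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3").cuda()
convert_video = torch.hub.load("PeterL1n/RobustVideoMatting", "converter")

# Matte a whole clip in one call (argument names per the RVM README).
convert_video(
    model,
    input_source="segment_001.mp4",            # placeholder input clip
    output_type="video",
    output_composition="segment_001_com.mp4",  # foreground composited over green
    output_alpha="segment_001_pha.mp4",        # raw alpha matte
    downsample_ratio=None,                     # auto; lower it for high-res sources
    seq_chunk=12,                              # frames processed in parallel
)
```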

I’d be interested in developing an interface that lets you bookmark and choose parts of the video and which method to go with; and for the background matting, you could scrub through the video and lasso areas of the frames to “rebuild” the clean plate.

If there’s any studio looking at this, I’d recommend using Background Matting V2 with your high-res raw footage + clean plates; it’d give you an infinitely better result than using Semantic Matting to make those passthrough videos! :slight_smile: Or just hit me up in a DM!
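For context on why the clean plates matter so much: BGMV2 takes both the source frame and a clean plate of the same locked-off shot, which is why a stationary camera setup suits it so well. A rough sketch of that idea; the loader below is a hypothetical stub, so check the PeterL1n/BackgroundMattingV2 repo for the actual setup:

```python
import torch

def load_bgmv2(checkpoint: str) -> torch.nn.Module:
    """Hypothetical loader: in practice, construct the MattingRefine model from
    the PeterL1n/BackgroundMattingV2 repo and load the checkpoint here."""
    raise NotImplementedError

model = load_bgmv2("bgmv2_resnet50.pth")  # placeholder checkpoint name

src = torch.rand(1, 3, 1080, 1920)  # stand-in for a real source frame in [0, 1]
bgr = torch.rand(1, 3, 1080, 1920)  # stand-in for the clean plate (same angle!)

with torch.no_grad():
    # The model's forward takes (source, background) and returns the alpha
    # matte and foreground first, plus coarse/intermediate outputs.
    pha, fgr, *_ = model(src, bgr)
```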

wow that’s really weird! Exactly at 8 minutes for all 3 videos? Could you identify which segments are after 8 minutes?

It sounds to me like at 8 minutes, one or several segments failed (within that 11-second window), so the concat safely continues without breaking, which would appear as an acceleration and then a freeze?

Did you spot any error messages or warnings when concatenating?

Edit:
I’ve been processing a 40-minute video for the past 2 days :upside_down_face: and I really hope this won’t happen to me :joy:

It fails at the transition from segment 32 to 33.
It throws this warning continuously:
[vost#0:0/copy @ 00000248289400c0] Non-monotonic DTS; previous: 288013168, current: 18707982; changing to 288013169. This may result in incorrect timestamps in the output file.
And at the end this is the result; the total time matches.

size= 3190002KiB time=00:10:45.81 bitrate=40464.3kbits/s speed=82.7x

Today I’m not editing any video, and I’m testing with the one I already have matted to see if I can fix it.

hmm interesting! I’m getting the same warning messages now on my 15-second segments, but they appear to concat just fine.

How many seconds are your segments?
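One thing that sometimes helps with that exact warning: regenerating timestamps during the concat, so segments whose internal clocks restart at 0 don’t poison the output. A hedged sketch driving ffmpeg’s concat demuxer from Python (filenames are placeholders; the flags are standard ffmpeg options):

```python
import subprocess

# Write the list file that ffmpeg's concat demuxer expects.
segments = [f"segment_{i:03d}.mp4" for i in range(1, 34)]  # placeholder names
with open("list.txt", "w") as f:
    for seg in segments:
        f.write(f"file '{seg}'\n")

# -fflags +genpts regenerates timestamps on the input, which often clears
# non-monotonic DTS warnings caused by segments that restart at t=0.
subprocess.run([
    "ffmpeg",
    "-fflags", "+genpts",
    "-f", "concat", "-safe", "0",
    "-i", "list.txt",
    "-c", "copy",
    "output.mp4",
], check=True)
```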

Could the error be due to the hvc1 codec of the source video?
I set the segments to 15 seconds, but it splits them into segments of 14 s to 19 s.

wow that’s super weird!

Not too sure about the codec, but the segments having some variance is definitely weird.

do you have a link to the video perhaps? maybe on VRporn.com?
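One theory worth testing, though: if the splitting step stream-copies (-c copy), ffmpeg can only cut on keyframes, so a requested 15 s would come out as 14–19 s depending on where the source’s keyframes land. A sketch that re-encodes with forced keyframes so every split lands exactly on the interval (placeholder filenames; standard ffmpeg flags):

```python
import subprocess

SEGMENT_SECONDS = 15

# Force a keyframe every 15 s, then split with the segment muxer; since every
# split point is now a keyframe, each piece comes out at exactly 15 s.
subprocess.run([
    "ffmpeg",
    "-i", "source.mp4",  # placeholder input
    "-force_key_frames", f"expr:gte(t,n_forced*{SEGMENT_SECONDS})",
    "-f", "segment",
    "-segment_time", str(SEGMENT_SECONDS),
    "-reset_timestamps", "1",
    "-c:v", "libx264", "-crf", "18",  # must re-encode; stream copy can't move keyframes
    "-c:a", "copy",
    "segment_%03d.mp4",
], check=True)
```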

Based on how straightforward RunwayML is (although a paid option), would it be possible to create a similar GUI that would allow one to segment/highlight/negate areas? And maybe even see the process visually, so one could pause and edit the result in real time if something goes bad, due to maybe an actor switch or a scene cut.

User hardware availability would probably determine this process to a great extent as well… Would be really interested in hearing your thoughts, thanks!

Edit: On an unrelated note, I’ve recently run into a ~30 s section where the female actor leans in and is completely invisible. Would I basically have to cut that section out of the initial video, then “clean-plate” it using something like Photoshop, and finally restart the matted segmentation process for that particular section? Based on the initial post, it seems like RVM would be a good fit here?
There are also a few sections where the actors change positions and the female actor’s legs are basically invisible as well. Would the process remain the same? Thanks for your help in advance!


Yeah, I do this in Resolve. After Effects has one too.

Although I’m far from a professional, after using Silhouette/Mocha I cannot imagine rotoscoping anything that is longer than about 30 s. Maybe I’m doing it wrong? Is it possible to just select an entire area in the video and mask/alpha it? How comparable are tools like AE/Resolve to the software in this thread? What is your current workflow, if you don’t mind me asking?

In Resolve you just draw a line and click autotrack. Rotoscoping is tedious, yeah. Mocha should have an automatic option too. But I am more of a developer than a video editor; I just kinda push buttons and see what happens. Right now DaVinci Resolve has a tool called Magic Mask, which is like auto-rotoscoping, and then there are several other types of 3D trackers.

The software here is more automatic. Resolve gives you an automatic option as well as letting you edit the result afterwards. Resolve does cost money, though. I read a fun fact today that Adobe products and Resolve are also some of the most pirated programs in the world.

I don’t necessarily have a workflow yet. Nothing really works too great. I’ve just been playing with things here and there as they come out. I’ve gotten pretty good with ComfyUI too, which I think may be a good option here. It’s just very compute-heavy.


Ah yes, I remember the line thingy now. It’s definitely simpler than magic-lassoing hair.
Do you have a beefy system? I am wondering how these various tools would compare to each other on the same video. Might be a good experiment.

Are you using ComfyUI for Stable Diffusion?

I do, but things like this can still bring it to its knees without much effort. Yeah, ComfyUI is Stable Diffusion with a node-editor-based framework. It’s scary at first, but once you start using it, it’s not bad.

Oh yeah, that is possible! (I assume you’re talking about Background Matting V2!)

I think it sounds a bit like OpenFunscripter but for passthrough, with a suite of tools to assist in building the passthrough :slight_smile:

It could possibly work if we fork OpenFunscripter as a base, but that is definitely a ton of work for something that may not even be widely used :frowning:

I think the easiest solution is to wrap RVM or BGMV2 as an AE/Resolve/Mocha/Premiere plugin.

I’d much rather investigate 4D Gaussian splatting for true AR porn, or using Stable Diffusion to generate mattes.

I think the nice thing about RVM and BGMV2 is that they run on consumer hardware pretty well, so I don’t think that’s too much of a blocker here if you just have a UI on top handling all that :slight_smile:

What I would really find interesting, though, is distributing all these segments across multiple machines for processing; that would be cool.
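Once the video is split into independent segments it’s embarrassingly parallel, so the simplest version is just fanning segments out over local GPUs; a minimal sketch (the `run_rvm.py` worker script is a placeholder for whatever matting call you’d actually run, and a multi-machine setup would swap the pool for a network job queue):

```python
import os
import subprocess
from concurrent.futures import ProcessPoolExecutor

NUM_GPUS = 2  # assumption: set to however many GPUs the machine has

def matte_segment(args):
    """Pin one segment to one GPU and run the matting script on it."""
    segment, gpu = args
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    # "run_rvm.py" is a placeholder for the actual RVM/BGMV2 entry point.
    subprocess.run(["python", "run_rvm.py", segment], env=env, check=True)
    return segment

if __name__ == "__main__":
    segments = [f"segment_{i:03d}.mp4" for i in range(1, 34)]  # placeholders
    # Round-robin the segments over the GPUs, one worker process per GPU.
    jobs = [(seg, i % NUM_GPUS) for i, seg in enumerate(segments)]
    with ProcessPoolExecutor(max_workers=NUM_GPUS) as pool:
        for done in pool.map(matte_segment, jobs):
            print("finished", done)
```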

Oh yeah, those parts are probably best matted with RVM!

Updated the readme with @IIEleven11’s changes! Thank you!

Updated with some troubleshooting solutions!

These are somewhat complex things to be doing for someone who isn’t a developer; there are many points of failure in a Python environment, and my solution is not fail-proof. If someone is having a problem, though, just let me know and I can solve it. Or if something doesn’t make sense, I can explain it for you. This stuff can be very confusing.


Yeah, on that note, I think we might need to add an installation section for Git before git-cloning the repo too; a non-dev probably wouldn’t have Git already installed and ready to go. I’ll add that in.

Hello, I’m working on a web service (with an API) to run RVM on at least 4 GPUs simultaneously. I need to figure out the GUI, the best video-splitting algorithm (I’m thinking PySceneDetect, then time-based splitting), and a way to let the user pick the background for each segment (maybe with SAM-HQ).
@blewClue215 ping me in a DM if you want to collaborate / test the service :3
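Roughly what I have in mind for the splitting step, based on PySceneDetect’s documented high-level API (the input filename is a placeholder, and long scenes would still need time-based splitting on top):

```python
from scenedetect import detect, ContentDetector
from scenedetect.video_splitter import split_video_ffmpeg

# Find cuts with the default content detector, then split the file via ffmpeg.
scene_list = detect("input.mp4", ContentDetector())
split_video_ffmpeg("input.mp4", scene_list)
```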


:open_hands:

Amazing! So would it be a multiple-solutions-in-one kind of thing? (Such as RVM on some shots, BGMV2 on some shots, and SAM-HQ on some shots?)

If that’s the case, it would be interesting to use PySceneDetect to classify and identify which shots are a good fit for each use case :slight_smile:

But sure, I’d be interested in collaborating, or at least in testing out the service!

1. Run RVM with just a single video as input.
2. Return an array of playable video segments to a web GUI, where the user can select which segments to keep.
3. On some keyframes, do foreground selection (via SAM-HQ), then reprocess the failed segments with BGMV2.
4. If the user is OK with the result, merge the segments + audio (rough sketch below).
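Something like this, where every helper is a hypothetical stand-in for the real RVM / web GUI / SAM-HQ / BGMV2 pieces:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    path: str
    ok: bool = False  # set by the user in the web GUI

# All five helpers are hypothetical stubs standing in for the real service calls.
def run_rvm(video: str) -> list[Segment]: ...
def user_review(segments: list[Segment]) -> list[Segment]: ...
def sam_hq_foreground(segment: Segment) -> str: ...  # returns a mask path
def run_bgmv2(segment: Segment, mask: str) -> Segment: ...
def merge(segments: list[Segment], audio: str, out: str) -> None: ...

def pipeline(video: str, audio: str) -> None:
    segments = run_rvm(video)             # 1. first pass with RVM
    segments = user_review(segments)      # 2. user keeps/rejects segments in the GUI
    for i, seg in enumerate(segments):
        if not seg.ok:                    # 3. failed segments: SAM-HQ selection + BGMV2
            mask = sam_hq_foreground(seg)
            segments[i] = run_bgmv2(seg, mask)
    merge(segments, audio, "final.mp4")   # 4. merge segments + audio
```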
