New Better Method for Making Passthrough Videos!

I was having problems with the image getting warped. It turned out I was inputting a fisheye190 vid, and the conversion to fisheye190 was messing with it. I tweaked the pipeline and made a new .bat for vids that are already fisheye190, which skips the conversion step, and after that my 5 minute test file came out fine. The problem I’m running into now is that my first full attempt is moving at a crawl.

I have a 5080/9800x3d/32GB RAM and I’m trying to convert a 15GB 8k 62 minute vid. Time elapsed is about 10 hours, overall 74% complete (on segment 295/372 of the MatAnyone step), but the ETA says 351 hours. Is that just a bad guess or did I hit a hitch somewhere? My settings are no conversion (my addition to the pipeline), no parallel eyes, max height 1280, quality ultra, segment length 20. Is there a more efficient way to do this? I know no parallel eyes will significantly increase the time, but the documentation says that parallel eyes requires 20GB VRAM and my 5080 only has 16, so I figured I needed to process sequentially. Anyone have suggestions on how to optimize this? Should I turn parallel eyes back on but limit the mask height to get under 16GB VRAM? Is that even how that works?
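A quick sanity check on that ETA, as a rough sketch: it assumes the remaining work runs at the same average rate as the work so far, which is a simplification since later pipeline steps may be faster or slower than MatAnyone. The function name is made up for illustration; the numbers are the ones from the post.

```python
def linear_eta_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Remaining hours, assuming the rest runs at the average rate so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    projected_total = elapsed_hours / fraction_done
    return projected_total - elapsed_hours


# Figures from the post: about 10 hours elapsed, 74% overall complete.
remaining = linear_eta_hours(10, 0.74)
print(f"{remaining:.1f} h remaining, {10 + remaining:.1f} h total")
```

By that estimate the job is a few hours from done, nowhere near 351 hours, so the displayed ETA looks like a bad guess rather than a real stall.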

Update: for the record, the ETA was absolutely wrong and it ended up finishing in about 14 hours. Also, from my testing, parallel eyes isn’t worth using on a 5080. Even when I seriously limited the mask height, I couldn’t get the single eye VRAM usage low enough to be able to do both at once, so rather than doubling the speed, all parallel eyes did was process both eyes together at a much slower pace than each eye separately. I think it ended up taking a few more hours than a comparable attempt with much higher mask height but single eye sequential processing. If anyone has been able to get parallel eyes to give meaningful benefits with a 5080, I’d love to hear how. No parallel eyes, max height 1280, quality ultra, segment length 40 seems to be working pretty well for me. With sequential processing I do have over 5GB of VRAM headroom left, so I’d be interested in seeing if anyone has suggestions on how to take advantage of that to either speed the process up or make the final product better quality (without significantly increasing the time this takes). Would upping the mask height above 1280 make a noticeable difference or just make the process longer with negligible improvement?

I’m having issues getting the external alpha mask into Heresphere, even though it correctly identifies an existing XALPHA file. I just get full passthrough all the time, like the mask isn’t even there. Can you share some of your settings?

11 hours for full scenes? I don’t have a great GPU, but that isn’t bad. I’m tempted to rent a cloud GPU… Chances are those will be running on Linux, which means those scripts would need reworking…

Or maybe not. Linux drivers are wonky, so maybe those data centers are running Windows, since it’s GPU processing rather than CPU.

Did you give it the same file name as the video file with _XALPHA.MP4 at the end? Did you click external mask in Heresphere?

Sure did! After reading the Heresphere discord it seems I am not alone in this. Some ppl have had success with removing audio from the XALPHA file, but that did not fix my problem

You can post gofile links here

You can try compositing the video with ffmpeg, using the mask to cut out the positive image from the original video. Ask any of the AIs for help with an ffmpeg command.

something like this:

ffmpeg -i input_8k_sbs.mp4 -i mask.mp4 -filter_complex "[0:v]split[orig][bg]; [bg]drawbox=color=0x00FF00:t=fill[green]; [1:v]format=gray[mask]; [green][orig][mask]maskedmerge" -c:v libx265 -crf 22 -preset medium output_greenscreen.mp4
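If the quoting gives you trouble when pasting that into a .bat or a shell, one option is to launch ffmpeg from Python with an argument list, so no shell quoting is involved. This is only a sketch: the filtergraph is copied as-is from the command above and has not been verified here, and the function names and file names are placeholders.

```python
import subprocess

# Filtergraph copied from the command in the post; not verified here.
FILTERGRAPH = (
    "[0:v]split[orig][bg];"
    "[bg]drawbox=color=0x00FF00:t=fill[green];"
    "[1:v]format=gray[mask];"
    "[green][orig][mask]maskedmerge"
)


def build_greenscreen_cmd(video: str, mask: str, output: str) -> list[str]:
    """Return the ffmpeg argument list for the green-screen composite."""
    return [
        "ffmpeg", "-i", video, "-i", mask,
        "-filter_complex", FILTERGRAPH,
        "-c:v", "libx265", "-crf", "22", "-preset", "medium",
        output,
    ]


def run_greenscreen(video: str, mask: str, output: str) -> None:
    """Run ffmpeg (must be on PATH); raises CalledProcessError on failure."""
    subprocess.run(build_greenscreen_cmd(video, mask, output), check=True)


# Placeholder file names; this only prints the arguments, it does not run ffmpeg.
print(build_greenscreen_cmd("input_8k_sbs.mp4", "mask.mp4", "output_greenscreen.mp4"))
```

Call run_greenscreen(...) with your own paths once the printed arguments look right.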


Is it possible to make this work on Linux?

Check the GitHub link in the original post.


It depends on how you created your XALPHA file; it has to be in a specific format. Personally, I have given up on this approach, because Heresphere is not able to keep the mask in sync, probably because it is processing two video streams. I am using the original approach instead, i.e. converting the video and packing the mask together with it in one file.