How to create subtitles for a scene (even if you don't understand the language)

laweru5748 · March 6, 2023, 5:39am

The processing progress bar does not seem to be accurate ever for me. For example, it would say like {number} / 800 and then the number goes waaaay over 800. It has never successfully stopped at the correct number and Iv tried a decent amount of times at this point.

Just to double check, after I click the X in the upload files section, I literally just drag and drop all the chunk files (in this case it was like 116 small chunk files adding up to 8.5 mb) into the upload file section and then click submit again right?

Zalunda · March 6, 2023, 5:54am

Yes. You could try transcribing in batch of 20 or something. I don’t really know. And for the number, no, it’s not really accurate.

Ambi · March 22, 2023, 4:02pm

Just went through the process once, and while its not hard, its very tiresome Good job though! i will probably use it a bit more on javs i really want to have translated ~

Zalunda · March 23, 2023, 2:36am

I totally agree that it’s a bit tiresome but there is always this really simple two-step alternative:

Step 1: Learn Japanese
Step 2: Use SubtitleEdit to create the subtitles from scratch

Marethyu · March 23, 2023, 1:06pm

You forgot to include the “IPython” module in your command to install the needed modules.

shamona_heehee · April 1, 2023, 1:56am

ChatGPT seems to combine lines together if the two lines are logically connected. Makes it pretty annoying sometimes. Does anyone else have this problem? I’ve tried to use different prompts but it just keeps happening.

MeraMera · May 11, 2023, 2:14pm

Did something go wrong when in step 4 I pull in the subtitles and they all seem to be a dot? I at first thought it was because the video I chose had too much background noise, but I tried another video and got the same thing.

If it’s wrong, any idea how to fix the issue? / what to try redoing

Zalunda · May 11, 2023, 9:33pm

No, you didn’t do anything wrong. It’s normal to have dot / “.” at step 4.
The first 4 steps only define where there is “voice” in the audio, without trying to know what is said.

It’s in steps 5 to 11 that we “replace” the dots with a transcription of the audio (using AI), in the original language (ex. Japanese).

In steps 12 to 21, we translate the original language texts to English.

MeraMera · May 11, 2023, 11:15pm

Oh, okay. I was getting paranoid I had done something wrong. Thanks! Rereading the tutorial I just realized after step 2 it says entries for each voice detected (with no text). I missed that.

jackb8911 · January 19, 2024, 2:16am

Curious if anyone has any new best practices for translating Japanese to English for JAV subtitles?

I’ve been using whisper-faster w/ the large-v2 model and they’re okay for some scenes but for others its just the same odd sentence over and over and over. I’m guessing it’s picking up some background static or something.

Zalunda · January 20, 2024, 1:45am

I found nothing that works perfectly yet. There are some models/tools (stable-ts, whisper-faster, whisperX, etc) that do a better job than ‘vanilla whisper’ for the timing of subtitles but they still have trouble with the Voice-Detection and repetition, as you said. The nice part is that whisper-faster can now be used directly in SubtitleEdit (i.e. Purfview’s Faster-Whisper seems to be based on whisper-faster).

I tested large-v3 a little bit. For Japanese, it didn’t seem better than large-v2.

As for the translation, I’m trying to use local LLM (since it’s harder to bypass chatGTP censoring now). The one that I prefer so far is https://huggingface.co/TheBloke/Orca-2-13B-SFT_v5-GPTQ, with a prompt like this:

I need you to translate the subtitles of a porn movie from Japanese to English.
I have the following requirements.

Requirements:
{
1- The target audience are adults so it can contains explicitly sexual concept.
2- Try to spot any sentences containing double meanings or wordplay.
3- All the provided lines are said by a girl to a guy.
4- Only translate the meaning of the original text. Don't expand. Don't give explication.
}

Please translate each line one by one:
{
[00000] 阿修羅?
[00001] わりとくいるよな
[00002] なぁごめんやけどさ、あと10分だけ泳がしてくれへん?
[00003] 最近さああ
[00004] 仕事が忙しくて
[00005] 全然暗号できてへんね〜んか
[00006] なぁ、ごめんやからお願い
[00007] この通り
[00008] っていうの、自分だけ
}

But it’s not working very well. For the same input, it’s clear that ChatGPT is a lot better. Hopefully, Llama 3 will be even better than version 2.

jackb8911 · January 20, 2024, 5:43pm

Appreciate the updated thoughts! I found the same with large-v3.

Sounds like we just need to let the AI models continue to evolve. It’s still remarkable how much we can do today!

Havey · February 2, 2024, 2:41pm

Thanks for your guide !
I encountered some problems when proceeding to step 10. Can you help me？

--- subtitles.wavchunks2srt ---
.\kavr-074.temp.perfect-vad.srt: Loading 218 .srt file from folder kavr-074.temp.perfect-vad_wav_chunks_9FA13526...
.\kavr-074.temp.perfect-vad.srt: Exception occured: System.FormatException: 28: Invalid format for start/end time:
   在 FunscriptToolbox.Core.SubtitleFile.<ReadSrtSubtitles>d__3.MoveNext()
   在 System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   在 System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   在 FunscriptToolbox.Core.SubtitleFile..ctor(String filepath, IEnumerable`1 subtitles)
   在 FunscriptToolbox.Core.SubtitleFile.FromSrtFile(String filepath)
   在 FunscriptToolbox.SubtitlesVerb.VerbSubtitlesWavChunks2Srt.Execute()

--- subtitles.gpt2srt ---

--- subtitles.srt2gpt ---

--- subtitles.singlewav2srt ---
.\kavr-074.temp.perfect-vad.whisper.wav: Skipping because can't find a file named 'kavr-074.temp.perfect-vad.whisper.srt' and 'kavr-074.temp.perfect-vad.whisper.offset' and 'kavr-074.temp.perfect-vad.whisper.chunks.srt'.

use FunscriptToolbox1.2.7

Zalunda · February 2, 2024, 2:54pm

Humm, one of the 218 .srt file is invalid somehow but I didn’t write the offending file name in my exception handling.

Could you zip all .srt files in the folder “kavr-074.temp.perfect-vad_wav_chunks_9FA13526” and link it here?
You might have to rename the file to .zip.funscript or something like this.

Or create an issue here and link the file there.

Ricster89 · February 2, 2024, 3:00pm

i was legit thinking of learning japenese just to more enjoy jav scenes lol funny that ive been playing the yakuza games for years and havnt learned a single bit yet lol

Havey · February 2, 2024, 3:05pm

thank you for your reply
kavr-074.temp.perfect-vad_wav_chunks_9FA13526srt.zip.funscript (53.1 KB)
this is the .srt zip link.

Zalunda · February 2, 2024, 3:31pm

Ok, there is a bug in my .srt handling if the text is only a number. Ex:

7 <= part of the format
00:00:16,240 → 00:00:19,180
2 <= text

I’ll fix it. In the mean time, replace those files in your folder:

135___9fa13526__0326505_1720__wav-subs.srt (568 Bytes)
176___9fa13526__0703714_442__wav-subs.srt (39 Bytes)

Havey · February 2, 2024, 3:36pm

Thank you, it works.

WIZARD · March 27, 2024, 2:32am

Has your workflow improved or changed since then if you don’t mind me asking? I’m currently diving into the subtitle rabbit hole and find myself using some combination of Chatgpt + DeepL + Gemini (with the occasional unable to translate due to explicit content warning). And using Subtitle Edit with Large v2. For some reason, the Audio->Text ‘Auto Adjust timings’ setting messes with some of the timings off, so I have that turned off. Not sure if you experienced the same or experimented with that.

I was also experiencing inaccurate timings with what I am guessing is called hallucinations? simply due to the model? basically issues in the timings due to breaks/brief pauses in the audio or the audio not being entirely loud enough for it to pick up the words…(i belive there is something called the VAD filter to help with this? but I’m a total noob when it comes to using the cmd line and various parameters).

I know stable-ts is also included in the software (Subtitle Edit) as well - to help remedy these issues, but the efficiency of whisper-faster is really a game changer.

Zalunda · March 29, 2024, 11:46pm

I’m currently working on a new ‘subtitleV2’ verb in FunscriptToolbox that will be more automated and flexible (see subtitleV2 branch on github).

I’m testing other tools/methods right now:

Using PurfviewWhisper locally to make transcription, and the initial ‘draft’ VAD ( instead of SilaroVAD).
Using large JSON prompt with mistral.ai / Large Model. For example, given chat-mistral-large-REQUEST-001.txt, mistral Large generated chat-mistral-large-RESPONSE-001.txt
Using local AI API (Mixtral 8x7b hosted by LMStudio) to do the AI translation automatically.

For me, unfortunaly, as long as the translation is not helped by a ‘computer vision AI’ that can understand what’s going on in the scene, manual works will always be needed for some part of the process.