How to create subtitles for a scene (even if you don't understand the language)

When I started looking into this, I hoped to be able to give Whisper an audio file and get an almost perfect .srt. Unfortunately, it’s not “there” yet. Whisper rarely gets the right timing for subtitles (at least for Japanese). It also often hallucinates text (ex. “Please Like And Subscribe” or “Thanks for watching” at random spots, because it was trained with a lot of YouTube content), and it sometimes goes into a loop and repeats the same text over and over.

I finally ended up “helping” Whisper to keep it on track. I also used ChatGPT to get better translations, mostly because it’s possible to give it a context for the translation.

Hopefully, Whisper will mature and a future release will be more useful for my use case (i.e. subtitles for JAV VR), so that this guide can be simplified.

Important:

THIS IS STILL A WORK IN PROGRESS but it’s working well for me.

So, here is the guide (version 1.1).

Yes, it has a lot of steps but, once everything is set up, it goes relatively fast (i.e. about twice the length of the video, depending on the number of subtitles). For example, it took me 1h to make the subtitles for a 30-minute video.

  1. Setting up:

    a. Download and install the latest version of FunscriptToolbox (at least version 1.1.0) somewhere on your machine.

    b. Download and install the latest version of SubtitleEdit somewhere on your machine.

    c. Create a new folder (ex. SubtitleCreation) somewhere.

    d. Put this file in the folder:
    FunscriptToolboxSubtitles.v1.1.1.bat.txt (1.5 KB)

    e. Remove “.txt” in the file name.

    f. In the batch file, put the path of your FunscriptToolbox folder on the “set path” line:

     ```
     @echo off
    
     set path=C:\AwesomeTools\FunscriptToolbox;%path%
    
     echo --- subtitles.video2vadsrt ---
     ...
     ```
    

    g. I suggest creating a shortcut to SubtitleEdit.exe in the folder.

    h. Move the videos that need subtitles to the folder.

    i. At the end, your folder should look like this.
    Step1

  2. Start FunscriptToolboxSubtitles.bat for the first time.

    For each video file:

    It will extract a “(video name).temp.vad.wav” file from the audio tracks.
    It will also create a “(video name).temp.vad.srt” that contains a subtitle entry for each voice detected (with no text).

    Note:
    For this step, FunscriptToolbox will call a Python script. You might need to install Python. I’m using version 3.9 (the latest version is not compatible with some image generation AI that I use). You might also have to “pip install” a few modules (I don’t remember which ones, but you’ll get an error message that tells you which ones are needed).

    If you don’t want to install Python, you can also add the option “--skipvad” to the “subtitles.video2vadsrt” step of the batch file. With this option, it will create an empty .srt file but you’ll have to do more work in SubtitleEdit in the next step.
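
To make the “.temp.vad.srt” idea more concrete, here is a minimal Python sketch of how a list of detected voice segments could be written as subtitle entries with no text. The function names and the sample segments are made up for illustration; this is not the actual code of FunscriptToolbox.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments, text=""):
    """Build the content of an .srt file with one entry per (start, end) segment."""
    blocks = []
    for index, (start, end) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# Hypothetical speech segments (in seconds), as a VAD might report them.
example_segments = [(12.5, 14.0), (15.25, 18.75)]
print(segments_to_srt(example_segments))
```

Each entry only carries a number and a time range; the text line stays empty until Whisper fills it in later.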

  3. At this point, the working folder will look like this:

    Step3

  4. Use SubtitleEdit to make a “perfect VAD” file:

    a. Start SubtitleEdit.

    b. Drag the “(video name).temp.vad.srt” over the subtitles list.

    c. Drag the “(video name).mp4” over the video section.

    d. Adjust the subtitles (while ignoring the text column for now).

    This means adjusting the start and end of every entry generated by VAD.
    Delete extra subtitles (ex. moaning).
    Add missing subtitles if needed.
    Don’t extend the length of the subtitles too much; it makes Whisper “misbehave” more. FunscriptToolbox will extend all subtitles at the last stage of the process.

    e. Choose the menu “File\Save As…”
    Step4e

    f. Save the file as “(video name).temp.perfect-vad.srt”.

    g. Close SubtitleEdit for now.

  5. Rename the file “(video name).temp.vad.wav” to “(video name).temp.perfect-vad.wav”.

    Step5

  6. Start FunscriptToolboxSubtitles.bat again.

    This time, it will use the “perfect VAD” to create files optimized for OpenAI Whisper.

    The file “(video name).temp.perfect-vad.whisper.wav” contains only the audio from VAD sections found in “(video name).temp.perfect-vad.srt”, separated by 1 to 2 seconds of silence.

    The file “(video name).temp.perfect-vad.whisper.offset” contains the offset that will be used to recreate a .srt file synchronized to the full wav/mp4 after Whisper has transcribed the audio.

    A file “(video name).temp.perfect-vad.whisper.newvad.srt” is also created. This file is not really needed for the process but can sometimes be useful.
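
As a rough illustration of the idea behind the “.whisper.wav” and “.whisper.offset” files, the sketch below lays out the speech sections one after another, with a gap of silence between them, and records where each one came from. The 1.5-second gap, the function name and the dictionary layout are assumptions for illustration only; the real file format used by FunscriptToolbox may differ.

```python
def build_offset_map(segments, gap=1.5):
    """Lay out (start, end) speech segments back to back, separated by silence.

    Returns a list of dicts giving, for each segment, its position in the
    condensed audio and its start time in the original audio.
    """
    offsets = []
    cursor = 0.0  # current position in the condensed audio, in seconds
    for start, end in segments:
        duration = end - start
        offsets.append({
            "condensed_start": cursor,
            "condensed_end": cursor + duration,
            "original_start": start,
        })
        cursor += duration + gap  # leave silence before the next segment
    return offsets


# Hypothetical speech sections taken from a "perfect VAD" .srt (in seconds).
example = build_offset_map([(10.0, 12.0), (30.0, 33.0), (60.0, 61.0)])
for entry in example:
    print(entry)
```

The point of the condensed file is that Whisper only ever hears the sections that were marked as speech, which leaves it less room to hallucinate text during long stretches without dialogue.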

  7. At this point, the working folder will look like this:

    Step7
    A folder “(video name).temp.perfect-vad_wav_chunks_ID” will also appear.

  8. Time to transcribe the audio using OpenAI Whisper.

    a. Open this link in a browser:
    Google Colab

    b. Start the first step “git clone” of the Colab template and wait for it to finish.

    c. Accept the warning that will pop up (note: this is not my Colab template).

    d. Skip the second step and start the third step “pip install requirements”, and wait for it to finish.

    e. Start the fourth step “python app.py”.

    f. When a link to a public URL shows up, click on it (i.e. https://(id).gradio.app).

    g. On the page, for model, choose “large-v2”.
    Step8g

    h. For Language, choose the audio language (ex. “Japanese”).

    i. Drag the “(video name).temp.perfect-vad.whisper.wav” to the Upload Files section.
    You can convert multiple files if you like.

    j. For Task, choose “transcribe”.
    Step8j

    k. For VAD, choose “none” (important because we already did a “better” VAD previously).

    l. Click “Submit”
    Step8l

    m. Wait for the transcription to complete. It usually takes about 5 minutes for a 30-minute wav file (70MB).

    n. When done, download the .srt file (or the zip file, if you converted multiple files). You can download the other files too but they aren’t needed.
    Step8n

    o. Drag all the .wav files from the folder “(video name).temp.perfect-vad_wav_chunks_ID” to the Whisper “Upload Files” section (after clearing the old list), and then click “Submit”.

    p. When done, download the .zip file and unzip it into the folder “(video name).temp.perfect-vad_wav_chunks_ID”.

    q. If you won’t use the Colab machine for a while, go back to the Colab tab, then disconnect and delete the runtime.

  9. Rename the .srt file generated by Whisper to “(video name).temp.perfect-vad.whisper.srt”.
    Step9

  10. Start FunscriptToolboxSubtitles.bat again.

    This time, it will use the “(video name).temp.perfect-vad.whisper.offset” file to move the subtitles from “(video name).temp.perfect-vad.whisper.srt” back to their original spot (i.e. synchronized to the original video) and create a “(video name).temp.perfect-vad.whisper.jp.srt” file.

    Note: It usually works well but it’s still not foolproof.
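
Here is a small Python sketch of how a time in the condensed Whisper audio could be moved back to its original spot using offsets. The tuple layout and the rule of snapping times that fall in the silence to the end of the previous section are my assumptions for illustration, not a description of what the tool really does.

```python
def to_original_time(condensed_time, offsets):
    """Map a time in the condensed audio back to the original audio.

    `offsets` is a list of (condensed_start, condensed_end, original_start)
    tuples sorted by condensed_start. Times that fall in the silence between
    two sections are snapped to the end of the previous section.
    """
    previous = None
    for condensed_start, condensed_end, original_start in offsets:
        if condensed_time < condensed_start:
            break
        previous = (condensed_start, condensed_end, original_start)
    if previous is None:
        # Before the first section: snap to its start.
        return offsets[0][2]
    condensed_start, condensed_end, original_start = previous
    clamped = min(condensed_time, condensed_end)
    return original_start + (clamped - condensed_start)


# Hypothetical offsets: three speech sections separated by 1.5 s of silence.
example_offsets = [(0.0, 2.0, 10.0), (3.5, 6.5, 30.0), (8.0, 9.0, 60.0)]
print(to_original_time(4.0, example_offsets))  # falls inside the second section
```

With a mapping like this, a subtitle that Whisper times slightly too early or too late can end up attached to the neighbouring section, which is one way entries could get shifted.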

  11. At this point, the working folder will look like this:
    Step11

  12. Use SubtitleEdit to do a basic translation with Google.

    a. Start SubtitleEdit.

    b. Drag the “(video name).temp.perfect-vad.whisper.jp.srt” over the subtitles list.

    c. Choose the menu “Auto-translate \ Auto-translate…”

    d. Select the right “From” and “To” language (ex. Japanese to English) then click “Translate”.

    e. When the translation is done, click “OK”.

    f. Choose the menu “File\Save As…”

    g. Save the file as “(video name).temp.perfect-vad.whisper.jp.google.en.srt”.
    (And, yes, there is some logic & need for this file-naming madness. :grin:)

    h. Close SubtitleEdit for now.

  13. Start FunscriptToolboxSubtitles.bat yet again.

    This time, it will use the file “(video name).temp.perfect-vad.whisper.jp.srt” to create a “(video name).temp.perfect-vad.whisper.jp.gptprompts” file that contains “prompts” that can be fed to OpenAI ChatGPT.

    It will also use the Google-translated .srt file saved in the previous step to create a “(video name).temp.perfect-vad.whisper.jp.gptresults” file.
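
To give an idea of what the prompts look like, here is a minimal Python sketch that numbers subtitle lines with a four-digit tag (like [0090]) and groups them in batches of 30. Starting the numbering at 1 and the exact “[0001] text” layout are assumptions on my part; the real .gptprompts file also starts with the instructions text.

```python
def build_prompt_batches(lines, batch_size=30):
    """Number each subtitle line as [0001], [0002], ... and group them in batches."""
    numbered = [f"[{index:04d}] {text}" for index, text in enumerate(lines, start=1)]
    return [
        numbered[i:i + batch_size]
        for i in range(0, len(numbered), batch_size)
    ]


# 65 placeholder subtitle texts, standing in for the transcribed lines.
example_lines = [f"line {n}" for n in range(1, 66)]
batches = build_prompt_batches(example_lines)
print(len(batches), "batches")
print(batches[1][0])
```

Keeping the number in front of every line is what makes it possible to match each translation back to its subtitle later.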

  14. At this point, the working folder will look like this:

    Step14

  15. Load both files (gptprompts & gptresults) in your favorite text editor (ex. Notepad++).

  16. It’s now time to translate using OpenAI ChatGPT.

    a. Open this link in a browser: https://chat.openai.com/chat
    (you might need to create an account on https://openai.com first).

    b. Create a new chat.

    c. From the gptprompts file, send “instructions” text and the first 30 lines of texts (I added a few empty lines every 30 entries).

    d. Wait for GPT to translate the batch of texts, then feed it the next 30 lines of texts.

    e. When it’s done, copy all the text from the page (with Ctrl+A, Ctrl-C) and paste it at the end of the .gptresults file.

    f. ChatGPT will often say “Oh, this is not appropriate content for my prude little machine brain” but it will continue to translate anyway. I ignore it.

    g. Sometimes, it will say, “No, this is too much for me, I won’t continue”.

    In that case, copy all the text that you got so far from the page (with Ctrl+A, Ctrl-C) and paste it at the end of the .gptresults file.

    And then, create a new chat. Re-paste the instructions and continue from about 10 lines before it stopped. For example, if it stopped translating at line [0090], restart at line [0080]. It works best if it has a little bit of context.

    h. I usually create a new chat and re-translate the whole thing a second and, sometimes, a third time, each time adding the ChatGPT responses to the end of the .gptresults file.

  17. When you are done with the translation, start FunscriptToolboxSubtitles.bat yet again.

    This time, it will use the file “(video name).temp.perfect-vad.whisper.jp.srt”, extract all GPT responses from “(video name).temp.perfect-vad.whisper.jp.gptresults” and create a file “(video name).temp.perfect-vad.whisper.jp.en.srt”.
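
The extraction of GPT responses can be pictured with the Python sketch below: it scans text pasted from the chat page, keeps only the lines that start with a [0001]-style number, and groups all the translations found for each number. The pattern and function name are hypothetical; the real parsing in FunscriptToolbox may be more tolerant.

```python
import re
from collections import defaultdict

# A line such as "[0012] Some translated text".
LINE_PATTERN = re.compile(r"^\[(\d{4})\]\s*(.+)$")


def collect_translations(pasted_text):
    """Group every '[0001] some text' line found in the pasted text by its number."""
    translations = defaultdict(list)
    for raw_line in pasted_text.splitlines():
        match = LINE_PATTERN.match(raw_line.strip())
        if match:
            number, text = match.groups()
            # Keep each distinct translation once, in order of appearance.
            if text not in translations[number]:
                translations[number].append(text)
    return dict(translations)


example_text = """\
Sure, here are the translations:
[0001] Hello.
[0002] How are you?
Some unrelated text from the page
[0001] Hi there.
[0002] How are you?
"""
print(collect_translations(example_text))
```

This is also why pasting the whole page with Ctrl+A works: anything that does not look like a numbered line can simply be ignored, and several passes give several choices per entry.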

  18. At this point, rename:
    “(video name).temp.perfect-vad.whisper.jp.en.srt” to “(video name).srt”.
    And
    “(video name).temp.perfect-vad.whisper.jp.srt” to “(video name).jp.srt”.

    Step18
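
If you have many videos, the two renames can be scripted. This is only a sketch in Python using the file names from this step; the function name is made up, and it skips a file instead of overwriting when the target already exists.

```python
from pathlib import Path


def rename_final_subtitles(folder, video_name):
    """Rename the two final .temp files of one video to their final names."""
    folder = Path(folder)
    renames = {
        f"{video_name}.temp.perfect-vad.whisper.jp.en.srt": f"{video_name}.srt",
        f"{video_name}.temp.perfect-vad.whisper.jp.srt": f"{video_name}.jp.srt",
    }
    done = []
    for old_name, new_name in renames.items():
        source = folder / old_name
        target = folder / new_name
        # Never overwrite an existing subtitle file.
        if source.exists() and not target.exists():
            source.rename(target)
            done.append(new_name)
    return done
```

For example, `rename_final_subtitles("SubtitleCreation", "MyVideo")` would return the list of files it created (both names here are placeholders).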

  19. Back to SubtitleEdit for the last part (you might do a first pass in a text editor too).

    a. Start SubtitleEdit.

    b. Drag the “(video name).srt” over the subtitles list (the video should be loaded automatically).

    c. For each entry, you will have different choices of translation.

    Keep the one you prefer and/or adjust it.

    d. Save the file.

    e. Close SubtitleEdit.

  20. You can finally delete all the .temp files if you want :grin:.

Thanks for taking the time to make a guide about this. I didn’t even know this was possible. I’ll give it a try when I get off work later.

Wow, impressive ! :dizzy_face:

Thank you for sharing !

Thanks for sharing, that’s some work you did there!
I’m curious how long it takes you to create the subtitles for a 30min video?

When it’s going well, about 2 to 3 times the length of a video. It depends on the number of subtitles in the scene.

For JAV, the first part often has a lot of talking (ex. 20 min of the 30 min), so there are more subtitles to run through all the steps (manual and AI). For the other parts, it can be less than twice the time because they often have only 5 minutes of talking (100-150 subtitles).

For example, for the first part of “SIVR-130”, a 38-minute video, it took me:

  • About 40 minutes to make the “perfect-vad”.
  • About 15 minutes running Whisper, doing the google translation, etc.
  • About 25 minutes to run all the subtitles (~300) through ChatGPT.
  • In theory, 25 minutes to clean up the translated srt (i.e. choosing which translation to use). I say “in theory” because this is the video where most of the subtitles were desynchronized by one spot. I stopped. I’ll fix the problem and retry.

So, around 105 min for a 38 min video.
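
For anyone who wants to redo the math with their own timings, here it is as a tiny Python snippet. The numbers are the ones from my list above; the variable names are just for illustration.

```python
# Minutes spent on each stage for the 38-minute video.
steps_minutes = {
    "perfect-vad": 40,
    "whisper and google translation": 15,
    "chatgpt": 25,
    "cleanup (in theory)": 25,
}
video_minutes = 38

total_minutes = sum(steps_minutes.values())
ratio = total_minutes / video_minutes
print(f"{total_minutes} min of work, about {ratio:.1f} times the video length")
```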

I updated my tool (v1.1.1) and the guide (v1.1). It should be more ‘robust’ now.

Problem is, every time I update the guide, I need to make another subtitle to make sure it works… :grin:

Here are scenes that got subtitles made using this guide: CRVR-286, EXVR-149, SIVR-115, SIVR-130.

Thanks for the guide !

Could you please tell me how to view the subtitles in HereSphere (Oculus Quest 2)?

I don’t know about Quest 2. I’m using HereSphere on Windows.
From what I can understand, HereSphere doesn’t support subtitles yet on Quest 2.
It could be a good idea to ask if it’s in their plan to support it soon: HereSphere VR Video Player for Quest 2. It should be noted that HereSphere is a paid app though, but, IMO, it’s really worth the price.
I don’t know if other applications available on Quest 2 support subtitles.

Awesome work @Zalunda! Really appreciate your efforts documenting this for others to follow. Subtitles definitely add a lot in Heresphere. Looking forward to taking a crack at this.

Getting stuck. It says: "FunscriptToolbox.exe is not recognized as an internal or external command, operable program or batch file".

Did you modify this line in the batch file?
“C:\AwesomeTools\FunscriptToolbox” needs to be changed to the path where you unzipped FunscriptToolbox.

Yeah, I changed it, and I also tried recreating the path you used to see if it would run, and it’s the same message. I reinstalled Python 3.9.

Python is not needed to start FunscriptToolbox (i.e. it’s a .Net application). So, no need to ‘debug’ that part yet.

In a command prompt, can you try to call the tool directly?
It should be “C:\AwesomeTools\FunscriptToolbox\FunscriptToolbox.exe”.

If it doesn’t work, I’m not sure, maybe check if the “.Net framework” is installed.
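
If Python is already installed, another quick way to see what the batch file will find is to ask Python where an executable is on the path. This is just a sketch; the helper name is made up, and “C:\AwesomeTools\FunscriptToolbox” is only the example folder used in the guide.

```python
import os
import shutil


def find_tool(name, extra_folder=None):
    """Return the full path of an executable found on PATH, or None.

    If `extra_folder` is given, it is searched first, which mimics the
    "set path=<folder>;%path%" line of the batch file.
    """
    search_path = os.environ.get("PATH", "")
    if extra_folder:
        search_path = extra_folder + os.pathsep + search_path
    return shutil.which(name, path=search_path)


# Replace the folder with the place where you unzipped FunscriptToolbox.
print(find_tool("FunscriptToolbox.exe", r"C:\AwesomeTools\FunscriptToolbox"))
print(find_tool("python"))
```

If the first line prints None, the folder given on the “set path” line is not the one that contains FunscriptToolbox.exe.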

yes i can call it. the only error says that “No verb selected”

ERROR(S):
No verb selected.

audiosync.createaudiosignature, as.cas Create audio signature for videos.

audiosync.createfunscript, as.cfs Take an audio signature and funscript and try to generate a funscript
synchronized to a different videos.

audiosync.verifyfunscript, as.vfs Verify a funscript.

subtitles.video2vadsrt, sub.vid2vad Extract an dummy .srt from video, using Voice Activity Detection

subtitles.srt2vadwav, sub.srt2vadwav Create temporary VADWAV for whipser

subtitles.srt2wavchunks, sub.srt2wavs Extract .srt, using Voice Activity Detection

subtitles.wavchunks2srt, sub.wavs2srt Merge .srt file transcribed by whisper to a single srt

subtitles.vadwav2srt, sub.vadwavsrt Created .srt from the transcribed result from whisper

subtitles.gpt2srt, sub.gpt2srt Merge srts files (from whisper-translated .srt of a extracted wav from
SubtitleEdit)

subtitles.srt2gpt, sub.srt2gpt Create ChatGPT prompts for translations

help Display more information on a specific command.

version Display version information.

This is the current bat file

@echo off

REM set path=C:\AwesomeTools\FunscriptToolbox;%path%

echo --- subtitles.video2vadsrt ---
"FunscriptToolbox.exe" ^
subtitles.video2vadsrt ^
--suffix ".temp.vad" ^
"*.mp4"

REM ... Create file .temp.perfect-vad.srt with SubtitleEdit ...
REM ... Rename .temp.vad.wav to .temp.perfect-vad.wav

echo.
echo --- subtitles.srt2wavchunks ---
"FunscriptToolbox.exe" ^
subtitles.srt2wavchunks ^
"*.perfect-vad.srt"

echo.
echo --- subtitles.srt2singlewav ---
"FunscriptToolbox.exe" ^
subtitles.srt2vadwav ^
--suffix ".whisper" ^
"*.perfect-vad.srt"

REM ... Use whisper to transcribe the whisper.wav files with the following setting:
REM Model: large-v2
REM Language: Japanese (or other)
REM URL: empty
REM Upload Files: *.whisper.wav (you can add multiple files)
REM Task: Transcribe
REM VAD: none (it's important to set to None)

REM ... Rename .srt files produced by whisper to ".whisper.srt"

echo.
echo --- subtitles.wavchunks2srt ---
"FunscriptToolbox.exe" ^
subtitles.wavchunks2srt ^
--suffix ".whisper.chunks" ^
--verbose ^
"*.perfect-vad.srt"

echo.
echo --- subtitles.gpt2srt ---
"FunscriptToolbox.exe" ^
subtitles.gpt2srt ^
"*.gptresults"

echo.
echo --- subtitles.srt2gpt ---
"FunscriptToolbox.exe" ^
subtitles.srt2gpt ^
"*whisper.jp.srt"

echo.
echo --- subtitles.singlewav2srt ---
"FunscriptToolbox.exe" ^
subtitles.vadwav2srt ^
--suffix ".jp" ^
--verbose ^
"*.whisper.wav"

pause

If FunscriptToolbox can start but returns an error while running, it might be Python after all.

In a command line, if you write “python.exe”, does it start?

Python 3.9.0 (tags/v3.9.0:9cf6752, Oct  5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

If not, you might have to add the folder of python in the path in the script like this:

 set path=C:\AwesomeTools\FunscriptToolbox;[path of python bin];%path%

In my case, it was not needed, but that might be because of an option that I selected in the Python installation. Usually, it’s in C:\python39 or C:\Users\…username…\AppData\Local\Programs\Python\Python39

If it still doesn’t work, please create an issue on Issues · Zalunda/FunscriptToolbox · GitHub, and attach the log file found in “%appdata%\FunscriptToolbox\Logs” (i.e.
“C:\Users\…username…\AppData\Roaming\FunscriptToolbox\Logs”).

ok will do thanks

@Zalunda I’m trying to perform “pip install pytorch torchaudio IPython” as the FunscriptToolboxSubtitles.bat directs, but I received an error.

Exception: You tried to install “pytorch”. The package named for PyTorch is “torch”

So I installed “pip install torch torchaudio IPython” instead. That installation was successful. But I’m still getting an exception while running the .bat file:

Make sure that 'Python' is installed (https://www.python.org/downloads/).
If python is installed, make sure to run the following command in a command prompt: pip install pytorch torchaudio IPython

Script output:
Using cache found in C:\Users\r/.cache\torch\hub\snakers4_silero-vad_master
C:\Users\r\AppData\Local\Programs\Python\Python39\lib\site-packages\torchaudio\backend\utils.py:62: UserWarning: No audio backend is available.
warnings.warn("No audio backend is available.")
Traceback (most recent call last):
File "F:\S1TBBackup\FunscriptToolbox\PythonSource\funscripttoolbox-extract-vad.py", line 26, in <module>
wav = read_audio(inputwav, sampling_rate=sampling_rate)
File "C:\Users\r/.cache\torch\hub\snakers4_silero-vad_master\utils_vad.py", line 122, in read_audio
wav, sr = torchaudio.load(path)
File "C:\Users\r\AppData\Local\Programs\Python\Python39\lib\site-packages\torchaudio\backend\no_backend.py", line 16, in load
raise RuntimeError("No audio I/O backend is available.")
RuntimeError: No audio I/O backend is available.

Should I create an issue as well?

I reinstalled python 3.9

When you reinstalled, did you check the box at the bottom of the installer labeled “Add Python to PATH”? That was my issue. I also followed this video for pip install help.

I also noticed another issue:

This is the current bat file
REM set path=C:\AwesomeTools\FunscriptToolbox;%path%

Don’t you need to remove the “REM” here? It’s excluding that line of code.

Thx for a great tutorial. We are adding subtitles into the SLR app with upcoming updates.

Let me know if you have any suggestions.

I’m not sure I would say it’s a ‘great’ tutorial… but it’s a start… :wink:

Good to know about SLR app, it will allow more people to be able to use subtitles. Right now, I think only people that use HereSphere on PC can see subtitles in VR.

Nice catch. Yes, you are right, REM needs to be removed. That would explain @lebeastmode’s problem. I’ll remove it in the ‘template’ batch file as well.

And for your “No audio backend is available” problem, yes, please open an issue. It’s easier to follow in a dedicated thread. You might need another pip install (i.e. ChatGPT tells me that “torchaudio supports several different audio backends, including SoX, ffmpeg, and soundfile”). I had installed a bunch of Python modules before for other applications, so it was difficult for me to know what the ‘bare minimum’ was for silero-vad to work. Once we figure out all the needed modules, I’ll update the message in the application.
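
In the meantime, a quick way to see which modules are missing from a Python installation is a sketch like this one. “torch”, “torchaudio” and “IPython” come from the message shown by the tool; “soundfile” is only my guess at a backend module that torchaudio could use, not a confirmed requirement.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names of the modules that cannot be found in this Python installation."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# The last entry is an assumption, see the note above.
wanted = ["torch", "torchaudio", "IPython", "soundfile"]
print("Missing modules:", missing_modules(wanted))
```

Any name printed in the list would be a candidate for a “pip install”.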

Just to be clear to everyone, this is still a really ‘messy’ guide. I’m still hoping for a new version of OpenAI Whisper to be able to make it more ‘user friendly’.
