How to create subtitles for a scene (2024)

TheFauxPro · May 2, 2024, 6:22pm

Is there a way to quickly choose one source of translation so I don’t have to delete the others in each line in SubtitleEdit? For example, I would only want translations that are sourced from [claude-3-haiku]:

Original:
“[singlevad] 気持ちいい
[google] Feels good
[mergedvad] 気持ちいい [2/3, 85%]
[claude-3-haiku] Feels good.
[google] Feels good
[full] 気持ちいい
[google] Feels good”

Desired:
“Feels good.”
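If you would rather post-process the subtitle text than clean it up by hand, the selection can be scripted. Below is a minimal Python sketch. It assumes each candidate sits on its own line and starts with a `[source-id]` tag, as in the example above. `pick_source` is a hypothetical helper, not part of SubtitleEdit or FunscriptToolbox.

```python
import re
from typing import Optional

# Assumption: each candidate sits on its own line and starts with a "[source-id]" tag,
# as in the SubtitleEdit text shown above (leading indentation is allowed).
TAGGED_LINE = re.compile(r"^\s*\[([^\]]+)\]\s*(.*)$")


def pick_source(subtitle_text: str, source_id: str) -> Optional[str]:
    """Return the text of the first line tagged [source_id], without the tag.

    Returns None when no line comes from that source.
    """
    for line in subtitle_text.splitlines():
        match = TAGGED_LINE.match(line)
        if match and match.group(1) == source_id:
            return match.group(2).strip()
    return None


if __name__ == "__main__":
    example = "\n".join([
        "[singlevad] 気持ちいい",
        "[google] Feels good",
        "[mergedvad] 気持ちいい [2/3, 85%]",
        "[claude-3-haiku] Feels good.",
        "[google] Feels good",
        "[full] 気持ちいい",
        "[google] Feels good",
    ])
    print(pick_source(example, "claude-3-haiku"))  # Feels good.
```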

Zalunda · May 2, 2024, 6:50pm

This is controlled by the WIPSrt output in the Config.json file:

    ...
    {
      "$type": "WIPSrt",
      "Description": "WIPSrt: .wip.srt",
      "Enabled": true,
      "FileSuffix": ".wip.srt",
      "TranscriptionsOrder": [
        "singlevad",
        "mergedvad",
        "*"
      ],
      "TranslationsOrder": [
        "claude-3-opus",
        "claude-3-sonnet",
        "claude-3-haiku-200k",
        "claude-3-haiku",
        "chatgpt-4",
        "mistral-large",
        "local-mistral-7b",
        "deepl",
        "deepl-files",
        "google",
        "*"
      ],
      ...

You could simplify it by removing every transcriptionId and translationId that you don’t want to see (including the final “*”), and then rerunning the application:

    {
      "$type": "WIPSrt",
      "Description": "WIPSrt: .wip.srt",
      "Enabled": true,
      "FileSuffix": ".wip.srt",
      "TranscriptionsOrder": [
        "mergedvad"
      ],
      "TranslationsOrder": [
        "claude-3-haiku"
      ],

In that case, you would get only:

[mergedvad] 気持ちいい [2/3, 85%]
    [claude-3-haiku] Feels good.

Right now, it’s not possible to reduce it further when using the ‘perfect-vad/mergedvad/WIPSrt’ process (which gives the best results, IMO).
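If you switch between setups often, the same edit to the WIPSrt block can be scripted. This is a hypothetical Python sketch, not part of FunscriptToolbox. It assumes the config file is plain JSON (no comments or trailing commas) and, since the full file layout isn't shown above, it looks for every object whose `"$type"` is `"WIPSrt"` wherever it is nested. `json.dump` rewrites the formatting of the whole file, so keep a backup. Python dicts keep their key order, so `"$type"` stays in its original position, which may matter if the tool's JSON reader expects it first.

```python
import json
import sys
from typing import Any


def set_wipsrt_orders(node: Any, transcriptions: list[str], translations: list[str]) -> int:
    """Recursively find every {"$type": "WIPSrt", ...} object and overwrite its order lists.

    Returns how many WIPSrt objects were updated.
    """
    updated = 0
    if isinstance(node, dict):
        if node.get("$type") == "WIPSrt":
            node["TranscriptionsOrder"] = list(transcriptions)
            node["TranslationsOrder"] = list(translations)
            updated += 1
        for value in node.values():
            updated += set_wipsrt_orders(value, transcriptions, translations)
    elif isinstance(node, list):
        for item in node:
            updated += set_wipsrt_orders(item, transcriptions, translations)
    return updated


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python set_orders.py <config.json>")
    else:
        path = sys.argv[1]
        with open(path, encoding="utf-8-sig") as f:
            config = json.load(f)
        # Keep only the ids from the example above.
        count = set_wipsrt_orders(config, ["mergedvad"], ["claude-3-haiku"])
        with open(path, "w", encoding="utf-8") as f:
            json.dump(config, f, indent=2, ensure_ascii=False)
        print(f"Updated {count} WIPSrt output(s)")
```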

At some point, I could add an option that detects when a subtitle has a single ‘choice’ and uses it as-is (e.g. “Feels good.”), and that concatenates the strings (e.g. “Oh, I really like you.”) when there are multiple transcriptions for a single ‘perfect-vad’ timing, like this:

[mergedvad,1/2] 気持ちいい
    [claude-3-haiku] Oh,
[mergedvad,2/2] 気持ちいい
    [claude-3-haiku] I really like you.

But I wouldn’t be able to simplify it when a transcription overlaps multiple timings; otherwise, you would end up with 3 subtitles containing the same text:

(subtitle #1)
    [mergedvad] 気持ちいい [1/3, 30%]
        [claude-3-haiku] This is a really long sentence that overlaps multiple timings and that should be split by the user.
(subtitle #2)
    [mergedvad] 気持ちいい [2/3, 36%]
        [claude-3-haiku] This is a really long sentence that overlaps multiple timings and that should be split by the user.
(subtitle #3)
    [mergedvad] 気持ちいい [3/3, 44%]
        [claude-3-haiku] This is a really long sentence that overlaps multiple timings and that should be split by the user.
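The rule described above could be sketched like this, under two assumptions: the config has already been reduced to one transcription and one translation (so each candidate holds exactly one translation), and each candidate records whether its transcription overlaps several timings (the `[k/n, p%]` marker). `Candidate` and `auto_select` are hypothetical names; this is not FunscriptToolbox's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    translation: str
    # True when the source transcription overlaps several subtitle timings,
    # e.g. "[mergedvad] ... [1/3, 30%]" in the WIPSrt output.
    spans_multiple_timings: bool = False


def auto_select(candidates: List[Candidate]) -> Optional[str]:
    """Hypothetical 'single choice' simplification for one subtitle.

    - No candidate, or a transcription overlapping several timings: return None
      (keep the full WIP text so the user can split it by hand).
    - One or more parts of the same timing ([mergedvad,1/2], [mergedvad,2/2], ...):
      join their translations with spaces.
    """
    if not candidates or any(c.spans_multiple_timings for c in candidates):
        return None
    return " ".join(c.translation.strip() for c in candidates)


if __name__ == "__main__":
    print(auto_select([Candidate("Feels good.")]))
    print(auto_select([Candidate("Oh,"), Candidate("I really like you.")]))
    print(auto_select([Candidate("This is a really long sentence.", spans_multiple_timings=True)]))
```

Returning `None` for the overlap case mirrors the limitation described above: the sketch refuses to guess how to split one translation across several timings.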

shawn_keenan · May 3, 2024, 7:24am

Have you tried the Adobe Premiere Pro method described in this post (Transcribe, translate, and subtitles to most languages! (Adobe Premiere))?

Do you know how it compares to your method?

Thanks

shawn_keenan · May 3, 2024, 8:38am

Reading more on this, it looks like the Whisper model method is more effective.
Thanks for the tutorial, @Zalunda

slap · May 3, 2024, 9:33am

Gonna sound off-topic, but will this work for “regular” non-porn stuff, or help with translations? I’ve got a lot of media that isn’t in a language I can speak and has no English subtitles.

Zalunda · May 3, 2024, 1:14pm

Yes, it could be used for transcribing/translating all types of content, but the initial config that’s created when installing the application is ‘optimized’ for VR porn. The SystemPrompt/UserPrompt in the config tells the AI to play the role of a “translator specialized in adult film subtitles” with rules specific to that task. You would need to change the config to fit another domain (while keeping the rule about the format of the JSON).

yellownotepad44 · May 28, 2024, 11:33pm

@Zalunda My understanding is that the Private Poe bot is used to create more accurate subtitles that have additional context. Is that correct? In the directions, it mentions a step where we go through each generated .txt file and paste the full prompt to the Poe bot to get back a translation.

Is there a step that I’m missing or was I doing it correctly? I’m pretty lazy so I was considering skipping this step lol.

Process so far is pretty straightforward. Going off of what @TheFauxPro mentioned, I used ChatGPT to create a Python script that keeps only the specific translations I wanted. I can try posting the program if anyone’s interested (no clue how to do this, probably just posting the code?).

Original

Reduced the translations to only [mergedvad] and [claude-3-haiku]
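For anyone wanting to do the same without rerunning the tool, here is a minimal Python sketch of that kind of filter. It is not the script mentioned above (which wasn't posted). It assumes the `.wip.srt` layout shown earlier in the thread, where each candidate line starts with a `[source-id]` tag, and it keeps untagged lines (subtitle numbers, timings, blank separators) untouched. `filter_wip_lines` and the script name are hypothetical.

```python
import re
import sys

# A candidate line starts with "[source-id]" (possibly indented), e.g. "[mergedvad,1/2] ...".
TAG = re.compile(r"^\s*\[([^\]]+)\]")


def filter_wip_lines(lines, allowed_ids):
    """Yield only the lines whose leading [tag] is in allowed_ids.

    Untagged lines (subtitle numbers, timings, blank separators) are kept as-is.
    A tag like "mergedvad,1/2" is matched on the part before the comma.
    """
    for line in lines:
        match = TAG.match(line)
        if match is None or match.group(1).split(",")[0].strip() in allowed_ids:
            yield line


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python filter_wip.py <input.wip.srt> <output.srt>")
    else:
        with open(sys.argv[1], encoding="utf-8-sig") as src:
            kept = list(filter_wip_lines(src, {"mergedvad", "claude-3-haiku"}))
        with open(sys.argv[2], "w", encoding="utf-8") as dst:
            dst.writelines(kept)
```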

Zalunda · May 29, 2024, 12:48am

A “private Poe bot” is mainly used to reduce the size of the requests we must send to the chatbot. Instead of providing the full ‘instructions’ in each prompt, we set them once in the private bot’s “system instruction”, and then we only need to give the JSON when “talking” to the chatbot. It’s also useful because there is a limit on the size of a request.

Also, this is not related to the additional context; you can give more context with or without a private bot. BTW, additional context is given during the ‘refining subtitle’ step (i.e. creating the .perfect-vad.srt file). That means taking a little more time and adding things like “{Context:…}” or “{Talker:…}” to the text of the subtitles:

Once added, the additional context will then be included in the prompts to help the AI give better translations. Of course, even though it might give better translations, it’s optional.
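To illustrate what those annotations carry, here is a hypothetical Python sketch that separates the spoken text from `{Key:value}` annotations in a subtitle line. It assumes annotations don't contain nested braces and that a repeated key keeps its last value; FunscriptToolbox's own parsing may differ, and this is not its code.

```python
import re
from typing import Dict, Tuple

# Assumption: annotations look like "{Context:...}" or "{Talker:...}" with no nested braces.
ANNOTATION = re.compile(r"\{(\w+):([^}]*)\}")


def split_annotations(text: str) -> Tuple[str, Dict[str, str]]:
    """Separate the spoken text from {Key:value} annotations in a subtitle line."""
    notes = {key: value.strip() for key, value in ANNOTATION.findall(text)}
    # Remove the annotations and collapse the leftover whitespace.
    spoken = " ".join(ANNOTATION.sub(" ", text).split())
    return spoken, notes


if __name__ == "__main__":
    print(split_annotations("{Talker:Woman}{Context:She is lying on the bed} 気持ちいい"))
```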

Zalunda · May 30, 2024, 5:02pm

Added a 3rd tip to the OP:

Tip #3 - Fixing the Waveform Visualization in SubtitleEdit (for some videos)

When working with audio in SubtitleEdit, you may encounter a problem where the waveform appears noisy, making it difficult to identify the start and end of voice sections, like this:

Upon further inspection in Audacity, I noticed that the audio had a “long amplitude signal” throughout:

This low-frequency noise is what causes the poor visualization in SubtitleEdit. To fix it, we need to filter out the low-frequency sound in the audio. Note that it probably has no effect, good or bad, on Whisper transcription since, I assume, Whisper already ignores those low-frequency signals (until they add ‘Whale’ to the list of supported languages).
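To show what a high-pass filter does to that slow drift, here is a minimal first-order (RC) high-pass filter in Python. It's an illustration only: ffmpeg's `highpass` filter uses two poles by default, so it rolls off more steeply, but the idea is the same. Slow swings are removed while faster, voice-range changes pass through. The 1000 Hz cutoff matches the config value used in the steps below; the 16 kHz sample rate in the usage note is an arbitrary choice.

```python
import math
from typing import List


def highpass(samples: List[float], sample_rate: float, cutoff_hz: float) -> List[float]:
    """First-order (RC) high-pass filter: y[i] = a * (y[i-1] + x[i] - x[i-1]).

    Illustration only; ffmpeg's `highpass` is a steeper two-pole filter by default.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out: List[float] = []
    prev_in = prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out
```

For example, `highpass([1.0] * 1000, 16000, 1000)` (a constant, zero-frequency signal) decays towards 0, while an input alternating between 1 and -1 keeps most of its amplitude.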

Fixing the Issue for All Future Videos

Open the --FSTB-SubtitleGeneratorConfig.json config file.
Locate the “AudioExtractor” section.
Add -af highpass=f=1000 to “DefaultFfmpegWavParameters” to attenuate frequencies below 1000 Hz. You can also add loudnorm=I=-16:TP=-1 to normalize the audio (unrelated to the problem):
```
"AudioExtractor": {   
    "DefaultFfmpegWavParameters": "-af \"highpass=f=1000,loudnorm=I=-16:TP=-1\""   
},   
```
Save the config file.
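If you only need a filtered .wav for a single video, the same filter chain can also be run directly with ffmpeg. Below is a minimal Python sketch, assuming ffmpeg is on your PATH; the script name is hypothetical, and the tool's own extraction may pass other options (sample rate, channels) that aren't shown in the post. Passing the arguments as a list means the `\"` escaping needed inside the JSON string isn't required here.

```python
import shutil
import subprocess
import sys
from typing import List

# Same filter chain as the "DefaultFfmpegWavParameters" value above, without the JSON escaping.
AUDIO_FILTERS = "highpass=f=1000,loudnorm=I=-16:TP=-1"


def build_ffmpeg_command(video_path: str, wav_path: str) -> List[str]:
    """Build an ffmpeg command that extracts the audio track to a filtered .wav file."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-vn",                # drop the video stream
        "-af", AUDIO_FILTERS, # apply the high-pass + loudness normalization
        wav_path,
    ]


if __name__ == "__main__":
    if len(sys.argv) != 3 or shutil.which("ffmpeg") is None:
        print("usage: python make_wav.py <video.mp4> <output.wav> (requires ffmpeg on PATH)")
    else:
        subprocess.run(build_ffmpeg_command(sys.argv[1], sys.argv[2]), check=True)
```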

Applying the Fix

Open the .perfect-vad.srt file in SubtitleEdit, which should automatically load the associated video and audio from the .mp4 file.
Drag the <videoname>-full-all.wav file from the <videoname>_Backup folder over the waveform part of SubtitleEdit. It will only replace the waveform/audio; the video can still be played.

You should now see an improved waveform, like this:

This waveform is much easier to work with, as the voice sections are clearly distinguishable from the background noise.

Ricster89 · June 30, 2024, 12:26pm

Would any of you who are learning to do this be willing to take requests, like we do here for scripts? Lots of amazing JAV scenes are now getting AI scripts on SLR; it would be next level to get subtitles for our faves.

Zalunda · October 10, 2024, 2:22am

FYI, I made a new version:
FunscriptToolbox 1.3.4 release

New Features:

Added an attribute PreviousTranslationId to AIMessageHandlerJson which allows refining another translation (ex. “claude-3.5-sonnet” and then “claude-3.5-sonnet-refined” in Example.json file).
Added a new output type “Wav” that can create a transformed wav file (ex. with option “highpass=f=1000”) that’s not used for the transcription. The transformed wav file can be used in SubtitleEdit instead of the original audio.
Added a new TranscriberToolExternalSrt which can be used in 2 ways:
- Import an existing subtitles file as-is (i.e. when trying to improve an old subtitle file).
- Use an external service like CapCut to transcribe, and then import the transcription result as-is.
Removed “Whisper” in the transcriber name (“WhisperSingleVADAudio”, “WhisperFullAudio” and “WhisperMergedVADAudio” => “SingleVADAudio”, “FullAudio” and “MergedVADAudio”). All those transcribers can now also be used with an external tool.
Added a new “AutoSelectSingleChoice” property to WIPSrt. False by default.
Updated “--FSTB-SubtitleGeneratorConfigExample-1.2.json” with the new options.

Bug fixes:

Fixed bug when subtitles file contains only a number on a text line (ex. “3”).
Fixed bug when “WIPSrt\IncludeExtraTranscriptions == false”.
Fixed/Changed the timestamp format in AIRequest files (1940.1 => 32:20.104). With the old format, it was possible to have duplicate entries, and it was also harder to read when searching for that location in the video.
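A small sketch of why the old format could collide: with one decimal, start times of 1940.104 s and 1940.12 s both print as `1940.1`, while a millisecond `MM:SS.mmm` format keeps them apart. `format_timestamp` is a hypothetical helper; FunscriptToolbox's actual formatting (e.g. for videos longer than an hour) may differ, and here the minutes simply keep counting past 59.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS.mmm (minutes are not wrapped into hours)."""
    total_ms = round(seconds * 1000)
    minutes, rest_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rest_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


if __name__ == "__main__":
    for start in (1940.104, 1940.12):
        # Old one-decimal format vs. the millisecond format.
        print(f"{start:.1f}", "=>", format_timestamp(start))
```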

Note: If you already have a configuration file, you’ll need to replace these strings in your .json file:

“WhisperFullAudio” => “FullAudio”
“WhisperMergedVADAudio” => “MergedVADAudio”
“WhisperSingleVADAudio” => “SingleVADAudio”

If not, I suggest starting with the new --FSTB-SubtitleGeneratorConfigExample-1.2.json that is created when running --FSTB-Installation.bat.
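If you'd rather not edit the file by hand, the three renames can be applied with a short Python script. This is a hypothetical sketch: it does a plain text replacement (so comments and formatting are preserved), saves a `.bak` copy first, and assumes the file is UTF-8.

```python
import sys

# Old transcriber names (before 1.3.4) and their new names, from the release notes above.
RENAMES = {
    "WhisperFullAudio": "FullAudio",
    "WhisperMergedVADAudio": "MergedVADAudio",
    "WhisperSingleVADAudio": "SingleVADAudio",
}


def rename_transcribers(text: str) -> str:
    """Replace every old transcriber name with its new name."""
    for old, new in RENAMES.items():
        text = text.replace(old, new)
    return text


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python rename_transcribers.py <config.json>")
    else:
        path = sys.argv[1]
        with open(path, encoding="utf-8") as f:
            original = f.read()
        # Keep a copy of the old file before overwriting it.
        with open(path + ".bak", "w", encoding="utf-8") as f:
            f.write(original)
        with open(path, "w", encoding="utf-8") as f:
            f.write(rename_transcribers(original))
```

Running it twice is harmless, since the new names no longer contain the old ones.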

Stvmacd · October 16, 2024, 10:10pm

Thanks so much for putting this together. It’s awesome. Apologies if I missed this somewhere or if it’s overly obvious, but what’s the best way to manually add subtitles that it missed? The best fix I’ve found is playing the audio into my internal mic with Google Translate, but it’s not great at picking up whispers.

Zalunda · October 17, 2024, 2:47am

If you use SubtitleEdit and have configured Whisper as I described, you can add the location by hand and then force a Whisper transcription for that subtitle:

Stvmacd · October 17, 2024, 5:36am

Brilliant, thanks. I knew there was something but there’s so many options in here.

12341 · November 17, 2024, 10:20am

The Grok beta on Poe does a good job with Japanese translation. It doesn’t refuse requests and can correct Japanese homophones to the appropriate characters.

Zalunda · November 19, 2024, 4:04pm

I just did a test with Grok. It’s not bad, but I still prefer Sonnet-3.5+. Grok is not as good at understanding the context of the scene.

With a private bot based on Sonnet-3.5+ (as explained in the OP), I rarely get refusals. And when it does refuse, it’s a ‘soft refusal’. I just have to say something like, “Aren’t you an adult translator?”, and it will reply, “Yes, you are right. I’m an adult translator…” and translate it without complaining. To further reduce how often that happens, I also add this to each prompt: “Remember that you are an adult movie translator.”

12341 · December 3, 2024, 1:22pm

Is there any tool more accurate than Whisper? I tried V3 Turbo, but the results weren’t better than V2. Now I’ve tried improving the audio quality with ffmpeg, and the transcription results are pretty good.

bobo9339 · December 17, 2024, 10:40pm

I’m having an issue on the Transcribing ‘singlevad’ step, where it seems to endlessly be looping at a “detecting language” step. Any idea what’s going on here?

Zalunda · December 17, 2024, 10:47pm

It’s probably a section where it can’t detect any language. Usually it defaults to English in that case, but there might be cases where it loops.

You should try specifying the source language in the batch file (--FSTB-CreateSubtitles.1.6.bat):
--sourcelangage Japanese

bobo9339 · December 18, 2024, 3:11am

Thanks! That seems to fix part of the issue, but it still appears to be looping. It’s also flashing text like “50% | 1/2 | 00:00 < 00:00 | 0.58 audio” or different variations of that, while the main text “[PurfviewWhisper][0/1969]” remains the same. It doesn’t seem to be making progress because it always stays at [0/1969]. I appreciate the help!