Thank you very much for the tutorial.
I have used it with several videos to translate them into Spanish and it worked perfectly.
I don't speak Japanese, so I'm happy with the result.
I will share the subs I make in case they are useful to anyone.
Great! Happy to hear it.
If you do the full process, including refining the subtitle timings, your Spanish subtitles would be useful for others: they could use your file as a 'perfect-vad' and start from there to create subtitles in another language.
I just checked the code. The number you see (68) is the line number in the file, not the “subtitle number”. Can you show what’s around line 68? Or link the file, renamed as .txt, if you don’t mind.
nanami.txt (11.6 KB)
EDIT
I added a period to the end of the "1" and it fixed it. The actress was counting down.
Ok, damn, I thought I had fixed that. Guess I didn't yet.
Please replace "1" with "One", or something similar, for now.
If a line of text contains only a number, the parser takes it as a subtitle number and tries to read the next line as the timing.
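To illustrate the ambiguity (a made-up snippet, not taken from the actual file), a text line containing only "1" looks exactly like the index line that starts a new subtitle:

```
42
00:12:05,000 --> 00:12:06,200
1

43
00:12:07,000 --> 00:12:08,500
よし、行くよ
```

Here the "1" is dialogue (the countdown), but the parser can take it as the number of the next subtitle and then fail because the line that follows is not a timing. Adding a period ("1.") removes the ambiguity.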
Is there a way to quickly choose one translation source so I don't have to delete the others on each line in SubtitleEdit? For example, I would only want translations sourced from [claude-3-haiku]:
Original:
“[singlevad] 気持ちいい
[google] Feels good
[mergedvad] 気持ちいい [2/3, 85%]
[claude-3-haiku] Feels good.
[google] Feels good
[full] 気持ちいい
[google] Feels good”
Desired:
“Feels good.”
This is controlled by the WIPSrt output in the Config.json file:
...
{
"$type": "WIPSrt",
"Description": "WIPSrt: .wip.srt",
"Enabled": true,
"FileSuffix": ".wip.srt",
"TranscriptionsOrder": [
"singlevad",
"mergedvad",
"*"
],
"TranslationsOrder": [
"claude-3-opus",
"claude-3-sonnet",
"claude-3-haiku-200k",
"claude-3-haiku",
"chatgpt-4",
"mistral-large",
"local-mistral-7b",
"deepl",
"deepl-files",
"google",
"*"
],
...
You can simplify it by removing every transcriptionId & translationId that you don't want to see (and the final "*"), and then rerunning the application:
{
"$type": "WIPSrt",
"Description": "WIPSrt: .wip.srt",
"Enabled": true,
"FileSuffix": ".wip.srt",
"TranscriptionsOrder": [
"mergedvad"
],
"TranslationsOrder": [
"claude-3-haiku"
],
...
In that case, you would get only:
[mergedvad] 気持ちいい [2/3, 85%]
[claude-3-haiku] Feels good.
Right now, it's not possible to reduce it further (when using the 'perfect-vad/mergedvad/WIPSrt' process, which gives the best result IMO).
At some point, I could add an option that detects when there is a single 'choice' for a subtitle and uses it as-is (i.e. "Feels good."), and that concatenates strings (i.e. "Oh, I really like you.") when a subtitle has multiple transcriptions for a single 'perfect-vad' timing, like this:
[mergedvad,1/2] 気持ちいい
[claude-3-haiku] Oh,
[mergedvad,2/2] 気持ちいい
[claude-3-haiku] I really like you.
But I wouldn’t be able to simplify when a transcription overlaps multiple timings, or you would end up with 3 subtitles with the same text:
(subtitle #1)
[mergedvad] 気持ちいい [1/3, 30%]
[claude-3-haiku] This is a really long sentence that overlaps multiple timings and that should be split by the user.
(subtitle #2)
[mergedvad] 気持ちいい [2/3, 36%]
[claude-3-haiku] This is a really long sentence that overlaps multiple timings and that should be split by the user.
(subtitle #3)
[mergedvad] 気持ちいい [3/3, 44%]
[claude-3-haiku] This is a really long sentence that overlaps multiple timings and that should be split by the user.
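In code, the rule described above might look something like this rough sketch (my own illustration of the idea, not the tool's actual implementation): use a single choice as-is, concatenate the fragments of a split transcription, and leave the overlap case alone:

```python
import re

# Transcriptions whose tag ends with something like "[2/3, 85%]" overlap
# several 'perfect-vad' timings; those blocks can't be auto-simplified
# without duplicating the same text across subtitles.
OVERLAP = re.compile(r"\[\d+/\d+, \d+%\]\s*$")
# Translation lines such as "[claude-3-haiku] Oh," (the id is just an example).
TRANSLATION = re.compile(r"^\[claude-3-haiku\]\s*(.*)")

def simplify_block(lines):
    """Return a single subtitle text, or None to keep the block as-is."""
    if any(OVERLAP.search(line) for line in lines):
        return None  # overlap case: leave it for the user to split
    fragments = [m.group(1) for line in lines if (m := TRANSLATION.match(line))]
    # One fragment -> use it as-is; several -> concatenate them.
    return " ".join(fragments) if fragments else None

# simplify_block(["[mergedvad,1/2] 気持ちいい", "[claude-3-haiku] Oh,",
#                 "[mergedvad,2/2] 気持ちいい", "[claude-3-haiku] I really like you."])
# -> "Oh, I really like you."
```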
Have you tried the Adobe Premiere Pro method described in this post (Transcribe, translate, and subtitles to most languages! (Adobe Premiere))?
Do you know how it compares to your method?
Thanks
Reading more on this, it looks like the Whisper model method is more effective.
Thanks for the tutorial @Zalunda
Gonna sound off topic, but will this work for "regular" non-porn stuff, or help with translations? I've got a lot of media that isn't in a language I speak and has no English subtitles.
Yes, it could be used for transcribing/translating all types of content, but the initial config created when installing the application is 'optimized' for VR porn. The SystemPrompt/UserPrompt in the config tells the AI to play the role of a "translator specialized in adult film subtitles", with rules specific to that task. You would need to change the config to fit another domain (while keeping the rule about the format of the JSON).
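As a rough illustration (the "$type" and field names here are taken from identifiers mentioned elsewhere in this thread; treat it as a sketch, not the exact schema), the change could look like this:

```json
{
  "$type": "AIMessageHandlerJson",
  "SystemPrompt": "You are a translator specialized in documentary subtitles. The user gives you a JSON containing lines in the original language; return the same JSON with an English translation added for each line, keeping the JSON format intact."
}
```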
@Zalunda My understanding is that the Private Poe bot is used to create more accurate subtitles that have additional context, is that correct? In the directions, it mentions a step where we go through each generated .txt file and paste the full prompt to the Poe bot to get back a translation.
Is there a step that I’m missing or was I doing it correctly? I’m pretty lazy so I was considering skipping this step lol.
The process so far is pretty straightforward. Going off of what @TheFauxPro mentioned, I used ChatGPT to create a Python script that keeps only the specific translations I wanted. I can try posting the program if anyone's interested (no clue how to do this, probably just posting the code?).
Original
Reduced the translations to only [mergedvad] and [claude-3-haiku]
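A filter like that might look something like this (a minimal sketch of the idea, assuming the .wip.srt block layout shown earlier in this thread; not the exact script that was generated):

```python
import sys

PREFERRED = "[claude-3-haiku]"  # the translation source to keep

def filter_wip_srt(in_path, out_path):
    """Write a clean .srt keeping only the PREFERRED translation lines."""
    with open(in_path, encoding="utf-8-sig") as f:
        blocks = f.read().strip().split("\n\n")

    kept = []
    for block in blocks:
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip anything that isn't a well-formed subtitle block
        texts = [line[len(PREFERRED):].strip()
                 for line in lines[2:] if line.startswith(PREFERRED)]
        if texts:
            kept.append((lines[1], " ".join(texts)))

    with open(out_path, "w", encoding="utf-8") as f:
        for i, (timing, text) in enumerate(kept, start=1):
            f.write(f"{i}\n{timing}\n{text}\n\n")

if __name__ == "__main__":
    filter_wip_srt(sys.argv[1], sys.argv[2])  # e.g. video.wip.srt video.srt
```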
Creating a "private Poe bot" is mainly used to reduce the size of the requests we must send to the chatbot. Instead of providing the full 'instructions' in each prompt, we set the instructions once in the private bot's "system instruction"; then we only need to send the JSON when "talking" to the chatbot. It's also useful because there is a limit on the size of a request.
Also, this is not related to the additional context. It's possible to give more context whether you use a private bot or not. BTW, giving additional context is done in the 'refining subtitles' step (i.e. creating the .perfect-vad.srt file). That means taking a little more time and adding things like "{Context:…}" or "{Talker:…}" in the text of the subtitles:
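For example (an illustrative entry with made-up timings, not taken from a real file), a subtitle in the .perfect-vad.srt file could look like this:

```
14
00:03:12,400 --> 00:03:14,900
{Talker: Woman} {Context: She is counting down before starting}
```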
Once added, the additional context will be included in the prompts to help the AI give better translations. Even though it might improve the translations, it's optional, of course.
Added a 3rd tip to the OP:
Tip #3 - Fixing waveform visualization in SubtitleEdit (for some videos)
When working with audio in SubtitleEdit, you may encounter a problem where the waveform appears noisy, making it difficult to identify the start and end of voice sections, like this:
Upon further inspection in Audacity, I noticed that there was a large-amplitude, low-frequency signal throughout the audio:
This low-frequency noise is causing the poor visualization in SubtitleEdit. To fix it, we need to filter out the low-frequency sound in the audio. It should be noted that this probably has no effect, good or bad, on the Whisper transcription since, I assume, Whisper already ignores those low-frequency signals (until they add 'Whale' to the supported list of languages).
Fixing the Issue for All Future Videos
- Open the --FSTB-SubtitleGeneratorConfig.json config file.
- Locate the "AudioExtractor" section.
- Add -af highpass=f=1000 to the "DefaultFfmpegWavParameters" to filter out frequencies below 1000 Hz. You can also add loudnorm=I=-16:TP=-1 to normalize the audio (unrelated to the problem):
  "AudioExtractor": {
      "DefaultFfmpegWavParameters": "-af \"highpass=f=1000,loudnorm=I=-16:TP=-1\""
  },
- Save the config file.
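If you want to preview the effect before changing the config, you can run the same filters manually with ffmpeg (file names are placeholders):

```
ffmpeg -i <videoname>.mp4 -vn -af "highpass=f=1000,loudnorm=I=-16:TP=-1" preview.wav
```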
Applying the Fix
- Open the .perfect-vad.srt file in SubtitleEdit, which should automatically load the associated video and audio from the .mp4 file.
- Drag the <videoname>-full-all.wav file from the <videoname>_Backup folder over the waveform part of SubtitleEdit. It will only replace the waveform/audio; the video can still be played.
You should now see an improved waveform, like this:
This waveform is much easier to work with, as the voice sections are clearly distinguishable from the background noise.
Would any of you who are learning to do this be willing to take requests, like we do here for scripts? Lots of amazing JAV scenes are now getting AI scripts on SLR; it would be next level to get subtitles for our faves.
FYI, I made a new version:
FunscriptToolbox 1.3.4 release
New Features:
- Added an attribute PreviousTranslationId to AIMessageHandlerJson, which allows refining another translation (ex. "claude-3.5-sonnet" and then "claude-3.5-sonnet-refined" in the Example.json file).
- Added a new output type "Wav" that can create a transformed wav file (ex. with the option "highpass=f=1000") that's not used for the transcription. The transformed wav file can be used in SubtitleEdit instead of the original audio.
- Added a new TranscriberToolExternalSrt which can be used in 2 ways:
- Import an existing subtitles file as-is (i.e. when trying to improve an old subtitle file).
- Use an external service like CapCut to transcribe, and then import the transcription result as-is.
- Removed “Whisper” in the transcriber name (“WhisperSingleVADAudio”, “WhisperFullAudio” and “WhisperMergedVADAudio” => “SingleVADAudio”, “FullAudio” and “MergedVADAudio”). All those transcribers can now also be used with an external tool.
- Added a new "AutoSelectSingleChoice" property to WIPSrt. It is false by default.
- Updated “–FSTB-SubtitleGeneratorConfigExample-1.2.json” with the new options.
Bug fixes:
- Fixed bug when subtitles file contains only a number on a text line (ex. “3”).
- Fixed bug when “WIPSrt\IncludeExtraTranscriptions == false”.
- Fixed/Changed timestamp format in AIRequest files (1940.1 => 32:20.104). With the old format, it was possible to have duplicate entries and, also, it was harder to understand if you wanted to search for the location in the video.
Note: If you already have a configuration file, you'll need to replace these strings in your .json file:
- “WhisperFullAudio” => “FullAudio”
- “WhisperMergedVADAudio” => “MergedVADAudio”
- “WhisperSingleVADAudio” => “SingleVADAudio”
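For example, on Windows, a PowerShell one-liner along these lines could do the replacements (an illustration with a placeholder file name; back up your config first):

```
(Get-Content MyConfig.json) -replace 'WhisperFullAudio','FullAudio' -replace 'WhisperMergedVADAudio','MergedVADAudio' -replace 'WhisperSingleVADAudio','SingleVADAudio' | Set-Content MyConfig.json
```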
If not, I suggest starting with the new --FSTB-SubtitleGeneratorConfigExample-1.2.json that is created when running --FSTB-Installation.bat.
Thanks so much for putting this together. It's awesome. Apologies if I missed this somewhere or if it's overtly obvious, but what's the best way to manually add subtitles it missed? The best fix I've found is playing the audio into my internal mic with Google Translate, but it's not great at picking up whispers.
If you use SubtitleEdit and have configured Whisper like I described, you can add the location by hand and then force a Whisper transcription for that subtitle:
Brilliant, thanks. I knew there was something, but there are so many options in here.