This is a refresh to my previous guide on creating subtitles for scenes (mainly JAV).
This version is a MAJOR improvement over the last iteration of the tool.
Here are the new features:
- Optimized for API calling.
- Supports AI transcription that takes context into account and that can transcribe ‘small segments’ of audio well (which Whisper couldn’t do). That means the tool no longer has to deal with transcriptions that overlap multiple “.manual-input timings”. The whole process can work one-to-one with the “.manual-input timings”.
- AI Image Transcriber that does the OCR for text that appears on screen (i.e. add a {GrabOnScreenText} tag in the .manual-input file).
- AI Image/Audio Transcriber that can create a Translation Analysis based on the voice delivery and what’s happening on screen (even between VOD detection). It can also identify characters if you provide something like {VisualTraining:Hana on the left, Ena on the right} for one of the subtitles in the .manual-input. It will generate something like this for every subtitle in the file (Ongoing metadata is generated only when something changes):
{TranslationAnalysis-Audio:Gap Summary: She moved from the center of the room to the sofa, kneeling on it to clear away the clutter.
She gestures towards a spot on the sofa she is clearing. Her tone is efficient and slightly bossy, directing him to the space she decided on rather than asking him where he wants to sit. 'Kocchi' (Here/This way) serves as a command disguised as guidance.
Option 1 (Directive): "Ah, right. Over here."
Option 2 (Casual Command): "Okay, yeah. This way."}
- A coded subtitle conformer step (i.e. it’s not AI) that can merge consecutive subtitles and add line breaks to adhere to subtitle standards (max 2 lines of 50 characters, etc).
- Can analyse multi-part scenes as if they were one long video (i.e., you need to create a small ‘.vseq’ file with the filenames).
- Added a UI to identify speakers in a multiple-speaker scene.
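To give an idea of what a non-AI conformer step like the one listed above could look like, here is a minimal Python sketch. It is not the tool's actual code: the `Subtitle` class, the `fit_lines` and `conform` functions, and the 0.3 second `max_gap` threshold are all made up for illustration. Only the "max 2 lines of 50 characters" limit comes from the feature list.

```python
import textwrap
from dataclasses import dataclass

MAX_CHARS = 50  # per-line limit quoted in the feature list
MAX_LINES = 2   # line limit quoted in the feature list


@dataclass
class Subtitle:
    """Hypothetical subtitle record; times are in seconds."""
    start: float
    end: float
    text: str


def fit_lines(text, max_chars=MAX_CHARS, max_lines=MAX_LINES):
    """Return the wrapped lines, or None if the text cannot fit the limits."""
    lines = textwrap.wrap(" ".join(text.split()), width=max_chars)
    return lines if len(lines) <= max_lines else None


def conform(subs, max_gap=0.3):
    """Merge neighbours separated by at most max_gap seconds (an assumed
    threshold) when the combined text still fits, then add line breaks."""
    merged = []
    for sub in subs:
        if merged:
            prev = merged[-1]
            combined = f"{prev.text} {sub.text}"
            close_enough = sub.start - prev.end <= max_gap
            if close_enough and fit_lines(combined) is not None:
                merged[-1] = Subtitle(prev.start, sub.end, combined)
                continue
        merged.append(Subtitle(sub.start, sub.end, sub.text))

    result = []
    for s in merged:
        lines = fit_lines(s.text)
        # Text that is too long is left unwrapped rather than truncated.
        text = "\n".join(lines) if lines is not None else s.text
        result.append(Subtitle(s.start, s.end, text))
    return result
```

For example, `conform([Subtitle(0.0, 1.0, "Ah, right."), Subtitle(1.1, 2.0, "Over here.")])` would merge the two entries into a single subtitle, because the gap is short and the combined text fits on one line.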
Note: The tool works best if a ‘.mp4.FAST.mp4’ file is created in the same folder using the script below.
Script to create optimized video for multi-speaker tool:
echo off
REM This script re-encodes a video
REM It takes only the left half of the video (for side-by-side sources) and scales it down.
REM and sets every frame as a keyframe (-g 1) for "looping" playback.
FOR %%A IN (%*) DO (
    "ffmpeg.exe" ^
        -y ^
        -i "%%~dpA\%%~nxA" ^
        -filter_complex "crop=iw/2:ih:0:0,scale=-1:512" ^
        -g 1 ^
        -c:a copy ^
        "%%~dpA\%%~nA.mp4.FAST.mp4"
)
pause
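If you are not on Windows, the same ffmpeg call can be sketched in Python. This is only a rough equivalent of the batch script above, not something shipped with the tool: the `build_command` helper is a made-up name, and it assumes `ffmpeg` is available on your PATH. The ffmpeg arguments themselves are copied from the batch script.

```python
import subprocess
import sys
from pathlib import Path


def build_command(source: Path) -> list[str]:
    """Mirror the batch script's ffmpeg arguments for one input file."""
    # Same naming as the batch script: <name without extension>.mp4.FAST.mp4
    target = source.with_name(source.stem + ".mp4.FAST.mp4")
    return [
        "ffmpeg",  # assumed to be on PATH
        "-y",                      # overwrite the output if it exists
        "-i", str(source),
        # Keep the left half of the frame, then scale to 512 pixels high.
        "-filter_complex", "crop=iw/2:ih:0:0,scale=-1:512",
        "-g", "1",                 # every frame is a keyframe
        "-c:a", "copy",            # copy the audio stream unchanged
        str(target),
    ]


if __name__ == "__main__":
    # Process every file passed on the command line, like the FOR loop.
    for arg in sys.argv[1:]:
        subprocess.run(build_command(Path(arg)), check=True)
```

Saved under any name (for example `make_fast.py`, a hypothetical filename), it would be run as `python make_fast.py scene.mp4` and should write `scene.mp4.FAST.mp4` next to the source file.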
Warning:
- I’m not responsible for what you do with the tool…
The new process is explained in the tool’s Wiki HERE.
The application binary can be found here: github releases.


