Hey @Zalunda - first - thanks for the guide!
Hey, potentially stupid question incoming:
Do you need an API key for both Gemini and Poe? Reason I'm asking is b/c I'm running into a "429 - too many requests" error on step 3 (part 7, when I run FSTB-CreateSubtitles.bat after making edits), which I think is coming from Gemini since I have a free account. On Poe, I have points. Are they not interchangeable in that regard?
Again, very sorry if this is a dumb question. Very new to AI stuff and just wanted to translate some scenes I like.
Created a new version 2.0.11.
Main changes are:
- SubtitleVerb: Merged TranscriberAudioSingleVADAI and TranscriberImageAI into a "multimodal" TranscriberAI that can receive audio and/or images. This allows us to add the image to singlevad-ai-refined's requests and get a much better TranslationAnalysis that contains targeted image context.
- SubtitleVerb: Added the possibility to "inject" context-only binaries (mainly images) in the request. The idea is to send images taken from the "gap" in the VOD so that the AI has an idea of what happened since the last time someone spoke. For example:
[Gap Summary: She shifted her hand to playfully touch his chest.] Trying to rouse herself, she initiates a lazy conversation about the day's plans. Her hand stroking his chest (visible in POV) adds a tactile layer to the casual question.
- SubtitleVerb: Fixed a bug where my "json fixing" code could add a "," inside a value, like a translated text (ex. "Say \"Please\"" was replaced by "Say \",Please\"") because I didn't ignore quotes inside a string value.
- SubtitleVerb: Changed the response handling when the StartTime received from the AI doesn't match any node's StartTime. Now, I match the closest node if it's less than 50 ms away. If not, the same exception as before is raised.
- SubtitleVerb: Adjustment to use gemini-3-pro-preview instead of 2.5.
- SubtitleVerb: Fixed so that an empty JSON array (i.e. []) is considered valid JSON. It was not before.
- SubtitleVerb: Re-added an explicit rule to break subtitles on characters like 。！？ in the "full-ai" step.
- SubtitleVerb: Allowed more formats when parsing TimeSpan (i.e., accept 1.200, 1.20, or 1.2). Before, it only accepted 3 digits after the dot.
- SubtitleVerb: Default configuration for ManualHQWorkflow is using Gemini 3.0 (preview) for all the steps, except full-ai, which still uses Gemini 2.5 because 3.0 is really bad with the timings.
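The closest-node matching described in this changelog (match the AI's StartTime to the closest node only if it's less than 50 ms away, otherwise raise) could look something like this. A minimal sketch: the actual implementation is inside the application and not shown here.

```python
def match_start_time(ai_start_ms, node_start_times, tolerance_ms=50):
    """Match a StartTime returned by the AI to the closest node StartTime.

    Accepts the closest node only if it is less than tolerance_ms away;
    otherwise raises, like the application did before this change.
    """
    closest = min(node_start_times, key=lambda t: abs(t - ai_start_ms))
    if abs(closest - ai_start_ms) >= tolerance_ms:
        raise ValueError(
            f"AI StartTime {ai_start_ms} ms does not match any node StartTime"
        )
    return closest
```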
Overall, the main change (for manual HQ, at least) is that all steps now use Gemini. The problem with Poe is that it doesn't support audio data. That means we always need Google Gemini for that step anyway.
I still have trouble knowing how much it costs to do an "average" scene (~500 subtitles), or what the quota is when only using a free account, because I created a paid account in August and it gave me $400 of promotional credit to be used until November. I'm still in that state, but I'll know soon.
It's not a dumb question and, yes, "429 too many requests" comes from Gemini because you reached your limit for the day (rate limit documentation => "Requests per day (RPD) quotas reset at midnight Pacific time").
From what I understand on the page:
- Gemini 2.5 Pro: 50 requests per day.
- Gemini 3.0 Pro preview: not listed, but might be in the same "group" as 2.5 Pro.
If true, it's not too bad. For a 500-subtitle scene with a 90-minute duration, it would be:
- full-ai transcription: 90 minutes / 5 minutes per request => 18 requests.
- singlevad-ai: 500 / ~100 subtitles per request => 6 requests.
- singlevad-ai-refined: 500 / ~30 subtitles per request => 17 requests (I'm using 15 to get better results => 34 requests)
- translated-texts_maverick: 500 / ~100 subtitles per request => 6 requests.
- finalized_maverick: 500 / ~200 subtitles per request => 3 requests.
Add a few for empty response / partial response => 50 to 60. It would take 2 days of free requests to do.
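The request arithmetic above can be sketched like this (the batch sizes are the approximate per-request capacities from the list; the real numbers vary with subtitle length):

```python
import math

def free_tier_requests(subtitles=500, duration_min=90, refined_batch=30):
    """Rough request count per step for a ~500-subtitle, 90-minute scene."""
    steps = {
        "full-ai": math.ceil(duration_min / 5),              # ~5 min of audio per request
        "singlevad-ai": math.ceil(subtitles / 100),
        "singlevad-ai-refined": math.ceil(subtitles / refined_batch),
        "translated-texts_maverick": math.ceil(subtitles / 100),
        "finalized_maverick": math.ceil(subtitles / 200),
    }
    return steps, sum(steps.values())

steps, total = free_tier_requests()
```

With a few retries for empty or partial responses on top, the total lands in the 50-60 range mentioned above, i.e. about 2 days of a 50-requests-per-day free quota.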
Note: Of that list, only translated-texts_maverick and finalized_maverick can be redirected to Poe.
Thanks for the clarification! Do I need to change anything to redirect translated-texts_maverick and finalized_maverick to Poe? Or is that handled automatically?
I'm certainly learning a lot just trying to translate these vids haha
It's not handled automatically, but I might do something in the future for stuff like that. I'm thinking of adding an "AIEnginePriority" object where you could list a few AIEngines and it would use them in order (ex. "AIEngine using free Gemini API key", "AIEngine using Poe API key", "AIEngine using paid Gemini API key").
Right now, you would need to add this in "FSTB-SubtitleGenerator.override.config" at the root of the structure, in the SharedObjects array (anywhere in the array):
{
  "$id": "AIEngineGeminiPro3.0ViaPoe",
  "$type": "AIEngineAPI",
  "BaseAddress": "https://api.poe.com/v1",
  "Model": "gemini-3-pro",
  "ValidateModelNameInResponse": false,
  "APIKeyName": "APIKeyPoe",
  "TimeOut": "00:15:00",
  "RequestBodyExtension": {
    "max_tokens": 65536,
    "extra_body": {
      "google": {
        "thinking_config": {
          "include_thoughts": true
        }
      }
    }
  },
  "UseStreaming": true,
  "PauseBeforeSendingRequest": false,
  "PauseBeforeSavingResponse": false
},
And then, in the same file, modify the engine attached to translated-texts_maverick and finalized_maverick:
{
  "Id": "translated-texts_maverick",
  "Engine": {
    "$ref": "AIEngineGeminiPro3.0ViaPoe"
  }
},
{
  "Id": "finalized_maverick",
  "Engine": {
    "$ref": "AIEngineGeminiPro3.0ViaPoe"
  }
},
Note: I haven't done a lot of tests with that config but, in theory, it should be identical to calling Gemini directly.
Got it, Iâll give it a shot. Thanks so much for the help!
Ok, I did more testing. I don't know when, but Poe improved their application and API. They now list "Input (audio)" for Gemini-3 and it works (for Gemini 2.5 too):
I see that video is not a lot more expensive than audio (i.e. only twice as many points per second). I might try doing tests with that in the future.
And they now support some settings in the API:
Before, I had to find ways to add the settings in the prompt (ex. --reasoning_effort medium for GPT-5).
When using Gemini 3.0 via Poe, I was getting bad responses (hallucinations, etc.) when calling the API, and I wasn't seeing the "thinking" of the AI, but it got better when I added a "thinking budget" to the requests. It seems that, by default, thinking is disabled or extremely low.
{
  "$id": "AIEngineGeminiPro3.0ViaPoe",
  "$type": "AIEngineAPI",
  "BaseAddress": "https://api.poe.com/v1",
  "Model": "gemini-3-pro",
  "ValidateModelNameInResponse": false,
  "APIKeyName": "APIKeyPoe",
  "TimeOut": "00:15:00",
  "RequestBodyExtension": {
    "max_tokens": 65536,
    "extra_body": {
      "thinking_budget": 8192
    }
  },
  "UseStreaming": true,
  "PauseBeforeSendingRequest": false,
  "PauseBeforeSavingResponse": false
},
Anyway, I'll work on a new version with those settings, and with my "priority" AIEngine idea.
Created a new version 2.0.12.
Changes are:
- SubtitleVerb: Fixed the format for audio data when sending requests to Poe (i.e. it's not the same format as the OpenAI protocol).
- SubtitleVerb: Added an AIEngineCollection class that allows setting a list of AIEngines that get used in order (until the first one has used all its quotas). Note: I didn't do a lot of "real world" tests on this.
- SubtitleVerb: Improved prompts used in HQManualWorkflow.
- SubtitleVerb: Created AIEngineCollections for GPT5, Gemini2.5 and Gemini3.0 in the config.
With this version, you'll be able to use Gemini on Poe. The only difference is that I had to "hardcode" (well, in the config) the number of thinking tokens. When calling the Google API, you can set it to -1 and it decides how many tokens to use automatically. I haven't found a way to do that with Poe. If I set the thinking_budget to -1, I get a Bad Request response.
Created a new version 2.0.13:
Changes are:
- Created vendor-specific AIEngineAPI (Google, OpenAI, Poe). This allows the use of specific vendor features like mediaResolution = "MEDIA_RESOLUTION_LOW" on Google (saves a lot of tokens). For Google, it also gives more detailed information on the number of tokens used by modality (text, audio, or images).
- Created a new TranslatiorSubtitleConformer "in code", instead of using AI. It's faster, costs nothing, and, I think, it does a better job than the AI.
- Improved the CostReport (".cost.txt") a lot (tokens used by modality and "prompt section", estimated $ based on the engine, etc.).
- Moved all the AIEngine definitions to the .override file to make it easier for the user to adjust, if needed.
- Removed the "arbitrer" steps as an option (translated-texts_naturalist, arbitrer-choice, arbitrer-final-choice). That path was far inferior to the other process, and I was tired of updating it.
- A lot of small changes to try to find a good balance between subtitle quality and the cost of creating them.
- Changed the config to use Gemini 2.5 by default because I found out that a "free account" cannot use 3.0 via the API. If you have Poe or a paid Google account, it's possible to change it back to 3.0 in the override file. Gemini 3.0 seems to be noticeably better than 2.5 only when analyzing images (i.e., singlevad-ai-refined).
- Minor: Added the file index when converting time from global time to the time specific to a file (i.e. 1:23:14.222 => "2, 0:14:16.252", where the 2 at the start means the second file).
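The global-to-file conversion in the last bullet can be sketched like this. The file durations below are hypothetical, chosen so the 1:23:14.222 => "2, 0:14:16.252" example works out:

```python
from datetime import timedelta

def to_file_time(global_time, file_durations):
    """Convert a global-timeline position to (file_index, local_time).

    file_index is 1-based, matching the "2, 0:14:16.252" notation.
    """
    offset = timedelta(0)
    for index, duration in enumerate(file_durations, start=1):
        if global_time < offset + duration:
            return index, global_time - offset
        offset += duration
    # Past the end: report relative to the start of the last file
    return len(file_durations), global_time - (offset - file_durations[-1])

# Hypothetical durations: first file is 1:08:57.970 long
durations = [
    timedelta(hours=1, minutes=8, seconds=57, milliseconds=970),
    timedelta(hours=1),
]
index, local = to_file_time(
    timedelta(hours=1, minutes=23, seconds=14, milliseconds=222), durations
)
# index == 2, local == 0:14:16.252
```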
Using a paid account, it still costs about 5-8 US$ for a 90-minute, ~500-subtitle scene. It can get more expensive when you are dealing with 1500+ subtitle scenes.
That being said, if you are patient and don't mind doing the subtitles over a few days, the 50 requests per day on the free Google account are all you need... If you plan on doing a scene in the future, then, whenever you have "free requests" available, I suggest creating the .vseq file and running the application so that the "full-ai" step can be completed already (i.e. ~18 requests for a 90-minute scene). For smaller scenes, 50 requests should be enough to complete the rest of the process when you are ready to take the time to correct the timings of the subtitles, add some context, etc.
I'm now able to do a "single actress" 90-minute scene in less than 90 minutes of my time, if the API is not too finicky (i.e., returns badly formatted JSON, or is too prudish and returns a "prohibited" error). I sometimes have to switch to Grok-4 to complete some sections of a scene.
Note: My free "requests per day" didn't seem to reset today. I will stop speculating on how the Google free tier works, since it's not well documented, by design it seems.
First of all, fantastic tool, incredible progress on the ease of generating subtitles.
I used the automatic flow and it generated pretty good results, I think, without needing much editing afterwards.
It took me a couple of hours, but most of that was understanding the process; I have no doubt it would be much faster now.
A few small details regarding the documentation that could maybe be improved:
- when I first used it, it asked for an OpenAI key to be able to do the visual analysis, but this isn't mentioned in the doc
- when getting a cost error, it's not immediately clear which backend is causing the issue; it generates a cost file, but it was not clear to me how to tell where the issue came from (it pointed mostly at Poe when the issue was with Google)
- how to update the configuration to select the backend could be documented a bit more (or maybe I missed something?); it took me a while to understand what to change. I set up my Google account so I have plenty of free credits there, and I wanted to use that instead of Poe, but it took a bit to realize that I needed to update the Workers section to do that.
Apologies if this is obvious or if I just missed this. Where do I put the vseq file for multi-part scenes? Best regards!
Created a new version 2.0.14:
Changes are:
- Implemented BatchMode for Google API (which costs half the price).
- Changed the full-ai engine to AIEngineGeminiPro2.5OnGoogle because it's the only combination that returns valid timings. Gemini-3.0 and Poe seem to "shorten" the audio before giving it to the AI.
- Added code to fix "unescaped quotes" inside a string in an AI JSON response (ex. "VoiceText": "Say "Please" now" => "VoiceText": "Say \"Please\" now").
- Added a TranscriptionToIgnorePatterns array to the full-ai step. If the text returned by the AI for a node fits exactly one of the patterns, it will not be included in the transcription (ex. aah, hmm).
- Added an Enabled property to AIEngine. Used only when the engine is part of a collection. For the application, when an engine needs an APIKey and it's not set in ".private.config", the engine is also considered "disabled".
- Added SupportAudio and SupportImage properties on AIEngine. If an engine does not support audio/images, that information will be stripped from the request.
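The "unescaped quote" fix from this version could be sketched like this. A sketch under assumptions: the real code is C# inside the application, and this version assumes each key/value pair sits alone on its own line, which is how the AI responses are usually formatted:

```python
import re

def fix_unescaped_quotes(line):
    """Escape bare quotes inside a single-line JSON string value.

    Example: '"VoiceText": "Say "Please" now"' becomes
             '"VoiceText": "Say \\"Please\\" now"'.
    """
    m = re.match(r'^(\s*"[^"]+"\s*:\s*")(.*)("\s*[,}\]]?\s*)$', line)
    if not m:
        return line
    prefix, value, suffix = m.groups()
    # Escape any quote in the value that is not already escaped
    fixed = re.sub(r'(?<!\\)"', r'\\"', value)
    return prefix + fixed + suffix
```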
Yes, the doc might need some updating now that the process is more stable. It's been a while since I updated it. The basic process is the same, but there have been "tweaks" everywhere.
Note:
I really encourage you to try the "manual" path. The only difference is that you will have to fix the timings by hand. With the Gemini full-ai transcription, it's pretty reliable at not missing any talking (it might include too much, though). That means that, when watching the video to fix the timings, you can skip ahead when there is a long stretch where no one is talking (ex. during sex). For some videos, I end up needing only ~50% of the duration to fix the timings (i.e. 30 minutes for a 60-minute scene).
The manual path has a better TranslationAnalysis in my view, since it contains information about "how it was said" and only the visual cues that are important for this specific translation.
Manual:
{TranslationAnalysis-Audio:Gap Summary: She moved from the center of the room to the sofa, kneeling on it to clear away the clutter.
She gestures towards a spot on the sofa she is clearing. Her tone is efficient and slightly bossy, directing him to the space she decided on rather than asking him where he wants to sit. 'Kocchi' (Here/This way) serves as a command disguised as guidance.
Option 1 (Directive): "Ah, right. Over here."
Option 2 (Casual Command): "Okay, yeah. This way."}
Automatic, with the "old" visual-analysis, gives a lot of visual information, but most of it is redundant and unnecessary. And it doesn't include any "voice delivery" analysis.
{OngoingEnvironment:School nurse's office: desk with monitor and supplies, rolling chair, metal cabinet with books, examination bed, folding privacy screens, closed pink curtains.}
{OngoingAppearance:Nurse: white lab coat over a light-blue blouse (partly unbuttoned), black skirt, black thigh-high stockings with visible garter straps, dark heels, long wavy hair. Man (POV): seated in dark clothing (top and pants).}
{ParticipantsPoses:Man: seated, legs apart, hands resting on his thighs, facing the woman. Nurse: standing very close in front of him, torso slightly leaned forward, facing him.}
{TranslationAnalysis:Her apology reads as a casual, friendly greeting; standing very close in front of the seated listener makes it feel personal and attentive rather than formal.}
Note #2:
I might take some time to update the automatic path to use steps similar to the manual one, maybe with "padding" on the audio since I can't trust the timings 100%.
It might be a documentation problem, but the cost file should not be used to fix/understand problems. It's only there to give you a summary of all the calls/costs that were made so far.
When an error occurs, you should look at the <videoname>.TODO_<stepname>_<number>.txt file or a TODO_E file in the folder.
TODO_E will contain an error that cannot be fixed by the user. If you rerun the application, the file will be moved to the backup folder, and the request will be sent again to the AI.
TODO will contain an error that occurred, usually a JSON parsing error. It will also contain the response received. It expects the user to fix the error, save the file, and rerun. If the JSON is fixed, the application will parse the nodes in the fixed file and continue with the next nodes. If you don't know how to fix the error, you can also simply delete the file and rerun the request. Fixing it by hand mainly saves you the cost of sending the request a second time.
Note: Most of the JSON errors are auto-fixed by the application, but there are still a few types of errors that I don't auto-fix.
Yes, this is a part that needs documenting. I've been making a lot of changes in the config file and overrides on each version, so it might be hard to follow.
To change the engine used (ex. Gemini-Pro instead of Poe), you can change the worker:
"Workers": [
  {
    "TranscriptionId": "full-ai",
    "Engine": {
      "$ref": "AIEngineGeminiPro2.5OnGoogle"
    }
  },
But, now that I've defined collections for most of the models, you can also change the collection in the SharedObjects section:
"SharedObjects": [
  {
    "$type": "AIEngineCollection",
    "$id": "AIEngineGeminiPro2.5",
    "Engines": [
      // [Option] Change the order if you want to use your Poe points first.
      { "$ref": "AIEngineGeminiPro2.5OnGoogle" },
      { "$ref": "AIEngineGeminiPro2.5OnPoe" }
    ],
    "SkipOnQuotasUsed": true,
    "SkipOnServiceUnavailable": true
  },
  ...
And, when it's a collection, there is yet another way: only the engines that have a key will be considered. If you don't have an APIKey defined for Gemini, for example, Poe will be used even if it's the second item in the list.
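As a sketch of that selection logic (the $id and APIKeyName fields match the config shown earlier; "APIKeyGemini" and the way skips are tracked are my assumptions, not the application's actual internals):

```python
def pick_engine(engines, configured_keys, quota_used=(), unavailable=()):
    """Pick the first usable engine of an AIEngineCollection, in order.

    An engine whose APIKeyName is not configured is treated as disabled;
    quota_used/unavailable model SkipOnQuotasUsed and
    SkipOnServiceUnavailable.
    """
    for engine in engines:
        if engine["APIKeyName"] not in configured_keys:
            continue  # no API key in .private.config => disabled
        if engine["$id"] in quota_used or engine["$id"] in unavailable:
            continue
        return engine["$id"]
    return None  # every engine disabled or exhausted

engines = [
    {"$id": "AIEngineGeminiPro2.5OnGoogle", "APIKeyName": "APIKeyGemini"},
    {"$id": "AIEngineGeminiPro2.5OnPoe", "APIKeyName": "APIKeyPoe"},
]
```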
Ok, good to know, I'll try that next time then.
The scene I did was pretty simple, so I only had a few timings to fix at the end in the generated file.
I thought I was doing everything to the letter, but I get an error.
Do you happen to know an easy fix? I couldn't find anything in the troubleshooter. Or I didn't understand it :P. That could also be a major problem
Do you get the same error if you try to rerun the application?
This looks like a local error when I call "ffmpeg" to try to generate a screenshot. ffmpeg might return without an error, but it didn't generate the file.
Can you try running this in a command prompt in the folder you are working in? (with the right video name)
"%APPDATA%\FunscriptToolbox\ffmpeg\ffmpeg.exe" -ss 2:00 -i "KAVR-440-4-8K.mp4" -frames:v 1 test.jpg
And see if it generates a test.jpg file.
Try in a "normal" command prompt. I start a prompt with "cmd". Not a PowerShell prompt. It should do the same thing.
Or run this in PowerShell:
& "$env:APPDATA\FunscriptToolbox\ffmpeg\ffmpeg.exe" -ss 2:00 -i "KAVR-440-4-8K.mp4" -frames:v 1 test.jpg