Yes, I did. Great suggestion. I like that Poe allows NSFW (even if the underlying models from other companies might refuse to answer).
All 3 Claude-3 models (Haiku, Sonnet and Opus) are great. They give good translations, but Opus is a lot more cooperative than Haiku & Sonnet.
Gemini-pro and even GPT-4 were not as good.
This was an option for my old process. Now I simply use Whisper to do the draft; it does a better job than Silero VAD.
Mistral Next was too concise: it didn’t translate the ‘whole JSON’ that I gave it to translate. Mistral Large translated everything. But I don’t know about other use cases though.
My new process lets me ‘fake’ having vision by telling the AI who’s talking (which could be done in the future with a good Speaker Diarization AI) or by giving a general description of what will happen on the screen in the next portion of the video, and then seeing if it improves the translation.
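
To make the ‘fake vision’ idea a bit more concrete, here is a rough Python sketch of how such a prompt could be put together. It is only an illustration, not my exact script: the function name `build_translation_prompt`, the `speaker`/`text` field names and the wording of the instructions are all made up for this example.

```python
import json


def build_translation_prompt(lines, scene_description=None, target_language="English"):
    """Build a text prompt that gives a text-only model some visual context.

    `lines` is a list of dicts with a "text" key and an optional "speaker" key.
    All field names here are hypothetical, chosen only for this sketch.
    """
    parts = [
        f"Translate the 'text' field of every subtitle entry below into {target_language}.",
        "Return the whole JSON, with every entry, in the same order.",
    ]
    # Optional description of what happens on screen in this portion of the video.
    if scene_description:
        parts.append(f"What happens on screen in this part of the video: {scene_description}")

    # Attach the speaker to each line so the model knows who is talking.
    payload = [
        {"id": i, "speaker": line.get("speaker", "unknown"), "text": line["text"]}
        for i, line in enumerate(lines, start=1)
    ]
    parts.append(json.dumps(payload, ensure_ascii=False, indent=2))
    return "\n\n".join(parts)


if __name__ == "__main__":
    draft = [
        {"speaker": "Woman", "text": "Where were you?"},
        {"text": "At work."},  # speaker not known for this line
    ]
    print(build_translation_prompt(draft, scene_description="A couple argues in a kitchen."))
```

The resulting string can be pasted into whichever model is being tested. Running the same chunk with and without `scene_description` is one way to see if the extra context actually improves the translation.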