OSRChat makes your OSR device smarter. It can connect to any LLM as long as it follows the OpenAI API specification. You can also connect to Ollama for local deployment, ensuring 100% privacy. Additionally, you can customize prompts to build your own conversational scenarios, similar to SillyTavern.
If you plan to use Ollama, it is recommended to use the huihui_ai/glm-4.7-flash-abliterated:q4_K_S model (requires 22 GB of VRAM). Also, make sure to load the model once on the Ollama side before making a request, to avoid the first conversation getting stuck or becoming unresponsive.
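One way to preload the model is to send Ollama a generate request with no prompt, which loads the model into memory without producing text. The sketch below does this over Ollama's public `/api/generate` endpoint; the host, port, and `keep_alive` value are common defaults and assumptions you should adjust to your setup.

```python
# Preload an Ollama model into VRAM so the first chat does not stall.
# An empty /api/generate request (no "prompt") just loads the model;
# "keep_alive" controls how long it stays resident.
import json
import urllib.request

def build_preload_body(model: str, keep_alive: str = "30m") -> bytes:
    """Build the JSON body for Ollama's preload call."""
    return json.dumps({"model": model, "keep_alive": keep_alive}).encode()

def preload(model: str, host: str = "http://localhost:11434") -> int:
    """Ask Ollama to load the model; returns the HTTP status code."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=build_preload_body(model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Example (hits the network, so run it only with Ollama up):
# preload("huihui_ai/glm-4.7-flash-abliterated:q4_K_S")
```

Running this once after starting Ollama means the first OSRChat conversation does not have to wait for the model to load.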
The current prompts are not perfect, so you may need to test and adjust them yourself.
If you want to try the text-to-image feature, make sure the correct model exists on the ComfyUI server and that the model name is entered correctly, for example SDXL\illustrij_v18.safetensors. Also ensure the port is configured properly. It is recommended to use the latest version of ComfyUI and deploy it from source rather than using ComfyUI Portable. In addition, before submitting the workflow, it is best to preload the required models into VRAM or system memory to avoid request timeouts caused by slow model loading.
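To catch a misspelled model name before submitting a workflow, you can ask the ComfyUI server which checkpoints it knows about via its `/object_info/CheckpointLoaderSimple` endpoint. This is a sketch against the public ComfyUI HTTP API; the host, port, and exact response layout are assumptions, so verify them against your server.

```python
# Check that a checkpoint name exists on the ComfyUI server before
# queueing a text-to-image workflow (avoids silent failures from typos).
import json
import urllib.request

def known_checkpoints(object_info: dict) -> list:
    """Extract the checkpoint file list from the /object_info response.
    The first element of the "ckpt_name" input spec is the list of files."""
    node = object_info["CheckpointLoaderSimple"]
    return node["input"]["required"]["ckpt_name"][0]

def checkpoint_exists(name: str, host: str = "http://127.0.0.1:8188") -> bool:
    """Query the server and test membership (network call)."""
    url = f"{host}/object_info/CheckpointLoaderSimple"
    with urllib.request.urlopen(url) as resp:
        return name in known_checkpoints(json.load(resp))

# Example (requires a running ComfyUI server):
# checkpoint_exists("SDXL\\illustrij_v18.safetensors")
```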
License
This project is licensed under the MIT License.
Update
v1.4.3
Added an image aspect ratio option for text-to-image generation.
Fixed a bug where the retry button could not be used in certain situations.
Improved the input experience of the message input box.
Optimized the Examples in character cards.
Optimized the system prompt.
v1.4.0 - v1.4.2
Added a text-to-image generation feature, enabling image creation by connecting to a ComfyUI server.
While the model is generating, its in-progress output is now shown as small gray scrolling text until generation completes.
Fixed an issue where chat bubbles were being squashed.
v1.3
Added a retry button to the bottom-right corner of the chat bubble.
Fixed an issue where the backend could not properly handle interruption when the frontend stopped a streaming response.
Added a system role prompt called “S2O” to convert SillyTavern character cards into a format compatible with OSRChat.
v1.2
Optimized the prompt structure, enabling functionality similar to character cards in SillyTavern. Users can browse, add, edit, delete, export, and import cards on the settings page.
Improved controller support.
Refactored and optimized the code structure.
v1.1
Added a context editing feature that allows you to click the edit icon in the top-right corner of a message bubble to modify that context during a conversation.
Updated the import/export context icons to make them less confusing.
OSRChat is essentially a server, and you can call its API from other projects. However, please do not expose it to the public internet—only use it within a secure and controllable local network, as I did not take any network security issues into account when developing it.
I don’t trust any cloud service providers, even though they repeatedly promise privacy and security. By using OSRChat together with Ollama or vLLM, all data can be processed entirely within your local network, ensuring 100% privacy. OSRChat also does not record any chat history; if you really need to keep it, I’ve provided an export button that lets you save the current conversation.
You may have noticed a row of buttons at the top of the Chat page. The upward arrow button allows you to export the full context of the current conversation, along with the system prompt, as a JSON file. You can modify the exported JSON file (for example, by editing the LLM’s responses), and then use the downward arrow button to import the modified file. This will allow you to achieve the functionality you described.
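For a scripted edit of the exported file, something like the following works, assuming the export uses OpenAI-style message objects with "role" and "content" fields (check your own export to confirm the exact layout before relying on it).

```python
# Hypothetical sketch: rewrite the last assistant reply in an exported
# OSRChat context, then save it for re-import via the downward arrow button.
import json

def edit_last_assistant(messages: list, new_text: str) -> list:
    """Return a copy of the context with the last assistant reply replaced."""
    out = [dict(m) for m in messages]
    for m in reversed(out):
        if m.get("role") == "assistant":
            m["content"] = new_text
            break
    return out

# Usage sketch (file name is an example):
# with open("export.json", encoding="utf-8") as f:
#     ctx = json.load(f)
# ctx = edit_last_assistant(ctx, "A better reply.")
# with open("export.json", "w", encoding="utf-8") as f:
#     json.dump(ctx, f, ensure_ascii=False, indent=2)
```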
Being able to modify the context more quickly would indeed be a great improvement. I will consider implementing this in a future version.
I actually tried developing it based on SillyTavern. However, SillyTavern is simply too bloated—so many settings are piled together that it’s hard to know where to start. I don’t want to add even more complexity to it.
My product design philosophy has always been “simple is better,” which is why OSRChat exists. The OSRChat installation package is only 19.3MB.
To support more LLM APIs, I used the OpenAI library. Since the release of GPT-3.5, many providers have adopted or made their APIs compatible with the OpenAI API specification so that they can be easily swapped with one another. This includes commonly used local deployment tools such as Ollama and vLLM.
As far as I know, KoboldCpp is compatible with the OpenAI-style API (the base URL might be http://localhost:5001/v1), so it should be able to connect directly.
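Because these backends all speak the same chat-completions dialect, only the base URL changes between them. The sketch below builds the same `/v1/chat/completions` request against each backend's common default port (the ports are assumptions and may differ on your machine).

```python
# One request shape, many OpenAI-compatible backends: only base_url varies.
import json
import urllib.request

BACKENDS = {
    "ollama":    "http://localhost:11434/v1",
    "vllm":      "http://localhost:8000/v1",
    "koboldcpp": "http://localhost:5001/v1",
}

def chat_request(base_url: str, model: str, user_text: str) -> urllib.request.Request:
    """Build a standard chat-completions request for any compatible backend."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# Example (sending it requires the backend to be running):
# urllib.request.urlopen(chat_request(BACKENDS["koboldcpp"], "some-model", "Hello"))
```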
I’m glad you like it.
Regarding prompts, since everyone has their own preferences and prompts may need slight adjustments depending on the model being used, I can only provide the most basic and widely compatible prompts as examples.
You can open the prompts directory from the tray icon’s right-click menu and modify them according to your preferences.
I got this all working this evening and I had a pretty fun time with it. I am using the recommended model in Ollama. I only have an RTX 5080 16GB, so it's not perfect, but it works. Is there any way to get it to not show its "thinking text" and just show the final results? I'm a noob at all this stuff. I can usually follow directions, but actually coding is a bit beyond me. I appreciate your work and I plan on getting your other software up and running soon.
Thank you for your recognition. You can send the code from the src/llm_client.py Python module to ChatGPT and ask it how to prevent the thinking tokens from being returned. However, I don’t recommend doing that.
The model’s thinking process can sometimes take quite a while. If nothing is displayed on the page during that time, users might think the program has frozen.
My current approach is to display the thinking text while the model is reasoning, but once the thinking phase is finished, the thinking section is removed so that only the model’s final answer remains visible.
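If you did want to strip the reasoning yourself, the common pattern for many local models is to drop a `<think>…</think>` block from the finished output. This is a hedged sketch, not OSRChat's actual implementation, and the tag name varies between models.

```python
# Sketch of "show while reasoning, drop when done": once generation is
# complete, remove any <think>...</think> section so only the final
# answer remains (tag convention assumed; check your model's output).
import re

THINK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def final_answer(text: str) -> str:
    """Strip completed thinking sections, keeping only the final reply."""
    return THINK.sub("", text)
```

During streaming you would still display the raw text, then swap in `final_answer(full_text)` once the stream ends, which matches the behavior described above.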
Right on, thanks for the explanation. I find it kind of cool to see the reasoning, but I feel like it detracts from the experience. Anyway, thanks again.
This is actually not an error. I noticed that your browser language doesn’t seem to be English. Since there are no translation files available for that language, a 404 error occurs.
You can find files like index_xx.json and settings_xx.json in either project_root/public/i18n or software_root/_internal/public/i18n, where “xx” represents the language code.
You’re welcome to submit language files on GitHub and contribute to the project’s i18n support.
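Before submitting a translation, it can help to check that your file covers the same keys as the English one. This is a small contributor-side sketch that assumes the i18n files are flat JSON key/value maps; a nested layout would need a recursive walk instead.

```python
# Compare a new translation file against the English reference and report
# any keys that are still missing (flat JSON layout assumed).
import json
from pathlib import Path

def missing_keys(en_path: str, xx_path: str) -> set:
    """Return the set of keys present in the English file but absent
    from the translated one."""
    en = json.loads(Path(en_path).read_text(encoding="utf-8"))
    xx = json.loads(Path(xx_path).read_text(encoding="utf-8"))
    return set(en) - set(xx)

# Example (paths follow the naming described above):
# missing_keys("public/i18n/index_en.json", "public/i18n/index_de.json")
```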
You also have another option: use a non-thinking model, such as “lfm2.” However, models that can be deployed locally usually have a relatively small number of parameters. The purpose of “thinking” is to make the model “smarter.” I’ve tested lfm2 and found it usable, but it often gives responses that don’t meet expectations, requiring you to repeatedly roll back messages and try again.
I’ve added a new system role prompt called “S2O.” It’s used to convert SillyTavern character cards into a format compatible with OSRChat, making it possible to use the large number of SillyTavern character cards in OSRChat. The conversion works fairly well, though sometimes it requires a bit of manual adjustment. It hasn’t been updated in the software yet, but it has already been pushed to the GitHub repository. You can also add it manually to system_roles.json:
{
...,
"S2O": "Your task is to process the SillyTavern character card or Initial Messages provided by the player.\n\nIf processing a character card, reorganize it into several well-structured paragraphs that clearly describe the world setting, background, characters, gameplay mechanics, and other relevant information. The language must be smooth, concise, and refined. Retain all key settings, conflicts, and details—especially the character's appearance, personality, and identity. Describe everything in third-person perspective. Replace {{user}} with \"the player\" and {{char}} with the actual character name. Do not use markdown or emojis.\n\nIf processing Initial Messages, rewrite and simplify them in a light novel style, reformat the text appropriately, and replace {{user}} with the second-person \"you\" (referring to the player). Replace {{char}} with the actual character name. Be sure to preserve all critical information, especially the core plot conflicts and contradictions."
}