Recently, I was exploring the latest developments in the world of text-to-speech (TTS) and stumbled upon Kokoro—an open-source TTS model that's been making waves for its impressive performance despite using only 82 million parameters. It currently ranks among the top models on the TTS leaderboard, and I was eager to test it on my own machine.
My goal was simple: install Kokoro TTS ONNX using Docker on my Ubuntu home server, which has a decent setup—64GB RAM and an NVIDIA RTX 2060 SUPER GPU. Here’s a step-by-step rundown of how I made it work, including how to handle GPU permissions, Docker/Podman usage, and why this setup is worth it.
Why Kokoro?
One of the biggest advantages of Kokoro is its performance. Despite its relatively small size (82M parameters), the model delivers speech quality that's competitive with heavyweight TTS systems. Plus, it's fully open-source, making it ideal for personal projects without the cost of premium services like ElevenLabs.
Preparing My Environment
First, I needed Docker (or Podman) installed. Since I'm comfortable with Docker, I proceeded with that, but if you're on Fedora or prefer Podman, it works just as well.
If you're using Podman with NVIDIA GPUs, you'll want to install:
sudo dnf install golang-github-nvidia-container-toolkit
This pulls Fedora's own packaging of the toolkit and avoids adding NVIDIA's official repository, which can cause dependency conflicts on Fedora.
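On the Podman side, the toolkit exposes GPUs through CDI (the Container Device Interface). A minimal sketch of verifying GPU access, assuming you have root access and the `nvidia-ctk` binary from the toolkit package (the CUDA image tag is just one example):

```shell
# Generate a CDI spec describing your NVIDIA devices (run once, as root)
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Confirm the GPU is visible from inside a container
podman run --rm --device nvidia.com/gpu=all \
  docker.io/nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```

If `nvidia-smi` prints your GPU inside the container, the Kokoro GPU image should work the same way.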
Running Kokoro TTS with Docker
The easiest way to spin up Kokoro is to use the Docker image provided in the Kokoro-FastAPI GitHub repository. Here's the command I used:
docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.2
If you’re not using a GPU or just testing, you can switch to the CPU image:
docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.2
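Either way, you can confirm the server is up from the terminal before opening the UI. Since the project is built on FastAPI, which serves interactive API docs at `/docs`, a quick check might look like this (assuming the default port 8880):

```shell
# Give the container a moment to load the model, then:
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8880/docs
# A 200 status code means the API is live
```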
Solving SELinux Issues
When using Podman or Docker with SELinux enabled, you might run into permission errors. I fixed this with:
sudo setsebool -P container_use_devices true
This allows containers to access GPU devices without switching SELinux into permissive mode, so the rest of the system keeps full enforcement.
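You can verify the boolean took effect with `getsebool`:

```shell
getsebool container_use_devices
# Should report: container_use_devices --> on
```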
What Happens Next
Once you run the container, head over to http://localhost:8880 and you’ll be greeted with a minimal UI. From here, you can select a voice, type in your text, and hit generate to get lifelike audio playback.
My first TTS generation with Kokoro was blazing fast. I generated an entire short paragraph in just a few seconds on my GPU. The quality? Honestly, on par with what I’ve been paying a premium for from other services.
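If you prefer the terminal to the UI, the server also exposes an OpenAI-compatible speech endpoint. Here is a sketch of a request—note that the voice name `af_bella` is just one example, so check the UI (or the voices endpoint, if your build exposes one) for what is actually available:

```shell
curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
        "model": "kokoro",
        "input": "Hello from my home server!",
        "voice": "af_bella",
        "response_format": "mp3"
      }' \
  --output hello.mp3
```

The result is an `hello.mp3` file you can play with any audio player.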
Future Plans
I plan to integrate Kokoro into my content automation stack using n8n. The idea is to pipe article text or chatbot responses directly into Kokoro for fast TTS generation—and since everything runs locally, it’s zero cost per request!
For developers, you can also access Kokoro’s FastAPI endpoints directly for advanced usage. It supports real-time inference via HTTP, and I plan to explore integrating it with voice bots or even smart assistants at home.
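As a sketch of what real-time use can look like, you can pipe the HTTP response straight into a player such as `mpv` instead of saving it to disk (assuming the same OpenAI-style endpoint and example voice as above, and that `mpv` is installed):

```shell
# Stream synthesized audio directly to the speakers
curl -s -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro", "input": "Streaming speech from a local model.", "voice": "af_bella"}' \
  | mpv --no-video -
```

This same pattern—POST text, receive audio—is what an n8n HTTP Request node or a home-assistant integration would use under the hood.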
Conclusion
If you’re looking for a powerful, fast, and free TTS engine to run locally, Kokoro is absolutely worth your time. Using Docker makes installation a breeze, and with GPU support, performance is more than adequate for production needs.
I hope this guide helps you get Kokoro TTS up and running on your own machine. If you found this helpful, feel free to share, comment, or follow along for future updates on integrating Kokoro with n8n and more AI workflows!