How It All Started
One day I asked myself: what if I could automate the creation of short, animated story videos from scratch? My idea was to use AI to generate scripts, voiceovers, and images, combine them into a video, and run the whole thing on my home server.
With my Ubuntu server (64 GB RAM, NVIDIA RTX 2060 SUPER) and a passion for automation, I dove in, using n8n as the core of the orchestration.
The Goal
I wanted to turn a simple row in a spreadsheet (in my case, Baserow) into a full animated video story — complete with AI-written text, synthesized voice narration, AI-generated illustrations, and FFmpeg-powered video editing — automatically and cost-free.
Core Tools I Used
- n8n – for building the automation workflows
- MinIO – local S3-compatible storage
- Kokoro TTS – blazing fast text-to-speech via GPU
- Stable Diffusion WebUI – local AI image generation
- NCA Toolkit – video composition, audio merging, and clip processing
- Baserow – open-source Airtable alternative
The Workflow in Action
The magic begins with a Baserow table — a single row containing the story topic, desired voice, character names, and number of scenes. From there:
- n8n pulls the row marked as pending
- It sends the topic to OpenRouter (Gemini Flash) to generate the full story
- Then it formats the story text for use in TTS
- Kokoro TTS generates the audio in MP3 format locally
- n8n uploads that MP3 to MinIO for access by other tools
- n8n sends the story to another LLM to generate six image prompts
- Each image prompt is sent to Stable Diffusion WebUI (via API)
- Base64 images are turned into PNGs and uploaded to MinIO
- NCA Toolkit converts each image into a 22-second video clip
- All clips are merged and combined with the TTS audio
- The final video file is saved, and the Baserow row is marked as done
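To make the image steps above more concrete (sending a prompt to Stable Diffusion WebUI and turning the base64 result into a PNG), here is a rough Python sketch. It is not the n8n node configuration from the workflow, only an illustration. It assumes the AUTOMATIC1111-style WebUI API, where a POST to /sdapi/v1/txt2img returns JSON with an "images" list of base64 strings. The function names, image size, and step count are hypothetical placeholders.

```python
import base64

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_txt2img_payload(prompt, width=512, height=512, steps=20):
    """Build a request body for POST /sdapi/v1/txt2img.

    The size and step values are placeholders, not the workflow's settings.
    """
    return {"prompt": prompt, "width": width, "height": height, "steps": steps}


def decode_images(response_json):
    """Convert the base64 strings in the response's "images" list to PNG bytes."""
    pngs = []
    for encoded in response_json.get("images", []):
        # Tolerate an optional data-URI prefix such as "data:image/png;base64,".
        if "," in encoded:
            encoded = encoded.split(",", 1)[1]
        data = base64.b64decode(encoded)
        if not data.startswith(PNG_SIGNATURE):
            raise ValueError("decoded data is not a PNG")
        pngs.append(data)
    return pngs
```

The returned bytes can be written straight to a .png file or uploaded to MinIO as they are.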
From input to video output — all in under 5 minutes, without spending a dime.
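In my setup NCA Toolkit handles the clip and merge steps, so I never call FFmpeg by hand. For anyone curious what those steps amount to, the sketch below builds roughly equivalent plain FFmpeg commands in Python. This is an assumption about an equivalent approach, not what the toolkit runs internally; the function names, file paths, frame rate, and codec choices are hypothetical.

```python
def image_to_clip_cmd(image_path, clip_path, seconds=22, fps=25):
    """FFmpeg arguments that loop one still image into a fixed-length clip."""
    return [
        "ffmpeg", "-y",
        "-loop", "1",            # repeat the single input image
        "-i", image_path,
        "-t", str(seconds),      # clip length in seconds
        "-r", str(fps),          # output frame rate
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",   # pixel format most players accept
        clip_path,
    ]


def concat_list(clip_paths):
    """Contents of the text file read by FFmpeg's concat demuxer."""
    return "".join(f"file '{path}'\n" for path in clip_paths)


def merge_with_audio_cmd(list_path, audio_path, output_path):
    """FFmpeg arguments that join the listed clips and add the narration."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", list_path,
        "-i", audio_path,
        "-map", "0:v", "-map", "1:a",   # video from the clips, audio from the MP3
        "-c:v", "copy", "-c:a", "aac",
        "-shortest",                    # stop when the shorter stream ends
        output_path,
    ]


def total_video_seconds(scene_count, seconds_per_clip=22):
    """Length of the merged video before audio trimming."""
    return scene_count * seconds_per_clip
```

Each list can be passed to subprocess.run on a machine with FFmpeg installed. With six scenes at 22 seconds each, the merged clips come to 132 seconds of video.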
What I Learned
I learned how powerful open-source tools can be when combined. I discovered how to work with APIs in n8n, how to run Stable Diffusion locally with LoRA models, and how to manipulate audio, images, and video using FFmpeg.
Most importantly — I now have my own storytelling factory that runs on my own terms, hosted completely on my own hardware.
Final Thoughts
If you're a creator or developer with a spare PC and a dream of automation — this is 100% doable. No subscriptions. No paywalls. Just free, powerful technology working together.
The full JSON workflow and setup guide are linked in the description. If you’d like a video walk-through of how to set this up from scratch, let me know — I’d love to help others get started.