01 // project

Jarvis AI

A voice-activated personal assistant running on a Raspberry Pi. Speaks to a Cloudflare Worker that classifies intent, pulls live system data, and synthesises an audio response — all in under a second.

live
cloudflare workers
fish audio
ionos s3
Architecture Visualization
Why i came up with this [IDEA]
I always wanted to be able to automatically control my house, if its my lights, blinds or PC. Coming home from school and having everything open and running, or waking up in the morning with my lights on and my blinds going up is a nice feeling.

This also starts off easy, because i love designing systems and having them present in my daily life.
What parts and why Components
I did pick some unique options, but in general i am pretty happy, because my design is pretty straightforward.
When it came to the high level design i didnt want to use an AI Realtime kit, because that would frankly be pretty lame.

My goal was to have a custom voice, and the bland AI ones didnt seem cool. Then i remembered the GlaDOS voice, from the portal video game. I had it running on my GPU as a self hosted option, but that wouldnt cut it because the home automation would be reliant on the status of my PC. This means i had to find an audio provider who would be able to do this for cheap and with an API. Fish audio seemed like a good choice, so i went with that For STT they offer pretty cheap rates and it came out as the best choice.

When it came to the actual intelligence i had to make a choice:
- use a custom trained model
- add examples in the instructions
Not having a high end GPU makes the latter a better choice because its easy to set up and customisable-er. This also allows me to switch to more voice options without retraining the AI on the patterns of the other characters.

When first making this i had picked a cheap GPT model, because i didnt have the reason to pay for a high end AI, as it would only have to do actual tasks without talking. Furthermore it was easy to track, easy to integrate etc. In playtesting this was a bit slow, we will come to that in the next part though. My friend told me about Groq AI and after some checking i found out they boasted a speed of 500 tokens/sec, which definetly would change the game. What finally hooked me, was that it was FREE... Like hats off, fast and free? Count me in!
Playtesting and real feedback Integration
In production this was pretty good, but there were some pushbacks. The quickly noticable one was the slow response time. I wasnt able to send and wait for a response within the same sec, or even 2. The average response time was around 10 secs. This also had to do with my waterfall style archicture in the beginning. This meant my n8n backend had to respond before anything else would happen, then the AI, then the TTS and so on.

To solve this i simply made the entire thing run in parallel, atleast the things able to. Like shown in the graph all of the actual Agentic actions are done in parallel, which detaches all of the actions from each other and allowing things to break and still keep going, saving time and also add more modularity.
Where it is now Post mortem
For the future i will add more things i can control via GlaDOS, like more lights, my shutters and so on. But right now i have to make a permanent solution like a wall power brick so i dont need my bench power supply. No updates are needed, so i just need to add more functionality when needed. This wraps it up to be a simple and maintainable project.