Their first version is most likely already 10x better than Siri.
> Understands when it is in a particular area and does not ask “which light?” when there is only one light in the area, but does correctly ask when there are multiple of the device type in the given area.
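The behaviour quoted above can be sketched in a few lines. This is only an illustration of the rule as described (act when one device matches, ask when several do), not how Home Assistant or any other assistant implements it; `Device`, `resolve`, and the sample device names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str   # hypothetical friendly name, e.g. "Workbench"
    kind: str   # device type, e.g. "light"
    area: str   # area the device lives in, e.g. "workshop"


def resolve(devices, kind, area):
    """Return (device, None) when exactly one device matches,
    otherwise (None, question_to_ask_the_user)."""
    matches = [d for d in devices if d.kind == kind and d.area == area]
    if len(matches) == 1:
        # Only one candidate in this area: act without asking.
        return matches[0], None
    if not matches:
        return None, f"I can't find a {kind} in the {area}."
    # Several candidates: this is the case where asking is correct.
    names = ", ".join(d.name for d in matches)
    return None, f"Which {kind}? I see: {names}."


# Made-up example data.
DEVICES = [
    Device("Ceiling", "light", "bedroom"),
    Device("Workbench", "light", "workshop"),
    Device("Overhead", "light", "workshop"),
]
```

With this data, asking for the light in the bedroom resolves straight to "Ceiling", while asking in the workshop returns a clarifying question that lists both lights.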
alex_young•Mar 16, 2026
One of my favorite episodes:
I set 2 timers for the same thing somehow. I then tried to cancel one of them.
>“Siri, cancel the second timer”
“You have 2 timers running, would you like me to cancel one of them?”
>“Yes”
“Yes is an English rock band from the 70s…”
>“Siri, please cancel the timer with 2 minutes and 10 seconds on it”
“Would you like me to cancel the timer with 2 minutes and 8 seconds on it?”
>“Yes”
“Yes is an English rock band from the 70s…”
Eventually they both rang and she listened when I said stop.
0_____0•Mar 16, 2026
> "Stop" is a song by English girl group the Spice Girls from their second studio album, Spiceworld (1997).
abroadwin•Mar 16, 2026
My favorite is when I ask Siri to set a timer and get back "there are no timers running."
Skidaddle•Mar 16, 2026
My other favorite is when I ask Siri to set a timer on my watch and it does a web search.
sanswork•Mar 16, 2026
My favourite is when I ask Siri to stop the alarm (that is currently going off) and it decides to disable my morning wake up alarm but keep the current alarm going off.
jazzyjackson•Mar 16, 2026
“Siri stop”
“There’s nothing to stop”
> me, suddenly aware of how the AI takeover will happen
xp84•Mar 16, 2026
Helping my kid get ready for shower I had this exchange:
Me: "Text Jane Would you mind dropping down the robe and underpants"
Siri: Sends Jane "Would you mind dropping down"
Me: rolls eyes "Text Jane robe and underpants"
Siri: "I don't see a Jane Robe in your contacts."
Me: wishes I could drown Siri in the bathtub
It's wild to me that Apple got the ability to do the actual speech-to-text part pretty much 100% solved more than half a decade ago, yet struggles in 2026 to turn streams of very simple, correctly-transcribed text into intents in ways that even a local model can figure out. Siri is good STT, a bunch of serviceable APIs that can control lots of stuff, with the digital equivalent of a brain-damaged cat sitting at the center of it guaranteeing the worst possible experience.
yanis_t•Mar 16, 2026
I'm still waiting for the promise of voice AI that was shown during the OpenAI demo in 2024 to become real somehow. It's not clear to me why there has been zero progress since then.
j45•Mar 16, 2026
What the tech can do and what you can actually use are different things: it often has to be configured and packaged before it is usable in that way.
phito•Mar 16, 2026
It also needs to work at least 99% of the time if not more. Not easy to do this with non-deterministic models.
recursive•Mar 16, 2026
If my lights and heat were 99% reliable, I'd be getting new lights and heat.
bigstrat2003•Mar 16, 2026
In those cases yeah, 99% isn't reliable enough. I'm not going to tolerate having power down for 3 days out of the year. But in fairness, home automation is less critical than that so 99% reliability is still acceptable to me. I don't think LLMs are anywhere near that, though, nor is there any sign of them getting there any time soon. So it does concern me to use an LLM as the backbone of home automation.
bombcar•Mar 16, 2026
I took 99% reliable as meaning not having to repeat the command, which given that Siri is something like 50% reliable by that metric, 99% sounds like heaven.
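The two readings of "99%" in this sub-thread give quite different numbers, and a quick calculation makes that concrete. This is a back-of-the-envelope sketch: the function names are made up, the 50% figure is only the estimate given above, and the second function assumes each attempt succeeds independently.

```python
def downtime_days_per_year(availability):
    """Days per year a service is down at the given availability (0..1)."""
    return (1 - availability) * 365


def expected_attempts(success_rate):
    """Average number of times a command must be spoken before it works,
    assuming independent attempts (mean of a geometric distribution)."""
    return 1 / success_rate


# Reading 1: 99% as uptime for lights and heat.
print(f"{downtime_days_per_year(0.99):.2f} days of downtime per year")

# Reading 2: 99% as "the command works the first time".
print(f"{expected_attempts(0.99):.2f} attempts per command at 99%")
print(f"{expected_attempts(0.50):.2f} attempts per command at 50%")
```

Under the uptime reading, 99% works out to roughly three and a half days of outage a year, which is the "3 days" mentioned above. Under the per-command reading, 99% means almost never repeating yourself, while 50% means saying everything twice on average.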
voidUpdate•Mar 16, 2026
Do people like talking to voice assistants? I've used one occasionally (mostly for timers when I'm cooking), but most of the time it would be faster for me to just do it myself, and feels much less awkward than talking to empty air, asking it to do things for me. It might be because I just really don't like making more noise than I have to
(Yes, I appreciate that some people may be disabled in such a way that it makes sense to use voice assistants, eg motor problems)
freeone3000•Mar 16, 2026
I use it frequently for reminders and calendar events when not at a computer, as voice is faster than the mobile interface (with so many screens) for setting something up
RankingMember•Mar 16, 2026
I love it for lists- like my hands are full making something in the kitchen and I can just tell it to add things to my grocery list as soon as I notice I'm out of something.
hamdingers•Mar 16, 2026
I consider each time I need to pull out my phone and "do it myself" to be a failure of my smart home system.
If a light cannot be automatically on when I need it (like a motion sensor) or controlled with a dedicated button within arms reach (like a remote on my desk) then the third best option is one that lets me control it without interrupting what I'm doing, moving from where I am, using my hands, or possessing anything (a voice assistant).
voidUpdate•Mar 16, 2026
Do you not just turn the light on when you go in a room, and turn it off again when you go out? All the rooms in my flat have switches next to the door
hamdingers•Mar 16, 2026
Yes, that would be a button within arms reach, something I explicitly prefer over the voice assistant. I use them frequently.
I don't have just one light per room though, some spaces like my workshop or living room have a lot of lighting options, and flitting around the room flipping a bunch of switches is clumsy and unnecessary. The preference is always towards automation (e.g. when I play a movie in Jellyfin, the lights dim) but there are situations where I just need to ask for the workbench light.
johannes1234321•Mar 16, 2026
The Sun moves around, while I am in a room. It might be high up when I enter a room, but after a while there may be clouds or it may have set.
When watching a movie one may dim the light. Once finished one may need more lights.
When going to bed I may want to switch all lights off. When getting up it may need some extra light.
A switch on the door is nice. More switches is better. Being able to control from anywhere may be even nicer.
voidUpdate•Mar 16, 2026
So I grab my phone, open the homeassistant app, and mess with the settings on my light, or use homeassistant through my browser on my desktop. No yelling at a computer needed
nickthegreek•Mar 16, 2026
My lights adjust their brightness and color spectrum automatically throughout the day while also understanding the time of year and sun position. This alone is next level. All are voice/tablet controlled. When I start a movie at night, lights will adjust automatically in my open floor plan first level. All of this operates without me ever having to give any mental energy beyond the initial setup.
This is not just flip a switch territory.
wan23•Mar 16, 2026
Many homes have a bunch of lights with their own switches, like lamps. Also there are rooms with multiple entrances, like a living room with a bedroom on the other side from the front door entrance, which would involve walking to the side of the room with the switch then walking back through a dark room after you turn it off. Being able to just get into bed and say "Alexa, turn off all of the lights" is way more convenient than checking 14 light switches around my home.
Insanity•Mar 16, 2026
I guess most of my use is whilst driving, to start/stop music or audiobooks, change navigation etc. Although changing navigation through Siri is somewhat painful as it often gets my intended destination wrong lol.
nickthegreek•Mar 16, 2026
I prefer voice strongly. I don't want to stop what I am doing, find a device, open the app, wait for it to refresh, navigate and click to get Milk on a list. Sure you can bring this down a few steps, but all of them still require me to move and have a hand and eye free.
harrall•Mar 16, 2026
I would, if they worked even 90%.
I mostly set timers because it’s one of the few things that always works.
phatskat•Mar 16, 2026
I pretty much only use them for timers and weather, and the occasional lookup for quick random info. And this is all only if I don’t have a phone handy or eg the toddler is going to timeout and I need to set his timer in the midst of him having a meltdown about it.
It’s why I haven’t and won’t enable Gemini, and I’ll likely chuck my nest minis once I’m forced to have an LLM-based experience. Hopefully they’ll be able to at least function as dumb Bluetooth speakers still but I’m not holding out hope on that end
nancyminusone•Mar 16, 2026
I don't. I pretty much don't like talking in general, especially if I'm alone. Accordingly, no voice assistants; I don't think I've ever triggered one except accidentally.
hamdingers•Mar 16, 2026
If you're less concerned about privacy, I use Gemini 2.5 Flash for this and it's exceptionally good and fast as a HA assistant while being much cheaper than the electricity that would be needed to keep a 3090 awake.
The thing that kills this for me (and they even mentioned it) is wake word detection. I have both the HA voice preview and FPH Satellite1 devices, plus have experimented with a few other options like a Raspberry Pi with a conference mic.
Somehow nothing is even 50% as good as my Echo devices at picking up the wake word. The assistant itself is far better, but that doesn't matter if it takes 2-3 tries to get it to listen to you. If someone solves this problem with open hardware I'll be immediately buying several.
Alexa devices have these (or used to at least), but Google Home's never did. So it shouldn't be necessary.
jcims•Mar 16, 2026
Yeah a small (ideally personalized) wakeword model would probably outperform just about any audio wizardry.
ethagnawl•Mar 16, 2026
That's a good call. I have a PS3(?) mic/camera that I was using when I was running the original Mycroft project on a Pi. I wonder if that would help with the inbuilt HA mic not waking for most of my family, most of the time. I will have to look at my VA Preview device and its specs later because I'm not sure if you can connect an external mic to it out-of-the-box.
_spduchamp•Mar 16, 2026
How about a button?
I'd prefer to physically press a button on an intercom box than having something churning away constantly processing sound.
pwillia7•Mar 16, 2026
I'm in if I can embed it into my forearm
hamdingers•Mar 16, 2026
In the mid 2000s I had a setup where some children's walkie talkie "spy watches" could be used to issue commands to a completely DIY, relay based smart home system.
Rules out a bunch of cases where your hands are busy handling ingredients in the kitchen, etc
hamdingers•Mar 16, 2026
If I have to go to a thing and push a button, I'd rather the button do the thing I wanted in the first place. Voice assistants are for when my hands are full or I don't want to get up. (I wrote more about my home automation philosophy in another comment[1]).
Also I have all my voice assistant devices mounted to the ceiling
Did you have any luck with the power issues on the new board?
stavros•Mar 16, 2026
The new board hasn't come yet, but a friend gave me a great idea, to power the mic from a GPIO, which powers it off completely when the ESP is off.
Hopefully the new boards will be here soon, but another issue is that I don't really have anything that can measure microamp consumption, so any testing takes days of waiting for the battery to run down :(
I do think these clones are the issue, though. They had a LED I couldn't turn off, so they'd literally shine forever. They don't seem engineered for low quiescent current, so fingers crossed with the new ones.
CurleighBraces•Mar 16, 2026
Makes a lot of sense :) thanks for the update.
I'll try to remember to creepy stalk you for updates as the device sounds great!
stavros•Mar 16, 2026
You can sign up to my mailing list to get emails if you want! It's at the end of each post.
croes•Mar 16, 2026
Time for a real life Star Trek comm badge
senkora•Mar 16, 2026
Why not use an easier to detect wake “word”, like two claps in quick succession? Or a couple of notes of a melody?
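For what the two-clap idea might look like, here is a deliberately naive sketch: it flags two loud onsets that land within a short window of each other. Everything in it (function names, threshold, frame size, gap limits) is an assumption chosen for illustration, and it works on a plain list of samples in the range -1..1 rather than a real microphone stream.

```python
def find_onsets(samples, sample_rate, threshold=0.6, frame_ms=10):
    """Return the times (in seconds) where the signal goes from quiet to loud.

    The signal is cut into short frames; a frame is "loud" when its peak
    absolute amplitude reaches the threshold.
    """
    frame = max(1, int(sample_rate * frame_ms / 1000))
    onsets = []
    loud_before = False
    for start in range(0, len(samples), frame):
        peak = max(abs(s) for s in samples[start:start + frame])
        loud = peak >= threshold
        if loud and not loud_before:
            onsets.append(start / sample_rate)
        loud_before = loud
    return onsets


def is_double_clap(samples, sample_rate, min_gap=0.15, max_gap=0.6):
    """True if two consecutive loud onsets are between min_gap and max_gap
    seconds apart."""
    onsets = find_onsets(samples, sample_rate)
    return any(min_gap <= b - a <= max_gap for a, b in zip(onsets, onsets[1:]))


def synthetic_signal(spike_times, sample_rate=1000, seconds=1.0):
    """Build a silent test signal with full-scale spikes at the given times."""
    samples = [0.0] * int(sample_rate * seconds)
    for t in spike_times:
        samples[int(t * sample_rate)] = 1.0
    return samples
```

A pure loudness trigger like this would also fire on a slammed door or a dropped pan, so it says nothing about whether claps would beat a trained wake word model in a real room.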
hamdingers•Mar 16, 2026
Can't clap if your hands are full and I would not subject my family to my attempts at delivering a melody.
I haven't tried training my own wake word though, I'm tempted to see if it improves things.
airstrike•Mar 16, 2026
Personally I'd pick "Cthulhu"
otikik•Mar 16, 2026
What about whistling?
ethagnawl•Mar 16, 2026
What's been surprising in my experience regarding the wake word is that it recognizes me (adult male) saying the wake word ~95% of the time. However, it only registers the rest of my family (women and children) ~30% of the time.
vineyardmike•Mar 16, 2026
I have no firsthand knowledge, but I’d strongly bet that the home-assistant effort to donate training data mostly gets adult males, and nearly zero children.
ethagnawl•Mar 16, 2026
Oh, I'm sure you're right. I've had people in my personal life (non-technical; "AI enthusiasts") laugh at me over concerns about training bias but this is likely a real world example of it.
stavros•Mar 16, 2026
I think you can train your own wake word with microWakeWord but I've never done it.
dghlsakjg•Mar 16, 2026
This was 2021 (so pre-llm), but I used to work for a company that gathered data for training voice commands (Alexa, Toyota, Sonos, were some clients). Basically, we paid people to read digital assistant scripts at scale.
Your assumptions about training data do not match the demographics of data I collected. The majority of what our work revolved around was getting diversity into the training data. We specifically recruited kids, older folks, women, people with accented/dialected English and just about every variety of speech that we could get our hands on. The companies we worked with were insanely methodical about ensuring that different people were included.
gmueckl•Mar 16, 2026
You are reporting on a deliberately curated effort vs. what I understand is effectively voluntary data donation without incentives. It's not surprising to me that the latter dataset ends up biased due to the differences in sourcing.
robotswantdata•Mar 16, 2026
What about your wifi APs sensing which room you are in, with your choice of hilarious dance moves as the trigger ?
Funky chicken for Gemini
Penguin dance for OpenAI
Claude?
daveoc64•Mar 16, 2026
I've recently purchased a couple of the Home Assistant Voice Preview Edition devices, and they leave a lot to be desired.
The wake word detection isn't great, and the audio quality is abysmal (for voice responses, not music).
Amazon has ruined their Alexa and Echo devices with ads and annoying nag messages.
I'd really like an open alternative, but the basics are lacking right now.
touristtam•Mar 16, 2026
Can those devices (Amazon) be _jail broken_? I was just wondering that this morning while taking a shower.
locao•Mar 16, 2026
YouTube has been trying to push me to watch a video about jailbreaking the Echo Show for a week now. I didn't watch it, but it's probably easy to find.
vineyardmike•Mar 16, 2026
Generally no. Big tech companies have gotten good at locking down devices to the boot loader. Some of the signing keys for certain OTA versions have leaked, but you can’t rely on that.
Some of the devices contain browsers, and people have set up hacky ways to turn them into thin clients through that, but it’s not particularly reliable IME.
I heard some Chinese brands which made similar hardware for Chinese consumers don’t lock their devices down, letting you flash an open install of Android on them, but I haven’t seen anyone try that IRL.
One that I have been experimenting with is using analog phones (including rotary ones!) to act as the satellites. I live in an older home and have phone jacks in most of the rooms already so I only had to use a single analog telephone adapter. [0] The downside is I don't have wake word support, but it makes it more private and I don't find myself missing my smart speakers that much. At some point I would like to also support other types of calls on the phones, but for now I need to get an LLM hooked up to it.
actually the hardest part of a locally hosted voice assistant isn't the llm. it's making the tts tolerable to actually talk to every day.
the core issue is prosody: kokoro and piper are trained on read speech, but conversational responses have shorter breath groups and different stress patterns on function words. that's why numbers, addresses, and hedged phrases sound off even when everything else works.
the fix is training data composition. conversational and read speech have different prosody distributions and models don't generalize across them. for self-hosted, coqui xtts-v2 [1] is worth trying if you want more natural english output than kokoro.
btw i'm lily, cofounder of rime [2]. we're solving this for business voice agents at scale, not really the personal home assistant use case, but the underlying problem is the same.
80% of my home voice assistant requests really need no response other than an affirmative sound effect.
nickthegreek•Mar 16, 2026
100% agree. I don't want a "Yes", "Got it", "Will do" or, even worse, "I have turned on the Bedroom Light". I want a soft success ding or a low failure boop.
cptskippy•Mar 16, 2026
> actually the hardest part of a locally hosted voice assistant isn't the llm. it's making the tts tolerable to actually talk to every day.
I would argue that the hardest part is correctly recognizing that it's being addressed. 98% of my frustration with voice assistants is them not responding when spoken to. The other 2% is realizing I want them to stop talking.
xrd•Mar 16, 2026
I've been having a lot of fun using my old Mycroft AI device. Neon is the new software package. It didn't solve the issues highlighted in this thread, but it is a fun open device to hack on. I wrote a little web app that will speak in the standard voice and say things like "hey kids, I'm AI and know everything, and your dad is really cool." They love to yell at me when I do that.
I'm looking forward to whenever my Pebble ships so I can recreate that experience with this: https://github.com/skylord123/pebble-home-assistant-ws
1. https://news.ycombinator.com/item?id=47399909
https://repebble.com/index
Could be pressed even if your hands were busy.
https://xdaforums.com/t/unlock-root-twrp-unbrick-amazon-echo...
[0] https://www.home-assistant.io/voice_control/worlds-most-priv...
[1] https://github.com/coqui-ai/TTS [2] https://rime.ai