Launch HN: Cua (YC X25) – Open-Source Docker Container for Computer-Use Agents

169 points 75 comments 3 days ago

suninsight

Very cool product!

We at NonBioS.ai [AI Software Dev] built something like this from scratch for Linux VMs, and it was a heavy lift. We could have used you guys if we had known about it. But I can see this being immediately useful in a ton of places.

frabonacci

Thank you, really appreciate that! Curious - did you end up using QEMU for your Linux VMs? And are you running your system locally or in the cloud?

We’re currently focused on macOS but planning to support Linux soon, so I’d love to hear more about your use case. Feel free to reach out at founders@trycua.com - always great to learn from others building in this space.

suninsight

No, we don't use QEMU - never heard of it till now. We built our own software from scratch - using Ubuntu - for AI. We are completely on the cloud. Every user gets a full Ubuntu Cloud VM for his NonBioS AI Engineer to work on.

We covered this a fair bit on our blogs:

- https://www.nonbios.ai/post/why-nonbios-chose-cloud-vms-for-...

- https://www.nonbios.ai/post/private-linux-vms-for-every-nonb...

omneity

I guess the "lucky 10000" effect is extremely strong today!

This is like an OS developer who has never heard of Linux.

PufPufPuf

They just provision cloud VMs. That's the level of tech understanding you need to build an "AI startup" nowadays. "Never heard of QEMU, we just use Ubuntu" doesn't really inspire confidence.

brap

Congrats on the launch!

I don’t know if this is a problem you’ve faced, but I’m curious: how do LLM tool devs handle authn/authz? Do host apps normally forward a token or something? Is there a standard commonly used? What if the tool needs some permissions to act on the user’s behalf?

alexchantavy

There are companies like https://www.keycard.sh/ taking this on. There are other competitors too but I can't think of them atm

frabonacci

Good question! Specifically around computer-use agents (CUAs), I haven't seen much exploration yet - and I think it’s an area worth exploring for vertical products. For example, how do you securely handshake between a CUA agent and an API-based agent without exposing credentials? If everything stays within a local cluster, it's manageable, but once you start scaling out, authn/authz becomes a real headache.

I'm also working on a blog post that touches on this - particularly in the context of giving agents long-term and episodic memory. Should be out next week!
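
For the handshake question above, one common pattern is for the host to mint a short-lived, scoped token and forward that instead of the user's credentials. A minimal sketch, assuming a shared secret between the host and the tool; `mint_token`, `verify_token`, `SECRET`, and the scope names are all hypothetical and not part of Cua or any standard:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import List, Optional

SECRET = b"demo-shared-secret"  # hypothetical; a real setup would load this from a secret store


def mint_token(subject: str, scopes: List[str], ttl_s: int, now: Optional[float] = None) -> str:
    """Return a signed token that carries scopes and an expiry, not the user's credentials."""
    issued = time.time() if now is None else now
    claims = {"sub": subject, "scopes": sorted(scopes), "exp": issued + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"


def verify_token(token: str, required_scope: str, now: Optional[float] = None) -> bool:
    """Check signature, expiry, and scope; return False on any failure."""
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return False  # no signature part at all
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # tampered body or signature
    claims = json.loads(base64.urlsafe_b64decode(body.encode()))
    current = time.time() if now is None else now
    return current < claims["exp"] and required_scope in claims["scopes"]
```

The tool only ever sees a token that expires and names what it may do, so a leaked token is worth less than a leaked password. A shared secret stops being practical once you scale past one cluster; asymmetric signatures or an existing standard such as OAuth 2.0 would be the usual next step.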

rahimnathwani

I tried this three times. Twice a few days ago and once just now.

First time: it opened a macOS VM and started to do stuff, but it got ahead of itself and started typing things in the wrong place. So now that VM has a Finder window open, with a recent file that's called

  plt.ylabel('Price(USD)').sh

The second and third times, it launched the VM but failed to do anything, showing these errors:

  INFO:cua:VM run response: None
  INFO:cua:Waiting for VM to be ready...
  INFO:cua:Waiting for VM macos-sequoia-cua_latest to be ready (timeout: 600s)...
  INFO:cua:VM status changed to: stopped (after 0.0s)
  DEBUG:cua:Waiting for VM IP address... Current IP: None, Status: stopped
  DEBUG:cua:Waiting for VM IP address... Current IP: None, Status: stopped
  DEBUG:cua:Waiting for VM IP address... Current IP: None, Status: stopped
  INFO:cua:VM status changed to: running (after 12.4s)
  INFO:cua:VM macos-sequoia-cua_latest got IP address: 192.168.64.2 (after 12.4s)
  INFO:cua:VM is ready with IP: 192.168.64.2
  INFO:cua:Initializing interface for macos at 192.168.64.2
  INFO:cua.interface:Logger set to INFO level
  INFO:cua.interface.macos:Logger set to INFO level
  INFO:cua:Connecting to WebSocket interface...
  INFO:cua.interface.macos:Waiting for Computer API Server to be ready (timeout: 60s)...
  INFO:cua.interface.macos:Attempting WebSocket connection to ws://192.168.64.2:8000/ws
  WARNING:cua.interface.macos:Computer API Server connection lost. Will retry automatically.
  INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 10.0s, attempts: 11)
  INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 20.0s, attempts: 21)
  INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 30.0s, attempts: 31)
  WARNING:cua.interface.macos:Computer API Server connection lost. Will retry automatically.
  INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 40.0s, attempts: 41)
  INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 50.1s, attempts: 51)
  ERROR:cua.interface.macos:Could not connect to 192.168.64.2 after 60 seconds
  ERROR:cua:Failed to connect to WebSocket interface
  DEBUG:cua:Computer initialization took 76856.09ms
  ERROR:agent.core.agent:Error in agent run method: Could not connect to WebSocket interface at 192.168.64.2:8000/ws: Could not connect to 192.168.64.2 after 60 seconds
  WARNING:cua.interface.macos:Computer API Server connection lost. Will retry automatically.

This was using the gradio interface, with the agent loop provider as OMNI and the model as gemma3:4b-it-q4_K_M

These versions:

  cua-agent==0.1.29
  cua-computer==0.1.23
  cua-core==0.1.5
  cua-som==0.1.3

frabonacci

Thanks for trying out c/ua! We still recommend pairing the Omni loop configuration with a more capable VLM, such as Qwen2.5-VL 32B, or using a cloud LLM provider like Sonnet 3.7 or OpenAI GPT-4.1. While we believe that in the coming months we'll see better-performing quantized models that require less memory for local inference, truth is we're not quite there yet.

Stay tuned - we're also releasing support for UI-Tars-1.5 7B this week! It offers excellent speed and accuracy, and best of all, it doesn't require bounding box detection (Omni) since it's a pixel-native model.

rahimnathwani

Thanks. I'll try that, but right now it's not working at all, i.e. cua can't interact with the VM at all. That's not a model issue.
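
One way to narrow down a failure like the one in the log is to check, from the host, whether anything is listening on the VM's port at all. A minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, not part of cua, and the address and port in the comment are simply the ones from the log:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Return True as soon as a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # The connection is closed again immediately; we only care that it opened.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            pass  # refused, unreachable, or timed out: try again until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example call with the values from the log:
# wait_for_port("192.168.64.2", 8000, timeout_s=10.0)
```

A `True` result only shows that something accepts TCP connections on that port; it does not prove the WebSocket server behind it is healthy. A `False` result points at networking or at the server inside the VM never starting, rather than at the model.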

frabonacci

If you're running Cua from VS Code or Cursor, have you checked out this issue? https://github.com/trycua/cua/issues/61

Feel free to ping me on Discord (I'm francesco there) - happy to hop on a quick call to help debug: https://discord.com/invite/mVnXXpdE85

jeol_wa

Amazing, I was thinking of implementing something like this after taking a course on Building Code Agents with Smolagents from DeepLearning.AI

I wanted to look at a Docker alternative to e2b

frabonacci

Thank you! If you're looking for a Docker alternative to something like e2b, we're planning to ship a containerized version of c/ua that also handles VNC and model hosting. Right now we're using the Lume CLI (https://github.com/trycua/cua/tree/main/libs/lume) with an API server on the host as a lightweight alternative, but the Docker setup will make it easier to self-host and extend. Would love to hear what kind of workloads or use cases you had in mind!

gavinbains

Legendary. This is going to be very helpful, and the TAM is getting bigger. Thank you guys for this, and for all the learnings in-batch -- I'm excited for the future!

I reckon I could run this for buying fashion drops, is this a use case y'all have seen?

frabonacci

Appreciate that a lot! Yep - buying fashion drops, limited releases, ticketing, etc. are all great fits. Cua can also bypass CryptoJS-based encryption and other anti-bot measures, so it plays nicely with modern web apps out of the box.

otterley

Can someone ELI5 what problem is being solved here?

frabonacci

Cua's basically a virtual Mac/Linux box that any LLM can drive: move the mouse, click buttons, type stuff. So it can use any desktop app like a human would, even if there’s no API

otterley

Is there a big market for this? What are the envisioned use cases?

SpaceL10n

Not sure about market size, but we're evaluating computer use agents for public kiosks. Lots of local government authorities are deploying kiosks to improve access to services. Housing authorities, police departments, courthouses, etc. In most cases, this means running some preexisting govt website inside of a full screen webview application (Electron, NW.js, etc.)

Agents seem exciting to us because, have you ever tried getting an 80-year-old man to figure out how to pay his town taxes online? Or how to register for some obscure permit?

We hope agents will be able to guide these users to some degree. So many users struggle with basic information and interfaces.

Picture this:

User walks up to kiosk. Wants to pay property tax bill. They have to study the kiosk/website homepage, sift through dozens or hundreds of options/menus/pages (or go through "wizards") to get to the right page for their issue. Then they have to figure out how to use that page!

These kiosks/websites usually support many functions, not just paying property tax.

So the user gets frustrated and says, "I just want to pay my property tax."

Enter the agent.

Anything that "improves access to public services" is what our customers are paying for. And we def see this as a viable option.

[deleted]

badmonster

Congrats on the launch! Love this idea. How does the LLM interact with the VM: screen+metadata as JSON, or higher-level planning?

frabonacci

Thanks, really appreciate it!

The LLM interacts with the VM through a structured virtual computer interface (cua-computer and cua-agent). It’s a high-level abstraction that lets the agent act (e.g., “open Terminal”, “type a command”, “focus an app”) and observe (e.g., current window, file system, OCR of the screen, active processes) in a way that feels a lot more like using a real computer than parsing raw data.

So under the hood, yes, screen+metadata are used (especially with the Omni loop and visual grounding), but what the model sees is a clean interface designed for agentic workflows - closer to how a human would think about using a computer.

If you're curious, the agent loops (OpenAI, Anthropic, Omni, UI-Tars) offer different ways of reasoning and grounding actions, depending on whether you're using cloud or local models.

https://github.com/trycua/cua/tree/main/libs/agent#agent-loo...
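
The act/observe split described above can be pictured as a small loop. This is only an illustrative sketch: `Observation`, `Action`, `Computer`, `FakeComputer`, `run_loop`, and `scripted_policy` are hypothetical names, not the cua-computer or cua-agent API, and a scripted function stands in for the model.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class Observation:
    active_window: str
    screen_text: str  # e.g. OCR output in a real system


@dataclass
class Action:
    kind: str  # "open_app", "type", or "done" in this sketch
    argument: str = ""


class Computer(Protocol):
    def observe(self) -> Observation: ...
    def act(self, action: Action) -> None: ...


@dataclass
class FakeComputer:
    """In-memory stand-in so the loop can run without a VM."""
    window: str = "Finder"
    typed: list = field(default_factory=list)

    def observe(self) -> Observation:
        return Observation(self.window, " ".join(self.typed))

    def act(self, action: Action) -> None:
        if action.kind == "open_app":
            self.window = action.argument
        elif action.kind == "type":
            self.typed.append(action.argument)


def run_loop(computer: Computer, policy: Callable[[Observation], Action], max_steps: int = 10) -> int:
    """Alternate observe -> decide -> act until the policy says done; return actions performed."""
    for step in range(max_steps):
        action = policy(computer.observe())
        if action.kind == "done":
            return step
        computer.act(action)
    return max_steps


def scripted_policy(obs: Observation) -> Action:
    """Stands in for the model: open Terminal, type one command, then stop."""
    if obs.active_window != "Terminal":
        return Action("open_app", "Terminal")
    if "ls" not in obs.screen_text:
        return Action("type", "ls")
    return Action("done")
```

Calling `run_loop(FakeComputer(), scripted_policy)` performs two actions and then stops. The `max_steps` cap is there because a real model may never emit a "done" action on its own.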

baritone

First off, this is great, and I think there are use-cases for this. Being able to even partially isolate could be helpful.

Second, as a user, you’d want to handle the case where some or all of these have been fully compromised. Surreptitiously, super-intelligently, and partially or fully autonomously, one container or many may have access to otherwise isolated networks within homes, corporate networks, or some device in a high security area with access to nuclear weapons, biological weapons, the electrical grid, our water supply, our food supplies, manufacturing, or even some other key vulnerability we’ve discounted, like a toy.

While providing more isolation is good, there is no amount of caution that can prevent calamity when you give everyone a Pandora’s box. It’s like giving someone a bulletproof jacket to protect them from fox tapeworm cancer or hyper-intelligent, time-traveling, timespace-manipulating super-Ebola.

That said, it’s the world we live in now, where we’re in a race to our demise. So, thanks for the bulletproof jacket.

dhruv3006

One-shot VMs would be nice. An ephemeral VM spins up, the agent runs its task, and the VM is deleted: perfect for CI pipelines.

frabonacci

100% - ephemeral VMs are on the roadmap. Perfect for CI: spin up, run the agent, nuke it
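
The "spin up, run the agent, nuke it" lifecycle maps naturally onto a context manager. A minimal sketch, assuming the VM backend exposes some create and delete calls; `ephemeral_vm` and `FakeBackend` are hypothetical and only mimic such a backend in memory:

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ephemeral_vm(create: Callable[[], str], delete: Callable[[str], None]) -> Iterator[str]:
    """Create a VM, yield its id, and always delete it afterwards."""
    vm_id = create()
    try:
        yield vm_id
    finally:
        delete(vm_id)  # runs whether the task succeeded or raised


class FakeBackend:
    """In-memory stand-in for a real VM backend."""

    def __init__(self) -> None:
        self.alive: set = set()
        self._next = 0

    def create(self) -> str:
        self._next += 1
        vm_id = f"vm-{self._next}"
        self.alive.add(vm_id)
        return vm_id

    def delete(self, vm_id: str) -> None:
        self.alive.discard(vm_id)
```

In a CI job the task would run inside `with ephemeral_vm(backend.create, backend.delete) as vm_id:`. The `finally` clause is the important part, since a failed test run is exactly when leftover VMs tend to pile up.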

jeol_wa

perfect

orliesaurus

bravi! the future is the Agent OS - How robust is the UI element detection and interaction across different apps and when navigating complex menus? Is it resistant to UI changes? That's often where these automations get brittle.

thank you e forza Cua

frabonacci

UI detection’s a big focus - we use visual grounding + structured observations (like icons, OCR, app metadata, window state), so the agent can reason more like a user would. It’s surprisingly robust even with layout shifts or new themes
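
To illustrate why label-based grounding tolerates layout shifts better than hard-coded coordinates, here is a small sketch. It assumes a detector has already produced labeled bounding boxes; `Element` and `find_click_point` are hypothetical names and this is not how Cua implements it:

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Element:
    label: str                       # text from OCR or an icon caption
    bbox: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def find_click_point(elements: Sequence[Element], wanted: str) -> Optional[Tuple[int, int]]:
    """Return the center of the first element whose label contains `wanted`, ignoring case."""
    needle = wanted.casefold()
    for el in elements:
        if needle in el.label.casefold():
            left, top, right, bottom = el.bbox
            return ((left + right) // 2, (top + bottom) // 2)
    return None  # nothing matched; the caller should re-observe rather than click blindly
```

If a redesign moves the "Submit" button, the click point moves with it, because the lookup is by label on every observation. It still breaks when the label itself changes, which is where a model's judgment has to take over.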

winwang

Congrats! How do you guys deal with SOC2/HIPAA/etc.? Or are those separate concerns?

frabonacci

Thanks! Great question - those are definitely relevant, but they depend a lot on the deployment model. Since CUAs often run locally or in controlled environments (e.g. a user’s own VM or cluster), we can sidestep a lot of traditional SOC2/HIPAA concerns around centralized data handling. That said, if you're running agents across org boundaries or processing sensitive data via cloud APIs, then yeah - those frameworks absolutely come into play.

We're designing with that in mind: think fine-grained permissioning, auditability, and minimizing surface area. But it’s still early, and a lot of it depends on how teams end up using CUAs in practice.
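
As a rough picture of what fine-grained permissioning plus auditability could look like, here is a sketch of an allowlist check that records every attempt. `GuardedExecutor`, `PermissionDenied`, and the action names are hypothetical and do not describe anything Cua ships:

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


class PermissionDenied(Exception):
    pass


@dataclass
class GuardedExecutor:
    allowed: frozenset                            # action names the agent may perform
    audit_log: list = field(default_factory=list)

    def run(self, action: str, fn: Callable[[], Any]) -> Any:
        """Log the attempt first, then either run fn or refuse."""
        permitted = action in self.allowed
        self.audit_log.append({"ts": time.time(), "action": action, "allowed": permitted})
        if not permitted:
            raise PermissionDenied(action)
        return fn()
```

Logging before the permission check means denied attempts show up in the audit trail too, which is usually what a reviewer wants to see.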

tomatohs

Would love to use this for TestDriver, but needs to support Windows :*(

frabonacci

Windows host support is on our roadmap - we're currently exploring virtualization options with KVM/QEMU. Please join the discussion on our Discord: https://discord.com/invite/mVnXXpdE85

brene

will this also be available as a hosted service? Or do you have instructions on how to manage a fleet of these manually while you're building the orchestration workflows?

frabonacci

Yes, we’re currently running pilots with select customers for a hosted service of Cua supporting macOS and Windows cloud instances. Feel free to reach out with your use case at founders@trycua.com

sagarpatil

Love your accent!

frabonacci

Thank you!!

taikon

How's it different from e2b computer use?

frabonacci

We’re still figuring things out in public, but a few key differences:

- Open-source from the start. Cua’s built under an MIT license with the goal of making Computer-Use agents easy and accessible to build. Cua's Lume CLI was our first step - we needed fast, reproducible VMs with near-native performance to even make this possible.

- Native macOS support. As far as we know, we’re the only ones offering macOS VMs out of the box, built specifically for Computer-Use workflows. And you can control them with a PyAutoGUI-compatible SDK (cua-computer) - so things like click, type, scroll just work, without needing to deal with any inter-process communication.

- Not just the computer/sandbox, but the agent too. We’re also shipping an Agent SDK (cua-agent) that helps you build and run these workflows without having to stitch everything together yourself. It works out of the box with OpenAI and Anthropic models, UI-Tars, and basically any VLM if you’re using the OmniParser agent loop.

- Not limited to Linux. The hosted version we’re working on won’t be Linux-only - we’re going to support macOS and Windows too.
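
On the "PyAutoGUI-compatible SDK" point, the idea of hiding the inter-process communication behind familiar calls can be sketched as a thin facade. The method names below mirror PyAutoGUI's `click`, `write`, and `scroll`; the `RemoteInput` class and the JSON message shape are assumptions for illustration, not the cua-computer wire format:

```python
import json
from typing import Callable


class RemoteInput:
    """Exposes click/write/scroll and forwards each call as one JSON message."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send  # in a real setup this could be a WebSocket's send method

    def _emit(self, action: str, **params) -> None:
        self._send(json.dumps({"action": action, "params": params}, sort_keys=True))

    def click(self, x: int, y: int) -> None:
        self._emit("click", x=x, y=y)

    def write(self, text: str) -> None:
        self._emit("write", text=text)

    def scroll(self, clicks: int) -> None:
        self._emit("scroll", clicks=clicks)
```

Passing `list.append` as `send` gives a recorder that is handy for testing automation scripts without a VM or a display.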

orliesaurus

Active development of CUA, according to GitHub

xdotli

THIS IS FIRE been wanting this for ages

frabonacci

Thank you for your support!

throw03172019

This is precisely what I am looking for but for Windows. We need to automate some Windows native apps.

In the meantime, I’ll give this a shot on macOS tonight. Congrats!

frabonacci

Yes - pig.dev is a great product! You should definitely check it out.

Also, let us know on Discord once you’ve tried out c/ua locally on macOS: https://discord.com/invite/mVnXXpdE85

shykes

Check out pig: https://pig.dev

(I am not affiliated)

throw03172019

I do recall looking at it before but was concerned about HIPAA if they are storing data on their servers as well.

Also, is the project still active? No commits for 2 months is odd for a YC startup in current batch :)

farazmsiddiqi

i love this — isolation and permissioning for computer use agents. why can’t i use regular docker containers to deploy my computer use agent?

frabonacci

Glad you love it! Right now, we’re relying more on the Lume CLI and its API server rather than a full Docker setup. However, we’ll soon be shipping a Docker interface that’ll handle VNC and model hosting (through Docker Model Runner). Stay tuned for that!

gitroom

man this is insane - being able to spin up secure agent vms this easy would save me so much pain lmao

frabonacci

Thanks! I'd love to hear more about your use case!

contr-error

This is amazing, especially if it helps facilitate astroturfing, such as these comments made by fresh users, all with AI-generated responses from frabonacci:

https://news.ycombinator.com/threads?id=SkylerJi

https://news.ycombinator.com/threads?id=zwenbo

https://news.ycombinator.com/threads?id=ekarabeg

https://news.ycombinator.com/threads?id=jameskuj

dang

It's not intentional - it's YC founders excitedly telling their friends (and especially their YC batchmates) that they launched on HN. They didn't ask anyone to vote or comment, and the responses were not AI-generated. (That last point should be obvious btw; no one needs AI to write "Thank you - we appreciate it", and frabonacci was obviously just being polite.)

Here's what you guys need to understand:

(1) Not everyone spends hours on Hacker News—many casual users have no idea about the culture of this place re voting rings, booster comments, and so on.

(2) Many people enjoy congratulating their friends when they reach a major milestone.

(3) Other sites have a culture where this kind of thing is fine.

HN is different, of course, and we tell founders to stop this from happening. In fact, I basically yell it at them in the Launch HN guide: https://news.ycombinator.com/yli.html#noboost. I also yell it at them in person every chance I get—I do my best to scare them! But if you think that including something in a list of rules plus repeating it over and over in person is sufficient to get a message across, may I introduce you to the Measure Zero Effect: no matter how often you repeat something, the set of users who receive the message has measure zero (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...)

As it happens, I saw those comments in the thread (mostly the same ones you listed), marked them offtopic, and emailed the founders as soon as I could:

"Btw, did you send a message to batchmates/friends about this thread? I'm seeing a lot of booster comments in there now. This is not good for you! (See https://news.ycombinator.com/yli.html.)

Fortunately though, there are a lot of organic comments as well so I can just move the booster ones lower down and they shouldn't harm anything. Still, if you have a way to tell your friends not to do that, it would be good. Send them to https://news.ycombinator.com/yli.html as well, if you like :) - the text about that is repeated and in a bold font for a reason!"

They replied that their Discord was probably spreading word of the launch and they'd add a message asking people to stop. After that, it mostly stopped.

contr-error

Thanks for taking the time to spell this out and sorry for making it necessary. As it happens, I thought the product looked cool, so a belated congratulations to the founders, and an apology to frabonacci! I'm glad there was a benign explanation for what I was seeing, and I'm a little disappointed in my lack of imagination. Dead internet may be coming, but it's not here yet ;-)

dang

No need to apologize but thanks for the kind reply! It feels good to read something like this and end up on more or less the same wavelength. Doesn't happen often enough!

TehCorwiz

The dead internet is real.

Seriously though, this kind of behavior should be considered a violation of the social contract.

3s

this is really cool! congrats on the launch

frabonacci

Thank you - we appreciate your support!

zfiber

[dead]

ekarabeg

Congrats on the launch! Awesome product!

frabonacci

Thanks — we really appreciate your support!

jameskuj

A superfan of this product!

frabonacci

Thank you - your support means a lot to us!

mountainriver

This is cool! We built a similar thing with AgentDesk https://github.com/agentsea/agentdesk

Would love to chat sometime!

frabonacci

I love AgentDesk’s take on Kubernetes - it’s something we had considered as well, but it didn’t make much sense for macOS since you can only spin up two macOS VMs at a time due to Apple’s licensing restrictions.

Feel free to join our Discord so we can chat more: https://discord.com/invite/mVnXXpdE85

[deleted]

mountainriver

That's a fantastic way to get your IP banned :)

abshkbh

https://github.com/abshkbh/arrakis Also building in this space using MicroVMs. Currently working on a Mac port. Would love to connect - abshkbh AT gmail.com

reindent

That's great.

Also built something on top of Browser Use (Nanobrowser) and Docker.

https://github.com/reindent/nanomachine

Just finished planning and shell capabilities

Let's chat @reindentai (X)

frabonacci

Sure - just followed you back!

zwenbo

Amazing product! Congrats on the launch!

frabonacci

Thank you so much - we truly appreciate your support!

swanYC

Love this!

frabonacci

Thank you - we appreciate it!

SkylerJi

This is insane y'all

frabonacci

Thank you - we appreciate it!
