will have a hosted platform soon with GPU support (vulkan)
apitman•Jun 26, 2026
Can confirm smolvm accelerated Vulkan compute worked great in my tests. Excellent project.
patabyte•Jun 26, 2026
This seems roughly similar to Google's Cloud Run gen2 instance types. My understanding is that with the second generation, they run microVMs bootstrapped from a container image.
metadat•Jun 26, 2026
How does this compare to E2B?
zobeirhamid•Jun 26, 2026
e2b supports UDP and the pricing structure is different.
ushakov•Jun 26, 2026
i’d say what AWS released looks closer to a bare compute primitive. E2B is further up the stack and ships everything around the VM, like snapshots, networking, and integrations.
also, there’s no lock-in: E2B is open-source and can be hosted on any cloud (AWS included).
plus it supports bigger boxes, higher concurrency, and longer timeouts (24hr).
disclaimer: i work at E2B
mdeeks•Jun 26, 2026
> MicroVMs support up to 8 hours of total runtime
Does this mean you effectively can't use them as long-lived developer environments? It sounds like even if you suspend them, this is the hard limit on the total time it can run.
amw-zero•Jun 26, 2026
You can use them for dev environments.
You just have to finish development in 8 hours.
lab14•Jun 26, 2026
I'm assuming you can launch them again after 8 hours.
mmastrac•Jun 26, 2026
They are long-lived if you're a mayfly.
But I think the point is that they should be cheap to set up, and because of the short life, never really contain anything except the potential to compute when needed, not important data.
8note•Jun 26, 2026
Lambdas are ephemeral on compute, but couldn't you connect up EFS for your long-lived data?
Then when you launch the next one, it's like you are still there?
mdeeks•Jun 26, 2026
EFS is extremely slow for many workloads. We tried it for builds and various other common use cases for coding agents, and the performance just isn't there. I'm guessing lots of small random reads/writes are never going to work well.
topspin•Jun 26, 2026
It's just a time limit on the life of a single MicroVM.
Using this for a long-lived "developer environment" would be extraordinarily expensive anyhow. Scaling the vCPU + RAM cost of these to the same shape as a compute-optimized Graviton On-Demand EC2 instance (16 vCPU x 32 GB RAM) comes out to about 4x the cost.
So don't do that. Just use an EC2 instance.
tomComb•Jun 26, 2026
But these have near instant suspended/resume, and they even have vertical scaling of the ram, which is a great feature that’s not very common.
alFReD-NSH•Jun 26, 2026
In theory, you could set up a process to move data/filesystem between sessions into and out of s3.
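A minimal sketch of that idea, assuming the workspace is a single directory and that you already have some way to put and get bytes in S3 (for example boto3's `put_object`/`get_object`, not shown here). `snapshot_workspace` and `restore_workspace` are hypothetical helper names, not part of any AWS API.

```python
import io
import tarfile
from pathlib import Path


def snapshot_workspace(workspace: Path) -> bytes:
    """Pack every file under `workspace` into an in-memory .tar.gz blob."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # arcname="." stores paths relative to the workspace root
        tar.add(str(workspace), arcname=".")
    return buf.getvalue()


def restore_workspace(blob: bytes, dest: Path) -> None:
    """Unpack a blob produced by snapshot_workspace into `dest`."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, links escaping dest, devices, etc.
            tar.extractall(str(dest), filter="data")
        else:
            tar.extractall(str(dest))
```

At the end of a session you would upload `snapshot_workspace(...)` under some key; at the start of the next MicroVM, download it and call `restore_workspace`. Whether this is fast enough depends entirely on how big the workspace gets.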
fizx•Jun 26, 2026
Firecracker doesn't reclaim RAM well, so they put this limit in to make sure you don't suffer from the eventual memory bloat problems.
yiyingzhang•Jun 26, 2026
How's this different from Firecracker?
tptacek•Jun 26, 2026
Presumably it is Firecracker. It's just a different shape of offering, along with Lambda and Fargate, which are also Firecracker.
tekla•Jun 26, 2026
The literal first paragraph has a highlighted link that says this runs on Firecracker
simonw•Jun 26, 2026
It's a product that runs on top of Firecracker.
robmccoll•Jun 26, 2026
What does the actual startup latency look like? Does it depend on the size of the resulting image?
simonw•Jun 26, 2026
I tried this a few days ago. Once you have an image built and ready startup time is fast, but building that original image took 5-10 minutes.
I think it's designed for building an image once and then reusing it many, many times.
It's about time AWS got into the agent sandbox game.
The startups in this space right now don't provide much value on top of the cloud providers they're wrapping. They don't tend to be run by experienced infra people either, so they seem very vibecoded, insecure, janky, etc. They're also significantly overpriced because they're marking up already expensive providers.
Something surprising from my own experience is that while there's certainly a huge role for async agents in cloud sandboxes, async agents running locally seem more useful in many cases.
colesantiago•Jun 26, 2026
Agreed.
Most of the startups are just wrappers around AWS and significantly more expensive.
Agents need sandboxes that are cheaper so that they can run thousands of them.
I feel that AWS, GCP and all the other cloud providers can provide this natively.
But still it would be nice to self host.
The best part of self-hosting is that you own it as well: no rug pulls from the laundry list of reselling providers that could go away at any time.
It would be nice to have a one-click sandbox agent on a self-hosted instance that is free, fast (you can pay a bit more for more intensive operations), and open source.
tastyeffectco•Jun 27, 2026
There are plenty of OSS solutions available for your needs. Do you need real isolation, or is Docker hardening sufficient? If hardening is sufficient, check out https://github.com/tastyeffectco/sandboxd/, which I'm using internally for so many use cases.
To be fair to jacobgold, at this point there is more or less an AWS services announcement singularity: if you didn't see the announcement when it happened you may never catch up or even find it in the wretched console website.
Though I did know about this one! (Because I saw the announcement.)
jacobgold•Jun 26, 2026
It just seems pretty different to me? I've lots of similar stuff and yet I still don't understand what it's for and how it works after scanning the docs quickly.
thundergolfer•Jun 26, 2026
Major sandbox providers (e.g. Modal) run on non-hyperscaler bare metal, not AWS, so they don't need to mark up AWS's markup. Thus, prices are comparable to or better than AWS.
jacobgold•Jun 26, 2026
In that case it's still overpriced because they're charging hyperscaler prices without offering a hyperscaler level service in terms of scalability, reliability, security, trust, etc.
ilaksh•Jun 26, 2026
What's the best provider to self-host Firecracker? I feel that AWS is not a safe or cost-effective option for a self-funded startup or small business. Although is anything cost effective anymore? Hetzner just had a massive price hike.
Part of it might just be that I am old and inflation is catching up with my understanding of prices.
But as far as AWS I still have to say no thanks. Imagine some group actually started using my hosted AI agent service for something compute and network intensive. It could turn into $2000 overnight and if I didn't account for one of the numerous types of AWS charges, I might have only collected $500 for credits purchases.
Or it could easily be ten times that. But who am I kidding. No one is going to use my agents. So it doesn't matter if it's gvisor or Firecracker or whatever.
CuriouslyC•Jun 26, 2026
Cloudflare is cost effective for certain types of workloads, I've heard of businesses getting surprisingly far on the $5/mo worker plan.
Multicomp•Jun 26, 2026
At my day job, workers and sqlite-backed durable objects that quickly hibernate and quickly resume are quite nice, I prefer that to standard lambda.
Multicomp•Jun 26, 2026
This reminds me of Fly.io's model off the top of my head, though it's not a self-hosted firecracker as such.
ilaksh•Jun 26, 2026
I specifically complained to a fly.io staff on here about their "gotcha, b*tch" usage based pricing which they basically copied from AWS, and they stood by it and other people here backed them up. No one is giving me a pile of free money, so I can't risk that kind of thing.
tptacek•Jun 26, 2026
Exactly what did we copy from AWS here? You could get a long way in our decisionmaking process generally by just consciously avoiding what AWS does.
ilaksh•Jun 27, 2026
The short version is it seems like a big "gotcha" that there is no way to limit bandwidth or spending on that or other resources ahead of time, and that might be a deliberate business model that is more aimed at well-funded startups or large companies that are monitoring costs much less closely than an individual or small business.
It's not necessarily too hard to just not dynamically spawn a bunch of machines, but the bandwidth one is going to sneak up on people.
tptacek•Jun 27, 2026
Lots of people want limits. They might make sense for something like Sprites, where the end-users are often (but not always) individual developers. They're terrible for hosting fixed-function applications. The real gotcha is having limits, because that's the host effectively taking your app down for you.
I know talk is cheap, but I've been in the room for every one of these discussions over the last 6 years at Fly.io, and if we could have come up with a system to make limits workable, we would have done it. Charging for stuff you don't want is bad business, and we make our money from happy, growing customers (the open secret of hosting is that a huge chunk of usage is basically a loss leader search for a much smaller number of ultra-profitable customers).
These pricing models --- at least outside of AWS (I'm not cynical about them but their incentives are different from indies) --- are not meant to fuck you.
ilaksh•Jun 27, 2026
Why is it that you claim limits are unworkable? If you can track or enforce it (others have been for years) then couldn't you make it an optional field or checkbox?
tptacek•Jun 27, 2026
Every limit is a commanded outage. We can refund an unexpectedly high bill. We can't refund downtime.
tough•Jun 27, 2026
You’re optimizing for one class of customer by denying another class a choice they explicitly want.
tptacek•Jun 27, 2026
That's true, in the sense that it is true of every coherent business. What would not be a true statement is that we somehow profit from the discomfort of the "disfavored" cohort of customers here. If we could serve them reasonably without making terrible compromises for everybody else, we would do so, and we would profit from that.
ilaksh•Jun 27, 2026
So write "WARNING: your service will go down if you exceed this limit. We cannot provide refunds for this. To avoid outages in the event of unanticipated traffic, do not enter a spend limit."
tptacek•Jun 27, 2026
A learning from the past 6 years: warnings do absolutely nothing. If your answer to a problem is a warning label, you have no answer for the problem.
vidarh•Jun 26, 2026
Hetzner is still cheap compared to AWS.
magnio•Jun 26, 2026
Yeah, the big 3 cloud markup is so high that most VPS providers can hike price 10x and they are still cheaper.
Even on older system types, you can provision .metal sizes and run anything on them.
dbmikus•Jun 26, 2026
Why do you want to self-host vs. using one of the many providers out there?
Daytona, E2B, OpenComputer, Freestyle, Blaxel, Vercel, Modal, Cloudflare, Tensorlake, Superserve, etc. etc.
Some of them work by pre-purchasing credits, so you can control the blast radius of spend.
Also, if you want a more embedded sandbox runtime as a library instead of a daemon + REST API, you can check out libkrun (and friendly layers on top of it like https://microsandbox.dev/ and https://smolmachines.com/)
khurs•Jun 26, 2026
self host = better spec machine for same price.
rvz•Jun 26, 2026
Even with the Hetzner price increase, it is still far cheaper than all of them with self-hosting.
alexellisuk•Jun 26, 2026
For self-hosting, have a look at what we're building with SlicerVM.com (disclosure: I'm the founder). Also runs just as well on Apple Silicon.
We run quite a few Slicer instances on mini PCs and Ryzen builds, and also on Hetzner (and yes, ouch: 120 EUR/mo up to ~550 EUR/mo for 16 cores / 128 GB RAM feels almost unfair).
ilaksh•Jun 26, 2026
Interesting. How does this compare to Firecracker? Also PhoenixNap looks really interesting. Do you happen to know if Linux software compatibility holds up on Ampere? 80 cores for $400 a month seems pretty good.
nyrikki•Jun 26, 2026
Are you looking for highly ephemeral nodes, where you are writing automation that will use the API to orchestrate it? Or do you just want small microVMs that you launch and kill?
Firecracker just has a RESTful Unix socket with a defined API and launches KVM VMs with limited options.
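To make the "RESTful Unix socket" point concrete, here is a minimal Python sketch that boots a VM through that socket. It assumes Firecracker was started with `--api-sock` pointing at `socket_path` and that the kernel and rootfs files exist; the endpoints (`/machine-config`, `/boot-source`, `/drives/{id}`, `/actions`) follow Firecracker's documented API, but the boot args, sizes and helper names are illustrative.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that speaks HTTP over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str):
        # The host name only ends up in the Host: header.
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def boot_requests(kernel_path, rootfs_path, vcpus=1, mem_mib=256):
    """Ordered (method, path, body) calls that configure and start one microVM."""
    return [
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs_path,
            "is_root_device": True,
            "is_read_only": False,
        }),
        # InstanceStart goes last: most of the config above is pre-boot only.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


def send_all(socket_path, requests):
    """Send each call in order; stop at the first non-2xx response."""
    for method, path, body in requests:
        conn = UnixHTTPConnection(socket_path)
        try:
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json",
                                  "Accept": "application/json"})
            resp = conn.getresponse()
            payload = resp.read()
        finally:
            conn.close()
        if not 200 <= resp.status < 300:
            raise RuntimeError(f"{method} {path} -> {resp.status}: {payload!r}")
```

Usage would be something like `send_all("/tmp/firecracker.socket", boot_requests("vmlinux.bin", "rootfs.ext4"))`. Everything else (networking, snapshots, orchestration across hosts) is on you, which is the tradeoff versus libvirt.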
For custom SMB I still think libvirt is a lower entry cost and may have transferable use cases to longer lived VMs, so you can just launch a qemu microvm[0] and use virsh and/or libvirt xml to set up the networking.
The ~400ms boot time of a qemu microvm vs ~120ms for firecracker may not be an issue for some loads, but qemu will also allow you a bit more density of placement than firecracker. qemu microvms will use a bit more memory individually, but they will also tend to use less real system memory with a larger number of microVMs.
It is all tradeoffs, and kata containers are yet another option that may apply depending on your use case.
You can run your own firecracker or qemu/kvm microvms on most instances that allow nested hypervisors, or on a local host. If cost containment is critical to you this is one possible way forward.
Really it just depends on whether you want/need RESTful control or need to support short-lived serverless functions, or whether CLIs fit better and you may want to support full VMs.
They both are just Virtual Machine Monitors that targeted different use cases and decided on different tradeoffs.
Just be careful about hosting traditional containers and microVMs on the same system; that config is going to be problematic due to fundamental reasons that are too complex to properly address here.
Thanks. I just looked into qemu microvms. Might be an option but I already have gvisor set up.
coppsilgold•Jun 26, 2026
The simplest worthwhile DIY sandbox you can have is to layer two tools: bwrap and gvisor.
`bwrap args -- gvisor args do args -- /path/sandboxee args`
bwrap will set up the environment and then gvisor elevates it into a true sandbox.
Standalone gvisor (not the 'do' subcommand) used to be a mess with the OCI json requirement, but recently they began work on presenting their own bwrap interface (likely to pursue AI agent uses) though I wouldn't use it myself yet.
People often look down on gvisor because they think it's some kind of syscall filter; it is not. It can use one of ptrace, seccomp or even KVM to intercept ALL syscalls and service them with its own logic (which is in Go). Basically it's a VMM and kernel in one.
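A hedged sketch of how that layering composes, as a Python helper that only builds the argv (you would run it with `subprocess.run`). The bwrap flags are one illustrative setup and `layered_argv` is a hypothetical name; `--platform` picks gVisor's interception mechanism (`systrap` is the seccomp-based one). I haven't verified every combination under every bwrap configuration: runsc may also need a writable state directory, and the `kvm` platform needs `/dev/kvm` made visible inside the sandbox.

```python
# gVisor platforms (how guest syscalls get intercepted): "systrap" is the
# seccomp-based default, "ptrace" and "kvm" are the alternatives.
RUNSC_PLATFORMS = {"systrap", "ptrace", "kvm"}

DEFAULT_BWRAP_ARGS = [
    "--ro-bind", "/", "/",    # read-only view of the host filesystem
    "--dev", "/dev",          # fresh minimal /dev (kvm would need --dev-bind /dev/kvm /dev/kvm)
    "--proc", "/proc",
    "--tmpfs", "/tmp",        # scratch space that disappears with the sandbox
    "--die-with-parent",
]


def layered_argv(program, program_args=(), bwrap_args=None,
                 platform="systrap", runsc_args=(), do_args=()):
    """Build: bwrap <args> -- runsc <args> do <args> -- program <args>."""
    if platform not in RUNSC_PLATFORMS:
        raise ValueError(f"unknown gVisor platform: {platform!r}")
    if bwrap_args is None:
        bwrap_args = DEFAULT_BWRAP_ARGS
    return ["bwrap", *bwrap_args, "--",
            "runsc", f"--platform={platform}", *runsc_args,
            "do", *do_args, "--",
            program, *program_args]
```

The split mirrors the comment: bwrap decides what the process can see, and gVisor decides how its syscalls get serviced.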
eperot•Jun 26, 2026
Any reason why you wouldn't use gVisor's bwrap interface yet? We're working on it precisely to make DIY sandboxing on Linux as easy as possible in order to get Linux-sandboxing-at-home to mature beyond the current syscall-filter-and-namespaces duct tape stage, so I'm curious to know what you'd like to see.
coppsilgold•Jun 26, 2026
It just didn't seem fully baked yet. The 'do' subcommand works fine, while the 'bwrap' alias has this problem when executing 'bash -li': `bash: cannot set terminal process group (1): Not a tty`. Also, the EROFS feature of 'do' should probably be included in 'bwrap'; it can be useful. Include overlay options.
Also some things you can do to make gvisor better are Wayland passthrough, vulkan support (or virtio native context). Being able to get gvisor to populate a network interface inside itself through a 'passt' (or 'containers/gvisor-tap-vsock') socket on the host would also be ergonomic. All of those are available on 'muvm' (based on libkrun) which if you have the time to set up is the next step in DIY sandboxing of graphical apps as well. See: <https://git.clan.lol/clan/munix>
eperot•Jun 26, 2026
Thanks. We're working on rootless network setup to make `runsc do --rootless` work with networking enabled when `passt` is installed right now. See issue #13337 (yes that's a cool issue number) which should unblock this.
The tty issue is known and should be fixed soon too, though contributions are welcome, as it sounds like it should be a simple fix and we love more contributions :)
FWIW, X11 apps work well, I have a personal hacky project in which I've been running Librewolf in gVisor, with the window being reflected as a native Wayland window. It uses `Xvfb -fbdir` aimed at a bound tmpfs mount to get a shared memory region containing the window's pixel data which can be read directly from out of the sandbox, has Pulseaudio audio passthrough, and a socket server passing through mouse/keyboard events to make the window interactive. Works smoothly even for YouTube playback, and I successfully played a game of Unreal Tournament 2004 at 24fps in it, with no noticeable mouse/keyboard latency :)
We're basically making baby steps to get there less hackily.
Thanks for the feedback!
coppsilgold•Jun 26, 2026
That's good to hear! Hopefully the passt approach you are pursuing will include the ability to use an existing passt socket and not just launch one for you.
Wayland is tricky because memory buffers are shared between the compositor and the client. crosvm (also by Google) adopted two custom solutions to it, one of which got merged into mainline.
Achieving audio passthrough is trivial as it's just a unix socket. `-host-uds=all`
eperot•Jun 27, 2026
That's the approach I initially took, but experienced some combination of noticeable stuttering and latency regardless of which buffering strategy I tried... Had to switch to a shared memory ring buffer, along with some adaptive playback speed shenanigans (sometimes imperceptibly speeding up playback when falling behind production of audio samples, sometimes imperceptibly slowing down when there's less than a few milliseconds' worth of samples left in the ringbuffer), in order to achieve actually-gapless playback.
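The adaptive-speed idea can be sketched as a ring buffer plus a rate controller. This is a plain in-process Python version for illustration only (a real implementation would place the buffer in shared memory visible to both sides of the sandbox). The 5 ms / 40 ms watermarks and the 0.5% nudge are assumed values, not the numbers from the project described above.

```python
class AudioRing:
    """Fixed-size FIFO of audio samples (not thread-safe as written)."""

    def __init__(self, capacity: int):
        self.buf = [0.0] * capacity
        self.capacity = capacity
        self.read_pos = 0
        self.write_pos = 0
        self.count = 0  # samples currently buffered

    def write(self, samples) -> int:
        """Append samples; drops what doesn't fit instead of overwriting unread audio."""
        written = 0
        for s in samples:
            if self.count == self.capacity:
                break
            self.buf[self.write_pos] = s
            self.write_pos = (self.write_pos + 1) % self.capacity
            self.count += 1
            written += 1
        return written

    def read(self, n: int) -> list:
        """Pop up to n samples in FIFO order."""
        n = min(n, self.count)
        out = []
        for _ in range(n):
            out.append(self.buf[self.read_pos])
            self.read_pos = (self.read_pos + 1) % self.capacity
        self.count -= n
        return out


def playback_rate(buffered: int, sample_rate: int,
                  low_ms: float = 5.0, high_ms: float = 40.0,
                  nudge: float = 0.005) -> float:
    """Speed up slightly when samples pile up, slow down when nearly empty."""
    buffered_ms = 1000.0 * buffered / sample_rate
    if buffered_ms > high_ms:
        return 1.0 + nudge  # consumer falling behind the producer
    if buffered_ms < low_ms:
        return 1.0 - nudge  # only a few ms left: stretch to avoid a gap
    return 1.0
```

The player would call `playback_rate(ring.count, 48000)` before each period and resample by that factor; small nudges are what keep the speed change imperceptible.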
Flawless playback. I think it's a default pipewire configuration.
rendaw•Jun 27, 2026
Why do you need gvisor and not just bwrap? Is this a "more is better" thing?
coppsilgold•Jun 27, 2026
Properly configured (including strict seccomp) bwrap on its own will be sufficient 99% of the time. But ultimately you are at the mercy of the enormous kernel attack surface and the 0days that result from it.
If you do anything valuable and are compromised it may get brought to the attention of whoever organized the automated attack (ex. AI agent doing interesting proprietary work that installed something it shouldn't have, chat logs got uploaded and analyzed) and they will then sell you to someone with the 0days to extract more value from you. Assuming you didn't screw up and leave a back door open somewhere of course.
crabmusket•Jun 27, 2026
Not Firecracker, but Incus's system containers seem like a good middle ground between Docker and VMs.
I just went with qemu and run it on my own machine. It is portable, so you can run it on other OSes, which is handy when everything is under the same desktop app. But I was after better isolation and the ability to be fully in control of the agent environment to pair with local LLMs. As soon as you lift it to some managed environment, it becomes hard to justify all of the necessary steps to manage connections, encryption, etc., e.g. passing credentials for access to other resources.
colesantiago•Jun 26, 2026
How does this compare to Fly.io
Which is cheaper for me?
Ideally maybe self hosting would be better?
simonw•Jun 26, 2026
Fly.io doesn't set a maximum of 8 hours of alive time on your instance.
Also, MicroVMs can't be exposed directly to the web. Your code running in them can only be executed via API calls with attached auth tokens - so if you wanted to host a public facing API or website with them you'd need to implement your own additional layer in front.
Something I appreciate about Fly (disclaimer: they support my work) is that the pricing is fixed - you pay $1.94/month (less if you suspend your machine) for the smallest instance, up to $976.25/month for the largest (16 CPUs, 128GB) plus predictable costs for volume storage.
The only variable outside your control is bandwidth, and that's unlikely to cause a nasty shock.
Contrast that with the more "elastic" hosting providers (Vercel, Cloud Run): with Fly you're much less likely to get a horrifying bill if something gets overly crawled or goes viral.
anamexis•Jun 26, 2026
Fly.io's Sprites [1] do offer public web access as an option. They also have dynamic pricing.
To a first approximation everything in this space has dynamic pricing. If it's not priced dynamically, you're presumably paying a premium either on a commit or in gym pricing.
anamexis•Jun 26, 2026
I don't know what the right term is, but maybe "deterministic" pricing (this is not the right term, but maybe closer). That is, I'm not going to know how much a sprite cost until I see the bill (or look up the live usage report), whereas if I spin up a Fly Machine, I know exactly how much I'm going to pay per unit of time.
(Both make sense for their respective use cases.)
tptacek•Jun 26, 2026
Ah, that makes sense. Yeah, that's a technical limitation! I'm sure we'll work through it at some point this year, but it's a consequence of the fact that for most people, most of their Sprites are dormant most of the time; it's how you comfortably get to having 20-30 Sprites (making a new one any time you do something new) for every user.
It's a good callout, a genuine difference between Sprites and Fly Machines. Believe it or not, it's intended to make Sprites cheaper than Machines.
anamexis•Jun 27, 2026
I absolutely believe it! And feel the pricing model makes perfect sense for the use case.
A way we simply suck at business: we didn't keep beating the drum about this after we wrote the policy up. We just sort of figured everyone read the blog post and moved on. We probably should have been continuously making noise about it.
What you get from having a company made almost entirely of engineers.
fcarraldo•Jun 26, 2026
Shouldn’t the title be “AWS Lambda MicroVMs”? MicroVMs are an existing concept.
alexellisuk•Jun 26, 2026
Yeah, I'm surprised Justin posted this like it was new(s). Wasn't it doing the rounds on the 22nd when it launched?
justincormack•Jun 26, 2026
I didn't post it 3 hours ago, it must have gone through the magic HN re-up process.
There are sooooo many sandbox providers out there.
They do spike on different features like:
- snapshotting and forking
- good SSH and VPN access for end-users
- agent-friendly features, like obscuring secrets at network layer
Then there's also the option to use libkrun to run local sandboxes on your own computer. That doesn't scratch the itch for hosted services, but works if your goal is to run agents inside isolated environments for your own work.
I've been working on some open-core stuff[1] to coordinate sandboxes, and we're making changes to have a library that lets people coordinate any number of remote or local sandboxes using any provider, kinda like how the Docker CLI works for managing containers, git repos, and coding agents. Flue[2] is another player in this space, and is more of a pure framework, while we're building it as an interactive product for using sandboxed agents and workflows.
Why isn't libkrun good enough for hosted stuff? I use it as a podman backend in a microservice architecture.
dbmikus•Jun 26, 2026
Firecracker has more tooling for the orchestration layer that manages many sandboxes at once. Stuff like K8S integration, an external REST API control plane, more first-class support for snapshotting, etc.
You'd have to build more of that with libkrun
The core tech of both are great though.
kodama-lens•Jun 27, 2026
Firecracker has more tooling, but setting it up and managing it is also more complicated, at least for k8s workloads. libkrun is so easy for k8s! Compile crun with libkrun support, create a symlink to crun with the name krun, done. Works like any normal pod. Firecracker with kata-containers is a lot more brittle and complicated. I've invested quite some time getting this running for a talk I'm working on.
dbmikus•Jun 27, 2026
Is the talk going to be shared online anywhere? Would be interested in checking it out later!
veverkap•Jun 26, 2026
That's super interesting - have you written up anything on this? I'd love to read it.
Then one can just pass `--runtime krun` to most podman subcommands. Alternatively, set the runtime key in the config file to make it the default.
Podman itself has "hardening" techniques, e.g. turning off the network or volumes that can be combined with this.
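Put together, the two comments above amount to something like this Python sketch. It assumes a crun binary built with libkrun support sits in `bin_dir` (crun switches to libkrun mode when invoked under the name `krun`) and that podman can resolve the runtime name `krun` from its configuration. The helper names are hypothetical, and `--network none` is just one of the hardening options mentioned.

```python
import os
from pathlib import Path


def ensure_krun_symlink(bin_dir: Path) -> Path:
    """Create bin_dir/krun -> crun; crun enables libkrun when invoked as 'krun'."""
    crun = bin_dir / "crun"
    link = bin_dir / "krun"
    if not crun.exists():
        raise FileNotFoundError(crun)
    if not link.is_symlink():
        # Relative target, so the link survives if bin_dir is moved or mounted elsewhere.
        os.symlink("crun", link)
    return link


def podman_krun_argv(image: str, cmd=(), no_network: bool = True) -> list:
    """podman invocation that runs `image` as a libkrun microVM instead of a container."""
    argv = ["podman", "--runtime", "krun", "run", "--rm"]
    if no_network:
        argv += ["--network", "none"]  # podman-level hardening on top of the VM boundary
    return argv + [image, *cmd]
```

Setting `runtime = "krun"` under `[engine]` in containers.conf would make the `--runtime` flag unnecessary.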
rvz•Jun 27, 2026
libkrun is not production-ready compared to Firecracker, which is used by 99.9% of companies.
sureglymop•Jun 27, 2026
For what exact reason is it not production ready? Or is that the stance of its maintainers?
PeterStuer•Jun 26, 2026
Setting up your own is not that hard and if you bought some compute before the Altman squeeze, very cheap.
dbmikus•Jun 26, 2026
Def!
My personal belief is that the future of an "app" is a combo:
1. micro VM
2. agent on the VM
3. software bundled into the VM
So, it should be stupid simple to run these local sandboxed apps/agents. Right now, not too hard for technical users (esp. with things like https://smolmachines.com/ and https://microsandbox.dev/), but not as easy as clicking an app icon or typing `/path/to/binary` in the CLI
spockz•Jun 26, 2026
Microsandbox claims to start faster than docker, and it is isolated from the host, and to work with OCI. Why would I still want to use docker? The only reason I can imagine is that I actually want to be able to dynamically share resources between containers instead of dividing up VMs a priori.
Starting faster than a container will need evidence since starting a container on Linux is basically instant.
spockz•Jun 27, 2026
It is instant for me when using podman, but by no means instant when using docker. Native Docker on Linux is still way faster than on macOS and Windows. But so far, podman has the lowest overhead I have seen.
mikeocool•Jun 27, 2026
> dynamically share resources
This has been a big pain point for me with various VM solutions I’ve tried. Having to allocate, say, 8GB to a sandbox, and a) having that RAM eaten up when I’m not using it and b) only having 8GB when I am using it kinda sucks.
Yes, I could stop the sandboxes when I’m not using them, but that also kinda sucks.
happens•Jun 27, 2026
The performance difference in that post is due to wasm, not the container runtime, which is also stated at the end of the post.
chrisweekly•Jun 26, 2026
I was going to add a comment praising smolmachines' smolvms. Simple, fast (sub-200ms cold start), OCI-compat, and has trivial packing to standalone 0-dep executables. No need for Docker Desktop / colima / orbstack. For those who prioritize security, kernel isolation is a meaningful benefit.
NamlchakKhandro•Jun 27, 2026
No programmable network stack though, so can't pass fake credentials to things inside vm and exchange them on the boundary
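For reference, the pattern being described (the sandbox only ever sees placeholder credentials, and something on the network boundary swaps in the real secret for approved destinations) could look roughly like this. It's a minimal Python sketch of just the substitution step of a hypothetical egress proxy; the placeholder format, `Secret`, and `swap_credentials` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Secret:
    real_value: str
    allowed_hosts: frozenset  # destinations this secret may be sent to


def swap_credentials(headers: dict, host: str, vault: dict) -> dict:
    """Replace placeholder tokens in outgoing headers with real secrets.

    `vault` maps placeholder strings (what the sandbox sees) to Secrets.
    A placeholder headed to a host that isn't allowed is refused, so the
    real value is only ever sent to its intended destination.
    """
    out = {}
    for name, value in headers.items():
        for placeholder, secret in vault.items():
            if placeholder in value:
                if host not in secret.allowed_hosts:
                    raise PermissionError(f"{name}: placeholder not allowed for {host}")
                value = value.replace(placeholder, secret.real_value)
        out[name] = value
    return out
```

The agent inside the VM can leak `FAKE_...` strings all it likes; they are worthless anywhere except through the proxy, toward the allowed hosts.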
indigodaddy•Jun 26, 2026
You basically described exe.dev
dbmikus•Jun 27, 2026
exe.dev is great, but the VMs are not really "apps". They are durable computers / VMs.
An example of a "sandboxed agent app", would be: give the app all your past emails. An agent scans them and finds sales emails you need to follow up on. It shows you the suggested follow ups in a UI, and you approve/reject them. Then, it mass sends the approved emails and emits an update to your CRM with the changes.
The sandbox is deleted once the app has run. It's ephemeral for the lifecycle of the app. And you can re-run the same app repeatedly with new inputs, but it gets the same clean starting slate.
digitaltrees•Jun 27, 2026
I am building this.
tough•Jun 27, 2026
heh I vibe-coded a little local app to have smolmachines and tart, for smolmachines i had to vibe-fork 2 deps deep to get GUI support working, but now i have linux desktop computers on smol machines!
also have support for lima/colima/podman
binsquare•Jun 27, 2026
Would like to see this! I wonder how you got GUI support working, because with Vulkan support you effectively enable running games cross-platform.
tough•Jun 27, 2026
VNC/RFB as the transport, but not just a guest-side x11vnc. I forked the local SmolVM path to start libkrun with display enabled, expose the framebuffer + keyboard/pointer input, then serve that over a loopback passworded RFB endpoint.
Local Machines waits for display_ready and embeds it. It has to be selected at VM start; no hot attach yet.
The interesting bit is the libkrun GPU/framebuffer/input plumbing; VNC is just how I got the pixels into the macOS app. The guest still needs a real graphical workload/compositor, e.g. Weston.
skybrian•Jun 26, 2026
How do you do it?
chickensong•Jun 26, 2026
It probably depends on your use case. I have a nice setup for putting claude code in a sandbox for development, but that's likely quite different from running production workloads for customers at scale.
mzaccari•Jun 27, 2026
+1 for microsandbox. I've been using their golang SDK (https://docs.microsandbox.dev/sdk/go/sandbox) @v0.5.10 to create sandboxes, attach them to agent sessions to execute, and then throw away, all in a raspberry pi 5 k3s cluster (as they have ARM support, if you're into that sort of thing). The microsandbox code is still a bit in flux (since it hasn't reached v1.0 API stability yet), but it's definitely worth checking out as it looks to have a solid foundation.
(edit: ahh sorry, meant to post this to above comment)
raesene9•Jun 27, 2026
Yep, I've got one I built and it's absolutely fine for my use cases. It has a web interface/API, custom kernels and rootfs, even the facility to set up custom Kubernetes clusters. It's been really useful for other work, like testing out vulnerabilities or security features in isolated envs.
reinitctxoffset•Jun 26, 2026
What people aren't getting with `firecracker` is utilization. Don't get me wrong, `firecracker` is great software and it's what I'm using for lightweight virtualization, but workloads are really bursty over really short periods of time now, even with the snapshot and restore that you can get if you're willing to hack on `firecracker` substantially, you hit walls where it's like, this is too much against the grain, this thing wasn't designed to bounce from 1 core to 32 to 8 to 16 to 4 to 32 to 1 seamlessly, and that's what it takes to get extreme utilization even with extremely good ML on the prediction.
I am quite sure I'm not the only person working on post-firecracker KVM.
binsquare•Jun 26, 2026
I designed my take to basically eliminate the concept of vm being a rigid box of cpu/memory with CPU oversubscription and virtio-ballooning on memory + sparse ext4.
That way it can be elastic in CPU, memory and somewhat disk.
How far are you on your take?
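A rough sketch of the memory half of that, assuming a virtio-balloon device: inflating the balloon hands guest pages back to the host, so a controller periodically sets the balloon to "everything the guest isn't using, minus some headroom". The policy and helper names are hypothetical; the request tuple follows Firecracker's `PATCH /balloon` endpoint with `amount_mib` as one concrete example, and the "used" figure would have to come from balloon statistics or a guest agent.

```python
def balloon_target_mib(total_mib: int, used_mib: int, headroom_mib: int) -> int:
    """Memory to reclaim from the guest: everything unused, minus a safety margin."""
    if min(total_mib, used_mib, headroom_mib) < 0:
        raise ValueError("memory figures must be non-negative")
    return max(0, total_mib - used_mib - headroom_mib)


def balloon_patch(total_mib: int, used_mib: int, headroom_mib: int = 512):
    """(method, path, body) for resizing the balloon via Firecracker's API."""
    return ("PATCH", "/balloon",
            {"amount_mib": balloon_target_mib(total_mib, used_mib, headroom_mib)})
```

When guest usage grows, the target shrinks and the balloon deflates, handing memory back to the guest; configuring the device with `deflate_on_oom` is a safety valve if the controller reacts too slowly.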
znpy•Jun 27, 2026
We’re adopting agent-sandbox (https://agent-sandbox.sigs.k8s.io/) as we already run most of our stuff in Kubernetes, and it’s been working very well, the only downside being that it’s a moving target since it’s still essentially in development.
stubbi•Jun 26, 2026
Interesting, I have recently started working on a project which is similar and fully open source, maybe interesting to some here. Happy to receive any kind of feedback on it.
> Didn't mean to highjack for self advertisement.
>
> As the topic matches, .... my project might be appealing to some here
That's exactly what you intended to do. That is the definition of advertising. It is true, many people might like it, so own it. Don't lie about it, even to yourself.
stubbi•Jun 26, 2026
.
wasting_time•Jun 26, 2026
Can you provide a link to your project? Self-plugs are fairly common around here, and usually appreciated (or at least not frowned upon) when it comes with juicy source code.
1. We support more than 32GB disk (as a shareable device, ideal for agentic memory)
2. We provide egress control
3. We provide a vault for secret injection (to counter prompt injection)
4. Snapshot / forking.
5. long lived sandboxes.
Everything supported in APIs and CLI for agents.
Can be used via - npx skills add instavm/skills
0xbadcafebee•Jun 26, 2026
> Containers launch in seconds, yet their shared-kernel architecture requires significant custom hardening to safely contain untrusted code
That's literally why they made Fargate. It's managed firecracker VMs with containers. They invented firecracker for this purpose. This new product is competing with Fargate, but they don't mention Fargate at all in the announcement.
> you create a MicroVM Image by supplying a Dockerfile and code packaged as a zip artifact in Amazon S3
>
> MicroVMs support up to 8 hours of total runtime
So you're already using containers with this new thing, same as Fargate! And not only that, it's more limited in runtime than Fargate! The only thing different with this service is stateful file storage, which is actually a problem you later have to engineer around, which is why containers are stateless.
This smells like a competing team building something to capitalize on AI hype, but the product isn't differentiated enough for this to make sense long term. If this was a service called managed AI agents, and you added features specific to AI agents, that has value. But "here's Fargate with a different name" isn't gonna last.
baxter_pad•Jun 26, 2026
Fargate does not use Firecracker; it is simply EC2 instances.
And also, you’ll notice that Fargate takes minutes to launch while Lambda takes a second or less. You’re waiting on AWS to launch an EC2 instance with your config and pull your containers into it.
(that article matches things I heard from Amazon when I asked why my stuff is slow)
everfrustrated•Jun 26, 2026
Container pulls are slow. Lambda starts fast as it's not unpacking your container to a local disk on every start.
luhn•Jun 26, 2026
I can't tell you why Fargate is so slow, but the reason Lambda containers are so fast is because it doesn't actually load the image. It loads a manifest of the layers and the files in each, and then each file is loaded on-demand from a multi-tier cache. 90% of the image is never loaded, and 90% of the remainder is served from local cache. It's a pretty cool architecture.
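A minimal sketch of the lazy-loading idea described above, not AWS's implementation. It assumes a hypothetical manifest mapping each file path to a content digest, and a hypothetical `fetch_remote` callable standing in for the slower cache tiers. Following the comment it works per file (a production system might use smaller chunks). It shows the two effects claimed: files that are never read are never fetched, and identical content is fetched only once.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class LazyImage:
    """Hypothetical lazily loaded container image: nothing is fetched up front."""

    manifest: Dict[str, str]              # file path -> content digest
    fetch_remote: Callable[[str], bytes]  # slow tier: fetch content by digest
    local_cache: Dict[str, bytes] = field(default_factory=dict)  # fast tier
    remote_fetches: int = 0

    def read(self, path: str) -> bytes:
        # Raises KeyError for paths that are not part of the image.
        digest = self.manifest[path]
        blob = self.local_cache.get(digest)
        if blob is None:
            # Cache miss: go to the slower tier once, then serve locally.
            blob = self.fetch_remote(digest)
            self.remote_fetches += 1
            self.local_cache[digest] = blob
        return blob


if __name__ == "__main__":
    image = LazyImage(
        manifest={
            "/bin/sh": "sha-a",
            "/usr/bin/sh": "sha-a",        # same content as /bin/sh
            "/etc/os-release": "sha-b",
            "/opt/huge-sdk.tar": "sha-c",  # never read, so never fetched
        },
        fetch_remote=lambda digest: f"<{digest}>".encode(),
    )
    for p in ("/bin/sh", "/usr/bin/sh", "/etc/os-release", "/bin/sh"):
        image.read(p)
    print(image.remote_fetches)  # 2
```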
binsquare•Jun 27, 2026
Confirming, I worked directly on fargate.
Fargate does not use firecracker. It was used for some internal workloads but was being migrated off at the end of 2025.
my123•Jun 26, 2026
It was true a long time ago for some particular cases, but is not true since quite some years now.
Pretty sure they invented Firecracker for Lambda. IIRC they were previously using a hot pool of EC2 instances behind the scenes, with each customer getting their own instances and lambdas sharing capacity on an instance. Firecracker made it possible to spin up VMs in real time instead of having spare capacity lying around.
That said, Fargate does kind of seem like a superior option
Edit: I guess this supports suspend and fast resume so invocation time should be somewhat better than Fargate.
simonreiff•Jun 26, 2026
I don't think Fargate fits the use case they are describing. If you're running your own (trusted) code, then of course there's no reason to worry about container-escape threats. But the threat here is that you have to execute arbitrary, untrusted code that is presumptively malicious. That's a very different scenario and requires considerable measures to safeguard properly. You can't have a Fargate task that runs multiple containers, one per user, or even run multiple Fargate task instances, one per user, because they all still share a virtual EC2 host (well, technically a pool of EC2 servers, but essentially one hypervisor and a shared virtual kernel) that would be compromised if any one container escapes. If you need true hypervisor-level isolation from the host kernel on a per-user basis because of the risk of escape, with guest worker microVMs, and the whole thing also needs to scale, pause and restore very quickly, and keep track of state upon restoration, it's actually a pretty hard challenge to build on AWS with existing tools. The problem arises with any interactive AI agent environment that scales on a per-user basis, for instance, but it also applies to any scenario in which users need to execute untrusted arbitrary code on your infrastructure in a sandbox. Fargate isn't the secure choice in that scenario; you would instead use VPC + EC2 + Firecracker + Docker (plus S3 and many others), with a lot of orchestration scripting and fiddling with load balancers and the like, to try to get everything working and scaling. Add tracking state and restoring quickly from a paused or suspended state, and I can see why this might be the right choice if you want to implement an interactive AI agent that is isolated from the host kernel at the per-user or per-session layer and is highly secured against container escape and other vulnerabilities.
I'm curious if anyone has used this for the use case described, maybe from AWS? Is this like the AgentCore orchestration that came out maybe like last year?
fulafel•Jun 27, 2026
Also you've been able to run container images in Lambda for some years now.
Fargate (with Firecracker/Docker) is hellishly complex and forces you into all sorts of AWS BS, the first of which is that it can't run without some sort of orchestration layer like EKS or ECS.
Deploying on ECS, the simplest option, means you have to create a private Docker registry, sync the base images you use from docker.io, set up IaC, set up a deployer and user identity, create multiple subnets for redundancy, set up health checks, and I haven't even listed half of it.
If you want a simple internal enterprise CRUD tool, it's crazy. And the WORST thing is that it doesn't have persistent disk, so you're forced either to use slow and expensive EFS or to buy into AWS's expensive managed database systems.
And every update to your app goes through a k8s-style 'sync image, drain old servers, create new ones, switch over once healthy' cycle, which has a tendency to fail for mysterious reasons. Oh, and it's undebuggable, and should you notice that an env var is set up incorrectly, you can't just fix it; it means a whole deploy cycle.
I wouldn't wish that stuff on my worst enemy.
In contrast, if you want okay DX, you either go with Lambda or manage your own EC2s.
Half of AWS's offerings exist to work around the arbitrary limitations they put on their services. Even this thing, as others have correctly noted, comes with a weird 8-hour limit, but even that's far easier to work around (for stuff like running a simple server) than having to deal with the other stuff.
icedchai•Jun 27, 2026
ECS/Fargate is annoyingly complex but you can have Claude or whatever you prefer crap out some terraform to set most of it up in a few minutes. We did that for a recent project and it worked well enough that nobody complained... much. Still, the redeploy / app update scenario required some fiddling.
Still, it does feel overly complicated. Google's "Cloud Run" is way simpler.
(Lambda also has its own DX issues.)
TacticalCoder•Jun 26, 2026
What's the point of microVMs for running agents?
Are you guys literally spinning up agents where a 100 ms boot time vs. a 3-second boot time makes a difference?
I'm asking because I understand the appeal of micro VMs, but every time the subject comes up people talk about "isolating agents": what's wrong with isolating agents in a regular VM (or in a container which, itself, is in a VM)?
FWIW I've got my stuff nicely isolated in regular VMs that are regularly up for hours and hours.
It's like the microVMs boots in 100 ms, then the agent does... What? And exits after another 100ms and now you need to launch another one?
What's the use case of "microVMs to isolate agents"?
0xbadcafebee•Jun 26, 2026
This is for people who want both faster execution and better security isolation for agents/subagents. It is a different use case than yours.
TacticalCoder•Jun 26, 2026
I understand that but micro VMs don't provide better security isolation than regular VMs.
So that leaves faster boot times.
Faster boot times and then the agent does what? And at how many token/s? And what's the "time to first token" anyway?
How do the inherent limitations of LLMs, the time to first token and then the tokens/s, not totally dominate the running time?
I just don't get the use case.
nok22kon•Jun 26, 2026
imagine installing an agent in slack at a company with 1000 employees, and you want each request to have its own VM for data analysis, downloading repos and working on them, ...
regular VMs just use too much memory, a typical ubuntu uses 512 MB as a baseline
0xbadcafebee•Jun 26, 2026
^ this. a single long session may use 20 subagents, each of which needs its own VM, on top of the parent agent's VM, all of which may need separate security credentials and isolation, in addition to the spinup time and resources used. each user might do 100 sessions a week, so that's roughly 2,000 VMs per week per user. each regular VM takes, let's say, 10s to boot up. that's 5.5 hours per week just waiting for VMs to start (for a single user).
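The arithmetic above as a tiny helper, using the figures from the comment (100 sessions/week, about 20 VMs/session, 10 s per boot). Like the comment, it assumes boots are waited on one after another, which is a worst case; parallel boots would overlap. The 100 ms case uses the microVM boot time mentioned earlier in the thread.

```python
def weekly_boot_wait_hours(sessions_per_week: int, vms_per_session: int, boot_seconds: float) -> float:
    """Total hours per week spent waiting for VMs to boot, assuming sequential boots."""
    return sessions_per_week * vms_per_session * boot_seconds / 3600


if __name__ == "__main__":
    # Figures from the comment: 100 sessions/week, ~20 VMs/session, 10 s per boot.
    print(round(weekly_boot_wait_hours(100, 20, 10), 2))        # 5.56
    # Counting the parent agent's VM as well (21 per session).
    print(round(weekly_boot_wait_hours(100, 21, 10), 2))        # 5.83
    # Same workload with a ~100 ms microVM boot, in minutes.
    print(round(weekly_boot_wait_hours(100, 20, 0.1) * 60, 1))  # 3.3
```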
then there's the disk iops used for spinning up all these VMs (loading and booting a whole distro), the security attack vectors of an entire VM vs microVM, the maintenance of the images, the hypervisor abstraction to handle all this automation, ssh for the agent to run in the VM, etc.
compared to mounting an extracted container image to a folder, starting a microVM kernel with folder mount, with specific credentials attached. minimum memory and CPU allocated, minimum possible system resource use, fastest operation, least maintenance. you get more time, more resources, more security.
(micro VMs do provide better security isolation. they have kernels with fewer built-in vulnerabilities, fewer hardware drivers to exploit, a more locked-down network, and they lack a full OS's applications and filesystem permissions to exploit)
sublimefire•Jun 27, 2026
This example is a bit over the top and is more of an edge case. Subagents of the same session can use the same VM, because what is the point of isolating them from each other? If even one subagent is trying to hack you, I would consider the whole session compromised anyway, as you cannot guarantee the agents won't leak this among themselves.
victorbjorklund•Jun 26, 2026
I imagine you can have a situation where you let an agent run in a shared env but to access certain tools you spin up a VM just for the tool call duration and then shut it down again. Let’s say you wanna allow the agent to write and run code then you need it to run it somewhere safe
vmg12•Jun 26, 2026
Microvms are better for the VM provider. They use less memory and have a smaller attack surface. Also starting in 100ms means you don't need to add a bunch of async machinery when launching the vms.
tastyeffectco•Jun 27, 2026
in so many cases, docker is more than sufficient for major agent workloads... with no hostile users of course
sonink•Jun 27, 2026
I don't get it either; I was going to ask the same question but found this.
We have been doing the exact opposite: instead of microVMs, we are giving agents larger VMs.
Previously we were giving them 1GB RAM VMs; now we have upped it to 4GB RAM VMs. When the agent is working, the real cost is in the inference. There is no reason to keep the agent waiting because your VM is too damn slow. So we moved to larger and faster VMs.
The agent might install a package or run a script, and now it just moves along faster. Not to mention that if the agent is installing a 'fat' SDK, like the Android SDK, more RAM keeps everything moving smoothly without breakages. The incremental amount we pay for the bigger VM is more than justified by the increase in agent performance.
And all the tooling that has already been built up for standard human-operated VMs works pretty well out of the box. We are able to spin up VMs pretty much on demand and purge them clean once the work is done.
We are moving to 8 GB RAM / 4 CPUs sometime this year, and GPUs hopefully sometime next.
alasano•Jun 26, 2026
We have this page which compares a whole bunch of sandbox providers in different categories
Will add MicroVMs there today (and any others that are missing if you let me know!)
emirb•Jun 26, 2026
Do you mind adding https://isorun.ai? We just launched last week. Founder here (Staff SRE with 20 years in Linux, fastest and cheapest SaaS agentic runtime running on heavily modified Firecracker)
alasano•Jun 26, 2026
Sure thing I'll add it!
tastyeffectco•Jun 27, 2026
Would love to see sandboxd added — it's the self-hosted open-source option with Docker hardening, built-in coding agents (Claude Code, OpenCode), and live preview URLs. https://github.com/tastyeffectco/sandboxd
alasano•Jun 27, 2026
I'll add it! I like your website by the way, something satisfying about the design.
lysecret•Jun 26, 2026
I don’t get it. We are paying at least hundreds, maybe thousands, per month on AI costs. Just get a regular VM?
mjb•Jun 26, 2026
You absolutely can run agents on a regular VM. But if you want to build multi-tenant and multi-agent systems with strong security boundaries, then having a VM or MicroVM per agent session (or session with a group of agents) really simplifies things.
When we did AWS AgentCore Runtime last year we introduced session isolation, with MicroVMs per session. You can think of Lambda MicroVMs as the same stack, but generalized to fit a larger number of application patterns.
retinaros•Jun 26, 2026
why use agentcore runtime then
victorbjorklund•Jun 26, 2026
Isn’t the point that you wanna be able to spin up and down thousands of VMs on demand (literally a VM just to run a tool and then shut it down until the next tool call)?
skybrian•Jun 26, 2026
You don’t have to pay that much. I did pay a couple hundred for a while, but not since I switched to Chinese models along with a $20 ChatGPT subscription.
Also, a single VM is pretty limiting.
crawshaw•Jun 26, 2026
For those looking to run agents: the short lifecycle of the typical “sandbox” seems surprisingly limiting to me. I have no actual workflow where I want one of these products. Sometimes a VM can live for 30 minutes, but it also might need to live for a month, and I don’t know beforehand.
This is why I have been avoiding the word sandbox for exe.dev. I don’t think developers’ agents need something “sandbox” shaped.
messh•Jun 26, 2026
Check out https://shellbox.dev for exactly this use case: boxes can be stopped, at which point they are snapshotted to disk and cost just $0.5/month. They wake up with the same state (memory and processes too) on SSH connections, web endpoint activity, or a cron schedule. When you don't need the box, just delete it and stop paying. No subscription; managed via SSH.
I'm a relatively basic claude code user, basically just running a few instances in different terminal tabs and monitoring them pretty closely, but I could definitely see value in being able to dump a bunch of code and tools into a workspace where there's no credentials present and just set an agent some goals to research or try a bunch of things in a mostly unsupervised manner.
crawshaw•Jun 27, 2026
I think it’s worth trying. There’s a lot of value in having the agent in the box. You can give it root so it can do something like tcpdump unsupervised. And if you happen to build a new server, you can keep it serving indefinitely. That’s the whole motivation behind exe.dev.
sublimefire•Jun 27, 2026
Yeah I have some stuff which is supposed to be “there” for months with the agents continually moving it forward. Not to mention the need to run different software. Running local VMs for now.
crawshaw•Jun 27, 2026
If you have a good local VM flow that’s great. I couldn’t make it work for me. I ended up needing it to run when my laptop was shut, both as an agent and the servers I am building.
It’s a real tension, working with a remote dev env has never been my first choice. But agents seem to tip the balance enough in favor of remote that I have switched.
skybrian•Jun 26, 2026
Does anyone understand the pricing? The pricing page says “Lambda MicroVMs are priced per instance-second” but MicroVM’s aren’t otherwise mentioned.
Thanks! These tabs render badly on mobile, but you can click on “Functions” to hide it and then click the “MicroVMs” tab to show it.
This pricing model looks very complicated and unfriendly for hobbyists. Maybe it’s cheaper than exe.dev’s $20/month, but I have no idea. I’d have to do a complicated calculation based on guesses to tell.
otterley•Jun 26, 2026
I don't think it's that complicated, but yeah, it's not as simple as $X/month.
The primary difference is that with Lambda you pay by the second, not by the month. According to my math, the break-even point for an 8GB allocation (the minimum exe.dev supplies) would be about 1.65 days of continuous runtime. Less than that, and you're better off with Lambda. More than that, and you're better off with exe.dev (assuming we're just talking about money and not opportunity cost). Lambda allows you to use just 2GB of memory, though, so being more memory-efficient would move the break-even point to 6.61 days.
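The break-even calculation above can be sketched as follows. The per-GB-second price here is hypothetical, back-solved from the comment's 1.65-day figure rather than taken from AWS's price list, and the sketch assumes the price is purely proportional to memory (real pricing may include other components). Under that assumption, quartering the memory quadruples the break-even time: 6.6 days, close to the 6.61 quoted above.

```python
SECONDS_PER_DAY = 86_400
FLAT_MONTHLY_USD = 20.0  # exe.dev's $20/month, as quoted in the thread

# Hypothetical per-GB-second price, back-solved so that 8 GB breaks even at
# 1.65 days (the figure in the comment). Not AWS's published rate.
USD_PER_GB_SECOND = FLAT_MONTHLY_USD / (1.65 * SECONDS_PER_DAY * 8)


def break_even_days(flat_monthly_usd: float, memory_gb: float, usd_per_gb_second: float) -> float:
    """Days of continuous runtime at which per-second billing equals a flat monthly fee."""
    usd_per_second = memory_gb * usd_per_gb_second
    return flat_monthly_usd / usd_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    for gb in (8, 2):
        print(gb, round(break_even_days(FLAT_MONTHLY_USD, gb, USD_PER_GB_SECOND), 2))
    # 8 1.65
    # 2 6.6
```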
skybrian•Jun 27, 2026
I’m running a web server in a VM and I use it every day. It’s mostly idle, but it’s continually available. I wonder how much “continuous runtime” that is?
otterley•Jun 27, 2026
The stopwatch starts when a request arrives and stops after your processor sends the response. You’re not charged for idle time. For low-demand services, it’s a bargain. The tradeoff is a bit of extra latency for cold starts (i.e. when a request hasn’t been processed in a while). Nowhere near a full classic VM launch though—typically under a second.
I think they have one of the best sandbox environments on the market, with pay-per-utilized-resource pricing. It's a huge cost reduction for agentic workloads when you have 95%+ idle CPU time and occasional spikes of CPU-heavy work (e.g. the agent running tests).
I use Railway to host my OpenClaw-like personal agent for friends and family (9 instances), and it costs around $1-2/mo with scale to zero.
dj0k3r•Jun 26, 2026
Have you tried using Unikraft? I think it might be cheaper. Worth a try.
simon84•Jun 26, 2026
I am wondering what type of workload this is for.
They give a tiny example and insist on micro, fast start, but they say it lasts up to 8 hours and goes up to 16 vCPUs.
What sort of app requires faster boot (than Lambda or EC2), but only for a limited interval, and with possibly plenty of processing power?
Maybe I am not the right target, but if you have examples so that I can better appreciate, I'd love that
leetrout•Jun 26, 2026
SaaS offering the usage of LLMs via API. You want to launch something isolated, as quickly as possible, do the minimal amount of work and not have to throwaway all your state.
otterley•Jun 26, 2026
It's in the very first body paragraph of the article:
"A new class of multi-tenant applications has emerged that all share the need to hand each end user their own dedicated execution environment in which to safely run code that the application developer did not write. AI coding assistants, interactive code environments, data analytics platforms, vulnerability scanners, and game servers that run user-supplied scripts all fit this pattern."
adobrawy•Jun 26, 2026
AI agents. An 8-hour chatbot session is a lot. 16 vCPUs might be useful when developing a heavy application and the agent needs to run the application's tests. Think about what infrastructure https://claude.ai/code needs.
spullara•Jun 26, 2026
Added support for configuring and running these directly from beamshell (.com). Really cool being able to spin these up and use them from any MCP client.
beamshell microvm deploy && beamshell microvm run
ChuckMcM•Jun 26, 2026
Not informational but I kept reading that as 'MicroVMS' which would be a scaled down version of the DEC VMS operating system?!? And I was trying to figure out if they had added containers or something to it.
mohsen1•Jun 26, 2026
I’ve been working with AgentCore, which uses the same MicroVMs. They are capable in many ways, but for coding agents that load a big git repo, they get bloated quickly.
I’m building google3-style mounting to address this.
Still work in progress but for now I am seeing promising results
mrud•Jun 26, 2026
Ramp, I think, just prebuilds the image/snapshot with the latest checkout regularly [0]. Do you think putting it into the image would address it?
Nice thing about the microvm is that you can snapshot it and restore it. Keeping the fs minimal is my goal. Snapshots restore much faster if they do not include 5GB of source code
apitman•Jun 26, 2026
The holy grail microVM for me is one that can properly share a GPU across VMs, similar to what you can do with containers.
Shout out to https://smolmachines.com/ for supporting Vulkan over virtio-gpu/Venus. Currently the best implementation I'm aware of. Unfortunately my use case is running a full desktop inside the VM, and streaming it out over something like Sunshine/Moonlight. For this you need GPU rendering and video encoding. Venus rendering works, but you have to pass the frames back and forth between the host and the guest multiple times which is inefficient. Also Venus doesn't support video encode as far as I can tell.
Teknoman117•Jun 26, 2026
The problem is that this feature is generally restricted to enterprise customers because VDI systems are such a profitable market. NVIDIA and AMD both only offer this on enterprise cards, and Intel has been very wishy-washy on support in their cards.
If you're looking for a thing to google, look up SR-IOV support on (consumer) GPUs.
Also if you're wondering who the customers of these things tend to be, it's generally the CAD market, law firms, etc. If no one's laptop contains sensitive data and can only stream the desktop of a remote system, the loss or theft of an employee's computer isn't nearly the same kind of a security worry.
apitman•Jun 26, 2026
I'm aware of SR-IOV. Widespread support would go a long way, but doesn't it require pre-slicing the GPU into discrete chunks? I want microVMs that can share a GPU dynamically, the same way they share overprovisioned CPU resources. Much more like containers.
praveenhm•Jun 26, 2026
What is the trend right now on Mac for running microVMs?
I am using OrbStack. Is there anything more micro than this?
bkircher•Jun 27, 2026
Yes. On macOS particularly you can do sandbox-exec(1) with custom / per-task SBPL profiles. Combined with strict control over environment variables that are passed into the agent process plus an outbound firewall like LittleSnitch.
The important thing is to isolate tasks from each other. Example: for work-related tasks I let the agent access Datadog or the Docker socket. Everything else doesn't have access to these.
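A rough sketch of the per-task setup described above: one throwaway SBPL profile per task plus an allow-listed environment. This is not the commenter's actual configuration. The profile rules are an assumption. SBPL is undocumented, and as commonly understood later matching rules take precedence. The paths in the usage example are hypothetical. Outbound network filtering is left to a firewall like LittleSnitch, as in the comment.

```python
import os
import subprocess
import tempfile
from typing import Dict, Iterable, List


def sbpl_profile(workdir: str, denied_read_dirs: Iterable[str] = ()) -> str:
    """Build a per-task SBPL profile (assumes paths contain no double quotes).

    Assumption: later matching rules win, so this starts permissive and then
    narrows writes to the task's workdir and hides secret directories.
    """
    rules = [
        "(version 1)",
        "(allow default)",
        "(deny file-write*)",
        f'(allow file-write* (subpath "{workdir}"))',
        '(allow file-write* (literal "/dev/null"))',
    ]
    rules += [f'(deny file-read* (subpath "{d}"))' for d in denied_read_dirs]
    return "\n".join(rules) + "\n"


def minimal_env(allowed: Iterable[str], extra: Dict[str, str]) -> Dict[str, str]:
    """Pass through only allow-listed variables, plus task-specific ones."""
    allowed = set(allowed)
    env = {k: v for k, v in os.environ.items() if k in allowed}
    env.update(extra)
    return env


def run_sandboxed(cmd: List[str], workdir: str, extra_env: Dict[str, str],
                  denied_read_dirs: Iterable[str] = ()) -> int:
    """Run cmd under sandbox-exec (macOS only) with a throwaway per-task profile."""
    with tempfile.NamedTemporaryFile("w", suffix=".sb", delete=False) as f:
        f.write(sbpl_profile(workdir, denied_read_dirs))
        profile_path = f.name
    try:
        result = subprocess.run(
            ["sandbox-exec", "-f", profile_path, *cmd],
            cwd=workdir,
            env=minimal_env({"PATH", "HOME", "TERM"}, extra_env),
        )
        return result.returncode
    finally:
        os.unlink(profile_path)
```

Usage might look like `run_sandboxed(["bash", "-c", "make test"], "/Users/me/tasks/task-42", {"TASK_ID": "42"}, ["/Users/me/.ssh", "/Users/me/.aws"])`. Use resolved paths (e.g. /private/tmp rather than /tmp), and expect to allow more write locations (such as $TMPDIR) for real tools.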
rbbydotdev•Jun 26, 2026
Anyone have a price chart comparing all the sandbox providers? (microvm included)?
dev_l1x_be•Jun 26, 2026
I am not sure how much this changes the landscape.
messh•Jun 26, 2026
An interesting alternative: https://shellbox.dev - manage Linux VMs via SSH, pay only for what you use. It is much cheaper, needs no subscription, and supports nested virtualization, Docker, custom images, duplication of boxes, an exposed IPv6 address, optional auto-stop on disconnect, wakeup on web endpoint hits, an email endpoint, and more. Parked boxes are just $0.5/month. Create small or large boxes, up to 16 vCPUs with 32GB RAM and a 400GB HDD.
I feel like most of these are used solely for RL environments and training; when else are you doing enough rollouts with LLM code writing and execution? Maybe there are some online applications of this, but it seems like Modal, Daytona, etc. already have this plus a much richer feature set tailored to the aforementioned. I wonder if AWS's play is going to be to swallow one of those whole.
taikon•Jun 27, 2026
How's this different from AWS firecracker?
taikon•Jun 27, 2026
Since no one answered, I looked into it. With Firecracker you have to provision everything yourself, whereas MicroVMs are serverless and AWS manages all the infra.
pugz•Jun 27, 2026
I've been kicking the tyres on this. I managed to get Kubernetes (k3s) running across a cluster of MicroVMs. Useful? No. Funny? To me, yes. https://github.com/aidansteele/microvm-fun
ziyzhu•Jun 27, 2026
Looks like it only supports up to 8 hours of runtime, so no persistence after that?
Using this for a long-lived "developer environment" would be extraordinarily expensive anyhow. Scaling the vCPU + RAM cost of these to match a compute-optimized Graviton On-Demand EC2 instance of the same shape (16 vCPU x 32 GB RAM) shows about 4x the cost.
So don't do that. Just use an EC2 instance.
I think it's designed for building an image once and then reusing it many, many times.
https://taoofmac.com/space/blog/2026/06/18/1845
https://github.com/rcarmo/pve-microvm
The startups in this space right now don't provide much value on top of the cloud providers they're wrapping. They don't tend to be run by experienced infra people either so they seem very vibecoded, insecure, janky, etc. They're also significantly overpriced because they're marking up already expensive providers.
Something surprising from my own experience is that while there's certainly a huge role for async agents in cloud sandboxes, async agents running locally seem more useful in many cases.
Most of the startups are just wrappers around AWS and significantly more expensive.
Agents need cheaper sandboxes so that they can run thousands of them.
I feel that AWS, GCP and all the other cloud providers can provide this natively.
But still it would be nice to self host.
The best part of self hosting is that you own it as well, no rug pulls from the laundry list of reselling providers that could go away at anytime.
It would be nice to have a one-click sandbox agent on a self-hosted instance that is free, fast (can pay a bit more for more intensive operations), and open source.
Though I did know about this one! (Because I saw the announcement.)
Part of it might just be that I am old and inflation is catching up with my understanding of prices.
But as far as AWS I still have to say no thanks. Imagine some group actually started using my hosted AI agent service for something compute and network intensive. It could turn into $2000 overnight and if I didn't account for one of the numerous types of AWS charges, I might have only collected $500 for credits purchases.
Or it could easily be ten times that. But who am I kidding. No one is going to use my agents. So it doesn't matter if it's gvisor or Firecracker or whatever.
It's not necessarily too hard to just not dynamically spawn a bunch of machines, but the bandwidth one is going to sneak up on people.
I know talk is cheap, but I've been in the room for every one of these discussions over the last 6 years at Fly.io, and if we could have come up with a system to make limits workable, we would have done it. Charging for stuff you don't want is bad business, and we make our money from happy, growing customers (the open secret of hosting is that a huge chunk of usage is basically a loss leader search for a much smaller number of ultra-profitable customers).
These pricing models --- at least outside of AWS (I'm not cynical about them but their incentives are different from indies) --- are not meant to fuck you.
Daytona, E2B, OpenComputer, Freestyle, Blaxel, Vercel, Modal, Cloudflare, Tensorlake, Superserve, etc. etc.
Some of them work by pre-purchasing credits, so you can control the blast radius of spend.
Also, if you want a more embedded sandbox runtime as a library instead of a daemon + REST API, you can check out libkrun (and friendly layers on top of it like https://microsandbox.dev/ and https://smolmachines.com/)
We run quite a few Slicer instances on mini PCs and Ryzen builds - also on Hetzner (and yes ouch 120 EUR / mo up to ~ 550 EUR / mo for 16core / 128GB RAM feels almost unfair)
Firecracker just exposes a RESTful API with a defined schema on a unix socket and launches KVM VMs with limited options.
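To make the unix-socket API concrete, here is a stdlib-only sketch of booting one VM through it. It follows the request sequence from Firecracker's getting-started guide: machine config, boot source, root drive, then `InstanceStart`. It assumes Firecracker was started with `firecracker --api-sock /tmp/firecracker.socket`, and the kernel and rootfs paths are placeholders. It is an illustration, not a hardened client.

```python
import http.client
import json
import socket
from typing import Any, Dict, List, Tuple


class UnixHTTPConnection(http.client.HTTPConnection):
    """http.client connection that speaks HTTP over a unix domain socket."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def boot_plan(kernel: str, rootfs: str, vcpus: int = 1,
              mem_mib: int = 256) -> List[Tuple[str, Dict[str, Any]]]:
    """Ordered PUT requests that configure and start one microVM."""
    return [
        ("/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("/boot-source", {
            "kernel_image_path": kernel,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        ("/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs,
            "is_root_device": True,
            "is_read_only": False,
        }),
        ("/actions", {"action_type": "InstanceStart"}),
    ]


def apply_plan(socket_path: str, plan: List[Tuple[str, Dict[str, Any]]]) -> None:
    """Send each request in order, stopping at the first error response."""
    for path, body in plan:
        conn = UnixHTTPConnection(socket_path)
        try:
            conn.request(
                "PUT", path, body=json.dumps(body),
                headers={"Accept": "application/json",
                         "Content-Type": "application/json"},
            )
            resp = conn.getresponse()
            detail = resp.read()
            if resp.status >= 300:  # Firecracker answers 204 No Content on success
                raise RuntimeError(f"PUT {path} -> {resp.status}: {detail!r}")
        finally:
            conn.close()
```

Usage: `apply_plan("/tmp/firecracker.socket", boot_plan("./vmlinux", "./rootfs.ext4"))`. Keeping the plan as plain data makes it easy to log or diff per tenant before anything touches the socket.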
For custom SMB I still think libvirt is a lower entry cost and may have transferable use cases to longer lived VMs, so you can just launch a qemu microvm[0] and use virsh and/or libvirt xml to set up the networking.
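For comparison, a qemu microvm launch is just a command line. The sketch below builds one, loosely based on the qemu microvm docs linked at [0]. It is untested. It assumes a guest kernel with virtio-mmio and serial console support, and placeholder `vmlinux`/`rootfs.img` files. Networking (libvirt/virsh or a tap device) is left out.

```python
import shlex
from typing import List


def qemu_microvm_argv(kernel: str, rootfs: str, mem: str = "512M", cpus: int = 2) -> List[str]:
    """Command line for a minimal qemu 'microvm' machine with a virtio-mmio root disk."""
    return [
        "qemu-system-x86_64",
        "-M", "microvm",             # minimal machine type, devices attach via virtio-mmio
        "-enable-kvm", "-cpu", "host",
        "-m", mem, "-smp", str(cpus),
        "-nodefaults", "-no-user-config",
        "-display", "none",
        "-serial", "stdio",          # guest console on the ISA serial port
        "-kernel", kernel,
        "-append", "console=ttyS0 root=/dev/vda rw",
        "-drive", f"id=root,file={rootfs},format=raw,if=none",
        "-device", "virtio-blk-device,drive=root",
    ]


if __name__ == "__main__":
    print(shlex.join(qemu_microvm_argv("vmlinux", "rootfs.img")))
```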
The ~400ms boot time of a qemu microvm vs ~120ms for firecracker may not be an issue for some loads, but qemu will also allow you a bit more density of placement than firecracker. qemu microvms will use a bit more memory individually, but they will also tend to use less real system memory with a larger number of microVMs.
It is all tradeoffs, and kata containers are yet another option that may apply depending on your use case.
You can run your own firecracker or qemu/kvm microvms on most instances that allow nested hypervisors, or on a local host. If cost containment is critical to you this is one possible way forward.
Really, it just depends on whether you want/need RESTful control or need to support short-lived serverless functions, or whether CLIs fit better and you may want to support full VMs.
They both are just Virtual Machine Monitors that targeted different use cases and decided on different tradeoffs.
Just be careful about hosting traditional containers and microVMs on the same system; that config is going to be problematic due to fundamental reasons that are too complex to properly address here.
[0] https://www.qemu.org/docs/master/system/i386/microvm.html
Standalone gvisor (not the 'do' subcommand) used to be a mess with the OCI json requirement, but recently they began work on presenting their own bwrap interface (likely to pursue AI agent uses) though I wouldn't use it myself yet.
People often look down on gvisor because they think it's some kind of syscall filter. It is not. It can use ptrace, seccomp, or even KVM to intercept ALL syscalls and service them with its own logic (which is in Go). Basically, it's a VMM and kernel in one.
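The three interception mechanisms map onto `runsc`'s `--platform` flag: `ptrace`, `systrap` (the seccomp-based platform and current default), and `kvm`. The helper below only builds the command line, so it runs anywhere. Actually running it needs gVisor installed, possibly root, and `/dev/kvm` for the KVM platform. The helper name and defaults are illustrative.

```python
from typing import List

# Interception mechanisms named in the comment -> runsc --platform values.
# "systrap" is gVisor's seccomp-based platform.
PLATFORM_FOR_MECHANISM = {"ptrace": "ptrace", "seccomp": "systrap", "kvm": "kvm"}


def runsc_do_argv(cmd: List[str], mechanism: str = "seccomp", network: str = "none") -> List[str]:
    """Build a `runsc do` command line; global flags must precede the subcommand."""
    platform = PLATFORM_FOR_MECHANISM[mechanism]
    return ["runsc", f"--platform={platform}", f"--network={network}", "do", *cmd]


if __name__ == "__main__":
    print(runsc_do_argv(["ls", "-l"]))
    print(runsc_do_argv(["uname", "-a"], mechanism="kvm"))
```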
Also some things you can do to make gvisor better are Wayland passthrough, vulkan support (or virtio native context). Being able to get gvisor to populate a network interface inside itself through a 'passt' (or 'containers/gvisor-tap-vsock') socket on the host would also be ergonomic. All of those are available on 'muvm' (based on libkrun) which if you have the time to set up is the next step in DIY sandboxing of graphical apps as well. See: <https://git.clan.lol/clan/munix>
The tty issue is known, should be fixed soon too, though contributions welcome as it sounds like it should be simple fix and we love more contributions :)
FWIW, X11 apps work well, I have a personal hacky project in which I've been running Librewolf in gVisor, with the window being reflected as a native Wayland window. It uses `Xvfb -fbdir` aimed at a bound tmpfs mount to get a shared memory region containing the window's pixel data which can be read directly from out of the sandbox, has Pulseaudio audio passthrough, and a socket server passing through mouse/keyboard events to make the window interactive. Works smoothly even for YouTube playback, and I successfully played a game of Unreal Tournament 2004 at 24fps in it, with no noticeable mouse/keyboard latency :) We're basically making baby steps to get there less hackily.
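For anyone curious about the `Xvfb -fbdir` trick: Xvfb writes each screen as an XWD-format file in that directory (typically named `Xvfb_screen0`) and keeps it updated. A reader outside the sandbox can map that file and locate the pixel data from the header. This is a sketch, not the commenter's project code. It assumes the header layout from X11's XWDFile.h (25 big-endian CARD32 fields, then the window name, then the colormap). It does not handle pixel byte order or pixel formats.

```python
import mmap
import struct
from typing import Dict

# XWDFileHeader from X11's XWDFile.h: 25 CARD32 fields, stored big-endian.
XWD_FIELDS = (
    "header_size", "file_version", "pixmap_format", "pixmap_depth",
    "pixmap_width", "pixmap_height", "xoffset", "byte_order",
    "bitmap_unit", "bitmap_bit_order", "bitmap_pad", "bits_per_pixel",
    "bytes_per_line", "visual_class", "red_mask", "green_mask",
    "blue_mask", "bits_per_rgb", "colormap_entries", "ncolors",
    "window_width", "window_height", "window_x", "window_y",
    "window_bdrwidth",
)
XWD_COLOR_SIZE = 12  # sizeof(XWDColor): CARD32 pixel, 3 x CARD16, 2 x CARD8


def parse_xwd_header(buf) -> Dict[str, int]:
    hdr = dict(zip(XWD_FIELDS, struct.unpack_from(">25I", buf, 0)))
    if hdr["file_version"] != 7:
        raise ValueError(f"unexpected XWD version {hdr['file_version']}")
    # The window name sits inside header_size; the colormap follows it.
    hdr["pixel_offset"] = hdr["header_size"] + hdr["ncolors"] * XWD_COLOR_SIZE
    return hdr


def read_row(buf, hdr: Dict[str, int], y: int) -> bytes:
    """Copy one scanline of raw pixel bytes."""
    start = hdr["pixel_offset"] + y * hdr["bytes_per_line"]
    return bytes(buf[start:start + hdr["bytes_per_line"]])


def map_framebuffer(path: str) -> mmap.mmap:
    """Map Xvfb's framebuffer file read-only; its contents change as the X server draws."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```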
Thanks for the feedback!
Wayland is tricky because there are memory buffers being shared between the compositor and the client. crosvm (also by Google) adopted two custom solutions to it, one of which got merged into mainline.
Achieving audio passthrough is trivial as it's just a unix socket. `-host-uds=all`
I just tried:
Flawless playback. I think it's a default pipewire configuration.
If you do anything valuable and are compromised, it may get brought to the attention of whoever organized the automated attack (ex. an AI agent doing interesting proprietary work installed something it shouldn't have, and the chat logs got uploaded and analyzed), and they will then sell you to someone with the 0days to extract more value from you. Assuming you didn't screw up and leave a back door open somewhere, of course.
https://linuxcontainers.org/incus/docs/main/explanation/cont...
Which is cheaper for me?
Ideally, maybe self-hosting would be better?
Also, MicroVMs can't be exposed directly to the web. Your code running in them can only be executed via API calls with attached auth tokens - so if you wanted to host a public facing API or website with them you'd need to implement your own additional layer in front.
Something I appreciate about Fly (disclaimer: they support my work) is that the pricing is fixed - you pay $1.94/month (less if you suspend your machine) for the smallest instance, up to $976.25/month for the largest (16 CPUs, 128GB) plus predictable costs for volume storage.
The only variable outside your control is bandwidth, and that's unlikely to cause a nasty shock.
Compared with the more "elastic" hosting providers (Vercel, Cloud Run), you're much less likely to get a horrifying bill if something gets overly crawled or goes viral.
https://sprites.dev
(Both make sense for their respective use cases.)
It's a good callout, a genuine difference between Sprites and Fly Machines. Believe it or not, it's intended to make Sprites cheaper than Machines.
https://fly.io/blog/accident-forgiveness/
A way we simply suck at business: we didn't keep beating the drum about this after we wrote the policy up. We just sort of figured everyone read the blog post and moved on. We probably should have been continuously making noise about it.
What you get from having a company made almost entirely of engineers.
They do spike on different features like:
Then there's also the option to use libkrun to run local sandboxes on your own computer. That doesn't scratch the itch for hosted services, but it works if your goal is to run agents inside isolated environments for your own work.
I've been working on some open-core stuff[1] to coordinate sandboxes, and we're making changes to have a library that lets people coordinate any number of remote or local sandboxes (using any provider), git repos, and coding agents, kinda like how the Docker CLI works for managing containers. Flue[2] is another player in this space and is more of a pure framework, while we're building it as an interactive product for using sandboxed agents and workflows.
[1] https://github.com/gofixpoint/amika/blob/main/ROADMAP.md
[2] https://flueframework.com/
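The "any provider, like the Docker CLI" idea boils down to a small provider-agnostic interface that every backend implements. A minimal sketch of that shape; `Sandbox`, `SandboxProvider`, `SandboxRegistry`, and the `LocalDirProvider` stand-in are all hypothetical names, not the API of the project linked above. The local stand-in is just a throwaway directory and provides no isolation at all; a real backend would wrap libkrun, a hosted microVM API, and so on.

```python
import subprocess
import tempfile
from typing import Protocol


class Sandbox(Protocol):
    def exec(self, argv: list[str]) -> str: ...
    def destroy(self) -> None: ...


class SandboxProvider(Protocol):
    name: str
    def create(self, image: str) -> Sandbox: ...


class LocalDirSandbox:
    """Hypothetical stand-in backend: a temp dir, NOT an isolation boundary."""

    def __init__(self) -> None:
        self._dir = tempfile.TemporaryDirectory()

    def exec(self, argv: list[str]) -> str:
        result = subprocess.run(argv, cwd=self._dir.name,
                                capture_output=True, text=True, check=True)
        return result.stdout

    def destroy(self) -> None:
        self._dir.cleanup()


class LocalDirProvider:
    name = "local-dir"

    def create(self, image: str) -> Sandbox:
        return LocalDirSandbox()  # image is ignored by this stand-in


class SandboxRegistry:
    """Dispatches create() to whichever provider is registered under a name."""

    def __init__(self) -> None:
        self._providers: dict[str, SandboxProvider] = {}
        self._live: list[Sandbox] = []

    def register(self, provider: SandboxProvider) -> None:
        self._providers[provider.name] = provider

    def create(self, provider_name: str, image: str) -> Sandbox:
        sandbox = self._providers[provider_name].create(image)
        self._live.append(sandbox)
        return sandbox

    def destroy_all(self) -> None:
        while self._live:
            self._live.pop().destroy()
```

Callers only ever see `create`/`exec`/`destroy`, so swapping a local backend for a hosted one is a registration change rather than a code change.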
You'd have to build more of that with libkrun
The core tech of both are great though.
Then one can just pass `--runtime krun` to most podman subcommands. Alternatively, set the runtime key in the config file to make it the default.
Podman itself has "hardening" options, e.g. turning off networking or volumes, that can be combined with this.
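A minimal sketch of combining the two: select the krun runtime and disable networking for a one-off command. `--runtime` is Podman's global option (placed before the subcommand here), `--network=none` and `-v host:guest:ro` are standard `podman run` options, and the equivalent config-file default would be `runtime = "krun"` under `[engine]` in containers.conf. The helper names are hypothetical, and this assumes krun is installed and registered with Podman.

```python
import shlex
import subprocess


def podman_krun_argv(image: str, command: list[str], *,
                     network: bool = False,
                     volumes: tuple[tuple[str, str], ...] = ()) -> list[str]:
    """Build a podman invocation that runs `command` in a libkrun microVM."""
    argv = ["podman", "--runtime", "krun", "run", "--rm"]
    if not network:
        argv.append("--network=none")  # no network interface in the guest
    for host_path, guest_path in volumes:
        argv += ["-v", f"{host_path}:{guest_path}:ro"]  # read-only mounts only
    argv.append(image)
    argv += command
    return argv


def run_sandboxed(image: str, command: list[str], **kwargs) -> str:
    """Run the command and return its stdout (raises on non-zero exit)."""
    argv = podman_krun_argv(image, command, **kwargs)
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    print(shlex.join(podman_krun_argv("alpine", ["uname", "-a"])))
```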
My personal belief is that the future of an "app" is a combo:
So, it should be stupid simple to run these local sandboxed apps/agents. Right now it's not too hard for technical users (esp. with things like https://smolmachines.com/ and https://microsandbox.dev/), but it's not as easy as clicking an app icon or typing `/path/to/binary` in the CLI.
Ah, the significant compute overhead: https://josecastillolema.github.io/podman-wasm-libkrun/. Much more CPU and RAM usage, with worse performance.
This has been a big pain point for me with various VM solutions I’ve tried. Having to allocate, say, 8GB to a sandbox, and a) having that RAM eaten up when I’m not using it and b) only having 8GB when I am using it kinda sucks.
Yes, I could stop the sandboxes when I’m not using them, but that also kinda sucks.
An example of a "sandboxed agent app" would be: give the app all your past emails. An agent scans them and finds sales emails you need to follow up on. It shows you the suggested follow-ups in a UI, and you approve or reject them. Then it mass-sends the approved emails and pushes an update with the changes to your CRM.
The sandbox is deleted when the app finishes running. It's ephemeral for the lifecycle of the app. You can re-run the same app repeatedly with new inputs, but it gets the same clean starting slate every time.
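The "clean slate per run" lifecycle can be sketched with a context manager that copies a template into a fresh workspace, runs the app, and always deletes the workspace afterwards. This is a hypothetical sketch: a temporary directory stands in for the sandbox (it gives no isolation), the "agent" just writes one draft per input, and `approve` plays the role of the human approve/reject step.

```python
import contextlib
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterator


@contextlib.contextmanager
def fresh_workspace(template: Path) -> Iterator[Path]:
    """Copy the template into a new scratch dir; delete it when the run ends."""
    root = Path(tempfile.mkdtemp(prefix="app-run-"))
    try:
        workspace = root / "ws"
        shutil.copytree(template, workspace)
        yield workspace
    finally:
        shutil.rmtree(root, ignore_errors=True)


def run_app(template: Path, inputs: list[str],
            approve: Callable[[str], bool]) -> list[str]:
    """One ephemeral run: draft follow-ups, keep only the approved ones."""
    with fresh_workspace(template) as ws:
        drafts = []
        for i, item in enumerate(inputs):
            draft = ws / f"draft-{i}.txt"  # hypothetical agent output
            draft.write_text(f"Follow up on: {item}")
            drafts.append(draft.read_text())
        return [d for d in drafts if approve(d)]
```

Only the approved results leave the run; every scratch file is gone afterwards, so the next run starts from the untouched template.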
also have support for lima/colima/podman
The interesting bit is the libkrun GPU/framebuffer/input plumbing; VNC is just how I got the pixels into the macOS app. The guest still needs a real graphical workload/compositor, e.g. Weston.
(edit: ahh sorry, meant to post this to above comment)
I am quite sure I'm not the only person working on post-firecracker KVM.
That way it can be elastic in CPU, memory and, to some extent, disk.
How far are you on your take?
https://github.com/mitos-run/mitos
That's exactly what you intended to do. That is the definition of advertising. It is true, many people might like it, so own it. Don't lie about it, even to yourself.
Apart from the above features.
Everything is supported via the APIs and a CLI for agents. It can be used via `npx skills add instavm/skills`.
This smells like a competing team building something to capitalize on AI hype, but the product isn't differentiated enough for this to make sense long term. If this was a service called managed AI agents, and you added features specific to AI agents, that has value. But "here's Fargate with a different name" isn't gonna last.
https://aws.amazon.com/blogs/aws/firecracker-lightweight-vir... says
> Battle-Tested – Firecracker has been battled-tested and is already powering multiple high-volume AWS services including AWS Lambda and AWS Fargate.
And also, you’ll notice that Fargate takes minutes to launch while Lambda takes a second or less. You’re waiting on AWS to launch an EC2 instance with your config and pull your containers into it.
(that article matches things I heard from Amazon when I asked why my stuff is slow)
Fargate does not use firecracker. It was used for some internal workloads but was being migrated off at the end of 2025.
That said, Fargate does kind of seem like a superior option
Edit: I guess this supports suspend and fast resume so invocation time should be somewhat better than Fargate.
https://docs.aws.amazon.com/lambda/latest/dg/images-create.h...
Deploying on ECS, the simplest option, means you have to create a private Docker registry, sync the base images you use from docker.io, set up IaC, set up a deployer and user identity, create multiple subnets to allow redundancy, set up health checks, and I haven't even written down half of it.
If you want a simple enterprise CRUD internal tool, it's crazy. And the WORST thing is that it doesn't have a persistent disk, so you're forced to either use slow and expensive EFS or buy into AWS's expensive managed database systems.
And every update to your app goes through a k8s-style "sync image, drain old servers, create new ones, switch over once healthy" cycle, which has a tendency to fail for mysterious reasons. Oh, and it's undebuggable, and should you notice that an env var is set incorrectly, you can't just fix it; it means a whole deploy cycle.
I wouldn't wish that stuff on my worst enemy.
In contrast, if you want okay DX, you either go with Lambda or manage your own EC2s.
Half of AWS's offerings exist to work around the arbitrary limitations they put on their services. Even this thing (as others have correctly noted) comes with a weird 8-hour limit, but even that's far easier to work around (for stuff like running a simple server) than having to deal with the other stuff.
Still, it does feel overly complicated. Google's "Cloud Run" is way simpler.
(Lambda also has its own DX issues.)
Are you guys literally spinning up agents where a 100 ms boot time vs. a 3-second boot time makes a difference?
I'm asking because I understand the appeal of micro VMs but every time the subject comes up people talk about "isolating agents": what's wrong about isolating agents in a regular VM (or in a container which, itself, is in a VM)?
FWIW I've got my stuff nicely isolated in regular VMs that are regularly up for hours and hours.
It's like the microVM boots in 100 ms, then the agent does... what? And exits after another 100 ms, and now you need to launch another one?
What's the use case of "microVMs to isolate agents"?
So that leaves faster boot times.
Faster boot times and then the agent does what? And at how many token/s? And what's the "time to first token" anyway?
How do the time to first token and then the token/s inherent limitations of LLMs not totally dominate the running time?
I just don't get the use case.
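The question can be put as simple arithmetic: what fraction of a session's wall-clock time is boot? A minimal sketch with illustrative numbers that are pure assumptions (10 agent steps, 1 s to first token, 200 tokens per step at 50 tokens/s), not measurements of any real model or service.

```python
def boot_share(boot_s: float, steps: int, ttft_s: float,
               tokens_per_step: float, tokens_per_s: float,
               tool_s_per_step: float = 0.0) -> float:
    """Fraction of total session time spent booting the sandbox."""
    inference_s = steps * (ttft_s + tokens_per_step / tokens_per_s)
    total_s = boot_s + inference_s + steps * tool_s_per_step
    return boot_s / total_s


if __name__ == "__main__":
    for boot in (3.0, 0.1):
        share = boot_share(boot, steps=10, ttft_s=1.0,
                           tokens_per_step=200, tokens_per_s=50)
        print(f"boot {boot:>4}s -> {share:.1%} of the session")
```

Under these assumptions inference takes 50 s, so a 3 s boot is about 5.7% of the session and a 100 ms boot about 0.2%. Boot time only dominates when sessions are very short relative to it, e.g. one sandbox per tool call or per request rather than one per agent session.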
regular VMs just use too much memory; a typical Ubuntu install uses 512 MB as a baseline
then there's the disk iops used for spinning up all these VMs (loading and booting a whole distro), the security attack vectors of an entire VM vs microVM, the maintenance of the images, the hypervisor abstraction to handle all this automation, ssh for the agent to run in the VM, etc.
compared to mounting an extracted container image to a folder, starting a microVM kernel with folder mount, with specific credentials attached. minimum memory and CPU allocated, minimum possible system resource use, fastest operation, least maintenance. you get more time, more resources, more security.
(micro VMs do provide better security isolation. they have kernels with fewer built-in vulnerabilities, fewer hardware drivers to exploit, a more locked-down network, and they lack a full OS's applications and filesystem permissions to exploit)
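The memory argument is easy to quantify as guests per host. A minimal sketch: the 512 MiB full-distro baseline comes from the comment above, while the 64 GiB host, 4 GiB host reservation, 64 MiB microVM baseline, and 1 GiB agent workload are assumptions chosen only for illustration.

```python
def guests_per_host(host_gib: int, reserved_gib: int,
                    baseline_mib: int, workload_mib: int) -> int:
    """How many guests fit if each needs its OS baseline plus its workload."""
    usable_mib = (host_gib - reserved_gib) * 1024
    return usable_mib // (baseline_mib + workload_mib)


if __name__ == "__main__":
    for label, baseline in (("full distro VM", 512), ("microVM (assumed)", 64)):
        idle = guests_per_host(64, 4, baseline, 0)
        busy = guests_per_host(64, 4, baseline, 1024)
        print(f"{label}: {idle} idle guests, {busy} with a 1 GiB workload each")
```

With idle guests the baseline decides everything (120 vs. 960), but once each guest carries a 1 GiB working set the gap narrows (40 vs. 56), which is the tension the next comment is getting at.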
We have been doing the exact opposite - instead of microVMs, we are giving agents larger VMs.
Previously we were giving them 1GB RAM VMs - now we have upped that to 4GB RAM VMs. When the agent is working, the real cost is in the inference. There is no reason to keep the agent waiting because your VM is too damn slow. So we moved to larger and faster VMs.
The agent might install a package or run a script, and now it just moves along faster. Not to mention that if the agent is installing a 'fat' SDK, like maybe the Android SDK, more RAM keeps everything moving smoothly without breakages. The incremental amount we pay for the bigger VM is more than justified by the increase in agent performance.
And all the tooling that has already been built up for standard human-operated VMs works pretty well out of the box. We are able to spin up VMs pretty much on demand and purge them clean once the work is done.
We are moving to 8GB RAM / 4 CPUs sometime this year, and hopefully to GPUs sometime next year.
https://engine.build/lab/agent-sandboxes
Will add MicroVMs there today (and any others that are missing if you let me know!)
When we did AWS AgentCore Runtime last year we introduced session isolation, with MicroVMs per session. You can think of Lambda MicroVMs as the same stack, but generalized to fit a larger number of application patterns.
Also, a single VM is pretty limiting.
This is why I have been avoiding the word sandbox for exe.dev. I don’t think developers agents need something “sandbox” shaped.
It’s a real tension, working with a remote dev env has never been my first choice. But agents seem to tip the balance enough in favor of remote that I have switched.
This pricing model looks very complicated and unfriendly for hobbyists. Maybe it’s cheaper than exe.dev’s $20/month, but I have no idea. I’d have to do a complicated calculation based on guesses to tell.
The primary difference is that with Lambda you pay by the second, not by the month. According to my math, the break-even point for an 8GB allocation (the minimum exe.dev supplies) would be about 1.65 days of continuous runtime. Less than that, and you're better off with Lambda. More than that, and you're better off with exe.dev (assuming we're just talking about money and not opportunity cost). Lambda allows you to use just 2GB of memory, though, so being more memory-efficient would push the break-even point to 6.61 days.
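The break-even math is a one-liner once you fix a per-GB-second rate. A minimal sketch assuming a single per-GB-second price and nothing else (the real MicroVM pricing may also bill vCPU or requests separately, so check the pricing page). The rate of $0.0000175/GB-second used below is not a published price; it is simply the rate implied by working backwards from the 1.65-day and 6.61-day figures above.

```python
SECONDS_PER_DAY = 86_400


def break_even_days(flat_monthly_usd: float, memory_gb: float,
                    usd_per_gb_second: float) -> float:
    """Days of continuous runtime at which per-second billing costs the flat fee."""
    usd_per_second = memory_gb * usd_per_gb_second
    return flat_monthly_usd / usd_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    implied_rate = 0.0000175  # assumption: back-derived, not an AWS list price
    for gb in (8, 2):
        days = break_even_days(20.0, gb, implied_rate)
        print(f"{gb} GB: break-even after {days:.2f} days")
```

Since the break-even point scales as 1/memory, dropping from 8GB to 2GB quadruples it, which is why 1.65 days becomes 6.61 days.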
I think they have one of the best sandbox environments on the market with pay per utilized resources pricing, it's a huge cost reduction for agentic workloads when you have 95%+ idle CPU time and occasional spikes for CPU heavy work (e.g. agent run tests or something like this).
I use Railway to host my OpenClaw-like personal agent for friends and family (9 instances), and it costs like $1-2/mo with scale to zero.
They give a tiny example and insist on micro and fast start, but they say it lasts up to 8 hours and goes up to 16 vCPUs.
What sort of app requires a faster boot (than Lambda or EC2), but only for a limited interval, and possibly with plenty of processing power...
Maybe I am not the right target, but if you have examples that would help me better appreciate it, I'd love that.
"A new class of multi-tenant applications has emerged that all share the need to hand each end user their own dedicated execution environment in which to safely run code that the application developer did not write. AI coding assistants, interactive code environments, data analytics platforms, vulnerability scanners, and game servers that run user-supplied scripts all fit this pattern."
beamshell microvm deploy && beamshell microvm run
I’m building this google3-style mounting to address this.
https://github.com/mohsen1/git-lazy-mount
Still work in progress but for now I am seeing promising results
[0] https://builders.ramp.com/post/why-we-built-our-background-a...
Shout out to https://smolmachines.com/ for supporting Vulkan over virtio-gpu/Venus. Currently the best implementation I'm aware of. Unfortunately my use case is running a full desktop inside the VM, and streaming it out over something like Sunshine/Moonlight. For this you need GPU rendering and video encoding. Venus rendering works, but you have to pass the frames back and forth between the host and the guest multiple times which is inefficient. Also Venus doesn't support video encode as far as I can tell.
If you're looking for a thing to google, look up SR-IOV support on (consumer) GPUs.
Also if you're wondering who the customers of these things tend to be, it's generally the CAD market, law firms, etc. If no one's laptop contains sensitive data and can only stream the desktop of a remote system, the loss or theft of an employee's computer isn't nearly the same kind of a security worry.
The important thing is to isolate tasks from each other. Example: for work-related tasks I let the agent access Datadog or the Docker socket. Everything else does not have access to these.
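One way to enforce that kind of per-task separation is to give each task category an explicit allowlist of environment variables and host mounts, and build each sandbox's environment only from that list. A minimal sketch; the profile names, the `DD_API_KEY` variable (the name Datadog's tooling conventionally reads), and `/var/run/docker.sock` are example choices, not a description of the commenter's setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    name: str
    env_keys: frozenset[str] = frozenset()  # env vars passed into the sandbox
    mounts: frozenset[str] = frozenset()    # host paths that may be mounted


PROFILES = {
    "work": TaskProfile(
        "work",
        env_keys=frozenset({"DD_API_KEY"}),
        mounts=frozenset({"/var/run/docker.sock"}),
    ),
    "default": TaskProfile("default"),  # no credentials, no host sockets
}


def sandbox_env(profile: TaskProfile, host_env: dict[str, str]) -> dict[str, str]:
    """Copy only allowlisted variables; everything else stays on the host."""
    return {k: v for k, v in host_env.items() if k in profile.env_keys}


def sandbox_mounts(profile: TaskProfile, requested: list[str]) -> list[str]:
    """Refuse any mount the profile does not explicitly allow."""
    denied = set(requested) - profile.mounts
    if denied:
        raise PermissionError(f"{profile.name}: mounts not allowed: {sorted(denied)}")
    return sorted(requested)
```

Defaulting to an empty profile means a new kind of task gets nothing until someone deliberately grants it access.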