Relevant Tony Hoare quote: “There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.”
tekacs•Apr 15, 2026
I think this is so relevant, and thank you for posting this.
Of course it's trivially NOT true that you can defend against all exploits by making your system sufficiently compact and clean, but you can certainly have a big impact on the exploitable surface area.
I think it's a bit bizarre that it's implicitly assumed that all codebases are broken enough that, if you attacked them hard enough, you'd eventually find endlessly more issues.
Another analogy here is to fuzzing. A fuzzer can walk through all sorts of states of a program, but when it hits a password check, it can't really push past it, because it would need to search a space that is impossibly huge.
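A toy sketch of that barrier (hypothetical `guarded` function standing in for the program's state machine): a fuzzer throwing random inputs at a check against a high-entropy secret essentially never gets past it, because the space is 2^128.

```python
import random
import secrets

TOKEN = secrets.token_hex(16)  # 128 bits of entropy behind the "password"

def guarded(candidate: str) -> bool:
    """The state behind the check: reachable only with the exact token."""
    return candidate == TOKEN

# A fuzzer exploring random inputs: with 2**128 possibilities,
# even a hundred thousand attempts never reach the guarded state.
rng = random.Random(0)
hits = sum(
    guarded(rng.getrandbits(128).to_bytes(16, "big").hex())
    for _ in range(100_000)
)
```

The fuzzer can still explore everything *before* the check, which is why keeping the pre-check surface tiny matters.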
It's all well and good to try to exploit a program, but (as an example) if that program _robustly and very simply_ (the hard part!) insists, before it does ANYTHING else, that it only accepts messages from the network that are signed, you're going to have a hard time getting it to accept unsigned messages.
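A minimal sketch of that "verify before doing ANYTHING else" shape, assuming a pre-shared HMAC key (the key, message format, and function names here are made up for illustration):

```python
import hashlib
import hmac
import json

SECRET = b"pre-shared-key"  # hypothetical shared secret

def handle_message(raw: bytes, signature_hex: str):
    """Verify the signature before parsing, logging, or branching on the
    payload: unsigned input never reaches the rest of the program."""
    expected = hmac.new(SECRET, raw, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature_hex):
        return None  # rejected: no attacker-controlled code path ran
    return json.loads(raw)  # only now do we touch the content
```

The point is that the attack surface for unsigned traffic is the HMAC comparison and nothing else; keeping the path that small is the hard part the comment refers to.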
Admittedly, a lot of today's surfaces and software were built in a world where you could get away with a lot more laziness compared to this. But I could imagine, for example, a state of the world in which we're much more intentional about what we accept and even bring _into_ our threat environment. Similarly to the shift from network to endpoint security. There are for sure, uh, a million systems right now with a threat model wildly larger than it needs to be.
slow_typist•Apr 16, 2026
Problem is, the way economic activity is organised in general, there is no transition path from complex bloated systems to well-designed, completely human-auditable systems. For example, given the inherent (and proven) security risks of the WordPress ecosystem, nobody should run WP anymore.
PunchyHamster•Apr 16, 2026
I'd hazard a guess that 90% of WP instances could be replaced by a static site generator plus some tiny app to handle forms, and 9/10ths of the remaining ones with a static generator + forms + some external commenting system, whether in the cloud or something like Commento.
balamatom•Apr 16, 2026
Correct. And yet, people are not doing it.
self_awareness•Apr 16, 2026
The question is what "complex" means. Complex for us doesn't mean it's complex for an LLM, and vice versa. So I wouldn't value this approach at all.
misja111•Apr 16, 2026
I disagree. Much of what makes software complex for us makes it complex for an LLM just as well. E.g.:
- a very large codebase
- a codebase which is not modularized into cohesive parts
- niche languages or frameworks
- overly 'clever' code
xeyownt•Apr 16, 2026
Yeah, the main problem is that most companies / people don't give a f*ck about security because it is not a key feature. It's only a marketing stamp. You want it good enough to sell the product, but you don't want to spend too much on it. So instead you go vibe coding. The baby is stillborn.
goodpoint•Apr 16, 2026
It's just wrong.
jp0001•Apr 15, 2026
I'm starting to think that Opus and Mythos are the same model (or collection of models), with Mythos having better backend workflows than Opus 4.6. I have not used Mythos, but at work I have a five-figure monthly token budget to find vulnerabilities in closed-source code. I'm interested in Mythos and will use it when it's available, but for now I'm trying to reverse engineer how to get the same output with Opus 4.6, and the answer to me is: more tokens.
snowwrestler•Apr 15, 2026
It looks like proof of work because:
> Worryingly, none of the models given a 100M budget showed signs of diminishing returns. “Models continue making progress with increased token budgets across the token budgets tested,” AISI notes.
So, the author infers a durable direct correlation between token spend and attack success. Thus you will need to spend more tokens than your attackers to find your vulnerabilities first.
However, it is worth noting that this study was of a 32-step network intrusion, which only one model (Mythos) was even able to complete at all. That’s an incredibly complex task. Is the same true for pointing Mythos at a relatively simple single code library? My intuition is that there is probably a point of diminishing returns, and that it is closer for simpler tasks.
In this world, popular open source projects will probably see higher aggregate token spend by both defenders and attackers. And thus they might approach the point of diminishing returns faster. If there is one.
janalsncm•Apr 15, 2026
Knowing nothing about cybersecurity, maybe the question is whether it costs more tokens to go from 32 steps to 33, or to complete the 33rd step? If it’s cheaper to add steps, or if defense is uncorrelated but offense becomes correlated, it’s not as bad as the article makes it seem.
For instance, if failing any step locks you out, your probability of success is p^N, which means it’s functionally impossible with enough layers.
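The compounding the parent describes, as a quick sketch (the per-step probabilities are hypothetical):

```python
def chain_success(p: float, n: int) -> float:
    """Probability of completing an n-step attack chain when every step
    must succeed and failing any step locks the attacker out: p**n."""
    return p ** n
```

Even a 90% per-step success rate leaves only about a 3% chance of completing a 32-step chain; at 50% per step it drops to roughly 2e-10, i.e. functionally impossible.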
necovek•Apr 16, 2026
This is not like adding one bit of randomness to improve security: this was a model system which required 32 steps to break into, if I understood correctly.
It is not that one would design a system in this manner - you'd never deliberately design a loophole in, no matter how many steps it takes to get there: it is just a benchmark.
SyneRyder•Apr 15, 2026
Worth pointing out that as impressive as the 32-step network takeover is, Mythos wasn't able to achieve it on every attempt, and the network itself did not have the usual defence systems.
I wouldn't use those as excuses to dismiss AI though. Even if this model doesn't break your defences, give it 3 months and see where the next model lands.
nickdothutton•Apr 15, 2026
Although not an escape from the "who can spend the most on tokens" arms race, there is also the possibility to make reverse engineering and executable analysis more difficult. This increases the attacker's token spend if nothing else. I wonder if dev teams will take an interest.
Better to write good, high-quality, properly architected and tested software in the first place of course.
Edited for typo.
chromacity•Apr 15, 2026
I discussed this in more detail in one of my earlier comments, but I think the article commits a category error. In commercial settings, most of day-to-day infosec work (or spending) has very little to do with looking for vulnerabilities in code.
In fact, security programs built on the idea that you can find and patch every security hole in your codebase were basically busted long before LLMs.
Muromec•Apr 15, 2026
Commercial infosec is deleting Firefox from developer machines because it's "not secure", and explaining to muggles why they shouldn't commit secret material to the code repository. That, and blocking my SSH access to my home router, of course.
chromacity•Apr 15, 2026
I mean, often, yep. The real reason why they are unhappy with you having an unsupported browser is simply that it's much harder to reason about or enforce policies across bespoke environments. And in an enterprise of a sufficient scale, the probability that one of your employees is making a mistake today is basically 1. Someone is installing an infostealer browser extension, someone is typing in their password on a phishing site, etc. So, you really want to keep browsers on a tight leash and have robust monitoring and reporting around that.
Yeah, it sucks. But you're getting paid, among other things, to put up with some amount of corporate suckiness.
gerdesj•Apr 15, 2026
"The real reason why they are unhappy with you having an unsupported browser"
I tend to encourage Firefox over Cr-flavoured browsers because FF (for me) is the absolute last to dive in with fads and will boneheadedly argue against useful stuff until the cows come home ... Web Serial springs to mind (which should finally be rocking up real soon now).
Oh and they are not sponsored by Google errm ... 8)
I'm old enough to remember having to use telnet to access the www (when it finally rocked up and looked rather like Gopher and WAIS) (via a X.25 PAD) and I have seen the word "unsupported" bandied around way too often since to basically mean "walled garden".
I think that when you end up using the term "unsupported browser" you have lost any possible argument based on reason or common decency.
direwolf20•Apr 16, 2026
> Web Serial
why in the absolute fuck would I want random web pages to be able to control all the devices connected to my computer?
venzaspa•Apr 16, 2026
It's essentially for programming microcontrollers, ESP32s and the like. It's really handy. You have to confirm the connection every time.
direwolf20•Apr 16, 2026
I have an idea - what if a webpage could just run arbitrary code? With a confirmation every time. Then you wouldn't need a WebX for every X.
GuB-42•Apr 16, 2026
The thing that kills me every time is how IT treats development machines the same way as the rest of the corporate network.
Developers usually need elevated privileges, executing unverified arbitrary code is literally their job. Their machines are not trustworthy, and yet, they often have access to the entire company internal network. So you get a situation where they have both too much privilege (access to resources beyond the scope of their work) and too little (some dev tools being unavailable).
somesortofthing•Apr 15, 2026
There's still the question of access to the codebase. By all accounts, the best LLM cyber scanning approaches are really primitive - it's just a bash script that goes through every single file in the codebase and, for each one, runs a "find the vulns here" prompt. The attacker usually has even less access than this - in the beginning, they have network tools, an undocumented API, and maybe some binaries.
You can do a lot better efficiency-wise if you control the source end-to-end, though - you already group logically related changes into PRs, so you can save on scanning by asking the LLM to only look over the files you've changed. If you're touching security-relevant code, you can ask it for more per-file effort than the attacker might put into their own scanning. You can even run the big bulk scans an attacker might run on a fixed schedule - each attacker has to run their own scan, while you only need to run your one scan to find everything they would have. There's a massive cost asymmetry between the "hardening" phase for the defender and the "discovering exploits" phase for the attacker.
Exploitability also isn't binary: even if the attacker is better-resourced than you, they need to find a whole chain of exploits in your system, while you only need to break the weakest link in that chain to stop them.
If you boil security down to just a contest of who can burn more tokens, defenders get efficiency advantages only the best-resourced attackers can overcome. On net, public access to mythos-tier models will make software more secure.
Retr0id•Apr 15, 2026
Tokens can also be burnt on decompilation.
tptacek•Apr 15, 2026
Yes, and it apparently burns lots of tokens. But what I've heard is that the outcomes are drastically less expensive than hand-reversing was, when you account for labor costs.
jeffmcjunkin•Apr 15, 2026
Can confirm. Matching decompilation in particular (where you match the compiler settings along with your guess at the source, compile, then compare the assembly, repeating if it doesn't match) is very token-intensive, but it's now very viable: https://news.ycombinator.com/item?id=46080498
Of course LLMs see a lot more source-assembly pairs than even skilled reverse engineers, so this makes sense. Any area where you can get unlimited training data is one we expect to see top-tier performance from LLMs.
(also, hi Thomas!)
stackghost•Apr 15, 2026
My own experience has been that "ghidra -> ask LLM to reason about ghidra decompilation" is very effective on all but the most highly obfuscated binaries.
Burning tokens by asking the LLM to compile, disassemble, compare assembly, recompile, repeat seems very wasteful and inefficient to me.
That matches my experience too - LLMs are very capable at "translating" between domains - one of the best experiences I've had with LLMs is turning "decompiled" source into "human readable" source. I don't think that "binary only" closed source is the defense against this that some people here seem to think it is.
echelon•Apr 16, 2026
Has anyone used an LLM to deobfuscate compiled Javascript?
bitexploder•Apr 16, 2026
Yep. They are good at it.
saagarjha•Apr 16, 2026
I've used it for hobby efforts on Electron/React Native (Hermes bytecode) apps and it seems to work reasonably well
heeen2•Apr 16, 2026
Yes, but it requires some nudging if you don't want to waste tokens. It will happily grep and sed through massive JavaScript bundles, but if you tell it to first create tooling, like Babel scripts to reformat the bundle, it will be much quicker.
lelanthran•Apr 16, 2026
> Has anyone used an LLM to deobfuscate compiled Javascript?
Seems like a waste of money; wouldn't it be better to extract the AST deterministically, write it out, and only then ask an LLM to replace the auto-generated symbol names with meaningful ones?
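That deterministic step is cheap with any parser. Here's the idea sketched with Python's `ast` module standing in for a JS toolchain like Babel (the name mapping is the only part you'd ask the LLM for):

```python
import ast

class Renamer(ast.NodeTransformer):
    """Apply an LLM-suggested name mapping mechanically, so the model
    only proposes names instead of rewriting the whole file."""
    def __init__(self, mapping: dict[str, str]):
        self.mapping = mapping

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self.mapping.get(node.id, node.id)
        return node

def apply_renames(source: str, mapping: dict[str, str]) -> str:
    """Parse, rename deterministically, and write the code back out."""
    tree = Renamer(mapping).visit(ast.parse(source))
    return ast.unparse(tree)
```

The token spend then shrinks to one "suggest meaningful names for these symbols" call per file, rather than a full rewrite.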
gfosco•Apr 16, 2026
Yeah, it's token intensive but worth it. I built a very dumb example harness which used IDA via MCP and analyzed/renamed/commented all ~67k functions in a binary, using Claude Haiku for about $150. A local model could've accomplished it for much less/free. The knowledge base it outputs and the marked up IDA db are super valuable.
whattheheckheck•Apr 16, 2026
Do you have the repo example?
heeen2•Apr 16, 2026
I did something similar using ghidramcp for digging around this keyboard firmware, repo contains the ghidra project, linux driver and even patches to the original stock fw. https://github.com/echtzeit-solutions/monsgeek-akko-linux
somesortofthing•Apr 15, 2026
Another asymmetric advantage for defenders - attackers need to burn tokens to form incomplete, outdated, and partially wrong pictures of the codebase while the defender gets the whole latest version plus git history plus documentation plus organizational memory plus original authors' cooperation for free.
high_na_euv•Apr 16, 2026
>original authors' cooperation
Ha
>for free.
Haha, it is more complicated in reality
echelon•Apr 16, 2026
> Tokens can also be burnt on decompilation.
Prediction 1. We're going to have cheap "write Photoshop and AutoCAD in Rust as a new program / FOSS" soon. No desktop software will be safe. Everything will be cloned.
Prediction 2. We'll have a million Linux and Chrome and other FOSS variants with completely new codebases.
Prediction 3. People will trivially clone games, change their assets. Modding will have a renaissance like never before.
Prediction 4. To push back, everything will move to thin clients.
jgraham•Apr 16, 2026
I think if prediction 1 is true (that it becomes cheap to clone existing software in a way that doesn't violate copyright law), the response will not be purely technical (moving to thin clients, or otherwise trying to technically restrict the access surface to make reverse engineering harder). Instead I'd predict that companies look to the law to replace the protections that they previously got from copyright.
Obvious possibilities include:
* More use of software patents, since these apply to underlying ideas, rather than specific implementations.
* Stronger DMCA-like laws which prohibit breaking technical provisions designed to prevent reverse engineering.
Similarly, if the people predicting that humans are going to be required to take ultimate responsibility for the behaviour of software are correct, then it clearly won't be possible for that to be any random human. Instead you'll need legally recognised credentials to be allowed to ship software, similar to the way that doctors or engineers work today.
Of course these specific predictions might be wrong. I think it's fair to say that nobody really knows what might have changed in a year, or where the technical capabilities will end up. But I see a lot of discussions and opinions that assume zero feedback from the broader social context in which the tech exists, which seems like they're likely missing a big part of the picture.
btown•Apr 15, 2026
The problem, though, is that this turns "one of our developers was hit by a supply chain attack that never hit prod, we wiped their computer and rotated keys, and it's not like we're a big target for the attacker to make much use of anything they exfiltrated..." into "now our entire source code has been exfiltrated and, even with rudimentary line-by-line scanning, will be automatically audited for privilege escalation opportunities within hours."
Taken to an extreme, the end result is a dark forest. I don't like what that means for entrepreneurship generally.
linkregister•Apr 15, 2026
This is a great example of vulnerability chains that can be broken by vulnerability scanning with even cheaper open-source models. The outcome of a developer getting pwned doesn't have to lead to total catastrophe. Having trivial privilege escalations closed off means an attacker will need to be noisy and set off commodity alerting. The will of the company to implement fixes for the 100 GitHub Dependabot alerts on their code base is all that stands between these entrepreneurs and that outcome.
It does mean that the hoped-for 10x productivity increase from engineers using LLMs is eroded by the increased need for extra time for security.
This take is not theoretical. I am working on this effort currently.
pixl97•Apr 15, 2026
I disagree that it's extra time for security, it's the time we should have been spending in the first place.
fragmede•Apr 16, 2026
It's great news for developers. Extra spend on a development/test env so devs have no prod access and prod has no SSH access; and SREs get two laptops, with the second one being a Chromebook that only pulls credentials when it's absolutely necessary.
linkregister•Apr 16, 2026
Yes, having a good development env with synthetic data, and an inaccessible, secure prod env just got justification. I never considered the secondary SRE laptop but I think it might be a good idea.
wafflemaker•Apr 16, 2026
Please explain the second laptop. I'm studying cybersecurity, so think I should know why. Or is it a joke?
linkregister•Apr 16, 2026
The value-add is having a workstation that's disconnected from work that would be susceptible to traditional vectors that endpoints are vulnerable to. For example, building software that pulls in potentially malicious dependencies, installing non-essential software, etc. The "SRE laptop" would only have a browser and the official CLI tools from confirmed good cloud and infrastructure vendors, e.g. gcloud, terraform.
I think that such a posture would only be possible in a mature company where concerns are already separated to the point where only a handful of administrators have actual SSO or username/passphrase access to important resources.
fragmede•Apr 16, 2026
It's not a joke. Supply chain attacks are a thing, but Google Chromebooks are about the most trustable consumer machine you can run custom code on short of a custom app on an iPad. The Chromebook would only ever have access to get the root AWS (or whatever) credentials to delete, say, the load balancer for the entire SaaS company's API/website. If my main laptop gets hacked somehow, the attacker can't get access to the root AWS credentials because the main laptop doesn't have them. The second laptop would only be used sparingly, but it would have access to those root credentials.
eru•Apr 16, 2026
> Taken to an extreme, the end result is a dark forest.
Sorry, how does that work?
bryanrasmussen•Apr 16, 2026
Since the suggestion is that the new security-bug-finding LLMs will increase protection because they have access to the full source code, the dark forest fear would be: if it is possible for an attacker to get all the source, the attacker will be in a better position.
This seems wrong, however, as it ignores the arrow of time. The full source code has been scanned, and fixed for things that LLMs can find, before hitting production; anyone exfiltrating your codebase can only use their models to find holes in what is exposed in production and that your models for some reason did not find.
I don't think there is any reason to suppose non-nation-state actors will have better models available to them, so it is not a dark forest. Nation states will probably limit their attacks to specific targets, so most companies, if they secure their codebase using LLMs built for this, will probably end up in a significantly more secure position than nowadays and, I would think, the golden age of criminal hacking is drawing to a close. This assumes companies smart enough to do it, however.
Furthermore, the worry about nation-state attackers still assumes that they will have better models, and I'm not sure that is likely either.
staplers•Apr 16, 2026
> I would think, the golden age of criminal hacking is drawing to a close. This assumes companies smart enough to do this however.
It's rarely the systems that are the weak link, rather the humans with backdoor access.
I guess the connection would be human history, a dark forest is a scene of lawlessness and violence and danger in much of that history - at least where stories are concerned.
In the use of the phrase Dark Forest to explain the Fermi paradox it suggests that alien civilizations have kept themselves dark out of fear that the rest of the forest is actually lawless and violent.
In this case though we are entering a dark forest, like Hansel and Gretel, supposedly defenseless against the monsters that lurk in there - but really, they weren't that defenseless, were they? I don't think the phrase is that apt.
btown•Apr 16, 2026
Any single company might be able to proactively defend themselves from attackers, but will companies invest the tokens in this? Most people simply don't care until it's too late.
And in a world where companies begin to suffer from attacks as a result - can the ones who are willing to invest in security defend themselves, not just against cyberattackers, but against a broader investor and customer backlash that believes that startups that build their own technology stacks are riskier due to perceptions about cybersecurity?
An angel investor or LP who sees news articles and media about cyberattacks, then has a portfolio company get hacked in a material way, may simply decide the space has become too risky for further investments, no matter how much prospects get on better security footings.
The dark forest hypothesis, at its core, is about a decision of whether to put your neck out in the universe; if the weapons and countermeasures being used are too horrifying to fathom, the risks unquantifiable, one chooses not to extend one's neck. And that is how an industry begins to dry.
linkregister•Apr 16, 2026
The pressure by internal auditors and cyber insurance providers to implement these programs will be strong. I have been at organizations where EDR was added only due to the board of directors following the recommendation of 3rd parties. Of course, there will be new companies that haven't achieved the maturity to have had these pressures. But new companies being thoroughly compromised is hardly a recent phenomenon.
anitil•Apr 15, 2026
On the latest episode of 'Security Cryptography Whatever' [0], they mention that the time spent on improving the harness (at the moment) ends up being outperformed by the strategy of "wait for the next model". I doubt that will continue, but it broke my intuition about how to improve them.
This is basically how you should treat all AI dev. Working around AI model limits for something that will take 3-6 months of work has very little ROI compared to building what works today, and then building what works tomorrow, tomorrow.
sally_glance•Apr 16, 2026
This is the hard part - especially with larger initiatives, it takes quite a bit of work to evaluate what the current combination of harness + LLM is good at. Running experiments yourself is cumbersome and expensive, public benchmarks are flawed. I wish providers would release at least a set of blessed example trajectories alongside new models.
As it is, we're stuck with "yeah it seems this works well for bootstrapping a Next.js UI"...
thephyber•Apr 16, 2026
This assumes AI model improvements will be predictable, which they won’t.
There are several simultaneous moving targets: the different models available at any point in time, the model complexity/ capability, the model price per token, the number of tokens used by the model for that query, the context size capabilities and prices, and even the evolution of the codebase. You can’t calculate comparative ROIs of model A today or model B next year unless these are far more predictable than they currently are.
theptip•Apr 16, 2026
It’s a good thing to keep in mind, but LLM + scaffolding is clearly superior, so if you just use vanilla LLMs you will always be behind.
I think the important thing is to avoid over-optimizing your scaffold, not to avoid building one altogether.
fragmede•Apr 16, 2026
It's wild to me that a paragraph or 7 of plain English that amounts to "be good at things" is enough to make a material difference in the LLM's performance.
AlexCoventry•Apr 16, 2026
They have no values of their own, so you have to direct their attention that way.
l33tman•Apr 16, 2026
As the base is an auto-regressive model that is capable of generating more or less any kind of text, it kind of makes sense though. It always has the capabilities, but you might want it to emulate a stupid analysis as well. So you're leading in with a text that describes what the rest of the text will be in a pretty real sense.
chrisjj•Apr 16, 2026
There will always be bosses who/which think telling workers to work well works well.
jtbayly•Apr 16, 2026
I read once (so no idea if it is true) that in voice lessons, one of the most effective things you can do to improve people's technique is to tell them to pretend to be an opera singer.
bitexploder•Apr 16, 2026
And if you have the better harness and the next model?
anitil•Apr 16, 2026
I would _hope_ that the double combo would be better, but honestly I have no idea
bitexploder•Apr 16, 2026
I do. It is better. I have done a lot of vuln research. I can get way better than one shot level results out of “inferior” models.
I think you took away the wrong lesson from that podcast:
> I think there is work to be done on scaffolding the models better. This exponential right now reminds me of the exponential from CPU speeds going up until let’s say 2000 or something where you had these game developers who would develop really impressive games on the current thing of hardware and they do it by writing like really detailed intricate x86 instruction sequences for like just exactly whatever this, like, you know, whatever 486 can do, knowing full well that in 2 years, you know, the pen team is gonna be able to do this much faster and they didn’t need to do it. But like you need to do it now because you wanna sell your game today and like, yeah, you can’t just like wait and like have everyone be able to do this. And so I do think that there definitely is value in squeezing out all of the last little juice that you can from the current model.
Everything you can do today will eventually be obsoleted by some future technology, but if you need better results today, you actually have to do the work. If you just drop everything and wait for the singularity, you're just going to unnecessarily cap your potential in the meantime.
jorvi•Apr 16, 2026
That seems very unlikely.
Chinese AI vendors specifically pointed out that even a few gens ago there was maybe 5-15% more capability to squeeze out via training, but that the cost for this is extremely prohibitive and only US vendors have the capex to have enough compute for both inference and that level of training.
I'd take their word over someone who has a vested interest in pushing Anthropic's latest and greatest.
The real improvements are going to be in tooling and harnessing.
Kinrany•Apr 16, 2026
That only applies to workarounds for current limitations, no? Some things a harness can do will apply in the same way to future models.
kelvinjps10•Apr 15, 2026
what about open source software?
eru•Apr 16, 2026
> There's a massive cost asymmetry between the "hardening" phase for the defender and the "discovering exploits" phase for the attacker.
Well, you need to harden everything, the attacker only needs to find one or at most a handful of exploits.
lelanthran•Apr 16, 2026
> Well, you need to harden everything, the attacker only needs to find one or at most a handful of exploits.
Yeah, but it's not like the attacker knows where to look without checking everything, is it?
If you harden and fix 90% of vulns, the attacker may give up when their attempts reach 80% of vulns.
It's the same as it has ever been; you don't need to outrun the bear, you only need to outrun the other runners.
My point is that the cost for the attacker is higher than the cost for the defender if the attacker has to spend tokens probing for vulnerabilities in a system they know little about, while the defender spends tokens on a system they have the full source to.
That is not at all relevant to "security via obscurity" or similar arguments: having the source in the open may (eventually) make the software more secure, but it lowers the token spend for the attacker.
bryanrasmussen•Apr 16, 2026
>By all accounts, the best LLM cyber scanning approaches are really primitive
It seems like that is perhaps not the case anymore with the Mythos model?
nl•Apr 16, 2026
> By all accounts, the best LLM cyber scanning approaches are really primitive - it's just a bash script that goes through every single file in the codebase
What accounts are these?
I've seen some people use this, but I cannot imagine that anyone thinks this is the best.
For example I've had success telling LLMs to scan from application entry points and trace execution, and that seems an extremely obvious thing to do. I can't imagine others in the field don't have much better approaches.
linkregister•Apr 16, 2026
Indeed, all the hot security scanning vendors are using custom prompts to capture a more holistic approach. There are of course plenty of legacy scanners that still focus on OS package versions and static configs, but the parts of the industry leaning into LLMs have genuine value to add.
I don't expect Claude Code Review to be a replacement for a good vendor's solution.
kansface•Apr 16, 2026
This feels pretty fertile atm to me, because it has been prohibitively expensive to do. I expect there is a ton of low hanging fruit. Why not in the age of AI?
ozim•Apr 16, 2026
Still, it makes the cost of making software higher.
You cannot get away with "well, no one is going to spend time writing a custom exploit to get us" or "just be faster than the slowest one running away from the bear".
xeyownt•Apr 16, 2026
One defender, many attackers - I don't see how the economy of scale can be positive for the defender.
Assuming your code is inaccessible isn't good for security, either. All security reviews are done assuming the source code is available. If you don't provide the source, you'll never score high in the review.
throwawayqqq11•Apr 16, 2026
I think automated scanning can be positive for the defenders when the rate of introducing new vulnerabilities vs fixing old ones is < 1 (detection rate + infra is a factor too, of course). In that case, AI can become the many eyes that check FOSS, and those projects will eventually reach a "secure" state.
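One way to make that rate argument concrete (illustrative numbers only): treat each release cycle as adding some new vulnerabilities and fixing a fraction of the open backlog. When fixing outpaces introduction, the backlog converges to a finite level instead of growing without bound.

```python
def backlog_after(introduced_per_cycle: float, fix_fraction: float, cycles: int) -> float:
    """Open-vulnerability backlog after n cycles: each cycle fixes a
    fraction of what's open and adds a fixed number of new ones.
    For fix_fraction in (0, 1] this converges toward
    introduced_per_cycle / fix_fraction."""
    backlog = 0.0
    for _ in range(cycles):
        backlog = backlog * (1.0 - fix_fraction) + introduced_per_cycle
    return backlog
```

With 5 new vulns per cycle and half the backlog fixed each cycle, the project settles at about 10 open vulns; the better the detection, the lower the plateau.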
goodpoint•Apr 16, 2026
It's the opposite, the economy of scale favors defense.
chrisjj•Apr 16, 2026
> By all accounts, the best LLM cyber scanning approaches are really primitive - it's just a bash script that goes through every single file in the codebase and, for each one, runs a "find the vulns here" prompt
Primitive? I'd say simple and thorough.
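For what it's worth, the quoted approach is trivial to reproduce. A minimal sketch in Python; `ask_llm` is a placeholder for whatever model call you have, not a real library function:

```python
import os

def scan_repo(root, ask_llm):
    """Walk every file under `root` and run a 'find the vulns here'
    prompt on each one, collecting non-empty reports per file."""
    findings = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    code = f.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
            report = ask_llm(f"Find the vulns here:\n\n{code}")
            if report.strip():
                findings[path] = report
    return findings
```

Thorough in the sense that nothing is skipped, but each file is seen in isolation, which is exactly the limitation that entry-point tracing tries to fix.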
lmeyerov•Apr 16, 2026
Most companies and their vendor ecosystems run on OSS
Worse, "attackers no longer break in, they log in", so the supply chain attacks harvesting credentials have been frightening
tossandthrow•Apr 16, 2026
> it's just a bash script that goes through every single file in the codebase and, for each one, runs a "find the vulns here" prompt.
This really is not the case.
You have freedom of methodology.
You can also ask it to enumerate various risks and find proof of existence for each of them.
Certainly our LLM audits are not just a prompt per file - so I have a hard time believing that best in class tools would do this.
pseudohadamard•Apr 16, 2026
I've actually had pretty good results from doing exactly that. There was one FP when it tried to be Coverity and failed miserably, but the others were "you need to look at this bit more closely", and in most cases there was something there. Not necessarily a vuln but places where the code could have been written more clearly. It was like having your fourth grade English teacher looking over your shoulder and saying "you need to look at the grammar in this sentence more closely".
And using an LLM to audit your code isn't necessarily a case of turning it into perfect code, it's to keep ahead of the other side also using an LLM. You don't need to outrun the bear, just the other hikers.
saidnooneever•Apr 15, 2026
people biting into what companies say about their own products had always been the frustration in cyber. now more than ever.
nothing is better or worse, basically as its always been.
if you think otherwise, stop ignoring the past.
saidnooneever•Apr 15, 2026
thanks for the down vote. i am not cynical though. how many billion-dollar companies claim 109% detection rates and bulletproof security? i worked at one of these companies as they bought another and suffered through trying to make broken promises a reality. (they did it partly, an epic achievement. amazing engineers.) it's a broken game.
you are addicted to dopamine. think carefully and take good care of yourself
tptacek•Apr 15, 2026
It looks like it, but it isn't. It's the work itself that's valued in software security, not the amount of it you managed to do. The economics are fundamentally different.
Put more simply: to keep your system secure, you need to be fixing vulnerabilities faster than they're being discovered. The token count is irrelevant.
Moreover: this shift is happening because the automated work is outpacing humans for the same outcome. If you could get the same results by hand, they'd count! A sev:crit is a sev:crit is a sev:crit.
keeda•Apr 16, 2026
I think the premise is:
1) The number of vulnerabilities surfaced (and fixed?) in a given software is roughly proportional to the amount of attention paid to it.
2) Attention can now be paid in tokens by burning huge amounts of compute (bonus: most commonly on GPUs, just like crypto!)
3) Whoever finds a vulnerability has a valuable asset, though the value differs based on the criticality of the vulnerability itself, and whether you're the attacker or the defender.
More tokens -> more vulns is not a guarantee of course, it's a stochastic process... but so is PoW!
int32_64•Apr 15, 2026
By using these services, you're also exfiltrating your entire codebase to them. So you have to continuously use the best cyber capabilities providers offer, in case a data breach lets somebody obtain your codebase and run a better vulnerability detector than the one you were using.
jerf•Apr 15, 2026
I've said for decades that, in principle, cybersecurity is advantage defender. The defender has to leave a hole. The attackers have to find it. We just live in a world with so many holes that dedicated attackers rarely end up bottlenecked on finding holes, so in practice it ends up advantage attacker.
There is at least a possibility that a code base can be secured by a (practically) finite number of tokens until there are no more holes in it, for reasonable amounts of money.
This also reminds me of what I wrote here: https://jerf.org/iri/post/2026/what_value_code_in_ai_era/ There's still value in code tested by the real world, and in an era of "free code" that may become even more true than it is now, rather than the initially-intuitive less valuable. There is no amount of testing you can do that will be equivalent to being in the real world, AI-empowered attackers and all.
traderj0e•Apr 15, 2026
I agree for the type of attacks the article focuses on, but DDoS and social engineering seem like advantage attacker.
mapontosevenths•Apr 15, 2026
>in principle, cybersecurity is advantage defender
I disagree.
The defender must be right every single time. The attacker only has to get lucky once, and thanks to scale they can try all day, every day, in most large organizations.
traderj0e•Apr 15, 2026
Well, the attacker has something to lose too. It's not like the defender has to be perfect or else attacks will just happen, it takes time/money to invest in attacking.
mapontosevenths•Apr 16, 2026
The cost to your average ransomware crew can be rounded down to zero, because it's pretty darn close. They use automated tools running on other peoples computers and utilizing other peoples connectivity. The tools themselves for most RaaS (ransomware as a service) affiliates are also close to zero cost, as they pay the operator a percentage of profits.
The time is a cost, but at scale any individual target is a pretty minor investment since it's 90%+ automated. Also, these aren't folks that are otherwise highly employable. The opportunity cost to them is also usually very low.
The last attacker I got into a conversation with was interesting. Turns out, he was a 16 year old from Atlanta GA using a toolkit as an affiliate. He claimed he made ~100k/year and used the money on cars and girls. I felt like he was inflating that number to brag. His alternative probably would have been McDonalds, and as a minor if he got caught it would've been probation most likely. I told him to come to the blue team, we pay better.
janalsncm•Apr 15, 2026
My understanding of defense in depth is that it is a hedge against this. By using multiple uncorrelated layers (e.g. the security guard shouldn’t get sleepier when the bank vault is unlocked) you are transforming a problem of “the defender has to get it right every time” into “the attacker has to get through each of the layers at the same time”.
mapontosevenths•Apr 16, 2026
It is a hedge, that said it only reduces the probability of an event and does not eliminate it.
To use your example, if the odds of the guard being asleep and the vault being unlocked are both 1% we have a 0.0001 chance on any given day. Phew, we're safe...
Except that Google says there are 68,632 bank branch locations in the US alone. That means it will happen roughly 7 times on any given day someplace in America!
Now apply that to the scale of the internet. The attackers can rattle the locks in every single bank in an afternoon for almost zero cost.
The poorly defended ones have something close to 100% odds of being breached, and the well defended ones have low odds on any given day, but over a long enough timeline it becomes inevitable.
To again use your bank example: if we only have one bank but keep those odds, the event will happen about 7 times over 191 years. Or to restate that number, it is likely to happen at least once every 27 years. You'll have about 25% odds of it happening in any 7-year span.
For any individual target, it becomes unlikely, but also still inevitable.
From an attackers perspective this means the game is rigged in their favor. They have many billions of potential targets, and the cost of an attack is close to zero.
From a defenders perspective it means realizing that even with defense in depth the breach is still going to happen eventually and that the bigger the company is the more likely it is.
Cyber is about mitigating risk, not eliminating it.
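The arithmetic in the bank example checks out; a quick sanity check in pure Python, using the comment's assumed 1% odds per layer:

```python
# Two independent 1%-probability failures per day at a single branch.
p_day = 0.01 * 0.01                      # 0.0001

# Across ~68,632 US branches: expected events per day.
branches = 68_632
events_per_day = branches * p_day        # ~6.9, "roughly 7 times a day"

# Single branch: mean waiting time between events, in years.
wait_years = 1 / p_day / 365             # ~27.4 years

# Single branch: chance of at least one event in any 7-year span.
p_7yr = 1 - (1 - p_day) ** (7 * 365)     # ~0.23, "about 25% odds"
```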
tptacek•Apr 15, 2026
The attacker and defender have different constant factors, and, up until very recently, constant factors dominated the analysis.
coldtea•Apr 15, 2026
Not to mention an attacker motivated by financial gain doesn't even need a particular target. Any one found available will do.
NegativeK•Apr 16, 2026
The defender must be right every single time, and the attacker right only once.
Until the attacker has initial access.
Then the attacker needs to be right every single time.
DerSaidin•Apr 15, 2026
> Cybersecurity looks like proof of work now
Imo, cybersecurity looks like formally verified systems now.
You can't spend more tokens to find vulnerabilities if there are no vulnerabilities.
drdrey•Apr 15, 2026
good luck formally verifying everything
ares623•Apr 16, 2026
I declare ~bankruptcy~ formal verification
deepsun•Apr 15, 2026
Every formal verification depends highly on requirements. It's pretty easy to make a mistake in defining the task itself. In the end, you'd want to verify system behavior in the real world, and it's impossible to completely define the real world. You always make some assumptions/models to reason within, and it is impossible to verify that the assumptions are correct.
AlexCoventry•Apr 16, 2026
I think there's definitely more scope for ruling out vulnerabilities by implementing simpler designs and architectures.
jldugger•Apr 16, 2026
I misread the title as "proof work" not "proof _of_ work." The analysis makes sense, but has kinda always been true. So mostly depressing rather than insightful.
But part of me has been wondering for a while now whether proofs of correctness are the way out of the NVIDIA infinite money glitch. IDK if we're there yet but it's pretty much the only option I can imagine.
jzelinskie•Apr 15, 2026
Security has always been a game of just how much money your adversary is willing to commit. The conclusions drawn in lots of these articles are just already well understood systems design concepts, but for some reason people are acting like they are novel or that LLMs have changed anything besides the price.
For example from this article:
> Karpathy: Classical software engineering would have you believe that dependencies are good (we’re building pyramids from bricks), but imo this has to be re-evaluated, and it’s why I’ve been so growingly averse to them, preferring to use LLMs to “yoink” functionality when it’s simple enough and possible.
Anyone who's heard of "leftpad" or is a Go programmer ("A little copying is better than a little dependency" is literally a "Go Proverb") knows this.
Another recent set of posts to HN had a company close-sourcing their code for security, but "security through obscurity" has been a well-understood fallacy in open source circles for decades.
pmontra•Apr 16, 2026
Yes, there is nothing novel in "to harden a system we need to spend more tokens discovering exploits than attackers spend exploiting them." That's what security always looked like, physical security included (burglars, snipers, etc.) So when AI is available you have to throw more AI at securing your system than your adversaries do. What a surprise.
Maybe we could start with the prompts for the code generation models used by developers.
lelanthran•Apr 16, 2026
> Another recent set of posts to HN had a company close-sourcing their code for security, but "security through obscurity" has been a well-understood fallacy in open source circles for decades.
I dunno about that quoted bit; "Defense in depth" (Or defense via depth) is a good thing, and obscurity is just one of those layers.
"Security through obscurity" is indeed wrong if the obscurity is a large component of the security, but it helps if it is just another layer of defense in the stack.
IOW, harden your system as if it were completely transparent, and only then make it opaque.
alexwebb2•Apr 16, 2026
> "security through obscurity" has been a well understand fallacy in open source circles for decades
The times, as they say, are a-changin’.
Open software is not inherently more secure than closed software, and never has been.
Its relative security value was always derived from circumstantial factors, one of the most important of which was the combination of incentive and ability and willingness of others in the community to spend their time and attention finding and fixing bugs and potential exploits.
Now, that’s been the case for so long that we all implicitly take it for granted, and conclude that open software is generally more secure than closed, and that security through obscurity falls short in comparison.
But this may very well fundamentally change when the cost of navigating the search space of potential exploits, for both the attacker and the defender, is dramatically reduced along the axes of time and attention, and increased along the axis of monetary investment.
It then becomes a game of which side is more willing to pool monetary resources into OSS security analysis – the attackers or the defenders – and I wouldn’t feel comfortable betting on the defenders in that case.
j2kun•Apr 15, 2026
The article heavily quotes the "AI Security Institute" as a third-party analysis. It was the first I heard of them, so I looked up their about page, and it appears to be primarily people from the AI industry (former Deepmind/OpenAI staff, etc.), with no folks from the security industry mentioned. So while the security landscape is clearly evolving (cf. also Big Sleep and Project Zero), the conclusion of "to harden a system we need to spend more tokens" sounds like yet more AI boosting from a different angle. It raises the question of why no other alternatives (like formal verification) are mentioned in the article or the AISI report.
I wouldn't be surprised if NVIDIA picked up this talking point to sell more GPUs.
tptacek•Apr 15, 2026
I would be interested in which notable security researchers you can find to take the other side of this argument. I don't know anything about the "AI Security Institute", but they're saying something broadly mirrored by security researchers. From what I can see, the "debate" in the actual practitioner community is whether frontier models are merely as big a deal as fuzzing was, or something significantly bigger. Fuzzing was a profound shift in vulnerability research.
(Fan of your writing, btw.)
VorpalWay•Apr 15, 2026
> but they're saying something broadly mirrored by security researchers.
You might well be right, it is not an area I know much of or work in. But I'm a fan of reliable sources for claims. It is far too easy to make general statements on the internet that appear authoritative.
j2kun•Apr 15, 2026
It's less that I think they would take the other side of the argument, than that they would lend some credence to the content of the analysis. For example, I would not particularly trust a bunch of AI researchers to come up with a representative set of CTF tasks, which seems to be the basis of this analysis.
tptacek•Apr 15, 2026
Yeah, you might be right about this particular analysis! The sense I have from talking to people at the labs is that they're really just picking deliberately diverse and high-profile targets to see what the models are capable of.
croemer•Apr 16, 2026
They are a UK government unit: "The AI Security Institute is a research organisation within the Department of Science, Innovation and Technology."
Unfortunately, they fit straight lines to graphs whose y-axis runs from 0 to 100% and whose x-axis is time - which is not great. They should fit a logistic curve instead.
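The point about straight-line fits is a fair one: a capability score is capped at 100%, so a linear trend must eventually extrapolate past the cap, while a logistic curve saturates. A toy illustration in pure Python (the slopes and midpoints are made-up numbers, not fitted to any real benchmark):

```python
import math

def linear(t, slope=10.0, intercept=5.0):
    # naive straight-line fit: % of tasks solved vs. time
    return intercept + slope * t

def logistic(t, k=1.0, t0=5.0, cap=100.0):
    # S-curve with similar early growth, but bounded by the cap
    return cap / (1.0 + math.exp(-k * (t - t0)))

# Extrapolate both far into the future:
print(linear(20))    # 205.0 -- a nonsensical ">100% of tasks solved"
print(logistic(20))  # ~100.0 -- saturates at the cap
```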
wg0•Apr 16, 2026
If true, that's naked, shameless and brutal capitalism.
Seems much like those secretly tobacco industry funded reports about tobacco being safe and such.
cmrx64•Apr 15, 2026
Dijkstra would shake his head at our folly.
smj-edison•Apr 15, 2026
I'm curious to see if formally verified software will get more popular. I'm somewhat doubtful, since getting programmers to learn formal math is hard (rightfully so, but still sad). But, if LLMs could take over the drudgery of writing proofs in a lot of the cases, there might be something there.
stringfood•Apr 15, 2026
I am so exhausted with being asked to learn difficult and frankly confusing topics - the fact that it is so hard and so humbling to learn these topics is exactly why everyone is so happy to let AI think about formal programming and I can focus on getting Jersey Shore season 2 loaded into my Plex server. It's the one where Pauly D breaks up with Shelli
gjadi•Apr 15, 2026
How is getting a proof one doesn't understand going to help build safer systems?
I want to believe formal methods can help, not because one doesn't have to think about it, but because the time freed from writing code can be spent on thinking on systems, architecture and proofs.
smj-edison•Apr 15, 2026
That's a fair question, and looking at my post I now realize I have two independent points:
1. A proof mindset is really hard to learn.
2. Writing theorem definitions can be hard, but writing a proof can be even harder. So, if you could write just the definitions, and let an LLM handle all the tactics and steps, you could use more advanced techniques than just a SAT solver.
So I guess LLMs only marginally help with (1), but they could potentially be a big help for (2), especially with more tedious steps. It would also allow one to use first-order logic, and not just propositional logic (or dependent types if you're into that).
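A toy Lean 4 example of the division of labor in point 2: the human writes the statement, and a decision procedure (here `omega`, standing in for whatever tactic script an LLM might produce) searches for the proof.

```lean
-- The specification is the hard human part; the proof script below it
-- is the part that could be delegated to automation or an LLM.
theorem no_overflow (a b : Nat) (ha : a ≤ 100) (hb : b ≤ 100) :
    a + b ≤ 200 := by
  omega
```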
Mistletoe•Apr 15, 2026
Everything eventually turns into Bitcoin. That’s what I plan to see in the future years and decades.
c1ccccc1•Apr 15, 2026
If you have a limited budget of tokens as a defender, maybe the best thing to spend them on is not red teaming, but formalizing proofs of your code's security. Then the number of tokens required roughly scales with the amount and complexity of your code, instead of scaling with the number of tokens an attacker is willing to spend.
(It's true that formalization can still have bugs in the definition of "secure" and doesn't work for everything, which means defenders will still probably have to allocate some of their token budget to red teaming.)
pxc•Apr 15, 2026
> If you have a limited budget of tokens as a defender, maybe the best thing to spend them on is not red teaming, but formalizing proofs of your code's security.
You can only do this if you have a very clear sense of what your code should be doing. In most codebases I've ever worked with, frankly, no one has any idea.
Red teaming as an approach always has value, but one important characteristic it has is that you can apply red teaming without demanding any changes at all to your code standards, or engineering culture (and maybe even your development processes).
Most companies are working with a horrific sprawl of code, much of it legacy with little ownership. Red teaming, like buying tools and pushing for high coverage, is an attractive strategy to business leaders because it doesn't require them to tackle the hardest problems (development priorities, expertise, institutional knowledge, talent, retention) that factor into application security.
Formal verification is unfortunately hard in the ways that companies who want to think of security as a simple resource allocation problem most likely can't really manage.
I would love to work on projects/with teams that see formal verification as part of their overall correctness and security strategy. And maybe doing things right can be cheaper in the long run, including in terms of token burn. But I'm not sure this strategy will be applicable all that generally; some teams will never get there.
Jolter•Apr 16, 2026
Is it possible to prove security properties about a web application?
Mistletoe•Apr 15, 2026
Everything eventually turns into Bitcoin. That’s what I plan to see in the future years and decades. Satoshi just saw it first.
umvi•Apr 15, 2026
> You don’t get points for being clever. You win by paying more.
And yet... Wireguard was written by one guy while OpenVPN is written by a big team. One code base is orders of magnitude bigger than the other. Which should I bet LLMs will find more cybersecurity problems with? My vote is on OpenVPN despite it being the less clever and "more money thrown at" solution.
So yes, I do think you get points for being clever, assuming you are competent. If you are clever enough to build a solution that's much smaller/simpler than your competition, you can also get away with spending less on cybersecurity audits (be they LLM tokens or not).
zachdotai•Apr 15, 2026
we did a lot of thinking around this topic. and distilled it into a new way to dynamically evaluate the security posture of an AI system (which can apply for any system for that matter). we wrote some thoughts on this here: https://fabraix.com/blog/adversarial-cost-to-exploit
sdevonoes•Apr 15, 2026
Please. Are we now going to rely on Anthropic et al to secure our systems? Wasn't it enough to rely on them to build our systems? What's next? To rely on them for monitoring and observability? What else? Design and mockups?
tptacek•Apr 15, 2026
The nice thing about vulnerability research is that you either have a vulnerability or you don't. There's no such thing as a "slop vulnerability".
lopityuity•Apr 15, 2026
"We burned 10 trillion tokens and the Amazon rain forest is now a desert, but our stochastic parrot discovered that if a user types '$-1dffj39fff%FFj$@#lfjf' 10 thousand times into a terminal that you can get privilege escalation on a Linux kernel from 10 years ago. The best part? We avoided paying anyone outside of the oligarchy for the discovery of this vulnerability."
In your embarrassingly reductive binary vulnerability state worldview? Have.
a34729t•Apr 15, 2026
If we rely on Anthropic to write our system, it's only natural to rely on them to secure it. Seriously, at the big tech companies we're rapidly approaching all code being written by LLMs... so we at least need to close the security chain quickly.
singpolyma3•Apr 15, 2026
If you run this long enough presumably it will find every exploit and you patch them all and run it again to find exploits in your patches until there simply... Are no exploits?
heliumtera•Apr 15, 2026
In other news, token seller says tokens should be bought
protocolture•Apr 15, 2026
>You don’t get points for being clever. You win by paying more.
Really depends how consistently the LLMs are putting new novel vulnerabilities back in your production code for the other LLMs to discover.
devmor•Apr 15, 2026
> to harden a system you need to spend more tokens discovering exploits than attackers will spend exploiting them.
If we take this at face value, it's not that different than how a great deal of executive teams believe cybersecurity has worked up to today. "If we spend more on our engineering and infosec teams, we are less likely to get compromised".
The only big difference I can see is timescale. If LLMs can find vulnerabilities and exploit them this easily (and I do take that with a grain of salt, because benchmarks are benchmarks), then you may lose your ass in minutes instead of after one dedicated cyber-explorer's monster energy fueled, 7-week traversal of your infrastructure.
I am still far more concerned about social engineering than LLMs finding and exploiting secret back doors in most software.
BloondAndDoom•Apr 15, 2026
Security always had “defender’s dilemma” (an attacker needs to find one thing, but defender needs to fix everything) problem, nothing is new in terms of AI’s impact just application of different resources and units.
wheelerwj•Apr 15, 2026
Cybersecurity has always been proof of work. Fuck, most of software development is proof of work by that logic. That's why many attacks originate from countries where the cost of living is a fraction of the COL in the United States. They can throw more people at the problem because it's cheaper to do so.
But I don't really get the hype, we can fix all the vulnerabilities in the world but people are still going to pick up parking-lot-USBs and enter their credentials into phishing sites.
_pdp_•Apr 15, 2026
All of the recent news reads like something that could happen in a cyberpunk novel - AIs that defend vs AIs that do the attacks.
> Classical software engineering would have you believe that dependencies are good (we’re building pyramids from bricks)
Would it? I’m old school but I’ve never trusted these massive dependency chains.
That’s a nit.
We’re going to have to write more secure software, not just spend more.
xeyownt•Apr 16, 2026
Yeah, exactly.
Your wall should be made of a small number of bricks you bet your life on.
All the rest goes inside.
dataviz1000•Apr 15, 2026
> to harden a system you need to spend more tokens discovering exploits than attackers will spend exploiting them.
I, for the NFL front offices, created a script that exposed an API to fully automate Ticketmaster through the front end, so that the NFL could post tickets on all secondary markets and dynamically price them; if rain was expected on a Sunday, they could charge less. Ticketmaster was slow to develop an API. For legal reasons Ticketmaster couldn't give us permission without first developing the API themselves, but they told me they would do their best to stop me.
They switched over to PerimeterX which took me 3 days to get past.
Last week someone posted an article here about ChatGPT using Cloudflare Turnstile. [0] First, the article made some mistakes about how it works. Second, I used the [AI company product] and the Chrome DevTools Protocol (CDP) to completely rewrite all the scripts, intercepting them before they were evaluated -- the same way I was able to figure out PerimeterX in 3 days -- and then recursively solve controlling all the fingerprinting so that it controls the profile. Then it created an API proxy to expose ChatGPT for free. It required some coaching about the technique but it did most of the work in 3 hours.
These companies are spending 10s of millions of dollars on these products and considering what OpenAI is boasting about security, they are worthless.
Does this mean all code written before Mythos is a liability?
samuelknight•Apr 15, 2026
All code is a liability in general. All code written before LLMs, and during the current in-between years, is vulnerable to the next frontier model. Eventually we will settle into a new paradigm that correctly addresses the new balance of effort.
samuelknight•Apr 15, 2026
I don't know about Mythos but the chart understates the capability of the current frontier models. GPT and Claude models available today are capable of Web app exploits, C2, and persistence in well under 10M tokens if you build a good harness.
The benchmark might be a good apples-to-apples comparison but it is not showing capability in an absolute sense.
danieltk76•Apr 15, 2026
There are never-ending ways to make agents better at hacking. Defense is clearly behind. At my startup we are constantly coming up with new defensive measures to pit our hacking agent Sable against, and I've determined that you basically need to be air-gapped in the future for a chance of survival. A SOC of AI agents can't keep up with one AI hacker on a network that is even remotely stealthy. It is a disaster. I wrote an article about it:
https://blog.vulnetic.ai/evading-an-ai-soc-with-sable-from-v...
dangero•Apr 16, 2026
Agree with this — the economics have completely changed. Along these lines, we all need to re-scope our personal cybersecurity.
For example, developers should no longer run dev environments on the same machine where they access passwords, messages, and emails — no external package installation on that box at all.
SaaS Password Managers — assume your vault will be stolen from whichever provider is hosting it.
YubiKeys will be more important than ever to air-gap root auth credentials.
ofjcihen•Apr 16, 2026
“Sable began with an initial port scan of 10.10.1.10 and then authenticated to the target.”
That would have started a P2 and woken up a senior IR responder anywhere that I’ve worked. Are you sure you’re running a realistic defender environment?
carlcortright•Apr 16, 2026
My first thought seeing the title: "always has been"
Everything in modern corporate is just proof of work. Security is filling out forms. Engineering is just endless talking. Token-maxing is the new meta.
karmasimida•Apr 16, 2026
Trusted software will be so expensive that it will effectively kill startups for infrastructure, unless they can prove they spent millions of dollars hardening their software.
I predict the software ecosystem will change in two ways: internal software behind a firewall will become ever cheaper, but anything external-facing will become exponentially more expensive due to hacking concerns.
riffic•Apr 16, 2026
those hacking concerns are just as valid inside as well as outside the firewall.
karmasimida•Apr 16, 2026
You can enforce physical isolation to make sure hacking isn’t possible at least without some level of physical intrusion
rgmerk•Apr 16, 2026
Maybe I’m missing something, but there’s also the idea that you don’t need to be perfectly secure, you just need to be secure enough that it’s not worth the effort to break in.
In the case of crooks (rather than spooks) that often means your security has to be as good as your peers, because crooks will spend their time going with the best gain/effort ratio.
peterbell_nyc•Apr 16, 2026
Why crack one website when you can crack all of them? For a well funded (especially nation state) attacker, if $1 in compute and effort returns $2 in ransoms, when it's possible to access another n x $1 of compute and if you don't hit diminishing returns or cashflow limitations, why wouldn't you just keep spending $'s until you p0wned all the systems?
If there is only one bear, you just need to run faster than your friends. If there's a pack of them, you need to start training much harder!
linkregister•Apr 16, 2026
The supply chain attack is interesting in that it doesn't require any marginal effort for an attacker to get an initial exploit for additional targets. Then the bottleneck is post-exploitation efforts and value of the targets.
bmitch3020•Apr 16, 2026
> If corporations that rely on OSS libraries spend to secure them with tokens, it’s likely going to be more secure than your budget allows.
That's a really big "if". Particularly since so many companies don't even know all of the OSS they are using, and they often use OSS to offload the cost of maintaining it themselves.
My hope is when the dust settles, we see more OSS SAST tools that are much better at detecting vulnerabilities. And even better if they can recommend fixes. OSS developers don't care about a 20 point chained attack across a company network, they just want to secure their one app. And if that app is hardened, perhaps that's the one link of the chain the attackers can't get past.
NegativeK•Apr 16, 2026
> Particularly since so many companies don't even know all of the OSS they are using, and they often use OSS to offload the cost of maintaining it themselves.
Companies that market to the EU are going to need to find out real fast.
chaitanyya•Apr 16, 2026
What do they mean when they say "no diminishing returns?" does this essentially mean the code you are testing has no bounded state space and you continue to find infinite paths?
Because we have tools and techniques that can guarantee the absence of certain behavior in a bounded state space using formal methods (even unbounded at times)
Sure, it's hard to formally verify everything but if you are dealing with something extremely critical why not design it in a way that you can formally verify it?
But yeah, the easy button is to keep throwing more tokens till you run out of money.
creatonez•Apr 16, 2026
I mostly agree with the article.
> You don’t get points for being clever
Not sure about this framing, this can easily lead to the wrong conclusions. There is an arms race, yes, and defenders are going to need to spend a lot of GPU hours as a result. But it seems self-evident that the fundamentals of cybersecurity still matter a lot, and you still win by being clever. For the foreseeable future, security posture is still going to be a reflection of human systems. Human systems that are under enormous stress, but are still fundamentally human. You win by getting your security culture in order to produce (and continually reproduce) the most resilient defense that masters both the craft and the human element, not just by abandoning human systems in favor of brute forcing security problems away as your only strategy.
Indeed, domains that are truly security critical will acquire this organizational discipline (what's required is the same type of discipline that the nuclear industry acquires after a meltdown, or that the aviation industry acquires after plane crashes), but it will be a bumpy ride.
This article from exactly 1 year ago is almost prophetic to exactly what's going on right now and the subtle ways in which people are most likely to misunderstand the situation: https://knightcolumbia.org/content/ai-as-normal-technology
pcblues•Apr 16, 2026
As a result of all this AI "find a zero-day" business, when I boot to windows I open the task manager and order by pid. I kill anything I didn't start or don't recognise.
The only process that scared me was windowgrid. It kept finding a way back when I killed all the "start with boot" locations I know. Run, runonce, start up apps, etc. Surely it's not in autoexec.bat :)
Maybe code quality shouldn't be considered cybersecurity in the first place?
When things are tagged "cybersecurity", compliance/budget/manager/dashboard/education/certification are the usual response...
I don't think it would be an appropriate response for code quality issues, and it would likely escape the hands of the very people who can fix code quality issues, ie. developers.
mikewarot•Apr 16, 2026
Long ago, during the Vietnam conflict, the US government learned that computers needed to be able to securely process data from multiple levels of classification simultaneously. Research in the 1970s found solutions that were adopted in the mainframe world, like KeyKOS and EROS. Then the PC revolution swept all that away, and we're here 40+ years later, with operating systems that trust every bit of code the user runs with that user's full authority.
It's nuts. If the timing were slightly different, none of this "cybersecurity" would even be a thing. We'd just have capability-based, secure general-purpose computation.
necovek•Apr 16, 2026
While I agree we are re-learning lessons from ages ago and reinventing the same tech, I believe the problem comes from the desire to manage the same data with different software. On desktops, imagine your photo library that you might view with one set of programs, modify with another, try out a completely new one to make videos out of photos...
As soon as there are multiple programs with full authority on your data, "cybersecurity" happens. And internet/web is that to the power of 100.
Openpic•Apr 16, 2026
The PoW analogy completely ignores the actual hard part: fixing the stuff. It’s cool if you burn millions of tokens to find 1,000 bugs, but it's completely useless if your small team only has the bandwidth to safely patch 5 of them without taking down prod.
TZubiri•Apr 16, 2026
I remain skeptical; security is not a knob that you can turn. You can't shove in more money or more tokens and make the thing more secure.
Not saying security will never be dominated by AI like it happened with chess, with maps, with Go, with language. But just braindead money to security pipeline? Skeptical.
amarant•Apr 16, 2026
Am I the only one who thinks this is exactly like it was before AI, when we used small-batch, hand-crafted tokens made by organic engineers to find vulnerabilities?
These mass-produced tokens are just cheaper...
aidenn0•Apr 16, 2026
Cheaper and more fungible. Companies pay lots of money for mediocre security audits. Most attackers aren't very good either. However it only takes one good attacker.
If the attacker and defender are using the same AI model, then (up to some inflection point) whoever spends more finds the most vulnerabilities.
choeger•Apr 16, 2026
To me it looks like formal verification is going to be the answer. We're going to move up the ladder and write formal specs and proofs soon.
Briannaj•Apr 16, 2026
really, really?
After how many years of "shifting left" and understanding the importance of having security involved in the dev and planning process, now the recommendation is to vibe code with human intuition, review then spend a million tokens to "harden"?
I understand that isn't the point of the article and the article does make sense in its other parts. But that last paragraph leaves me scratching my head wondering if the author understands infosec at all?
xarope•Apr 16, 2026
I can see the dichotomy forming in the "post AI" world;
1) massive companies spending millions of tokens to write+secure their software
2) in the shadows, "elite" software contractors writing bespoke software to fulfill needs for those who can't afford the millions, or fix cracks in (1)
(Oh wait, I think this is what is happening now, anyway, minus the millions of tokens)
pelasaco•Apr 16, 2026
I don't think open source will get stronger. Those who have enough GPU power won't depend on multiple human eyes anymore. AI will be enough.
I already see this happening: companies are moving toward AI-generated code (or forking projects into closed source), keeping their code private, AI written pipelines taking care of supply chain security, auditing and developing it primarily with AI.
At that point, for some companies, there's no real need for a community of "experts" anymore.
devinabox•Apr 16, 2026
The cost of this is going to come down dramatically; just throwing the model at the codebase is a really inefficient process. My own experiments show that spending more tokens on understanding and transforming how the codebase can be explored (i.e. enumerating source-to-sink traces) drastically lowers the cost to confirm vulnerabilities.
Something that excites me greatly is that software quality has been incredibly difficult primarily because no single developer can hold the entire contract in their head and analyze it. It's now a reality that we can transform raw source code into actionable artifacts that allow a system to see the big picture and pinpoint the fracture points within it.
self_awareness•Apr 16, 2026
> This chart suggests an interesting security economy: to harden a system we need to spend more tokens discovering exploits than attackers spend exploiting them.
What's new?
It was always about spending more money on something.
Team has no capacity? Because the company doesn't invest in the team, doesn't expand it, doesn't focus on it.
We don't have enough experts? Because the company doesn't invest in the team, doesn't raise the salary bar to get new experts, it's not attractive to experts in other companies.
It was always about "spending tokens more than competitors", in every area of IT.
admiralrohan•Apr 16, 2026
"This chart suggests an interesting security economy: to harden a system we need to spend more tokens discovering exploits than attackers spend exploiting them." - This feels similar to the missile-defense dilemma: spending a $2M missile to stop a $20k drone.
4ashUa•Apr 16, 2026
The problem with the security researcher industry is that it is infested with self promoters who talk about methodologies and tools but have never written any secure software themselves. Or any software at all, as the GitHub accounts from some of these geniuses show.
Of course those are attracted to new tools and AI shill institutes like AISI (yes, the UK government is shilling for AI; it understands a proper grift that benefits the elites).
Security "research" is perfect for talkers and people who produce powerpoint graphs that sell their latest tools.
You still can sit down and write secure software, while the "researchers" focus on the same three soft targets (sudo, curl, ffmpeg) over and over again and get $100,000 in tokens and salaries for a bug in a protocol from the 1990s that no one uses. Imagine if this went to the authors instead.
But no, government money MUST go to the talkers and powerpointists. Always.
zkmon•Apr 16, 2026
Give more ammo to bad actors and sell the ammo to defenders, charge both for tokens. Why isn't this business model banned already?
jdthedisciple•Apr 16, 2026
I don't understand the nature of the supposed security incidents found by LLMs:
Are these totally previously unknown security holes or are they still generally within the umbrella of our understanding of cybersecurity itself?
If it's the latter, why can't we systematically find and fix them ourselves?
H8crilA•Apr 16, 2026
"why isn't everything that could be discovered already discovered"
jdthedisciple•Apr 16, 2026
So you believe AI actually discovered novel ways to compromise computer software that had previously been unknown to the entirety of cyber security experts in the world?
Big if true. Can you cite an example? I'm all ears.
H8crilA•Apr 16, 2026
No, and nothing I've written suggested that. If you're an AI bot then your alignment needs fixing.
jdthedisciple•Apr 16, 2026
No idea what you're trying to say then. All I demodulated from you thus far is some passive aggression. Perhaps I'm just not in tune with your frequencies rn.
xeyownt•Apr 16, 2026
Then the question: what is cheaper, secure a code base written by humans, or secure a code base vibe coded with an army of agents?
antirez•Apr 16, 2026
Why this is the wrong analogy: finding hash collisions, while exponentially harder with N, is guaranteed to succeed given enough work; some S such that H(S) satisfies N will eventually be found, so with an asymmetry of resources the side doing more work eventually wins. But bugs are different: 1. different LLM executions take different branches, but eventually the branches possible given the code's states are saturated. 2. if we imagine sampling the model for a bug in a given code M times, with M large, the cap eventually becomes not M (because the state space of the code and of the LLM sampler saturates) but I, the model's intelligence level. The OpenBSD SACK bug shows this easily: you can run an inferior model an infinite number of times and it will never realize that the missing validation of the start window, combined with the integer overflow, combined with the fact that the branch where the node should never be NULL is entered, produces the bug. So the cybersecurity of tomorrow will not be proof-of-work-style "more GPU wins": better models, and faster access to such models, will win.
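The asymmetry antirez describes can be caricatured numerically. Everything below is invented for illustration: the hit probability, the "idea sets", and the sample counts are not from any real model or bug.

```python
# Toy contrast (illustrative numbers only). A PoW-style search succeeds
# with probability approaching 1 as independent attempts accumulate:
def at_least_one_hit(p_hit: float, attempts: int) -> float:
    return 1 - (1 - p_hit) ** attempts

# Enough attempts make success near-certain, however small p_hit is:
assert at_least_one_hit(1e-6, 10_000_000) > 0.99

# Re-sampling a capped model only re-draws branches the model can take.
# If the bug needs an insight outside that set, extra samples never help.
reachable = {"integer overflow", "off-by-one"}          # model's idea set
needed = {"integer overflow", "unvalidated start window", "NULL branch"}

def finds_bug(samples: int) -> bool:
    return any(needed <= reachable for _ in range(samples))

assert finds_bug(1_000_000) is False  # more samples, same ceiling
```

The design point of the sketch: the first function's success grows without bound in `attempts`, while the second is capped by what the model can represent, which is antirez's "intelligence level I" limit.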
fhd2•Apr 16, 2026
Agreed, it is different in terms of there being no guarantee that a specific piece of software even has an exploit. If you don't want to break into a specific piece of software, or even a specific system, I would argue that the law of averages applies: If you just invest enough, you'll likely find _something_ worth exploiting.
In other terms, I feel the argument from TFA generally checks out, just on a different level than "more GPU wins". It's one up: "More money wins". That's based on the premise that more capable models will be more expensive, and using more of it will increase the likelihood of finding an exploit, as well as the total cost. What these model providers pay for GPUs vs R&D, or what their profit margin is, I'd consider less central.
But then again, AI didn't change this, if you have more money you can find more exploits: Whether a model looks for them or a human.
motbus3•Apr 16, 2026
Interesting reading, but it brings me some thoughts.
Security was always about having more money/resources.
Using more tokens is just another measure for the same.
Some previous post, which I cannot verify myself, stated that mythos is not as powerful as it seems to be as the same bugs could be found using much smaller/simpler models and that the method is the key part.
Whether an arbitrary piece of code can be exploited is obviously undecidable (i.e. it is equivalent to the halting problem). Let me give you an example that sketches why this is the case: `if(sha(input1)==12345) { run_shell(input2);}`. The real question is what this looks like in practice, for the code that we as humans actually use in our networks.
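A minimal runnable version of that sketch. The target digest and the `run_shell` stub are hypothetical, chosen so the dangerous branch has no feasible input:

```python
import hashlib

def run_shell(cmd: str) -> None:
    # Hypothetical dangerous sink, stubbed out for illustration.
    print(f"would execute: {cmd}")

TARGET = "0" * 64   # a SHA-256 digest with no known preimage

def handler(input1: bytes, input2: str) -> bool:
    # The sink is reachable only with a preimage of TARGET, so deciding
    # "is this exploitable?" statically is as hard as inverting SHA-256.
    if hashlib.sha256(input1).hexdigest() == TARGET:
        run_shell(input2)
        return True
    return False

# No feasible input reaches the sink:
assert handler(b"anything", "rm -rf /") is False
```

A fuzzer or an LLM can hammer `handler` forever without ever proving the sink unreachable; that is the undecidability in miniature.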
riteshkew1001•Apr 16, 2026
Proof-of-work analogy is backwards. Bitcoin is symmetrical, every miner runs the same hash. Cybersecurity is an NP-hard chess match where the attacker gets to pick the square. 100M tokens for one exploit also means 100M tokens × every asset to cover the defender side. "Both sides spending" isn't parity when one side wins with a single find.
nektro•Apr 16, 2026
we're going to have to get a lot better at building test suites. for example every js exploit found in browsers should also be added to https://github.com/tc39/test262.
ang_cire•Apr 16, 2026
> This chart suggests an interesting security economy: to harden a system we need to spend more tokens discovering exploits than attackers spend exploiting them.
What this fails to take into account is that unless the codebase is changed, there are a finite number of actual (and even fewer actionable) bugs in a piece of code, but an unbounded amount of potential attacker spend; nothing stops you running mythos against it, whether it finds anything or not, and because each run is atomic by nature, you just have to play the numbers out and see when the average vuln discovery rate starts dropping. You could spend a billion dollars and not find anything, without the defender spending a cent.
Generally speaking, the advantage goes to whoever can spend more time or money on security research (this has always been true, which is why the NSA was able to find Windows exploits that M$ did not). But eventually the fount of bugs in a piece of software will dry up, and attackers have no way of knowing if that's the case or not before dumping money at it.
- a very large codebase
- a codebase which is not modularized into cohesive parts
- niche languages or frameworks
- overly 'clever' code
> Worryingly, none of the models given a 100M budget showed signs of diminishing returns. “Models continue making progress with increased token budgets across the token budgets tested,” AISI notes.
So, the author infers a durable direct correlation between token spend and attack success. Thus you will need to spend more tokens than your attackers to find your vulnerabilities first.
However it is worth noting that this study was of a 32-step network intrusion, which only one model (Mythos) was even able to complete at all. That’s an incredibly complex task. Is the same true for pointing Mythos at a relatively simple single code library? My intuition is that there is probably a point of diminishing returns, which is closer for simpler tasks.
In this world, popular open source projects will probably see higher aggregate token spend by both defenders and attackers. And thus they might approach the point of diminishing returns faster. If there is one.
For instance, if failing any step locks you out, your probability of success is p^N, which means it’s functionally impossible with enough layers.
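The compounding in that sentence is easy to check numerically (a toy calculation, not from the article; the per-step probabilities are made up):

```python
# End-to-end success over N independent chained steps, each with
# per-step success probability p, is p**N.
def chain_success(p: float, n: int) -> float:
    return p ** n

# Even generous per-step odds collapse over a 32-step intrusion:
print(f"{chain_success(0.9, 32):.4f}")   # ~0.0343
print(f"{chain_success(0.5, 32):.2e}")   # ~2.33e-10
```

Which is why a lock-out-on-failure design makes a long chain functionally impossible, and also why the benchmark's 32-step chain was so hard to complete.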
It's not that one would design a system in this manner; you'd never deliberately design in a loophole, no matter how many steps it takes to reach it. It is just a benchmark.
I wouldn't use those as excuses to dismiss AI though. Even if this model doesn't break your defences, give it 3 months and see where the next model lands.
Better to write good, high-quality, properly architected and tested software in the first place of course.
Edited for typo.
In fact, security programs built on the idea that you can find and patch every security hole in your codebase were basically busted long before LLMs.
Yeah, it sucks. But you're getting paid, among other things, to put up with some amount of corporate suckiness.
I tend to encourage Firefox over Cr-flavoured browsers because FF (for me) is the absolute last to dive in with fads and will boneheadedly argue against useful stuff until the cows come home ... Web Serial springs to mind (which should finally be rocking up real soon now).
Oh and they are not sponsored by Google errm ... 8)
I'm old enough to remember having to use telnet to access the www (when it finally rocked up and looked rather like Gopher and WAIS) (via a X.25 PAD) and I have seen the word "unsupported" bandied around way too often since to basically mean "walled garden".
I think that when you end up using the term "unsupported browser" you have lost any possible argument based on reason or common decency.
why in the absolute fuck would I want random web pages to be able to control all the devices connected to my computer?
Developers usually need elevated privileges, executing unverified arbitrary code is literally their job. Their machines are not trustworthy, and yet, they often have access to the entire company internal network. So you get a situation where they have both too much privilege (access to resources beyond the scope of their work) and too little (some dev tools being unavailable).
You can do a lot better efficiency-wise if you control the source end-to-end though - you already group logically related changes into PRs, so you can save on scanning by asking the LLM to only look over the files you've changed. If you're touching security-relevant code, you can ask it for more per-file effort than the attacker might put into their own scanning. You can even do the big bulk scans an attacker might on a fixed schedule - each attacker has to run their own scan while you only need to run your one scan to find everything they would have. There's a massive cost asymmetry between the "hardening" phase for the defender and the "discovering exploits" phase for the attacker.
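The budgeting idea in that comment can be sketched as a tiny policy. Everything here is a made-up illustration: the path prefixes, the token numbers, and the multiplier are not from any real tool.

```python
# Hypothetical per-file token budgets: spend more on security-relevant
# paths than an attacker's flat scan would.
SECURITY_PATHS = ("auth/", "crypto/", "net/")   # made-up hot spots

def effort_for(path: str, base_tokens: int = 10_000) -> int:
    if path.startswith(SECURITY_PATHS):
        return base_tokens * 5    # extra per-file effort on risky code
    return base_tokens

# Only the files touched by a PR get scanned at all:
def scan_budget(changed: list[str]) -> int:
    return sum(effort_for(p) for p in changed)

print(scan_budget(["auth/login.py", "docs/readme.md"]))  # 60000
```

The asymmetry falls out directly: the defender's spend scales with the diff, while an attacker without the commit history has to pay for the whole codebase on every sweep.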
Exploitability also isn't binary: even if the attacker is better-resourced than you, they need to find a whole chain of exploits in your system, while you only need to break the weakest link in that chain.
If you boil security down to just a contest of who can burn more tokens, defenders get efficiency advantages only the best-resourced attackers can overcome. On net, public access to mythos-tier models will make software more secure.
Of course LLMs see a lot more source-assembly pairs than even skilled reverse engineers, so this makes sense. Any area where you can get unlimited training data is one we expect to see top-tier performance from LLMs.
(also, hi Thomas!)
Burning tokens by asking the LLM to compile, disassemble, compare assembly, recompile, repeat seems very wasteful and inefficient to me.
Seems like a waste of money; wouldn't it be better to extract the AST deterministically, write it out and only then ask an LLM to change those auto-generated symbol names with meaningful names?
Ha
>for free.
Haha, it is more complicated in reality
Prediction 1. We're going to have cheap "write Photoshop and AutoCad in Rust as a new program / FOSS" soon. No desktop software will be safe. Everything will be cloned.
Prediction 2. We'll have a million Linux and Chrome and other FOSS variants with completely new codebases.
Prediction 3. People will trivially clone games, change their assets. Modding will have a renaissance like never before.
Prediction 4. To push back, everything will move to thin clients.
Obvious possibilities include:
* More use of software patents, since these apply to underlying ideas, rather than specific implementations.
* Stronger DMCA-like laws which prohibit breaking technical provisions designed to prevent reverse engineering.
Similarly, if the people predicting that humans are going to be required to take ultimate responsibility for the behaviour of software are correct, then it clearly won't be possible for that to be any random human. Instead you'll need legally recognised credentials to be allowed to ship software, similar to the way that doctors or engineers work today.
Of course these specific predictions might be wrong. I think it's fair to say that nobody really knows what might have changed in a year, or where the technical capabilities will end up. But I see a lot of discussions and opinions that assume zero feedback from the broader social context in which the tech exists, which seems like they're likely missing a big part of the picture.
Taken to an extreme, the end result is a dark forest. I don't like what that means for entrepreneurship generally.
It does mean that the hoped-for 10x productivity increase from engineers using LLMs is eroded by the increased need for extra time for security.
This take is not theoretical. I am working on this effort currently.
I think that such a posture would only be possible in a mature company where concerns are already separated to the point where only a handful of administrators have actual SSO or username/passphrase access to important resources.
Sorry, how does that work?
This seems wrong however, as it ignores the arrow of time. The full source code has been scanned and fixed for things that LLMs can find before hitting production; anyone exfiltrating your codebase can only use their models to find holes in what production exposes for them to attack, and only holes that your models for some reason did not find.
I don't think there is any reason to suppose non-nation-state actors will have better models available to them, and thus it is not a dark forest: nation states will probably limit their attacks to specific things, so most companies, if they secure their codebase using LLMs built for it, will probably be in a significantly more secure position than today, and, I would think, the golden age of criminal hacking is drawing to a close. This assumes companies smart enough to do this, however.
Furthermore, the worry about nation state attackers still assumes that they will have better models and not sure if that is likely either.
I don't see the connection.
In the use of the phrase Dark Forest to explain the Fermi paradox it suggests that alien civilizations have kept themselves dark out of fear that the rest of the forest is actually lawless and violent.
In this case though we are entering a dark forest, like Hansel and Gretel, supposedly defenseless against the monsters that lurk in there. But really, they weren't that defenseless, were they? I don't think the phrase is that apt.
And in a world where companies begin to suffer from attacks as a result - can the ones who are willing to invest in security defend themselves, not just against cyberattackers, but against a broader investor and customer backlash that believes that startups that build their own technology stacks are riskier due to perceptions about cybersecurity?
An angel investor or LP who sees news articles and media about cyberattacks, then has a portfolio company get hacked in a material way, may simply decide the space has become too risky for further investments, no matter how much prospects get on better security footings.
The dark forest hypothesis, at its core, is about a decision of whether to put your neck out in the universe; if the weapons and countermeasures being used are too horrifying to fathom, the risks unquantifiable, one chooses not to extend one's neck. And that is how an industry begins to dry.
[0] https://securitycryptographywhatever.com/2026/03/25/ai-bug-f...
As it is, we're stuck with "yeah it seems this works well for bootstrapping a Next.js UI"...
There are several simultaneous moving targets: the different models available at any point in time, the model complexity/ capability, the model price per token, the number of tokens used by the model for that query, the context size capabilities and prices, and even the evolution of the codebase. You can’t calculate comparative ROIs of model A today or model B next year unless these are far more predictable than they currently are.
I think the important thing is to avoid over-optimizing your scaffold, not to avoid building one altogether.
Here we go again.
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
I think there is work to be done on scaffolding the models better. This exponential right now reminds me of the exponential in CPU speeds up until, let's say, 2000 or so, where game developers would build really impressive games on the current generation of hardware by writing really detailed, intricate x86 instruction sequences for exactly what, say, a 486 could do, knowing full well that in two years the Pentium would be able to do it much faster and they wouldn't have needed to. But you need to do it now because you want to sell your game today; you can't just wait until everyone can do this. So I do think there definitely is value in squeezing out all of the last little juice that you can from the current model.
Everything you can do today will eventually be obsoleted by some future technology, but if you need better results today, you actually have to do the work. If you just drop everything and wait for the singularity, you're just going to unnecessarily cap your potential in the meantime.
Chinese AI vendors specifically pointed out that even a few gens ago there was maybe 5-15% more capability to squeeze out via training, but that the cost for this is extremely prohibitive and only US vendors have the capex to have enough compute for both inference and that level of training.
I'd take their word over someone who has a vested interest in pushing Anthropic's latest and greatest.
The real improvements are going to be in tooling and harnessing.
Well, you need to harden everything, the attacker only needs to find one or at most a handful of exploits.
Yeah, but it's not like the attacker knows where to look without checking everything, is it?
If you harden and fix 90% of vulns, the attacker may give up when their attempts reach 80% of vulns.
It's the same as it has ever been; you don't need to outrun the bear, you only need to outrun the other runners.
That is not at all relevant to "security via obscurity" or similar arguments: having the source in the open may (eventually) be more secure, but it lowers the token-spend for the attacker.
It seems like that is perhaps not the case anymore with the Mythos model?
What accounts are these?
I've seen some people use this but I cannot imagine that anyone thinks this is the best.
For example I've had success telling LLMs to scan from application entry points and trace execution, and that seems an extremely obvious thing to do. I can't imagine others in the field don't have much better approaches.
I don't expect Claude Code Review to be a replacement for a good vendor's solution.
You cannot get away with "well, no one is going to spend time writing a custom exploit to get us" or "just be faster than the slowest one running away from the bear".
Assuming your code is inaccessible isn't good for security. All security reviews are done assuming code source is available. If you don't provide the source, you'll never score high in the review.
Primitive? I'd say simple and thorough.
Worse, "attackers no longer break in, they log in", so the supply chain attacks harvesting credentials have been frightening
This really is not the case.
You have freedom of methodology.
You can also ask it to enumerate various risks and find proof of existence for each of them.
Certainly our LLM audits are not just a prompt per file - so I have a hard time believing that best in class tools would do this.
And using an LLM to audit your code isn't necessarily a case of turning it into perfect code, it's to keep ahead of the other side also using an LLM. You don't need to outrun the bear, just the other hikers.
nothing is better or worse, basically as its always been.
if you think otherwise, stop ignoring the past.
you are addicted to dopamine. think carefully and take good care of yourself
Put more simply: to keep your system secure, you need to be fixing vulnerabilities faster than they're being discovered. The token count is irrelevant.
Moreover: this shift is happening because the automated work is outpacing humans for the same outcome. If you could get the same results by hand, they'd count! A sev:crit is a sev:crit is a sev:crit.
1) The number of vulnerabilities surfaced (and fixed?) in a given software is roughly proportional to the amount of attention paid to it.
2) Attention can now be paid in tokens by burning huge amounts of compute (bonus: most commonly on GPUs, just like crypto!)
3) Whoever finds a vulnerability has a valuable asset, though the value differs based on the criticality of the vulnerability itself, and whether you're the attacker or the defender.
More tokens -> more vulns is not a guarantee of course, it's a stochastic process... but so is PoW!
There is at least a possibility that a code base can be secured by a (practically) finite number of tokens until there are no more holes in it, for reasonable amounts of money.
This also reminds me of what I wrote here: https://jerf.org/iri/post/2026/what_value_code_in_ai_era/ There's still value in code tested by the real world, and in an era of "free code" that may become even more true than it is now, rather than the initially-intuitive less valuable. There is no amount of testing you can do that will be equivalent to being in the real world, AI-empowered attackers and all.
I disagree.
The defender must be right every single time. The attacker only has to get lucky and thanks to scale they can do that every day all day in most large organizations.
The time is a cost, but at scale any individual target is a pretty minor investment since it's 90%+ automated. Also, these aren't folks that are otherwise highly employable. The opportunity cost to them is also usually very low.
The last attacker I got into a conversation with was interesting. Turns out, he was a 16 year old from Atlanta GA using a toolkit as an affiliate. He claimed he made ~100k/year and used the money on cars and girls. I felt like he was inflating that number to brag. His alternative probably would have been McDonalds, and as a minor if he got caught it would've been probation most likely. I told him to come to the blue team, we pay better.
To use your example, if the odds of the guard being asleep and the vault being unlocked are both 1% we have a 0.0001 chance on any given day. Phew, we're safe...
Except that Google says there are 68,632 bank branch locations in the US alone. That means it will happen roughly 7 times on any given day someplace in America!
Now apply that to the scale of the internet. The attackers can rattle the locks in every single bank in an afternoon for almost zero cost.
The poorly defended ones have something close to 100% odds of being breached, and the well defended ones how low odds on any given day, but over a long enough timeline it becomes inevitable.
To again use your bank example: if we only have one bank but keep those odds, it means that over about 191 years the event will happen 7 times. Or to restate that number, it is likely to happen at least once every 27 years. You'll have about 25% odds of it happening in any 7-year span.
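A quick sanity check of the arithmetic in this thread, assuming the guard and the vault are independent 1% events per day (the branch count is the figure quoted above):

```python
# Back-of-envelope check of the bank numbers (independence assumed).
p_day = 0.01 * 0.01                  # guard asleep AND vault unlocked
branches = 68_632
print(p_day * branches)              # ~6.86 expected events/day across the US

print(p_day * 365 * 191)             # ~6.97 events for one bank over 191 years

# Probability of at least one event in any 7-year span:
p_7yr = 1 - (1 - p_day) ** (365 * 7)
print(round(p_7yr, 3))               # ~0.225, roughly the "about 25%" above
```

The exact 7-year figure comes out closer to 22.5% than 25%, but the point stands: per-target odds are tiny, aggregate odds are not.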
For any individual target, it becomes unlikely, but also still inevitable.
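The arithmetic in this thread is easy to check. A short sketch, using the commenter's assumed inputs (1% odds for each failure, 68,632 branches, independent events, one trial per branch per day):

```python
# Back-of-the-envelope check of the breach-odds arithmetic above.
# Assumed inputs (from the comment): guard asleep 1%, vault unlocked 1%,
# 68,632 branches, independent events, one trial per branch per day.

p_day = 0.01 * 0.01          # both failures on the same day: 1 in 10,000
branches = 68_632

expected_per_day = branches * p_day   # ~6.9 branches hit per day, US-wide

# For a single branch, P(at least one breach in n days) = 1 - (1 - p)^n
def p_breach(days, p=p_day):
    return 1 - (1 - p) ** days

mean_wait_years = (1 / p_day) / 365   # ~27 years between events at one branch
p_seven_years = p_breach(7 * 365)     # odds of at least one event in 7 years

print(f"{expected_per_day:.1f} events/day, one per ~{mean_wait_years:.0f} years, "
      f"{p_seven_years:.0%} odds in 7 years")
```

The "25% in 7 years" figure uses the linear approximation 7 × 365 × p ≈ 0.26; the exact compounded figure is closer to 23%, which doesn't change the conclusion.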
From an attackers perspective this means the game is rigged in their favor. They have many billions of potential targets, and the cost of an attack is close to zero.
From a defenders perspective it means realizing that even with defense in depth the breach is still going to happen eventually and that the bigger the company is the more likely it is.
Cyber is about mitigating risk, not eliminating it.
Until the attacker has initial access.
Then the attacker needs to be right every single time.
Imo, cybersecurity looks like formally verified systems now.
You can't spend more tokens to find vulnerabilities if there are no vulnerabilities.
But part of me has been wondering for a while now whether proofs of correctness is the way out of the NVIDIA infinite money glitch. IDK if we're there yet but it's pretty much the only option I can imagine.
For example from this article:
> Karpathy: Classical software engineering would have you believe that dependencies are good (we’re building pyramids from bricks), but imo this has to be re-evaluated, and it’s why I’ve been so growingly averse to them, preferring to use LLMs to “yoink” functionality when it’s simple enough and possible.
Anyone who's heard of "leftpad" or is a Go programmer ("A little copying is better than a little dependency" is literally a "Go Proverb") knows this.
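The left-pad incident is a good illustration of the point: the entire package was one function small enough to copy inline. A sketch of the same idea in Python (the original was JavaScript):

```python
def left_pad(s: str, width: int, fill: str = " ") -> str:
    """Pad `s` on the left with `fill` until it is `width` characters long.
    Small enough that copying it beats depending on it."""
    return s if len(s) >= width else fill * (width - len(s)) + s

print(left_pad("7", 3, "0"))   # "007"
```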
Another recent set of posts to HN had a company close-sourcing their code for security, but "security through obscurity" has been a well-understood fallacy in open source circles for decades.
Maybe we could start with the prompts for the code generation models used by developers.
I dunno about that quoted bit; "Defense in depth" (Or defense via depth) is a good thing, and obscurity is just one of those layers.
"Security through obscurity" is indeed wrong if the obscurity is a large component of the security, but it helps if it is just another layer of defense in the stack.
IOW, harden your system as if it were completely transparent, and only then make it opaque.
The times, as they say, are a-changin’.
Open software is not inherently more secure than closed software, and never has been.
Its relative security value was always derived from circumstantial factors, one of the most important of which was the combination of incentive and ability and willingness of others in the community to spend their time and attention finding and fixing bugs and potential exploits.
Now, that’s been the case for so long that we all implicitly take it for granted, and conclude that open software is generally more secure than closed, and that security through obscurity falls short in comparison.
But this may very well fundamentally change when the cost of navigating the search space of potential exploits, for both the attacker and the defender, is dramatically reduced along the axes of time and attention, and increased along the axis of monetary investment.
It then becomes a game of which side is more willing to pool monetary resources into OSS security analysis – the attackers or the defenders – and I wouldn’t feel comfortable betting on the defenders in that case.
I wouldn't be surprised if NVIDIA picked up this talking point to sell more GPUs.
(Fan of your writing, btw.)
You might well be right, it is not an area I know much of or work in. But I'm a fan of reliable sources for claims. It is far too easy to make general statements on the internet that appear authoritative.
Unfortunately, they fit straight lines to graphs whose y-axis runs from 0 to 100% against time on the x-axis, which is not great; a logistic fit would be more appropriate.
Seems much like those secretly tobacco industry funded reports about tobacco being safe and such.
I want to believe formal methods can help, not because one doesn't have to think about it, but because the time freed from writing code can be spent thinking about systems, architecture and proofs.
1. A proof mindset is really hard to learn.
2. Writing theorem definitions can be hard, but writing a proof can be even harder. So, if you could write just the definitions, and let an LLM handle all the tactics and steps, you could use more advanced techniques than just a SAT solver.
So I guess LLMs only marginally help with (1), but they could potentially be a big help for (2), especially with more tedious steps. It would also allow one to use first-order logic, and not just propositional logic (or dependent types, if you're into that).
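As a toy illustration of that split (the theorem name and property here are invented): a human writes the specification, and the proof search, whether human, LLM, or solver, supplies the tactic script. In Lean 4, `omega` already discharges linear-arithmetic goals like this automatically:

```lean
-- Hypothetical example: the human supplies the statement,
-- the proof search supplies the tactics. An LLM could propose
-- tactic scripts for goals that no decision procedure covers.
theorem index_in_bounds (i n : Nat) (h : i < n) : i + 1 ≤ n := by
  omega
```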
(It's true that formalization can still have bugs in the definition of "secure" and doesn't work for everything, which means defenders will still probably have to allocate some of their token budget to red teaming.)
You can only do this if you have a very clear sense of what your code should be doing. In most codebases I've ever worked with, frankly, no one has any idea.
Red teaming as an approach always has value, but one important characteristic it has is that you can apply red teaming without demanding any changes at all to your code standards, or engineering culture (and maybe even your development processes).
Most companies are working with a horrific sprawl of code, much of it legacy with little ownership. Red teaming, like buying tools and pushing for high coverage, is an attractive strategy to business leaders because it doesn't require them to tackle the hardest problems (development priorities, expertise, institutional knowledge, talent, retention) that factor into application security.
Formal verification is unfortunately hard in the ways that companies who want to think of security as a simple resource allocation problem most likely can't really manage.
I would love to work on projects/with teams that see formal verification as part of their overall correctness and security strategy. And maybe doing things right can be cheaper in the long run, including in terms of token burn. But I'm not sure this strategy will be applicable all that generally; some teams will never get there.
And yet... Wireguard was written by one guy while OpenVPN is written by a big team. One code base is orders of magnitude bigger than the other. Which should I bet LLMs will find more cybersecurity problems with? My vote is on OpenVPN despite it being the less clever and "more money thrown at" solution.
So yes, I do think you get points for being clever, assuming you are competent. If you are clever enough to build a solution that's much smaller/simpler than your competition, you can also get away with spending less on cybersecurity audits (be they LLM tokens or not).
In your embarrassingly reductive binary vulnerability state worldview? Have.
Really depends how consistently the LLMs are putting new novel vulnerabilities back in your production code for the other LLMs to discover.
If we take this at face value, it's not that different than how a great deal of executive teams believe cybersecurity has worked up to today. "If we spend more on our engineering and infosec teams, we are less likely to get compromised".
The only big difference I can see is timescale. If LLMs can find vulnerabilities and exploit them this easily (and I do take that with a grain of salt, because benchmarks are benchmarks), then you may lose your ass in minutes instead of after one dedicated cyber-explorer's monster energy fueled, 7-week traversal of your infrastructure.
I am still far more concerned about social engineering than LLMs finding and exploiting secret back doors in most software.
But I don't really get the hype, we can fix all the vulnerabilities in the world but people are still going to pick up parking-lot-USBs and enter their credentials into phishing sites.
I think we are already here. I wrote something about this, if you are interested: https://go.cbk.ai/security-agents-need-a-thinner-harness
Would it? I’m old school but I’ve never trusted these massive dependency chains.
That’s a nit.
We’re going to have to write more secure software, not just spend more.
Your wall should be made of a small number of bricks you bet your life on.
All the rest goes inside.
I, for the NFL front offices, created a script that exposed an API to fully automate Ticketmaster through the front end, so that the NFL could post tickets on all secondary markets and dynamically price them: if rain was expected on a Sunday, they could charge less. Ticketmaster was slow to develop an API. For legal reasons, Ticketmaster couldn't give us permission without first developing the API, but told me they would do their best to stop me.
They switched over to PerimeterX which took me 3 days to get past.
Last week someone posted an article here about ChatGPT using Cloudflare Turnstile. [0] First, the article made some mistakes about how it works. Second, I used the [AI company product] and the Chrome DevTools Protocol (CDP) to completely rewrite all the scripts, intercepting them before they were evaluated -- the same way I was able to figure out PerimeterX in 3 days -- and then recursively solved all the fingerprinting controls so that it controls the profile. Then it created an API proxy to expose ChatGPT for free. It required some coaching about the technique, but it did most of the work in 3 hours.
These companies are spending 10s of millions of dollars on these products and considering what OpenAI is boasting about security, they are worthless.
[0] https://news.ycombinator.com/item?id=47566865
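For readers unfamiliar with the technique: CDP's `Fetch` domain lets you pause script responses before the page evaluates them and fulfill them with a modified body. The wiring is sketched in comments below (it needs a live Chrome session); the rewrite step itself is ordinary string surgery. All script and function names here are illustrative, not the actual anti-bot code involved:

```python
# Sketch of CDP-style script interception (names are invented, not the
# actual scripts involved). With a live browser session you would:
#   1. send Fetch.enable with a pattern like
#      {"urlPattern": "*.js", "requestStage": "Response"},
#   2. on each Fetch.requestPaused event, call Fetch.getResponseBody,
#   3. rewrite the body, then answer with Fetch.fulfillRequest.
# The rewrite itself can be as simple as stubbing a fingerprinting
# probe so it reports values the automation controls:

def rewrite_script(source: str) -> str:
    """Neutralize a (hypothetical) fingerprinting call before evaluation."""
    return source.replace(
        "collectFingerprint()",               # hypothetical probe in the page script
        "({canvas: 'stub', webgl: 'stub'})",  # canned values under our control
    )

intercepted = "const fp = collectFingerprint(); send(fp);"
print(rewrite_script(intercepted))
```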
The benchmark might be a good apples-to-apples comparison but it is not showing capability in an absolute sense.
For example, developers should no longer run dev environments on the same machine where they access passwords, messages, and emails — no external package installation on that box at all.
SaaS Password Managers — assume your vault will be stolen from whichever provider is hosting it.
YubiKeys will be more important than ever to air-gap root auth credentials.
That would have started a P2 and woken up a senior IR responder anywhere that I’ve worked. Are you sure you’re running a realistic defender environment?
https://imgflip.com/memetemplate/Always-Has-Been
I predict the software ecosystem will change in two ways: internal software behind a firewall will become ever cheaper, but anything external-facing will become exponentially more expensive due to hacking concerns.
In the case of crooks (rather than spooks) that often means your security has to be as good as your peers, because crooks will spend their time going with the best gain/effort ratio.
If there is only one bear, you just need to run faster than your friends. If there's a pack of them, you need to start training much harder!
That's a really big "if". Particularly since so many companies don't even know all of the OSS they are using, and they often use OSS to offload the cost of maintaining it themselves.
My hope is when the dust settles, we see more OSS SAST tools that are much better at detecting vulnerabilities. And even better if they can recommend fixes. OSS developers don't care about a 20 point chained attack across a company network, they just want to secure their one app. And if that app is hardened, perhaps that's the one link of the chain the attackers can't get past.
Companies that market to the EU are going to need to find out real fast.
Because we have tools and techniques that can guarantee the absence of certain behavior in a bounded state space using formal methods (even unbounded at times)
Sure, it's hard to formally verify everything but if you are dealing with something extremely critical why not design it in a way that you can formally verify it?
But yeah, the easy button is to keep throwing more tokens at it until the money runs out.
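A minimal illustration of what "guaranteeing the absence of certain behavior in a bounded state space" means in practice: an explicit-state search over a toy lock protocol (invented for this sketch), asserting an invariant in every reachable state, the way model checkers like TLC or SPIN do at much larger scale:

```python
from collections import deque

# Toy explicit-state model check (invented example): exhaustively explore
# every reachable state of a two-process spinlock protocol and verify that
# mutual exclusion holds in all of them. The guarantee has the same shape
# as real tools (TLC, SPIN, CBMC): checked in EVERY reachable state, not
# just the ones a test happened to hit.

INIT = (None, "idle", "idle")   # (lock holder, pc of proc 0, pc of proc 1)

def successors(state):
    lock, pc0, pc1 = state
    pcs = [pc0, pc1]
    for i in (0, 1):
        if pcs[i] == "idle":                        # start wanting the lock
            yield _set(lock, pcs, i, "waiting")
        elif pcs[i] == "waiting" and lock is None:  # acquire only if free
            yield _set(i, pcs, i, "critical")
        elif pcs[i] == "critical":                  # leave and release
            yield _set(None, pcs, i, "idle")

def _set(lock, pcs, i, pc):
    out = list(pcs)
    out[i] = pc
    return (lock, out[0], out[1])

def check(init, invariant):
    """BFS over the full state space; assert the invariant everywhere."""
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        assert invariant(s), f"invariant violated in state {s}"
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return len(seen)

# Safety property: both processes are never in the critical section at once.
explored = check(INIT, lambda s: not (s[1] == "critical" and s[2] == "critical"))
print(f"invariant holds in all {explored} reachable states")
```

The spend argument changes shape here: once the search is exhaustive, extra attacker tokens buy nothing inside the verified envelope; only the unverified assumptions around it remain attackable.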
> You don’t get points for being clever
Not sure about this framing, this can easily lead to the wrong conclusions. There is an arms race, yes, and defenders are going to need to spend a lot of GPU hours as a result. But it seems self-evident that the fundamentals of cybersecurity still matter a lot, and you still win by being clever. For the foreseeable future, security posture is still going to be a reflection of human systems. Human systems that are under enormous stress, but are still fundamentally human. You win by getting your security culture in order to produce (and continually reproduce) the most resilient defense that masters both the craft and the human element, not just by abandoning human systems in favor of brute forcing security problems away as your only strategy.
Indeed, domains that are truly security critical will acquire this organizational discipline (what's required is the same type of discipline that the nuclear industry acquires after a meltdown, or that the aviation industry acquires after plane crashes), but it will be a bumpy ride.
This article from exactly 1 year ago is almost prophetic to exactly what's going on right now and the subtle ways in which people are most likely to misunderstand the situation: https://knightcolumbia.org/content/ai-as-normal-technology
The only process that scared me was windowgrid. It kept finding a way back when I killed all the "start with boot" locations I know. Run, runonce, start up apps, etc. Surely it's not in autoexec.bat :)
https://news.ycombinator.com/item?id=47788473
When things are tagged "cybersecurity", compliance/budget/manager/dashboard/education/certification are the usual response...
I don't think it would be an appropriate response for code quality issues, and it would likely escape the hands of the very people who can fix code quality issues, ie. developers.
It's nuts. If the timing were slightly different, none of this "Cybersecurity" would even be a thing. We'd just have capabilities based, secure general purpose computation.
As soon as there are multiple programs with full authority on your data, "cybersecurity" happens. And internet/web is that to the power of 100.
Not saying security will never be dominated by AI like it happened with chess, with maps, with Go, with language. But just braindead money to security pipeline? Skeptical.
These mass-produced tokens are just cheaper...
If the attacker and defender are using the same AI model, then (up to some inflection point) whoever spends more finds the most vulnerabilities.
After how many years of "shifting left" and understanding the importance of having security involved in the dev and planning process, the recommendation is now to vibe code, review with human intuition, then spend a million tokens to "harden"?
I understand that isn't the point of the article and the article does make sense in its other parts. But that last paragraph leaves me scratching my head wondering if the author understands infosec at all?
1) massive companies spending millions of tokens to write+secure their software
2) in the shadows, "elite" software contractors writing bespoke software to fulfill needs for those who can't afford the millions, or fix cracks in (1)
(Oh wait, I think this is what is happening now, anyway, minus the millions of tokens)
I already see this happening: companies are moving toward AI-generated code (or forking projects into closed source), keeping their code private, AI written pipelines taking care of supply chain security, auditing and developing it primarily with AI.
At that point, for some companies, there's no real need for a community of "experts" anymore.
What's new?
It was always about spending more money on something.
Team has no capacity? Because the company doesn't invest in the team, doesn't expand it, doesn't focus on it.
We don't have enough experts? Because the company doesn't invest in the team, doesn't raise the salary bar to get new experts, it's not attractive to experts in other companies.
It was always about "spending tokens more than competitors", in every area of IT.
Of course those are attracted to new tools and AI shill institutes like AISI (yes, the UK government is shilling for AI, it understands a proper grift that benefits the elites).
Security "research" is perfect for talkers and people who produce powerpoint graphs that sell their latest tools.
You still can sit down and write secure software, while the "researchers" focus on the same three soft targets (sudo, curl, ffmpeg) over and over again and get $100,000 in tokens and salaries for a bug in a protocol from the 1990s that no one uses. Imagine if this went to the authors instead.
But no, government money MUST go to the talkers and powerpointists. Always.
Are these totally previously unknown security holes or are they still generally within the umbrella of our understanding of cybersecurity itself?
If it's the latter, why can't we systematically find and fix them ourselves?
Big if true. Can you cite an example? I'm all ears.
In other terms, I feel the argument from TFA generally checks out, just on a different level than "more GPU wins". It's one up: "More money wins". That's based on the premise that more capable models will be more expensive, and using more of it will increase the likelihood of finding an exploit, as well as the total cost. What these model providers pay for GPUs vs R&D, or what their profit margin is, I'd consider less central.
But then again, AI didn't change this, if you have more money you can find more exploits: Whether a model looks for them or a human.
Security was always about having more money/resources. Using more tokens is just another measure for the same.
Some previous post, which I cannot verify myself, stated that mythos is not as powerful as it seems, since the same bugs could be found using much smaller/simpler models, and that the method is the key part.
What this fails to take into account is that unless the codebase changes, there is a finite number of actual (and even fewer actionable) bugs in a piece of code, but an unbounded amount of potential attacker spend; nothing stops you running mythos against it whether it finds anything or not, and because each run is independent by nature, you just have to play the numbers out and watch when the average vuln discovery rate drops. You could spend a billion dollars and not find anything, without the defender spending a cent.
Generally speaking, the advantage goes to whoever can spend more time or money on security research (this has always been true, which is why the NSA was able to find Windows exploits that M$ did not). But eventually the fount of bugs in a piece of software will dry up, and attackers have no way of knowing if that's the case or not before dumping money at it.
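The dynamic in those last two comments, finite bugs against unbounded spend, can be made concrete with a toy model (all parameters invented): suppose the code contains B latent bugs and each unit of spend gives every undiscovered bug an independent chance q of being found. Expected discoveries then saturate at B while spend grows without bound, and the attacker cannot observe the asymptote from outside:

```python
# Toy model (all parameters invented): B latent bugs, and each unit of
# spend finds any given undiscovered bug with independent probability q.
# Expected bugs found after n units: B * (1 - (1 - q)^n) -- rapid gains
# early, then a plateau at B, while spend n can grow without bound.

B, q = 12, 0.02

def expected_found(n: int) -> float:
    return B * (1 - (1 - q) ** n)

for n in (10, 50, 100, 500, 1000):
    marginal = expected_found(n) - expected_found(n - 1)
    print(f"spend {n:>4}: ~{expected_found(n):5.2f} bugs found, "
          f"marginal gain {marginal:.4f}")
```

The marginal column is what the attacker actually sees: it shrinks whether or not the pool is empty, which is exactly the "they have no way of knowing" problem.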