Hacker News Clone

DeepSeek-v3.2: Pushing the frontier of open large language models [pdf](huggingface.co)

982 pointsby pretextDec 1, 2025

36 Comments

nimchimpsky•Dec 1, 2025

Pretty amazing that a relatively small Chinese hedge fund can build AI better than almost anyone.

JSR_FDED•Dec 1, 2025

And gives it away for free!

Havoc•Dec 1, 2025

Yeah they've consistently delivered. At the same time there are persistent whispers that they're not all that small and scruffy as portrayed either.

astrange•Dec 2, 2025

Anthropic also said their development costs aren't very different.

BoorishBears•Dec 1, 2025

3.2-Exp came out in September: this is 3.2, along with a special checkpoint (DeepSeek-V3.2-Speciale) for deep reasoning that they're claiming surpasses GPT-5 and matches Gemini 3.0

https://x.com/deepseek_ai/status/1995452641430651132

deaux•Dec 2, 2025

The assumption here is that 3.2 (without suffix) is an evolution of 3.2-Exp rather than being the same model, but they don't seem to be explicitly stating anywhere whether they're actually different or that they just made the same model GA.

zparky•Dec 1, 2025

Benchmarks are super impressive, as usual. Interesting to note in table 3 of the paper (p. 15), DS-Speciale is 1st or 2nd in accuracy in all tests, but has much higher token output (50% more, or 3.5x vs gemini 3 in the codeforces test!).

futureshock•Dec 1, 2025

The higher token output is not by accident. Certain kinds of logical reasoning problems are solved by longer thinking output. Thinking chain output is usually kept to a reasonable length to limit latency and cost, but if pure benchmark performance is the goal you can crank that up to the max until the point of diminishing returns. DeepSeek being 30x cheaper than Gemini means there’s little downside to max out the thinking time. It’s been shown that you can further scale this by running many solution attempts in parallel with max thinking then using a model to choose a final answer, so increasing reasoning performance by increasing inference compute has a pretty high ceiling.

jodleif•Dec 1, 2025

I genuinely do not understand the evaluations of the US AI industry. The chinese models are so close and far cheaper

newyankee•Dec 1, 2025

Yet tbh if the US industry had not moved ahead and created the race with FOMO it would not had been easier for Chinese strategy to work either.

The nature of the race may change as yet though, and I am unsure if the devil is in the details, as in very specific edge cases that will work only with frontier models ?

jazzyjackson•Dec 1, 2025

Valuation is not based on what they have done but what they might do, I agree tho it's investment made with very little insight into Chinese research. I guess it's counting on deepseek being banned and all computers in America refusing to run open software by the year 2030 /snark

bilbo0s•Dec 1, 2025

>I guess it's counting on deepseek being banned

And the people making the bets are in a position to make sure the banning happens. The US government system being what it is.

Not that our leaders need any incentive to ban Chinese tech in this space. Just pointing out that it's not necessarily a "bet".

"Bet" imply you don't know the outcome and you have no influence over the outcome. Even "investment" implies you don't know the outcome. I'm not sure that's the case with these people?

coliveira•Dec 1, 2025

Exactly. "Business investment" these days means that the people involved will have at least some amount of power to determine the winning results.

jodleif•Dec 1, 2025

> Valuation is not based on what they have done but what they might do

Exactly what I’m thinking. Chinese models catching rapidly. Soon to be on-par with the big dogs.

ksynwa•Dec 1, 2025

Even if they do continue to lag behind they are a good bet against monopolisation by proprietary vendors.

coliveira•Dec 1, 2025

They would if corporations were allowed to run these models. I fully expect the US government to prohibit corporations from doing anything useful with Chinese models (full censorship). It's the same game they use with chips.

jasonsb•Dec 1, 2025

It's all about the hardware and infrastructure. If you check OpenRouter, no provider offers a SOTA chinese model matching the speed of Claude, GPT or Gemini. The chinese models may benchmark close on paper, but real-world deployment is different. So you either buy your own hardware in order to run a chinese model at 150-200tps or give up an use one of the Big 3.

The US labs aren't just selling models, they're selling globally distributed, low-latency infrastructure at massive scale. That's what justifies the valuation gap.

Edit: It looks like Cerebras is offering a very fast GLM 4.6

csomar•Dec 1, 2025

According to OpenRouter, z.ai is 50% faster than Anthropic; which matches my experience. z.ai does have frequent downtimes but so does Claude.

jodleif•Dec 1, 2025

Assuming your hardware premise is right (and lets be honest, nobody really wants to send their data to chinese providers) You can use a provider like Cerebras, Groq?

observationist•Dec 1, 2025

The network effects of using consistently behaving models and maintaining API coverage between updates is valuable, too - presumably the big labs are including their own domains of competence in the training, so Claude is likely to remain being very good at coding, and behave in similar ways, informed and constrained by their prompt frameworks, so that interactions will continue to work in predictable ways even after major new releases occur, and upgrades can be clean.

It'll probably be a few years before all that stuff becomes as smooth as people need, but OAI and Anthropic are already doing a good job on that front.

Each new Chinese model requires a lot of testing and bespoke conformance to every task you want to use it for. There's a lot of activity and shared prompt engineering, and some really competent people doing things out in the open, but it's generally going to take a lot more expert work getting the new Chinese models up to snuff than working with the big US labs. Their product and testing teams do a lot of valuable work.

dworks•Dec 2, 2025

Qwen 3 Coder Plus has been braindead this past weekend, but Codex 5.1 has also been acting up. It told me updating UI styling was too much work and I should do it myself. I also see people complaining about Claude every week. I think this is an unsolved problem, and you also have to separate perception from actual performance, which I think is an impossible task.

kachapopopow•Dec 1, 2025

cerebras AI offers models at 50x the speed of sonnet?

baq•Dec 2, 2025

if that's an honest question, the answer is pretty much yes, depending on model.

kachapopopow•Dec 2, 2025

the question mark was expressing confusion.

DeathArrow•Dec 1, 2025

> If you check OpenRouter, no provider offers a SOTA chinese model matching the speed of Claude, GPT or Gemini.

I think GLM 4.6 offered by Cerebras is much faster than any US model.

jasonsb•Dec 1, 2025

You're right, I forgot about that one.

irthomasthomas•Dec 1, 2025

Gemini 3 = ~70tps https://openrouter.ai/google/gemini-3-pro-preview

Opus 4.5 = ~60-80tps https://openrouter.ai/anthropic/claude-opus-4.5

Kimi-k2-think = ~60-180tps https://openrouter.ai/moonshotai/kimi-k2-thinking

Deepseek-v3.2 = ~30-110tps (only 2 providers rn) https://openrouter.ai/deepseek/deepseek-v3.2

jasonsb•Dec 1, 2025

It doesn't work like that. You need to actually use the model and then go to /activity to see the actual speed. I constantly get 150-200tps from the Big 3 while other providers barely hit 50tps even though they advertise much higher speeds. GLM 4.6 via Cerebras is the only one faster than the closed source models at over 600tps.

irthomasthomas•Dec 1, 2025

These aren't advertised speeds, they are the average measured speeds by openrouter across different providers.

isamuel•Dec 1, 2025

There is a great deal of orientalism --- it is genuinely unthinkable to a lot of American tech dullards that the Chinese could be better at anything requiring what they think of as "intelligence." Aren't they Communist? Backward? Don't they eat weird stuff at wet markets?

It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; their Bolshevism dooms them; we have the will to power; we will succeed. Even now, when you ask questions like what you ask of that era, the answers you get are genuinely not better than "yes, this should have been obvious at the time if you were not completely blinded by ethnic and especially ideological prejudice."

newyankee•Dec 1, 2025

but didn't Chinese already surpass the rest of the world in Solar, batteries, EVs among other things ?

cyberlimerence•Dec 1, 2025

They did, but the goalposts keep moving, so to speak. We're approximately here : advanced semiconductors, artificial intelligence, reusable rockets, quantum computing, etc. Chinese will never catch up. /s

mosselman•Dec 1, 2025

Back when deepseek came out and people were tripping over themselves shouting it was so much better than what was out there, it just wasn’t good.

It might be this model is super good, I haven’t tried it, but to say the Chinese models are better is just not true.

What I really love though is that I can run them (open models) on my own machine. The other day I categorised images locally using Qwen, what a time to be alive.

Further even than local hardware, open models make it possible to run on providers of choice, such as European ones. Which is great!

So I love everything about the competitive nature of this.

CamperBob2•Dec 1, 2025

If you thought DeepSeek "just wasn't good," there's a good chance you were running it wrong.

For instance, a lot of people thought they were running "DeepSeek" when they were really running some random distillation on ollama.

bjourne•Dec 1, 2025

WDYM? Isn't https://chat.deepseek.com/ the real DeepSeek?

CamperBob2•Dec 1, 2025

Good point, I was assuming the GP was running local for some reason. Hard to argue when it's the official providers who are being compared.

I ran the 1.58-bit Unsloth quant locally at the time it came out, and even at such low precision, it was super rare for it to get something wrong that o1 and GPT4 got right. I have never actually used a hosted version of the full DS.

lukan•Dec 1, 2025

"It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; ..."

Ideology played a role, but the data they worked with, was the finnish war, that was disastrous for the sowjet side. Hitler later famously said, it was all a intentionally distraction to make them believe the sowjet army was worth nothing. (Real reasons were more complex, like previous purging).

littlestymaar•Dec 1, 2025

> It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; their Bolshevism dooms them; we have the will to power; we will succeed

Though, because Stalin had decimated the red army leadership (including most of the veteran officer who had Russian civil war experience) during the Moscow trials purges, the German almost succeeded.

gazaim•Dec 1, 2025

> Though, because Stalin had decimated the red army leadership (including most of the veteran officer who had Russian civil war experience) during the Moscow trials purges, the German almost succeeded.

There were many counter revolutionaries among the leadership, even those conducting the purges. Stalin was like "ah fuck we're hella compromised." Many revolutions fail in this step and often end up facing a CIA backed coup. The USSR was under constant siege and attempted infiltration since inception.

littlestymaar•Dec 1, 2025

> There were many counter revolutionaries among the leadership

Well, Stalin was, by far, the biggest counter-revolutionary in the Politburo.

> Stalin was like "ah fuck we're hella compromised."

There's no evidence that anything significant was compromised at that point, and clear evidence that Stalin was in fact medically paranoid.

> Many revolutions fail in this step and often end up facing a CIA backed coup. The USSR was under constant siege and attempted infiltration since inception.

Can we please not recycle 90-years old soviet propaganda? The Moscow trial being irrational self-harm was acknowledged by the USSR leadership as early as the fifties…

breppp•Dec 1, 2025

Not sure how the entire Nazi comparison plays out, but at the time there were good reasons to imagine the Soviets will fall apart (as they initially did)

Stalin just finished purging his entire officer corps, which is not a good omen for war, and the USSR failed miserably against the Finnish who were not the strongest of nations, while Germany just steamrolled France, a country that was much more impressive in WW1 than the Russians (who collapsed against Germany)

ecshafer•Dec 1, 2025

I don't think that anyone, much less someone working in tech or engineering in 2025, could still hold beliefs about Chinese not being capable scientists or engineers. I could maybe give (the naive) pass to someone in 1990 thinking China will never build more than junk. But in 2025 their product capacity, scientific advancement, and just the amount of us who have worked with extremely talented Chinese colleagues should dispel those notions. I think you are jumping to racism a bit fast here.

Germany was right in some ways and wrong in others for the soviet unions strength. USSR failed to conquer Finland because of the military purges. German intelligence vastly under-estimated the amount of tanks and general preparedness of the Soviet army (Hitler was shocked the soviets had 40k tanks already). Lend Lease act really sent an astronomical amount of goods to the USSR which allowed them to fully commit to the war and really focus on increasing their weapon production, the numbers on the amount of tractors, food, trains, ammunition, etc. that the US sent to the USSR is staggering.

hnfong•Dec 1, 2025

I don't think anyone seriously believes that the Chinese aren't capable, it's more like people believe no matter what happens, USA will still dominate in "high tech" fields. A variant of "American Exceptionalism" so to speak.

This is kinda reflected in the stock market, where the AI stocks are surging to new heights every day, yet their Chinese equivalents are relatively lagging behind in stock price, which suggests that investors are betting heavily on the US companies to "win" this "AI race" (if there's any gains to be made by winning).

Also, in the past couple years (or maybe a couple decades), there had also been a lot of crap talk about how China has to democratize and free up their markets in order to be competitive with the other first world countries, together with a bunch of "doomsday" predictions for authoritarianism in China. This narrative has completely lost any credibility, but the sentiment dies slowly...

gazaim•Dec 1, 2025

These Americans have no comprehension of intelligence being used to benefit humanity instead of being used to fund a CEO's new yacht. I encourage them to visit China to see how far the USA lags behind.

astrange•Dec 2, 2025

Lags behind meaning we haven't covered our buildings in LEDs?

America is mostly suburbs and car sewers but that's because the voters like it that way.

stocksinsmocks•Dec 2, 2025

Early stages of Barbarossa were very successful and much of the Soviet Air Force, which had been forward positioned for invasion, was destroyed. Given the Red Army’s attitude toward consent, I would keep the praise carefully measured. TV has taught us there are good guys and bad guys when the reality is closer to just bad guys and bad guys

espadrine•Dec 1, 2025

Two aspects to consider:

1. Chinese models typically focus on text. US and EU models also bear the cross of handling image, often voice and video. Supporting all those is additional training costs not spent on further reasoning, tying one hand in your back to be more generally useful.

2. The gap seems small, because so many benchmarks get saturated so fast. But towards the top, every 1% increase in benchmarks is significantly better.

On the second point, I worked on a leaderboard that both normalizes scores, and predicts unknown scores to help improve comparisons between models on various criteria: https://metabench.organisons.com/

You can notice that, while Chinese models are quite good, the gap to the top is still significant.

However, the US models are typically much more expensive for inference, and Chinese models do have a niche on the Pareto frontier on cheaper but serviceable models (even though US models also eat up the frontier there).

jodleif•Dec 1, 2025

1. Have you seen the Qwen offerings? They have great multi-modality, some even SOTA.

brabel•Dec 1, 2025

Qwen Image and Image Edit were among the best image models until Nano Banana Pro came along. I have tried some open image models and can confirm , the Chinese models are easily the best or very close to the best, but right now the Google model is even better... we'll see if the Chinese catch up again.

BoorishBears•Dec 1, 2025

I'd say Google still hasn't caught up on the smaller model side at all, but we've all been (rightfully) wowed enough by Pro to ignore that for now.

Nano Banano Pro starts at 15 cents per image at <2k resolution, and is not strictly better than Seedream 4.0: yet the latter does 4K for 3 cents per image.

Add in the power of fine-tuning on their open weight models and I don't know if China actually needs to catch up.

I finetuned Qwen Image on 200 generations from Seedream 4.0 that were cleaned up with Nano Banana Pro, and got results that were as good and more reliable than either model could achieve otherwise.

dworks•Dec 2, 2025

FWIW, Qwen Z-Image is much better than Seedream and people (redditors) are saying its better than Nano Banana in their first trials. Its also 7B I think, and open.

BoorishBears•Dec 2, 2025

I've used and finetuned Z-Image Turbo: it's nowhere near Seedream or even Qwen-Image when the latter is finetuned (also doesn't do image editing yet)

It is very good for the size and speed, and I'm excited for the Edit and Base variants... but Reddit has been a bit "over-excited" because it run on their small GPUs and isn't overly resistant to porn.

torginus•Dec 1, 2025

Thanks for sharing that!

The scales are a bit murky here, but if we look at the 'Coding' metric, we see that Kimi K2 outperforms Sonnet 4.5 - that's considered to be the price-perf darling I think even today?

I haven't tried these models, but in general there have been lots of cases where a model performs much worse IRL than the benchmarks would sugges (certain Chinese models and GPT-OSS have been guilty of this in the past)

espadrine•Dec 1, 2025

Good question. There's 2 points to consider.

• For both Kimi K2 and for Sonnet, there's a non-thinking and a thinking version. Sonnet 4.5 Thinking is better than Kimi K2 non-thinking, but the K2 Thinking model came out recently, and beats it on all comparable pure-coding benchmarks I know: OJ-Bench (Sonnet: 30.4% < K2: 48.7%), LiveCodeBench (Sonnet: 64% < K2: 83%), they tie at SciCode at 44.8%. It is a finding shared by ArtificialAnalysis: https://artificialanalysis.ai/models/capabilities/coding

• The reason developers love Sonnet 4.5 for coding, though, is not just the quality of the code. They use Cursor, Claude Code, or some other system such as Github Copilot, which are increasingly agentic. On the Agentic Coding criteria, Sonnet 4.5 Thinking is much higher.

By the way, you can look at the Table tab to see all known and predicted results on benchmarks.

pama•Dec 2, 2025

The table is confusing. It is not clear what is known and what is predicted (and how it is predicted). Why not measure the missing pieces instead of predicting—is it too expensive or is the tooling missing?

agumonkey•Dec 1, 2025

forgive me for bringing politics into it, are chinese LLM more prone to censorship bias than US ones ?

coliveira•Dec 1, 2025

Being open source, I believe Chinese models are less prone to censorship, since the US corporations can add censorship in several ways just by being a closed model that they control.

skeledrew•Dec 1, 2025

It's not about a LLM being prone to anything, but more about the way a LLM is fine-tuned (which can be subject to the requirements of those wielding political power).

agumonkey•Dec 1, 2025

that's what i meant even though i could have been more precise

erikhorton•Dec 2, 2025

Yes extremely likely they are prone to censorship based on the training. Try running them with something like LM Studio locally and ask it questions the government is uncomfortable about. I originally thought the bias was in the GUI, but it's baked into the model itself.

raincole•Dec 1, 2025

> video

Most of AI-generated videos we see on social media now are made with Chinese models.

coliveira•Dec 1, 2025

Nothing you said helps with the issue of valuation. Yes, the US models may be better by a few percentage points, but how can they justify being so costly, both operationally as well as in investment costs? Over the long run, this is a business and you don't make money being the first, you have to be more profitable overall.

ben_w•Dec 1, 2025

I think the investment race here is an "all-pay auction"*. Lots of investors have looked at the ultimate prize — basically winning something larger than the entire present world economy forever — and think "yes".

But even assuming that we're on the right path for that (which we may not be) and assuming that nothing intervenes to stop it (which it might), there may be only one winner, and that winner may not have even entered the game yet.

* https://en.wikipedia.org/wiki/All-pay_auction

coliveira•Dec 1, 2025

> investors have looked at the ultimate prize — basically winning something larger than the entire present world economy

This is what people like Altman want investors to believe. It seems like any other snake oil scam because it doesn't match reality of what he delivers.

saubeidl•Dec 1, 2025

Yeah, this is basically financial malpractice/fraud.

culi•Dec 2, 2025

Qwen, Hunyuan, and WAN are three of the major competitors in the vision, text-to-image, and image-to-video spaces. They are quite competitive. Right now WAN is only behind Google's Veo in image-to-video rankings on llmarena for example

https://lmarena.ai/leaderboard/image-to-video

janalsncm•Dec 2, 2025

> Chinese models typically focus on text

Not true at all. Qwen has a VLM (qwen2 vl instruct) which is the backbone of Bytedance’s TARS computer use model. Both Alibaba (Qwen) and Bytedance are Chinese.

Also DeepSeek got a ton of attention with their OCR paper a month ago which was an explicit example of using images rather than text.

Bolwin•Dec 1, 2025

Third party providers rarely support caching.

With caching the expensive US models end up being like 2x the price (e.g sonnet) and often much cheaper (e.g gpt-5 mini)

If they start caching then US companies will be completely out priced.

beastman82•Dec 1, 2025

Then you should short the market

fastball•Dec 1, 2025

They're not that close (on things like LMArena) and being cheaper is pretty meaningless when we are not yet at the point where LLMs are good enough for autonomy.

mrinterweb•Dec 1, 2025

I would expect one of the motivations for making these LLM model weights open is to undermine the valuation of other players in the industry. Open models like this must diminish the value prop of the frontier focused companies if other companies can compete with similar results at competitive prices.

rprend•Dec 1, 2025

People pay for products, not models. OpenAI and Anthropic make products (ChatGPT, Claude Code).

TIPSIO•Dec 1, 2025

It's awesome that stuff like this is open source, but even if you have a basement rig with 4 NVIDIA GeForce RTX 5090 graphic cards ($15-20k machine), can it even run with any reasonable context window that isn't like a crawling 10/tps?

Frontier models are far exceeding even the most hardcore consumer hobbyist requirements. This is even further

bigyabai•Dec 1, 2025

People with basement rigs generally aren't the target audience for these gigantic models. You'd get much better results out of an MoE model like Qwen3's A3B/A22B weights, if you're running a homelab setup.

Spivak•Dec 1, 2025

Yeah I think the advantage of OSS models is that you can get your pick of providers and aren't locked into just Anthropic or just OpenAI.

hnfong•Dec 1, 2025

Reproducibility of results are also important in some cases.

There are consumer-ish hardware that can run large models like DeepSeek 3.x slowly. If you're using LLMs for a specific purpose that is well-served by a particular model, you don't want to risk AI companies deprecating it in a couple months and push you to a newer model (that may or may not work better in your situation).

And even if the AI service providers nominally use the same model, you might have cases where reproducibility requires you use the same inference software or even hardware to maintain high reproducibility of the results.

If you're just using OpenAI or Anthropic you just don't get that level of control.

Aachen•Dec 2, 2025

Who is the target audience of these free releases? I don't mind free and open information sharing but I have wondered what's in it for the people that spent unholy amounts of energy on scraping, developing, and training

noosphr•Dec 1, 2025

Home rigs like that are no longer cost effective. You're better off buying an rtx pro 6000 outright. This holds both for the sticker price, the supporting hardware price, the electricity cost to run it and cooling the room that you use it in.

torginus•Dec 1, 2025

I was just watching this video about a Chinese piece of industrial equipment, designed for replacing BGA chips such as flash or RAM with a good deal of precision:

https://www.youtube.com/watch?v=zwHqO1mnMsA

I wonder how well the aftermarket memory surgery business on consumer GPUs is doing.

ThrowawayTestr•Dec 1, 2025

LTT recently did a video on upgrading a 5090 to 96gb of ram

dotancohen•Dec 1, 2025

I wonder how well the opthalmologist is doing. These guys are going to be paying him a visit playing around with those lasers and no PPE.

CamperBob2•Dec 1, 2025

Eh, I don't see the risk, no pun intended. It's not collimated, and it's not going to be in focus anywhere but on-target. It's also probably in the long-wave range >>1000 nm that's not focused by the eye. At the end of the day it's no different from any other source of spot heating. I get more nervous around some of the LED flashlights you can buy these days.

I want one. Hot air blows.

noosphr•Dec 2, 2025

It's 45w of lasing power. I have a scar on my hand that's 15 years old from running one of those at 10% power and getting a reflection from a bare metal sheet.

This will absolutely scar, if not char, your cornea faster than you can blink.

CamperBob2•Dec 2, 2025

That's (again) less energy than a flashlight puts out these days, so the beam had to be tightly focused in your case. That isn't how these things work.

There is nothing special about "lasing power." It amounts to a 45-watt light bulb, nothing more and nothing less.

dotancohen•Dec 2, 2025

A 45 watt light bulb spreads the energy in all directions - at 1 meter away that's about 3 watts in every square meter or roughly 0.000003 watts per square millimeter. The laser is putting 45 watts into that same square millimeter at the same distance.

Of course the laser is tightly focused. That's pretty much one of the defining properties of laser devices. How else do you think the laser is heating the microprocessors in the video?

CamperBob2•Dec 2, 2025

They will be using a beam spreader to conform to the size of the targeted IC, which is usually on the order of 5x5 mm and up. For smaller parts they will be reducing the power.

They shouldn't be focusing it to a point under any conditions. Whether it's as safe as it could be is a different question, of course. For instance, you'd like to think that the act of configuring it for a smaller beam footprint would reduce the power at the same time, as opposed to requiring a separate adjustment that might be overlooked by the operator. Would have been nice if the video had addressed that and other safety considerations, for sure.

A lot depends on the exact wavelength. 1400 nm and longer is much less worrisome than near-visible IR.

noosphr•Dec 2, 2025

OK, put you face in front of a 45w co2 laser tube and report your results.

The laser is collimated but not focused so by your logic it will be fine.

This is advice on par with eating tide pods.

CamperBob2•Dec 3, 2025

That's obviously not a good-faith or technically-accurate description of what's happening here, or else everybody in that video would be carrying a white cane, along with everybody who uses this type of equipment in the phone repair business.

About all we can agree on, I think, is that neither of us knows enough about the product to argue about it usefully.

noosphr•Dec 3, 2025

I build CO2 laser cutters after dropping out from my physics PhD for 18 months.

Unlike you I do know what I'm talking about.

CamperBob2•Dec 3, 2025

I see. Are you also an authority on Tide Pod gastronomy, or just regarding lasers you've never worked with installed in devices you've never seen?

mikae1•Dec 1, 2025

Or perhaps a 512GB Mac Studio. 671B Q4 of R1 runs on it.

redrove•Dec 1, 2025

I wouldn’t say runs. More of a gentle stroll.

storus•Dec 1, 2025

I run it all the time, token generation is pretty good. Just large contexts are slow but you can hook a DGX Spark via Exo Labs stack and outsource token prefill to it. Upcoming M5 Ultra should be faster than Spark in token prefill as well.

embedding-shape•Dec 1, 2025

> I run it all the time, token generation is pretty good.

I feel like because you didn't actually talk about prompt processing speed or token/s, you aren't really giving the whole picture here. What is the prompt processing tok/s and the generation tok/s actually like?

storus•Dec 1, 2025

I addressed both points - I mentioned you can offload token prefill (the slow part, 9t/s) to DGX Spark. Token generation is at 6t/s which is acceptable.

a96•Dec 2, 2025

So, quarter stroll.

embedding-shape•Dec 2, 2025

6 tok/sec might be acceptable for a dense model that doesn't do thinking, but for something like DeepSeek 3.2 that does do reasoning, 6 tok/sec isn't acceptable for anything else but async/batched stuff, sadly. Even for a response with just 100 tokens we're talking a minute for it to just write the response, for anything except the smallest of prompts you'll easily be hitting 1000 tokens (600 seconds!).

Maybe my 6000 Pro spoiled me, but for actual usage, 6 or even 9 tok/sec is too slow for a reasoning/thinking model. To be honest, kind of expected on CPU though. I guess it's cool that it can run on Apple hardware, but it isn't exactly a pleasant experience at least today.

storus•Dec 2, 2025

Dunno, DeepSeek on MacStudio doesn't feel much slower than when using it directly on deepseek.com; 6t/s is still around 24 characters per second which is faster than many people could read. I also have 6000 Pro but you won't fit any large model there and to be able to run DeepSeek R1/3.1/3.2 671B at Q4 you'd need 5-6 of them depending on the communication overhead. MacStudio is the simplest solution to run it locally.

embedding-shape•Dec 2, 2025

> 6t/s is still around 24 characters per second which is faster than many people could read.

But again, not if you're using thinking/reasoning, which if you want to use this specific model properly, you are. Then you have a huge delay before the actual response comes through.

> MacStudio is the simplest solution to run it locally.

Obviously, that's Apple's core value proposition after all :) One does not acquire a state-of-the-art GPU and then expect simple stuff, especially when it's a fairly uncommon and new one. You cannot really be afraid of diving into CUDA code and similar fun rabbit holes. Simply two very different audiences for the two alternatives, and the Apple way is the simpler one, no doubt about it.

redrove•Dec 2, 2025

6t/s will have you pulling your hair out with any deepseek model.

hasperdi•Dec 1, 2025

With quantization, converting it to an MOE model... it can be a fast walk

throw4039•Dec 1, 2025

Yeah, the pricing for the rtx pro 6000 is surprisingly competitive with the gamer cards (at actual prices, not MSRP). A 3x5090 rig will require significant tuning/downclocking to be run from a single North American 15A plug, and the cost of the higher powered supporting equipment (cooling, PSU, UPS, etc) needed will pay for the price difference, not to mention future expansion possibilities.

halyconWays•Dec 1, 2025

As someone with a basement rig of 6x 3090s, not really. It's quite slow, as with that many params (685B) it's offloading basically all of it into system RAM. I limit myself to models with <144B params, then it's quite an enjoyable experience. GLM 4.5 Air has been great in particular

lostmsu•Dec 2, 2025

Did you find it better than GPT-OSS 120B? The public rankings are contradictory.

halyconWays•Dec 15, 2025

I haven't used GPT-OSS 120B, or other GPT-OSS models, and I mostly go on personal recommendations rather than benchmarks directly.

tarruda•Dec 1, 2025

You can run at ~20 tokens/second on a 512GB Mac Studio M3 Ultra: https://youtu.be/ufXZI6aqOU8?si=YGowQ3cSzHDpgv4z&t=197

IIRC the 512GB mac studio is about $10k

hasperdi•Dec 1, 2025

and can be faster if you can get an MOE model of that

dormento•Dec 1, 2025

"Mixture-of-experts", AKA "running several small models and activating only a few at a time". Thanks for introducing me to that concept. Fascinating.

(commentary: things are really moving too fast for the layperson to keep up)

whimsicalism•Dec 1, 2025

that's not really a good summary of what MoEs are. you can more consider it like sublayers that get routed through (like how the brain only lights up certain pathways) rather than actual separate models.

Mehvix•Dec 1, 2025

The gains from MoE is that you can have a large model that's efficient, it lets you decouple #params and computation cost. I don't see how anthropomorphizing MoE <-> brain affords insight deeper than 'less activity means less energy used'. These are totally different systems, IMO this shallow comparison muddies the water and does a disservice to each field of study. There's been loads of research showing there's redundancy in MoE models, ie cerebras has a paper[1] where they selectively prune half the experts with minimal loss across domains -- I'm not sure you could disable half the brain and notice a stupefying difference.

[1] https://www.cerebras.ai/blog/reap

whimsicalism•Dec 2, 2025

> I don't see how anthropomorphizing MoE <-> brain affords insight deeper than 'less activity means less energy used'.

I'm not saying it is a perfect analogy, but it is by far the most familiar one for people to describe what sparse activation means. I'm no big fan of over-reliance on biological metaphor in this field, but I think this is skewing a bit on the pedantic side.

re: your second comment about pruning, not to get in the weeds but I think there have been a few unique cases where people did lose some of their brain and the brain essentially routed around it.

hasperdi•Dec 1, 2025

As pointed out by a sibling comment. MOE consists of a router and a number of experts (eg 8). These experts can be imagined as parts of the brain with specialization, although in reality they probably don't work exactly like that. These aren't separate models, they are components of a single large model.

Typically, input gets routed to a number of of experts eg. top 2, leaving the others inactive. This reduces number of activation / processing requirements.

Mistral is an example of a model that's designed like this. Clever people created converters to transform dense models to MOE models. These days many popular models are also available in MOE configuration

bigyabai•Dec 1, 2025

>90% of inference hardware is faster if you run an MOE model.

miohtama•Dec 1, 2025

All modern models are MoE already, no?

hasperdi•Dec 2, 2025

That's not the case. Some are dense and some are hybrid.

MOE is not the holy grail, as there are drawbacks eg. less consistency, expert under/over-use

tarruda•Dec 2, 2025

Deepseek is already a MoE

menaerus•Dec 2, 2025

~20 tokens/second is actually pretty good. I see he's using the q5 version of the model. I wonder how it scales with the larger contexts. And the same guy published the video today with the new 3.2 version: https://www.youtube.com/watch?v=b6RgBIROK5o

reilly3000•Dec 1, 2025

There are plenty of 3rd party and big cloud options to run these models by the hour or token. Big models really only work in that context, and that’s ok. Or you can get yourself an H100 rack and go nuts, but there is little downside to using a cloud provider on a per-token basis.

cubefox•Dec 1, 2025

> There are plenty of 3rd party and big cloud options to run these models by the hour or token.

Which ones? I wanted to try a large base model for automated literature (fine-tuned models are a lot worse at it) but I couldn't find a provider which makes this easy.

big_man_ting•Dec 2, 2025

have you checked OpenRouter if they offer any providers who serve the model you need?

cubefox•Dec 2, 2025

I searched for "base" and the best available base model seems to be indeed Llama 3.1 405B Base at Hyperbolic.ai, as mentioned in the comment above.

reilly3000•Dec 2, 2025

If you’re already using GCP, Vertex AI is pretty good. You can run lots of models on it:

https://docs.cloud.google.com/vertex-ai/generative-ai/docs/m...

Lambda.ai used to offer per-token pricing but they have moved up market. You can still rent a B200 instance for sub $5/hr which is reasonable for experimenting with models.

https://app.hyperbolic.ai/models Hyperbolic offers both GPU hosting and token pricing for popular OSS models. It’s easy with token based options because usually are a drop-in replacement for OpenAI API endpoints.

You have you rent a GPU instance if you want to run the latest or custom stuff, but if you just want to play around for a few hours it’s not unreasonable.

verdverm•Dec 2, 2025

GCloud and Hyperbolic have been my go-to as well

cubefox•Dec 2, 2025

> If you’re already using GCP, Vertex AI is pretty good. You can run lots of models on it:

> https://docs.cloud.google.com/vertex-ai/generative-ai/docs/m...

I don't see any large base models there. A base model is a pretrained foundation model without fine tuning. It just predicts text.

> Lambda.ai used to offer per-token pricing but they have moved up market. You can still rent a B200 instance for sub $5/hr which is reasonable for experimenting with models.

A B200 is probably not enough: it has just 192 GB RAM while DeepSeek-V3.2-Exp-Base, the base model for DeepSeek-V3.2, has 685 billion BF16 parameters. Though I assume they have larger options. The problem is that all the configuration work is then left to the user, which I'm not experienced in.

> https://app.hyperbolic.ai/models Hyperbolic offers both GPU hosting and token pricing for popular OSS models

Thanks. They do indeed have a single base model: Llama 3.1 405B BASE. This one is a bit older (July 2024) and probably not as good as the base model for the new DeepSeek release. But that might the the best one can do, as there don't seem to be any inference providers which have deployed a DeepSeek or even Kimi base model.

weberer•Dec 2, 2025

Fireworks supports this model serverless for $1.20 per million tokens.

https://fireworks.ai/models/fireworks/deepseek-v3p2

cubefox•Dec 2, 2025

That's the final, fine-tuned model. The base model (pretraining only, no instruction SFT, RLHF, RLVR etc) is this one: https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp-Base It's apparently not offered at any inference provider, nor are older DeepSeek base models.

potsandpans•Dec 1, 2025

I run a bunch of smaller models on a 12gb vram 3060 and it's quite good. For larger open models ill use open router. I'm looking into on- demand instances with cloud/vps providers, but haven't explored the space too much.

I feel like private cloud instances that run on demand is still in the spirit of consumer hobbyist. It's not as good as having it all local, but the bootstrapping cost plus electricity to run seems prohibitive.

I'm really interested to see if there's a space for consumer TPUs that satisfy usecases like this.

wickedsight•Dec 2, 2025

Which ones are your favorites that fit on the 3060?

seanw265•Dec 1, 2025

FWIW it looks like OpenRouter's two providers for this model (one of whom being Deepseek itself) are only running the model around 28tps at the moment.

https://openrouter.ai/deepseek/deepseek-v3.2

This only bolsters your point. Will be interesting to see if this changes as the model is adopted more widely.

red2awn•Dec 1, 2025

Worth noting this is not only good on benchmarks, but significantly more efficient at inference https://x.com/_thomasip/status/1995489087386771851

ode•Dec 1, 2025

Do we know why?

hammeiam•Dec 1, 2025

Sparse Attention, it's the highlight of this model as per the paper

pylotlight•Dec 2, 2025

I'll have to wait for the bycloud video on this one :P

culi•Dec 2, 2025

How did we come to the place that the most transparent and open models are now coming out of China—freely sharing their research and source code—while all the American ones are fully locked down

pennomi•Dec 2, 2025

Over reliance on investors who demand profits more than engineering.

The best innovation always happens before being tainted by investment.

victor9000•Dec 2, 2025

Because the entire US economy is being propped up by AI hype.

nickstinemates•Dec 2, 2025

The money would have gone somewhere. The "smart" money went to AI. Don't get fooled.

culi•Dec 3, 2025

everybody looks smart in a bull market

p-e-w•Dec 2, 2025

Because the whole framing of US vs China as open vs closed was never correct to begin with.

kouteiheika•Dec 2, 2025

> How did we come to the place that the most transparent and open models are now coming out of China—freely sharing their research and source code—while all the American ones are fully locked down

Greed and "safety" hysteria.

aqme28•Dec 2, 2025

The US companies are all basically GPUaas. I’m not sure what the financial model is here, but I like it.

SequoiaHope•Dec 2, 2025

Short cheeky answer is that capitalists need to capture value and communists don’t. Less cheeky answer is that this is a good opportunity for China to make sure the world isn’t dominated by US-sourced AI models.

However in another way the US probably offers more free inference than China. What good is an open 600 billion parameter model to a poor person? A free account with ChatGPT might be more useful to them, though also more exploitative.

evrenesat•Dec 2, 2025

China needs to build the world's trust and respect, while the US is slowly but surely losing theirs.

m_mueller•Dec 2, 2025

slowly?

tim333•Dec 2, 2025

The US is more law and finance led and the first thing seems to be to get an IP advantage and raise money. China I guess less so - they are famously lax on IP and everyone copies everything.

zug_zug•Dec 1, 2025

Well props to them for continuing to improve, winning on cost-effectiveness, and continuing to publicly share their improvements. Hard not to root for them as a force to prevent an AI corporate monopoly/duopoly.

srameshc•Dec 1, 2025

As much I agree with your sentiment, but I doubt the intention is singular.

echelon•Dec 1, 2025

I don't care if this kills Google and OpenAI.

I hope it does, though I'm doubtful because distribution is important. You can't beat "ChatGPT" as a brand in laypeople's minds (unless perhaps you give them a massive "Temu: Shop Like A Billionaire" commercial campaign).

Closed source AI is almost by design morphing into an industrial, infrastructure-heavy rocket science that commoners can't keep up with. The companies pushing it are building an industry we can't participate or share in. They're cordoning off areas of tech and staking ground for themselves. It's placing a steep fence around tech.

I hope every such closed source AI effort is met with equivalent open source and that the investments made into closed AI go to zero.

The most likely outcome is that Google, OpenAI, and Anthropic win and every other "lab"-shaped company dies an expensive death. RunwayML spent hundreds of millions and they're barely noticeable now.

These open source models hasten the deaths of the second tier also-ran companies. As much as I hope for dents in the big three, I'm doubtful.

raw_anon_1111•Dec 1, 2025

I can’t think of a single company I’ve worked with as a consultant that I could convince to use DeepSeek because of its ties with China even if I explained that it was hosted on AWS and none of the information would go to China.

Even when the technical people understood that, it would be too much of a political quagmire within their company when it became known to the higher ups. It just isn’t worth the political capital.

They would feel the same way about using xAI or maybe even Facebook models.

StealthyStart•Dec 1, 2025

This is the real cause. At the enterprise level, trust outweighs cost. My company hires agencies and consultants who provide the same advice as our internal team; this is not to imply that our internal team is incorrect; rather, there is credibility that if something goes wrong, the decision consequences can be shifted, and there is a reason why companies continue to hire the same four consulting firms. It's trust, whether it's real or perceived.

0xWTF•Dec 1, 2025

Children do the same thing intuitively: parents continually complain that their children don't listen to them. But as soon as someone else tells them to "cover their nose", "chew with their mouth closed", "don't run with scissors", whatever, they listen and integrate that guidance into their behavior. What's harder to observe is all the external guidance they get that they don't integrate until their parents tell them. It's internal vs external validation.

raw_anon_1111•Dec 1, 2025

Or in many cases they go over to their grandparents house and they let them run wild and all of the sudden your parents have “McDonald’s money” for their grandkids when they never had it for you.

raw_anon_1111•Dec 1, 2025

I have seen it much more nuanced than that.

2020 - I was a mid level (L5) cloud consultant at AWS with only two years of total AWS experience and that was only at a small startup before then. Yet every customer took my (what in hindsight might not have been the best) advice all of the time without questioning it as long as it met their business goals. Just because I had @amazon.com as my email address.

Late 2023 - I was the subject matter expert in a niche of a niche in AWS that the customer focused on and it was still almost impossible to get someone to listen to a consultant from a shitty third rate consulting company.

2025 - I left the shitty consulting company last year after only a year and now work for one with a much better reputation and I have a better title “staff consultant”. I also play the game and be sure to mention that I’m former “AWS ProServe” when I’m doing introductions. Now people listen to me again.

coliveira•Dec 1, 2025

So much worse for American companies. This only means that they will be uncompetitive with similar companies that use models with realistic costs.

raw_anon_1111•Dec 1, 2025

I can’t think of a single major US company that is big internationally that is competing on price.

ipaddr•Dec 1, 2025

Any car company. Uber.

All tech companies offering free services.

raw_anon_1111•Dec 2, 2025

Is a “cheaper” service going to come along and upend Google or Facebook?

I’m not saying this to insult the technical capabilities of Uber. But it doesn’t have the economics that most tech companies have - high fixed costs and very low marginal costs. Uber has high marginal costs saving a little on inference isn’t going to make a difference.

jamiek88•Dec 2, 2025

What American car company competes overseas on price?

necovek•Dec 2, 2025

All the American cars (Ford, Chevrolet, GM...) are much cheaper in Europe than eg. German cars from their trifecta (and other Europe-made high end vehicles from eg Sweden, Italy or UK), and on par with mid-priced vehicles from the likes of Hyundai, Kia, Mazda...

Obviously, some US brands do not compete on price, but other than maybe Jeep and Tesla, those have a small market penetration.

re-thc•Dec 2, 2025

> I can’t think of a single major US company that is big internationally that is competing on price.

All the clouds compete on price. Do you really think it is that differentiated? Google, Amazon and Microsoft all offer special deals to sign big companies up and globally too.

raw_anon_1111•Dec 2, 2025

I worked inside AWS consulting department for 3 years (AWS ProServe) and now I work as a staff consultant for a 3rd AWS partner. I have been on enough sales calls, seen enough go to market training materials and flown out to customers sites to know how these things work. AWS has never tried to compete as the “low cost leader”. Marketing 101 says you never want to compete on price if you can avoid it.

Microsoft doesn’t compete on price. Their major competitive advantage is Big Enterprise is already big into Microsoft and it’s much easier to get them to come onto Azure. They compete on price only when it comes to making Windows workloads Bd SQL Server cheaper than running on other providers.

AWS is the default choice for legacy reasons and it definitely has services an offerings that Google doesn’t have. I have never once been on a sales call where the sales person emphasizes that AWS is cheaper.

As far as GCP, they are so bad at evterprise sales, we never really looked at them as serious competition.

Sure AWS will throw credits in for migrations and professional services both internally and for third party partners. But no CFO is going to look at just the short term credits.

re-thc•Dec 2, 2025

> AWS has never tried to compete as the “low cost leader”. Marketing 101 says you never want to compete on price if you can avoid it.

Despite all that and whatever you say, the fact is you do compete. It doesn't have to be a race to the bottom.

So Cloudfront free tier and the latest discount bundles etc aren't to compete? People have also negotiated private pricing way below list price and a lot cheaper than competitors.

Similarly was the Dynamodb price cuts not due to competition?

I can give way more examples...

raw_anon_1111•Dec 2, 2025

I am well aware that Netflix doesn’t pay the same price for AWS services that “Joe Bob’s Fish Tackle and WordPress shop”. All big companies give discounts to large companies as part of negotiations which is different from “we are the low cost leader”.

All technology gets cheaper over time. There is a difference between lowering price in response to competitors and finding the profit maximizing price based on supply and demand.

AWS was lowering prices to increase demand before GCP and Azure were a thing.

Jassy said right before he became CEO of Amazon and he was still over AWS that only 5% of IT spend was on any cloud provider. They are capturing non consumption and marketing value of AWS vs that.

While I don’t have any insider experience about Azure, looking on the outside, I would think that Azure’s go to market is also not competing against AWS on price, but trying to get on prem customers on Azure.

tokioyoyo•Dec 1, 2025

If the Chinese model becomes better than competitors, these worries will suddenly disappear. Also, there are plenty startups and enterprises that are running fine-tuned versions of different OS models.

raw_anon_1111•Dec 1, 2025

Yeah that’s not how Big Enterprise works…

And most startups are just doing prompt engineering that will never go anywhere. The big companies will just throw a couple of developers at the feature and add it to their existing business.

tokioyoyo•Dec 1, 2025

Big enterprise with mostly private companies as their clients? Lol, yeah, that’s how they work from my personal experience. The reality is, if it’s not a tech-first enterprise and already outsource part of tech to a shop outside of NA (which is almost majority at this point), they will do absolutely everything to cut the costs.

raw_anon_1111•Dec 1, 2025

I spent three years working in consulting mostly in public sector and education and the last two working with startups to mid size commercial interest and a couple of financial institutions.

Before that I spent 6 years working between 3 companies in health care in a tech lead role. I’m 100% sure that any of those companies would I have immediately questioned my judgment for suggesting DeepSeek if had been a thing.

Absolutely none of them would ever have touched DeepSeek.

ipaddr•Dec 2, 2025

Why would you be presenting what AI tech you are using? You would tell them AI will come from Amazon using a variety of models.

raw_anon_1111•Dec 2, 2025

You still choose your model. I’m no more going to say “I’m using Bedrock” without being more specific than I would say “I’m using RDS” without specifying the database.

chaboud•Dec 2, 2025

In various sectors, you need to be able to explain why you/your-system did what it did. Exchange Act Rule 15c3-5 is probably the most relevant in financial circles:

https://www.ecfr.gov/current/title-17/chapter-II/part-240/su...

Note: I am neither a lawyer nor in financial circles, but I do have an interest in the effects of market design and regulation as we get into a more deeply automated space.

raw_anon_1111•Dec 2, 2025

To add on, while it doesn’t work with GenAI models as far as I know. AWS has a service around explainability around ML decisions

https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-mode...

tokioyoyo•Dec 2, 2025

I've worked with financial services, and insurance providers that would have done the opposite for cost saving measures. So, I'm not sure what to say here.

raw_anon_1111•Dec 2, 2025

Regulators would have the head of any financial institution that used a Chinese model.

corimaith•Dec 2, 2025

Financial Services are far more risk averse first than they are cost cutting, they literally have risk departments.

If you'd spent anytime working at one for swe you won't have access to popular open source frameworks, let alone Chinese LLMs. The LLM development is mostly occurring through collaborations with the regional LLM businesses or internal labs.

subroutine•Dec 1, 2025

As a government contractor, using a Chinese model is a non-starter.

jazzyjackson•Dec 2, 2025

I don't know that it's actually prohibited. There is no Chinese telecommunications equipment allowed, no Huawei or Bytedance, but nothing prohibiting software merely being developed in China, not yet at least.

Although I did just check what regions AWS bedrock support Deepseek and their govcloud regions do not, so that's a good reason not to use it. Still, on prem on a segmented network, following CMMC, probably permissable

apercu•Dec 2, 2025

There’s nuance and debate about the 110 level 2 controls without bringing Chinese tech in to the picture. I’d love to be a fly on the wall in that meeting lol.

re-thc•Dec 2, 2025

> I don't know that it's actually prohibited.

Chinese models generally aren't but DeepSeek specifically is at this point.

hhh•Dec 1, 2025

No… Nobody I work for will touch these models. The fear is real that they have been poisoned or have some underlying bomb. Plus y’know, they’re produced by China, so they would never make it past a review board in most mega enterprises IME.

tokioyoyo•Dec 1, 2025

People say that, but everyone, including enterprises, are constantly buying Chinese tech one way or another because of cost/quality ratio. There’s a tipping point in any excel file where risks don’t make sense, if the cost is 20x for the same quality.

Of course you’ll always have exceptions (government, military and etc.), but for private, winner will take it all.

raw_anon_1111•Dec 1, 2025

What Chinese built infrastructure tech where information can be exfiltrated or cause any real damage are American companies buying? Chinese communication tech is for the most part not allowed in any American technology.

oceanplexian•Dec 2, 2025

80% of the parts in iPhones are manufactured in China, and they have completely and utterly dominated in Enterprise (Ever heard of someone using a Blackberry in 2025? Me neither.) so there’s one example.

raw_anon_1111•Dec 2, 2025

The software is made by Apple. Hardware can’t magically intercept communications and the manufacturing is done mostly in Taiwan. If Apple doesn’t have a process to protect its operating system from supply chain attacks, it would be derelict

necovek•Dec 2, 2025

Hardware can do any "magic" software can, which should be obvious since software runs on it. It's just not as cost-effective to modify it after shipping, which is why the tech sector is moving to more sw less hw (simplified, ofc, there are other reasons).

contrarian1234•Dec 2, 2025

The xenaphobia is still very much there. Chinese tech is sanitized through Taiwanese middlemen (Foxconn, Asus, Acer etc). If you try to use Chinese tech or funding directly you will have a lot of pushback from VCs, financial institutions and business partners. China is the boogieman

baq•Dec 2, 2025

it is many things, but not xenophobia.

cherioo•Dec 1, 2025

That conversation probably gets easier if and when company when $100+M on AI.

Companies just need to get to the “if” part first. That or they wash their hand by using a reseller that can use whatever it wants under the hood.

deaux•Dec 2, 2025

For what it's worth, this is complete insanity when practically every mega enterprises' hardware is largely Made in China.

raw_anon_1111•Dec 2, 2025

Enterprise hardware isn’t the issue. It’s the software. How much enterprise hardware is running with Chinese software? The US basically bans any hardware with Chinese software that can disrupt infrastructure.

nylonstrung•Dec 2, 2025

Tons of routers, modems, embedded, are running Chinese software

adrian_b•Dec 2, 2025

Backdoors in software are much easier to discover than backdoors in hardware.

Any kind of hardware that is somehow connected to the wired or wireless communication interfaces is much more dangerous than any software.

Backdoors embedded in such hardware devices may be impossible to identify before being activated by the reception of some "magic" signals from outside.

vitaflo•Dec 2, 2025

I work at a F50 company and Deepseek is one of the model that has been approved for use. Took them a bit to get it all in place but it's certainly being used in Megacorps.

kriops•Dec 1, 2025

[flagged]

JSR_FDED•Dec 1, 2025

AirBnB is all in on DeepSeek and Qwen.

https://sg.finance.yahoo.com/news/airbnb-picks-alibabas-qwen...

raw_anon_1111•Dec 2, 2025

TIL: That Chinese models are considered better at multiple languages than non Chinese models.

tayo42•Dec 2, 2025

It's a customer service bot? And Airbnb is a vacation home booking site. It's pretty inconsequential

antonvs•Dec 2, 2025

Airbnb has ~$12 bn annual revenue, and is a counterexample to the idea that no companies can be "convinced to use DeepSeek".

The fact that it's customer service means it's dealing with text entered by customers, which has privacy and other consequences.

So no, it's not "pretty inconsequential". Many more companies fit a profile like that than whatever arbitrary criteria you might have in mind for "consequential".

tehjoker•Dec 1, 2025

really a testament to how easily the us govt has spun a china bad narrative even though it is mostly fiction and american exceptionalism

littlestymaar•Dec 1, 2025

> I can’t think of a single company I’ve worked with as a consultant that I could convince to use DeepSeek because of its ties with China even if I explained that it was hosted on AWS and none of the information would go to China.

Well for non-American companies, you have the choice between Chinese models that don't send data home, and American ones that do, with both countries being more or less equally threatening.

I think if Mistral can just stay close enough to the race it will win many customers by not doing anything.

That might be the perspective of a US based company. But there is also Europe and basically it's a choice between Trump and China.

Muromec•Dec 1, 2025

Europe has Mistral. It feels that governments that can do things without fax take this as a sovereignity thing and roll their own or have their provider in their jurisdiction.

deaux•Dec 2, 2025

> Even when the technical people understood that

I'm not sure if technical people who don't understand this deserve the moniker technical in this context.

nylonstrung•Dec 2, 2025

The average person has been programmed to be distrustful of open source in general, thinking it is inferior quality or in service of some ulterior motive

giancarlostoro•Dec 1, 2025

ChatGPT is like "Photoshop" people will call any AI chatgpt.

twelvechairs•Dec 1, 2025

The bar is incredibly low considering what OpenAI has done as a "not for profit"

kopirgan•Dec 1, 2025

You need get a bunch of accountants to agree on what's profit first..

komali2•Dec 2, 2025

Agree against their best interest, mind you!

energy123•Dec 2, 2025

It's like AMD open-sourcing FSR or Meta open-sourcing Llama. It's good for us, but it's nothing more than a situational and temporary alignment of self-interest with the public good. When the tables turn (they become the best instead of 4th best, or AMD develops the best upscaler, etc), the decision that aligns with self-interest will change, and people will start complaining that they've lost their moral compass.

re-thc•Dec 2, 2025

It's not. This isn't about competition in a company sense but sanctions and wider macro issues.

energy123•Dec 2, 2025

It's like it in the sense that it's done because it aligns with self-interest. Even if the nature of that self-interest differs.

orbital-decay•Dec 2, 2025

>situational and temporary alignment of self-interest with the public good

That's how it supposed to work.

jstummbillig•Dec 1, 2025

How could we judge if anyone is "winning" on cost-effectiveness, when we don't know what everyones profits/losses are?

ericskiff•Dec 1, 2025

I believe this was a statement on cost per token to us as consumers of the service

moffkalast•Dec 2, 2025

Training cost-effectiveness doesn't matter for open models since someone else ate the cost. In this case, Chinese taxpayers.

KvanteKat•Dec 2, 2025

Deepseek is a private corporation funded by a hedge fund (High-Flyer). I doubt much public money was spent by the Chinese state on this. Like with LLMs in the US, the people paying for it so far are mainly investors who are betting on a return in the long to medium term.

boringg•Dec 2, 2025

Do you actually believe what you just wrote or are you trolling? One version at least has a foot planted in reality. The other one well...

rowanG077•Dec 1, 2025

Well consumers care about the cost to them, and those we know. And deepseek is destroying everything in that department.

eru•Dec 2, 2025

Yes. Though we don't know for sure whether that's because they actually have lower costs, or whether it's just the Chinese taxpayer being forced to serve us a treat.

chronogram•Dec 2, 2025

Third party providers are still cheap though. The closed models are the ones where you can't see the real cost to running them.

eru•Dec 2, 2025

Oh, I was mostly talking about the Chinese taxpayer footing the training bill.

You are right that we can directly observe the cost of inference for open models.

rescbr•Dec 2, 2025

Not sure the Chinese taxpayer is footing the bill though - of course, it might not be net zero, there might be secondary effects, etc.

A few days ago I read an article saying the Chinese utilities have a pricing structure that favors high-tech industries (say, an AI data center), making the difference by charging more the energy-intensive but less sophisticated industries (an aluminium smelter, for example).

Admittedly, there are some advantages when you do central and long-term economic planning.

tedivm•Dec 1, 2025

If you're trying to build AI based applications you can and should compare the costs between vendor based solutions and hosting open models with your own hardware.

On the hardware side you can run some benchmarks on the hardware (or use other people's benchmarks) and get an idea of the tokens/second you can get from the machine. Normalize this for your usage pattern (and do your best to implement batch processing where you are able to, which will save you money on both methods) and you have a basic idea of how much it would cost per token.

Then you compare that to the cost of something like GPT5, which is a bit simpler because the cost per (million) token is something you can grab off of a website.

You'd be surprised how much money running something like DeepSeek (or if you prefer a more established company, Qwen3) will save you over the cloud systems.

That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.

qeternity•Dec 1, 2025

> DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.

Uh, Deepseek will not (unless you are referring to one of their older R1 finetuned variants). But any flagship Deepseek model will require 16x A100/H100+ with NVL in FP8.

Muromec•Dec 1, 2025

>That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.

What's cheap nowdays? I'm out of the loop. Does anything ever run on integrated AMD that is Ryzen AI that comes in framework motherboards? Is under 1k americans cheap?

GTP•Dec 2, 2025

Not really in the loop either, but when Deepseek R1 was released, I sumbled upon this YouTube channel [1] that made local AI PC builds in the 1000-2000$ range. But he doesn't always use GPUs, maybe the cheaper builds were CPU plus a lot of RAM, I don't remember.

[1] https://youtube.com/@digitalspaceport?si=NrZL7MNu80vvAshx

baq•Dec 2, 2025

'lots of RAM' got expensive lately -_-

District5524•Dec 2, 2025

Digital Spaceport is a really good channel, I second that - the author is not sparing any detail. The cheaper options always use CPU only, or sharding between different cheap GPUs (without SLI/switching) - which is not good for all use cases (he also highlights this). But some his prices are one-off bargains for used stuff. And RAM prices doubled this year, so you won't buy 2x256 GB DDR4 for $336, no matter what: https://digitalspaceport.com/500-deepseek-r1-671b-local-ai-s...

AlexCoventry•Dec 2, 2025

Mixture-of-Expert models benefit from economies of scale, because they can process queries in parallel, and expect different queries to hit different experts at a given layer. This leads to higher utilization of GPU resources. So unless your application is already getting a lot of use, you're probably under-utilizing your hardware.

chazeon•Dec 2, 2025

Well the seemingly cheap comes with significantly degraded performance, particular for agentic use. Have you tried replacing Claude Code with some locally deployed model, say, on 4090 or 5090? I have. It is not usable.

estsauver•Dec 2, 2025

Well, those are also extremely limited vram areas that wouldn't be able to run anything in the ~70b parameter space. (Can you run 30b even?)

Things get a lot more easier at lower quantisation, higher parameter space, and there's a lot of people's whose jobs for AI are "Extract sentiment from text" or "bin into one of these 5 categories" where that's probably fine.

nylonstrung•Dec 2, 2025

Deepseek and Kimi both have great agentic performance

When used with crush/opencode they are close to Claude performance.

Nothing that runs on a 4090 would compete but Deepseek on openrouter is still 25x cheaper than claude

Aeolun•Dec 2, 2025

> Deepseek on openrouter is still 25x cheaper than claude

Is it? Or only when you don’t factor in Claude cached context? I’ve consistently found it pointless to use open models because the price of the good ones is so close to cached context on Claude that I don’t need them.

joefourier•Dec 2, 2025

Deepseek via their API also has cached context, although the tokens/s was much lower than Claude when I tried it. But for background agents the price difference makes it absolutely worth it.

ewoodrich•Dec 2, 2025

Yes, if you try using Kilo Code/Cline via Openrouter the cost will be much cheaper using Deepseek/Kimi vs Claude Sonnet 4.5.

elif•Dec 2, 2025

Strictly speaking, you have not deployed any model on a 5090 because a 5090 card has never been produced.

And without specifying your quantization level it's hard to know what you mean by "not usable"

Anyway if you really wanted to try cheap distilled/quantized models locally you would be using used v100 Teslas and not 4 year old single chip gaming GPUs.

matthewmacleod•Dec 2, 2025

You can just buy a 5090 now for $3k. Have you confused it with something else?

__alexs•Dec 2, 2025

Are you a time traveller from the past? https://www.nvidia.com/en-gb/geforce/graphics-cards/50-serie...

JosephjackJR•Dec 2, 2025

they took the already ridiculous v3.1 terminus model, added this new deepseek sparse attention thing, and suddenly it’s doing 128k context at basically half the inference cost of the old version with no measurable drop in reasoning or multilingual quality. like, imo gold medal level math and code, 100+ languages, all while sipping tokens at 14 cents per million input. that’s stupid cheap. the rl recipe they used this time also seems way more stable. no more endless repetition loops or random language switching you sometimes got with the earlier open models. it just works. what really got me is how fast the community moved. vllm support landed the same day, huggingface space was up in hours, and people are already fine-tuning it for agent stuff and long document reasoning. i’ve been playing with it locally and the speed jump on long prompts is night and day. feels like the gap to the closed frontier models just shrank again. anyone else tried it yet?

miki123211•Dec 2, 2025

> with your own hardware

Or with somebody else's.

If you don't have strict data residency requirements, and if you aren't doing this at an extremely large scale, doing it on somebody else's hardware makes much more economic sense.

If you use MoE models (al modern >70B models are MoE), GPU utilization increases with batch size. If you don't have enough requests to keep GPUs properly fed 24/7, those GPUs will end up underutilized.

Sometimes underutilization is okay, if your system needs to be airgapped for example, but that's not an economics discussion any more.

Unlike e.g. video streaming workloads, LLMs can be hosted on the other side of the world from where the user is, and the difference is barely going to be noticeable. This means you can keep GPUs fed by bringing in workloads from other timezones when your cluster would otherwise be idle. Unless you're a large, worldwide organization, that is difficult to do if you're using your own hardware.

embedding-shape•Dec 2, 2025

> If you use MoE models (al modern >70B models are MoE), GPU utilization increases with batch size

Isn't that true for any LLM, MoE or not? In fact, doesn't that apply to most concepts within ML, as long as it's possible to do batching at all, you can scale it up and utilize more of the GPU, until you saturate some part of the process.

kmacdough•Dec 2, 2025

Furthermore, paid models are heavily subsidized by bullish investors playing for monopoly. So that tips the scales further towards Deepseek.

deaux•Dec 2, 2025

We can judge on inference cost because we do know what those are for open-weights models as there are a dozen independent providers that host these models and price them according to respective inference cost.

We can't judge on training cost, that's true.

stingraycharles•Dec 2, 2025

You can use tokens/sec on something like AWS Bedrock (which hosts both open and closed models) as a proxy for “costs per token” for the closed providers.

mzl•Dec 2, 2025

Apart from measuring prices from venture-backed providers which might or might not correlate with cost-effectiveness, I think the measures of intelligence per watt and intelligence per joule from https://arxiv.org/abs/2511.07885 is very interesting.

badmonster•Dec 2, 2025

Good point. Could usage patterns + inference costs give us proxy metrics? What would be a fair baseline?

make3•Dec 1, 2025

I suspect they will keep doing this until they have a substantially better model than the competition. Sharing methods to look good & allow the field to help you keep up with the big guys is easy. I'll be impressed if they keep publishing even when they do beat the big guys soundly.

ActorNightly•Dec 1, 2025

>winning on cost-effectiveness

Nobody is winning in this area until these things run in full on single graphics cards. Which is sufficient compute to run even most of the complex tasks.

beefnugs•Dec 1, 2025

Why does that matter? They wont be making at home graphics cards anymore. Why would you do that when you can be pre-sold $40k servers for years into the future

observationist•Dec 1, 2025

Because Moore's law marches on.

We're around 35-40 orders of magnitude from computers now to computronium.

We'll need 10-15 years before handheld devices can run a couple terabytes of ram, 64-128 terabytes of storage, and 80+ TFLOPS. That's enough to run any current state of the art AI at around 50 tokens per second, but in 10 years, we're probably going to have seen lots of improvements, so I'd guess conservatively you're going to be able to see 4-5x performance per parameter, possibly much more, so at that point, you'll have the equivalent of a model with 10T parameters today.

If we just keep scaling and there are no breakthroughs, Moore's law gets us through another century of incredible progress. My default assumption is that there are going to be lots of breakthroughs, and that they're coming faster, and eventually we'll reach a saturation of research and implementation; more, better ideas will be coming out than we can possibly implement over time, so our information processing will have to scale, and it'll create automation and AI development pressures, and things will be unfathomably weird and exotic for individuals with meat brains.

Even so, in only 10 years and steady progress we're going to have fantastical devices at hand. Imagine the enthusiast desktop - could locally host the equivalent of a 100T parameter AI, or run personal training of AI that currently costs frontier labs hundreds of millions in infrastructure and payroll and expertise.

Even without AGI that's a pretty incredible idea. If we do get to AGI (2029 according to Kurzweil) and it's open, then we're going to see truly magical, fantastical things.

What if you had the equivalent of a frontier lab in your pocket? What's that do to the economy?

NVIDIA will be churning out chips like crazy, and we'll start seeing the solar system measured in terms of average cognitive FLOPS per gram, and be well on the way toward system scale computronium matrioshka brains and the like.

delaminator•Dec 1, 2025

> If we do get to AGI (2029 according to Kurzweil)

if you base your life on Kurzweil's hard predictions you're going to have a bad time

blonder•Dec 2, 2025

I appreciate your rabid optimism, but considering that Moores Law has ceased to be true for multiple years now I am not sure a handwave about being able to scale to infinity is a reasonable way to look at things. Plenty of things have slowed down in progress in our current age, for example airplanes.

timschmidt•Dec 2, 2025

Someone always crawls out of the woodwork to repeat this supposed "fact" which hasn't been true for the entire half-century it's been repeated. Jim Keller (designer of most of the great CPUs of the last couple decades) gave a convincing presentation several years ago about just how not-true it is: https://www.youtube.com/watch?v=oIG9ztQw2Gc Everything he says in it still applies today.

Intel struggled for a decade, and folks think that means Moore's law died. But TSMC and Samsung just kept iterating. And hopefully Intel's 18a process will see them back in the game.

eru•Dec 2, 2025

During the 1990s (and for some years before and after) we got 'Dennard scaling'. The frequency of processors tended to increase exponentially, too, and featured prominently in advertising and branding.

I suspect many people conflated Dennard scaling with Moore's law and the demise of Dennard scaling is what contributes to the popular imagination that Moore's law is dead: frequencies of processors have essentially stagnated.

See https://en.wikipedia.org/wiki/Dennard_scaling

timschmidt•Dec 2, 2025

Yup. Since then we've seen scaling primarily in transistor count, though clock speed has increased slowly as well. Increased transistor count has led to increasingly complex and capable instruction decode, branch prediction, out of order execution, larger caches, and wider execution pipelines in attempt to increase single-threaded performance. We've also seen the rise of embarrassingly parallel architectures like GPUs which more effectively make use of additional transistors despite lower clock speeds. But Moore's been with us the whole time.

Chiplets and advanced packaging are the latest techniques improving scaling and yield keeping Moore alive. As well as continued innovation in transistor design, light sources, computational inverse lithography, and wafer scale designs like Cerebras.

eru•Dec 2, 2025

Yes. Increase in transistor count is what the original Moore's law was about. But during the golden age of Dennard scaling it was easy to get confused.

timschmidt•Dec 2, 2025

Agreed. And specifically Moore's law is about transistors per constant dollar. Because even in his time, spending enough could get you scaling beyond what was readily commercially available. Even if transistor count had stagnated, there is still a massive improvement from the $4,000 386sx Dad somehow convinced Mom to greenlight in the late 80s compared to a $45 Raspberry Pi today. And that factors into the equation as well.

Of course, feature size (and thus chip size) and cost are intimately related (wafers are a relatively fixed cost). And related as well to production quantity and yield (equipment and labor costs divide across all chips produced). That the whole thing continues scaling is non-obvious, a real insight, and tantamount to a modern miracle. Thanks to the hard work and effort of many talented people.

eru•Dec 2, 2025

The way I remember it, it was about the transistor count in the commercially available chip with the lowest per transistor cost. Not transistor count per constant dollar.

Wikipedia quotes it as:

> The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years.

But I'm fairly sure, if you graph how many transistors you can buy per inflation adjusted dollar, you get a very similar graph.

timschmidt•Dec 2, 2025

Yes. I think you're probably right about phrasing. And transistor count per inflation adjusted dollar is the unit most commonly used to graph it. Similar ways to say the same thing.

js8•Dec 2, 2025

You could put 64TBs of storage into your pocket with current technology. There are 4TB microSD cards available.

Not sure about the stated GFlops.. but I suspect we find that AI doesn't need that much compute to begin with.

fragmede•Dec 2, 2025

You can run models locally on high end smartphones today with apps like PocketPal or Local LLM.

observationist•Dec 2, 2025

The Law of Accelerating Returns is a better formulation, not tied to any particular substrate, it's just not as widely known.

https://imgur.com/a/UOUGYzZ - had chatgpt whip up an updated chart.

LoAR shows remarkably steady improvement. It's not about space or power efficiency, just ops per $1000, so transistor counts served as a very good proxy for a long time.

There's been sufficiently predictable progress that 80-100 TFLOPS in your pocket by 3035 is probably a solid bet, especially if a fully generative AI OS and platform catches on as a product. The LoAR frontier for compute in 2035 is going to be more advanced than the limits of prosumer/flagship handheld products like phones, so theres a bit of lag and variability.

ActorNightly•Dec 2, 2025

Nothing to do with Moores Law or AGI.

The current models are simply inefficient for their capability in how they handle data.

eru•Dec 2, 2025

> What if you had the equivalent of a frontier lab in your pocket? What's that do to the economy?

Well, these days people have the equivalent of a frontier lab from perhaps 40 years ago in their pocket. We can see what that has done to the economy, and try to extrapolate.

ActorNightly•Dec 2, 2025

I didn't say winning business, I said winning on cost effectiveness.

JSR_FDED•Dec 1, 2025

Nobody is winning until cars are the size of a pack of cards. Which is big enough to transport even the largest cargo.

ActorNightly•Dec 2, 2025

Lol its kinda suprising that the level of understanding around LLMs is so little.

You already have agents, that can do a lot of "thinking", which is just generating guided context, then using that context to do tasks.

You already have Vector Databases that are used as context stores with information retrieval.

Fundamentally, you can have the same exact performance on a lot of task whether all the information exists in the model, or you use a smaller model with a bunch of context around it for guidance.

So instead of wasting energy and time encoding the knowledge information into the model, making the size large, you could have an "agent-first" model along with just files of vector databases, and the model can fit in a single graphics cards, take the question, decide which vector db it wants to load, and then essentially answer the question in the same way. At $50 per TB from SSD not only do you gain massive cost efficiency, but you also gain the ability to run a lot more inference cheaper, which can be used for refining things, background processing, and so on.

eru•Dec 2, 2025

You should start a company and try your strategy. I hope it works! (Though I am doubtful.)

In any case, models are useful, even when they don't hit these efficiency targets you are projecting. Just like cars are useful, even when they are bigger than a pack of cards.

ActorNightly•Dec 2, 2025

If someone wants to fund me, Ill gladly work on this. There is no money in this though, because selling cloud service is much more profitable.

Its also not a matter of it working or not. It already works. Take a small model that fits on a GPU with a large context window, like Gemma 27b or smaller ones, give it a whole bunch of context on the topic, and ask it questions and it will generate very accurate results based on the context.

So instead of encoding everything into the model itself, you can just take training data, store it in vector DBs, and train a model to retrieve that data based on query, and then the rest of it is just training context extraction.

eru•Dec 3, 2025

> There is no money in this though, because selling cloud service is much more profitable.

Oh, be more creative. One simple way to make money off your idea is:

(1) Get a hedge fund to finance your R&D.

(2) Hedge fund shorts AI cloud providers and other relevant companies.

(3) Your R&D pans out and the AI cloud providers' stock tanks.

(4) The hedge fund makes a profit.

Though I don't understand: wouldn't your idea work work when served from the cloud, too? If what you are saying is true, you'd provide a better service at lower cost?

JSR_FDED•Dec 2, 2025

https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

ActorNightly•Dec 2, 2025

Ok then point out where I made a mistake.

Nothing shows lack of understanding of the subject matter more than referencing the Dunning Kruger effect in a conversation.

bbor•Dec 1, 2025

I mean, there are lots of models that run on home graphics cards. I'm having trouble finding reliable requirements for this new version, but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1], which is very doable for professionals in the first world. Quantization can also help immensely.

Of course, the smaller models aren't as good at complex reasoning as the bigger ones, but that seems like an inherently-impossible goal; there will always be more powerful programs that can only run in datacenters (as long as our techniques are constrained by compute, I guess).

FWIW, the small models of today are a lot better than anything I thought I'd live to see as of 5 years ago! Gemma3n (which is built to run on phones[2]!) handily beats ChatGPT 3.5 from January 2023 -- rank ~128 vs. rank ~194 on LLMArena[3].

[1] https://blogs.novita.ai/what-are-the-requirements-for-deepse...

[2] https://huggingface.co/google/gemma-3n-E4B-it

[3] https://lmarena.ai/leaderboard/text/overall [1] https://blogs.novita.ai/what-are-the-requirements-for-deepse...

qeternity•Dec 1, 2025

> but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1]

No. They released a distilled version of R1 based on a Qwen 32b model. This is not V3, and it's not remotely close to R1 or V3.2.

chistev•Dec 2, 2025

How do they make their money

binary132•Dec 2, 2025

I suspect it is a state venture designed to undermine the American-led proprietary AI boom. I'm all for it, tbh, but as others have pointed out, if they successfully destroy the American ventures it's not like we can expect an altruistic endgame from them.

vitaflo•Dec 2, 2025

Deepseek is owned by a Chinese hedge fund. It was originally created for finance and then generalized later. In any case you pay for it like any other LLM.

htrp•Dec 1, 2025

what is the ballpark vram / gpu requirement to run this ?

rhdunn•Dec 1, 2025

For just the model itself: 4 x params at F32, 2 x params at F16/BF16, or 1 x params at F8, e.g. 685GB at F8. It will be smaller for quantizations, but I'm not sure how to estimate those.

For a Mixture of Experts (MoE) model you only need to have the memory size of a given expert. There will be some swapping out as it figures out which expert to use, or to change expert, but once that expert is loaded it won't be swapping memory to perform the calculations.

You'll also need space for the context window; I'm not sure how to calculate that either.

petu•Dec 1, 2025

I think your idea of MoE is incorrect. Despite the name they're not "expert" at anything in particular, used experts change more or less on each token -- so swapping them into VRAM is not viable, they just get executed on CPU (llama.cpp).

jodleif•Dec 1, 2025

A common pattern is to offload (most of) the expert layers to the CPU. This combination is still quite fast even with slow system ram, though obviously inferior to a pure VRAM loading

anvuong•Dec 1, 2025

I think your understanding of MoE is wrong. Depending on the settings, each token can actually be routed to multiple experts, called experts choice architecture. This makes it easier to parallelize the inference (each expert on a different device for example), but it's not simply just keeping one expert in memory.

lalassu•Dec 1, 2025

Disclaimer: I did not test this yet.

I don't want to make big generalizations. But one thing I noticed with chinese models, especially Kimi, is that it does very well on benchmarks, but fails on vibe testing. It feels a little bit over-fitting to the benchmark and less to the use cases.

I hope it's not the same here.

vorticalbox•Dec 1, 2025

This used to happen with bench marks on phones, manufacturers would tweak android so benchmarks ran faster.

I guess that’s kinda how it is for any system that’s trained to do well on benchmarks, it does well but rubbish at everything else.

make3•Dec 1, 2025

yes, they turned off all energy economy measures when benchmarking software activity was detected, which completely broke the point of the benchmarks because your phone is useless if it's very fast but the battery lasts one hour

make3•Dec 1, 2025

I would assume that huge amount is spent in frontier models just making the models nicer to interact with, as it is likely one of the main things that drives user engagement.

not_that_d•Dec 1, 2025

What is "Vibe testing"?

BizarroLand•Dec 1, 2025

I would assume that it is testing how well and appropriately the LLM responds to prompts.

catigula•Dec 1, 2025

He means capturing things that benchmarks don't. You can use Claude and GPT-5 back-to-back in a field that score nearly identically on. You will notice several differences. This is the "vibe".

msp26•Dec 1, 2025

K2 Thinking has immaculate vibes. Minimal sycophancy and a pleasant writing style while being occasionally funny.

If it had vision and was better on long context I'd use it so much more.

catigula•Dec 1, 2025

This is why I stopped bothering checking out these models and, funnily enough, grok.

nylonstrung•Dec 2, 2025

My experience with deepseek and Kimi is quite the opposite: smarter than benchmarks would imply

Whereas the benchmark gains seem by new OpenAI, Grok and Claude models don't feel accompanied by vibe improvement

CuriouslyC•Dec 2, 2025

This was a bad problem with earlier Chinese (Qwen and Kimi K1 in particular) models, but the original DeepSeek delivered and GLM4.6 delivers. They don't diversify training as much as American labs so you'll find more edge cases and the interaction experience isn't quite as smooth, but the models put in work.

segmondy•Dec 2, 2025

Weird, I have gone local for the last 2 years. I use Chinese models 90% of the time, Kimi K2 Thinking, DeepSeekv3.Terminus, Qwen3 and GLM4.6. I'm not vibe testing it but really putting them to use and they do keep up great.

spullara•Dec 1, 2025

I hate that their model ids don't change as they change the underlying model. I'm not sure how you can build on that.

  % curl https://api.deepseek.com/models \          
    -H "Authorization: Bearer ${DEEPSEEK_API_KEY}"  
  {"object":"list","data":[{"id":"deepseek-chat","object":"model","owned_by":"deepseek"},{"id":"deepseek-reasoner","object":"model","owned_by":"deepseek"}]}

KronisLV•Dec 1, 2025

Oh hey, quality improvement without doing anything!

(unless/until a new version gets worse for your use case)

hnfong•Dec 1, 2025

Agree that having datestamps on model ids is a good idea, but it's open source, you can download the weights and build on those. In the long run, this is better than the alternative of calling API of a proprietary model and hoping it doesn't get deprecated.

deaux•Dec 2, 2025

Anthropic has done similar before (changing model behavior on the same dated endpoint).

cherioo•Dec 2, 2025

Allegedly deepseek is doing this because they don’t have enough gpu to serve two models concurrently.

twistedcheeslet•Dec 1, 2025

How capable are these models at tool calling?

potsandpans•Dec 1, 2025

From some very brief experimentation with deepseek about 2 months ago, tool calling is very hot or miss. Claude appears to be the absolute best.

segmondy•Dec 2, 2025

Depends on if they are trained for tool calling, this model is experimentation with new architecture, training methods, etc. It's not designed for tool calling. If you want to tool call, then you should look into DeepSeekv3.1-Terminus.

Foobar8568•Dec 1, 2025

At least, there is no doubt where he is from !

which version are you?

我是DeepSeek最新版本模型！如果你想了解具体的版本号信息，我建议你：

    查看官方文档 - DeepSeek官网和文档会有最准确的版本信息

    关注官方公告 - 版本更新通常会在官方渠道公布

    查看应用商店/网页版 - 使用界面通常会显示当前版本

我具备DeepSeek的所有最新功能特性，包括：

    强大的对话和推理能力

    128K上下文长度

    文件上传处理（图像、文档等）

    联网搜索功能（需手动开启）

    完全免费使用

如果你需要知道精确的版本号用于技术对接或其他特定用途，最好直接查阅官方技术文档，那里会有最准确和详细的技术规格说明。

有什么其他问题我可以帮你解答吗？

schlauerfox•Dec 1, 2025

It's so strange when it obviously hits a preprogrammed non-answer in these models, how can one ever trust them when there is a babysitter that interferes in an actual answer. I suppose that asking it what version it is isn't a valid question in it's training data so it's programmed to say check the documentation, but still definitely suspicious when it gives a non-answer.

embedding-shape•Dec 1, 2025

> DeepSeek-V3.2 introduces significant updates to its chat template compared to prior versions. The primary changes involve a revised format for tool calling and the introduction of a "thinking with tools" capability.

At first, I thought they had gone the route of implementing yet another chat format that can handle more dynamic conversations like that, instead of just using Harmony, but looking at the syntax, doesn't it look exactly like Harmony? That's a good thing, don't get me wrong, but why not mention straight up that they've implemented Harmony, so people can already understand up front that it's compatible with whatever parsing we're using for GPT-OSS?

throwdbaaway•Dec 2, 2025

That DSML in the encoding directory looks quite a bit different from the Harmony chat template.

catigula•Dec 1, 2025

To push back on naivety I'm sensing here I think it's a little silly to see Chinese Communist Party backed enterprise as somehow magnanimous and without ulterior, very harmful motive.

jascha_eng•Dec 1, 2025

Oh they need control of models to be able to censor and ensure whatever happens inside the country with AI stays under their control. But the open-source part? Idk I think they do it to mess with the US investment and for the typical open source reasons of companies: community, marketing, etc. But tbh especially the messing with the US, as a european with no serious competitor, I can get behind.

catigula•Dec 1, 2025

They're pouring money to disrupt American AI markets and efforts. They do this in countless other fields. It's a model of massive state funding -> give it away for cut-rate -> dominate the market -> reap the rewards.

It's a very transparent, consistent strategy.

AI is a little different because it has geopolitical implications.

ForceBru•Dec 1, 2025

When it's a competition among individual producers, we call it "a free market" and praise Hal Varian. When it's a competition among countries, it's suddenly threatening to "disrupt American AI markets and efforts". The obvious solution here is to pour money into LLM research too. Massive state funding -> provide SOTA models for free -> dominate the market -> reap the rewards (from the free models).

catigula•Dec 1, 2025

We don't do that.

fragmede•Dec 1, 2025

It's not like the US doesn't face similar accusations. One such case is the WTO accusing Boeing of receiving illegal subsidies from the US government. https://www.transportenvironment.org/articles/wto-says-us-ga...

tokioyoyo•Dec 1, 2025

I can’t believe I’m shilling for China in these comments, but how different it is for company A getting blank check investments from VCs and wink-wink support from the government in the west? And AI-labs in China has been getting funding internally in the companies for a while now, before the LLM-era.

ptsneves•Dec 1, 2025

This is the rare earth minerals dumping all over again. Devalue to such a price as to make the market participants quit, so they can later have a strategic stranglehold on the supply.

This is using open source in a bit of different spirit than the hacker ethos, and I am not sure how I feel about it.

It is a kind of cheat on the fair market but at the same time it is also costly to China and its capital costs may become unsustainable before the last players fold.

jsiepkes•Dec 1, 2025

The way we fund the AI bubble in the west could also be described as: "kind of cheat on the fair market". OpenAI has never made a single dime of profit.

nylonstrung•Dec 2, 2025

Yeah and OpenAI's CPO was artificially commissioned as a Lt. Colonel in the US Army in conjunction with a $200M contract

Absurd to say Deepseek is CCP controlled while ignoring the govt connection here

embedding-shape•Dec 1, 2025

> This is using open source in a bit of different spirit than the hacker ethos, and I am not sure how I feel about it.

It's a bit early to have any sort of feelings about it, isn't it? You're speaking in absolutes, but none of this is necessarily 100% true as we don't know their intentions. And judging a group of individuals intention based on what their country seems to want, from the lens of a foreign country, usually doesn't land you with the right interpretation.

CamperBob2•Dec 1, 2025

Good luck making OpenAI and Google cry uncle. They have the US government on their side. They will not be allowed to fail, and they know it.

What I appreciate about the Chinese efforts is that they are being forced to get more intelligence from less hardware, and they are not only releasing their work products but documenting the R&D behind them at least as well as our own closed-source companies do.

A good reason to stir up dumping accusations and anti-China bias would be if they stopped publishing not just the open-source models, but the technical papers that go with them. Until that happens, I think it's better to prefer more charitable explanations for their posture.

tokioyoyo•Dec 1, 2025

I mentioned this before as well, but AI-competition within China doesn’t care that much about the western companies. Internal market is huge, and they know winner-takes-it-all in this space is real.

Jedd•Dec 1, 2025

> It is a kind of cheat on the fair market ...

I am very curious on your definition and usage of 'fair' there, and whether you would call the LLM etc sector as it stands now, but hypothetically absent deepseek say, a 'fair market'. (If not, why not?)

jascha_eng•Dec 1, 2025

Do they actually spend that much though? I think they are getting similar results with much fewer resources.

It's also a bit funny that providing free models is probably the most communist thing China has done in a long time.

josh_p•Dec 1, 2025

Isn’t it already well accepted that the LLM market exists in a bubble with a handful of companies artificially inflating their own values?

ESH

DiogenesKynikos•Dec 1, 2025

Are you by chance an OpenAI investor?

We should all be happy about the price of AI coming down.

doctorwho42•Dec 1, 2025

But the economy!!! /s

Seriously though, our leaders are actively throwing everything and the kitchen sink into AI companies - in some vain attempt to become immortal or own even more of the nations wealth beyond what they already do, chasing some kind of neo-tech feudalism. Both are unachievable because they rely on a complex system that they clearly don't understand.

coliveira•Dec 1, 2025

> cheat on the fair market

Can you really view this as a cheat this when the US is throwing a trillion dollars in support of a supposedly "fair market"?

deaux•Dec 2, 2025

Ah, so exactly like Uber, Netflix, Microsoft, Amazon, Facebook and so on have done to the rest of the world over the last few decades then?

Where do you think they learnt this trick? Years lurking on HN and this post's comment section wins #1 on the American Hypocrisy chart. Unbelievable that even in the current US people can't recognize when they're looking in the mirror. But I guess you're disincentivized to do so when most of your net worth stems from exactly those companies and those practices.

corimaith•Dec 2, 2025

Except domestic alternatives to the tech companies you listed were not driven out by them, they still exist today with substantial market share. American tech dominance elsewhere has more to do a lack of competition, and when competition does exist they're more often than not held at a disadvantage by domestic governments. So your counter narrative is false here.

devsda•Dec 2, 2025

> American tech dominance elsewhere has more to do a lack of competition,

Do you believe the lack of competition is purely because the products are superior?

US tech is now sort of like the dollar. People/countries outside the US need and want alternatives to hedge against in the event of political uncertainity but cannot do it completely for various reasons including arm twisting by the US govt.

One example is some govts and universities in the EU are trying to get rid of MS products for decades but they are unable to.

bogdan•Dec 2, 2025

> American tech dominance elsewhere has more to do a lack of competition

If that's true, why doesn't America compete on this front against China?

> they're more often than not held at a disadvantage by domestic governments

So when the US had the policy advantage over the EU it was just the market working, but when China has the policy advantage over the US it suddenly becomes unfair?

corimaith•Dec 2, 2025

>> they're more often than not held at a disadvantage by domestic governments

I think you misunderstood this. When domestic competitor arise against American tech, the domestic government tends to explicitly favour those competitor against American tech, placing the latter at an disadvantage.

You can see India or China or Korea or SEA where they have their own favored food delivery apps and internet services. Even in the EU the local LLM companies like Mistral are favored by local businesses for integration over OpenAI. Clearly American tech hasn't actually displaced serious domestic competitors, so the rare earths comparison fails when the USA in contrast is far more willing to let local businesses fail.

ptsneves•Dec 2, 2025

Not American and I also agree that the current big techs should be broken up by force of the state, there is a very big difference between a company becoming monopolistic due to market forces, and a company becoming monopolistic due to state strategy, intervention, backing.

Things can be bad in a spectrum and I believe it is much easier for society/state to break up a capitalistic monopoly than a state backed monopoly. To illustrate, the state has sued some of those companies and they were seriously threatened, because of competition ills. That is not the case with a state company.

Draiken•Dec 2, 2025

And what exactly are grants then? Tariffs? All the lobbied laws that benefit specific corporations or industries? Aren't they state backed advantages?

Banks created their oligopolies and then who saved them when they fucked up?

Isn't Tesla a state backed monopoly in the USA because of grants and tariffs on external competitors? Isn't SpaceX? Yet nobody treats then as state backed.

I don't understand this necessity to put companies in a pedestal and hate on states. Capitalist propaganda I guess?

Market forces are manipulated all the time. This distinction is nonsense. Companies influence states and vice-versa.

csomar•Dec 2, 2025

Prosecutor, judge and jury? You have access to their minds to know their true intentions? This whole “deepseek is controlled by CCP” is ridiculous. If you want to know how bad the CCP is at IT, then check the government backed banks.

The way I see this, some tech teams in China have figured out that training and tuning LLMs is not that expensive after all and they can do it at a fraction of the cost. So they are doing it to enter a market previously dominated by US only players.

nextaccountic•Dec 2, 2025

Where exactly is this fair market? Giant US companies love rules and regulations, but only when it benefits them (and they pay dearly for it)

gazaim•Dec 1, 2025

*Communist Party of China (CPC)

v0y4g3r•Dec 1, 2025

You nailed it

amunozo•Dec 1, 2025

The motive is to destroy the American supremacy on AI, it's not that deep. This is much easier to do open sourcing the models than competing directly, and this can have good ramifications for everybody, even if the motive is "bad".

tehjoker•Dec 1, 2025

the motive is to prevent us dominance of this space, which is a good thing

mac-attack•Dec 2, 2025

And the next question is what have they some with power historically, and what are they liable to do in the future with said power. Limiting scope to AI is shortsighted and doesn't speak to the concerns people have beyond an Ai Race

tehjoker•Dec 2, 2025

It's a fair question, but my view of America's influence on world affairs has been dismal. China by contrast has not had a history of invading its neighbors, though I strongly criticize their involvement in the American attack on Cambodia and Vietnam (China supported the Khmer Rouge and briefly invaded Vietnam but was quickly pushed back, a reason Mao is sometimes criticized as having a good early period and a bad late period).

Meanwhile, America has been causing death and destruction around the world. It's easy to make lists: Vietnam, Iraq, Gaza, Cuba, South and Central America etc etc.

mac-attack•Dec 4, 2025

I am an American and can appreciate the shortcomings of my country. I also have the ability to see the shortcomings of China as well. Do you see the irony that I asked to reflect on China's history and you instead list things you don't like about the US?

stared•Dec 2, 2025

Do you think it is from goodness of their heart that corporates support open source? E.g. Microsoft - VSCode and Typescript, Meta - PyTorch and React, Google - Chromium and Go.

Yet, we (developers, users, human civilization), benefit from that.

So yes, I cherish when Chinese companies release open source LLMs. Be it as it fits their business model (the same way as US companies) or from grants (the same way as a lot of EU-backed projects, e.g. Python, DuckDB, scikit-learn).

mcbuilder•Dec 1, 2025

After using it a couple hours playing around, it is a very solid entry, and very competitive compared with the big US relaeses. I'd say it's better than GLM4.6 and I'm Kimi K2. Looking forward to v4

energy123•Dec 2, 2025

Did you try with 60k+ context? I found previous releases to be lacklustre which I tentatively attributed to the longer context, due to the model being trained on a lot of short context data.

gradus_ad•Dec 1, 2025

How will the Google/Anthropic/OpenAI's of the world make money on AI if open models are competitive with their models? What hurt open source in the past was its inability to keep up with the quality and feature depth of closed source competitors, but models seem to be reaching a performance plateau; the top open weight models are generally indistinguishable from the top private models.

Infrastructure owners with access to the cheapest energy will be the long run winners in AI.

tsunamifury•Dec 1, 2025

Pure models clearly aren’t the monetizing strategy, use of them on existing monetized surfaces are the core value.

Google would love a cheap hq model on its surfaces. That just helps Google.

gradus_ad•Dec 1, 2025

Hmmm but external models can easily operate on any "surface". For instance Claude Code simply reads and edits files and runs in a terminal. Photo editing apps just need a photo supplied to them. I don't think there's much juice to squeeze out of deeply integrated AI as AI by its nature exists above the application layer, in the same way that we exist above the application layer as users.

tsunamifury•Dec 2, 2025

Gemini is the most used model on the planet per request.

All the facts say otherwise to your thoughts here.

dotancohen•Dec 1, 2025

People and companies trust OpenAI and Anthropic, rightly or wrongly, with hosting the models and keeping their company data secure. Don't underestimate the value of a scapegoat to point a finger at when things go wrong.

reed1234•Dec 1, 2025

But they also trust cloud platforms like GCP to host models and store company data.

Why would a company use an expensive proprietary model on Vertex AI, for example, when they could use an open-source one on Vertex AI that is just as reliable for a fraction of the cost?

I think you are getting at the idea of branding, but branding is different from security or reliability.

verdverm•Dec 2, 2025

Looking at and evaluating kimi-2/deepseek vs gemini-family (both through vertex ai), it's not clear open sources is always cheaper for the the same quality

and then we have to look at responsiveness, if the two models are qualitatively in the same ballpark, which one runs faster?

ehnto•Dec 2, 2025

> Don't underestimate the value of a scapegoat to point a finger at when things go wrong.

Which is an interesting point in favour of the human employee, as you can only consolidate scape goats so far up the chain before saying "It was AIs fault" just looks like negligence.

jonplackett•Dec 1, 2025

Either...

Better (UX / ease of use)

Lock in (walled garden type thing)

Trust (If an AI is gonna have the level of insight into your personal data and control over your life, a lot of people will prefer to use a household name)

niek_pas•Dec 1, 2025

> Trust (If an AI is gonna have the level of insight into your personal data and control over your life, a lot of people will prefer to use a household name.

Not Google, and not Amazon. Microsoft is a maybe.

polyomino•Dec 1, 2025

The success of Facebook basically proves that public brand perception does not matter at all

acephal•Dec 1, 2025

Facebook itself still has a big problem with it's lack of youth audience though. Zuck captured the boomers and older Gen X, which are the biggest demos of living people however.

eru•Dec 2, 2025

> Zuck captured the boomers and older Gen X, which are the biggest demos of living people however.

In the developed world. I'm not sure about globally.

reed1234•Dec 1, 2025

People trust google with their data in search, gmail, docs, and android. That is quite a lot of personal info, and trust, already.

All they have to do is completely switch the google homepage to gemini one day.

poszlem•Dec 1, 2025

Or lobbing for regulations. You know. The "only american models are safe" kind of regulation.

iLoveOncall•Dec 1, 2025

> How will the Google/Anthropic/OpenAI's of the world make money on AI if open models are competitive with their models?

They won't. Actually, even if open models aren't competitive, they still won't. Hasn't this been clear since a while already?

There's no moat in models, investments in pure models has only been to chase AGI, all other investment (the majority, from Google, Amazon, etc.) has been on products using LLMs, not models themselves.

This is not like the gold rush where the ones who made good money were the ones selling shovels, it's another kind of gold rush where you make money selling shovels but the gold itself is actually worthless.

teleforce•Dec 1, 2025

>How will the Google/Anthropic/OpenAI's of the world make money on AI if open models are competitive with their models?

According to Google (or someone at Google) no organization has moat on AI/LLM [1]. But that does not mean that it is not hugely profitable providing it as SaaS even you don't own the model or Model as a Service (MaaS). The extreme example is Amazon providing MongoDB API and services. Sure they have their own proprietary DynamoDB but for the most people scale up MongoDB is more than suffice. Regardless brand or type of databases being used, you paid tons of money to Amazon anyway to be at scale.

Not everyone has the resource to host a SOTA AI model. On top of tangible data-intensive resources, they are other intangible considerations. Just think how many company or people host their own email server now although the resources needed are far less than hosting an AI/LLM model?

Google came up with the game changing transformer at its backyard and OpenAI temporarily stole the show with the well executed RLHF based system of ChatGPT. Now the paid users are swinging back to Google with its arguably more superior offering. Even Google now put AI summary as its top most search return results for free to all, higher than its paid advertisement clients.

[1]Google “We have no moat, and neither does OpenAI”:

https://news.ycombinator.com/item?id=35813322

istjohn•Dec 2, 2025

That quote from Google is 2.5 years old.

zamadatix•Dec 2, 2025

Have they said differently since?

KeplerBoy•Dec 2, 2025

I also cringed a bit about seeing a statement that old being cited, but all the events since then only proved google right, I'd say.

Improvements seem incremental and smaller. For all I care, I could still happily use sonnet 3.5.

mistrial9•Dec 2, 2025

undergrads at UC Berkeley are wearing vLLM t-shirts

Tepix•Dec 2, 2025

Hosting a SOTA AI model is something that can be separated well from the rest of your cloud deployments. So you can pretty much choose between lots of vendors and that means margins will probably not be that great.

delichon•Dec 1, 2025

> Infrastructure owners with access to the cheapest energy will be the long run winners in AI.

For a sufficiently low cost to orbit that may well be found in space, giving Musk a rather large lead. By his posts he's currently obsessed with building AI satellite factories on the moon, the better to climb the Kardashev scale.

kridsdale1•Dec 2, 2025

The performance bottleneck for space based computers is heat dissipation.

Earth based computers benefit from the existence of an atmosphere to pull cold air in from and send hot air out to.

A space data center would need to entirely rely on city sized heat sink fins.

delichon•Dec 2, 2025

For radiative cooling using aluminum, per 1000 watts at 300 kelvin: ~2.4m^2 area, ~4.8 liters volume, ~13kg weight. So a Starship (150k kg, re-usable) could carry about a megawatt of radiators per launch to LEO.

And aluminum is abundant in the lunar crust.

ehnto•Dec 2, 2025

We are jumping pretty far ahead for a planet that can barely put two humans up there, but it is a great deal of my scifi dreams in one technology tree so I'll happily watch them try.

eru•Dec 2, 2025

The grandfather comment is perhaps mixing up two things:

If launch costs are cheap enough, you can bring aluminum up from earth.

But once your in-space economy is developed enough, you might want to tap the moon or asteroids for resources.

ehnto•Dec 2, 2025

And the presence of humans. Like with a lot of robotics, the devil is probably in the details. Very difficult to debug your robot factory while it's in orbit.

That was fun to write but also I am generally on board with humanity pushing robotics further into space.

I don't think an orbital AI datacentre makes much sense as your chips will be obsolete so quickly that the capex getting it all up there will be better spent on buying the next chips to deploy on earth.

eru•Dec 2, 2025

Well, _if_ they can get launch costs down to 100 dollar / kg or so, the economics might make sense.

Radiative cooling is really annoying, but it's also an engineering problem with a straightforward solution, if mass-in-orbit becomes cheap enough.

The main reason I see for having datacentres in orbit would be if power in orbit becomes a lot cheaper than power on earth. Cheap enough to make up for the more expensive cooling and cheap enough to make up for the launch costs.

Otherwise, manufacturing in orbit might make sense for certain products. I heard there's some optical fibres with superior properties that you can only make in near zero g.

I don't see a sane way to beam power from space to earth directly.

bashtoni•Dec 2, 2025

This is exactly why the CEO of Anthropic has been talking up "risks" from AI models and asking for legislation to regulate the industry.

menaerus•Dec 2, 2025

He's talking about completely different type of risks and regulation. It's about the job displacement risks, security and misuse concerns, and ethical and societal impact.

https://www.youtube.com/watch?v=aAPpQC-3EyE

https://www.youtube.com/watch?v=RhOB3g0yZ5k

pembrook•Dec 2, 2025

I call this the "Karl Marx Fallacy." It assumes a static basket of human wants and needs over time, leading to the conclusion competition will inevitably erode all profit and lead to market collapse.

It ignores the reality of humans having memetic emotions, habits, affinities, differentiated use cases & social signaling needs, and the desire to always want to do more...constantly adding more layers of abstraction in fractal ways that evolve into bigger or more niche things.

5 years ago humans didn't know a desire for gaming GPUs would turn into AI. Now it's the fastest growing market.

Ask yourself: how did Google Search continue to make money after Bing's search results started benchmarking just as good?

Or: how did Apple continue to make money after Android opened up the market to commoditize mobile computing?

Etc. Etc.

chinesedessert•Dec 2, 2025

this name is illogical as karl marx did not commit this fallacy

pembrook•Dec 2, 2025

Yes, he did, and it was fundamental to his entire economic philosophy: https://en.wikipedia.org/wiki/Tendency_of_the_rate_of_profit...

deadfoxygrandpa•Dec 2, 2025

no, he didn't, and your link has nothing to do with your fallacy you were talking about

pembrook•Dec 2, 2025

It absolutely does, and the fact that now 2 marxists (which I can see from your comment history) have a total inability to offer any actual rebuttal, does not surprise me.

deadfoxygrandpa•Dec 3, 2025

theres nothing to rebut. you made an assertion thats false on the face of it and posted a link to something totally unrelated. it's so wrong i dont even know which part you're misunderstanding.

but one of the core ideas of marx's conception of history is that human needs, wants, and human nature itself are constantly in a state of change and that those needs and desires are in large part a product of the environment in which you live, and further that humans and human society in turn change their own environments which in turn change human nature itself

Balinares•Dec 2, 2025

I'm not seeing anywhere in that page anything about an assumed static basket of human wants and needs. Maybe I missed it -- can you point out where you saw that?

Interesting, though, that per the very same article someone like Adam Smith concurred empirically with Marx's observation on the titular tendency of rates of profit to fall. This suggests to me it likely had some meat to it.

pembrook•Dec 2, 2025

Without going too deep on it (I used to be a fan in university as a silly youth), the tendency of the rate of profit to fall is the key aspect of Marx's crisis theory.

Basically dude thought the competition inherent in capitalism would cause all profit to be competed to zero leading to an eventual 'crisis' and collapse of the capitalist means of production.

Implicit in this assumption is the idea that the things humans need and want changes/evolves in a predictable way, and not in a chaotic/fractal/reflexive way (which is what actually happens).

An eventual static basket of desired goods would be the only mechanism by which competition ever could compete profits to zero. If the basket is dynamic/reflexive/evolving, there's constantly new gaps opening between human desires and market offerings to arbitrage for profit. You can just look at the average profit margins of S&P500 companies over time to see they are not falling.

The further we get from subsistence worries (Adam Smith's invisible hand has pulled virtually the entire globe out of living in the dirt), the more divergent and higher abstraction these wants and needs become, and hence the profit opportunities are only increasing -- which is how the economy grows (no, it's not a fixed pie, another Marxian fallacy).

deadfoxygrandpa•Dec 3, 2025

again, marx didnt see it as a fixed pie. thats the whole reason behind his idea of absolute vs relative surplus value, is that the pie isn't fixed. he absolutely saw the (at his time) modern capitalist economy as a revolutionary, dynamic force that brought about a great increase in the absolute amount of productive capacity and wealth in the world

WhyOhWhyQ•Dec 2, 2025

I don't see what OpenAI's niche is supposed to be, other than role playing? Google seems like they'll be the AI utility company, and Anthropic seems like the go-to for the AI developer platform of the future.

linkage•Dec 2, 2025

Anthropic has RLed the shit out of their models to the extent that they give sub-par answers to general purpose questions. Google has great models but is institutionally incapable of building a cohesive product experience. They are literally shipping their org chart with Gemini (mediocre product), AI Overview (trash), AI Mode (outstanding but limited modality), Gemini for Google Workspace (steaming pile), Gemini on Android (meh), etc.

ChatGPT feels better to use, has the best implementation of memory, and is the best at learning your preferences for the style and detail of answers.

a96•Dec 2, 2025

RLed?

cmckn•Dec 2, 2025

Reinforcement learning, I believe

Mistletoe•Dec 2, 2025

Gemini is not mediocre, have you used it lately?

https://www.vellum.ai/llm-leaderboard

blibble•Dec 2, 2025

> How will the Google/Anthropic/OpenAI's of the world make money on AI if open models are competitive with their models?

hopefully they won't

and their titanic off-balance sheet investments will bankrupt them as they won't be able to produce any revenue

alexandre_m•Dec 2, 2025

> What hurt open source in the past was its inability to keep up with the quality and feature depth of closed source competitors

Quality was rarely the reason open source lagged in certain domains. Most of the time, open source solutions were technically superior. What actually hurt open source were structural forces, distribution advantages, and enterprise biases.

One could make an argument that open source solutions often lacked good UX historically, although that has changed drastically the past 20 years.

zarzavat•Dec 2, 2025

For most professional software, the open source options are toys. Is there anything like an open source DAW, for example? It's not because music producers are biased against open source, it's because the economics of open source are shitty unless you can figure out how to get a company to fund development.

throwup238•Dec 2, 2025

> Is there anything like an open source DAW, for example?

Yes, Ardour. It’s no more a toy than KiCad or Blender.

adam_patarino•Dec 2, 2025

It’s convenience - it’s far easier to call an API than deploy a model to a VPC and configure networking, etc.

Given how often new models come out, it’s also easier to update an API call than constantly deploying model upgrades.

But in the long run, I hope open source wins out.

seydor•Dec 2, 2025

Yes but how do you find the best open model? You check google.

mistercheph•Dec 2, 2025

Kagi

seydor•Dec 2, 2025

Let me google "free alternative to kagi"

empath75•Dec 2, 2025

> How will the Google/Anthropic/OpenAI's of the world make money on AI if open models are competitive with their models?

So a couple of things. There are going to be a handful of companies in the world with the infrastructure footprint and engineering org capable of running LLMs efficiently and at scale. You are never going to be able to run open models in your own infra in a way that is cost competitive with using their API.

Competition _between_ the largest AI companies _will_ drive API prices to essentially 0 profit margin, but none of those companies will care because they aren't primarily going to make money by selling the LLM API -- your usage of their API just subsidizes their infrastructure costs, and they'll use that infra to build products like chat gpt and claude, etc. Those products are their moat and will be where 90% of their profit comes from.

I am not sure why everyone is so obsessed with "moats" anyway. Why does gmail have so many users? Anybody can build an email app. For the same reason that people stick with gmail, people are going to stick with chatgpt. It's being integrated into every aspect of their lives. The switching costs for people are going to be immense.

wosined•Dec 1, 2025

Remember: If it is not peer-reviewed, then it is an ad.

vessenes•Dec 1, 2025

I mean.. true. Also, DeepSeek has good cred so far on delivering roughly what their PR says they are delivering. My prior would be that their papers are generally credible.

Havoc•Dec 1, 2025

Good general approach, but deepseek has thus far always delivered. And not just delivered, but under open license too. "Ad" as starting assumption seems overly harsh

orena•Dec 1, 2025

Any results on frontier math or arc ?

Havoc•Dec 1, 2025

Note combination of big frontier level model and MIT license.

singularity2001•Dec 1, 2025

Why are there so few 32,64,128,256,512 GB models which could run on current consumer hardware? And why is the maximum RAM on Mac studio M4 128 GB??

jameslk•Dec 2, 2025

128 GB should be enough for anybody (just kidding). I hope the M5 Max will have higher RAM limits

aryonoco•Dec 2, 2025

M5 Max probably won’t, but M5 Ultra probably will

eldenring•Dec 2, 2025

the only real benefit is privacy which 99.9% of people dont get about. Almost all serving metrics (cost, throughput, ttft) are better with large gpu clusters. Latency is usually hidden by prefill cost.

cowpig•Dec 2, 2025

More and more people I talk to care about privacy, but not in SF

mistercheph•Dec 2, 2025

and sovereignty. I can go into the woods with a fuzzy approximation of all internet text in my backpack

ainch•Dec 2, 2025

As LLMs are productionised/commodified they're incorporating changes which are enthusiast-unfriendly. Small dense models are great for enthusiasts running inference locally, but for parallel batched inference MoE models are much more efficient.

sidcool•Dec 2, 2025

Can someone kind please ELI5 this paper?

HarHarVeryFunny•Dec 2, 2025

They've developed a sparse attention mechanism (which they document and release source code for) to increase model efficiency with long context, as needed for fast & cost-effective extensive RL training for reasoning and agentic use

They've built a "stable & scalable" RL protocol - more capable RL training infrastructure

They've built a pipeline/process to generate synthetic data for reasoning and agentic training

These all combine to build an efficient model with extensive RL post-training for reasoning and agentic use, although they note work is still needed on both the base model (more knowledge) and post-training to match frontier performance.

nickandbro•Dec 2, 2025

For anyone that is interested

"create me a svg of a pelican riding on a bicycle"

https://www.svgviewer.dev/s/FhqYdli5

chronogram•Dec 2, 2025

It created a whole webpage to showcase the SVG with animation for me: https://output.jsbin.com/qeyubehate

arthurcolle•Dec 2, 2025

Surely OpenAI will follow up with a gpt-oss-780b

sfdlkj3jk342a•Dec 2, 2025

What version is actually running on chat.deepseek.com?

It refuses to tell me when asked, only that it's been train with data up until July 2024, which would make it quite old. I turned off search and asked it for the winner of the US 2024 election, and it said it didn't know, so I guess that confirms it's not a recent model.

scottyeager•Dec 2, 2025

You can read that 3.2 is live on web and app here: https://api-docs.deepseek.com/news/news251201

The pdf describes how they did "continued pre-training" and then post training to make 3.2. I guess what's missing is the full pre-training that absorbs most date sensitive knowledge. That's probably also the reason that the versions are 3.x still.

chistev•Dec 2, 2025

I've found it better than ChatGPT lately, at least the free version of GPT.

I don't know, but GPT seems to have regressed a lot, at least the free version.

johnnienaked•Dec 2, 2025

Are we the baddies?

a96•Dec 2, 2025

The AI says shake... "Signs point to yes."

samir123766•Dec 2, 2025

nice

EternalFury•Dec 2, 2025

It does seem good, but it’s slow.

nickstinemates•Dec 2, 2025

I am waiting for the first truly open model without any of the censorship built in.

I wonder how long it will take and how quickly it will try to get shut down.

naeq•Dec 2, 2025

Most open models have been converted to uncensored versions. Search for the model name with the suffix "abliterated".

Aldo_MX•Dec 4, 2025

That's not a realistic expectation.

Classic examples like:

  User: I'm feeling bad
  LLM: Have you considered k*****g yourself?

Are a good example of what an LLM "without censorship" looks like: Good at predicting the most common sequence of text (ex. the most common sarcastic reply from Reddit), but effectively useless.

In order to build a useful LLM (ie. one that actually follows instructions) you need to teach the LLM to prefer the most helpful answer, and that process by itself is already an implicit layer of "censorship" as it requires human supervision, and different humans have different perceptions on what the most helpful answer is, especially when their paycheck is conditioned to a list of "corporate values".

You can only pick between a parrot that repeats random text from the Internet, or a parrot lobotomized to follow the orders from their trainers (which occasionally repeats random text from the Internet, because the training isn't perfect).

Unsurprisingly, the lobotomized parrot is more useful to get actual work done, even if it won't tell you what the CIA[1] did to Mexican Students on October 2nd, 1968.

[1]: https://www.bbc.com/mundo/noticias-america-latina-45662739

johnxie•Dec 2, 2025

Cool to see open models catching up fast. For builders the real question is simple. Which model gives you the tightest loop and the least surprises in production. Sometimes that is open. Sometimes closed. The rest is noise.

imbusy111•Dec 2, 2025

Funny to see tau2-bench on the list of benchmarks, when tau2-bench is flawed and 100% score is impossible, unless you add the tasks to the training set: https://github.com/sierra-research/tau2-bench/issues/89

mark_l_watson•Dec 2, 2025

I used DeepSeek-v3.2 to solve two coding problems by pasting code and directions as one large prompt into a chat interface and it performed very well. VERY WELL!

I am still happy to pay Google because of their ecosystem or Gemini app, NotebookLM, Colab, gemini-cli, etc. Google’s moat for me is all the tooling and engineering around the models.

That said, my one year Google AI subscription ends in four months and I might try an alternative, or at least evaluate options. Alibaba Cloud looks like an interesting low cost alternative to AWS for building systems. I am now a retired ‘gentleman scientist’ now and my personal research is inexpensive no matter who I pay for inference compute, but it is fun to spend a small amount of time evaluating alternatives even though mostly using Google is time efficient.

cgearhart•Dec 2, 2025

So DSA means a lightweight indexing model evaluated over the entire context window + a top-k attention evaluation. There’s no soft max in the indexing model, so it can run blazingly fast in parallel.

I’m surprised that a fixed size k doesn’t experience degrading performance in long context windows though. That’s a _lot_ of responsibility to push into that indexing function. How could such a simple model achieve high enough precision and recall in a fixed size k for long context windows?

swframe2•Dec 2, 2025

The AI market is hard to predict due to the constant development of new algorithms that could emerge unexpectedly. Refer to this summary of Ilya's opinions for insights into the necessity of these new algorithms: https://youtu.be/DcrXHTOxi3I

DeepSeek is a valuable product, but its open-source nature makes it difficult to displace larger competitors. Any advancements can be quickly adopted, and in fact, it may inadvertently strengthen these companies by highlighting weaknesses in their current strategies.

matt-alive•Dec 3, 2025

Is it open source vs enterprise or China vs US?

Frannky•Dec 4, 2025

Smart model—I use it as my main chat. It's interesting that markets were able to predict that it would lower the revenue of the paid ones.