Google scores six-year Meta cloud deal worth over $10B
The article is a bit misleading, since it "forgets" to add a key piece of context until the penultimate paragraph: Meta is investing ten times as much in building internal infrastructure capacity — this year alone. This deal is a rounding error (under $1.7B/year), and given the scale of cloud costs in general, it's probably invisible at Meta's scale.
does anyone know why?
Meta has been doing full-stack integration of internal software down to machine builds for almost as long as Google has; what's the point of outsourcing any of it after 20 years?
is it some capex vs opex game? a way of putting fear into the platform engineering people about losing their jobs?
Meta likely can’t construct new data centers quickly enough to meet the growing demand for AI. While they have plans to build them, it will take several years. Consequently, they have entered into this six-year contract with Google as a temporary solution.
So a fairer way of putting this would be: Meta pays Google $10B for AI ... and Nvidia about $600B, maybe $700B, over the same period.
gotta keep that circular ai hype economy rolling
Would be very interested to know how this is structured, as presumably it's a hedge for Meta's uncertainty about how much compute they'll need.
It's the ouroboros - snake eating its own tail.
Google needs this deal due to flight away from its cloud.
It can't compete with Azure for simple/coarse-grained services and AWS for complex/fine-grained services.
Atm, Google cloud is only good for cheap high-ram one-run-centric compute (AWS is cheaper for generic compute and reserved compute), simple container execution (Cloud Run), and ~100 TB bulk storage.
Your phrasing seems off. Why does Meta care what Google needs? It would seem that it's exactly backwards. Meta needs this because training resources are scarce. And Google is in the enviable position of having TPUs.
I read GP as implying GCP would be inclined to negotiate aggressively to be sure to secure this deal, due to the factors they list.
Both companies need to get something out of the deal. Listing the benefits to only one side is probably missing part of the story.
Yes. That was exactly my point.
> Atm, Google cloud is only good for cheap high-ram one-run-centric compute (AWS is cheaper for generic compute and reserved compute), simple container execution (Cloud Run), and ~100 TB bulk storage.
You are crazy if you imply AWS Athena or Azure Synapse are better than BigQuery.
Unfortunately, I've never used BigQuery in production so I don't feel qualified to comment on it.
Came here to say that BigQuery is a phenomenal product; that alone is a reason for a lot of companies to adopt GCP.
> [GCP] can't compete with Azure...
Based on my professional experience, when we ignore cost [0] my ranking of the big three for "plain old computing" (that is, just compute, networking, and storage workloads) is AWS, GCP, Azure.
Azure is very, very flaky. Things often break for no clear reason, or things are changed in unexpected ways without warning, and then quietly reverted back. [1] I used to say that the only consistently good thing about Azure was that you could throw an entire Subscription in the trash and have everything in there be destroyed... but even that has become intermittently unreliable!
Given how godawful Azure is, I expect that companies use it either because they think they need it for its AD integration, or because they get very deep discounts on Windows licenses for VMs running on Azure.
[0] Both because I never had to pay the bills and (I assume) our cost estimators were always full of lies because the tooling (and I) had no idea what discounts we had negotiated.
[1] (Don't) ask me about the time Azure added in some sort of multi-minute "cooldown" time for reuse of a statically-assigned IP address that was assigned to a VM that Azure reported was completely destroyed, and we were attempting to assign to a brand new VM. Creation of the new VM kept failing with some wacky error. Azure support was clueless, and the problem vanished and came back several times over the course of a year.
I concur with your rankings. I was very careful to couch the Google compute cost "win" solely to one-shot-memory-intensive runs--not to any other configuration or long-term reservation.
> I was very careful to couch the Google compute cost "win"...
Respectfully, I was very explicit that I was ignoring cost in my ranking.
Azure might have a ton of great features and could be (for all I know) a quarter the cost of GCP. But in my professional experience, I have run into recurring Azure issues that are entirely out of my control that blocked deployment of new VMs for hours and hours (and occasionally days). That's business-disrupting.
Meta is planning to spend $65B+ on capex this year. That's a lot of data centers. Why do they need a tiny bit more from Google?
> That's a lot of data centers. Why do they need a tiny bit more from Google?
Data centers don’t pop up overnight; until then they are going to use a vendor :-)
this is the correct answer.
Assuming this has a lot to do with Google's TPUs. Google is well positioned to be the AWS for AI, given the increased efficiency of TPUs, which only it has.
It could be the other way round too. Meta wants to use its own data center capacity for its custom AI solutions. Generic compute and storage for online and batch workloads can be moved to Google's cloud infra.
Also, AI training can be centralized but user serving benefits from being close to the user. So Meta might be building huge new data centers for AI training and centralized analytics etc, while using plenty of DCs owned by others around the globe to run their apps.
Yep definitely could be
Seems likely because the article says "Meta’s deal with Google is mainly around artificial intelligence infrastructure, said one of the people".
GCP is getting second-tier TPU allocation because TPU supply can't be enough to meet GDM's (Google DeepMind's) needs. At this point, it would be very stupid for external customers to bet on TPUs (I am looking at you, Apple).
> second tier TPU allocation
What exactly does this mean?
Probably that there are already several generations of TPU hardware: the best ones go to internal use, while the older hardware gets rented out through GCP to amortize the development costs.
Why would it be stupid?
Assuming they don’t screw it up. Google has a ton of great stuff, but when it comes to actually making it into a product, they flounder. GCP still needs a lot of work.
What more of a product do you need than being able to boot a machine with a TPU strapped on at relatively low cost?
GCP did that by firing senior support people and replacing them with more junior, offshore ones.
This can be as simple as someone taking the time to collect all the GCP accounts accumulated over the years for random projects into an enterprise committed-spend agreement. Doesn't have to be anything crazy or new.