Ian Buck, Nvidia's VP of hyperscale computing, told Reuters on Thursday that deliveries to AWS would begin this year and continue through 2027. The headline figure is one million GPUs, but the full scope of the transaction shows how AI infrastructure has evolved beyond simple compute purchases.
TLDR
Nvidia will deliver one million GPUs to Amazon Web Services by the end of 2027, alongside networking gear and its new Groq inference chips. The deal highlights how AI infrastructure has moved beyond compute alone: networking, inference optimisation, and integration with existing systems now matter as much as raw GPU throughput. For enterprise tech leaders, the message is clear: this is what a $26 billion market looks like when it matures.
KEY TAKEAWAYS
The transaction includes Nvidia's Spectrum-X networking switches, ConnectX network adapters, and chips from Nvidia's $17 billion Groq licensing deal announced late last year. AWS will use a combination of seven different Nvidia chip types for inference workloads alone.
For anyone building enterprise AI systems, this is worth understanding. The era of 'buy some GPUs and figure it out' ended somewhere around 2024. What replaced it looks like this deal: specialised inference chips, purpose-built networking, and integration complexity that requires vendor relationships measured in years, not quarters.
What the deal actually includes
Neither company disclosed financial terms. Given that Nvidia's H100 GPUs retail for roughly $25,000 to $40,000 depending on configuration and volume, a million units puts a floor estimate somewhere north of $25 billion. That figure is speculative, though, since enterprise pricing at this scale operates on entirely different economics.
The non-GPU hardware in the deal reveals more about where AI infrastructure is heading than the GPUs themselves. Buck specifically mentioned Spectrum-X networking switches and ConnectX adapters, which handle data movement between GPUs. They address the bottleneck that becomes acute once you scale past a few hundred cards, where the networking layer becomes as much a constraint as the compute layer.
Inference is hard. It's wickedly hard. To be the best at inference, it is not a one chip pony. We actually use all seven chips.
– Ian Buck, VP of Hyperscale Computing, Nvidia
The seven-chip comment is the most significant technical detail in the announcement. Inference workloads, which generate outputs from trained models, have fundamentally different hardware requirements than training: lower precision is often acceptable, latency matters more than throughput, and power efficiency becomes a primary metric rather than a secondary concern.
AWS appears to be building inference infrastructure that can route different workload types to different silicon. That is an architecture choice that makes sense at hyperscale but would be absurdly complex for most enterprises.
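That routing decision can be sketched in code. The pool names, thresholds, and request fields below are illustrative assumptions, not AWS's actual architecture; the point is only that a small dispatch layer in front of a heterogeneous fleet is what "seven chips for inference" implies in practice.

```python
# Hypothetical sketch of workload-to-silicon routing in front of a mixed
# inference fleet. Pool names and thresholds are invented for illustration.
from dataclasses import dataclass


@dataclass
class InferenceRequest:
    model: str
    latency_budget_ms: int  # the SLO this request must meet
    batch_size: int
    precision: str          # e.g. "fp16", "int8"


def route(req: InferenceRequest) -> str:
    """Pick a hardware pool for a request. Purely illustrative logic."""
    if req.latency_budget_ms < 100:
        # Interactive traffic goes to latency-optimised accelerators.
        return "latency-pool"
    if req.precision == "int8" and req.batch_size >= 32:
        # Quantised batch work tolerates cheaper, throughput-optimised silicon.
        return "int8-batch-pool"
    # Everything else lands on general-purpose GPUs.
    return "gpu-pool"


print(route(InferenceRequest("chat-7b", 80, 1, "fp16")))      # latency-pool
print(route(InferenceRequest("nightly", 5000, 64, "int8")))   # int8-batch-pool
```

The hard part at hyperscale is not this dispatch logic but keeping the routing table correct as models, quantisation schemes, and hardware generations churn underneath it.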
The strategic context
AWS has been investing heavily in custom silicon. Its Trainium AI accelerators are in third-generation development, with Trainium3 announced in December. CEO Andy Jassy said on the Q4 2025 earnings call that Trainium2 had its fastest-ever demand ramp, and Trainium3 supply would be fully committed by mid-2026.
Yet here AWS is, buying a million Nvidia GPUs. The two investments are not contradictory: custom silicon works for workloads you can optimise specifically, while general-purpose AI infrastructure requires the broad software ecosystem that CUDA provides. Nvidia's moat is not hardware performance so much as the accumulated weight of every ML library, framework, and training pipeline that assumes CUDA exists.
The Nvidia deal also includes deploying ConnectX and Spectrum-X in AWS data centres. That is significant because AWS has spent years perfecting its internal networking stack. Bringing in external networking gear for specific AI workloads suggests the performance delta is large enough to justify the integration complexity.
What this means for enterprise tech buyers
If you are planning AI infrastructure for a company that is not a hyperscaler, this deal offers several lessons.
First, inference hardware is now a separate category. The assumption that you train on big GPUs and run inference on the same hardware is outdated. Dedicated inference silicon, whether Nvidia's Groq chips, Google's TPUs, or AWS's Inferentia, optimises for different metrics. Enterprises will increasingly rent inference-optimised instances rather than running inference on general-purpose GPU clusters.
Second, networking matters as much as compute at scale. If you are running hundreds of GPUs for distributed training, your interconnect becomes a limiting factor. Solutions like NVLink, InfiniBand, and Spectrum-X exist to address this bottleneck; they are overkill below a certain scale and mandatory above it.
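A back-of-envelope calculation shows why the interconnect dominates. The figures below (a 70B-parameter model in fp16, 512 GPUs, and the two link speeds) are illustrative assumptions, and the ring all-reduce cost model is a standard simplification, not a vendor benchmark.

```python
# Rough estimate of the time to all-reduce gradients across N GPUs using a
# ring algorithm. All numbers are illustrative assumptions, not vendor specs.
def ring_allreduce_seconds(param_bytes: float, n_gpus: int, link_gbps: float) -> float:
    # A ring all-reduce moves roughly 2 * (N - 1) / N of the data per link.
    bytes_on_wire = 2 * (n_gpus - 1) / n_gpus * param_bytes
    return bytes_on_wire / (link_gbps * 1e9 / 8)  # convert Gbit/s to bytes/s


grad_bytes = 70e9 * 2      # 70B parameters in fp16 = 140 GB of gradients
for gbps in (100, 400):    # commodity Ethernet vs. a high-end AI fabric
    t = ring_allreduce_seconds(grad_bytes, 512, gbps)
    print(f"{gbps} Gbit/s links: {t:.1f} s per all-reduce")
```

Under these assumptions the step from 100 to 400 Gbit/s links cuts each synchronisation from roughly 22 seconds to under 6, which is the difference between GPUs computing and GPUs idling. That gap is the business case for fabrics like Spectrum-X.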
Third, the hyperscaler premium is justified for variable workloads. Buying and operating a million GPUs requires expertise that lives at maybe ten companies globally. Everyone else rents capacity from those companies, paying a margin that covers the operational complexity they do not have to maintain internally.
The market numbers
The GPU-as-a-Service market is projected to grow from $8.2 billion in 2025 to $26.6 billion by 2030, a 26.5% compound annual growth rate according to MarketsandMarkets. That figure counts only cloud GPU rental; it excludes on-premises deployments and custom silicon.
Nvidia CEO Jensen Huang has said the company sees a $1 trillion revenue opportunity through 2027 across its Rubin and Blackwell chip families. The AWS deal fits that framework: hyperscalers are not diversifying away from Nvidia but expanding their Nvidia footprint while simultaneously developing alternatives.
The practical implication for enterprise buyers is that GPU access will remain constrained through at least 2027, because AWS just locked in two years of supply and Microsoft, Google, and Oracle have similar agreements in place. The hyperscalers are not hoarding capacity to resell at inflated margins; they are securing supply because demand at current pricing exceeds what Nvidia can manufacture.
The inference complexity
Buck's comment about using seven different chips for inference deserves more attention, because inference workloads vary enormously in their requirements. A real-time chatbot needs sub-100-millisecond response times, while a batch job running overnight can tolerate much higher latency in exchange for cost efficiency. Image generation has different memory bandwidth requirements than text generation.
The logical conclusion is that inference infrastructure will fragment into specialised tiers: high-latency batch jobs on cost-optimised hardware, real-time interactive applications on latency-optimised hardware, and a software layer that routes requests to the appropriate tier becoming a differentiator.
AWS, Google, and Microsoft are all building that software layer in parallel, and enterprises that build their own AI infrastructure will need to solve the same routing problem, albeit at smaller scale.
What comes next
Nvidia's position is strong but not invulnerable, given that AMD's MI300X is gaining adoption, Google's TPUs power a meaningful share of internal Google AI workloads, and AWS's Trainium is cost-competitive for training specific model architectures.
The pattern that seems to be emerging is specialisation, where CUDA-compatible workloads run on Nvidia while workloads that can be optimised for specific silicon run on custom accelerators, and the boundary between those categories shifts as custom silicon matures and software frameworks develop alternative backends.
For founders and CTOs planning AI infrastructure, the actionable takeaway is straightforward: unless you are building something highly specific that justifies custom hardware investment, you will be renting from AWS, Azure, or Google for the next several years, and the economics favour that approach now that the hyperscalers have locked in the supply to make it work.
The AWS deal runs through 2027, by which time Nvidia's Rubin architecture will be shipping and AWS's Trainium4 will likely be in production. The competitive landscape will look considerably different by then, even as the underlying constant persists: AI infrastructure remains a capital-intensive, operationally complex domain where scale advantages compound over time.