Ian Buck, Nvidia's VP of hyperscale computing, told Reuters on Thursday that deliveries to AWS would begin this year and continue through 2027. The headline figure is one million GPUs, but the full scope of the transaction shows how AI infrastructure has evolved beyond simple compute purchases.
TLDR
Nvidia will deliver one million GPUs to Amazon Web Services by the end of 2027, alongside networking gear and its new Groq inference chips. The deal highlights how AI infrastructure has moved beyond compute alone: networking, inference optimisation, and integration with existing systems now matter as much as raw GPU throughput. For enterprise tech leaders, the message is clear: this is what a $26 billion market looks like when it matures.
KEY TAKEAWAYS
The transaction includes Nvidia's Spectrum-X networking switches, ConnectX network adapters, and chips from Nvidia's $17 billion Groq licensing deal announced late last year. AWS will use a combination of seven different Nvidia chip types for inference workloads alone.
For anyone building enterprise AI systems, this is worth understanding. The era of 'buy some GPUs and figure it out' ended somewhere around 2024. What replaced it looks like this deal: specialised inference chips, purpose-built networking, and integration complexity that requires vendor relationships measured in years, not quarters.
What the deal actually includes
Neither company disclosed financial terms. Given that Nvidia's H100 GPUs retail for roughly $25,000 to $40,000 depending on configuration and volume, a million units puts a floor estimate somewhere north of $25 billion. That figure is speculative, though, since enterprise pricing at this scale operates on entirely different economics.
The non-GPU hardware in the deal reveals more about where AI infrastructure is heading than the GPUs themselves. Buck specifically mentioned Spectrum-X networking switches and ConnectX adapters, which handle data movement between GPUs. They address the bottleneck that becomes acute once you scale past a few hundred cards, where the networking layer becomes as much a constraint as the compute layer.
Inference is hard. It's wickedly hard. To be the best at inference, it is not a one chip pony. We actually use all seven chips.
– Ian Buck, VP of Hyperscale Computing, Nvidia
The seven-chip comment is the most significant technical detail in the announcement. Inference workloads, which generate outputs from trained models, have fundamentally different hardware requirements than training: lower precision is often acceptable, latency matters more than throughput, and power efficiency becomes a primary metric rather than a secondary concern.
AWS appears to be building inference infrastructure that can route different workload types to different silicon. That is an architecture choice that makes sense at hyperscale but would be absurdly complex for most enterprises.
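That routing decision can be sketched in code. The pool names, thresholds, and request fields below are illustrative assumptions, not AWS's actual architecture; the point is only that a small dispatch layer in front of a heterogeneous fleet is what "seven chips for inference" implies in practice.

```python
# Hypothetical sketch of workload-to-silicon routing in front of a mixed
# inference fleet. Pool names and thresholds are invented for illustration.
from dataclasses import dataclass


@dataclass
class InferenceRequest:
    model: str
    latency_budget_ms: int  # the SLO this request must meet
    batch_size: int
    precision: str          # e.g. "fp16", "int8"


def route(req: InferenceRequest) -> str:
    """Pick a hardware pool for a request. Purely illustrative logic."""
    if req.latency_budget_ms < 100:
        # Interactive traffic goes to latency-optimised accelerators.
        return "latency-pool"
    if req.precision == "int8" and req.batch_size >= 32:
        # Quantised batch work tolerates cheaper, throughput-optimised silicon.
        return "int8-batch-pool"
    # Everything else lands on general-purpose GPUs.
    return "gpu-pool"


print(route(InferenceRequest("chat-7b", 80, 1, "fp16")))      # latency-pool
print(route(InferenceRequest("nightly", 5000, 64, "int8")))   # int8-batch-pool
```

The hard part at hyperscale is not this dispatch logic but keeping the routing table correct as models, quantisation schemes, and hardware generations churn underneath it.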
The strategic context
AWS has been investing heavily in custom silicon. Its Trainium AI accelerators are in third-generation development, with Trainium3 announced in December. CEO Andy Jassy said on the Q4 2025 earnings call that Trainium2 had its fastest-ever demand ramp, and Trainium3 supply would be fully committed by mid-2026.
Yet here AWS is, buying a million Nvidia GPUs. The two investments are not contradictory: custom silicon works for workloads you can optimise specifically, while general-purpose AI infrastructure requires the broad software ecosystem that CUDA provides. Nvidia's moat is not hardware performance so much as the accumulated weight of every ML library, framework, and training pipeline that assumes CUDA exists.
The Nvidia deal also includes deploying ConnectX and Spectrum-X in AWS data centres. That is significant because AWS has spent years perfecting its internal networking stack. Bringing in external networking gear for specific AI workloads suggests the performance delta is large enough to justify the integration complexity.
What this means for enterprise tech buyers
If you are planning AI infrastructure for a company that is not a hyperscaler, this deal offers several lessons.
First, inference hardware is now a separate category. The assumption that you train on big GPUs and run inference on the same hardware is outdated. Dedicated inference silicon, whether Nvidia's Groq chips, Google's TPUs, or AWS's Inferentia, optimises for different metrics. Enterprises will increasingly rent inference-optimised instances rather than running inference on general-purpose GPU clusters.
Second, networking matters as much as compute at scale. If you are running hundreds of GPUs for distributed training, your interconnect becomes a limiting factor. Solutions like NVLink, InfiniBand, and Spectrum-X exist to address this bottleneck; they are overkill below a certain scale and mandatory above it.
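A back-of-envelope calculation shows why the interconnect dominates. The figures below (a 70B-parameter model in fp16, 512 GPUs, and the two link speeds) are illustrative assumptions, and the ring all-reduce cost model is a standard simplification, not a vendor benchmark.

```python
# Rough estimate of the time to all-reduce gradients across N GPUs using a
# ring algorithm. All numbers are illustrative assumptions, not vendor specs.
def ring_allreduce_seconds(param_bytes: float, n_gpus: int, link_gbps: float) -> float:
    # A ring all-reduce moves roughly 2 * (N - 1) / N of the data per link.
    bytes_on_wire = 2 * (n_gpus - 1) / n_gpus * param_bytes
    return bytes_on_wire / (link_gbps * 1e9 / 8)  # convert Gbit/s to bytes/s


grad_bytes = 70e9 * 2      # 70B parameters in fp16 = 140 GB of gradients
for gbps in (100, 400):    # commodity Ethernet vs. a high-end AI fabric
    t = ring_allreduce_seconds(grad_bytes, 512, gbps)
    print(f"{gbps} Gbit/s links: {t:.1f} s per all-reduce")
```

Under these assumptions the step from 100 to 400 Gbit/s links cuts each synchronisation from roughly 22 seconds to under 6, which is the difference between GPUs computing and GPUs idling. That gap is the business case for fabrics like Spectrum-X.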
Third, the hyperscaler premium is justified for variable workloads. Buying and operating a million GPUs requires expertise that lives at maybe ten companies globally. Everyone else rents capacity from those companies, paying a margin that covers the operational complexity they do not have to maintain internally.
The market numbers
The GPU-as-a-Service market is projected to grow from $8.2 billion in 2025 to $26.6 billion by 2030, a 26.5% compound annual growth rate according to MarketsandMarkets. That figure counts only cloud GPU rental; it excludes on-premises deployments and custom silicon.
Nvidia CEO Jensen Huang has said the company sees a $1 trillion revenue opportunity through 2027 across its Rubin and Blackwell chip families. The AWS deal fits that framework: hyperscalers are not diversifying away from Nvidia but expanding their Nvidia footprint while simultaneously developing alternatives.
The practical implication for enterprise buyers is that GPU access will remain constrained through at least 2027, because AWS just locked in two years of supply and Microsoft, Google, and Oracle have similar agreements in place. The hyperscalers are not hoarding capacity to resell at inflated margins; they are securing supply because demand at current pricing exceeds what Nvidia can manufacture.
The inference complexity
Buck's comment about using seven different chips for inference deserves more attention, because inference workloads vary enormously in their requirements. A real-time chatbot needs sub-100-millisecond response times, while a batch job running overnight can tolerate much higher latency in exchange for cost efficiency. Image generation has different memory bandwidth requirements than text generation.
The logical conclusion is that inference infrastructure will fragment into specialised tiers: high-latency batch jobs on cost-optimised hardware, real-time interactive applications on latency-optimised hardware, and a software layer that routes requests to the appropriate tier becoming a differentiator.
AWS, Google, and Microsoft are all building that software layer in parallel, and enterprises that build their own AI infrastructure will need to solve the same routing problem, albeit at smaller scale.
What comes next
Nvidia's position is strong but not invulnerable, given that AMD's MI300X is gaining adoption, Google's TPUs power a meaningful share of internal Google AI workloads, and AWS's Trainium is cost-competitive for training specific model architectures.
The pattern that seems to be emerging is specialisation, where CUDA-compatible workloads run on Nvidia while workloads that can be optimised for specific silicon run on custom accelerators, and the boundary between those categories shifts as custom silicon matures and software frameworks develop alternative backends.
For founders and CTOs planning AI infrastructure, the actionable takeaway is straightforward: unless you are building something highly specific that justifies custom hardware investment, you will be renting from AWS, Azure, or Google for the next several years, and the economics favour that approach now that the hyperscalers have locked in the supply to make it work.
The AWS deal runs through 2027, by which time Nvidia's Rubin architecture will be shipping and AWS's Trainium4 will likely be in production. The competitive landscape will look considerably different by then, even as the underlying constant persists: AI infrastructure remains a capital-intensive, operationally complex domain where scale advantages compound over time.