AI Chips

Since AI is the current megatrend, I think it is worth understanding more about the AI chips that power it. I came across an interesting CNBC YouTube video, “How Nvidia GPUs Compare To Google’s And Amazon’s AI Chips”, and after watching it I did a little more research of my own. Do take note that this research is only meant for education; I cannot confirm that everything here is 100% accurate.

  • Nvidia’s GPUs still dominate AI compute.
  • But every big cloud (“hyperscaler”) is now designing its own AI ASICs (TPU, Trainium, MTIA, Maia) to cut costs and reduce dependence on Nvidia.

1. The main AI chip types in the video

1. GPUs (Graphics Processing Units)

  • General-purpose parallel processors, originally for graphics, now heavily used for AI.
  • Nvidia’s H100/H200 and the Blackwell-generation B200 are the current data-center monsters.
  • AMD (MI300 series) and Intel (Gaudi accelerators) are the main “other” players.

2. Custom AI ASICs / Accelerators
(“ASIC” = application-specific integrated circuit. TPUs, Trainium, MTIA, Maia all fall into this bucket.)

  • Google TPU (Tensor Processing Unit) – custom accelerator for training/inference.
  • AWS Trainium & Inferentia – Amazon’s own AI chips for training (Trainium) and inference (Inferentia).
  • Meta MTIA – Meta Training and Inference Accelerator, optimized mainly for recommendation workloads.
  • Microsoft Maia 100 (and future Maia 200) – Microsoft’s custom AI accelerator for Azure, especially to run OpenAI/Copilot models.

One TD Securities piece framed it nicely: it’s not just “GPU vs ASIC” but merchant chips (like Nvidia GPUs) vs custom in-house chips (TPU, Trainium, MTIA, Maia, etc.).

3. NPUs (Neural Processing Units)

  • Generic name for AI accelerators, especially at the edge (phones, PCs, cameras).
  • Apple Neural Engine inside A- and M-series chips.
  • Qualcomm Hexagon NPU in Snapdragon SoCs.
  • Samsung NPU in Exynos.
  • These are mostly used for on-device AI (camera features, voice, small LLMs on phones/PCs), while the video is mainly about data-center GPUs/ASICs.
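
To make “on-device AI” a bit more concrete, here is a minimal, hypothetical sketch of how application code commonly targets an NPU via ONNX Runtime’s execution providers. This is not from the video; the provider names listed, the “model.onnx” file, and the input shape are placeholders that depend on the device and on which onnxruntime build is installed.

```python
# A minimal, hypothetical sketch of on-device inference via ONNX Runtime.
# The provider names, "model.onnx", and the input shape are placeholders;
# which providers exist depends on the device and the onnxruntime build.
import numpy as np
import onnxruntime as ort

preferred = [
    "QNNExecutionProvider",     # Qualcomm Hexagon NPU builds
    "CoreMLExecutionProvider",  # Apple Neural Engine via Core ML
    "CPUExecutionProvider",     # always-available fallback
]
available = ort.get_available_providers()
providers = [p for p in preferred if p in available] or ["CPUExecutionProvider"]

session = ort.InferenceSession("model.onnx", providers=providers)  # placeholder model
input_name = session.get_inputs()[0].name
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)  # illustrative image-like input
outputs = session.run(None, {input_name: dummy})
print("ran on:", session.get_providers()[0], "| output shape:", outputs[0].shape)
```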

2. Who makes what, and who uses it?

A. Nvidia GPUs

What chips?

  • Data-center GPUs like H100, H200, B200, Blackwell GB200/GB300, etc.

Who designs them?

  • Nvidia (NVDA) – fabless designer.

Who fabs/supplies them?

  • Fabrication: primarily TSMC on advanced nodes (e.g., 4N / N5 / N4P). (That’s industry-standard info, though Nvidia’s own PR focuses on the architecture rather than naming the foundry.)
  • Server & system partners: Dell, HPE, Supermicro, Foxconn, etc. Example: Dell + CoreWeave are first to deploy Nvidia Blackwell Ultra at scale.

Who uses them?

  • All four major U.S. hyperscalers – AWS, Microsoft Azure, Google Cloud, and Oracle Cloud – embed Nvidia’s latest Blackwell GPUs.
  • Large specialized GPU cloud providers: CoreWeave, GMI Cloud (new Taiwan AI DC with 7,000 Blackwell GPUs).
  • Big AI labs and platforms: OpenAI, Anthropic, Meta, many others – heavy users of Nvidia data-center GPUs today.

B. Google – TPUs (custom ASIC)

What chips?

  • Generations of TPU (v2, v3, v4, v5e, v5p, Trillium, Ironwood).
  • TPU v5p is their most powerful training chip; a single pod scales to 8,960 chips.
  • Ironwood TPU is tuned for large-scale inference.

Who designs them?

  • Google / Alphabet (GOOGL).

Who fabs/supplies?

  • Fabrication at external foundries – widely reported to be TSMC (and some collaboration with Broadcom on parts of the TPU stack), though Google’s public docs just say “custom chips” without naming the fab.

Who uses them?

  • Internally: Google uses TPUs to power Search, Photos, Maps, Gemini, etc., serving over a billion users.
  • Externally: Google Cloud TPU is offered to enterprise customers training/inferencing large models.
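
To illustrate why cloud customers can move between TPUs and GPUs fairly easily, here is a minimal JAX sketch (not from the video): the same matrix-multiply-heavy code is compiled by XLA for whichever accelerator the runtime detects. The shapes and values are purely illustrative, and it assumes jax is installed with the matching TPU or GPU backend.

```python
# Minimal JAX sketch (illustrative shapes/values): the same compute runs on
# whichever accelerator the installed JAX backend detects – TPU, GPU, or CPU.
import jax

print("backend:", jax.default_backend())  # "tpu" on Cloud TPU VMs, "gpu" on Nvidia VMs
print("devices:", jax.devices())

@jax.jit
def layer(x, w):
    # One dense layer; XLA compiles this for the detected accelerator.
    return jax.nn.relu(x @ w)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (1024, 4096))
w = jax.random.normal(key, (4096, 4096))
print("output shape:", layer(x, w).shape)
```

The same script would simply report “gpu” on an Nvidia-backed VM, which is roughly the sense in which TPUs and Nvidia hardware co-exist in Google’s data centers.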

Key angle:

  • TPUs reduce Google’s long-term cost per token versus renting only Nvidia GPUs, but they co-exist with Nvidia hardware in Google data centers.

C. Amazon – Trainium & Inferentia (custom ASIC)

What chips?

  • AWS Trainium – for training.
  • AWS Inferentia (1 & 2) – for inference.

Who designs them?

  • Amazon / AWS (AMZN) – in-house chips.

Who fabs/supplies?

  • Fabricated by top foundries (industry reporting points strongly to TSMC; AWS does not always state it explicitly in marketing).

Who uses them?

  • AWS itself in its own services.
  • AWS customers via Trn1 (Trainium) and Inf1/Inf2 (Inferentia) EC2 instances (see the small launch sketch after this list).
  • Notable customer/use: Anthropic and Claude models are believed to be trained on Trainium2-based infrastructure as part of Amazon’s strategic deal.
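
As a concrete but hypothetical example of how customers reach this silicon, here is a small boto3 sketch that requests a Trainium-backed Trn1 EC2 instance. The region, AMI ID, and key pair below are placeholders; in practice you would pick a Deep Learning AMI with the Neuron SDK preinstalled, and inf2.* instance types expose Inferentia2 in the same way.

```python
# Hypothetical sketch: requesting a Trainium-backed EC2 instance with boto3.
# The region, AMI ID, and key pair are placeholders; in practice you would use
# a Deep Learning AMI with the Neuron SDK. Inferentia2 uses inf2.* types.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="trn1.32xlarge",     # Trainium training instance family
    KeyName="my-key-pair",            # placeholder key pair
    MinCount=1,
    MaxCount=1,
)
print("launched:", response["Instances"][0]["InstanceId"])
```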

Key angle:

  • Amazon is pushing a cheaper-than-GPU story (better price-performance than GPU instances) to win AI workloads onto its own silicon.

D. Meta – MTIA (custom ASIC)

What chips?

  • MTIA v1, now MTIA 2i (second-gen), optimized mainly for recommendation / ranking inference workloads.

Who designs them?

  • Meta Platforms (META).

Who fabs/supplies?

  • MTIA chips are co-developed with Broadcom and fabricated by TSMC according to recent reporting.

Who uses them?

  • Meta’s own data centers: MTIA is rolled out alongside Nvidia GPUs to handle recommendation models and some AI inference, lowering TCO versus GPUs.

Key angle:

  • Meta is heavily investing in custom silicon and even moving to acquire RISC-V AI chip startup Rivos to accelerate this, specifically to reduce Nvidia reliance.

E. Microsoft – Maia (custom ASIC) + still using Nvidia/AMD

What chips?

  • Maia 100 – first-gen AI accelerator for Azure, running OpenAI models and Copilot workloads.
  • Maia 200 (Braga) – next-gen, with mass production now delayed to ~2026.
  • Cobalt 100 – custom Arm-based CPU for general cloud workloads (not an AI accelerator, but part of the Azure custom silicon stack).

Who designs them?

  • Microsoft (MSFT).

Who fabs/supplies?

  • Maia 100 uses TSMC’s N5 process and advanced CoWoS-S packaging.

Who uses them?

  • Azure, for internal and customer workloads – especially OpenAI models and Copilot.
  • But Microsoft is still a huge buyer of Nvidia GPUs and AMD MI300 in parallel.

Key angle:

  • Maia is about long-term cost and control for Microsoft, but does not eliminate the need for Nvidia/AMD in the near term.

F. NPUs & edge AI (high-level)

These weren’t the star of the CNBC video but are important for the broader “which companies to watch” question:

  • Apple (AAPL): Neural Engine in A- and M-series SoCs, used in iPhones, Macs, iPads – on-device AI.
  • Qualcomm (QCOM): Hexagon NPU in Snapdragon; now very AI-heavy marketing for phones and “AI PCs.”
  • Samsung (005930.KS): Exynos with integrated NPU.

Key angle:

  • This is the edge AI growth story – every high-end device will want an NPU.

3. Grouping them into categories

Category 1 – Merchant AI chip & platform vendors

  • Nvidia (NVDA) – dominant merchant AI GPU/accelerator supplier; huge data-center revenue, enormous AI backlog.
  • AMD (AMD) – second-source high-end GPUs (MI300 series) especially at Microsoft/Azure & others.
  • Intel (INTC) – Gaudi AI accelerators and CPUs in data centers.
  • Networking & packaging partners: Broadcom (AVGO), Marvell (MRVL) etc., which provide high-speed interconnects and co-design for some custom chips (TPUs, MTIA).

Category 2 – Hyperscalers with in-house AI ASICs

  • Alphabet (GOOGL) – TPUs (training/inference), used internally and in Google Cloud.
  • Amazon (AMZN) – Trainium & Inferentia in AWS, plus a big Anthropic partnership that validates them.
  • Microsoft (MSFT) – Maia accelerators for Azure & OpenAI, Cobalt CPUs, plus big Nvidia & AMD deployments.
  • Meta (META) – MTIA chips and planned Rivos acquisition to deepen internal silicon capabilities.
  • Oracle (ORCL) – no large in-house AI ASIC yet; its AI cloud remains heavily reliant on, and strongly tied to, Nvidia Blackwell.

These companies are both customers of Nvidia and potential long-term competitors via their own ASICs.

Category 3 – Foundries & manufacturing ecosystem

  • TSMC (TSM) – the common denominator for many of these chips (Nvidia GPUs, Google TPUs, AWS Trainium/Inferentia, Microsoft Maia, Meta MTIA).
  • Samsung Electronics, Intel Foundry – also chasing advanced AI chip fab business.

Category 4 – Edge AI & NPUs

  • Apple (AAPL) – on-device Neural Engine.
  • Qualcomm (QCOM) – Snapdragon NPUs for phones and AI PCs.
  • Samsung (005930.KS) – Exynos NPUs, plus memory supply to others.