Meta’s next-gen AI chip serves up ads while sipping power

After teasing its second-gen AI accelerator in February, Meta is ready to spill the beans on this homegrown silicon, which is already said to be powering ad recommendations in 16 regions.

The Facebook goliath has been designing custom accelerators for all manner of workloads, from video streaming to the machine learning that drives the recommender models behind its advertising empire.

The latest addition to the Meta Training and Inference Accelerator (MTIA) family claims 3x the performance and a 1.5x power-efficiency advantage over the first-gen part, which our friends at The Next Platform analyzed last year.

According to Meta, the second-generation chip, which we’re going to call MTIA v2 for the sake of consistency, was designed to balance compute, memory capacity, and bandwidth to get the best possible performance for the hyperscaler’s internal ranking and recommender models.

Digging into the design, the accelerator features an 8×8 grid of processing elements (PEs), which together offer 3.5x higher dense compute performance, or 7x higher with sparsity enabled, compared to MTIA v1.

Meta’s latest AI accelerator, above, is already powering the hyperscaler’s ranking and recommender models. Source: Meta

Beyond using a smaller 5nm TSMC process node and boosting the clock speed from 800MHz to 1.35GHz, Meta notes several architectural and design improvements that contributed to the latest part’s performance gains. These include support for sparse computation, more on-die and off-die memory, and an upgraded network-on-chip (NoC) with twice the bandwidth of the old model. Here’s how the first and second generation compare:

|                 | MTIA v1                  | MTIA v2                  |
|-----------------|--------------------------|--------------------------|
| Process tech    | 7nm TSMC                 | 5nm TSMC                 |
| Die area        | 373mm²                   | 421mm²                   |
| PEs             | 8×8 grid                 | 8×8 grid                 |
| Clock speed     | 800MHz                   | 1.35GHz                  |
| INT8 perf       | 102 TOPS                 | 354/708* TOPS            |
| FP16/BF16 perf  | 51.2 TFLOPS              | 177/354* TFLOPS          |
| PE mem          | 128KB per PE             | 384KB per PE             |
| On-chip mem     | 128MB                    | 256MB                    |
| Off-chip mem    | 64GB                     | 128GB                    |
| Off-chip mem BW | 176GB/s                  | 204GB/s                  |
| Connectivity    | 8x PCIe Gen 4.0 – 16GB/s | 8x PCIe Gen 5.0 – 32GB/s |
| TDP             | 25W                      | 90W                      |

* Sparse performance. You can find a full breakdown of both chips here.
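For the curious, the dense and sparse uplift quoted earlier falls straight out of the table's INT8 figures. A quick back-of-the-envelope check, in plain Python and nothing Meta-specific:

```python
# Gen-over-gen ratios implied by the spec table above.
v1_int8_tops = 102
v2_int8_dense_tops = 354
v2_int8_sparse_tops = 708

print(round(v2_int8_dense_tops / v1_int8_tops, 2))   # ~3.47x dense uplift
print(round(v2_int8_sparse_tops / v1_int8_tops, 2))  # ~6.94x with sparsity enabled
print(round(1.35e9 / 800e6, 2))                      # ~1.69x clock speed bump
```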

It should be noted that the MTIA v2 won’t eliminate the web goliath’s need for GPUs. Meta supremo Mark Zuckerberg has previously said his mega-corporation will deploy 350,000 Nvidia H100 accelerators and will have the equivalent of 600,000 H100s operational by year’s end.

Instead, MTIA follows an increasingly familiar pattern for Meta (and others) of developing custom silicon tailored to specific tasks. The idea is that while the kit may not be as flexible as CPUs or GPUs, an ASIC deployed at scale can be more efficient.

While the latest chip consumes nearly four times the power of its predecessor, it’s capable of producing up to 7x the floating-point performance. Pitted against a GPU, Meta’s latest accelerator manages 7.8 TOPS per watt (TOPS/W), which, as we discussed in our Blackwell coverage, beats out Nvidia’s H100 SXM at 5.65 TOPS/W and is more than twice that of the A100 SXM at 3.12 TOPS/W.
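If you want to reproduce that comparison, the figures line up with the sparse INT8 number from the spec table and Nvidia's published headline specs for the H100 SXM and A100 SXM; which exact GPU figures underpin Meta's comparison is our assumption, so treat this as illustrative arithmetic rather than a benchmark:

```python
# Back-of-the-envelope TOPS-per-watt comparison using headline sparse INT8
# figures and TDPs. Illustrative arithmetic only, not a measured benchmark.
chips = {
    # name: (peak sparse INT8 TOPS, TDP in watts)
    "MTIA v2":  (708, 90),     # from the spec table above
    "H100 SXM": (3958, 700),   # Nvidia's published figures (assumed basis)
    "A100 SXM": (1248, 400),
}

for name, (tops, watts) in chips.items():
    print(f"{name}: {tops / watts:.2f} TOPS/W")
# MTIA v2: 7.87, H100 SXM: 5.65, A100 SXM: 3.12 -- matching the figures quoted above
```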

Having said that, it’s clear that Meta has gone to great lengths to size the chip to its internal workloads, namely inferencing on recommender models. These are designed to render personalized suggestions, such as people you may know or, more importantly for Meta’s business model, which ads are most likely to be relevant to you.

The chips are also designed to scale out as needed and can be deployed in a rack-based system containing 72 accelerators in total: each system combines three chassis, each containing 12 compute boards with two MTIA v2 chips per board.
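The topology arithmetic, spelled out below; the power figure is simply the chip TDP multiplied out, before counting host CPUs, memory, or networking:

```python
# Rack-level maths as described above: three chassis per system, a dozen
# compute boards per chassis, two MTIA v2 parts per board.
chassis_per_system = 3
boards_per_chassis = 12
chips_per_board = 2

accelerators = chassis_per_system * boards_per_chassis * chips_per_board
print(accelerators)              # 72 accelerators per system
print(accelerators * 90 / 1000)  # ~6.5kW of accelerator TDP per system, at 90W apiece
```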

Each MTIA v2 chassis contains 12 compute boards, each sporting a pair of accelerators. Source: Meta

In terms of deploying workloads, Meta is leaning heavily on the PyTorch framework and Triton compiler. We’ve seen this combination used to perform tasks on various GPUs and accelerators, in part because it largely eliminates the need to develop code optimized for specific hardware.
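To give a flavor of why that matters, Triton kernels are written once in a Python DSL and the compiler handles lowering them to whichever backend is installed. The snippet below is the stock Triton vector-add example, shown purely as an illustration of the programming model; nothing about it is MTIA-specific, and it is not Meta's code.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # The same kernel source runs on whatever backend Triton targets;
    # hardware-specific tuning is left to the compiler.
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```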

Meta has been a major proponent of PyTorch, which it developed before handing the reins over to the Linux Foundation, as it gives engineers the flexibility to develop AI applications that can run across a variety of GPU hardware from Nvidia and AMD. So it makes sense that Meta would want to employ the same technologies with its own chips.

In fact, Meta claims that by co-developing its software and hardware it was able to achieve greater efficiency compared to existing GPU platforms, and it expects to eke out even more performance through future optimizations.

MTIA v2 certainly won’t be the last silicon we see from Meta. The social media giant says it has several chip design programs underway, including one that will support future generative AI systems. ®
