High-Bandwidth Memory (HBM) Is Sold Out. So What’s Next? (2024)

By Mark LaPedus

High-bandwidth memory (HBM), a chip technology that plays a central role in today’s AI systems, is expected to remain in short supply for the foreseeable future, possibly until 2026.

Two HBM suppliers, Micron and SK Hynix, are sold out of these memory chip products. Samsung, the other HBM supplier, is struggling with the technology. Besides the supply issues, prices are increasing for DRAMs, the key chips used for HBM.

A DRAM is a memory chip, which is widely used in computers, smartphones and other products. DRAMs provide the main memory functions in systems. In operation, these chips enable a quick and short-term access to the data.

Memory companies manufacture DRAM chips in large facilities called fabs. For HBM, a DRAM supplier takes a standard DRAM, and vertically stacks four, six or eight of these memory chips on top of each other. The DRAM dies are connected using tiny vertical wires called through-silicon vias (TSVs). The TSVs provide the electrical connections between each die.

HBM is a specialized memory device used in high-end systems, such as servers and supercomputers. In a high-end system, a processor chip and an HBM device are often assembled together and electrically connected in a unit called an IC package. In operation, HBM enables the data to transfer between the memory and processor at high speeds. HBM employs a 1024-bit memory bus, which is wider than standard DRAM. The wide bus enables high bandwidth in systems.

High-Bandwidth Memory (HBM) Is Sold Out. So What’s Next? (1)

Two views of HBM. Bottom diagram: An HBM DRAM stack and a processor are situated on an interposer in the same package. HBM and the processor are electrically connected to each other. Top diagram: A high-level view of an HBM DRAM stack connected to the processor. Source: Rambus

HBM isn’t new. Used in servers for several years, HBM was once viewed an expensive niche technology. HBM is still expensive today, but thanks to AI, the technology is moving into the limelight.

The latest AI technology, called generative AI (GAI), is taking off. In large data centers, specialized AI servers are running the latest GAI algorithms like ChatGPT. AI servers incorporate high-speed AI chips, known as GPUs, from Nvidia and others. Nvidia’s latest GPU architecture utilizes eight HBM stacks.

Demand for Nvidia’s GPUs, as well as HBM, is skyrocketing. Nvidia doesn’t make DRAM or HBM, but rather these devices are manufactured and supplied by three memory companies--Micron, Samsung and SK Hynix.

Thanks to the AI boom, South Korea’s SK Hynix, the world’s largest HBM supplier, is sold out of these products for 2024 and most of 2025. The same is true for U.S.-based Micron. Micron and SK Hynix are the key HBM suppliers to Nvidia.

South Korea’s Samsung is also shipping HBM. So far, though, Samsung’s HBMs have been unable to pass the certification process at Nvidia. Samsung, according to reports, is struggling with various yields issues with HBM. “I expect Samsung to contain the issues through screening short term, and then to resolve the issues, and for Samsung to be a significant supplier of HBM to Nvidia long term,” said Mark Webb, a principal/consultant at MKW Ventures Consulting.

Besides the supply issues, other events are taking place in the HBM world, including:

*Suppliers are selling HBM devices, which stack up to eight DRAM dies on top of each other. Vendors are ramping up 12-die high HBM products with 16-die in R&D.

*Today’s HBM devices are based on the HBM3E standard. HBMs based on the next-generation HBM4 standard are in R&D.

*China, which is behind in HBM, is working on the technology.

That’s still not fast enough. In a system, the goal is to move the data to and from the processor and memory. But as the system becomes inundated with more data, the data transfer rates may take too long, causing unwanted latency.

With all of this in mind, CSPs face a number of challenges. CSPs must not only keep up with the onslaught of data, but they need to deal with the power consumption problems.

So, what are the solutions? CSPs could upgrade the data center with new servers, networks and cooling systems. They also need help from the semiconductor industry. In response, DRAM vendors have developed new and advanced products at lower power.

Another solution is to move the DRAM closer to the processor, thereby speeding up the data transfer rates. For years, the semiconductor industry has been developing a variety of solutions here. For example, several vendors are working on compute-in-memory (CIM) or processor-in-memory (PIM) devices. The idea is to combine the processing and memory functions in the same chip. But that’s been a difficult technology to conquer and is still a work in progress.

Several years ago, the industry came up with another idea—assemble logic and memory dies in the same IC package. A package is a product that encapsulates the dies, and protect them from harsh operating conditions.

Over the years, the industry has developed a plethora of IC package types. One type, called 2.5D, appears to be the winner for high-end systems, at least for now. In one 2.5D configuration, an FPGA or processor die is placed on an interposer in the package. Then, a compute die is placed on the interposer, next to the FPGA or processor in the same package.

A stack of DRAM dies is placed on top of the compute die. The DRAM stack is known as HBM. The interposer, which acts like a silicon interconnection pad, consists of TSVs. The TSVs provide the connections between the FPGA or processor to the board. (See diagrams above)

In the 2.5D package, the chips operate as though it’s a single device. Plus, the FPGA or processor is next to the memory, enabling faster data access between the two chips. This is accomplished by tightly coupling the HBM memory with the compute die via a distributed interface. “This interface is split into multiple independent channels, which may operate asynchronously,” according to Synopsys in a blog. “Each channel interface features two pseudo-channels with a 32-bit data bus operating at double data rate (DDR). Every channel provides access to a distinct set of DRAM banks.”

In 2013, SK Hynix introduced the first HBM device. The first HBMs were based on the HBM1 standard. Specified at a 1 Gb/s data rate, HBM1 allowed vendors to stack up to eight 16Gb DRAM dies.

HBM2, the second HBM standard, appeared in 2016, followed by HBM3 in 2022. Today, vendors are shipping HBM devices based on the HBM3E standard. Specified at a 9.6 Gb/s data rate, HBM3E enables vendors to stack 16 32Gb DRAM dies.

“HBM3E/3 decreases core voltage to 1.1V from HBM2E’s 1.2V,” according to Rambus in a blog. “Also, HBM3 reduces IO signaling to 400mV from the 1.2V used in HBM2E. Lower voltages translate to lower power. These changes help offset the higher power consumption inherent in moving to higher data rates.”

JEDEC will soon release the specifications for the next-generation HBM4 standard. HBM4 products are due out in 2026.

High-Bandwidth Memory (HBM) Is Sold Out. So What’s Next? (2)

HBM specs. Source: Rambus

HBM supply and demand

Thanks to Nvidia, demand is overwhelming for HBM. In 2022, Nvidia introduced the H100, a GPU that consist of 80 billion transistors. The H100 architecture includes five HBM sites. Today, Nvidia is ramping up Blackwell, a new GPU architecture that consists of 208 billion transistors. This architecture makes use of eight HBMs.

With each new product introduction, Nvidia is increasing the HBM memory capacity. “Nvidia is releasing a new product or architecture every year. With these releases, they are approximately doubling the memory each time,” said MKW Ventures’ Webb.

To meet the demand, DRAM suppliers are expanding their production of HBM products. By year’s end, the DRAM industry is expected to allocate roughly 250K/m (14%) of total fab capacity for HBM, representing annual supply bit growth of 260%, according to TrendForce.

For now, though, suppliers are sold out of HBM products. “I am not convinced that ‘sold out’ is the best term. I would say that companies have orders through ‘25. They have orders for everything they can make for the next two years,” said MKW Ventures’ Webb.

The demand is driving up prices for server DRAM. Average contract prices for server DRAM are expected to rise by 8%-13% in the third quarter of 2024, compared to the previous period, according to TrendForce.

This could all change. “I would say there’s still some possibility that (HBM) orders won’t materialize and that people are pre-ordering based on plans and potentially double ordering or overstocking,” said MKW Ventures’ Webb.

There is another scenario. If the AI market slows, Meta, Microsoft and others could push out--or even halt--their server orders, which would impact chip vendors.

HBM or bust

As stated, memory makers produce DRAMs in a fab using various process steps. At each new generation, the DRAM is becoming more difficult to manufacture in the fab. The challenge is to reduce the size of the tiny capacitor in the DRAM at each generation.

For HBM, memory makers take standard DRAM, and then stack the devices and connect them using TSVs. To make HBM devices, the process steps are different than making DRAM.

The HBM process steps are well known and established. But at each generation, memory makers are dealing with new and more complex DRAMs. That adds complexity to the HBM manufacturing process.

Plus, the die sizes for HBM are 35%-45% larger than DDR5 DRAM, according to Avril Wu, an analyst at TrendForce. In addition, the yield rates for HBM are 20%-30% lower than DDR5 DRAM, Wu said.

So, who’s doing what in HBM? Today, Micron, Samsung and SK Hynix are shipping HBM3E products that support 8 DRAM die stacks. Vendors are now sampling HBMs with 12 DRAM die stacks. The 12-die products present some challenges. “The thermal issues on eight are very challenging. The 12-stack products will be even more challenging, but are being addressed,” Webb said.

Meanwhile, last year, SK Hynix announced what the company claimed was the industry’s first 12-layer HBM3E product with a 24GB memory capacity.

Earlier this year, Samsung began sampling a 12-stack HBM3E product. With a capacity of 36GB, Samsung’s product provides a bandwidth of up to 1,280 gigabytes per second (GB/s).

Not to be outdone, Micron is sampling a 36GB 12-die HBM3E, which is set to deliver greater than 1.2 TB/s performance. In the meantime, Micron is seeing huge demand for its current HBM products.

“Our HBM shipment ramp began in fiscal Q3, and we generated over $100 million in HBM3E revenue in the quarter,” said Sanjay Mehrotra, president and chief executive at Micron, in a recent conference call. “We expect to generate several hundred million dollars of revenue from HBM in fiscal 2024 and multiple billions of dollars in revenue from HBM in fiscal 2025.”

Others are also pursuing HBM, particularly China, which doesn’t have a presence here. ChangXin Memory Technologies (CSMT) and Huawei are separately developing HBM, according to reports. China appears to be developing older HBM2 products.

So where is this all leading? “We are in the early innings of a multiyear race to enable artificial general intelligence, or AGI, which will revolutionize all aspects of life. Enabling AGI will require training ever-increasing model sizes with trillions of parameters and sophisticated servers for inferencing. AI will also permeate to the edge via AI PCs and AI smartphones, as well as smart automobiles and intelligent industrial systems. These trends will drive significant growth in the demand for DRAM and NAND,” Micron’s Mehrotra said.

Let’s not forget GPUs and other chips too. The question is how long the AI boom will last and what are the future ramifications of the technology.