[WP] L2 Cache and DRAM Architecture: Summary

The CUDA Cache Maintainer

This post summarizes basic architectural information about device memory and the L2 cache from NVIDIA's architecture whitepapers.

The global and local memory spaces accessed by CUDA programs reside in HBM, i.e., “device memory”.

  • Constant memory space resides in device memory and is cached in the constant cache.
  • Texture and surface memory spaces reside in device memory and are cached in the texture cache.
  • The Level 2 (L2) cache caches reads from and writes to HBM (device) memory. It services memory requests from various subsystems within the GPU.

HBM and L2 memory spaces are accessible to all SMs and all applications running on the GPU.

Device Memory (DRAM) Overview

| | Ampere (SXM4) | Hopper (SXM5) | Hopper (PCIe) |
| --- | --- | --- | --- |
| DRAM | 40 GB (HBM2, 5 stacks, 8 memory dies per stack) | 80 GB (HBM3, 5 stacks) | 80 GB (HBM2e, 5 stacks) |
| Data Rate | 1215 MHz DDR | 2619 MHz DDR | 1593 MHz DDR |
| Bandwidth | 1555 GB/sec | 3352 GB/sec | 2039 GB/sec |

For more, please check the "H100 HBM and L2 Cache Memory Architectures" section of the Hopper Whitepaper and Hopper Architecture In-depth.
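The bandwidth row in the device memory table can be reproduced from the data rate row with a little arithmetic. The sketch below is a back-of-the-envelope check, not a formula taken from the whitepapers. It assumes each HBM stack exposes a 1024-bit interface (this width is not stated in this post) and that "DDR" means two transfers per clock. The function name `peak_bandwidth_gb_s` is made up for illustration.

```python
# Rough check of the DRAM bandwidth figures in the table above.
# Assumption (not stated in this post): each HBM stack has a 1024-bit interface.
HBM_STACK_BUS_BITS = 1024


def peak_bandwidth_gb_s(clock_mhz, stacks, bits_per_stack=HBM_STACK_BUS_BITS):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    transfers_per_second = clock_mhz * 1e6 * 2       # DDR: two transfers per clock
    bytes_per_transfer = stacks * bits_per_stack / 8  # total bus width in bytes
    return transfers_per_second * bytes_per_transfer / 1e9


# (memory clock in MHz, number of HBM stacks), taken from the table above
configs = {
    "Ampere (SXM4)": (1215, 5),
    "Hopper (SXM5)": (2619, 5),
    "Hopper (PCIe)": (1593, 5),
}

for name, (clock_mhz, stacks) in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(clock_mhz, stacks):.1f} GB/s")
```

Under these assumptions the results land within 1 GB/sec of the 1555, 3352, and 2039 GB/sec values in the table, which suggests the published numbers are theoretical peaks rather than measured throughput.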

L2 Cache

| | Ampere (SXM4) | Hopper (SXM5) | Hopper (PCIe) |
| --- | --- | --- | --- |
| Cache Size | 40 MB | 50 MB | 50 MB |
| Organization | The L2 cache is divided into two partitions to enable higher bandwidth and lower latency memory access. Each L2 partition localizes and caches data for memory accesses from SMs in the GPCs directly connected to the partition. Each L2 cache partition is divided into 40 L2 cache slices. Eight 512 KB L2 slices are associated with each memory controller. | Partitioned crossbar, but not necessarily a 2-way split. | Partitioned crossbar, but not necessarily a 2-way split. |
| Read Bandwidth | 5120 bytes/clk | Unknown | Unknown |
| Data Compression | The NVIDIA Ampere architecture adds Compute Data Compression to accelerate unstructured sparsity and other compressible data patterns. | Supported | Supported |
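The Ampere organization figures in the table above are consistent with each other, and the short sketch below checks this. It uses only numbers from the table. The derived values (total slice count, implied number of memory controllers, per-slice read bandwidth) are simple arithmetic on those numbers and are not quoted from the whitepapers. All variable names are illustrative.

```python
# Consistency check of the Ampere (SXM4) L2 figures from the table above.
PARTITIONS = 2               # "divided into two partitions"
SLICES_PER_PARTITION = 40    # "40 L2 cache slices" per partition
SLICE_KB = 512               # "512 KB L2 slices"
SLICES_PER_CONTROLLER = 8    # "Eight ... slices ... with each memory controller"
READ_BYTES_PER_CLK = 5120    # L2 read bandwidth from the table

total_slices = PARTITIONS * SLICES_PER_PARTITION
total_l2_mb = total_slices * SLICE_KB // 1024

# Derived values: implied by the figures above, not stated in this post.
memory_controllers = total_slices // SLICES_PER_CONTROLLER
bytes_per_clk_per_slice = READ_BYTES_PER_CLK // total_slices

print(f"total slices:            {total_slices}")
print(f"total L2 size:           {total_l2_mb} MB")
print(f"implied controllers:     {memory_controllers}")
print(f"read bytes/clk per slice: {bytes_per_clk_per_slice}")
```

The total works out to the 40 MB listed under Cache Size. The same check cannot be done for the Hopper columns because the table does not give slice counts or read bandwidth for them.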

For detailed information, please refer to the whitepapers.