2 posts tagged with "Instruction Analysis"

[A] Dissecting the Volta Architecture: Notes

May 13, 2026 · 5 min read

The CUDA Cache Maintainer

My notes on the article Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking. The Volta architecture fundamentally changed how AI workloads are computed, and many of its design choices still shape the latest architectures. I will dive into the following topics, which are unchanged from Volta and are covered in Dissecting the NVIDIA Ampere GPU Architecture via Microbenchmarking:

May 9, 2026 · 7 min read

The CUDA Cache Maintainer

The relation between the number of instructions and the average cycles for the ADD.U32 instruction (this reveals the existence of a hardware addition pipeline)
The CPI for dependent and independent instructions
The Tensor Core latencies and throughput
The memory access latencies
Instruction clock cycles for the Ampere A100 GPU
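
As a rough illustration of the first two topics in the list above, here is a minimal sketch of why the average cycle count per instruction falls as more independent instructions are issued back to back, while it stays flat for a dependent chain. The pipeline model and the `LATENCY` and `ISSUE_INTERVAL` values are hypothetical placeholders chosen for illustration, not numbers measured on any GPU or taken from the papers.

```python
def total_cycles(n, latency, issue_interval, dependent):
    """Cycles to complete n instructions under a simple pipeline model.

    This is an assumed toy model, not measured hardware behavior:
    - dependent chain: each instruction waits for the previous result,
      so the full latency is paid n times;
    - independent stream: the latency is paid once, and every further
      instruction completes issue_interval cycles after the previous one.
    """
    if n <= 0:
        return 0
    if dependent:
        return n * latency
    return latency + (n - 1) * issue_interval


def cpi(n, latency, issue_interval, dependent):
    """Average cycles per instruction for a run of n instructions."""
    return total_cycles(n, latency, issue_interval, dependent) / n


# Hypothetical values, for illustration only.
LATENCY = 4
ISSUE_INTERVAL = 2

if __name__ == "__main__":
    print("n  independent_cpi  dependent_cpi")
    for n in (1, 2, 4, 8, 64):
        independent = cpi(n, LATENCY, ISSUE_INTERVAL, dependent=False)
        chained = cpi(n, LATENCY, ISSUE_INTERVAL, dependent=True)
        print(n, independent, chained)
```

In this model the independent CPI starts at the full latency for a single instruction and approaches the issue interval as the run gets longer, which is the kind of curve that hints at a pipelined unit. The dependent CPI stays at the full latency regardless of run length.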