
2 posts tagged with "Instruction Analysis"


[A] Dissecting the Volta Architecture: Notes

· 5 min read
The CUDA Cache Maintainer

My notes on the article Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking. Volta marked a major shift in how NVIDIA GPUs run AI workloads, and many of its design choices still shape the latest architectures. Here I dive into the topics that Dissecting the NVIDIA Ampere GPU Architecture via Microbenchmarking describes as unchanged since Volta:

  • Instruction Encoding
  • Dual-Port Register

[A] Demystifying NVIDIA Ampere Architecture: Notes

· 7 min read
The CUDA Cache Maintainer

My notes on the article Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis. I mostly use it as a datasheet. It covers:

  • The relation between the number of instructions and the average cycles per ADD.U32 instruction (this reveals that addition runs on a hardware pipeline)
  • The CPI for dependent and independent instructions
  • Tensor Core latencies and throughput
  • Memory access latencies
  • Instruction clock cycles for the Ampere A100 GPU
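Why does the average cycle count per ADD.U32 reveal a pipeline, and why do dependent and independent instructions have different CPI? A toy model makes the reasoning concrete. This is a simplified sketch, not the paper's methodology: it assumes a single fully pipelined unit that accepts one new instruction per cycle, and a hypothetical result latency of 4 cycles chosen only for illustration (not a measured A100 value).

```python
def total_cycles(n: int, latency: int, dependent: bool) -> int:
    """Cycles to finish n ADD-like instructions on one pipelined unit.

    Assumed model: one instruction can issue per cycle, and each result
    is ready `latency` cycles after that instruction issues.
    """
    if n <= 0:
        return 0
    if dependent:
        # Each instruction must wait for the previous result,
        # so the full latency is paid n times.
        return n * latency
    # Independent instructions overlap: the pipeline fills once,
    # then one instruction completes every cycle.
    return latency + (n - 1)


def avg_cycles(n: int, latency: int, dependent: bool) -> float:
    """Average cycles per instruction (CPI) under the same model."""
    return total_cycles(n, latency, dependent) / n


HYPOTHETICAL_LATENCY = 4  # illustrative only, not a measured value

for n in (1, 2, 4, 8, 32, 128):
    print(n,
          avg_cycles(n, HYPOTHETICAL_LATENCY, dependent=False),
          avg_cycles(n, HYPOTHETICAL_LATENCY, dependent=True))
```

Under this model the independent-instruction average falls from the full latency toward 1 cycle as the instruction count grows, while a dependent chain stays at the full latency. A measured curve that drops in the same way is what points to a pipelined adder rather than a unit that handles one addition at a time.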