
One post tagged with "PTX"


[A] Demystifying NVIDIA Ampere Architecture: Notes

· 7 min read
The CUDA Cache Maintainer

My notes on the article "Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis". I use it as a datasheet. In it you can find:

  • The relation between the number of instructions and the average cycles for the ADD.U32 instruction (this reveals the existence of a hardware pipeline for addition)
  • The CPI for dependent and independent instructions
  • Tensor Core latencies and throughput
  • Memory access latencies
  • Instruction clock cycles for the Ampere A100 GPU
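
To make the first bullet concrete, here is a minimal sketch of why the average cycle count drops as more instructions are issued back to back when the hardware is pipelined. It is an idealized model, not the article's measurement method: the function name `average_cycles` and the parameters `fill_latency` and `issue_interval` are my own, and the values 5 and 2 are hypothetical placeholders rather than numbers taken from the article.

```python
def average_cycles(n_instructions, fill_latency, issue_interval):
    """Average cycles per instruction in an idealized pipeline.

    Assumption (hypothetical model): the first instruction costs
    `fill_latency` cycles, and each later instruction completes
    `issue_interval` cycles after the previous one.
    """
    if n_instructions < 1:
        raise ValueError("n_instructions must be at least 1")
    total_cycles = fill_latency + (n_instructions - 1) * issue_interval
    return total_cycles / n_instructions


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the trend.
    for n in (1, 2, 4, 8, 64):
        avg = average_cycles(n, fill_latency=5, issue_interval=2)
        print(f"{n:3d} instructions: {avg:.2f} cycles on average")
```

In this model the average starts at the full latency for a single instruction and approaches the issue interval as the count grows. A falling curve like that is the kind of signature that points to a pipeline; a flat curve would suggest each instruction pays the full latency.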