By TechPowerUp
Publication Date: 2025-12-05 16:25:00
Many libraries already help developers extract performance from GPUs, such as NVIDIA CUDA-X and NVIDIA CUTLASS. CUDA Tile introduces a new way to program GPUs at a higher level than SIMT. As computational workloads have evolved, especially in AI, tensors have become a fundamental data type. NVIDIA has developed specialized hardware to operate on tensors, such as NVIDIA Tensor Cores (TC) and NVIDIA Tensor Memory Accelerators (TMA), which are now integral to every new GPU architecture. As the hardware grows more complex, more software is needed to harness these capabilities. CUDA Tile abstracts away Tensor Cores and their programming models, so that code written with CUDA Tile is compatible with current and future Tensor Core architectures.
Tile-based programming…