GPU Compute Engineer
Family: Low-level & domain-heavy
Programs GPUs for general-purpose workloads (ML training, scientific simulation, and data processing), squeezing maximum throughput from parallel hardware.
Day to day
Writes CUDA or HIP (ROCm) kernels, profiles memory bandwidth and compute utilization, optimizes data layouts for coalesced memory access, and integrates GPU kernels into training frameworks.
Core skills
- CUDA/ROCm
- GPU architecture
- kernel optimization
- parallel algorithms
- C++