Cross-posted from Easy Diffusion’s blog. Continued from Part 1.

Spent a few days figuring out how to compile binary wheels of PyTorch that bundle all the necessary libraries (ROCm libs or CUDA libs).

tl;dr: In Part 2, the compiled PyTorch wheels now include the required libraries (including ROCm). But this isn’t over yet. Torch starts now, but adding two numbers with it on the GPU produces garbage values. There’s probably a bug in the bundled rocBLAS version, so I might need to recompile rocBLAS for gfx803 separately. Will tackle that in Part 3 (TBD).