~ / cmdr2

projects: freebird, easy diffusion

hacks: carbon editor, torchruntime, findstarlink

  • #ggml

// Part 2 in the “Simple introduction to ggml” series. At the end of Part 1, we learnt how to keep the model weights separate from temporary computation-only tensor variables. This allowed the model weights to stay in memory across multiple predictions (which is the usual behavior of machine learning programs during inference). Now let’s build on that to create a simple neural network model using ggml. If you’re new to ggml, I recommend reading Part 1 first.

  • #ggml

A simple introduction to ggml. // This is Part 1 in a series on ggml. You can read Part 2 after this one. This post uses the new “backend” API in ggml. I wrote this to explain ggml to myself. I’m still learning about it, so please feel free to suggest corrections! Overall flow of a ggml program: at a very high level, a ggml program has the following steps: define the tensor variables
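The excerpt cuts off after the first step. As a rough sketch (hedged; the full post walks through the details and exact function names of ggml’s backend API), the overall flow looks something like:

```
# Sketch of a ggml program's overall flow, per Part 1's description:
1. Define the tensor variables (ggml_new_tensor_*)
2. Define the operations between them, building a compute graph
3. Allocate memory for the tensors on a backend (CPU or GPU)
4. Copy the input data into the tensors
5. Run the computation (compute the graph on the backend)
6. Read the results back from the output tensor
```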

  • #easydiffusion
  • #sdkit
  • #amd
  • #torchruntime
  • #windows
  • #intel
  • #integrated
  • #directml

// Cross-posted from Easy Diffusion’s blog. Easy Diffusion (and sdkit) now also supports AMD on Windows automatically (using DirectML), thanks to its integration with torchruntime. It also supports integrated GPUs (Intel and AMD) on Windows, making Easy Diffusion faster on PCs without dedicated graphics cards.

  • #easydiffusion
  • #torchruntime
  • #sdkit

// Cross-posted from Easy Diffusion’s blog. Spent the last week or two getting torchruntime fully integrated into Easy Diffusion, and making sure that it handles all the edge cases. Easy Diffusion now uses torchruntime to automatically install the best possible version of torch on the user’s computer, and to support a wider variety of GPUs (including older ones). And since it uses a GPU-agnostic device API, Easy Diffusion will automatically support additional GPUs as torchruntime adds support for them.

  • #easydiffusion
  • #sdkit

// Cross-posted from Easy Diffusion’s blog. Worked on adding support for DirectML in sdkit. This allows AMD GPUs and integrated GPUs to generate images on Windows. DirectML seems really inefficient with memory, though, so for now it only manages to generate images with SD 1.5. SDXL and larger models fail to generate, even though I have 12 GB of VRAM in my graphics card.

  • #rocm
  • #pytorch
  • #easydiffusion
  • #torchruntime

// Cross-posted from Easy Diffusion’s blog. Continued from Part 1. Spent a few days figuring out how to compile binary wheels of PyTorch and include all the necessary libraries (ROCm libs or CUDA libs). tl;dr - In Part 2, the compiled PyTorch wheels now include the required libraries (including the ROCm libs). But this isn’t over yet: torch starts now, but adding two numbers on the GPU produces garbage values. There’s probably a bug in the included rocBLAS version; I might need to recompile rocBLAS for gfx803 separately. Will tackle that in Part 3 (tbd).

  • #rocm
  • #pytorch
  • #easydiffusion
  • #torchruntime

// Cross-posted from Easy Diffusion’s blog. Continued in Part 2, where I figured out how to include the required libraries in the wheel. Spent all of yesterday trying to compile PyTorch with the compile-time PYTORCH_ROCM_ARCH=gfx803 environment variable. tl;dr - In Part 1, I compiled wheels for PyTorch with ROCm, in order to add support for older AMD cards like the RX 480. I managed to compile the wheels, but they don’t include the required ROCm libraries. That’s solved in Part 2.
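For context, a build along these lines uses PyTorch’s standard from-source flow with the ROCm-specific settings the post mentions. This is a hedged sketch, not the exact commands from the post; required dependencies and flags vary by ROCm and PyTorch version:

```shell
# Build a PyTorch wheel targeting only the gfx803 architecture (RX 480 class).
export PYTORCH_ROCM_ARCH=gfx803       # compile kernels for gfx803 only
export USE_ROCM=1                     # build the ROCm (HIP) backend instead of CUDA
python tools/amd_build/build_amd.py   # "hipify" the CUDA sources for ROCm
python setup.py bdist_wheel           # produce the wheel in dist/
```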

  • #easydiffusion
  • #torchruntime
  • #torch
  • #ml

// Cross-posted from Easy Diffusion’s blog. Spent the last few days writing torchruntime, which will automatically install the correct torch distribution based on the user’s OS and graphics card. This package was created by extracting that logic out of Easy Diffusion and refactoring it into a cleaner implementation (with tests). It can be installed on Windows/Linux/Mac using pip install torchruntime. The main intention is to make it easier for developers to contribute updates (e.g. for newer or older GPUs). Previously this code was hard to find or modify, since it was buried deep inside Easy Diffusion’s internals.
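To illustrate the idea of picking a torch distribution per OS/GPU, here’s a deliberately simplified sketch. The function name and mapping below are made up for illustration; torchruntime’s actual implementation and API differ:

```python
# Hypothetical illustration (NOT torchruntime's real code): choose a pip
# index URL for torch based on the detected OS and GPU vendor.

def pick_torch_index(os_name: str, gpu_vendor: str) -> str:
    """Return a torch package index (or marker) for the best-matching build."""
    if gpu_vendor == "nvidia":
        return "https://download.pytorch.org/whl/cu124"    # CUDA build
    if gpu_vendor == "amd" and os_name == "linux":
        return "https://download.pytorch.org/whl/rocm6.2"  # ROCm build (Linux-only)
    if gpu_vendor == "amd" and os_name == "windows":
        return "directml"  # stand-in marker: AMD on Windows goes via torch-directml
    return "https://download.pytorch.org/whl/cpu"          # CPU-only fallback

print(pick_torch_index("windows", "amd"))  # -> directml
```

The real package also handles integrated GPUs and older cards, which is exactly the sort of mapping that’s easier to maintain in a small standalone project than buried inside an app.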

  • #easydiffusion
  • #amd
  • #directml

// Cross-posted from Easy Diffusion’s blog. Spent most of the day doing some support work for Easy Diffusion, and experimenting with torch-directml for AMD support on Windows. From the initial experiments, torch-directml seems to work properly with Easy Diffusion. I ran it on my NVIDIA card, and another user ran it on their AMD Radeon RX 7700 XT. It’s 7-10x faster than the CPU, so it looks promising. It’s 2x slower than CUDA on my NVIDIA card, but users with NVIDIA cards are not the target audience of this change.

  • #easydiffusion
  • #ui
  • #v4

// Cross-posted from Easy Diffusion’s blog. Spent a few days prototyping a UI for Easy Diffusion v4. Files are at this repo. The main focus was to get a simple but pluggable UI backed by a reactive data model, to allow splitting the codebase into individual components (each with its own file), and to require only a text editor and a browser to develop, i.e. no compilation or Node.js-based developer tooling.