~ / cmdr2

Jan 17 17:19 2025

#rocm
#pytorch
#easydiffusion
#torchruntime

Continued in Part 2, where I figured out how to include the required libraries in the wheel. Spent all of yesterday trying to compile pytorch with the compile-time PYTORCH_ROCM_ARCH=gfx803 environment variable. tl;dr - In Part 1, I compiled wheels for PyTorch with ROCm, in order to add support for older AMD cards like RX 480. I managed to compile the wheels, but the wheel doesn’t include the required ROCm libraries. I figured that out in Part 2.

Jan 13 14:46 2025

#easydiffusion
#torchruntime
#torch
#ml

Spent the last few days writing torchruntime, which will automatically install the correct torch distribution based on the user’s OS and graphics card. This package was written by extracting this logic out of Easy Diffusion, and refactoring it into a cleaner implementation (with tests). It can be installed (on Win/Linux/Mac) using pip install torchruntime. The main intention is that it’ll be easier for developers to contribute updates (for e.g. for newer or older GPUs). It wasn’t easy to find or modify this code previously, since it was buried deep inside Easy Diffusion’s internals.

Jan 04 19:57 2025

#easydiffusion
#amd
#directml

Spent most of the day doing some support work for Easy Diffusion, and experimenting with torch-directml for AMD support on Windows. From the initial experiments, torch-directml seems to work properly with Easy Diffusion. I ran it on my NVIDIA card, and another user ran it on their AMD Radeon RX 7700 XT. It’s 7-10x faster than the CPU, so looks promising. It’s 2x slower than CUDA on my NVIDIA card, but users with NVIDIA cards are not the target audience of this change.

Jan 03 15:38 2025

#easydiffusion
#ui
#v4

Spent a few days prototyping a UI for Easy Diffusion v4. Files are at this repo. The main focus was to get a simple but pluggable UI, that was backed by a reactive data model, and to allow splitting the codebase into individual components (with their own files). And require only a text editor and a browser to develop, i.e. no compilation or nodejs-based developer experiences.

Dec 17 11:03 2024

#easydiffusion
#v4
#ui

Notes on two directions for ED4’s UI that I’m unlikely to continue on. One is to start a desktop app with a full-screen webview (for the app UI). The other is writing the tabbed browser-like shell of ED4 in a compiled language (like Go or C++) and loading the contents of the tabs as regular webpages (by using webviews). So it would load URLs like http://localhost:9000/ui/image_editor and http://localhost:9000/ui/settings etc.

Dec 14 19:47 2024

#easydiffusion
#ui
#design
#v4

Worked on a few UI design ideas for Easy Diffusion v4. I’ve uploaded the work-in-progress mockups at https://github.com/easydiffusion/files. So far, I’ve mocked out the design for the outer skeleton. That is, the new tabbed interface, the status bar, and the unified main menu. I also worked on how they would look like on mobile devices. It gives me a rough idea of the Vue components that would need to be written, and the surface area that plugins can impact. For e.g. plugins can add a new menu entry only in the Plugins sub-menu.

Nov 21 15:17 2024

#easydiffusion
#stable-diffusion
#c++

Spent some more time on the v4 experiments for Easy Diffusion (i.e. C++ based, fast-startup, lightweight). stable-diffusion.cpp is missing a few features, which will be necessary for Easy Diffusion’s typical workflow. I wasn’t keen on forking stable-diffusion.cpp, but it’s probably faster to work on a fork for now. For now, I’ve added live preview and per-step progress callbacks (based on a few pending pull-requests on sd.cpp). And protection from GGML_ASSERT killing the entire process. I’ve been looking at the ability to load individual models (like the vae) without needing to reload the entire SD model.

Nov 19 19:18 2024

#easydiffusion
#stable-diffusion

Spent a few days getting a C++ based version of Easy Diffusion working, using stable-diffusion.cpp. I’m working with a fork of stable-diffusion.cpp here, to add a few changes like per-step callbacks, live image previews etc. It doesn’t have a UI yet, and currently hardcodes a model path. It exposes a RESTful API server (written using the Crow C++ library), and uses a simple task manager that runs image generation tasks on a thread. The generated images are available at an API endpoint, and it shows the binary JPEG/PNG image (instead of base64 encoding).

Oct 16 18:10 2024

#stable-diffusion
#c++
#cuda
#easydiffusion
#lab
#performance
#featured

tl;dr - Today, I worked on using stable-diffusion.cpp in a simple C++ program. As a linked library, as well as compiling sd.cpp from scratch (with and without CUDA). The intent was to get a tiny and fast-starting executable UI for Stable Diffusion working. Also, ChatGPT is very helpful! Part 1: Using sd.cpp as a library First, I tried calling the stable-diffusion.cpp library from a simple C++ program (which just loads the model and renders an image). Via dynamic linking. That worked, and its performance was the same as the example sd.exe CLI, and it detected and used the GPU correctly.

Sep 04 15:20 2024

#easydiffusion
#ai
#lab
#performance
#featured

tl;dr: Explored a possible optimization for Flux with diffusers when using enable_sequential_cpu_offload(). It did not work. While trying to use Flux (nearly 22 GB of weights) with diffusers on a 12 GB graphics card, I noticed that it barely used any GPU memory when using enable_sequential_cpu_offload(). And it was super slow. It turns out that the largest module in Flux’s transformer model is around 108 MB, so because diffusers streams modules one-at-a-time, the peak VRAM usage never crossed above a few hundred MBs.

~ / cmdr2

projects: freebird, easy diffusion

hacks: carbon editor, torchruntime, findstarlink