~ / cmdr2

projects: freebird, easy diffusion

hacks: carbon editor, torchruntime, findstarlink

  • #easydiffusion
  • #admin
  • #worklog

// Cross-posted from Easy Diffusion’s blog. Cleared the backlog of stale issues on ED’s GitHub repo. This brought down the number of open issues from ~350 to 74. A number of those suggestions and issues are already being tracked on my task board. The others had either been fixed, or were really old (i.e. no longer relevant to reply to). While I’d have genuinely wanted to solve all of those unresolved issues, I was on a break from this project for nearly 1.5 years, so unfortunately it is what it is.

  • #tensorRT
  • #torch
  • #easydiffusion
  • #ggml
  • #cuda
  • #vulkan

// Cross-posted from Easy Diffusion’s blog. Experimented with TensorRT-RTX (a new library offered by NVIDIA). The first step was a tiny toy model, just to get the build and test setup working. The reference model in PyTorch:

    import torch
    import torch.nn as nn

    class TinyCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(3, 8, 3, stride=1, padding=1)
            self.relu = nn.ReLU()
            self.pool = nn.AdaptiveAvgPool2d((1, 1))
            self.fc = nn.Linear(8, 4)  # 4-class toy output

        def forward(self, x):
            x = self.relu(self.conv(x))
            x = self.pool(x).flatten(1)
            return self.fc(x)

I ran this on an NVIDIA 4060 8 GB (Laptop) for 10K iterations, on Windows and WSL-with-Ubuntu, with float32 data.
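A run of 10K iterations like the one described could be timed with a small harness along these lines. This is only a sketch: the post does not show how the timing was actually done, and `benchmark`, `step`, `warmup` and `sync` are hypothetical names introduced here for illustration.

```python
import time
from typing import Callable, Dict, Optional


def benchmark(
    step: Callable[[], object],
    iterations: int = 10_000,
    warmup: int = 100,
    sync: Optional[Callable[[], None]] = None,
) -> Dict[str, float]:
    """Time `step` over `iterations` calls and report total and mean latency."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")

    # Warm-up calls are executed but excluded from the measurement.
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()  # drain any queued asynchronous work before starting the clock

    start = time.perf_counter()
    for _ in range(iterations):
        step()
    if sync is not None:
        sync()  # wait for outstanding work so that it is included in the total
    total_s = time.perf_counter() - start

    return {
        "iterations": float(iterations),
        "total_s": total_s,
        "mean_ms": total_s / iterations * 1000.0,
    }


if __name__ == "__main__":
    # Stand-in CPU workload; a real run would call the model's forward pass here.
    stats = benchmark(lambda: sum(range(1000)), iterations=1000, warmup=10)
    print(f"{stats['mean_ms']:.4f} ms per iteration")
```

With PyTorch on a CUDA device, `step` would wrap the forward pass and `sync` could be `torch.cuda.synchronize`, since GPU kernels are launched asynchronously and the clock would otherwise stop before the work has finished.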

  • #easydiffusion
  • #blog

// Cross-posted from Easy Diffusion’s blog. Development update for Easy Diffusion - It’s chugging along in fits and starts. Broadly, there are three tracks: Maintenance: The past few months have seen increased support for AMD, Intel and integrated GPUs. This includes AMD on Windows. Added support for the new AMD 9060/9070 cards last week, and the new NVIDIA 50xx cards in March. Flux to the main branch / release v3.5 to stable: Right now, Flux / v3.5 still requires you to enable ED beta first, and then install Forge. Last week I got Flux working in our main engine (with decent rendering speed). It still needs more work to support all the different model formats for Flux. Using Forge was a temporary arrangement, until Flux worked in our main engine.

  • #easydiffusion

// Cross-posted from Easy Diffusion’s blog. Upgraded the default version of Easy Diffusion to Python 3.9. Newer versions of torch don’t support Python 3.8, so this became urgent after the release of NVIDIA’s 50xx series GPUs. I chose 3.9 as a temporary fix (instead of a newer Python version), since it had the fewest package conflicts. The future direction of Easy Diffusion’s backend is unclear right now - there are a bunch of possible paths. So I didn’t want to spend too much time on this. I also wanted to minimize the risk to existing users.

  • #easydiffusion
  • #sdkit
  • #amd
  • #torchruntime
  • #windows
  • #intel
  • #integrated
  • #directml

// Cross-posted from Easy Diffusion’s blog. Easy Diffusion (and sdkit) now also support AMD on Windows automatically (using DirectML), thanks to integrating with torchruntime. It also supports integrated GPUs (Intel and AMD) on Windows, making Easy Diffusion faster on PCs without dedicated graphics cards.

  • #easydiffusion
  • #torchruntime
  • #sdkit

// Cross-posted from Easy Diffusion’s blog. Spent the last week or two getting torchruntime fully integrated into Easy Diffusion, and making sure that it handles all the edge-cases. Easy Diffusion now uses torchruntime to automatically install the best-possible version of torch (on the user’s computer) and support a wider variety of GPUs (as well as older GPUs). And it uses a GPU-agnostic device API, so Easy Diffusion will automatically support additional GPUs when they are supported by torchruntime.

  • #easydiffusion
  • #sdkit
  • #freebird
  • #worklog

// Cross-posted from Easy Diffusion’s blog. Continued to test and fix issues in sdkit, after the change to support DirectML. The change is fairly intrusive, since it replaces direct references to torch.cuda with a layer of abstraction. Fixed a few regressions, and it now passes all the regression tests for CPU and CUDA support (i.e. existing users). Will test for DirectML next, although it will fail (with out-of-memory) for anything but the simplest tests (since DirectML is quirky with memory allocation).
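To illustrate what putting a layer of abstraction in front of torch.cuda can look like, here is a minimal sketch. It is not sdkit's actual code: `Backend`, `register` and `pick_backend` are hypothetical names, and only the `torch.cuda.is_available()` and `torch.cuda.empty_cache()` calls are real PyTorch APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Backend:
    """The small set of device operations the rest of the code may call."""
    name: str
    is_available: Callable[[], bool]
    empty_cache: Callable[[], None]


_BACKENDS: Dict[str, Backend] = {}


def register(backend: Backend) -> None:
    """Add (or replace) a backend under its name."""
    _BACKENDS[backend.name] = backend


def pick_backend(preferred: Iterable[str] = ("cuda", "cpu")) -> Backend:
    """Return the first registered backend in `preferred` that reports available."""
    for name in preferred:
        backend = _BACKENDS.get(name)
        if backend is not None and backend.is_available():
            return backend
    raise RuntimeError("no registered backend is available")


def _cuda_available() -> bool:
    # torch is imported lazily so this module also loads where torch is absent.
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def _cuda_empty_cache() -> None:
    import torch
    torch.cuda.empty_cache()


# Callers go through a Backend instead of touching torch.cuda directly.
register(Backend("cuda", _cuda_available, _cuda_empty_cache))
register(Backend("cpu", lambda: True, lambda: None))

if __name__ == "__main__":
    backend = pick_backend()
    backend.empty_cache()
    print("using backend:", backend.name)
```

Under this design, supporting another device type means registering one more `Backend`, without changing the call sites.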

  • #easydiffusion
  • #sdkit

// Cross-posted from Easy Diffusion’s blog. Worked on adding support for DirectML in sdkit. This allows AMD GPUs and integrated GPUs to generate images on Windows. DirectML seems really inefficient with memory, though. So for now it only manages to generate images using SD 1.5. XL and larger models fail to generate, even though I have 12 GB of VRAM in my graphics card.

  • #rocm
  • #pytorch
  • #easydiffusion
  • #torchruntime

// Cross-posted from Easy Diffusion’s blog. Continued from Part 1. Spent a few days figuring out how to compile binary wheels of PyTorch and include all the necessary libraries (ROCm libs or CUDA libs). tl;dr - In Part 2, the compiled PyTorch wheels now include the required libraries (including ROCm). But this isn’t over yet. Torch starts now, but adding two numbers with it produces garbage values (on the GPU). There’s probably a bug in the included rocBLAS version, so I might need to recompile rocBLAS for gfx803 separately. Will tackle that in Part 3 (tbd).
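The "adding two numbers produces garbage" symptom can be checked by computing the same sum on the CPU and on the GPU and comparing the results. The sketch below assumes that approach. `values_match` and `device_add_is_sane` are hypothetical helper names, and the device string `"cuda"` is used because ROCm builds of PyTorch expose the GPU through the `torch.cuda` interface.

```python
import math
from typing import Sequence


def values_match(
    reference: Sequence[float],
    candidate: Sequence[float],
    rel_tol: float = 1e-5,
    abs_tol: float = 1e-6,
) -> bool:
    """True if both sequences have equal length and agree element-wise."""
    if len(reference) != len(candidate):
        return False
    # math.isclose is False for NaN, so NaN garbage is reported as a mismatch.
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


def device_add_is_sane(device: str = "cuda", n: int = 1024) -> bool:
    """Add two random vectors on the CPU and on `device`, then compare."""
    import torch  # imported lazily; requires a torch build that supports `device`

    a = torch.rand(n)
    b = torch.rand(n)
    expected = (a + b).tolist()                             # CPU reference
    actual = (a.to(device) + b.to(device)).cpu().tolist()   # device result
    return values_match(expected, actual)


if __name__ == "__main__":
    # Pure-Python demonstration of the comparison itself.
    print(values_match([0.5, 1.5], [0.5, 1.5]))
    print(values_match([0.5, 1.5], [0.5, 7.0]))
```

A `False` result from `device_add_is_sane()` would reproduce the problem in one call, which is convenient when testing successive rebuilds.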

  • #rocm
  • #pytorch
  • #easydiffusion
  • #torchruntime

// Cross-posted from Easy Diffusion’s blog. Continued in Part 2, where I figured out how to include the required libraries in the wheel. Spent all of yesterday trying to compile PyTorch with the compile-time PYTORCH_ROCM_ARCH=gfx803 environment variable. tl;dr - In Part 1, I compiled wheels for PyTorch with ROCm, in order to add support for older AMD cards like RX 480. I managed to compile the wheels, but the wheel doesn’t include the required ROCm libraries. I figured that out in Part 2.
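Since PYTORCH_ROCM_ARCH is read at compile time, it has to be present in the environment of the build process. The following is a minimal sketch of preparing that environment from Python. The post does not show the actual build invocation, so `rocm_build_env` and `wheel_build_command` are hypothetical helpers, and the `setup.py bdist_wheel` command is an assumption about how the wheel is produced.

```python
import os
import sys
from typing import Dict, List, Mapping, Optional


def rocm_build_env(
    arch: str = "gfx803",
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with PYTORCH_ROCM_ARCH set to `arch`."""
    env = dict(os.environ if base is None else base)
    env["PYTORCH_ROCM_ARCH"] = arch
    return env


def wheel_build_command() -> List[str]:
    # Assumption: the wheel is built with setup.py from a PyTorch source checkout.
    return [sys.executable, "setup.py", "bdist_wheel"]


if __name__ == "__main__":
    env = rocm_build_env("gfx803")
    print("PYTORCH_ROCM_ARCH =", env["PYTORCH_ROCM_ARCH"])
    print("would run:", " ".join(wheel_build_command()))
    # To actually build, pass `env` to subprocess.run(..., env=env, check=True)
    # with cwd set to the PyTorch source directory.
```

Copying the environment instead of mutating `os.environ` keeps the target architecture scoped to the build subprocess.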