~ / cmdr2


  • #easydiffusion
  • #worklog

Upgraded stable-diffusion.cpp in Easy Diffusion (and sdkit) to the latest version. This adds support for two new models, Lens and PiD, along with a bunch of bug fixes (e.g. Chroma rendering has been fixed). It also brings back --diffusion-fa by default, which speeds up rendering, and Chroma seems to be rendering faster than before.

  • #easydiffusion
  • #worklog

Just completed another round of fixing support issues in Easy Diffusion. Today’s evidence continues to convince me to move Easy Diffusion away from the Python ecosystem for AI inference on end-user PCs, and definitely to stay away from conda, which is so leaky. Both are excellent in their own right (especially for training and research), but I don’t think they make sense for end-user inference. Easy Diffusion’s v4 full rollout (which drops Python, torch and conda) can’t come soon enough.

  • #sdkit
  • #easydiffusion

Released Easy Diffusion v4.3 (which updates to sdkit v3.2). This adds support for Ernie Image (and Ernie Turbo), as well as improved support for Anima models. It also includes a bunch of bug fixes in the rendering engine (i.e. stable-diffusion.cpp), and a few community-contributed bug fixes to the UI.

  • #easydiffusion

Development update for Easy Diffusion: the beta branch has been merged into main, so this releases v3.5 (webui) and v4 to everyone. This shouldn’t affect existing users who’re on the main branch, i.e. people using the v3 engine will continue doing so. The two engines (v3.5 and v4) are marked as optional, so new users will continue to get and use v3 by default. The main purpose of this update is to merge the two forked codebases that we’ve had for over 1.5 years. Now the main and beta branches are back in sync. This brings back the streamlined release process that we had previously, where new changes would first land in beta, and then get merged into main after testing.

  • #easydiffusion
  • #sdkit
  • #worklog

Got Easy Diffusion v4 working on Apple Silicon and Intel Macs. The performance ratio (vs ED v3) is similar to the ratio on Windows (with CUDA) and other deployment targets, which points to optimization opportunities in sd.cpp. It’s currently about 1.5x slower than diffusers-based Stable Diffusion. In other news, easyinstaller is also out with its first release, which means that Easy Diffusion can now start shipping AppImage, Flatpak, rpm, deb, pkg, dmg etc. for the different platforms, instead of requiring Linux and Mac users to use the terminal to install and start Easy Diffusion. Will work on this soon.

  • #easydiffusion
  • #worklog
  • #v4

Started the long-pending rewrite of Easy Diffusion’s server code. v4 aims to replace the Python (and PyTorch) based server with a simple C++ version. The reasons for rewriting the server in C++ are to achieve sub-second startup time for the UI, and to reduce the download size (no need to distribute Python along with Easy Diffusion, or to mess with conda/venv etc.). It’s also something that I want to do as a matter of personal taste, i.e. de-bloating what doesn’t need to be bloated.

  • #easydiffusion
  • #sdkit
  • #worklog

For Z-Image, the stock version of chromaForge performs worse than sd.cpp, mainly because chromaForge isn’t able to run the smaller gguf quantized models that sd.cpp can run (chromaForge fails with the errors that I was fixing yesterday). If I really want to push through with this, it would be good to fix the remaining issues with gguf models in chromaForge. Only then can the performance be compared fairly (in order to decide whether to release this into ED 3.5). I want to compare the performance of the smaller gguf models, because that’s what ED’s users will typically run.

  • #easydiffusion
  • #sdkit
  • #worklog

Worked on fixing Z-Image support in ED’s fork of chromaForge (a fork of Forge WebUI). Fixed a number of integration issues. It’s now crashing on a matrix multiplication error, which looks like an incorrectly transposed matrix (probably due to reading the weights in the wrong order). I’ll try to install a stock version of chromaForge to see its raw performance with Z-Image (and whether it’s worth pursuing the integration), and also use it to help investigate the matrix multiplication error (and any future errors).
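
To illustrate the class of error described above (not the actual chromaForge code), here is a minimal pure-Python sketch of how a weight matrix stored as (out_features, in_features) breaks a plain `x @ W` multiplication, and how transposing it restores the expected shapes. The helper names (`matmul`, `transpose`, `linear`) and the shape check are assumptions made for illustration only.

```python
def shape(m):
    """Return (rows, cols) of a matrix stored as a list of rows."""
    return (len(m), len(m[0]))


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices; raise if the inner dimensions disagree."""
    if shape(a)[1] != shape(b)[0]:
        raise ValueError(f"shape mismatch: {shape(a)} @ {shape(b)}")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def linear(x, weight, in_features):
    """Compute x @ W, where W is expected as (in_features, out_features).

    Hypothetical fix-up: if the weight was read as
    (out_features, in_features), transpose it before multiplying.
    Note that this check cannot detect a wrong layout for square weights.
    """
    rows, cols = shape(weight)
    if rows != in_features and cols == in_features:
        weight = transpose(weight)
    return matmul(x, weight)
```

With `x = [[1, 2, 3]]` and a weight read as `[[1, 0, 1], [0, 1, 0]]` (2x3), calling `matmul(x, weight)` raises the shape mismatch, while `linear(x, weight, 3)` transposes the weight to 3x2 first and returns `[[4, 2]]`.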

  • #worklog
  • #easydiffusion

Collecting the worklog over the past few weeks. Enabled Flash-Attention and CPU offloading by default in sdkit3 (i.e. Easy Diffusion v4). Added optional VAE tiling (and VAE tile size configuration) via config.yaml in Easy Diffusion v4. Created Easy Diffusion’s fork of Forge WebUI, in order to apply the patches required to run with ED, and also to try adding new features like Z-Image (which are missing in the seemingly-abandoned main Forge repo). Improved the heuristics used for killing and restarting the backend child process, since /ping requests are unreliable if the backend is under heavy load. Merged a few PRs (1 2) for torchruntime that improve support for pinning pre-cu128 torch versions and fix the order of detection of DirectML and CUDA (CUDA is preferred). Added progress bars when downloading v4 backend artifacts.
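
As a rough illustration of why a restart heuristic can't rely on /ping alone, here is a minimal sketch of one possible decision rule. This is not Easy Diffusion's actual logic: the `BackendHealth` fields, the thresholds, and the `should_restart` function are all hypothetical. The idea is that failed pings only count as evidence of a hang once the backend has also shown no other activity for a while, with a longer allowance while it is busy rendering.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Hypothetical snapshot of what the parent process knows about the backend."""
    process_alive: bool                 # has the child process exited?
    consecutive_ping_failures: int      # /ping attempts that timed out in a row
    seconds_since_last_activity: float  # e.g. last log line or progress update
    is_rendering: bool                  # is a render task currently assigned?


def should_restart(health: BackendHealth,
                   max_ping_failures: int = 3,
                   idle_timeout: float = 30.0,
                   busy_timeout: float = 300.0) -> bool:
    """Decide whether to kill and restart the backend (illustrative thresholds)."""
    # A dead process always needs a restart.
    if not health.process_alive:
        return True
    # A few missed pings are expected under heavy load, so tolerate them.
    if health.consecutive_ping_failures < max_ping_failures:
        return False
    # Pings keep failing: only restart if the backend has also been silent
    # for longer than the allowance for its current state.
    timeout = busy_timeout if health.is_rendering else idle_timeout
    return health.seconds_since_last_activity > timeout
```

For example, a backend that has missed five pings but reported progress 10 seconds ago while rendering would be left alone, whereas an idle backend that has missed five pings and been silent for a minute would be restarted.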

  • #sdkit
  • #easydiffusion

The new engine that’ll power Easy Diffusion’s upcoming v4 release (i.e. sdkit3) has now been integrated into Easy Diffusion. It’s available to test by selecting the v4 engine in the Settings tab (after enabling Beta). Please press Save and restart Easy Diffusion after selecting this. It uses stable-diffusion.cpp and ggml under the hood, and produces optimized, lightweight builds for the target hardware. The main benefits of Easy Diffusion’s new engine are: