~ / cmdr2

projects: freebird, easy diffusion

hacks: carbon editor, torchruntime, findstarlink

  • #freebird

Updates from June 2025

Note: Freebird is free for students! If you’re a student at a school or college, please feel free to email or message me for a free copy!

June 2025 marked a restart of the Freebird project, after a few months of maintenance-only fixes.

Reliability

My focus in June was on improving Freebird’s reliability. A number of long-standing critical bugs have been fixed, broken features have been repaired, and missing documentation has been updated. Basically, anything that crashed Freebird (or was urgently broken) was treated as an immediate priority.

  • #easydiffusion
  • #blog

Development update for Easy Diffusion - it’s chugging along in starts and stops. Broadly, there are three tracks:

Maintenance: The past few months have seen increased support for AMD, Intel and integrated GPUs, including AMD on Windows. Added support for the new AMD 9060/9070 cards last week, and the new NVIDIA 50xx cards in March.

Flux to the main branch / release v3.5 to stable: Right now, Flux / v3.5 still requires you to enable the ED beta first, and then install Forge. Last week I got Flux working in our main engine (with decent rendering speed). It still needs more work to support all the different model formats for Flux. Using Forge was a temporary arrangement until Flux worked in our main engine.

  • #vr
  • #dom
  • #webxr

Experimented with an idea for extending HTML/CSS/JS to define 3D scenes, treating a 3D scene as just a depth extension of the DOM. This explores a syntax for defining a 3D scene in a web browser (especially for VR) without WebXR boilerplate, with XR controller inputs handled as first-class browser events. I’ll explore a polyfill to support this on existing WebXR-compliant browsers.

My previous attempt at this idea (back in 2014) didn’t go so well. At that point I hadn’t built any VR experiences, and the syntax I came up with wasn’t very practical or productive at creating anything beyond toy-sized scenes. I’m curious to see if I can do better this time, since most of my work since then has been about building VR experiences.

  • #ggml

Spent the last few days refactoring ggml-cpu.c in ggml. The ggml-cpu.c file is currently a monolith with around 15,000 lines of code, and needs to be refactored into separate files and de-duplicated using C++ function templates. The first part of that refactoring was pushed earlier today - https://github.com/ggml-org/ggml/pull/1144

I also worked on the next two PRs - one that splits the SIMD mapping definitions and vectorized functions into separate files, and another that moves all the operator functions (except mul_mat) into a separate C++ file. I tested the combined effect of these two PRs, and they passed the runners on ggml-ci. Together, these two PRs will shrink ggml-cpu.c to around 5k lines (down from the current 15k).
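
As a rough illustration of what that de-duplication looks like (a made-up example, not the actual PR code; the real ggml kernels are more involved), a single C++ function template can replace per-type copies of the same loop:

```cpp
#include <cstdint>
#include <cstddef>

// One templated kernel instead of separate hand-written copies per data type
// (e.g. hypothetical vec_add_f32 / vec_add_i32 duplicates in a C monolith).
template <typename T>
static void vec_add(const T * a, const T * b, T * dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}

int main() {
    float   af[4] = {1, 2, 3, 4}, bf[4] = {5, 6, 7, 8}, cf[4];
    int32_t ai[4] = {1, 2, 3, 4}, bi[4] = {5, 6, 7, 8}, ci[4];

    vec_add(af, bf, cf, 4); // compiler instantiates vec_add<float>
    vec_add(ai, bi, ci, 4); // ... and vec_add<int32_t>
    return 0;
}
```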

  • #easydiffusion

Upgraded Easy Diffusion’s default Python version to 3.9. Newer versions of torch don’t support Python 3.8, so this became urgent after the release of NVIDIA’s 50xx-series GPUs. I chose 3.9 as a temporary fix (instead of a newer Python version), since it had the fewest package conflicts. The future direction of Easy Diffusion’s backend is unclear right now - there are a bunch of possible paths - so I didn’t want to spend too much time on this. I also wanted to minimize the risk to existing users.

  • #ggml

Part 2 in the “Simple introduction to ggml” series.

At the end of Part 1, we learnt how to keep the model weights separate from temporary computation-only tensor variables. This allowed the model weights to stay in memory across multiple predictions (which is the usual behavior of machine learning programs during inference). Now let’s modify that to build a simple neural network model using ggml. If you’re new to ggml, I recommend reading Part 1 first.
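
To make that weights-vs-computation split concrete, here’s a minimal sketch (my own, not code from the post; the layer sizes and values are made up) of a single linear layer y = relu(W·x + b) using ggml’s backend API, with the weights kept in their own context so they survive across predictions:

```cpp
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h" // newer ggml releases also need "ggml-cpu.h" for ggml_backend_cpu_init
#include <cstdio>

int main() {
    ggml_backend_t backend = ggml_backend_cpu_init();

    // weights context: metadata only (no_alloc); these tensors outlive each prediction
    ggml_init_params wp = { 2 * ggml_tensor_overhead(), NULL, true };
    ggml_context * wctx = ggml_init(wp);
    ggml_tensor * W = ggml_new_tensor_2d(wctx, GGML_TYPE_F32, 4, 2); // 4 inputs, 2 outputs
    ggml_tensor * b = ggml_new_tensor_1d(wctx, GGML_TYPE_F32, 2);
    ggml_backend_buffer_t wbuf = ggml_backend_alloc_ctx_tensors(wctx, backend);

    float Wd[8] = {.1f, .1f, .1f, .1f, .2f, .2f, .2f, .2f}, bd[2] = {.5f, .5f};
    ggml_backend_tensor_set(W, Wd, 0, ggml_nbytes(W));
    ggml_backend_tensor_set(b, bd, 0, ggml_nbytes(b));

    // compute context + graph: rebuilt (and freed) for every prediction
    ggml_init_params cp = { 8 * ggml_tensor_overhead() + ggml_graph_overhead(), NULL, true };
    ggml_context * cctx = ggml_init(cp);
    ggml_tensor * x = ggml_new_tensor_1d(cctx, GGML_TYPE_F32, 4);
    ggml_tensor * y = ggml_relu(cctx, ggml_add(cctx, ggml_mul_mat(cctx, W, x), b));
    ggml_cgraph * gf = ggml_new_graph(cctx);
    ggml_build_forward_expand(gf, y);

    // allocate the temporary tensors on the backend, set inputs, compute
    ggml_gallocr_t galloc = ggml_gallocr_new(ggml_backend_get_default_buffer_type(backend));
    ggml_gallocr_alloc_graph(galloc, gf);
    float xd[4] = {1, 2, 3, 4};
    ggml_backend_tensor_set(x, xd, 0, ggml_nbytes(x));
    ggml_backend_graph_compute(backend, gf);

    float out[2];
    ggml_backend_tensor_get(y, out, 0, sizeof(out));
    printf("y = [%f, %f]\n", out[0], out[1]);

    ggml_gallocr_free(galloc); ggml_free(cctx);      // per-prediction cleanup
    ggml_backend_buffer_free(wbuf); ggml_free(wctx); // weights freed only at exit
    ggml_backend_free(backend);
    return 0;
}
```

The key point is the two contexts: wctx holds the weights and is allocated once, while cctx holds the per-prediction scratch tensors and can be torn down after every compute.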

  • #ggml

A simple introduction to ggml. This is Part 1 in a series on ggml; you can read Part 2 after this one. This post uses the new “backend” API in ggml.

I wrote this to explain ggml to myself. I’m still learning about it, so please feel free to suggest any corrections!

Overall flow of a ggml program

At a very high level, a ggml program has the following steps:

Define the tensor variables
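
For orientation, here’s a compact skeleton of that flow (my own summary sketch against ggml’s public backend API, not code from the post), adding two small vectors end to end:

```cpp
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"

int main() {
    ggml_backend_t backend = ggml_backend_cpu_init();

    // 1. define the tensor variables (metadata only; no_alloc = true)
    ggml_init_params p = { 3 * ggml_tensor_overhead() + ggml_graph_overhead(), NULL, true };
    ggml_context * ctx = ggml_init(p);
    ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);

    // 2. define the computation graph
    ggml_tensor * sum = ggml_add(ctx, a, b);
    ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, sum);

    // 3. allocate memory on the backend, then set the input data
    ggml_gallocr_t galloc = ggml_gallocr_new(ggml_backend_get_default_buffer_type(backend));
    ggml_gallocr_alloc_graph(galloc, gf);
    float x[4] = {1, 2, 3, 4};
    ggml_backend_tensor_set(a, x, 0, sizeof(x));
    ggml_backend_tensor_set(b, x, 0, sizeof(x));

    // 4. run the computation and read the result back
    ggml_backend_graph_compute(backend, gf);
    float y[4];
    ggml_backend_tensor_get(sum, y, 0, sizeof(y));

    ggml_gallocr_free(galloc); ggml_free(ctx); ggml_backend_free(backend);
    return 0;
}
```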

  • #easydiffusion
  • #sdkit
  • #amd
  • #torchruntime
  • #windows
  • #intel
  • #integrated
  • #directml

Easy Diffusion (and sdkit) now also supports AMD on Windows automatically (using DirectML), thanks to integrating with torchruntime. It also supports integrated GPUs (Intel and AMD) on Windows, making Easy Diffusion faster on PCs without dedicated graphics cards.

  • #easydiffusion
  • #torchruntime
  • #sdkit

Spent the last week or two getting torchruntime fully integrated into Easy Diffusion, and making sure that it handles all the edge cases. Easy Diffusion now uses torchruntime to automatically install the best possible version of torch for the user’s computer, and to support a wider variety of GPUs (as well as older GPUs). It also uses a GPU-agnostic device API, so Easy Diffusion will automatically support additional GPUs as torchruntime adds support for them.

  • #easydiffusion
  • #sdkit

Worked on adding support for DirectML in sdkit. This allows AMD GPUs and integrated GPUs to generate images on Windows. DirectML seems really inefficient with memory though, so for now it only manages to generate images with SD 1.5; XL and larger models fail to generate, even though my graphics card has 12 GB of VRAM.