~ / cmdr2

projects: freebird, easy diffusion

hacks: carbon editor, torchruntime, findstarlink

  • #mlir
  • #easydiffusion
  • #sdkit

Good post on using MLIR to compile ML models for GPUs. It gives a good broad overview of GPU architecture and how MLIR fits into it. The overall series looks pretty interesting too! Making a note here for future reference - https://www.stephendiehl.com/posts/mlir_gpu/

  • #easydiffusion
  • #samplers
  • #c++

Wrote a fresh implementation of most of the popular samplers and schedulers used for image generation (Stable Diffusion and Flux) at https://github.com/cmdr2/samplers.cpp. A few other schedulers (like Align Your Steps) have been left out for now, but are pretty easy to implement. It’s still a work in progress, and isn’t ready for public use. The algorithmic port is complete; the next step is to test the output values against reference values from another implementation (e.g. Forge WebUI). After that, I’ll translate it to C++.
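As a rough illustration of what these samplers do (not the actual samplers.cpp code - function names and the sigma values here are just illustrative), a plain Euler sampler stepping along a Karras-style sigma schedule looks something like this:

```python
def karras_sigmas(n, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    """Karras et al. style schedule: interpolate in sigma^(1/rho) space."""
    ramp = [i / (n - 1) for i in range(n)]
    min_inv = sigma_min ** (1 / rho)
    max_inv = sigma_max ** (1 / rho)
    # Descending sigmas, with a final 0.0 appended (fully denoised).
    return [(max_inv + t * (min_inv - max_inv)) ** rho for t in ramp] + [0.0]

def euler_sample(denoise, x, sigmas):
    """Plain Euler sampler: step x along the probability-flow ODE."""
    for i in range(len(sigmas) - 1):
        denoised = denoise(x, sigmas[i])         # model's estimate of the clean sample
        d = (x - denoised) / sigmas[i]           # derivative dx/dsigma
        x = x + d * (sigmas[i + 1] - sigmas[i])  # Euler step towards the next sigma
    return x
```

With a toy "denoiser" that always predicts 0.0, each Euler step scales x by sigmas[i+1]/sigmas[i], so the chain telescopes down to zero - a handy sanity check when testing against reference values.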

  • #easydiffusion
  • #sdkit
  • #compilers

Some notes on machine-learning compilers, gathered while researching tech for Easy Diffusion’s next engine (i.e. sdkit v3). For context, see the design constraints of the new engine.

tl;dr - the current state is:
  • Vendor-specific compilers are the only performant options on consumer GPUs right now. E.g. TensorRT-RTX for NVIDIA, MiGraphX for AMD, OpenVINO for Intel.
  • Cross-vendor compilers (e.g. TVM, IREE, XLA) are just not performant enough right now for Stable Diffusion-class workloads on consumer GPUs. Their focus seems to be either datacenter hardware or embedded devices, and their performance on desktops and laptops is pretty poor.
  • Mojo doesn’t target this category (and doesn’t support Windows). Probably because datacenters and embedded devices are currently where the attention (and money) is.

  • #easydiffusion
  • #sdkit
  • #engine

The design constraints for Easy Diffusion’s next engine (i.e. sdkit v3) are:
  • Lean: install size under 200 MB uncompressed (excluding models).
  • Fast: performance within 10% of the best-possible speed on that GPU for that model.
  • Capable: supports Stable Diffusion 1.x, 2.x, 3.x, XL, Flux, Chroma, ControlNet, LoRA, Embeddings, VAE. Supports loading custom model weights (from civitai etc), and memory offloading (for smaller GPUs).
  • Targets: desktops and laptops, Windows/Linux/Mac, NVIDIA/AMD/Intel/Apple.

I think it’s possible, using ML compilers like TensorRT-RTX (and similar compilers for other platforms). See: Some notes on ML compilers.

  • #tailscale
  • #networking

Tailscale is genuinely super well-made. It’s crazy how well it works.

  • #freebird
  • #vr
  • #api

Freebird v2.2.2 released. It now exposes the states/values of the VR buttons (as custom properties) on FB-Controller-Right and FB-Controller-Left (see: XR Tracking Objects). These values are updated every frame while VR is running. You can use these properties to drive shapekeys, or use them in other scripts. To drive a shapekey, right-click a property (e.g. ’trigger’) and click Copy as New Driver, then right-click on your shapekey value and select Paste Driver. To use it in a script, read the custom property directly, e.g. bpy.data.objects["FB-Controller-Right"]["trigger"]
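For the script route, a minimal sketch of reading the trigger each frame and writing it into a shapekey could look like the following. The object name "Face" and shapekey name "Smile" are placeholders, and the bpy import is guarded so the mapping logic also runs outside Blender:

```python
# Sketch: drive a shapekey from the Freebird trigger value on every frame.
try:
    import bpy
except ImportError:
    bpy = None  # allows the pure-Python parts to run outside Blender

def trigger_to_shapekey(trigger, lo=0.0, hi=1.0):
    """Map a trigger value (0..1) into the shapekey's [lo, hi] range, clamped."""
    return lo + max(0.0, min(1.0, trigger)) * (hi - lo)

def on_frame(scene, *args):
    # "Face" and "Smile" are placeholder names - use your own object/shapekey.
    trigger = bpy.data.objects["FB-Controller-Right"]["trigger"]
    face = bpy.data.objects["Face"]
    face.data.shape_keys.key_blocks["Smile"].value = trigger_to_shapekey(trigger)

if bpy is not None:
    # Run on_frame after every frame change (e.g. during playback).
    bpy.app.handlers.frame_change_post.append(on_frame)
```

The driver route (Copy as New Driver / Paste Driver) is simpler for a single shapekey; a handler like this is more useful when one script needs to fan the button values out to several places.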

  • #freebird

Freebird v2.2.0 released - Freebird now exposes the VR headset and controller positions via three empty objects in the scene: FB-Headset, FB-Controller-Right, and FB-Controller-Left. These three empties live-track the position of the headset and the VR controllers. For example, you can attach objects to these empties to animate objects or bones.

  • #tkinter
  • #ui

Spent some time playing with Tkinter, and building a real desktop app with it. It’s pretty specific to my needs, but is open to customization by others. Building UIs with Tkinter was interesting (not frustrating), and it feels almost-there-but-not-quite-there. I still think that HTML/CSS/JS is the best API out there for UI (the good parts), but Tkinter’s mental model and API are quite nice too. Fairly intuitive.

  • #findstarlink
  • #performance
  • #ops

The migration of findstarlink.com to Cloudflare Pages hit an issue (that I can’t describe here), so I had to roll it back for “reasons”. Would’ve been a nice cost-saver, but for now it’ll stay on S3. The overall infrastructure of findstarlink (various components) is now quite streamlined, though, and pleasant to develop for again. I also hit an issue while trying to optimize the loading time of findstarlink.com’s homepage on slow internet connections. On such connections, it takes a long time to download and parse cities.js (600 KB uncompressed, 300 KB compressed), and the UI thread is blocked while that’s happening (often for 10+ seconds).

  • #cities
  • #findstarlink

Released cities-db, a database of ~32,000 cities (cities in the world with population > 15,000), compressed into a format suitable for auto-complete on web pages (~283 KB) or mobile apps. The data is fetched from GeoNames.org and processed into a custom format. Why? This library was created for findstarlink.com. Using the Google Maps API for auto-complete would be pretty expensive, as would hosting a dedicated API endpoint - and I don’t see why we need a remote service for this.
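The auto-complete lookup itself doesn’t need anything fancy once the data is local - a sorted list plus binary search handles prefix queries fine. A minimal sketch (with a tiny hardcoded stand-in list; the real cities-db format and field layout are its own thing):

```python
import bisect

# Tiny stand-in for the real database: (lowercase name, display name, lat, lon).
CITIES = sorted([
    ("bengaluru", "Bengaluru", 12.97, 77.59),
    ("berlin", "Berlin", 52.52, 13.40),
    ("bern", "Bern", 46.95, 7.45),
    ("boston", "Boston", 42.36, -71.06),
])

def autocomplete(prefix, limit=5):
    """Return up to `limit` display names of cities starting with `prefix`."""
    prefix = prefix.lower()
    # Binary-search for the first entry >= prefix, then scan while it matches.
    i = bisect.bisect_left(CITIES, (prefix,))
    results = []
    while i < len(CITIES) and CITIES[i][0].startswith(prefix):
        results.append(CITIES[i][1])
        if len(results) == limit:
            break
        i += 1
    return results
```

For example, `autocomplete("ber")` returns `["Berlin", "Bern"]`. Since the whole sorted list fits in a few hundred KB, every keystroke is an in-memory lookup - no remote service needed.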