~ / cmdr2


  • #freebird
  • #worklog

Combining the worklogs for a few days.
Worked on testing Freebird and Puppetry with the new XR API changes coming in Blender 5.1 (related to making navigation_scale read-only). Continued discussing these changes and giving feedback to the Blender devs on their #xr chat channel.
Investigated why grease pencil strokes render incorrectly in VR by digging into Blender’s source code. Haven’t found the cause or a fix yet. It seems related to draw_grease_pencil_lib.glsl, but winmat and viewport_res don’t change when we change xr_session_state.navigation_scale or xr_session_settings.base_scale.
Added the ability to set the location of the camera preview in Puppetry.
Submitted a fix for the regression introduced in Blender 5.0.1, which causes Blender to crash when Freebird or Puppetry is started: https://projects.blender.org/blender/blender/pulls/152237

  • #worklog
  • #easydiffusion

Collecting the worklog over the past few weeks.
Enabled Flash-Attention and CPU offloading by default in sdkit3 (i.e. Easy Diffusion v4).
Added optional VAE tiling (with a configurable VAE tile size) via config.yaml in Easy Diffusion v4.
Created Easy Diffusion’s fork of Forge WebUI, to apply the patches required to run it with ED, and to try adding new features like Z-Image (which are missing from the seemingly-abandoned main Forge repo).
Improved the heuristics used for killing and restarting the backend child process, since /ping requests are unreliable when the backend is under heavy load.
Merged two PRs (1, 2) for torchruntime that improve support for pinning pre-cu128 torch versions, and fix the detection order of DirectML and CUDA (CUDA is now preferred).
Added progress bars when downloading the v4 backend artifacts.
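The post doesn't describe the new restart heuristic itself. Purely as an illustration, here is a minimal Python sketch of one plausible approach, assuming a design where a single failed /ping is never enough to kill the backend: pings must fail several times in a row and the backend must stay unresponsive for a grace period, while a child process that has actually exited is restarted immediately. The RestartPolicy class and its threshold values are hypothetical, not Easy Diffusion's code.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class RestartPolicy:
    """Hypothetical heuristic for deciding when to kill and restart a backend child process."""

    max_consecutive_failures: int = 5  # assumed value: tolerate a few slow /ping replies under load
    grace_period_s: float = 60.0  # assumed value: the failures must also span at least this long
    failures: int = 0
    first_failure_at: Optional[float] = None

    def record_ping(self, ok: bool, process_alive: bool = True, now: Optional[float] = None) -> bool:
        """Record one /ping result and return True if the backend should be restarted."""
        if not process_alive:
            return True  # the child process has exited: no ambiguity, restart right away
        now = time.monotonic() if now is None else now
        if ok:
            # Any successful ping resets the failure streak.
            self.failures = 0
            self.first_failure_at = None
            return False
        self.failures += 1
        if self.first_failure_at is None:
            self.first_failure_at = now
        # Require both a streak of failures and a sustained period of unresponsiveness,
        # so a backend that is merely busy (e.g. generating an image) isn't killed.
        return (
            self.failures >= self.max_consecutive_failures
            and now - self.first_failure_at >= self.grace_period_s
        )
```

For example, with RestartPolicy(max_consecutive_failures=3, grace_period_s=10), three failed pings within 9 seconds don't trigger a restart, but a fourth failure at the 12-second mark does. Combining a count with a time window is what makes the check robust to a loaded backend that answers late rather than not at all.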

  • #worklog
  • #freebird

Collecting the worklog over the past 2 weeks.
Added the ability to add and edit Text objects in Freebird while inside VR. This is useful for adding notes and labels while working in VR: https://x.com/freebirdxr/status/2004091164946059451
Added a “Camera Preview” feature in Puppetry, which shows the live view from the scene Camera while recording. This helps avoid surprises after recording, e.g. realizing that the movements weren’t captured correctly by the scene Camera.

  • #ml
  • #compiler
  • #onnx
  • #ggml
  • #sdkit
  • #worklog

Wrote a simple script to convert ONNX models to GGML. It auto-generates C++ code that calls the corresponding ggml function for each ONNX operator. The generated file can then be compiled and run like a normal C++ ggml program, and produces the same results as the original model in PyTorch. It can target multiple backends (CPU, CUDA, ROCm, Vulkan, Metal, etc.) by passing the corresponding CMake option when configuring with cmake -B, e.g. -D GGML_CUDA=1 for CUDA.
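The converter itself isn't included in the post. Here is a minimal Python sketch of the code-generation idea, assuming each ONNX node has already been read as an (op_type, inputs, outputs) tuple; with the onnx package, onnx.load(path).graph.node yields nodes with op_type, input and output fields. The op table covers only a few element-wise and unary ops, and the t_ naming scheme is hypothetical. It also ignores ONNX's NumPy-style broadcasting, which doesn't map one-to-one onto ggml's rule of repeating the second operand.

```python
import re

# Hypothetical subset of an ONNX op_type -> ggml function table.
BINARY_OPS = {"Add": "ggml_add", "Sub": "ggml_sub", "Mul": "ggml_mul", "Div": "ggml_div"}
UNARY_OPS = {"Relu": "ggml_relu", "Sqrt": "ggml_sqrt"}


def c_name(onnx_name: str) -> str:
    """Turn an ONNX tensor name (e.g. '/layer1/Add_output_0') into a valid C++ identifier."""
    return "t_" + re.sub(r"\W", "_", onnx_name)


def emit_node(op_type: str, inputs, outputs) -> str:
    """Emit one C++ statement that adds the ggml op for a single ONNX node to the graph."""
    if op_type in BINARY_OPS:
        fn, arity = BINARY_OPS[op_type], 2
    elif op_type in UNARY_OPS:
        fn, arity = UNARY_OPS[op_type], 1
    else:
        raise NotImplementedError(f"no ggml mapping for ONNX op {op_type!r}")
    if len(inputs) != arity:
        raise ValueError(f"{op_type} expects {arity} input(s), got {len(inputs)}")
    args = ", ".join(c_name(name) for name in inputs)
    return f"struct ggml_tensor * {c_name(outputs[0])} = {fn}(ctx, {args});"


def emit_graph(nodes) -> str:
    """nodes: (op_type, inputs, outputs) tuples in ONNX order (graph.node is topologically sorted)."""
    return "\n".join(emit_node(op, ins, outs) for op, ins, outs in nodes)


if __name__ == "__main__":
    nodes = [("Add", ["x", "bias"], ["h"]), ("Relu", ["h"], ["y"])]
    print(emit_graph(nodes))
    # struct ggml_tensor * t_h = ggml_add(ctx, t_x, t_bias);
    # struct ggml_tensor * t_y = ggml_relu(ctx, t_h);
```

Because the generated C++ only calls ggml functions, the backend choice stays entirely in the CMake configuration step, which is what lets the same generated file run on CPU, CUDA, Vulkan and so on.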

  • #worklog
  • #findstarlink

Migrated findstarlink.com back to S3 (from Cloudflare Pages), and started rewriting the website to improve loading speed.

  • #screenrecorder
  • #worklog
  • #tkinter

Built a simple screen recorder for myself using Python and Tkinter, and a few Windows-specific calls (via ctypes). I wanted something just like the Windows Snipping Tool, but with a few customizations for my workflow.

  • #easydiffusion
  • #admin
  • #worklog

Cleared the backlog of stale issues on ED’s GitHub repo. This brought the number of open issues down from ~350 to 74. A number of those suggestions and issues are already being tracked on my task board. The others had either been fixed or were very old (i.e. no longer relevant enough to reply to). While I would genuinely have liked to solve all of those unresolved issues, I was on a break from this project for nearly 1.5 years, so unfortunately it is what it is.

  • #ggml
  • #worklog

Added support for float16 ADD/SUB/MUL/DIV operations in ggml’s CUDA backend. Also fixed the CPU implementation of these float16 operations to work with repeating tensors, and added test cases. PR: https://github.com/ggml-org/ggml/pull/1121
Discussed converting ggml-cpu.c into a C++ file, so that function templates can be used to de-duplicate a large amount of code in that file.
Also worked on adding float16 support (in CUDA and CPU) for a number of unary operators, like SQRT, RELU, GELU, SIGMOID, LOG, COS, CLAMP etc. It seems to pass the tests, so I’ll propose it as a PR soon.
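To illustrate what "repeating tensors" means for these binary ops: as I understand ggml's rule, the second operand may be smaller than the first, as long as each of its dimensions divides the corresponding dimension of the first, and it is then tiled to match. Below is a NumPy sketch of that semantics in float16. It's a reference model for intuition (handy for computing expected values in tests), not ggml's implementation, and it uses NumPy's dimension order rather than ggml's reversed ne[0..3] order.

```python
import numpy as np


def can_repeat(small_shape, big_shape) -> bool:
    """True if every dimension of big_shape is a whole multiple of the same dimension of small_shape."""
    return len(small_shape) == len(big_shape) and all(
        s > 0 and b % s == 0 for s, b in zip(small_shape, big_shape)
    )


def binary_op_repeat(op, src0, src1) -> np.ndarray:
    """Apply op element-wise in float16, tiling src1 across src0 (reference model only)."""
    src0 = np.asarray(src0, dtype=np.float16)
    src1 = np.asarray(src1, dtype=np.float16)
    if not can_repeat(src1.shape, src0.shape):
        raise ValueError(f"cannot repeat shape {src1.shape} to {src0.shape}")
    dst = np.empty(src0.shape, dtype=np.float16)
    for idx in np.ndindex(src0.shape):
        # Index src1 modulo its own shape, which is the same as tiling it along every dimension.
        j = tuple(i % s for i, s in zip(idx, src1.shape))
        dst[idx] = op(src0[idx], src1[j])
    return dst
```

For example, adding a (1, 4) row to a (3, 4) matrix repeats the row three times, and multiplying by a (3, 2) matrix repeats it twice along the last dimension; a (3, 3) operand is rejected because 3 doesn't divide 4.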

  • #cuda
  • #worklog

Good tutorial for understanding the basics of CUDA: https://www.pyspur.dev/blog/introduction_cuda_programming. It also links to NVIDIA’s simple tutorial. Implemented a simple float16 addition kernel in CUDA at https://github.com/cmdr2/study/blob/main/ml/cuda/half_add.cu. Compile it using nvcc -o half_add half_add.cu.
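The linked half_add.cu isn't reproduced here. To show the indexing that a basic 1-D element-wise kernel typically relies on, here is a CPU-side Python simulation using NumPy's float16: each simulated thread computes its global index as blockIdx.x * blockDim.x + threadIdx.x, and a bounds check covers the last, partially-filled block. The block size of 256 is an assumption, and the sequential loops stand in for threads that would run in parallel on the GPU.

```python
import numpy as np


def num_blocks(n: int, threads_per_block: int) -> int:
    """Ceiling division: enough blocks that num_blocks * threads_per_block >= n."""
    return (n + threads_per_block - 1) // threads_per_block


def half_add_simulated(a, b, threads_per_block: int = 256) -> np.ndarray:
    """Sequentially simulate a 1-D kernel in which each thread adds one pair of float16 values."""
    a = np.asarray(a, dtype=np.float16).ravel()
    b = np.asarray(b, dtype=np.float16).ravel()
    n = a.size
    out = np.empty(n, dtype=np.float16)
    for block_idx in range(num_blocks(n, threads_per_block)):  # plays the role of blockIdx.x
        for thread_idx in range(threads_per_block):  # plays the role of threadIdx.x
            i = block_idx * threads_per_block + thread_idx  # blockIdx.x * blockDim.x + threadIdx.x
            if i < n:  # threads past the end of the array have no element to process
                out[i] = a[i] + b[i]
    return out
```

For 1000 elements and 256 threads per block, this launches 4 blocks (1024 threads), and the last 24 threads skip the addition thanks to the bounds check.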

  • #easydiffusion
  • #sdkit
  • #freebird
  • #worklog

Continued testing and fixing issues in sdkit after the change to support DirectML. The change is fairly intrusive, since it replaces direct references to torch.cuda with a layer of abstraction. Fixed a few regressions, and it now passes all the regression tests for CPU and CUDA (i.e. existing users). Will test DirectML next, although I expect it to fail (with out-of-memory errors) on anything but the simplest tests, since DirectML is quirky with memory allocation.
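As a rough illustration of what replacing direct torch.cuda calls with an abstraction layer can look like, here is a minimal Python sketch. The Accelerator protocol, the backend classes and get_backend() are hypothetical names, not sdkit's actual API. torch is imported lazily so the sketch runs without it, and a DirectML backend would be registered in the same table (omitted here).

```python
from typing import Dict, Optional, Protocol


class Accelerator(Protocol):
    """Hypothetical interface that call sites use instead of calling torch.cuda directly."""

    name: str

    def empty_cache(self) -> None: ...

    def free_memory(self) -> Optional[int]: ...


class CPUBackend:
    name = "cpu"

    def empty_cache(self) -> None:
        pass  # nothing to release: there is no separate device cache on the CPU

    def free_memory(self) -> Optional[int]:
        return None  # not tracked in this sketch


class CUDABackend:
    name = "cuda"

    def empty_cache(self) -> None:
        import torch  # lazy import: only needed when the CUDA backend is actually used

        torch.cuda.empty_cache()

    def free_memory(self) -> Optional[int]:
        import torch

        free_bytes, _total_bytes = torch.cuda.mem_get_info()
        return free_bytes


# A DirectML backend would be added here too, behind the same interface.
_BACKENDS: Dict[str, type] = {"cpu": CPUBackend, "cuda": CUDABackend}


def get_backend(device: str) -> Accelerator:
    """Map a device string like "cuda:0" or "cpu" to its backend object."""
    kind = device.split(":", 1)[0]
    if kind not in _BACKENDS:
        raise ValueError(f"unsupported device: {device!r}")
    return _BACKENDS[kind]()
```

Call sites then write get_backend(device).empty_cache() instead of torch.cuda.empty_cache(), so supporting a new device type means adding one class rather than editing every call site. That's also why such a change is intrusive: every existing torch.cuda reference has to be routed through the new layer.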