~ / cmdr2

projects: freebird, easy diffusion

hacks: carbon editor, torchruntime, findstarlink

  • #worklog
  • #findstarlink

Migrated findstarlink.com back to S3 (from Cloudflare Pages), and started rewriting the website to improve loading speed.

  • #screenrecorder
  • #worklog
  • #tkinter

Built a simple screen recorder for myself using Python and Tkinter, and a few Windows-specific calls (via ctypes). I wanted something just like the Windows Snipping Tool, but with a few customizations for my workflow.
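To give a flavour of what "Windows-specific calls (via ctypes)" means here, this is a hedged sketch (not the actual recorder's code) of one such call a screen recorder typically needs: querying the virtual-screen bounds so the capture region can span all monitors. The function name and structure are mine, not from the tool.

```python
# Hedged sketch: querying the Windows virtual-screen bounds via ctypes,
# the kind of Win32 call a Tkinter screen recorder needs for multi-monitor
# capture. Not the actual recorder's code.
import sys
import ctypes

# GetSystemMetrics indices for the virtual screen (the full multi-monitor desktop)
SM_XVIRTUALSCREEN, SM_YVIRTUALSCREEN = 76, 77
SM_CXVIRTUALSCREEN, SM_CYVIRTUALSCREEN = 78, 79

def virtual_screen_rect():
    """Return (x, y, width, height) of the full desktop, or None off-Windows."""
    if sys.platform != "win32":
        return None
    user32 = ctypes.windll.user32
    return (
        user32.GetSystemMetrics(SM_XVIRTUALSCREEN),
        user32.GetSystemMetrics(SM_YVIRTUALSCREEN),
        user32.GetSystemMetrics(SM_CXVIRTUALSCREEN),
        user32.GetSystemMetrics(SM_CYVIRTUALSCREEN),
    )
```

The nice thing about ctypes for a small tool like this is that it avoids pulling in pywin32 as a dependency just for a handful of Win32 calls.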

  • #easydiffusion
  • #admin
  • #worklog

// Cross-posted from Easy Diffusion’s blog. Cleared the backlog of stale issues on ED’s GitHub repo, bringing the number of open issues down from ~350 to 74. A number of those suggestions and issues are already being tracked on my task board. The others had either been fixed already, or were so old that replying was no longer relevant. While I’d genuinely have wanted to solve all of those unresolved issues, I was on a break from this project for nearly 1.5 years, so unfortunately it is what it is.

  • #ggml
  • #worklog

Added support for float16 ADD/SUB/MUL/DIV operations in the CUDA backend of ggml. Also fixed the CPU implementation of these operations in float16 to work with repeating tensors, and added test cases. PR: https://github.com/ggml-org/ggml/pull/1121

Discussed making ggml-cpu.c into a C++ file, so that we can use function templates to de-duplicate a huge amount of code in that file.

Also worked on adding float16 support (in CUDA and CPU) for a number of unary operators, like SQRT, RELU, GELU, SIGMOID, LOG, COS, CLAMP etc. It seems to be passing the tests, so will propose this as a PR soon.
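For anyone unfamiliar with the "repeating tensors" case: when the second operand is smaller than the first, its elements repeat (broadcast) across it, so each output element has to index the smaller tensor modulo its size. A pure-Python illustration of that semantics (not actual ggml code, which works on raw rows in C):

```python
# Illustration only, not ggml code: the "repeating tensor" case the CPU
# float16 path needed. When src1 is smaller than src0, its elements repeat,
# so each dst element reads src1[i % len(src1)].
def add_f16_repeating(src0, src1):
    n1 = len(src1)
    return [src0[i] + src1[i % n1] for i in range(len(src0))]

# a 2x4 "tensor" flattened, with a length-4 row repeated over both rows
src0 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
src1 = [10.0, 20.0, 30.0, 40.0]
print(add_f16_repeating(src0, src1))
# -> [10.0, 21.0, 32.0, 43.0, 14.0, 25.0, 36.0, 47.0]
```

A non-repeating-aware implementation would instead read past the end of src1 (or need src1 padded to the full size), which is essentially the bug class the fix and test cases cover.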

  • #cuda
  • #worklog

Good tutorial for understanding the basics of CUDA: https://www.pyspur.dev/blog/introduction_cuda_programming. It also links to NVIDIA’s simple tutorial.

Implemented a simple float16 addition kernel in CUDA at https://github.com/cmdr2/study/blob/main/ml/cuda/half_add.cu. Compile it with: nvcc -o half_add half_add.cu
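For reference, a minimal sketch of what such a float16 addition kernel looks like (my own condensed version, written from the standard `cuda_fp16.h` API; see the linked half_add.cu for the real file):

```cuda
// Minimal float16 (half) addition kernel sketch. Needs cuda_fp16.h;
// __hadd does a native fp16 add on sm_53 and newer.
#include <cuda_fp16.h>
#include <cstdio>

__global__ void half_add(const __half* a, const __half* b, __half* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = __hadd(a[i], b[i]);
    }
}

int main() {
    const int n = 8;
    __half a[n], b[n], c[n];
    for (int i = 0; i < n; i++) {
        a[i] = __float2half((float) i);
        b[i] = __float2half(1.0f);
    }

    __half *da, *db, *dc;
    cudaMalloc(&da, n * sizeof(__half));
    cudaMalloc(&db, n * sizeof(__half));
    cudaMalloc(&dc, n * sizeof(__half));
    cudaMemcpy(da, a, n * sizeof(__half), cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, n * sizeof(__half), cudaMemcpyHostToDevice);

    half_add<<<1, n>>>(da, db, dc, n);
    cudaMemcpy(c, dc, n * sizeof(__half), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; i++) printf("%g ", __half2float(c[i]));
    printf("\n");

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```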

  • #easydiffusion
  • #sdkit
  • #freebird
  • #worklog

// Cross-posted from Easy Diffusion’s blog. Continued to test and fix issues in sdkit, after the change to support DirectML. The change is fairly intrusive, since it replaces direct references to torch.cuda with a layer of abstraction. Fixed a few regressions, and it now passes all the regression tests for CPU and CUDA support (i.e. existing users). Will test for DirectML next, although it will fail (with out-of-memory) for anything but the simplest tests (since DirectML is quirky with memory allocation).
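To show the shape of that abstraction layer, here is a hedged sketch with hypothetical names (not sdkit's actual API): callers ask a backend object for device operations instead of calling torch.cuda directly, so a DirectML backend can slot in alongside CUDA and CPU.

```python
# Hedged sketch, hypothetical names -- not sdkit's actual code. Direct
# torch.cuda calls are hidden behind backend objects, so adding DirectML
# means adding a backend rather than touching every call site.

class CpuBackend:
    name = "cpu"
    def is_available(self):
        return True                  # CPU is always usable
    def empty_cache(self):
        pass                         # nothing to release

class CudaBackend:
    name = "cuda"
    def __init__(self, available=False):
        self._available = available  # real code would ask torch.cuda.is_available()
    def is_available(self):
        return self._available
    def empty_cache(self):
        pass                         # real code would call torch.cuda.empty_cache()

class DirectMLBackend:
    name = "directml"
    def __init__(self, available=False):
        self._available = available  # real code would probe for torch_directml
    def is_available(self):
        return self._available
    def empty_cache(self):
        pass                         # DirectML handles memory very differently

def resolve_backend(backends):
    """Return the first available backend; callers never touch torch.cuda directly."""
    for backend in backends:
        if backend.is_available():
            return backend
    raise RuntimeError("no usable compute backend")

# e.g. prefer DirectML, then CUDA, then CPU
chosen = resolve_backend([DirectMLBackend(), CudaBackend(), CpuBackend()])
print(chosen.name)   # neither GPU backend reports available here, so "cpu"
```

The intrusive part is exactly this: every place that previously assumed torch.cuda has to go through the backend object instead, which is why regressions showed up in the CPU and CUDA paths.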