~ / cmdr2

projects: freebird, easy diffusion

hacks: carbon editor, torchruntime, findstarlink

  • #tensorRT
  • #torch
  • #easydiffusion
  • #ggml
  • #cuda
  • #vulkan

// Cross-posted from Easy Diffusion’s blog. Experimented with TensorRT-RTX (a new library offered by NVIDIA). The first step was a tiny toy model, just to get the build and test setup working. The reference model in PyTorch:

    import torch
    import torch.nn as nn

    class TinyCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(3, 8, 3, stride=1, padding=1)
            self.relu = nn.ReLU()
            self.pool = nn.AdaptiveAvgPool2d((1, 1))
            self.fc = nn.Linear(8, 4)  # 4-class toy output

        def forward(self, x):
            x = self.relu(self.conv(x))
            x = self.pool(x).flatten(1)
            return self.fc(x)

I ran this on an NVIDIA 4060 8 GB (Laptop) GPU for 10K iterations, on Windows and on WSL (Ubuntu), using float32 data.
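To make "tiny" concrete, here is a back-of-the-envelope check of the TinyCNN model above: a small Python sketch that counts its parameters and traces tensor shapes by hand, using the standard Conv2d/Linear formulas, so it runs without PyTorch. The input size is an assumption (the post doesn't state it); 224 x 224 is only an example, and the helper names are hypothetical.

```python
# Hand-computed parameter counts and shapes for TinyCNN (no torch needed).

def conv2d_params(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """Weights are out_ch x in_ch x k x k, plus one bias per output channel."""
    return out_ch * in_ch * k * k + (out_ch if bias else 0)


def linear_params(in_f: int, out_f: int, bias: bool = True) -> int:
    """Weights are out_f x in_f, plus one bias per output feature."""
    return out_f * in_f + (out_f if bias else 0)


def conv2d_out(size: int, k: int, stride: int = 1, padding: int = 0) -> int:
    """Output size along one axis: floor((size + 2*padding - k) / stride) + 1."""
    return (size + 2 * padding - k) // stride + 1


def tinycnn_summary(n: int, h: int, w: int) -> dict:
    """Parameter counts and per-layer shapes for a batch of n images of size h x w."""
    conv = conv2d_params(3, 8, 3)  # nn.Conv2d(3, 8, 3, stride=1, padding=1)
    fc = linear_params(8, 4)       # nn.Linear(8, 4)
    shapes = [
        ("input", (n, 3, h, w)),
        ("conv+relu", (n, 8, conv2d_out(h, 3, 1, 1), conv2d_out(w, 3, 1, 1))),
        ("pool", (n, 8, 1, 1)),    # AdaptiveAvgPool2d((1, 1)) ignores input size
        ("flatten", (n, 8)),
        ("fc", (n, 4)),
    ]
    return {"params": conv + fc, "conv_params": conv, "fc_params": fc, "shapes": shapes}


if __name__ == "__main__":
    summary = tinycnn_summary(1, 224, 224)
    print("total params:", summary["params"])
    for name, shape in summary["shapes"]:
        print(f"{name:>10}: {shape}")
```

With these layers the model has 260 parameters (224 in the convolution, 36 in the linear layer). The padding=1 convolution keeps the spatial size, and the adaptive pool collapses it to 1 x 1 whatever the input resolution, so the classifier always sees 8 features.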

  • #cuda
  • #worklog

Good tutorial for understanding the basics of CUDA: https://www.pyspur.dev/blog/introduction_cuda_programming. It also links to NVIDIA’s simple tutorial. Implemented a simple float16 addition kernel in CUDA at https://github.com/cmdr2/study/blob/main/ml/cuda/half_add.cu. Compile it using nvcc -o half_add half_add.cu.
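When testing a float16 kernel like this, it helps to have a CPU reference that reproduces half-precision rounding. The sketch below is in Python (like the TinyCNN example above) and assumes the kernel does a plain element-wise c[i] = a[i] + b[i] with IEEE 754 round-to-nearest-even; it is not derived from the linked half_add.cu, and the function names are hypothetical.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value.

    struct's 'e' format packs to half precision with round-half-to-even.
    It raises OverflowError for finite values too large for float16;
    we map that to +/-inf, which is what IEEE round-to-nearest produces.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def half_add(a: float, b: float) -> float:
    """One float16 addition: round the inputs to float16, then round the sum once."""
    # The sum of two float16 values is exact in a Python float (binary64),
    # so a single rounding back to float16 gives the correctly rounded result.
    return to_half(to_half(a) + to_half(b))


def half_add_arrays(a: list, b: list) -> list:
    """CPU reference for an element-wise float16 add (one 'thread' per index)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [half_add(x, y) for x, y in zip(a, b)]


if __name__ == "__main__":
    a = [1.0, 1.5, 2048.0, 0.1]
    b = [2.0, 0.25, 1.0, 0.2]
    print(half_add_arrays(a, b))
```

The 2048 + 1 case makes a good sanity check: float16 has an 11-bit significand, so values between 2048 and 4096 are spaced 2 apart, and the tie 2049 rounds to the even neighbour, 2048.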

  • #stable-diffusion
  • #c++
  • #cuda
  • #easydiffusion
  • #lab
  • #performance
  • #featured

// Cross-posted from Easy Diffusion’s blog. tl;dr - Today, I worked on using stable-diffusion.cpp in a simple C++ program, both as a linked library and by compiling sd.cpp from scratch (with and without CUDA). The intent was to get a tiny, fast-starting executable UI for Stable Diffusion working. Also, ChatGPT is very helpful!

Part 1: Using sd.cpp as a library

First, I tried calling the stable-diffusion.cpp library from a simple C++ program (which just loads the model and renders an image), via dynamic linking. That worked: its performance was the same as the example sd.exe CLI, and it detected and used the GPU correctly.
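Since the goal is a fast-starting executable, it helps to measure startup time the same way for each build (for example, the custom program versus the sd.exe CLI). Below is a minimal sketch in Python, like the TinyCNN example above: it launches a command several times and reports the median wall-clock time until it exits. It assumes the command exits on its own (e.g. one that only prints its usage), so it captures process launch plus shared-library loading rather than rendering; `median_startup_seconds` is a hypothetical helper, not part of sd.cpp.

```python
import statistics
import subprocess
import sys
import time


def median_startup_seconds(cmd: list, runs: int = 5) -> float:
    """Median wall-clock seconds for `cmd` to launch, run and exit."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken build isn't timed.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Placeholder: this times the Python interpreter itself. Swap in the
    # executable (and arguments) you want to compare.
    cmd = [sys.executable, "-c", "pass"]
    print(f"median startup: {median_startup_seconds(cmd):.3f} s")
```

Taking the median over several runs keeps a single slow launch (e.g. a cold file cache on the first run) from skewing the comparison.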