~ / cmdr2

projects: freebird, easy diffusion

hacks: carbon editor, torchruntime, findstarlink

  • #tensorRT
  • #torch
  • #easydiffusion
  • #ggml
  • #cuda
  • #vulkan

// Cross-posted from Easy Diffusion’s blog.

Experimented with TensorRT-RTX (a new library offered by NVIDIA). The first step was a tiny toy model, just to get the build and test setup working. The reference model in PyTorch:

import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, stride=1, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(8, 4)  # 4-class toy output

    def forward(self, x):
        x = self.relu(self.conv(x))
        x = self.pool(x).flatten(1)
        return self.fc(x)

I ran this on an NVIDIA 4060 8 GB (Laptop) for 10K iterations, on Windows and on WSL with Ubuntu, with float32 data.
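The step after a toy model like this would be exporting it to ONNX (e.g. via torch.onnx.export) and building an engine from it. Here's a minimal sketch of that build step using the standard TensorRT C++ API - this assumes TensorRT-RTX keeps the same entry points as regular TensorRT, and the file names (tiny_cnn.onnx, tiny_cnn.engine) are hypothetical:

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cstdio>
#include <fstream>

// minimal logger required by the TensorRT builder
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char * msg) noexcept override {
        if (severity <= Severity::kWARNING) printf("%s\n", msg);
    }
};

int main() {
    Logger logger;

    // build a network definition by parsing the ONNX export of TinyCNN
    auto builder = nvinfer1::createInferBuilder(logger);
    auto network = builder->createNetworkV2(0); // explicit batch is the default in recent TensorRT
    auto parser  = nvonnxparser::createParser(*network, logger);
    parser->parseFromFile("tiny_cnn.onnx", (int) nvinfer1::ILogger::Severity::kWARNING); // hypothetical file name

    // build and serialize the engine, then write it to disk for reuse
    auto config     = builder->createBuilderConfig();
    auto serialized = builder->buildSerializedNetwork(*network, *config);

    std::ofstream out("tiny_cnn.engine", std::ios::binary);
    out.write((const char *) serialized->data(), serialized->size());
    return 0;
}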

  • #ggml

Spent the last few days refactoring ggml-cpu.c in ggml. The ggml-cpu.c file is currently a monolith with around 15,000 lines of code, and needs to be refactored into separate files and de-duplicated using C++ function templates. The first part of that refactoring was pushed earlier today - https://github.com/ggml-org/ggml/pull/1144

I also worked on the next two PRs - one that splits SIMD mapping definitions and vectorized functions into separate files, and another that moves all the operator functions (except mul_mat) into a separate C++ file. I tested the combined effect of these two PRs, and it successfully passed the runners on ggml-ci. Together they will shrink ggml-cpu.c to around 5k lines (down from 15k right now).
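As a toy sketch of why moving to C++ helps here (not the PR's actual code): a single function template replaces a family of near-identical per-type C functions, with the compiler instantiating one copy per element type:

#include <cstdio>

// One template instead of N near-identical per-type loops. (Toy example -
// ggml's real f16 path converts through float via GGML_FP16_TO_FP32 /
// GGML_FP32_TO_FP16, and the hot loops are SIMD-mapped, not plain scalar.)
template <typename T>
static void vec_add(const int n, T * z, const T * x, const T * y) {
    for (int i = 0; i < n; ++i) {
        z[i] = x[i] + y[i];
    }
}

int main() {
    float  xf[4] = {1, 2, 3, 4}, yf[4] = {10, 20, 30, 40}, zf[4];
    double xd[2] = {0.5, 1.5},   yd[2] = {2.0, 3.0},       zd[2];

    vec_add(4, zf, xf, yf); // instantiates vec_add<float>
    vec_add(2, zd, xd, yd); // instantiates vec_add<double>

    printf("%g %g\n", zf[3], zd[1]); // 44 4.5
    return 0;
}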

  • #ggml
  • #worklog

Added support for float16 ADD/SUB/MUL/DIV operations in the CUDA backend of ggml. Also fixed the CPU implementation of these operations in float16 to work with repeating (broadcast) tensors, and added test cases. PR: https://github.com/ggml-org/ggml/pull/1121

Discussed making ggml-cpu.c into a C++ file, so that we can use function templates to de-duplicate a huge amount of code in that file. Also worked on adding float16 support (in CUDA and CPU) for a number of unary operators, such as SQRT, RELU, GELU, SIGMOID, LOG, COS and CLAMP. It seems to be passing the tests, so I will propose this as a PR soon.
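A rough sketch of what exercising that path could look like through ggml's public API (assumed usage, not the PR's actual test code): an F16 ADD where the second tensor has a single row, so it repeats across the rows of the first:

#include "ggml.h"
#include "ggml-cpu.h" // ggml_graph_compute_with_ctx() is declared here after the CPU-backend split
#include <cstdio>

int main() {
    struct ggml_init_params params = { /*mem_size =*/ 16*1024*1024, /*mem_buffer =*/ NULL, /*no_alloc =*/ false };
    struct ggml_context * ctx = ggml_init(params);

    // a is 4x3; b is 4x1, so it repeats (broadcasts) across the 3 rows of a
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F16, 4, 3);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F16, 4, 1);

    ggml_fp16_t * ad = (ggml_fp16_t *) a->data;
    ggml_fp16_t * bd = (ggml_fp16_t *) b->data;
    for (int i = 0; i < 4*3; ++i) ad[i] = ggml_fp32_to_fp16((float) i);
    for (int i = 0; i < 4;   ++i) bd[i] = ggml_fp32_to_fp16(100.0f);

    struct ggml_tensor * out = ggml_add(ctx, a, b);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 1);

    // element 4 is row 1, col 0: a = 4.0, b (repeated) = 100.0 -> 104.0
    printf("%f\n", ggml_fp16_to_fp32(((ggml_fp16_t *) out->data)[4]));

    ggml_free(ctx);
    return 0;
}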

  • #ggml

// Part 2 in the “Simple introduction to ggml” series. At the end of Part 1, we learnt how to keep the model weights separate from temporary computation-only tensor variables. This allows the model weights to stay in memory across multiple predictions (which is the usual behavior of machine learning programs during inference). Now let’s modify that to build a simple neural network model using ggml. If you’re new to ggml, I recommend reading Part 1 first.
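Condensed, the shape of such a model looks roughly like this (a sketch with assumed names and dimensions, using the backend API from Part 1): the weights context persists across predictions, while the compute graph is allocated per run:

#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"
#include "ggml-cpu.h"
#include <cstdio>

int main() {
    ggml_backend_t backend = ggml_backend_cpu_init();

    // 1. weights live in their own context, and stay allocated across predictions
    struct ggml_init_params wparams = { 2*ggml_tensor_overhead(), NULL, /*no_alloc =*/ true };
    struct ggml_context * wctx = ggml_init(wparams);
    struct ggml_tensor * W = ggml_new_tensor_2d(wctx, GGML_TYPE_F32, 4, 2); // 4 inputs, 2 outputs
    struct ggml_tensor * b = ggml_new_tensor_1d(wctx, GGML_TYPE_F32, 2);
    ggml_backend_alloc_ctx_tensors(wctx, backend);

    float wdata[8] = {0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, 0.8f};
    float bdata[2] = {0.01f, 0.02f};
    ggml_backend_tensor_set(W, wdata, 0, sizeof(wdata));
    ggml_backend_tensor_set(b, bdata, 0, sizeof(bdata));

    // 2. computation-only tensors, re-created per prediction: y = relu(W·x + b)
    struct ggml_init_params cparams = { 16*ggml_tensor_overhead() + ggml_graph_overhead(), NULL, /*no_alloc =*/ true };
    struct ggml_context * cctx = ggml_init(cparams);
    struct ggml_tensor * x = ggml_new_tensor_1d(cctx, GGML_TYPE_F32, 4);
    struct ggml_tensor * y = ggml_relu(cctx, ggml_add(cctx, ggml_mul_mat(cctx, W, x), b));

    struct ggml_cgraph * gf = ggml_new_graph(cctx);
    ggml_build_forward_expand(gf, y);

    ggml_gallocr_t allocr = ggml_gallocr_new(ggml_backend_get_default_buffer_type(backend));
    ggml_gallocr_alloc_graph(allocr, gf); // allocates x, y and the intermediate tensors

    float xdata[4] = {1, 2, 3, 4};
    ggml_backend_tensor_set(x, xdata, 0, sizeof(xdata));
    ggml_backend_graph_compute(backend, gf);

    float out[2];
    ggml_backend_tensor_get(y, out, 0, sizeof(out));
    printf("y = %f %f\n", out[0], out[1]);
    return 0;
}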

  • #ggml

A simple introduction to ggml.

// This is Part 1 in a series on ggml. You can read Part 2 after this one. This post uses the new “backend” API in ggml.

I wrote this to explain ggml to myself. I’m still learning about it, so please feel free to suggest any corrections!

Overall flow of a ggml program

At a very high level, a ggml program has the following steps:

  • Define the tensor variables