~ / cmdr2

projects: freebird, easy diffusion

hacks: carbon editor, torchruntime, findstarlink

  • #vr
  • #ui
  • #freebird

// Cross-posted from Freebird’s blog. I really need to figure out a way to render standard HTML elements (styled with CSS and modified with JS) in a 3D scene. Reinventing excellent libraries like PrimeVue inside 3D (for rendering in VR) is just wasteful. There have been attempts, e.g. A-Frame, but what we really need is to view the webpage itself in 3D - just regular HTML elements, drawn by the regular DOM renderer. The pieces feel like they’re there conceptually, but the implementation gap is apparently big enough that it hasn’t happened yet.

  • #c++
  • #imgui
  • #browser

A simple browser-like shell using ImGui and GLFW. It was supposed to show a webview, but I couldn’t figure out how to embed one inside the window (instead of it popping up in its own window). Maybe I’ll revisit this in the future if I can figure it out. To build it, create a folder named thirdparty (alongside main.cpp and CMakeLists.txt) and clone the git repositories for imgui and glfw into the thirdparty folder. Then compile using:
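The exact compile commands didn’t survive the cross-post. With a CMakeLists.txt already sitting alongside main.cpp (as described above), the standard out-of-source CMake sequence would be something like:

```shell
# Standard out-of-source CMake build; assumes the project's CMakeLists.txt
# pulls in thirdparty/imgui and thirdparty/glfw.
cmake -B build -S .
cmake --build build

# The binary name depends on the project() name in CMakeLists.txt.
./build/<binary-name>
```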

  • #findstarlink
  • #ai
  • #llm

I spent some time today doing support for Freebird, Puppetry and Easy Diffusion. Identified a bug in Freebird (bone axis gizmos aren’t scaling correctly in VR), got annoyed by how little documentation I’ve written for Puppetry’s scripting API, and got reminded of how annoying it is that Easy Diffusion force-downloads the poor-quality starter model (stock SD 1.4) during installation.

The majority of the day was spent using a local LLM to classify emails. I get a lot of repetitive emails for FindStarlink - people telling me whether or not they saw Starlink (using the predictions on the website). The first part of my reply is always a boilerplate “Glad you saw it” or “Sorry about that”, followed by email-specific replies. I’d really like the system to auto-fill that first part of the email, if it’s a report about a Starlink sighting.
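A minimal sketch of that auto-fill idea. The labels, endpoint, and model name below are assumptions (any OpenAI-compatible server, such as LM Studio’s local one, would work); only the two boilerplate openings come from the worklog above:

```python
import json
import urllib.request

# The two boilerplate openings from the worklog; labels are hypothetical.
BOILERPLATE = {
    "sighting_success": "Glad you saw it",
    "sighting_failure": "Sorry about that",
}

def boilerplate_for(label: str) -> str:
    """Pick the canned opening line for a classified email ('' if none applies)."""
    return BOILERPLATE.get(label, "")

def classify_email(body: str, base_url: str = "http://localhost:1234/v1",
                   model: str = "local-model") -> str:
    """Ask an OpenAI-compatible server to label the email."""
    payload = {
        "model": model,
        "temperature": 0,
        "messages": [
            {"role": "system",
             "content": "Classify this email as one of: sighting_success, "
                        "sighting_failure, other. Reply with the label only."},
            {"role": "user", "content": body},
        ],
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"].strip()

if __name__ == "__main__":
    label = classify_email("We saw the Starlink train last night, amazing!")
    print(boilerplate_for(label))
```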

  • #ai
  • #ml
  • #llm

Built two experiments using locally-hosted LLMs. One is a script that lets two bots chat with each other endlessly (“Bot Chat”). The other is a browser bookmarklet that summarizes the selected text in 300 words or less (“Summarize Bookmarklet”). Both use an OpenAI-compatible API, so they can be pointed at regular OpenAI-compatible remote servers, or at your own locally-hosted servers (like LM Studio). The bot chat script lets you define the names and descriptions of the two bots, the scene description, and the first message by the first bot. After that, it lets the two bots talk to each other endlessly. The conversation is definitely very interesting initially, but starts stagnating/repeating after 20-30 messages.
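A sketch of how such a two-bot loop could work. The role-flipping trick (each bot sees its own lines as “assistant” and the other bot’s as “user”), the endpoint, and the model name are all assumptions here, not necessarily how the actual script does it:

```python
import json
import urllib.request

def transcript_for(bot: dict, history: list) -> list:
    """Re-label the shared transcript from one bot's point of view:
    its own lines become 'assistant', the other bot's become 'user'."""
    msgs = [{"role": "system",
             "content": f"You are {bot['name']}. {bot['description']} "
                        f"Scene: {bot['scene']}"}]
    for speaker, text in history:
        role = "assistant" if speaker == bot["name"] else "user"
        msgs.append({"role": role, "content": text})
    return msgs

def chat_once(bot, history, base_url="http://localhost:1234/v1",
              model="local-model") -> str:
    """Ask an OpenAI-compatible server for this bot's next line."""
    payload = {"model": model, "messages": transcript_for(bot, history)}
    req = urllib.request.Request(f"{base_url}/chat/completions",
                                 data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    scene = "A quiet tea house on a rainy evening."
    alice = {"name": "Alice", "description": "A curious botanist.", "scene": scene}
    bob = {"name": "Bob", "description": "A retired sailor.", "scene": scene}
    history = [("Alice", "Lovely rain tonight, isn't it?")]
    bots = [bob, alice]  # Bob replies to the opening message first
    for i in range(100):  # "endlessly" in spirit; bounded here
        bot = bots[i % 2]
        history.append((bot["name"], chat_once(bot, history)))
        print(f"{history[-1][0]}: {history[-1][1]}")
```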

  • #easydiffusion
  • #v4
  • #ui

// Cross-posted from Easy Diffusion’s blog. Notes on two directions for ED4’s UI that I’m unlikely to continue on. One is to start a desktop app with a full-screen webview (for the app UI). The other is writing the tabbed browser-like shell of ED4 in a compiled language (like Go or C++) and loading the contents of the tabs as regular webpages (by using webviews). So it would load URLs like http://localhost:9000/ui/image_editor and http://localhost:9000/ui/settings etc.

  • #easydiffusion
  • #ui
  • #design
  • #v4

// Cross-posted from Easy Diffusion’s blog. Worked on a few UI design ideas for Easy Diffusion v4. I’ve uploaded the work-in-progress mockups at https://github.com/easydiffusion/files. So far, I’ve mocked out the design for the outer skeleton - that is, the new tabbed interface, the status bar, and the unified main menu. I also worked on how they would look on mobile devices. This gives me a rough idea of the Vue components that would need to be written, and the surface area that plugins can impact. For example, plugins can add a new menu entry only in the Plugins sub-menu.

  • #freebird
  • #vr
  • #ar
  • #blender

// Cross-posted from Freebird’s blog. Freebird is finally out on sale - https://freebirdxr.com/buy. It’s still called an Early Access version, since it needs more work to feel like a cohesive product. It’s already got quite a lot of features, and it’s definitely useful. But I think it’s still missing a few key features, and it needs an overall “fine-tuning” of the user experience and interface. So yeah, lots more to do. But it feels good to get something out on sale after nearly 4 years of development. Freebird has already spent 2 years in free public beta, so quite a number of people have used it already.

  • #ai
  • #learning
  • #self-awareness

Today I explored an idea for what might happen if an AI model runs continuously - receiving sensory input and acting without interruption. Maybe in a text-adventure game. Instead of responding to isolated prompts, the AI would live in a simulated environment, interacting with its world in real time. The experiment is about observing whether behaviors like an understanding of time, awareness, or even a sense of self could emerge naturally through sustained operation.
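The shape of that loop can be sketched in a few lines. Everything here is hypothetical: the toy world, and the stubbed model_step that stands in for what would really be one LLM inference per tick:

```python
import time

def model_step(observation: str, memory: list) -> str:
    """Stub policy; in the real experiment this would be an LLM call
    against a locally-hosted server."""
    return f"look around (after: {observation})"

def run_agent(environment, steps: int, tick_seconds: float = 0.0) -> list:
    """Run the agent continuously: observe, act, remember, repeat."""
    memory = []
    observation = environment.reset()
    for _ in range(steps):
        action = model_step(observation, memory)
        memory.append((observation, action))
        observation = environment.step(action)
        time.sleep(tick_seconds)  # real time keeps flowing between steps
    return memory

class ToyTextAdventure:
    """Minimal text-adventure world that stamps each observation with a turn."""
    def __init__(self):
        self.turn = 0
    def reset(self):
        return "turn 0: you are in a dim room"
    def step(self, action):
        self.turn += 1
        return f"turn {self.turn}: nothing happens after '{action}'"
```

The interesting question is what accumulates in `memory` over thousands of ticks, and whether the model starts referring to elapsed time on its own.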

  • #ml
  • #transformers
  • #diffusion

Spent a few days learning more about Diffusion models, UNets and Transformers. Wrote a few toy implementations of a denoising diffusion model (following diffusers’ tutorial) and a simple multi-headed self-attention model for next-character prediction (following Karpathy’s video). The non-latent version of the denoising model was trained on the Smithsonian Butterfly dataset, and it successfully generates new butterfly images. But it’s unconditional (i.e. no text prompts), and non-latent (i.e. works directly on the image data, instead of a compressed latent space).
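The attention part can be sketched without a framework. Here is causal multi-head self-attention in NumPy (shapes and weight initialization are illustrative; Karpathy’s version is in PyTorch and adds an output projection, training loop, etc.):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv, n_heads):
    """Multi-head self-attention with a causal mask, as used for
    next-character prediction: position t may only attend to positions <= t."""
    T, C = x.shape
    hs = C // n_heads  # head size
    q = (x @ Wq).reshape(T, n_heads, hs).transpose(1, 0, 2)  # (H, T, hs)
    k = (x @ Wk).reshape(T, n_heads, hs).transpose(1, 0, 2)
    v = (x @ Wv).reshape(T, n_heads, hs).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hs)          # (H, T, T)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)         # future positions
    scores = np.where(mask, -np.inf, scores)
    att = softmax(scores, axis=-1)
    out = att @ v                                            # (H, T, hs)
    return out.transpose(1, 0, 2).reshape(T, C), att

rng = np.random.default_rng(0)
T, C, H = 8, 16, 4
x = rng.standard_normal((T, C))
Wq, Wk, Wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
y, att = causal_self_attention(x, Wq, Wk, Wv, n_heads=H)
```

With the causal mask in place, the first row of every head’s attention matrix puts all its weight on position 0, which is an easy sanity check.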

  • #easydiffusion
  • #stable-diffusion
  • #c++

// Cross-posted from Easy Diffusion’s blog. Spent some more time on the v4 experiments for Easy Diffusion (i.e. C++ based, fast-startup, lightweight). stable-diffusion.cpp is missing a few features that will be necessary for Easy Diffusion’s typical workflow. I wasn’t keen on forking stable-diffusion.cpp, but it’s probably faster to work on a fork for now. So far I’ve added live preview and per-step progress callbacks (based on a few pending pull-requests on sd.cpp), as well as protection from GGML_ASSERT killing the entire process. I’ve also been looking at the ability to load individual models (like the VAE) without needing to reload the entire SD model.