// Cross-posted from Easy Diffusion’s blog.

The design constraints for Easy Diffusion’s next engine (i.e. sdkit v3) are:

- Lean: Install size of < 200 MB uncompressed (excluding models).
- Fast: Performance within 10% of the best-possible speed on a given GPU for a given model.
- Capable: Supports Stable Diffusion 1.x, 2.x, 3.x, XL, Flux, Chroma, ControlNet, LoRA, Embedding, VAE. Supports loading custom model weights (from Civitai etc.), and memory offloading (for smaller GPUs).
- Targets: Desktops and laptops, Windows/Linux/Mac, NVIDIA/AMD/Intel/Apple.

I think it’s possible, using ML compilers like TensorRT-RTX (and similar compilers for other platforms).

See: Some notes on ML compilers.
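The Lean and Fast constraints are numeric, so they can be written down as simple checks. The sketch below is only an illustration and not part of sdkit: the function names are hypothetical, 1 MB is assumed to mean 10^6 bytes, model files are assumed to live outside the install directory, and "within 10%" is read as reaching at least 90% of the best known throughput (iterations per second) for that GPU and model.

```python
import os

# Assumption: 1 MB = 10^6 bytes; the post does not say MB vs MiB.
MAX_INSTALL_BYTES = 200 * 1000 * 1000
# Assumption: "within 10%" means at least 90% of the best known throughput.
MAX_SLOWDOWN = 0.10


def install_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks skipped).

    Assumes model files are stored outside root, since the constraint
    excludes models.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def is_lean(size_bytes):
    """Lean: uncompressed install size strictly under 200 MB."""
    return size_bytes < MAX_INSTALL_BYTES


def is_fast(measured_its_per_sec, best_its_per_sec):
    """Fast: measured throughput is within 10% of the best known throughput."""
    return measured_its_per_sec >= (1.0 - MAX_SLOWDOWN) * best_its_per_sec
```

For example, `is_fast(9.5, 10.0)` passes while `is_fast(8.0, 10.0)` does not. The reference "best-possible" number would have to come from a separate benchmark per GPU and model, which this sketch does not attempt to measure.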