It looks like ggml has recently added basic automatic operator fusion to its graph executor (example). It uses a hand-coded list of simple rule-based substitutions (e.g. fuse a matrix multiply followed by an add into one op, or a matrix multiply followed by a GLU activation into one op, etc.). Each fused op is a hand-written kernel. The fusion rules are specified per backend (separate rules for CUDA/ROCm, for Vulkan, for Metal, etc.), and presumably fused ops simply haven't been written for some backends, either because the backend is less widely used or because the performance gain wasn't worth the effort.
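To make the idea concrete, here is a minimal sketch in plain C of what such a rule-based fusion pass looks like. None of this is ggml's actual data structures or API; the node layout, op names, and `try_fuse_pair` helper are all hypothetical, just illustrating "match a two-node pattern in execution order, rewrite it into a fused op backed by a hand-written kernel":

```c
// Hypothetical sketch of a rule-based fusion pass, NOT ggml's real API.
#include <stdbool.h>
#include <stdio.h>

enum op_kind { OP_NONE, OP_MUL_MAT, OP_ADD, OP_GLU, OP_MUL_MAT_ADD, OP_MUL_MAT_GLU };

struct node {
    enum op_kind op;
    int src0, src1;   // indices of input nodes, -1 if unused
};

// One fusion rule application: if node i+1 consumes node i and the pair
// matches a known pattern, rewrite it into a fused op and kill the producer.
static bool try_fuse_pair(struct node *nodes, int i, int n) {
    if (i + 1 >= n) return false;
    struct node *a = &nodes[i], *b = &nodes[i + 1];
    if (b->src0 != i && b->src1 != i) return false;  // not a producer/consumer pair

    if (a->op == OP_MUL_MAT && b->op == OP_ADD) {
        b->op = OP_MUL_MAT_ADD;  // the backend must ship a hand-written fused kernel
        a->op = OP_NONE;         // producer node is no longer executed on its own
        return true;
    }
    if (a->op == OP_MUL_MAT && b->op == OP_GLU) {
        b->op = OP_MUL_MAT_GLU;
        a->op = OP_NONE;
        return true;
    }
    return false;                // no rule matched; execute the ops one by one
}

int main(void) {
    // toy graph: node 0 = mul_mat, node 1 = add(node 0, bias)
    struct node graph[] = {
        { OP_MUL_MAT, -1, -1 },
        { OP_ADD,      0, -1 },
    };
    int n = 2;
    for (int i = 0; i < n; i++) {
        if (try_fuse_pair(graph, i, n))
            printf("fused nodes %d and %d\n", i, i + 1);
    }
    return 0;
}
```

The important property is that the rule table is fixed and per backend: a backend only gets a fusion if someone has written the corresponding fused kernel for it.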
So ggml is starting to evolve beyond a simple one-to-one operator execution engine. But at heart it is still a dynamic collection of hand-written kernels, not a code-generating compiler.