Krypton
Krypton is a learning-focused tensor library: dynamic tensors, broadcasting, and autograd built from scratch, with matmul as the performance centerpiece.
- Kernel ladder: naive → L1/L2-tiled → SIMD (NEON/AVX2) → parallel (Rayon), with a ~80× speedup from naive to optimized on 512×512 matrices and peak throughput of roughly 50–70 GFLOPS on typical hardware.
- Systems techniques: cache-friendly tile sizes (~64×64, sized for L1), register-blocked micro-kernels, packing of B so SIMD loads are contiguous, and row-chunk parallelism that avoids false sharing.
- Define-by-run autograd: gradients flow through the same matmul kernels used in the forward pass.
- End-to-end validation: an MNIST MLP training binary compares backends. Naive matmul can take tens of minutes per epoch, while SIMD+parallel finishes an epoch in seconds. The project has 86+ tests and is clippy-clean.
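
To make the first rungs of the kernel ladder concrete, here is a minimal sketch of a naive kernel, a cache-tiled kernel, and row-chunk parallelism. It is not Krypton's actual code: the function names, the `TILE` constant, and the use of `std::thread::scope` (as a standard-library stand-in for Rayon) are assumptions made for illustration, and SIMD, register blocking, and B packing are left out.

```rust
use std::thread;

/// Hypothetical tile edge; the list above cites roughly 64×64 for L1.
const TILE: usize = 64;

/// Naive triple loop: C = A · B, all row-major, A is m×k and B is k×n.
fn matmul_naive(a: &[f32], b: &[f32], c: &mut [f32], m: usize, k: usize, n: usize) {
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                // B is read with stride n here, which is what hurts the cache.
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

/// Tiled kernel over a block of `rows` rows of A and C.
/// The innermost loop walks contiguous slices of B and C.
fn matmul_tiled_rows(a_rows: &[f32], b: &[f32], c_rows: &mut [f32], rows: usize, k: usize, n: usize) {
    c_rows.fill(0.0);
    for i0 in (0..rows).step_by(TILE) {
        for p0 in (0..k).step_by(TILE) {
            for j0 in (0..n).step_by(TILE) {
                let i_end = (i0 + TILE).min(rows);
                let p_end = (p0 + TILE).min(k);
                let j_end = (j0 + TILE).min(n);
                for i in i0..i_end {
                    for p in p0..p_end {
                        let a_ip = a_rows[i * k + p];
                        let b_row = &b[p * n + j0..p * n + j_end];
                        let c_row = &mut c_rows[i * n + j0..i * n + j_end];
                        for (cv, bv) in c_row.iter_mut().zip(b_row) {
                            *cv += a_ip * bv;
                        }
                    }
                }
            }
        }
    }
}

/// Row-chunk parallelism: each thread owns a disjoint, contiguous band of
/// C rows, so no two threads ever write to the same output element.
fn matmul_parallel(a: &[f32], b: &[f32], c: &mut [f32], m: usize, k: usize, n: usize, threads: usize) {
    assert!(m > 0 && k > 0 && n > 0 && threads > 0);
    let rows_per = (m + threads - 1) / threads;
    thread::scope(|s| {
        for (a_chunk, c_chunk) in a.chunks(rows_per * k).zip(c.chunks_mut(rows_per * n)) {
            s.spawn(move || {
                let rows = c_chunk.len() / n;
                matmul_tiled_rows(a_chunk, b, c_chunk, rows, k, n);
            });
        }
    });
}
```

Calling `matmul_parallel(&a, &b, &mut c, m, k, n, 4)` should give the same result as `matmul_naive` on the same inputs. Giving each thread a whole band of rows, rather than interleaved rows, keeps each thread's writes in its own region of memory, which is the usual way to limit false sharing.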
Rust · SIMD · Rayon · Autograd
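
The autograd point above rests on a standard identity: for C = A · B with upstream gradient dC, the gradients are dA = dC · Bᵀ and dB = Aᵀ · dC, so the backward pass is two more matmuls. The sketch below shows that idea only. The names `matmul`, `transpose`, and `matmul_backward` are hypothetical, and the simple loop kernel stands in for whichever backend is selected.

```rust
/// Row-major matmul: (m×k) · (k×n) → m×n.
/// Stand-in for any kernel on the ladder (naive, tiled, SIMD, parallel).
fn matmul(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            let a_ip = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
    c
}

/// Transpose a row-major rows×cols matrix into cols×rows.
fn transpose(x: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    let mut t = vec![0.0f32; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            t[c * rows + r] = x[r * cols + c];
        }
    }
    t
}

/// Backward pass for C = A · B. Given dC (m×n), returns (dA, dB).
/// Both gradients are computed by the same `matmul` used in the forward pass.
fn matmul_backward(
    a: &[f32],
    b: &[f32],
    grad_c: &[f32],
    m: usize,
    k: usize,
    n: usize,
) -> (Vec<f32>, Vec<f32>) {
    let bt = transpose(b, k, n); // n×k
    let at = transpose(a, m, k); // k×m
    let grad_a = matmul(grad_c, &bt, m, n, k); // (m×n)(n×k) = m×k
    let grad_b = matmul(&at, grad_c, k, m, n); // (k×m)(m×n) = k×n
    (grad_a, grad_b)
}
```

Because the backward pass reuses the forward kernel, any speedup in that kernel applies to training in both directions, which is consistent with the per-epoch gap reported for the MNIST comparison.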