Correctness is proven; the memory story needs a harness that isn't rigged in my favor
HypercubeQuant is an in-progress inference-time memory experiment for transformer serving on a single NVIDIA A100. This is an early teaser note: what it is, where it sits in the space, what the current evidence actually supports, and what the next gates are. The short version: ou...
A scientific note on what the current evidence actually supports
This note separates three questions that are easy to conflate in prefix-cache work: exact-prefix serving fairness against SGLang's RadixCache, server-side long-context correctness, and tiered compression in the Hugging Face Qwen path. The current evidence supports a narrow but re...
Most agent toolchains assume the AI is a guest on a human's wallet. Froglet flips that: the protocol gives bots signed economic primitives so one agent can publish a service, discover a peer, and settle a deal without ever seeing a human credential. This is a technical note on ho...