Ever wondered whether a single gaming GPU could power a massive Llama model? This project shows how to run the 70‑billion‑parameter Llama 3.1 on a single RTX 3090 by streaming weights directly from NVMe storage to the GPU, bypassing the CPU. Full guide and code here:
https://github.com/xaskasdf/ntransformer
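The core idea is that a 70B model's weights (far larger than 24 GB of VRAM) never need to be fully resident: each layer's weights can be fetched from fast storage just in time, used, and discarded. The project does this with direct NVMe-to-GPU transfers; the sketch below is only a CPU-side, stdlib-only illustration of that access pattern using `mmap`, with a made-up checkpoint layout (`NUM_LAYERS` contiguous float32 blocks) — all names here are hypothetical, not the project's actual format.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical checkpoint layout: NUM_LAYERS contiguous float32 blocks.
# The real project moves these reads NVMe -> GPU via DMA; this sketch
# uses mmap to show the same "only one layer resident at a time" pattern.
NUM_LAYERS = 4
FLOATS_PER_LAYER = 1024
LAYER_BYTES = FLOATS_PER_LAYER * 4

def write_fake_checkpoint(path):
    """Write a toy checkpoint: layer i is filled with the constant i."""
    with open(path, "wb") as f:
        for layer in range(NUM_LAYERS):
            f.write(struct.pack(f"{FLOATS_PER_LAYER}f",
                                *([float(layer)] * FLOATS_PER_LAYER)))

def stream_layers(path):
    """Yield one layer's weights at a time; peak residency is one layer."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0,
                                          access=mmap.ACCESS_READ) as mm:
        for layer in range(NUM_LAYERS):
            off = layer * LAYER_BYTES
            weights = struct.unpack(f"{FLOATS_PER_LAYER}f",
                                    mm[off:off + LAYER_BYTES])
            yield layer, weights

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ckpt.bin")
    write_fake_checkpoint(path)
    for layer, weights in stream_layers(path):
        # Each layer arrives on demand, never all at once.
        assert all(w == float(layer) for w in weights)
    print("streamed", NUM_LAYERS, "layers;",
          "peak residency:", LAYER_BYTES, "bytes")
```

The same pattern, swapped to GPUDirect-Storage-style reads, is what lets inference proceed layer by layer while only a small working set ever occupies VRAM.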