You have six or seven articles bookmarked about Claude Code. You've seen the wave. You want to be part of it. Here's a comprehensive guide from someone who's been using AI coding tools since 2021 and has read all those Claude Code guides so you don't have to.
Positional encoding: how transformers know word order
Words have meaning, but so does their order. How do transformers, which process all tokens at once, understand that "dog bites man" differs from "man bites dog"? A first-principles walkthrough from sinusoidal encodings to RoPE.
The pragmatic tradeoff of tied embeddings
In deep learning, we routinely trade one quality for another. Quantization sacrifices precision for speed. Distillation trades accuracy for smaller, faster models. Weight sharing reduces parameters at the cost of expressivity. Tied embeddings are one such tradeoff.
How did we make stardust think?
From carbon atoms forged in dying stars, to neurons firing in your skull, to silicon learning to see: the improbable chain that led to artificial intelligence. A first-principles and historical journey through neural foundations, backpropagation, and recurrence.