A Field Guide to How Practitioners Talk About LLMs

Sit in a room where people deploy LLMs for a living and the vocabulary shifts under you. Someone mentions “shooting the vector” and the rest of the table nods. Someone else says “we’re decode-bound” and a third person asks whether the model is MoE or dense. None of these phrases get explained, because the people using them assume everyone in the conversation already knows.

For the last year I’ve been collecting the phrases that gate that conversation: the ones you have to know to participate, many of which appear in no single textbook because the field moves faster than textbooks do. Today we’re publishing the first version of that collection as a tool: the LLM Lexicon, 25 terms across serving, training, prompting, and evaluation, with a flashcard mode for active recall.

It’s a living document. New terms drop in as the field shifts.

Why a glossary, and why now

Two reasons.

First, the gap between what model marketing says and what practitioners say has gotten wider. Vendor pages describe capability; practitioners describe the cost of producing that capability. A model card reports a benchmark score; a serving engineer wants to know whether the KV cache fits and what the prefill-versus-decode split looks like for your traffic shape. Both conversations are about the same model, yet they don’t share a vocabulary. Bridging that gap is the point of the glossary.

Second, the agent era has multiplied the surface area. A 2023 LLM conversation could stay on prompting and call it done. A 2026 conversation has to cover tool calling, MCP, prompt injection, reasoning traces, and context-window economics in the same breath. The vocabulary expanded, and most of the recent additions are operational rather than theoretical.

Worked example: “shooting the vector”

The phrase that nudged me to publish this was “shooting the vector.” A colleague used it in passing and assumed I knew what they meant. I half-knew. Its formal names are activation steering and representation engineering: you compute a direction in the model’s residual stream that corresponds to a behavioral trait (refusal, sycophancy, truthfulness, a topic), then you add or subtract that direction at inference time to amplify or suppress the trait. No retraining, no fine-tuning, just a vector you add to the hidden states at one chosen layer.

Practitioners “shoot the refusal vector at minus two” to make an open-weight model stop refusing benign requests. They shoot a topic vector at plus one to bias generation toward a domain. They shoot a hallucination-probe vector at minus one to suppress confident wrong answers. The technique sits in a useful middle ground: prompting is cheap but fragile, fine-tuning is durable but expensive, steering is small, fast, and runtime-toggleable.
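To make “compute a direction, then add it with a coefficient” concrete, here is a minimal sketch on toy data. It assumes one common recipe for finding the direction, the difference of means between activations on prompts that show the trait and prompts that don’t, and it assumes those hidden states have already been captured at a single layer. The function names are hypothetical; a real setup would hook into the model’s forward pass and often normalizes the direction first, which this sketch skips.

```python
# Minimal activation-steering sketch on plain lists of floats.
# Assumption: each inner list is a hidden state captured at the same layer;
# all function names here are hypothetical, not a library API.

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def steering_direction(with_trait, without_trait):
    """Difference of means: average activation with the trait minus without."""
    pos = mean_vector(with_trait)
    neg = mean_vector(without_trait)
    return [p - q for p, q in zip(pos, neg)]

def steer(hidden_state, direction, coefficient):
    """'Shoot' the vector: add coefficient * direction to one hidden state."""
    return [h + coefficient * d for h, d in zip(hidden_state, direction)]

# Toy 3-dimensional activations standing in for a real residual stream.
refusing = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
complying = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

refusal_dir = steering_direction(refusing, complying)
print(refusal_dir)                                 # [1.0, -1.0, 2.0]
print(steer([0.5, 0.5, 0.5], refusal_dir, -2.0))   # [-1.5, 2.5, -3.5]
```

A coefficient of minus two pushes the activation away from the refusal direction; a positive coefficient pushes toward it. In a real model the same addition is typically applied at the chosen layer on every forward pass during generation.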

The reason “shooting” caught on is that the operation looks like aiming: you’ve computed a direction, and at runtime you fire it into the residual stream with a scalar coefficient. This term’s card in the lexicon has an inline diagram and a longer treatment. It’s the headline entry because it’s exactly the kind of phrase that excludes you from a conversation if you don’t know it.

What’s in the v1 collection

Twenty-five terms across four categories.

Serving and infra covers the operational vocabulary: prefill versus decode, KV cache, quantization (FP8, FP4, INT4, AWQ, online versus static), MoE versus dense, context window, speculative decoding, and tensor parallelism. These are the phrases that come up the moment you stop talking about a model in the abstract and start talking about it on hardware.
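“Does the KV cache fit?” is usually answered with back-of-the-envelope arithmetic. The sketch below assumes a standard decoder-only transformer with no paging or fragmentation overhead, and the configuration is hypothetical, chosen for round numbers rather than taken from a specific released model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bytes_per_elem):
    """Approximate KV-cache size for a standard decoder-only transformer.

    The factor of 2 counts one key tensor and one value tensor per layer.
    With grouped-query attention, num_kv_heads is smaller than the number of
    query heads, which is why GQA shrinks the cache.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

GIB = 1024 ** 3

# Hypothetical config for illustration only (not a specific released model).
config = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192, batch_size=1)

fp16 = kv_cache_bytes(**config, bytes_per_elem=2)  # 16-bit KV cache
fp8 = kv_cache_bytes(**config, bytes_per_elem=1)   # 8-bit KV cache
print(fp16 / GIB, fp8 / GIB)  # 1.0 0.5
```

The total grows linearly in both sequence length and batch size, so 32 concurrent requests at this length would need about 32 GiB at 16-bit precision. Quantizing the cache to 8 bits halves that.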

Training and alignment covers the production pipeline: pre-training, SFT, RLHF, DPO, LoRA, distillation, and steering vectors. Each has its own failure mode and its own use case. You’ll hear “we DPO’d it” in 2026 where you would have heard “we RLHF’d it” in 2023.
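Part of why “we DPO’d it” displaced “we RLHF’d it” is that DPO’s objective is a simple loss on preference pairs, with no separate reward model or RL loop. Here is a minimal sketch of that loss for a single pair. It assumes the summed sequence log-probabilities under the policy and the frozen reference model are already computed, and the `beta=0.1` default is illustrative only.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, from summed sequence log-probs.

    loss = -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(z)) equals softplus(-z); branch to avoid overflow in exp.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# Policy prefers the chosen answer more than the reference does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))  # about 0.598
# No preference shift relative to the reference: loss is log(2).
print(dpo_loss(-11.0, -11.0, -11.0, -11.0))  # about 0.693
```

The loss drops below log(2) as the policy raises the chosen response’s likelihood relative to the rejected one, measured against the reference model.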

Prompting and agents covers the layer between the model and the application: system prompts, in-context learning, chain of thought, tool calling, MCP, and prompt injection. The agent-substrate vocabulary is the youngest and the fastest-moving; expect this category to grow.
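At its core, tool calling means the model emits a structured request and the application runs the matching function. The sketch below assumes a made-up JSON shape and a hypothetical registry; it is not MCP’s wire format or any vendor’s API, just the dispatch step in isolation.

```python
import json

# Hypothetical tool registry; names and JSON shape are illustrative assumptions.
TOOLS = {}

def tool(fn):
    """Register a plain Python function as a tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def add(a, b):
    return a + b

def dispatch(model_output):
    """Parse a tool call emitted by the model and run the matching function.

    Expects JSON like {"name": "add", "arguments": {"a": 2, "b": 3}}.
    Unknown tool names are refused. An allowlist like this limits what an
    injected instruction can trigger, though on its own it is not a full
    defense against prompt injection.
    """
    call = json.loads(model_output)
    name = call.get("name")
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    result = TOOLS[name](**call.get("arguments", {}))
    # In a real agent loop, this result goes back into the conversation and
    # the model is called again to produce its next step or final answer.
    return {"name": name, "result": result}

print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
print(dispatch('{"name": "delete_files", "arguments": {}}'))
```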

Eval and capability covers how we judge whether the model is any good: perplexity, pass@k, LLM-as-judge, hallucination, and saturation. Half of reading a model release is knowing which benchmarks still discriminate and which have saturated.
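Two of these terms are straightforward formulas. The sketch below shows the unbiased pass@k estimator published with the HumanEval benchmark (n samples per problem, c of which pass the tests) and perplexity as the exponential of mean per-token negative log-likelihood. It assumes natural-log token probabilities are already available.

```python
import math

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples, of which c passed the tests.

    Equals 1 - C(n - c, k) / C(n, k): one minus the probability that a
    random size-k subset of the n samples contains no passing sample.
    """
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood per token (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

print(pass_at_k(n=10, c=5, k=1))         # 0.5
print(pass_at_k(n=10, c=3, k=5))         # about 0.917
print(perplexity([math.log(0.25)] * 4))  # about 4.0
```

A model that assigns probability 0.25 to every observed token has perplexity 4: on average it is as uncertain as a uniform choice among four options.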

The editorial bar

A few rules I set for what makes the cut:

Practitioners actually say it. If the term appears in a glossary but never in a real deployment conversation, it’s not in. The test is whether someone has used the phrase in a Slack thread or a code review.
It does work in a sentence. Each term has to clarify rather than decorate. “Inference” is too generic to make it; “decode” is specific enough that the conversation moves when you use it.
The card explains when you’d hear it. Every entry has a “you’ll hear it when” field, because knowing the definition without knowing the situational use is half a victory. A term you can define but can’t recognize in the wild is still opaque.
No vendor marketing terms. “Foundation model” makes it because researchers use it. “Generative AI platform” does not, because nobody serious says it in conversation.

How to use the tool

/tools/llm-lexicon/ has two view modes. Browse is the glossary; you can search, filter by category, and click any card to flip it for the longer treatment. Flashcards mode shows one card at a time, with keyboard shortcuts for previous, next, and flip; it’s better for active recall. You can mark cards as known, and that progress is stored locally in your browser. No login.

If you want to dispute a definition, suggest a term, or argue for a different framing, please do: the field guide is meant to evolve, and we add entries when the vocabulary moves.

What’s next

A few things on the roadmap:

A quiz mode (multiple-choice and “name the term from the description”) once the term set settles.
Attributed quotes from the AI thinkers whose framings became standard: Karpathy’s “LLM OS,” Sutton’s “Bitter Lesson,” LeCun’s autoregressive critique, Chollet’s program-synthesis-vs-interpolation distinction. Hearing how the people who built or framed these ideas describe them is the best way to internalize the vocabulary.
A deeper-cut layer for the people who already know the basics and want the next twenty-five phrases.

For now, the v1 collection should get you through most of a 2026 LLM conversation without needing to bluff. The vocabulary is the entrance fee, and it’s a fixable one.
