Temperature is a decoding parameter for generative models that scales the logits before sampling, controlling how random or deterministic the next-token selection is.
What is Temperature (LLM Sampling Temperature)?
When a language model predicts the next token, it produces logits, which are unnormalized scores for each token in the vocabulary. Temperature modifies these logits before softmax is applied: with temperature T, every logit is divided by T, so a token with logit z gets a probability proportional to exp(z / T). Lower temperatures, such as 0.2 to 0.7, sharpen the probability distribution so high-probability tokens become even more dominant, which makes outputs more consistent and less creative. A temperature of 1.0 leaves the distribution unchanged. Higher temperatures, such as 1.1 to 1.5, flatten the distribution, increasing the chance of selecting lower-probability tokens and producing more diverse outputs. Temperature only has an effect when you are sampling, for example with top-p sampling. Greedy decoding always picks the highest-scoring token, and dividing every logit by the same positive T does not change which token that is.
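The scaling step described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation: the function name `softmax_with_temperature` and the three-token logits are made up for the example.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return next-token probabilities after dividing each logit by temperature."""
    if temperature <= 0:
        # T = 0 cannot be used in the division; it needs separate handling.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtracting the max keeps exp() from overflowing and does not change the result.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it shows the probability of the top token shrinking as the temperature rises, while the ranking of the tokens stays the same.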
Where temperature is used and why it matters
Temperature is used in chat assistants, creative writing, code generation, and multi-agent workflows to tune the trade-off between diversity and reliability. In production systems, it is often set lower for tasks that demand precision, like customer support, data extraction, and tool calling, because variability can break formats or introduce hallucinations. It is set higher for brainstorming, marketing copy, and ideation where diversity is a feature. Temperature also interacts with other decoding controls. For example, with a strict (low) top-p value, raising temperature may add little diversity because the candidate set stays narrow. Conversely, a very high temperature combined with a high top-p can increase off-topic responses.
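To make the interaction with top-p concrete, the sketch below counts how many tokens survive the top-p cut at different temperatures. It assumes temperature is applied before top-p filtering; that order is common, but it varies between implementations. The helper names and the five logits are hypothetical.

```python
import math

def softmax(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus(probs, top_p):
    """Indices of the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

# Hypothetical logits for a five-token vocabulary.
logits = [4.0, 2.0, 1.0, 0.5, 0.0]
for t in (0.7, 1.5):
    for top_p in (0.5, 0.95):
        size = len(nucleus(softmax(logits, t), top_p))
        print(f"T={t}, top_p={top_p}: {size} candidate token(s)")
```

With these particular logits, the strict top-p of 0.5 keeps a single candidate at both temperatures, while the looser 0.95 lets the candidate set grow as temperature rises.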
Examples
1) Factual Q&A: temperature 0.2 to 0.5 to reduce variability and keep answers stable.
2) Brainstorming: temperature 0.8 to 1.2 to encourage multiple plausible suggestions.
3) Code generation: temperature 0.0 to 0.3 when strict correctness and deterministic outputs are required.
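The code generation range above includes 0.0, which cannot be used as a divisor. A common way to handle it, sketched here under that assumption, is to fall back to greedy (argmax) selection when the temperature is zero. `sample_next_token` is a hypothetical helper, not a function from any specific library.

```python
import math
import random

def sample_next_token(logits, temperature, rng=random):
    """Pick a token index; temperature 0 falls back to greedy (argmax) decoding."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Division by zero is undefined, so take the highest-scoring token directly.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # random.choices normalizes the weights, so no explicit softmax division is needed.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical logits: index 1 has the highest score.
logits = [0.1, 3.0, 1.0]
print(sample_next_token(logits, 0))    # greedy: always index 1
print(sample_next_token(logits, 1.0))  # sampled: usually 1, sometimes 0 or 2
```

Passing a seeded `random.Random` instance as `rng` makes the sampled path repeatable in tests.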
FAQs
1. Does temperature improve accuracy?
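Not directly. Lower temperature reduces randomness, so answers become more repeatable, but it does not make the model “more correct” in a strict sense. If the model's most likely answer is wrong, a low temperature simply returns that wrong answer more consistently.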
2. What is the difference between temperature and top-p?
Temperature reshapes probabilities, while top-p restricts the candidate tokens by cumulative probability mass.
3. Can I set temperature to zero?
Some APIs allow it. Because dividing by zero is undefined, they typically treat a temperature of zero as greedy decoding, always picking the highest-scoring token.
4. How do I pick a temperature for production?
Start low, evaluate with task-specific tests, then increase only if you need more variety without breaking constraints.
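One way to run the evaluation described in the last answer is to measure a pass rate at each candidate temperature and compare them. The sketch below is an assumption-heavy illustration: `generate` and `passes` are placeholders you would replace with your own model call and task-specific check, and `fake_generate` only simulates a model that breaks its output format at higher temperatures.

```python
def pass_rate_by_temperature(generate, passes, prompts, temperatures, samples_per_prompt=5):
    """Return {temperature: fraction of sampled outputs that pass the check}."""
    if not prompts:
        raise ValueError("need at least one prompt")
    rates = {}
    for t in temperatures:
        outcomes = [
            bool(passes(prompt, generate(prompt, t)))
            for prompt in prompts
            for _ in range(samples_per_prompt)
        ]
        rates[t] = sum(outcomes) / len(outcomes)
    return rates

def fake_generate(prompt, temperature):
    # Stand-in for a real model call: pretends the format breaks above 0.5.
    return '{"ok": true}' if temperature <= 0.5 else "sure! here you go"

def is_json_object(prompt, output):
    # Task-specific check, here a crude "looks like a JSON object" test.
    return output.startswith("{") and output.endswith("}")

rates = pass_rate_by_temperature(fake_generate, is_json_object,
                                 ["extract the date"], [0.0, 0.3, 0.8])
print(rates)
```

With real model calls, sampling each prompt several times matters, because a single sample at a non-zero temperature says little about how often the constraint breaks.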