Top-p (Nucleus Sampling)
An alternative to temperature: keep only the most likely tokens whose probabilities add up to at least p.
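Top-p sampling truncates the probability distribution to the smallest set of tokens whose cumulative probability is at least p, then samples from that set. It's a more adaptive way to control randomness than temperature: the cutoff shifts based on how confident the model is. When one token dominates, the set may contain only that token; when the distribution is flat, the set widens to include many candidates.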
A top-p of 0.9 means 'only sample from the tokens that make up the top 90% of probability mass'. Lower values restrict sampling to fewer, higher-probability tokens, so output is more predictable and less varied; 1.0 means no truncation.
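The truncation step can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation: the function names top_p_filter and sample_top_p and the two toy distributions are made up for this example, and it assumes the model's output has already been converted to a token-to-probability mapping.

```python
import random

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches p, then renormalize so the kept set sums to 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break  # the kept set now covers at least p of the mass
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

def sample_top_p(probs, p, rng=random):
    """Draw one token from the truncated, renormalized distribution."""
    nucleus = top_p_filter(probs, p)
    return rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]

# Hypothetical next-token distributions (illustrative numbers only).
confident = {"Paris": 0.92, "Lyon": 0.04, "Nice": 0.03, "Rome": 0.01}
uncertain = {"red": 0.30, "blue": 0.26, "green": 0.22, "gold": 0.17, "pink": 0.05}

print(list(top_p_filter(confident, 0.9)))  # kept tokens when the model is confident
print(list(top_p_filter(uncertain, 0.9)))  # kept tokens when the model is unsure
print(len(top_p_filter(uncertain, 1.0)))   # p = 1.0 keeps every token
```

With p = 0.9, the confident distribution is cut down to the single token "Paris", while the uncertain one keeps four of its five tokens and drops only "pink". The same p yields a different cutoff depending on how the probability mass is spread, which is the adaptive behavior described above.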
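In practice, teams usually tune either temperature or top-p (not both) and leave the other at its default. For most assistants, top-p around 0.9 with temperature left at its default (typically 1.0) is a solid starting point. Temperature 0 is greedy decoding, which always picks the single most likely token, so top-p has no effect there.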