LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning technique that trains small adapter weights instead of updating the entire model.
LoRA freezes the original model weights and, for each target layer, trains a pair of small low-rank matrices whose product is added to the frozen weights at inference time (or merged in permanently). The result is quality close to full fine-tuning at a tiny fraction of the compute, memory, and storage cost.
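The core update can be sketched in a few lines of NumPy. This is a minimal illustration, not any library's implementation; the dimensions, `alpha`, and the `forward` helper are all made up for the example. Note the conventional initialization: `A` starts small and random, `B` starts at zero, so the low-rank delta is zero at step 0 and the adapted layer initially behaves exactly like the frozen one.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 8   # layer dimensions and LoRA rank (r << d)
alpha = 16                   # scaling hyperparameter; update is scaled by alpha / r

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight, never updated
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank factor, small random init
B = np.zeros((d_out, r))               # trainable low-rank factor, zero init

def forward(x):
    # frozen base path plus the low-rank update B @ A, scaled by alpha / r
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# because B starts at zero, the adapted layer initially matches the frozen layer
assert np.allclose(forward(x), W @ x)
```

Training touches only `A` and `B` (2 * r * d parameters per layer instead of d * d), which is where the compute and storage savings come from.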
LoRA adapters are typically a few megabytes vs gigabytes for full weights. You can have many task-specific LoRAs and swap them in at inference time, which is how Stable Diffusion communities ship hundreds of style variants.
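Swapping adapters is cheap because each one is just a pair of small matrices applied against the same frozen base. A hedged sketch of the idea (the adapter names and `merged_weight` helper are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, alpha = 32, 4, 8

W = rng.normal(size=(d, d))  # one shared frozen base weight

# two hypothetical task-specific adapters; each is only a (A, B) pair
adapters = {
    "style_anime": (rng.normal(size=(r, d)) * 0.01, rng.normal(size=(d, r)) * 0.01),
    "style_photo": (rng.normal(size=(r, d)) * 0.01, rng.normal(size=(d, r)) * 0.01),
}

def merged_weight(name):
    # "swapping a LoRA" = adding a different low-rank delta to the same base W
    A, B = adapters[name]
    return W + (alpha / r) * (B @ A)

W_anime = merged_weight("style_anime")
W_photo = merged_weight("style_photo")
```

Each adapter stores 2 * r * d values versus d * d for a full copy of the layer, which is why shipping hundreds of variants is practical.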
QLoRA combines LoRA with quantization to fine-tune even very large models on consumer GPUs. Tools like Hugging Face PEFT, axolotl, and Unsloth make LoRA training accessible.
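The QLoRA idea can be illustrated with a deliberately crude stand-in: quantize the frozen base weights to 4-bit integers, dequantize them on the fly for the matmul, and keep the LoRA factors in full precision. Real QLoRA uses the NF4 data type with block-wise scales rather than this per-tensor absmax scheme; everything below is a simplified sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r, alpha = 16, 4, 8
W = rng.normal(size=(d, d))  # frozen pretrained weight

# crude 4-bit absmax quantization of the frozen base (QLoRA itself uses NF4)
scale = np.abs(W).max() / 7
W_q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)  # stored in 4-bit range
W_deq = W_q.astype(np.float64) * scale                     # dequantized for the matmul

A = rng.normal(size=(r, d)) * 0.01  # LoRA factors stay in full precision
B = np.zeros((d, r))                # and are the only trainable parameters

def forward(x):
    # quantized-then-dequantized frozen path plus full-precision LoRA update
    return W_deq @ x + (alpha / r) * (B @ (A @ x))
```

Only the big frozen matrix pays the quantization error; the small trainable update does not, which is what lets very large models fit on consumer GPUs.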