LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning technique that trains small adapter weights instead of updating the entire model.
LoRA freezes the original model weights and, for each target layer, trains a pair of small low-rank matrices whose product is added to the frozen weights at inference time (or merged in permanently). The result is quality close to full fine-tuning at a tiny fraction of the compute, memory, and storage cost.
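The core update can be sketched in a few lines of NumPy. This is a minimal illustration, not any library's implementation; the dimensions, `alpha`, and the `forward` helper are all made up for the example. Note the conventional initialization: `A` starts small and random, `B` starts at zero, so the low-rank delta is zero at step 0 and the adapted layer initially behaves exactly like the frozen one.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 8   # layer dimensions and LoRA rank (r << d)
alpha = 16                   # scaling hyperparameter; update is scaled by alpha / r

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight, never updated
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank factor, small random init
B = np.zeros((d_out, r))               # trainable low-rank factor, zero init

def forward(x):
    # frozen base path plus the low-rank update B @ A, scaled by alpha / r
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# because B starts at zero, the adapted layer initially matches the frozen layer
assert np.allclose(forward(x), W @ x)
```

Training touches only `A` and `B` (2 * r * d parameters per layer instead of d * d), which is where the compute and storage savings come from.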
LoRA adapters are typically a few megabytes vs gigabytes for full weights. You can have many task-specific LoRAs and swap them in at inference time, which is how Stable Diffusion communities ship hundreds of style variants.
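Swapping adapters is cheap because each one is just a pair of small matrices applied against the same frozen base. A hedged sketch of the idea (the adapter names and `merged_weight` helper are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, alpha = 32, 4, 8

W = rng.normal(size=(d, d))  # one shared frozen base weight

# two hypothetical task-specific adapters; each is only a (A, B) pair
adapters = {
    "style_anime": (rng.normal(size=(r, d)) * 0.01, rng.normal(size=(d, r)) * 0.01),
    "style_photo": (rng.normal(size=(r, d)) * 0.01, rng.normal(size=(d, r)) * 0.01),
}

def merged_weight(name):
    # "swapping a LoRA" = adding a different low-rank delta to the same base W
    A, B = adapters[name]
    return W + (alpha / r) * (B @ A)

W_anime = merged_weight("style_anime")
W_photo = merged_weight("style_photo")
```

Each adapter stores 2 * r * d values versus d * d for a full copy of the layer, which is why shipping hundreds of variants is practical.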
QLoRA combines LoRA with quantization to fine-tune even very large models on consumer GPUs. Tools like Hugging Face PEFT, axolotl, and Unsloth make LoRA training accessible.
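The QLoRA idea can be illustrated with a deliberately crude stand-in: quantize the frozen base weights to 4-bit integers, dequantize them on the fly for the matmul, and keep the LoRA factors in full precision. Real QLoRA uses the NF4 data type with block-wise scales rather than this per-tensor absmax scheme; everything below is a simplified sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r, alpha = 16, 4, 8
W = rng.normal(size=(d, d))  # frozen pretrained weight

# crude 4-bit absmax quantization of the frozen base (QLoRA itself uses NF4)
scale = np.abs(W).max() / 7
W_q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)  # stored in 4-bit range
W_deq = W_q.astype(np.float64) * scale                     # dequantized for the matmul

A = rng.normal(size=(r, d)) * 0.01  # LoRA factors stay in full precision
B = np.zeros((d, r))                # and are the only trainable parameters

def forward(x):
    # quantized-then-dequantized frozen path plus full-precision LoRA update
    return W_deq @ x + (alpha / r) * (B @ (A @ x))
```

Only the big frozen matrix pays the quantization error; the small trainable update does not, which is what lets very large models fit on consumer GPUs.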