Fine-tuning
Adapt LLMs to domain-specific tasks with LoRA: train small adapter weights on custom datasets to specialize model behavior.
Overview
Fine-tuning trains a LoRA (Low-Rank Adaptation) adapter on top of an LLM base model; the adapter is then used at inference time with completion().
Load any supported LLM using modelType: "llm". Then call finetune() with the dataset and training settings. Training can be done in two modes:
- SFT: chat-based; enable with assistantLossOnly: true.
- Causal: raw text; the default (assistantLossOnly: false).
The output is a small .gguf adapter file that you can pass to completion() via modelConfig.lora.
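The trained adapter is consumed via modelConfig.lora at inference time. The exact completion() parameters are documented in SDK — API reference; the sketch below is illustrative only, and the adapter path, the shape of the lora value, and the messages parameter are assumptions, not confirmed API:

```typescript
import { completion, loadModel, unloadModel, QWEN3_600M_INST_Q4 } from "@qvac/sdk";

// Load the base model with the trained adapter attached.
// The `lora` value shape (a path string here) is an assumption; see SDK — API reference.
const modelId = await loadModel({
  modelSrc: QWEN3_600M_INST_Q4,
  modelType: "llm",
  modelConfig: {
    lora: "./examples/finetune/results/adapter.gguf", // hypothetical adapter path
  },
});

// Run inference as usual; the adapter specializes the model's responses.
// (Prompt/message parameters per SDK — API reference.)
const result = await completion({
  modelId,
  messages: [{ role: "user", content: "What is 2+2?" }],
});
console.log(result);

await unloadModel({ modelId, clearStorage: false });
```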
Functions
Use the following sequence of function calls:
loadModel() → finetune() → unloadModel()
For how to use each function, see SDK — API reference.
Models
You can fine-tune any llama.cpp-compatible text-generation/chat model. Base model file format: *.gguf.
For models available as constants, see SDK — Models.
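Per the llama.cpp GGUF specification, every GGUF file starts with the 4-byte magic "GGUF", so a downloaded base model can be sanity-checked before loadModel() is called. A minimal sketch using only Node built-ins (the helper name is ours, not part of the SDK):

```typescript
import { openSync, readSync, closeSync } from "node:fs";

// Returns true if the file begins with the GGUF magic bytes ("GGUF").
function isGguf(path: string): boolean {
  const fd = openSync(path, "r");
  try {
    const magic = Buffer.alloc(4);
    const bytesRead = readSync(fd, magic, 0, 4, 0);
    return bytesRead === 4 && magic.toString("ascii") === "GGUF";
  } finally {
    closeSync(fd);
  }
}
```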
Training
SFT
Supervised fine-tuning (SFT) teaches the model how to respond to prompts. Use it for chat tuning, instruction following, or any task where you want to shape assistant responses.
Dataset format: JSONL where each line is a JSON object with a messages array. Supported roles: system, user, assistant, and tool. Example:
{"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"},{"role":"assistant","content":"2+2 equals 4."}]}
{"messages":[{"role":"user","content":"What is the capital of France?"},{"role":"assistant","content":"The capital of France is Paris."}]}
Causal
Causal fine-tuning adapts the model to a domain by training on raw text. Use it for domain adaptation, style transfer, or tasks where you want the model to better reflect specialized vocabulary, patterns, or tone.
Dataset format: plain text file. Example:
This is sample training text.
Another paragraph of content.
Example
The following script loads an LLM, runs fine-tuning on a chat dataset with a separate eval file, and optionally demonstrates pause/resume when invoked with --pause-resume:
import {
finetune,
loadModel,
QWEN3_600M_INST_Q4,
unloadModel,
type FinetuneHandle,
type FinetuneResult,
type FinetuneRunParams,
} from "@qvac/sdk";
const pauseResumeEnabled = process.argv.includes("--pause-resume");
let modelId: string | undefined;
let exitCode = 0;
async function readProgress(
handle: FinetuneHandle,
onTick: (globalSteps: number) => void,
) {
for await (const tick of handle.progressStream) {
const phase = tick.is_train ? "train" : "val";
console.log(
`epoch=${tick.current_epoch + 1} step=${tick.global_steps} batch=${tick.current_batch}/${tick.total_batches} ${phase} loss=${tick.loss?.toFixed(4)} acc=${tick.accuracy?.toFixed(4)} eta=${Math.round(tick.eta_ms / 1000)}s`,
);
onTick(tick.global_steps);
}
}
try {
modelId = await loadModel({
modelSrc: QWEN3_600M_INST_Q4,
modelType: "llm",
modelConfig: {
device: "gpu",
ctx_size: 512,
},
});
console.log(`Model loaded with ID: ${modelId}`);
const loadedModelId = modelId;
const finetuneParams: FinetuneRunParams = {
modelId: loadedModelId,
options: {
trainDatasetDir: "./examples/finetune/input/small_train_HF.jsonl",
validation: {
type: "dataset",
path: "./examples/finetune/input/small_eval_HF.jsonl",
},
numberOfEpochs: 2,
learningRate: 1e-4,
lrMin: 1e-8,
loraModules: "attn_q,attn_k,attn_v,attn_o,ffn_gate,ffn_up,ffn_down",
assistantLossOnly: true,
checkpointSaveSteps: 2,
checkpointSaveDir: "./examples/finetune/results/checkpoints",
outputParametersDir: "./examples/finetune/results",
},
};
const handle = finetune(finetuneParams);
let pauseRequested = false;
let pauseResultPromise: Promise<FinetuneResult> | undefined;
const progressTask = readProgress(handle, (globalSteps) => {
if (pauseResumeEnabled && !pauseRequested && globalSteps >= 10) {
pauseRequested = true;
console.log("Requesting a pause so the run can be resumed...");
pauseResultPromise = finetune({
modelId: loadedModelId,
operation: "pause",
});
}
});
const initialResult = await handle.result;
await progressTask;
if (pauseResultPromise) {
await pauseResultPromise;
}
console.log("Initial finetune result:", initialResult);
if (pauseResumeEnabled && initialResult.status === "PAUSED") {
console.log("Resuming from the saved checkpoint...");
const resumedHandle = finetune({
...finetuneParams,
operation: "resume",
});
    const resumedProgressTask = readProgress(resumedHandle, () => {});
const resumedResult = await resumedHandle.result;
await resumedProgressTask;
console.log("Resumed finetune result:", resumedResult);
}
} catch (error) {
console.error("Error:", error);
exitCode = 1;
} finally {
if (modelId) {
await unloadModel({ modelId, clearStorage: false });
}
}
process.exit(exitCode);
Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.