Training
Create and manage LoRA fine-tuning jobs on Lightning Rod datasets. Access via lr.training on your LightningRod client.
Training jobs use one of two configuration types: GRPO (reinforcement-style training for forecasting) or SFT (supervised fine-tuning on labeled question–answer pairs). Pass the matching SDK config class; the API stores a discriminated config on the job. When you read job.config from get or list, it is a generated GRPOTrainingConfig or SFTTrainingConfig from the API (not the thin SDK wrapper classes).
Before starting a training job, prepare your generated dataset with prepare_for_training. This filters invalid samples, deduplicates, and creates the training-ready train_dataset you pass to lr.training.estimate_cost(...), lr.training.create(...), or lr.training.run(...).
GRPOTrainingConfig
Use for GRPO (forward-looking forecasting) training. Configure the base model, the number of training steps, and optional LoRA parameters:
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| base_model_id | str | — | HuggingFace model ID for LoRA base (e.g. "Qwen/Qwen3-8B") |
| training_steps | int | — | Number of training loop iterations |
| batch_size | int \| None | None | Rows per batch; used to slice train_rows each step |
| lora_rank | int \| None | None | LoRA adapter rank |
| learning_rate | float \| None | None | Step size for weight updates; higher values learn faster but may overshoot |
| adam_beta1 | float \| None | None | Exponential decay rate for first-moment estimates (moving average of gradients) |
| adam_beta2 | float \| None | None | Exponential decay rate for second-moment estimates (moving average of squared gradients) |
| num_rollouts | int \| None | None | Samples per prompt for GRPO |
| max_response_length | int \| None | None | Max tokens for sampling |
| start_idx | int \| None | None | Row index to skip at start; train_rows = train_rows[start_idx:] |
| save_frequency | int \| None | None | Checkpoint frequency in training steps (server default if omitted) |
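The batch_size and start_idx descriptions above can be pictured with a small local sketch. This is only an illustration of the slicing they describe: iter_batches is a hypothetical helper, not part of the SDK, and stopping early once the rows run out is an assumption rather than documented server behavior.

```python
from typing import Iterator, List, Optional, Sequence, TypeVar

Row = TypeVar("Row")


def iter_batches(
    train_rows: Sequence[Row],
    training_steps: int,
    batch_size: int,
    start_idx: Optional[int] = None,
) -> Iterator[List[Row]]:
    """Yield one batch per training step (hypothetical local model)."""
    # Documented behavior: rows before start_idx are skipped.
    rows = train_rows[start_idx or 0:]
    for step in range(training_steps):
        # Each step takes the next batch_size rows.
        batch = list(rows[step * batch_size:(step + 1) * batch_size])
        if not batch:
            # Assumption: stop when the data is exhausted.
            break
        yield batch


# Ten toy rows, skipping the first two, four rows per step.
batches = list(iter_batches(list(range(10)), training_steps=3, batch_size=4, start_idx=2))
print(batches)
```

Under these assumptions, training_steps * batch_size rows are consumed after the skip, so a dataset shorter than that would not fill every step.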
SFTTrainingConfig
Use for supervised fine-tuning. It shares the core hyperparameters with GRPOTrainingConfig, adds the SFT-specific fields resume_from and epochs, and has no num_rollouts or max_response_length.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| base_model_id | str | — | HuggingFace model ID for LoRA base |
| training_steps | int | — | Number of training loop iterations |
| batch_size | int \| None | None | Rows per batch |
| lora_rank | int \| None | None | LoRA adapter rank |
| learning_rate | float \| None | None | Step size for weight updates |
| adam_beta1 | float \| None | None | Adam β₁ |
| adam_beta2 | float \| None | None | Adam β₂ |
| start_idx | int \| None | None | Row index to skip at start |
| save_frequency | int \| None | None | Checkpoint frequency in training steps (server default if omitted) |
| resume_from | str \| None | None | Resume from a Tinker checkpoint path |
| epochs | int \| None | None | Passes over the training data (server default if omitted) |
Methods
estimate_cost
Estimate the cost of a training job before running it.
Pass a GRPOTrainingConfig or an SFTTrainingConfig; the call works the same way for both.
Returns EstimateTrainingCostResponse with total_cost_dollars, prefill_tokens, sample_tokens, train_tokens, effective_steps, notes, and optional warning_message.
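One way to use the response fields listed above is a budget gate before launching a job. The sketch below is an assumption-laden illustration: review_estimate is a hypothetical helper, and the estimate is a SimpleNamespace stand-in with made-up values, not a real EstimateTrainingCostResponse. Only the field names total_cost_dollars and warning_message come from this page.

```python
from types import SimpleNamespace
from typing import Any, List


def review_estimate(estimate: Any, max_dollars: float) -> List[str]:
    """Return reasons not to launch the job; an empty list means OK."""
    problems: List[str] = []
    if estimate.total_cost_dollars > max_dollars:
        problems.append(
            f"estimated ${estimate.total_cost_dollars:.2f} "
            f"exceeds budget ${max_dollars:.2f}"
        )
    # warning_message is optional, so treat a missing or empty value as no warning.
    warning = getattr(estimate, "warning_message", None)
    if warning:
        problems.append(f"warning: {warning}")
    return problems


# Stand-in for the object returned by lr.training.estimate_cost(...);
# the numbers are invented for the example.
fake_estimate = SimpleNamespace(
    total_cost_dollars=12.5, effective_steps=50, warning_message=None
)
problems = review_estimate(fake_estimate, max_dollars=10.0)
print(problems)
```

In real use the estimate would come from lr.training.estimate_cost(...), and create or run would only be called when the returned list is empty.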
create
Create a training job and return without waiting for it to finish.
run
Create a job and poll until it completes. In notebooks, run shows a live progress display. Outside notebooks, it raises an exception if the job fails.
Training job fields: On a completed job, job.model_id is the final adapter. job.model_id_by_step maps training step (string keys) to intermediate checkpoint model IDs so you can run evals or inference on a specific checkpoint, not only the final one. See Evaluating Intermediate Checkpoints.
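Because model_id_by_step uses string keys, sorting the keys directly orders them as text ("100" before "20"). The sketch below shows one way to walk checkpoints in training order and pick the one nearest a target step. Both helpers are hypothetical, the sample mapping uses invented model IDs, and the code assumes every key is a decimal step number.

```python
from typing import Dict, List, Tuple


def checkpoints_in_order(model_id_by_step: Dict[str, str]) -> List[Tuple[int, str]]:
    """Return (step, model_id) pairs sorted by numeric step."""
    # Assumption: each key parses as an integer training step.
    return sorted((int(step), model_id) for step, model_id in model_id_by_step.items())


def nearest_checkpoint(model_id_by_step: Dict[str, str], target_step: int) -> str:
    """Return the model ID whose step is closest to target_step."""
    ordered = checkpoints_in_order(model_id_by_step)
    if not ordered:
        raise ValueError("job has no intermediate checkpoints")
    # On a tie, min() keeps the earlier step because the list is sorted.
    _, model_id = min(ordered, key=lambda pair: abs(pair[0] - target_step))
    return model_id


# Invented example of what job.model_id_by_step might contain.
sample = {"100": "ckpt-100", "20": "ckpt-20", "60": "ckpt-60"}
ordered = checkpoints_in_order(sample)
chosen = nearest_checkpoint(sample, target_step=70)
print(ordered, chosen)
```

The returned model ID can then be used wherever the final job.model_id would be, for example to evaluate an intermediate checkpoint.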
get
Fetch a single job by ID.
list
List training jobs, with pagination and an optional status filter.
Example
See notebooks/getting_started/05_grpo_training.ipynb for the GRPO forecasting workflow and notebooks/getting_started/06_sft_training.ipynb for the SFT workflow.