Training

Create and manage LoRA fine-tuning jobs on Lightning Rod datasets. Access via lr.training on your LightningRod client.

Training jobs use one of two configuration types: GRPO (reinforcement-style training for forecasting) or SFT (supervised fine-tuning on labeled question–answer pairs). Pass the matching SDK config class; the API stores a discriminated config on the job. When you read job.config from get or list, it is a generated GRPOTrainingConfig or SFTTrainingConfig from the API (not the thin SDK wrapper classes).

Before starting a training job, prepare your generated dataset with prepare_for_training. This filters invalid samples, deduplicates, and creates the training-ready train_dataset you pass to lr.training.estimate_cost(...), lr.training.create(...), or lr.training.run(...).

GRPOTrainingConfig

Use for forward-looking / GRPO training. Configure base model, training steps, and optional LoRA parameters:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| base_model_id | str | | HuggingFace model ID for LoRA base (e.g. "Qwen/Qwen3-8B") |
| training_steps | int | | Number of training loop iterations |
| batch_size | int \| None | None | Rows per batch; used to slice train_rows each step |
| lora_rank | int \| None | None | LoRA adapter rank |
| learning_rate | float \| None | None | Step size for weight updates; higher values learn faster but may overshoot |
| adam_beta1 | float \| None | None | Exponential decay rate for first-moment estimates (moving average of gradients) |
| adam_beta2 | float \| None | None | Exponential decay rate for second-moment estimates (moving average of squared gradients) |
| num_rollouts | int \| None | None | Samples per prompt for GRPO |
| max_response_length | int \| None | None | Max tokens for sampling |
| start_idx | int \| None | None | Row index to skip at start; train_rows = train_rows[start_idx:] |
| save_frequency | int \| None | None | Checkpoint frequency in training steps (server default if omitted) |
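The start_idx and batch_size descriptions suggest the row selection sketched below. Only the expression train_rows = train_rows[start_idx:] comes from the table; the helper names rows_after_skip and batch_for_step are hypothetical, and the consecutive, non-overlapping slicing per step is an assumption rather than the server's confirmed behaviour.

```python
from typing import List, Optional


def rows_after_skip(train_rows: List[dict], start_idx: Optional[int]) -> List[dict]:
    """Drop the first start_idx rows, i.e. train_rows = train_rows[start_idx:]."""
    return train_rows[start_idx or 0:]


def batch_for_step(train_rows: List[dict], step: int, batch_size: int) -> List[dict]:
    """Return the rows used at a given step.

    ASSUMPTION: each step takes the next consecutive slice of batch_size rows.
    The real training loop may slice differently.
    """
    begin = step * batch_size
    return train_rows[begin:begin + batch_size]


# Ten illustrative rows; skip the first two, then take batches of four.
rows = [{"id": i} for i in range(10)]
remaining = rows_after_skip(rows, start_idx=2)
first = batch_for_step(remaining, step=0, batch_size=4)
second = batch_for_step(remaining, step=1, batch_size=4)
print([r["id"] for r in first])
print([r["id"] for r in second])
```

Under this assumption, 8 remaining rows with a batch size of 4 cover exactly two steps, which is a quick way to sanity-check training_steps against dataset size.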

SFTTrainingConfig

Use for supervised fine-tuning. It shares the core hyperparameters with GRPOTrainingConfig and adds the SFT-specific fields resume_from and epochs. It has no num_rollouts or max_response_length.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| base_model_id | str | | HuggingFace model ID for LoRA base |
| training_steps | int | | Number of training loop iterations |
| batch_size | int \| None | None | Rows per batch |
| lora_rank | int \| None | None | LoRA adapter rank |
| learning_rate | float \| None | None | Step size for weight updates |
| adam_beta1 | float \| None | None | Adam β₁ |
| adam_beta2 | float \| None | None | Adam β₂ |
| start_idx | int \| None | None | Row index to skip at start |
| save_frequency | int \| None | None | Checkpoint frequency in training steps (server default if omitted) |
| resume_from | str \| None | None | Resume from a Tinker checkpoint path |
| epochs | int \| None | None | Passes over the training data (server default if omitted) |

Methods

estimate_cost

Estimate training cost before running. Pass a GRPOTrainingConfig or, for SFT, an SFTTrainingConfig in the same way.

Returns EstimateTrainingCostResponse with total_cost_dollars, prefill_tokens, sample_tokens, train_tokens, effective_steps, notes, and optional warning_message.
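One way to use the response is as a budget guard before starting a job. The sketch below reads only fields named above (total_cost_dollars, effective_steps, warning_message); the helper within_budget is hypothetical, the numbers are made up, and a SimpleNamespace stands in for the real response so the example runs without the SDK. The commented call shows where a real estimate would come from, and its argument form is an assumption.

```python
from types import SimpleNamespace


def within_budget(estimate, max_dollars: float) -> bool:
    """Print a short summary and return True if the estimate fits the budget."""
    warning = getattr(estimate, "warning_message", None)
    if warning:
        print(f"warning: {warning}")
    print(
        f"estimated cost: ${estimate.total_cost_dollars:.2f} "
        f"over {estimate.effective_steps} steps"
    )
    return estimate.total_cost_dollars <= max_dollars


# With the SDK, the estimate would come from something like:
#   estimate = lr.training.estimate_cost(config, train_dataset)  # ASSUMPTION: argument form
# Stand-in response with made-up values, for illustration only:
estimate = SimpleNamespace(
    total_cost_dollars=12.5,
    effective_steps=100,
    warning_message=None,
)

if within_budget(estimate, max_dollars=20.0):
    print("ok to start training")
```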

create

Create a training job without waiting for it to complete.

run

Create a job and poll until completion. In notebooks, run shows a live progress display; outside notebooks, it raises on failure.
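Conceptually, run combines create with repeated get calls. The sketch below illustrates that loop and is not the SDK's implementation: the status strings, the job.id and job.status attributes, the create(config, train_dataset) argument form, and the FakeTraining stand-in are all assumptions introduced so the example can run without the SDK or network access.

```python
import time
from types import SimpleNamespace

# ASSUMPTION: hypothetical status strings; check the values your jobs report.
DONE = "COMPLETED"
TERMINAL = {DONE, "FAILED", "CANCELLED"}


def create_and_wait(training, config, train_dataset, poll_seconds=10.0, sleep=time.sleep):
    """Create a job, then re-fetch it until it reaches a terminal status."""
    # ASSUMPTION: create() accepts the config and dataset and returns a job
    # object exposing .id and .status.
    job = training.create(config, train_dataset)
    while job.status not in TERMINAL:
        sleep(poll_seconds)
        job = training.get(job.id)
    if job.status != DONE:
        raise RuntimeError(f"training job {job.id} ended with status {job.status}")
    return job


class FakeTraining:
    """Stand-in for lr.training that replays a fixed sequence of statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def create(self, config, train_dataset):
        return SimpleNamespace(id="job-1", status=next(self._statuses))

    def get(self, job_id):
        return SimpleNamespace(id=job_id, status=next(self._statuses))


job = create_and_wait(
    FakeTraining(["RUNNING", "RUNNING", "COMPLETED"]),
    config=None,
    train_dataset=None,
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(job.status)
```

In practice lr.training.run(...) already does the waiting for you; a hand-written loop like this is mainly useful when you call create yourself and want custom handling between polls.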

Training job fields: On a completed job, job.model_id is the final adapter. job.model_id_by_step maps training step (string keys) to intermediate checkpoint model IDs so you can run evals or inference on a specific checkpoint, not only the final one. See Evaluating Intermediate Checkpoints.
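Because model_id_by_step uses string keys, sorting the keys directly orders them as text ("100" before "50"). The sketch below converts the keys to integers first. The helper names and the checkpoint IDs are hypothetical, and it assumes each key is a plain decimal step number.

```python
from typing import Dict, List, Optional, Tuple


def checkpoints_in_order(model_id_by_step: Dict[str, str]) -> List[Tuple[int, str]]:
    """Return (step, model_id) pairs sorted by numeric step."""
    # ASSUMPTION: every key parses as a decimal integer step.
    return sorted((int(step), model_id) for step, model_id in model_id_by_step.items())


def checkpoint_at_or_before(model_id_by_step: Dict[str, str], step: int) -> Optional[str]:
    """Return the model ID of the latest checkpoint saved at or before step."""
    eligible = [m for s, m in checkpoints_in_order(model_id_by_step) if s <= step]
    return eligible[-1] if eligible else None


# Made-up mapping shaped like job.model_id_by_step.
model_id_by_step = {"100": "ckpt-100", "50": "ckpt-50", "200": "ckpt-200"}

for step, model_id in checkpoints_in_order(model_id_by_step):
    print(step, model_id)

print(checkpoint_at_or_before(model_id_by_step, 120))
```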

get

Fetch a single job by ID.

list

List training jobs, with pagination and an optional status filter.

Example

See notebooks/getting_started/05_grpo_training.ipynb for the GRPO forecasting workflow and notebooks/getting_started/06_sft_training.ipynb for SFT.
