# Training

Create and manage LoRA fine-tuning jobs on Lightning Rod datasets. Access via `lr.training` on your `LightningRod` client.

Training jobs use one of two configuration types: **GRPO** (reinforcement-style training for forecasting) or **SFT** (supervised fine-tuning on labeled question–answer pairs). Pass the matching SDK config class; the API stores a discriminated config on the job. When you read `job.config` from `get` or `list`, it is a generated `GRPOTrainingConfig` or `SFTTrainingConfig` from the API (not the thin SDK wrapper classes).

Before starting a training job, prepare your generated dataset with [`prepare_for_training`](/python-sdk/fine-tuning-beta/data-preparation.md). This filters invalid samples, deduplicates, and creates the training-ready `train_dataset` you pass to `lr.training.estimate_cost(...)`, `lr.training.create(...)`, or `lr.training.run(...)`.

## GRPOTrainingConfig

Use this config for GRPO training on forward-looking (forecasting) questions. Set the base model and the number of training steps; all other fields, including the LoRA parameters, are optional:

| Field                 | Type          | Default | Description                                                                              |
| --------------------- | ------------- | ------- | ---------------------------------------------------------------------------------------- |
| `base_model_id`       | str           | —       | HuggingFace model ID for LoRA base (e.g. `"Qwen/Qwen3-8B"`)                              |
| `training_steps`      | int           | —       | Number of training loop iterations                                                       |
| `batch_size`          | int \| None   | None    | Rows per batch; used to slice train\_rows each step                                      |
| `lora_rank`           | int \| None   | None    | LoRA adapter rank                                                                        |
| `learning_rate`       | float \| None | None    | Step size for weight updates; higher values learn faster but may overshoot               |
| `adam_beta1`          | float \| None | None    | Exponential decay rate for first-moment estimates (moving average of gradients)          |
| `adam_beta2`          | float \| None | None    | Exponential decay rate for second-moment estimates (moving average of squared gradients) |
| `num_rollouts`        | int \| None   | None    | Samples per prompt for GRPO                                                              |
| `max_response_length` | int \| None   | None    | Max tokens for sampling                                                                  |
| `start_idx`           | int \| None   | None    | Number of leading rows to skip; `train_rows = train_rows[start_idx:]`                   |
| `save_frequency`      | int \| None   | None    | Checkpoint frequency in training steps (server default if omitted)                       |
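
The `batch_size` and `start_idx` descriptions above can be read as plain list slicing. The following is a minimal sketch of that reading, not the server's implementation: `plan_batches` is a hypothetical helper, and stopping once the rows run out is an assumption made for illustration.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def plan_batches(
    train_rows: Sequence[T],
    training_steps: int,
    batch_size: int,
    start_idx: int = 0,
) -> Iterator[List[T]]:
    """Yield one batch per step, following the table's slicing description."""
    # start_idx drops the leading rows: train_rows = train_rows[start_idx:]
    rows = list(train_rows[start_idx:])
    for step in range(training_steps):
        batch = rows[step * batch_size:(step + 1) * batch_size]
        if not batch:
            # Assumption: stop once the rows are exhausted.
            return
        yield batch


batches = list(
    plan_batches(list(range(10)), training_steps=5, batch_size=4, start_idx=2)
)
print(batches)  # [[2, 3, 4, 5], [6, 7, 8, 9]]
```

Under this reading, 10 rows with `start_idx=2` and `batch_size=4` supply only two full batches even though five steps were requested.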

## SFTTrainingConfig

Use this config for supervised fine-tuning. It shares the core hyperparameters with GRPO and adds the SFT-specific fields `resume_from` and `epochs`. It has **no** `num_rollouts` or `max_response_length`.

| Field            | Type          | Default | Description                                                        |
| ---------------- | ------------- | ------- | ------------------------------------------------------------------ |
| `base_model_id`  | str           | —       | HuggingFace model ID for LoRA base                                 |
| `training_steps` | int           | —       | Number of training loop iterations                                 |
| `batch_size`     | int \| None   | None    | Rows per batch                                                     |
| `lora_rank`      | int \| None   | None    | LoRA adapter rank                                                  |
| `learning_rate`  | float \| None | None    | Step size for weight updates                                       |
| `adam_beta1`     | float \| None | None    | Adam β₁                                                            |
| `adam_beta2`     | float \| None | None    | Adam β₂                                                            |
| `start_idx`      | int \| None   | None    | Number of leading rows to skip                                     |
| `save_frequency` | int \| None   | None    | Checkpoint frequency in training steps (server default if omitted) |
| `resume_from`    | str \| None   | None    | Resume from a Tinker checkpoint path                               |
| `epochs`         | int \| None   | None    | Passes over the training data (server default if omitted)          |
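
Because the two config types accept different fields, it can help to check keyword arguments before constructing a config. The sketch below only mirrors the field names from the two tables; `check_config_kwargs` is a hypothetical helper that is not part of the SDK, and treating the fields with a `—` default as required is an inference from the tables.

```python
# Field names copied from the GRPOTrainingConfig and SFTTrainingConfig tables.
COMMON_FIELDS = {
    "base_model_id", "training_steps", "batch_size", "lora_rank",
    "learning_rate", "adam_beta1", "adam_beta2", "start_idx", "save_frequency",
}
FIELDS_BY_TYPE = {
    "grpo": COMMON_FIELDS | {"num_rollouts", "max_response_length"},
    "sft": COMMON_FIELDS | {"resume_from", "epochs"},
}
# Assumption: fields with no default in the tables are required.
REQUIRED_FIELDS = {"base_model_id", "training_steps"}


def check_config_kwargs(config_type: str, kwargs: dict) -> list:
    """Return a list of problems; an empty list means the kwargs look consistent."""
    allowed = FIELDS_BY_TYPE[config_type]
    problems = [
        f"unknown field for {config_type}: {name}"
        for name in sorted(set(kwargs) - allowed)
    ]
    problems += [
        f"missing required field: {name}"
        for name in sorted(REQUIRED_FIELDS - set(kwargs))
    ]
    return problems


sft_problems = check_config_kwargs(
    "sft",
    {"base_model_id": "Qwen/Qwen3-8B", "training_steps": 50, "num_rollouts": 4},
)
print(sft_problems)  # ['unknown field for sft: num_rollouts']
```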

## Methods

### estimate\_cost

Estimate training cost before running:

```python
from lightningrod.training import GRPOTrainingConfig

config = GRPOTrainingConfig(
    base_model_id="openai/gpt-oss-120b",
    training_steps=50,
)

# `lr` is your LightningRod client; `train_dataset` is the output of prepare_for_training.
cost_estimate = lr.training.estimate_cost(config, dataset=train_dataset)
print(f"Estimated cost: ${cost_estimate.total_cost_dollars:.2f}")
print(f"Effective steps: {cost_estimate.effective_steps}")
print(f"Train tokens: {cost_estimate.train_tokens}")
```

For SFT, use `SFTTrainingConfig` the same way.

Returns `EstimateTrainingCostResponse` with `total_cost_dollars`, `prefill_tokens`, `sample_tokens`, `train_tokens`, `effective_steps`, `notes`, and optional `warning_message`.
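
One way to use the estimate is as a budget gate before calling `create` or `run`. The following is a minimal sketch: `within_budget` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in with placeholder values that carries only the `total_cost_dollars` and `warning_message` attributes named above.

```python
from types import SimpleNamespace


def within_budget(estimate, max_dollars: float) -> bool:
    """Return True if the estimate fits the budget; print any warning it carries."""
    warning = getattr(estimate, "warning_message", None)
    if warning:
        print(f"Warning from estimate: {warning}")
    return estimate.total_cost_dollars <= max_dollars


# Stand-in for an EstimateTrainingCostResponse; the numbers are placeholders.
fake_estimate = SimpleNamespace(total_cost_dollars=12.5, warning_message=None)

print(within_budget(fake_estimate, max_dollars=20.0))  # True
print(within_budget(fake_estimate, max_dollars=10.0))  # False
```

In real use, you would pass the object returned by `lr.training.estimate_cost(...)` in place of `fake_estimate`.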

### create

Create a training job without waiting:

```python
job = lr.training.create(config, dataset=train_dataset, name="My fine-tune")
print(job.id, job.status)
```

### run

Create a job and poll until it completes. In notebooks, `run` shows a live progress display; outside notebooks, it raises on failure:

```python
job = lr.training.run(
    config,
    dataset=train_dataset,
    name="Forecasting fine-tune",
    poll_interval=15.0,
)
print(f"Model ID: {job.model_id}")
```

**Training job fields:** On a completed job, **`job.model_id`** is the final adapter. **`job.model_id_by_step`** maps training step (string keys) to intermediate checkpoint model IDs so you can run evals or inference on a specific checkpoint, not only the final one. See [Evaluating Intermediate Checkpoints](/python-sdk/fine-tuning-beta/evaluation.md#evaluating-intermediate-checkpoints).
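
Because `model_id_by_step` uses string keys, a plain sort orders them lexicographically (`"100"` before `"20"`). The sketch below sorts checkpoints by numeric step instead. `checkpoints_in_order` is a hypothetical helper, the model IDs are placeholders, and it assumes every key is a decimal integer string.

```python
def checkpoints_in_order(model_id_by_step: dict) -> list:
    """Return (step, model_id) pairs sorted by numeric training step."""
    pairs = ((int(step), model_id) for step, model_id in model_id_by_step.items())
    return sorted(pairs, key=lambda pair: pair[0])


# Placeholder mapping shaped like job.model_id_by_step.
example = {"100": "ckpt-100", "20": "ckpt-20", "60": "ckpt-60"}

ordered = checkpoints_in_order(example)
print(ordered)      # [(20, 'ckpt-20'), (60, 'ckpt-60'), (100, 'ckpt-100')]
print(ordered[-1])  # (100, 'ckpt-100'), the latest intermediate checkpoint
```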

### get

Fetch a single job by ID:

```python
job = lr.training.get(job_id)
```
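
If you start a job with `create` and want to wait for it yourself, `get` can be called in a loop. This is a minimal sketch of such a loop, not how `run` is implemented. `wait_for_job` is a hypothetical helper; it assumes `job.status` compares equal to a plain string, and the caller supplies the terminal statuses. Only `"completed"` appears on this page, so `"running"` in the demo is a placeholder.

```python
import time
from types import SimpleNamespace
from typing import Callable, Iterable


def wait_for_job(
    get_job: Callable[[str], object],
    job_id: str,
    terminal_statuses: Iterable[str],
    poll_interval: float = 15.0,
    max_polls: int = 240,
    sleep: Callable[[float], None] = time.sleep,
):
    """Poll get_job(job_id) until job.status is terminal, then return the job."""
    terminal = set(terminal_statuses)
    for _ in range(max_polls):
        job = get_job(job_id)
        # Assumption: job.status compares equal to a plain string.
        if job.status in terminal:
            return job
        sleep(poll_interval)
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")


# Stand-in for lr.training.get that reports two non-terminal polls first.
_statuses = iter(["running", "running", "completed"])


def fake_get(job_id: str) -> SimpleNamespace:
    return SimpleNamespace(id=job_id, status=next(_statuses))


finished = wait_for_job(fake_get, "job-123", {"completed"}, poll_interval=0.0)
print(finished.id, finished.status)  # job-123 completed
```

With a real client, you would pass `lr.training.get` as `get_job` and include whatever failure statuses your jobs can report in `terminal_statuses`.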

### list

List training jobs with pagination and optional status filter:

```python
response = lr.training.list(page=1, limit=10, status="completed")
for job in response.jobs:
    print(job.id, job.model_id)
```
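
To walk every page rather than a single one, the `page` and `limit` parameters can be driven from a loop. The sketch below is one way to do that under stated assumptions: `iter_jobs` is a hypothetical helper, pages are 1-indexed as in the call above, and a page shorter than `limit` is taken to be the last one. `fake_list` is a stand-in for `lr.training.list` so the example runs without a client.

```python
from types import SimpleNamespace
from typing import Callable, Iterator


def iter_jobs(list_page: Callable[..., object], limit: int = 10, **filters) -> Iterator[object]:
    """Yield jobs from successive pages until a short (or empty) page is seen."""
    page = 1
    while True:
        response = list_page(page=page, limit=limit, **filters)
        jobs = list(response.jobs)
        yield from jobs
        if len(jobs) < limit:
            # Assumption: a page shorter than `limit` is the final page.
            return
        page += 1


# Stand-in for lr.training.list backed by five placeholder job IDs.
_all_jobs = [f"job-{i}" for i in range(5)]


def fake_list(page: int, limit: int, status=None) -> SimpleNamespace:
    start = (page - 1) * limit
    return SimpleNamespace(jobs=_all_jobs[start:start + limit])


job_ids = list(iter_jobs(fake_list, limit=2, status="completed"))
print(job_ids)  # ['job-0', 'job-1', 'job-2', 'job-3', 'job-4']
```

With a real client the call would look like `iter_jobs(lr.training.list, limit=10, status="completed")`.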

## Example

See [notebooks/getting\_started/05\_grpo\_training.ipynb](https://github.com/lightning-rod-labs/lightningrod-python-sdk/blob/main/notebooks/getting_started/05_grpo_training.ipynb) for the GRPO forecasting workflow and [notebooks/getting\_started/06\_sft\_training.ipynb](https://github.com/lightning-rod-labs/lightningrod-python-sdk/blob/main/notebooks/getting_started/06_sft_training.ipynb) for the SFT workflow.

