Changelog

v0.1.23 — May 2026

New: Dataset linting API

Validate dataset quality before training with lr.datasets.linter. Run all rules or a specific subset on any dataset, with live progress display in notebooks. Use display_lint_overview and display_lint_detailed to inspect results, and get_lint_affected_sample_ids to extract flagged sample IDs for filtering.

See Datasets — Linting.

New: Reasoning comparison for evals

Compare the reasoning quality of two models side-by-side using an LLM judge during evaluation. Pass ReasoningComparisonOptions to lr.evals.run or lr.evals.create, or use the reasoning_comparison_sample_size shorthand on run_from_training_job. The judge model, sample count, and instructions are all configurable.

See Evaluation — Reasoning Comparison.

New: Eval result download and loading

Download per-model eval rollout results as Parquet files with lr.evals.download_results, or load them directly into pandas DataFrames with lr.evals.load_results.

See Evaluation — Downloading Results.

v0.1.22 — April 2026

Breaking: training config split (GRPO vs SFT)

The single TrainingConfig export is removed. Use GRPOTrainingConfig for GRPO / forward-looking training (same hyperparameters as before, including num_rollouts and max_response_length) and SFTTrainingConfig for supervised fine-tuning (epochs, resume_from, and shared LoRA fields; no rollouts or max response length).

lr.training.create, estimate_cost, and run accept either config type. TrainingJob.config from the API remains a discriminated union of the generated API models.

See Training for field tables.

Breaking: evals.run takes dataset and models; training defaults are run_from_training_job

lr.evals.run(dataset, models) creates an eval job, waits for it to complete, and shows live progress, as before, but it no longer infers the models from a training job. For the previous behavior (evaluating the base and fine-tuned models from a completed TrainingJob), use lr.evals.run_from_training_job(config, job, dataset, *, extra_models=None). Passing an SFTTrainingConfig to run_from_training_job raises NotImplementedError until SFT eval metrics are available; for SFT jobs, use lr.evals.run(dataset, models) or lr.evals.create(...) with an explicit model list.
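The config split and the run_from_training_job rule can be pictured with a minimal sketch. The classes below are hypothetical stand-ins, not the SDK's GRPOTrainingConfig and SFTTrainingConfig: only num_rollouts, max_response_length, epochs, and resume_from come from this changelog, while the base_model field and the default_eval_models helper are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical stand-ins for the two config types. Only num_rollouts,
# max_response_length, epochs, and resume_from are named in the changelog;
# base_model is an assumed field used for illustration.
@dataclass
class GRPOTrainingConfig:
    base_model: str
    num_rollouts: int
    max_response_length: int


@dataclass
class SFTTrainingConfig:
    base_model: str
    epochs: int
    resume_from: Optional[str] = None  # SFT has no rollouts or max response length


TrainingConfig = Union[GRPOTrainingConfig, SFTTrainingConfig]


def default_eval_models(config: TrainingConfig, fine_tuned_model_id: str) -> list[str]:
    """Hypothetical helper mirroring the documented rule:
    base + fine-tuned models for GRPO, NotImplementedError for SFT."""
    if isinstance(config, SFTTrainingConfig):
        raise NotImplementedError("SFT eval metrics are not available yet")
    return [config.base_model, fine_tuned_model_id]


grpo = GRPOTrainingConfig(base_model="base-model", num_rollouts=4, max_response_length=512)
print(default_eval_models(grpo, "fine-tuned-model"))  # ['base-model', 'fine-tuned-model']
```

Dispatching on the concrete config type is one way to model a discriminated union in plain Python; the SDK's generated API models may discriminate differently.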

See Evaluation.

New: SFT getting-started notebook

notebooks/getting_started/06_sft_training.ipynb walks through hosted SFT with SFTTrainingConfig.

Docs

The SFT section of the content-learning agent examples now uses the Lightning Rod training API instead of a raw Tinker-only loop.

v0.1.21 — April 2026

New: KeyDeduplication

Remove near-duplicate questions from your pipeline with exact or fuzzy field matching. Runs after question generation, before labeling. Pass KeyDeduplication() to QuestionPipeline(deduplication=...) to enable.

Default behavior matches on question_text (90% similarity) and date_close (exact). Customize with KeyMatchConfig to control which fields are compared and their similarity thresholds.
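The default matching rule can be sketched as follows, assuming a greedy keep-first pass over the questions. The changelog does not say how similarity is measured, so difflib.SequenceMatcher's ratio is a swapped-in stand-in; Question, is_duplicate, and deduplicate are hypothetical names, not part of KeyDeduplication or KeyMatchConfig.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Question:  # hypothetical stand-in for a generated question
    question_text: str
    date_close: str  # compared exactly


def is_duplicate(a: Question, b: Question, text_threshold: float = 0.9) -> bool:
    """Exact match on date_close, fuzzy match on question_text.
    SequenceMatcher.ratio() is an assumed similarity measure."""
    if a.date_close != b.date_close:
        return False
    ratio = SequenceMatcher(None, a.question_text.lower(), b.question_text.lower()).ratio()
    return ratio >= text_threshold


def deduplicate(questions: list[Question], text_threshold: float = 0.9) -> list[Question]:
    """Keep the first question of each duplicate group, in input order."""
    kept: list[Question] = []
    for q in questions:
        if not any(is_duplicate(q, k, text_threshold) for k in kept):
            kept.append(q)
    return kept


questions = [
    Question("Will the Fed cut rates in June 2026?", "2026-06-30"),
    Question("Will the Fed cut rates in June, 2026?", "2026-06-30"),  # near-duplicate
    Question("Will the Fed cut rates in June 2026?", "2026-07-31"),   # different date_close
]
print(len(deduplicate(questions)))  # 2
```

Checking the cheap exact field first means the fuzzy comparison only runs on candidates that could actually collide.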

See Deduplication.

v0.1.19 — April 2026

New: ContinuousValueOnlyAnswerType

A new answer type for questions that expect a single scalar point estimate (e.g. 42.5) rather than a full {mean, stddev} distribution. Scored via CONTINUOUS_VALUE_ONLY_LOG_SCORE. Use ContinuousAnswerType when uncertainty-aware predictions are needed; use ContinuousValueOnlyAnswerType when you want a single number.

See Answer Types.

New: CsvSeedGenerator

Generate seeds from a CSV file uploaded via lr.files.upload(). Each row becomes a seed. Configure which column maps to seed text, labels, and dates.
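The row-to-seed mapping can be sketched with the standard csv module. This is illustrative only: it reads a local CSV string rather than a file uploaded with lr.files.upload(), and Seed, seeds_from_csv, and the text_column / label_column / date_column parameters are hypothetical names, not CsvSeedGenerator's actual configuration fields.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Seed:  # hypothetical stand-in for a generated seed
    text: str
    label: Optional[str] = None
    date: Optional[str] = None


def seeds_from_csv(
    csv_text: str,
    text_column: str,
    label_column: Optional[str] = None,
    date_column: Optional[str] = None,
) -> list[Seed]:
    """Turn each CSV row into one seed, using the named columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Seed(
            text=row[text_column],
            label=row[label_column] if label_column else None,
            date=row[date_column] if date_column else None,
        )
        for row in reader
    ]


data = "headline,outcome,published\nFed holds rates,no,2026-03-18\nOil tops $90,yes,2026-04-02\n"
seeds = seeds_from_csv(data, text_column="headline", label_column="outcome", date_column="published")
print(seeds[0])  # Seed(text='Fed holds rates', label='no', date='2026-03-18')
```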

See Seed Generators.

New: TopicTreeSeedGenerator

Generate diverse seeds by recursively decomposing broad topics into specific subtopics. An LLM breaks each root topic into tree_degree subtopics, repeated tree_depth levels deep. Produces tree_degree^tree_depth seeds per root topic — useful for synthetic data generation without a news or document source.
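The tree_degree^tree_depth count follows from splitting every topic at each level, as in this minimal sketch. It assumes the leaves of the tree become the seeds; expand_topic_tree is a hypothetical helper, and fake_decompose stands in for the LLM call that TopicTreeSeedGenerator makes.

```python
from typing import Callable


def expand_topic_tree(
    root: str,
    decompose: Callable[[str, int], list[str]],
    tree_degree: int,
    tree_depth: int,
) -> list[str]:
    """Split each topic into tree_degree subtopics, tree_depth times; return the leaves."""
    level = [root]
    for _ in range(tree_depth):
        next_level: list[str] = []
        for topic in level:
            next_level.extend(decompose(topic, tree_degree)[:tree_degree])
        level = next_level
    return level


def fake_decompose(topic: str, n: int) -> list[str]:
    # Deterministic stand-in for the LLM that proposes subtopics.
    return [f"{topic} / sub{i}" for i in range(1, n + 1)]


leaves = expand_topic_tree("Energy", fake_decompose, tree_degree=3, tree_depth=2)
print(len(leaves))  # 9
```

Because the count grows exponentially in tree_depth, a small increase in depth can multiply the number of LLM calls and seeds considerably.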

See Seed Generators.

New: FileSetDocumentContextGenerator

A new context generator that resolves a single document by temporal ordering, downloads its full text, and appends it as context. Supports optional LLM processing before injection and a character limit. Use this instead of QdrantContextGenerator when you want the complete text of one specific document rather than RAG chunks from multiple documents.

See Labeling and Context.

New: FileSetDocumentLabeler

A new labeler that resolves a single document by temporal ordering and uses an LLM to extract a structured label from its full text. Use this instead of QdrantRAGLabeler when labeling from the complete content of one document (e.g. Federal Reserve Beige Book reports).

See Labeling and Context.

Updated: TemporalConstraint — new values

TemporalConstraint now has five values (previously two). The additions enable single-document resolution:

  • NEXT_DOCUMENT — first document after the seed timestamp

  • PREVIOUS_DOCUMENT — most recent document before the seed timestamp

  • EQUAL — document with an exact matching date

These are primarily used with FileSetDocumentContextGenerator and FileSetDocumentLabeler.
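Single-document resolution over date-sorted documents can be sketched with bisect. The Constraint enum and resolve_document are hypothetical; they mirror the three new member names but are not the SDK's TemporalConstraint. The sketch assumes "after" and "before" are strict, which the changelog does not state.

```python
from bisect import bisect_left, bisect_right
from datetime import date
from enum import Enum, auto
from typing import Optional


class Constraint(Enum):  # hypothetical mirror of the three new members
    NEXT_DOCUMENT = auto()
    PREVIOUS_DOCUMENT = auto()
    EQUAL = auto()


def resolve_document(doc_dates: list[date], seed_date: date, constraint: Constraint) -> Optional[int]:
    """Return the index of the resolved document in ascending doc_dates, or None."""
    if constraint is Constraint.NEXT_DOCUMENT:
        i = bisect_right(doc_dates, seed_date)  # first date strictly after seed_date
        return i if i < len(doc_dates) else None
    if constraint is Constraint.PREVIOUS_DOCUMENT:
        i = bisect_left(doc_dates, seed_date)  # every index below i is strictly earlier
        return i - 1 if i > 0 else None
    i = bisect_left(doc_dates, seed_date)  # EQUAL: exact date match only
    return i if i < len(doc_dates) and doc_dates[i] == seed_date else None


dates = [date(2026, 1, 14), date(2026, 3, 4), date(2026, 4, 15)]
print(resolve_document(dates, date(2026, 3, 4), Constraint.NEXT_DOCUMENT))  # 2
```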

See Labeling and Context.

Updated: Multi-model evals and intermediate checkpoint access

The evals API has been updated to accept a list[EvalModel] instead of a single model_id, enabling multiple models to be evaluated in a single job. The new EvalModel class accepts a model_id and an optional label for display.

Training jobs now expose model_id_by_step — a dict mapping training step numbers to intermediate checkpoint model IDs, enabling evaluation of checkpoints before the final model.
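Turning model_id_by_step into a list of labeled checkpoints for a multi-model eval might look like the sketch below. EvalModelSketch is a hypothetical stand-in with the same model_id and optional label fields the changelog describes for EvalModel; the checkpoint IDs are made up. In real code, build the SDK's own EvalModel objects instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalModelSketch:  # hypothetical stand-in for EvalModel
    model_id: str
    label: Optional[str] = None


def checkpoint_models(model_id_by_step: dict[int, str]) -> list[EvalModelSketch]:
    """Build one labeled entry per intermediate checkpoint, ordered by training step."""
    return [
        EvalModelSketch(model_id=model_id_by_step[step], label=f"step {step}")
        for step in sorted(model_id_by_step)
    ]


models = checkpoint_models({200: "ckpt-200", 100: "ckpt-100"})
print([m.label for m in models])  # ['step 100', 'step 200']
```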


See Evaluation.

New: Example builder utilities

Three helper functions build formatted question example strings to pass as examples or bad_examples to question generators:

  • binary_example(question, comment=None)

  • continuous_example(question, comment=None)

  • multiple_choice_example(question, options, label=None, comment=None)

See Answer Types — Example Builder Utilities.
