Introduction

Fully Automated. Zero Manual Labels. — Configure, Generate Training Data, Use

The Lightning Rod SDK provides a simple but powerful end-to-end API for generating custom synthetic datasets and fine-tuning LLMs.

Transform news articles, documents, and other real-world data into high-quality training samples automatically.

How It Works

The SDK follows a pipeline-based workflow:

  1. Seeds — Raw data from news articles, documents, or custom sources

  2. Questions — AI-generated forecasting questions from seeds

  3. Context — Optional enrichment with relevant news or RAG-retrieved documents

  4. Labels — Ground truth answers resolved via web search

  5. Dataset — Training-ready samples in a format you can use immediately

  6. Train — Fine-tune models on your datasets (early access)

  7. Eval — Run evals against the test dataset (early access)

  8. Inference — Run predictions with your fine-tuned model (early access)

You configure a pipeline, run it, and receive a labeled dataset. No manual question writing or labeling required. With early access, you can also train, evaluate, and run inference with models end-to-end.
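
To make the flow of stages concrete, here is a minimal, self-contained Python sketch of the seed, question, context, label, and dataset steps. It is an illustration only and does not use the Lightning Rod SDK: every name in it (`Seed`, `Sample`, `run_pipeline`, and the toy stage functions) is hypothetical, and the choice to drop questions that cannot be resolved is an assumption made for this example. In a real pipeline the question and label stages would be backed by a model and web search rather than string templates.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Seed:
    """Raw source text, e.g. a news article or document excerpt (hypothetical type)."""
    text: str


@dataclass
class Sample:
    """One training-ready record: a question, optional context, and a label."""
    question: str
    context: List[str] = field(default_factory=list)
    label: Optional[bool] = None


def run_pipeline(
    seeds: List[Seed],
    generate_question: Callable[[Seed], str],
    resolve_label: Callable[[str], Optional[bool]],
    add_context: Optional[Callable[[str], List[str]]] = None,
) -> List[Sample]:
    """Chain the stages: seeds -> questions -> (optional context) -> labels -> dataset."""
    dataset: List[Sample] = []
    for seed in seeds:
        question = generate_question(seed)
        context = add_context(question) if add_context else []
        label = resolve_label(question)
        if label is None:
            # Assumption for this sketch: unresolved questions are dropped.
            continue
        dataset.append(Sample(question, context, label))
    return dataset


# Toy stand-ins for the model-backed and search-backed stages.
def toy_question(seed: Seed) -> str:
    return f"Will the following be confirmed: {seed.text}?"


def toy_label(question: str) -> Optional[bool]:
    # Pretend only questions about a rate cut can be resolved.
    return True if "rate cut" in question else None


seeds = [Seed("the central bank announces a rate cut"), Seed("an unrelated event")]
dataset = run_pipeline(seeds, toy_question, toy_label)
print(len(dataset), dataset[0].label)  # prints "1 True"
```

Passing each stage in as a function keeps the sketch independent of any particular model or search backend. The optional `add_context` argument mirrors the fact that context enrichment is an optional step in the workflow above.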

Research Foundation

Lightning Rod is based on our research paper, "Future-as-Label: Scalable Supervision from Real-World Outcomes". We use this approach to generate the Future-as-Label training dataset for our paper.

What's Next

  • Quickstart — Install the SDK and generate your first dataset in minutes

  • Examples — Run tutorials and end-to-end notebooks in Google Colab

  • Dataset Generation — Deep dive into pipelines, seed generators, and question types

  • Forecasting — Get probability estimates with the foresight-v3 forecasting model

  • Fine Tuning — Fine-tune models on your generated datasets (early access)
