Introduction

The Lightning Rod SDK provides a simple but powerful end-to-end API for generating custom synthetic datasets and fine-tuning LLMs.
Transform news articles, documents, and other real-world data into high-quality training samples automatically.
How It Works
The SDK follows a pipeline-based workflow:
Seeds — Raw data from news articles, documents, or custom sources
Questions — AI-generated forecasting questions from seeds
Context — Optional enrichment with relevant news or RAG-retrieved documents
Labels — Ground truth answers resolved via web search
Dataset — Training-ready samples in a format you can use immediately
Train — Fine-tune models on your datasets (early access)
Eval — Run evals against the test dataset (early access)
Inference — Run predictions with your fine-tuned model (early access)
You configure a pipeline, run it, and receive a labeled dataset. No manual question writing or labeling required. With early access, you can also train, evaluate, and run inference with models end-to-end.
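The stage-by-stage flow above can be pictured with a small, self-contained Python sketch. This is not the Lightning Rod SDK API: every name here (`Sample`, `run_pipeline`, and the three stage functions) is hypothetical, and each stage is a stub standing in for the real question generator, context enrichment, and web-search labeler. It only illustrates how seeds pass through an ordered list of stages to become labeled samples.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Sample:
    """One record as it moves through the stages (hypothetical shape)."""
    seed: str
    question: Optional[str] = None
    context: Optional[str] = None
    label: Optional[bool] = None


# A stage takes a Sample and returns an updated copy.
Stage = Callable[[Sample], Sample]


def generate_question(s: Sample) -> Sample:
    # Stub for the AI question generator: wraps the seed in a question.
    return replace(s, question=f"Will this happen? {s.seed}")


def add_context(s: Sample) -> Sample:
    # Stub for the optional news/RAG enrichment step.
    return replace(s, context="(retrieved background would go here)")


def resolve_label(s: Sample) -> Sample:
    # Stub for ground-truth resolution; a real labeler would look up the outcome.
    return replace(s, label=True)


def run_pipeline(seeds: list[str], stages: list[Stage]) -> list[Sample]:
    """Apply each stage, in order, to every seed."""
    samples = [Sample(seed=text) for text in seeds]
    for stage in stages:
        samples = [stage(s) for s in samples]
    return samples


# Context is optional, so a pipeline can simply leave that stage out.
dataset = run_pipeline(
    ["A central bank announces a rate decision next month"],
    [generate_question, add_context, resolve_label],
)
```

Modeling each stage as a plain function keeps the optional steps optional: dropping `add_context` from the list yields samples whose `context` stays `None`, while the rest of the flow is unchanged.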
Research Foundation
Lightning Rod is based on our research paper, "Future-as-Label: Scalable Supervision from Real-World Outcomes". We used this approach to generate the Future-as-Label training dataset for the paper.
What's Next
Quickstart — Install the SDK and generate your first dataset in minutes
Examples — Run tutorials and end-to-end notebooks in Google Colab
Dataset Generation — Deep dive into pipelines, seed generators, and question types
Forecasting — Get probability estimates with the foresight-v3 forecasting model
Fine Tuning — Fine-tune models on your generated datasets (early access)
