Data Methodology
Last reviewed May 2026
What we track
S.I.R. monitors legislation directly relevant to seven regulated industries across all 50 U.S. states: e-liquid / vapor products, hemp / CBD, functional mushrooms, recreational marijuana, medicinal marijuana, kratom, and peptides.
We track bills at the state level only. Federal legislation and municipal ordinances are outside current scope. Active legislative sessions are prioritized; bills from prior sessions are retained but may not receive ongoing enrichment.
Data sources
Bill data comes from two external APIs, both free-tier:
- LegiScan: Primary source for e-liquid, hemp, mushrooms, recreational marijuana, and medicinal marijuana. LegiScan provides bill text, sponsors, vote records, committee assignments, and action history. We operate within the LegiScan free-tier API quota (~30,000 requests/month). Bill text is fetched on initial discovery and periodically refreshed when status changes.
- Open States: Supplementary source for kratom and peptides. Open States aggregates legislative data from official state APIs and websites. We stay within the free-tier rate limit (250 requests/day, 10/minute). Open States coverage is strong for most states but can lag official publication by 24–48 hours during session peaks.
Both APIs are third-party aggregators: their data is derived from official state legislature websites and may lag official publication by hours to days. S.I.R. does not scrape official state websites directly.
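The free-tier limits quoted above (for Open States, 10 requests per minute and 250 per day) can be enforced on the client side before any request is sent. The following is a minimal sketch of one way to do that, not S.I.R.'s actual implementation; the class name `QuotaGuard`, its parameters, and the injectable `clock` are all hypothetical.

```python
import time
from collections import deque


class QuotaGuard:
    """Hypothetical client-side guard that enforces several rate windows at once."""

    def __init__(self, limits, clock=time.monotonic):
        # limits: list of (max_calls, window_seconds), e.g. [(10, 60), (250, 86400)]
        self.limits = limits
        self.clock = clock
        self.calls = deque()  # timestamps of accepted calls, oldest first

    def try_acquire(self):
        """Return True and record the call if every window has room, else False."""
        now = self.clock()
        longest = max(window for _, window in self.limits)
        # Drop timestamps that are older than the longest window.
        while self.calls and now - self.calls[0] >= longest:
            self.calls.popleft()
        for max_calls, window in self.limits:
            recent = sum(1 for t in self.calls if now - t < window)
            if recent >= max_calls:
                return False  # nothing is recorded when a limit would be exceeded
        self.calls.append(now)
        return True


# Assumed values taken from the limits stated in the text.
open_states_guard = QuotaGuard([(10, 60), (250, 86400)])
```

A caller would check `open_states_guard.try_acquire()` before each request and defer the request to a later run when it returns `False`. Checking all windows before recording the call avoids spending a per-minute slot on a request the daily limit would reject.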
Bill inclusion rules
Not every bill touching a relevant keyword enters the platform. Inclusion is governed by a two-pass keyword filter:
- Domain match — the bill title or description must contain a term clearly associated with the industry (e.g. vapor, kratom, mitragyna, cannabis).
- Legislative relevance — a secondary filter removes bills that only mention the industry incidentally (e.g. broad budget bills that list vapor tax revenue as one of dozens of line items).
Bills that pass both filters are fetched, stored in our repository, and made available on the platform. Bills that fail either filter are ignored; no partial records are created.
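The two-pass filter described above can be sketched roughly as follows. This is an illustration under stated assumptions: the term lists, the incidental-mention markers, and the function names are hypothetical, and the platform's real keyword sets and relevance logic are not reproduced here.

```python
import re

# Hypothetical domain terms; the examples in the text (vapor, kratom,
# mitragyna, cannabis) are included, the rest are illustrative.
DOMAIN_TERMS = {
    "kratom": ["kratom", "mitragyna"],
    "vapor": ["vapor", "e-liquid"],
    "cannabis": ["cannabis", "marijuana"],
}

# Hypothetical markers of broad bills that mention an industry only in passing.
INCIDENTAL_MARKERS = ["appropriations", "general fund budget"]


def matched_domains(text):
    """Pass 1: return the industries whose terms appear in the text."""
    lowered = text.lower()
    return sorted(
        domain
        for domain, terms in DOMAIN_TERMS.items()
        if any(re.search(r"\b" + re.escape(term), lowered) for term in terms)
    )


def is_incidental(text):
    """Pass 2: flag bills that look like broad budget or omnibus measures."""
    lowered = text.lower()
    return any(marker in lowered for marker in INCIDENTAL_MARKERS)


def should_include(title, description):
    """A bill is included only if it passes both filters."""
    text = f"{title} {description}"
    if not matched_domains(text):
        return False
    return not is_incidental(text)
```

A substring marker list is a deliberately crude stand-in for the relevance pass; a production filter would likely need per-industry rules to avoid discarding relevant bills that happen to contain a budget term.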
Bills are deduplicated before storage. If LegiScan and Open States return the same bill, only one record is kept. The source of record is preserved in the bill's metadata.
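One plausible way to deduplicate across the two sources is to key each bill on state, session, and a normalized bill number. The sketch below assumes that key and assumes the first source seen becomes the source of record; the field names (`state`, `session`, `number`, `source`) and both function names are hypothetical.

```python
import re


def bill_key(bill):
    """Build a comparison key, normalizing forms like 'H.B. 123' and 'hb 0123'."""
    number = re.sub(r"[^A-Za-z0-9]", "", bill["number"]).upper()
    match = re.fullmatch(r"([A-Z]+)0*(\d+)", number)
    if match:
        number = match.group(1) + match.group(2)  # strip leading zeros
    return (bill["state"].upper(), bill["session"], number)


def deduplicate(bills):
    """Keep one record per key; the first record seen wins (an assumption)."""
    kept = {}
    for bill in bills:
        key = bill_key(bill)
        if key not in kept:
            kept[key] = dict(bill)  # the 'source' field travels with the record
    return list(kept.values())
```

Passing LegiScan results before Open States results would make LegiScan the source of record whenever both return the same bill, though the text does not say which source the platform prefers.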
Bill status classification
Each bill is assigned one of seven normalized statuses regardless of how the source API labels it:
| Status | Meaning |
|---|---|
| introduced | Filed; not yet assigned to committee |
| in_committee | Referred to or actively in committee |
| passed_one_chamber | Passed House or Senate, awaiting the other |
| passed_both_chambers | Passed both chambers, awaiting governor |
| signed | Signed into law by the governor |
| vetoed | Vetoed by the governor |
| dead | Failed, tabled, or session ended without passage |
Status is mapped from the source API's progress codes at ingest time. Ambiguous statuses default to in_committee. Status changes detected between pipeline runs trigger bill alert emails for subscribed users.
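The mapping and change detection described above might look something like the sketch below. The source labels in `STATUS_MAP` are hypothetical placeholders rather than the actual progress codes of either API; only the seven normalized statuses and the `in_committee` default come from the text.

```python
# Hypothetical source labels mapped to the seven normalized statuses.
STATUS_MAP = {
    "introduced": "introduced",
    "referred": "in_committee",
    "engrossed": "passed_one_chamber",
    "enrolled": "passed_both_chambers",
    "signed": "signed",
    "vetoed": "vetoed",
    "failed": "dead",
}

DEFAULT_STATUS = "in_committee"  # fallback for ambiguous or unknown labels


def normalize_status(source_label):
    """Map a source label to a normalized status, defaulting when unrecognized."""
    key = (source_label or "").strip().lower()
    return STATUS_MAP.get(key, DEFAULT_STATUS)


def changed_bills(previous, current):
    """Return {bill_id: (old_status, new_status)} for bills whose status changed.

    Bills that appear only in `current` are new rather than changed,
    so they are left out.
    """
    return {
        bill_id: (previous[bill_id], status)
        for bill_id, status in current.items()
        if bill_id in previous and previous[bill_id] != status
    }
```

The result of `changed_bills` is the kind of list an alert step could iterate over to notify subscribed users.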
AI enrichment
Bill text is often long, technical, and written in legislative language. To make bills more accessible, S.I.R. runs an optional AI enrichment pass after each pipeline run using Google Gemini 2.5 Flash-Lite (free tier).
AI enrichment produces:
- A plain-English summary of the bill's intent and key provisions
- An industry impact classification (positive / negative / neutral / mixed)
- A brief rationale for the impact classification
What AI does not do:
- AI does not determine whether a bill is legally binding or enforceable
- AI does not provide legal advice or compliance guidance
- AI-generated summaries may contain errors, omissions, or mischaracterizations
- Thin bills (no retrievable text) receive no AI enrichment and are flagged as such
AI enrichment is a convenience layer, not an authoritative interpretation. Always verify against the official bill text before acting on any S.I.R. summary.
Enrichment pauses 12 seconds between bills (at most five bills per minute) to stay within free-tier rate limits. Not all bills are enriched on every pipeline run; priority is given to newly introduced and recently status-changed bills.
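The pacing and prioritization described above can be illustrated with a short sketch. It assumes a caller-supplied `enrich` callable standing in for the model call, so no real Gemini API is shown; the field names (`is_new`, `status_changed`, `text`, `id`), the `max_bills` cap, and the function names are hypothetical.

```python
import time

ENRICH_DELAY_SECONDS = 12  # from the text: a 12-second gap between bills


def priority(bill):
    """Lower values sort first: new bills, then status-changed, then the rest."""
    if bill.get("is_new"):
        return 0
    if bill.get("status_changed"):
        return 1
    return 2


def enrich_batch(bills, enrich, max_bills, sleep=time.sleep):
    """Enrich up to `max_bills` bills in priority order, pausing between calls.

    Thin bills (no retrievable text) are skipped entirely.
    """
    with_text = [bill for bill in bills if bill.get("text")]
    queue = sorted(with_text, key=priority)[:max_bills]
    results = {}
    for index, bill in enumerate(queue):
        if index > 0:
            sleep(ENRICH_DELAY_SECONDS)  # no pause is needed before the first call
        results[bill["id"]] = enrich(bill["text"])
    return results
```

Making `sleep` injectable keeps the function testable without waiting; at 12 seconds per bill, a batch of 100 bills would take roughly 20 minutes of wall-clock time.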
Human overrides
S.I.R. administrators can manually override any AI-generated field (summary, impact classification, impact rationale) or bill status for any bill. Overrides are stored separately from the pipeline data and take precedence over automated values.
Overrides are used to correct factual errors in AI output, update bills whose source-API status has not yet been refreshed, or flag bills that slipped through keyword filters but are clearly relevant.
Bills with active overrides display the corrected values. The override itself is not surfaced to end users — only the corrected data is shown.
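Because overrides are stored separately and take precedence, the merge can be applied at read or build time without touching pipeline data. A minimal sketch, assuming hypothetical field names and a simple dictionary layout:

```python
# Hypothetical field names for the overridable values listed in the text.
OVERRIDABLE_FIELDS = ("summary", "impact", "impact_rationale", "status")


def apply_overrides(bill, overrides):
    """Return a copy of `bill` with any override values taking precedence.

    The original pipeline record is left unchanged, and keys outside
    OVERRIDABLE_FIELDS are ignored.
    """
    merged = dict(bill)
    for field in OVERRIDABLE_FIELDS:
        if overrides.get(field) is not None:
            merged[field] = overrides[field]
    return merged
```

Keeping the merge as a pure function means a later pipeline run can overwrite automated values freely, and the override is reapplied on top each time.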
Update schedule
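The data pipeline runs automatically once per day at 11:00 AM UTC via GitHub Actions. That is 7 AM ET / 6 AM CT during daylight saving time, and one hour earlier by local clock (6 AM ET / 5 AM CT) during standard time. Each run: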
- Fetches new and updated bills from LegiScan and Open States
- Merges, deduplicates, and normalizes bill records
- Applies AI enrichment to unenriched or newly-changed bills
- Updates state overview summaries and heatmap data
- Rebuilds RSS feeds
- Commits updated JSON to the repository; triggers a site redeploy
The pipeline run timestamp and bill count are embedded in every deployed build (visible on bill detail pages as “Last pipeline run”). If the pipeline fails, the previously deployed data remains live and the team is alerted.
Manual pipeline runs can be triggered via GitHub Actions workflow dispatch for urgent updates between scheduled runs.
Reporting errors
If you find a bill that is missing, miscategorized, or contains an incorrect AI summary, use the feedback button on any bill detail page or contact us through the About page. Corrections are reviewed by a human and applied as overrides within one business day.
For questions about our data practices or privacy, see our Privacy Policy and Terms of Service.