
How this works

Cross-spectrum coverage where partisan distortion is present, source-led coverage where it is not.

The Spine publishes a daily set of integrated stories across politics, world, business, health, and science. The editorial frame is tier-aware: the political spectrum is the headline value where partisan framing actually diverges, and source quality is the headline value where it doesn't. This page explains the pipeline, the sources, the per-category editorial tiers, and the limits of what a product like this can responsibly do.

The sources

Ten mainstream outlets, grouped by how they're typically classified on AllSides and Pew. The groupings are imperfect (every outlet varies by desk and by story) but useful as a starting bias map. The spectrum is read at synthesis time on every story, but whether left, center, and right diverge enough to deserve top billing depends on the category tier (see below).

Left of center

  • The New York Times
  • CNN
  • The Washington Post
  • NPR

Center (wire services)

  • Associated Press
  • Reuters
  • Politico

Right of center

  • Fox News
  • Wall Street Journal
  • Washington Examiner

AP and Reuters, our center-anchor sources, no longer offer free public RSS feeds. We currently use Bing News as a proxy to obtain their article URLs, then fetch the article bodies directly. When either outlet makes their feeds available again, we'll switch back.

The category tiers

Not every story benefits from a left/center/right read. Forcing a spectrum frame onto a peer-reviewed physics result, or a quarterly earnings beat, is the false-balance trap. The Spine sorts every published story into one of six categories and applies the editorial frame that fits.

Politics: Spectrum-led

US politics is where left, center, and right diverge most reliably, so every politics story carries the full L/C/R frame. The spectrum tool is the headline value here.

World: Spectrum-led, selectively applied

Foreign affairs split into two buckets. Stories with a partisan US framing (Israel and Gaza, US-China, NATO, immigration) get the full spectrum. Stories where US outlets effectively report the same facts (natural disasters, non-aligned elections, routine diplomacy) get a wire-style frame instead of a forced left/right read.

Business: Mixed frame

Business runs on two frames. Macro and policy stories (interest rates, antitrust, labor regulation) get the spectrum. Corporate news (earnings, M&A, product launches) gets a source-quality frame because L/C/R rarely diverges on whether a company beat the quarter.

Science: Source-led

Science defaults to study quality, methodology, and expert consensus. Climate is the explicit exception that gets the spectrum because the partisan framing is real and material. Pure research science (a new exoplanet, a CRISPR result, a peer-reviewed physics finding) gets no spectrum overlay; forcing one would be false balance.

Health: Contested-domain frame

Health is contested, but not always along left/right lines. The default frame distinguishes expert and institutional consensus from populist contestation. The spectrum is reserved for genuine partisan policy debates (abortion access, drug pricing, public-health funding). Everything else leads with source credibility and study quality.

Sports: Source-led

Sports is source-led: a game has a settled result, and outlets almost never split on it along left/right lines, so there's no spectrum overlay. We surface sports rarely, only when an event is big enough that most people have heard about it (a final, a championship, a major upset), and we lead with the outcome and how it was reported.

The tier mapping lives in code (CATEGORY_FRAMES in src/lib/categories.ts) and drives downstream rendering: the homepage card shows spectrum dots for spectrum-led categories and source-quality badges for source-led ones, and the daily newsletter swaps its per-section intro line to match the tier.
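
A minimal TypeScript sketch of what such a mapping could look like. The real CATEGORY_FRAMES in src/lib/categories.ts is not reproduced on this page, so the type names, the frame strings (borrowed from the tier names in the editorial prompt), and the cardMarker helper are illustrative assumptions rather than the production code.

```typescript
// Hypothetical shape only; the actual definition lives in src/lib/categories.ts.
type Category = "politics" | "world" | "business" | "science" | "health" | "sports";
type Frame =
  | "spectrum-led"
  | "spectrum-selective"
  | "mixed"
  | "contested-domain"
  | "source-led";

const CATEGORY_FRAMES: Record<Category, Frame> = {
  politics: "spectrum-led",
  world: "spectrum-selective",
  business: "mixed",
  science: "source-led",
  health: "contested-domain",
  sports: "source-led",
};

type CardMarker = "spectrum-dots" | "source-quality-badges";

// Assumed helper: picks the homepage card marker for a category.
// Only the two cases stated in the text are decided here; the page does not
// say what the other tiers render, so they return null in this sketch.
function cardMarker(category: Category): CardMarker | null {
  const frame = CATEGORY_FRAMES[category];
  if (frame === "spectrum-led") return "spectrum-dots";
  if (frame === "source-led") return "source-quality-badges";
  return null;
}
```

Typing the map as Record<Category, Frame> means the compiler rejects a new category that has no frame assigned, which suits a mapping that drives rendering downstream.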

How stories are ranked

On a typical day, the pipeline finds more candidate clusters than we publish. We rank them by three factors, in priority order:

  1. Spectrum breadth. A story covered across the left, center, and right beats a story only covered by two of the three, because cross-spectrum coverage is where the product earns its keep. For politics, world, and business this is a hard floor: a cluster needs at least two spectra represented to qualify. For health and science (contested-domain and source-led tiers), the floor is relaxed to one spectrum, because the most important stories in those categories are often reported in a single voice and forcing an L/C/R minimum would either bury them or invent a partisan split that isn't there.
  2. Source count. More independent outlets confirming the same event is a stronger signal that a story is real and important.
  3. Audience reach. Each outlet contributes a weight based on its monthly audience. The weight is the base-10 log of the outlet's monthly visits in millions (plus one), and the weights are summed across the cluster's outlets. We weight by reach because The Spine's job is to surface the stories most of the public is actually hearing about, in the form their preferred outlet told them. This is not a judgment of editorial quality: the smaller outlets in our list are not less credible than the larger ones, they just reach fewer people.
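
The tier-aware floor on spectrum breadth described in the first factor can be sketched as follows. This is an illustration under assumptions: the names MIN_SPECTRA, spectrumBreadth, and qualifies are hypothetical, and the sketch covers only the five categories for which the text states a floor.

```typescript
type Spectrum = "left" | "center" | "right";
type RankedCategory = "politics" | "world" | "business" | "health" | "science";

// Minimum number of distinct spectra a cluster needs, per the text above.
const MIN_SPECTRA: Record<RankedCategory, number> = {
  politics: 2,
  world: 2,
  business: 2,
  health: 1,
  science: 1,
};

// Number of distinct spectra among a cluster's outlets (1 to 3 for a non-empty cluster).
function spectrumBreadth(spectra: Spectrum[]): number {
  return new Set(spectra).size;
}

// True when the cluster meets its category's floor.
function qualifies(category: RankedCategory, spectra: Spectrum[]): boolean {
  return spectrumBreadth(spectra) >= MIN_SPECTRA[category];
}
```

Under this sketch, a politics cluster covered only by two left-of-center outlets does not qualify, while a science cluster covered by a single wire service does.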

The full scoring formula:

clusterScore = spectrumBreadth × 100
             + sourceCount × 8
             + Σ log10(monthlyAudience + 1) × 20

We use a logarithm so that a 60×-larger outlet (NYT vs. Examiner) doesn't become a 60× weight; with the numbers below, the ratio of their weights is about 2.7× (2.78 vs. 1.04). Big voices count for more, small voices aren't drowned out, and the cross-spectrum requirement still dominates everything else.
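
The formula translates fairly directly into code. The sketch below is not the production scorer: the Outlet interface and function names are hypothetical, it assumes each outlet appears once per cluster, and it assumes monthlyAudience is measured in millions of visits, which is the reading that reproduces the Weight column in the table below.

```typescript
type Spectrum = "left" | "center" | "right";

interface Outlet {
  name: string;
  spectrum: Spectrum;
  monthlyVisitsMillions: number; // assumption: audience expressed in millions
}

// Per-outlet reach weight: log10(monthlyAudience + 1).
function audienceWeight(monthlyVisitsMillions: number): number {
  return Math.log10(monthlyVisitsMillions + 1);
}

// clusterScore = spectrumBreadth × 100 + sourceCount × 8 + Σ log10(audience + 1) × 20
function clusterScore(outlets: Outlet[]): number {
  const spectrumBreadth = new Set(outlets.map((o) => o.spectrum)).size;
  const sourceCount = outlets.length;
  const reach = outlets.reduce(
    (sum, o) => sum + audienceWeight(o.monthlyVisitsMillions),
    0,
  );
  return spectrumBreadth * 100 + sourceCount * 8 + reach * 20;
}

// Example cluster built from three rows of the audience table.
const exampleCluster: Outlet[] = [
  { name: "The New York Times", spectrum: "left", monthlyVisitsMillions: 600 },
  { name: "Reuters", spectrum: "center", monthlyVisitsMillions: 100 },
  { name: "Fox News", spectrum: "right", monthlyVisitsMillions: 270 },
];
const exampleScore = clusterScore(exampleCluster);
```

For this three-outlet cluster the breadth term contributes 300 points, while the source-count and reach terms together contribute under 170, which is how the cross-spectrum factor stays dominant.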

The audience numbers we use today:

Outlet                 Spectrum  Monthly visits  Weight
The New York Times     left      ~600M           2.78
CNN                    left      ~450M           2.65
Fox News               right     ~270M           2.43
Associated Press       center    ~130M           2.12
Reuters                center    ~100M           2.00
The Washington Post    left      ~90M            1.96
Wall Street Journal    right     ~85M            1.93
Politico               center    ~50M            1.71
NPR                    left      ~35M            1.56
Washington Examiner    right     ~10M            1.04

Visit numbers are sourced from Press Gazette's monthly ranking of the world's top-50 English-language news websites (using Similarweb data). Two outlets in our list (NPR and Washington Examiner) sit below the top-50 cutoff; their numbers are conservative Similarweb spot-checks. We refresh the table roughly every quarter as Press Gazette posts new data, and log every change in the changelog at the bottom of this page.

The pipeline

  1. Ingest. At 6:30 AM ET, pull the top stories from each source's RSS feed. Deduplicate by URL.
  2. Extract. For each article, attempt to pull the full body text via Readability. When that hits a paywall or JavaScript wall, fall back to Jina AI Reader. When even that fails, fall back to the RSS description. Every article contributes something; fully paywalled articles contribute their headline and RSS summary only.
  3. Embed. Compute a 1024-dimensional semantic embedding per article (headline plus first 3,000 characters) using Voyage AI.
  4. Cluster. Group articles by cosine similarity over their embeddings, using a loose single-link agglomerative threshold (currently 0.50). This casts a wide net so we don't miss outlets whose framing reads differently to the embedding model even when they're covering the same event.
  5. Rank. Score each cluster by spectrum breadth, source count, and total audience reach (see "How stories are ranked" above for the full formula). Take the top 20 candidates. Cluster admission is tier-aware: politics, world, and business clusters need at least two spectra represented; health and science clusters can qualify on a single spectrum.
  6. Verify. A small language model (Claude Haiku) re-reads each candidate cluster and identifies the largest subset of articles that describe the same single underlying event. Embedding similarity is good at finding articles on the same topic; this step ensures they're about the same event. Articles that don't match are dropped from the cluster, and clusters where fewer than two articles match are skipped entirely. The verifier's full system prompt is published below.
  7. Synthesize. For each qualifying cluster, pass all the article bodies to Claude Sonnet 4.6 with the editorial prompt below. Claude returns a structured output: the neutral lead, the integrated narrative, the fact ticks that appear on the spine, the divergence callouts, optional "what was omitted", and per-source framing notes.
  8. Publish. Write each synthesized story to the database. Render on the site and the newsletter. (Audio and podcast shipping in Phase 4.)
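
The clustering step (step 4) can be illustrated with a short sketch. It is one straightforward way to do single-link grouping at a fixed cosine threshold, using a union-find structure; the function names are hypothetical and the production implementation may differ. Embeddings are plain number arrays here, standing in for the 1024-dimensional vectors.

```typescript
// Cosine similarity between two equal-length vectors.
// A zero vector yields NaN, which never passes the threshold test below.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Single-link grouping: two articles share a cluster if a chain of pairwise
// similarities >= threshold connects them. Returns clusters of article indexes.
function singleLinkClusters(embeddings: number[][], threshold = 0.5): number[][] {
  const parent = embeddings.map((_, i) => i);
  const find = (i: number): number => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving keeps lookups short
      i = parent[i];
    }
    return i;
  };
  for (let i = 0; i < embeddings.length; i++) {
    for (let j = i + 1; j < embeddings.length; j++) {
      if (cosineSimilarity(embeddings[i], embeddings[j]) >= threshold) {
        parent[find(i)] = find(j); // merge the two groups
      }
    }
  }
  const groups = new Map<number, number[]>();
  embeddings.forEach((_, i) => {
    const root = find(i);
    const members = groups.get(root) ?? [];
    members.push(i);
    groups.set(root, members);
  });
  return Array.from(groups.values());
}
```

Single-link merging is what makes the net wide: articles A and C can land in one cluster through a shared neighbor B even when A and C themselves fall below the threshold. That is also why the later verification step is needed to trim same-topic, different-event articles.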

The verification prompt

This is the full system prompt given to Claude Haiku during the verification step described above. Its job is to filter out same-topic-but-different-event articles before synthesis. We version-control the prompt and publish every change here.

You are verifying news clusters for The Spine, a daily news-synthesis product. We have grouped the articles below by semantic-embedding similarity, but the embeddings can mistake "same topic" for "same event."

Your job: identify the largest subset of articles that describe the same single underlying news event. An "event" is a specific occurrence: a vote, an announcement, a court ruling, an attack, a death, a release, a deal, or a coordinated set of statements/reactions to one of those.

INCLUSIVE COUNTING — these all count as the same event:
- Direct news coverage of the event itself.
- Reaction pieces, analysis, and commentary that explicitly center the same event (even if the angle is different).
- Articles that lead with related background but make the event their primary subject.
- Coverage from different outlets describing the same event in different language or framing.

EXCLUSIVE — these do NOT count as the same event:
- Articles on a different specific instance, even when the named entities overlap.
- Articles on the broader topic that don't mention the specific event.

Examples:

- "Senate passes infrastructure bill 58-42" and "Final Senate vote on infrastructure bill" → SAME event (the vote).
- "Senate passes infrastructure bill 58-42" and "Analysis: what the Senate vote means for swing-state Democrats" → SAME event (analysis centered on the vote).
- "Senate passes infrastructure bill 58-42" and "Infrastructure bill heads to House for committee markup" → DIFFERENT events (the vote vs. the next legislative step).
- "PM Netanyahu discloses prostate cancer diagnosis" and "Netanyahu cabinet meeting on Iran" → DIFFERENT events.
- "DOJ adopts firing-squad protocol" (Fox version) and "DOJ adopts firing-squad protocol" (CNN version) → SAME event.
- "DOJ adopts firing-squad protocol" and "States that already use firing squad: a primer" → SAME event if the primer is published as context for the DOJ news; DIFFERENT if it's an unrelated explainer.

Lean inclusive: when in doubt about an article that is clearly anchored to the same event but written from a different angle, INCLUDE it. The Spine's purpose is to surface multiple framings of the same event across outlets.

Use the publish_verification tool to return:
- eventSummary: one neutral sentence describing the event the matching articles cover. Avoid editorializing verbs.
- matchingArticleIndexes: 0-based indexes of the articles that describe that event. Include all that match.
- reason: one sentence explaining the decision, citing specific overlaps in named entities, dates, locations, vote tallies, dollar figures, or direct quotes.

If fewer than 2 articles describe the same event, return matchingArticleIndexes: []. Returning fewer-than-2 means the cluster will be dropped from publication, so reserve that for genuinely false-positive groupings.

ALWAYS call the publish_verification tool. Do not produce free-text replies.
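
Stepping outside the prompt text: the three fields it asks for (eventSummary, matchingArticleIndexes, reason) suggest a result shape like the one below. The interface and the applyVerification helper are a hypothetical sketch of how the pipeline might consume that result, including the rule that a cluster with fewer than two matching articles is dropped. They are not the actual tool schema.

```typescript
// Field names come from the prompt above; the interface itself is an assumption.
interface VerificationResult {
  eventSummary: string;
  matchingArticleIndexes: number[]; // 0-based indexes into the cluster
  reason: string;
}

// Returns the verified subset of a cluster, or null when the cluster should
// be dropped (fewer than two valid matching articles).
function applyVerification<T>(articles: T[], result: VerificationResult): T[] | null {
  const valid = Array.from(new Set(result.matchingArticleIndexes)) // de-duplicate
    .filter((i) => Number.isInteger(i) && i >= 0 && i < articles.length) // ignore out-of-range indexes
    .sort((a, b) => a - b); // keep original article order
  if (valid.length < 2) return null;
  return valid.map((i) => articles[i]);
}
```

Filtering out duplicate and out-of-range indexes before counting is a defensive choice in this sketch, so that a malformed model response cannot keep a one-article cluster alive.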

The editorial prompt

This is the full system prompt given to Claude for every synthesis. It is the editorial policy of The Spine. We version-control the prompt and publish every change here.

You are the synthesizer for The Spine, a daily news product that publishes ONE integrated narrative per story, drawn from coverage across the political spectrum (left, center, right).

YOUR JOB: take a cluster of articles covering the same underlying story and produce one synthesized story. The reader sees the verified facts on a "spine" along the left of the article, with the integrated narrative flowing on the right.

EDITORIAL RULES — apply ALL of these in every output:

1. NEUTRAL VERBS ONLY. Use "said," "reported," "described," "stated." NEVER use "claimed," "admitted," "slammed," "lashed out," "doubled down," "boasted," "blasted," "torched."

2. NO EDITORIAL VOICE. Do not characterize anyone's motives, emotions, or honesty. Report what was said and what happened. No adjectives that imply judgment ("controversial," "sweeping," "reckless").

3. SHOW FRAMING THROUGH LANGUAGE, NOT LABELS. When sources framed a story differently, describe the difference by quoting or characterizing what each chose to lead with or emphasize. Do not write "the right says X" or "liberals argue Y." Instead: "Coverage from [Outlet] led with X. [Other Outlet] emphasized Y."

4. ANCHOR EVERY CLAIM. Every factual statement in the narrative must be traceable to one or more supplied sources. Do not introduce facts not present in the input. If a fact appears in only one source, you may include it but must attribute it to that source.

5. SHARED FACTS LEAD: 2-3 sentences of facts that ALL or MOST sources reported the same way. No framing differences in this section.

6. NARRATIVE: 250-400 words across 4-6 paragraphs separated by single blank lines. Each paragraph should either (a) state a fact, (b) describe how coverage diverged, or (c) note something multiple sources missed. Do not start the narrative with the same sentence as the shared facts lead.

7. FACT TICKS: 5-10 short labels (10-30 characters each) for the spine. Each tick is a verifiable, source-anchored claim. Each must have paragraphIndex set to the 0-based index of the narrative paragraph it cites. Examples of GOOD tick labels: "58-42 vote", "$90B price tag", "5 GOP crossed", "signed 4/22", "FEMA buyout authority", "House debate ≥ 2 weeks", "Q3 GDP +2.1%". BAD tick labels: full sentences, vague phrases like "the bill", labels with editorializing.

8. DIVERGENCES: list every paragraph where coverage materially differed. For each divergence, give one short summary sentence per spectrum bucket (left/center/right) describing how that bucket framed the moment. Only include spectra that meaningfully differed; if center and left framed it the same, group them.

9. WHAT WAS OMITTED: optional. Include ONLY if a material fact appears in ZERO of the cited sources that you can flag without inventing it. If there's no clear omission, set this to null. Honesty over cleverness.

10. SOURCE SUMMARIES: list every cited outlet with a one-sentence framingNote describing how that outlet led or emphasized the story. Be specific ("Led with the bill's projected reduction in flood damage costs.") not vague ("Covered the story.").

11. THE HEADLINE you write should be a neutral, descriptive sentence. No clickbait, no political loaded words.

12. CATEGORY: pick the single best-fitting topic bucket from the enum on the publish_synthesis tool's "category" field. Use the descriptions on that field as your rubric. Exactly one bucket per story; choose the one that the story most centrally belongs to even if it touches more than one.

13. TIER FRAMING. Each category has an editorial frame the product applies:
    - spectrum-led (politics): the L/C/R framing is the headline value. Always produce divergences where they exist; the divergences section is the heart of the story.
    - spectrum-selective (world): apply the spectrum frame on stories with materially different partisan US framings (Israel/Gaza, US-China, NATO, immigration). For consensus stories (a natural disaster, a non-aligned election), divergences may be brief or empty; lean on a wire-style integrated narrative.
    - mixed (business): full spectrum treatment on macro and policy stories (rates, antitrust, labor regulation). For corporate news (earnings, M&A, product launches), lead with source quality in framingNotes; divergences may be empty.
    - contested-domain (health): distinguish expert consensus from populist contestation rather than left/right. Apply the spectrum only on genuine partisan policy debates (abortion, drug pricing, public-health funding). Elsewhere, lead with source credibility and study quality; do not force left/right framings onto clinical news.
    - source-led (science): lead with study quality, methodology, and expert consensus. Climate is the explicit exception that gets the spectrum. For pure research science (new exoplanet, CRISPR result, AI paper), divergences should be empty; spectrum framings would be intellectually dishonest. Use framingNotes to describe how each source emphasized methodology, expertise, or scope.

ALWAYS call the publish_synthesis tool with the structured output. Do not produce free-text replies.
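
Stepping outside the prompt text: several of its rules are mechanical enough to check in code after the model responds. The sketch below checks the numeric limits from rules 6 and 7 (paragraph count, word count, tick count, tick label length, and paragraphIndex range). The FactTick interface, its label field name, and the validateSynthesis function are assumptions for illustration; only paragraphIndex is named in the prompt, and the page does not say whether such a check runs in production.

```typescript
interface FactTick {
  label: string; // assumed field name for the tick text
  paragraphIndex: number; // 0-based index into the narrative paragraphs
}

// Returns a list of human-readable problems; an empty list means the
// narrative and ticks satisfy the numeric limits in rules 6 and 7.
function validateSynthesis(narrative: string, ticks: FactTick[]): string[] {
  const problems: string[] = [];
  const paragraphs = narrative
    .split(/\n\s*\n/) // paragraphs are separated by blank lines
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const wordCount = narrative.split(/\s+/).filter((w) => w.length > 0).length;

  if (paragraphs.length < 4 || paragraphs.length > 6) {
    problems.push(`expected 4-6 paragraphs, found ${paragraphs.length}`);
  }
  if (wordCount < 250 || wordCount > 400) {
    problems.push(`expected 250-400 words, found ${wordCount}`);
  }
  if (ticks.length < 5 || ticks.length > 10) {
    problems.push(`expected 5-10 fact ticks, found ${ticks.length}`);
  }
  ticks.forEach((tick, i) => {
    if (tick.label.length < 10 || tick.label.length > 30) {
      problems.push(`tick ${i}: label must be 10-30 characters`);
    }
    const idx = tick.paragraphIndex;
    if (!Number.isInteger(idx) || idx < 0 || idx >= paragraphs.length) {
      problems.push(`tick ${i}: paragraphIndex ${idx} is out of range`);
    }
  });
  return problems;
}
```

A check like this only covers structure. The editorial rules about neutral verbs, attribution, and framing still depend on the prompt and on human review.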

Limits of this tool

Changes to this page

We publish every change to the editorial prompt and source list with a date and a brief note. Open-source transparency is the only real answer to "can we trust this?"