Newsjunkie

Schema.org

The Shared Vocabulary for Structured Data on the Web

Jun 4, 2026

Online consortium · Governed via W3C Schema.org Community Group · Founded by Google, Microsoft, Yahoo, and Yandex

Overview

Schema.org is the internet's shared vocabulary for structured data — a collaboratively maintained set of schemas (types and their properties) that web publishers embed in their pages to tell search engines, AI systems, and other applications not just what a page says, but what it is. Launched on June 2, 2011, by the three companies that operated the world's dominant search engines — Google, Microsoft through Bing, and Yahoo — it was the search industry's collective answer to a decade of fragmentation: multiple competing markup standards (Microformats, RDFa, various proprietary vocabularies) had produced a landscape where most webmasters implemented nothing, and those who did implement markup often did it incorrectly, because each search engine read the standards differently. Schema.org resolved the fragmentation by creating a single vocabulary that all the major search engines agreed to support. Yandex, the dominant Russian search engine, joined the initiative in November 2011.

The vocabulary currently comprises 827 types, 1,528 properties, and 14 data types — covering everything from the obvious (Person, Organization, Product, Event) to the highly specific (ReportageNewsArticle, MusicRecording, ChemicalSubstance, LegislativeBuilding). More than 45 million web domains now use Schema.org markup encoding more than 450 billion objects, making it one of the most widely deployed standards on the internet. Since April 2015, the W3C Schema.org Community Group has been the primary governance forum — a public, open community process through which proposed additions and modifications to the vocabulary are debated, refined, and adopted. The schema is licensed under Creative Commons Attribution-ShareAlike 3.0.

The Problem Schema.org Solved: Before 2011

The problem that Schema.org addressed had been building since the late 1990s. Web pages were written for human eyes: a news article might contain the author's name, the publication date, the headline, and the subject of the report, but all of those facts were embedded in HTML that made them indistinguishable from navigation links, advertisements, footers, and body text. Search engines extracted meaning from web pages through statistical analysis of word patterns — essentially reading tea leaves. This produced approximate results for many common queries but broke down for anything requiring precise factual extraction: who wrote this article, when was it published, what is this event's date and location, what is the price of this product.

Various communities developed their own solutions. Microformats encoded specific data types (contact information, calendar events, book reviews) in HTML class attributes. RDFa allowed arbitrary semantic annotations within HTML markup. Yahoo's SearchMonkey project let developers create custom structured data enrichments. Google developed its own rich snippets vocabulary. The result was that a publisher who wanted to communicate structured data to all the major search engines simultaneously had to implement multiple different vocabularies — or choose one and accept that the others would not benefit from the markup. Most publishers chose to implement nothing, because the effort was not worth the partial benefit.

Schema.org's founding insight was that the benefit to webmasters required universal adoption by search engines, and universal adoption by search engines required a single shared vocabulary rather than competitive proprietary standards. The founding companies' willingness to cooperate on a shared vocabulary — despite competing aggressively on every other dimension of search — reflected this logic. The vocabulary was designed to be extensible and domain-agnostic: starting with a practical core of the most common use cases, then expanding through community-driven additions to cover specialized domains including news publishing.

Governance: The W3C Community Group

Schema.org is governed through a two-tier structure. Day-to-day operations — including decisions about specific schema additions and modifications — are handled by a steering group that includes representatives of the four founding companies, a representative of the W3C (the World Wide Web Consortium, the main international web standards body), and a small number of individuals who have contributed substantially to the schema's development. Steering group discussions are public. The broader community participates through the W3C Schema.org Community Group (which hosts the primary public discussion mailing list) and through GitHub, where the schema's source files are maintained and where issues, proposals, and pull requests are openly tracked. Schema.org releases new versions periodically — the most recent as of the time of writing was version 29.4, released December 8, 2025.

News-Relevant Schema Types

Schema.org includes a dedicated News markup vocabulary designed for news publishers. These types signal to Google News, Google Search, and AI systems the specific character and provenance of journalism content. The most important for news organizations are:

NewsArticle

The primary type for news content — an article whose content reports news or provides background context. Subtype of Article. Recommended for time-sensitive journalism eligible for Google News surfaces. Key properties: headline, author, datePublished, dateModified, image, publisher, description.

ReportageNewsArticle

A subtype of NewsArticle for articles produced according to journalistic news-reporting conventions: verified facts, multiple perspectives, author accountability, minimal opinion. Explicitly distinct from opinion, analysis, sponsored, or satirical content. The type is the publisher's own declaration rather than an external certification, but it gives search and AI systems an explicit credibility signal to weigh.

OpinionNewsArticle

Subtype of NewsArticle for opinion and editorial content — explicitly flagged as the work of a named person expressing a point of view rather than factual reporting. Allows publishers to accurately classify their mixed content portfolio.

LiveBlogPosting

For real-time coverage of breaking news or live events, updated continuously. Supports liveBlogUpdate for individual updates within a liveblog. Eligible for special display treatment in Google Search for live events.

NewsMediaOrganization

Subtype of Organization specifically for news publishers. Includes properties for publishingPrinciples (URL of editorial standards statement), verificationFactCheckingPolicy, correctionsPolicy, diversityPolicy, and ethicsPolicy — allowing publishers to assert their editorial standards in machine-readable form.

Claim / ClaimReview

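Specifically for fact-checkers and verification journalism: ClaimReview marks up a page that reviews the accuracy of a claim made elsewhere, with properties for the claim text (claimReviewed), who made it (itemReviewed, typically a Claim with an author), and the reviewer's rating (reviewRating). Google has used this markup to power fact-check labels in its search products; display support has changed over time, so check Google's current documentation before relying on a specific label.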
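
As a rough illustration of the ClaimReview shape described above, the sketch below marks up a hypothetical fact-check page. The outlet, URLs, claim text, speaker, and rating scale are all invented for illustration; only the property names (claimReviewed, itemReviewed, appearance, reviewRating) come from the Schema.org vocabulary. JSON does not permit comments, so the assumptions are stated here rather than inline.

```json
{
  "@context": "https://schema.org",
  "@type": "ClaimReview",
  "url": "https://example.org/fact-check/library-budget-claim",
  "datePublished": "2026-05-20",
  "claimReviewed": "The city library budget has doubled in five years.",
  "author": {
    "@type": "NewsMediaOrganization",
    "name": "Example Fact Desk",
    "url": "https://example.org"
  },
  "itemReviewed": {
    "@type": "Claim",
    "datePublished": "2026-05-15",
    "author": {
      "@type": "Person",
      "name": "Jane Example"
    },
    "appearance": {
      "@type": "CreativeWork",
      "url": "https://example.org/council-meeting-transcript"
    }
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": 2,
    "bestRating": 5,
    "worstRating": 1,
    "alternateName": "Mostly false"
  }
}
```

The numeric scale is the reviewer's own choice; alternateName carries the human-readable verdict that a consumer would most likely display.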

Use Case: A Nonprofit News Organization Implements Schema.org

Sample Use Case — Small Nonprofit Regional Newsroom

Scenario: A nonprofit regional news outlet — call it the Bridgewater Beacon, covering a mid-sized city for about 12,000 readers — wants to improve its visibility in Google News and Google Search, ensure its journalists get proper attribution when their work surfaces in AI answers, and signal its editorial standards and nonprofit status to search engines and potential readers. It publishes a mix of reported news, opinion columns, and a weekly newsletter. Its CMS is WordPress. It has no dedicated developer but a staff editor with moderate technical comfort.

What it implements and why: The Beacon adds three layers of Schema.org markup: organization-level, article-level, and article-subtype differentiation. All markup uses JSON-LD — the format Google explicitly recommends, delivered in a <script type="application/ld+json"> block, decoupled from visible HTML and therefore easier to maintain through a WordPress plugin than inline Microdata.

Layer 1 (Organization, site-wide in the page header): A NewsMediaOrganization block on every page identifies the Beacon as a news organization, links to its About page, states its nonprofit status, and, critically, links to its publishingPrinciples (the editorial standards page), its correctionsPolicy, and its verificationFactCheckingPolicy. This lets Google treat the Beacon as a self-identified news publisher with published standards rather than a generic website. It does not guarantee Google News inclusion, which depends on Google's content policies, but it gives search engines and AI systems that read Schema.org data an explicit signal of editorial accountability.
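
A minimal sketch of what such a site-wide organization block might contain follows. The name, site URL, logo, editorial-standards URL, and corrections URL are taken from this article's own example; the @id value, the fact-checking URL, and the 501(c)(3) value for nonprofitStatus (which assumes a US-registered nonprofit) are hypothetical additions for illustration.

```json
{
  "@context": "https://schema.org",
  "@type": "NewsMediaOrganization",
  "@id": "https://bridgewaterbeacon.org/#organization",
  "name": "Bridgewater Beacon",
  "url": "https://bridgewaterbeacon.org",
  "logo": {
    "@type": "ImageObject",
    "url": "https://bridgewaterbeacon.org/logo-600x60.png",
    "width": 600,
    "height": 60
  },
  "nonprofitStatus": "https://schema.org/Nonprofit501c3",
  "publishingPrinciples": "https://bridgewaterbeacon.org/about/editorial-standards",
  "correctionsPolicy": "https://bridgewaterbeacon.org/about/corrections",
  "verificationFactCheckingPolicy": "https://bridgewaterbeacon.org/about/fact-checking"
}
```

Giving the block a stable @id is a design choice rather than a requirement: it lets other markup on the site refer to the same organization without repeating every property.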

Layer 2 (Article, per story): Every news article page gets a NewsArticle block with headline, author (a Person object carrying the reporter's name and URL), datePublished, dateModified, image (an ImageObject value, so width and height can be stated), publisher (pointing back to the org block), and description (the article's summary). The dateModified property is particularly important: it tells Google when a story was updated, which is a signal for re-indexing and for surfacing the most current version of a developing story.
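
One way to make the publisher "point back" to the organization block is a JSON-LD @id reference, sketched below. This assumes the site-wide organization block declares the same @id; the identifier https://bridgewaterbeacon.org/#organization is a hypothetical choice, not something Schema.org prescribes.

```json
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "City Council Votes to Cut Library Hours Amid Budget Shortfall",
  "datePublished": "2026-05-15T19:30:00-05:00",
  "dateModified": "2026-05-16T08:15:00-05:00",
  "author": {
    "@type": "Person",
    "name": "Maria Solis",
    "url": "https://bridgewaterbeacon.org/author/maria-solis"
  },
  "publisher": {
    "@id": "https://bridgewaterbeacon.org/#organization"
  }
}
```

Not every consumer is guaranteed to resolve references across separate script blocks, so repeating the publisher's name, URL, and logo inline is the more conservative alternative.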

Layer 3 (Subtype differentiation): Reported news stories use ReportageNewsArticle, which declares to Google and AI systems that the content is fact-based journalism with multiple sources and minimal opinion. Opinion columns by named contributors use OpinionNewsArticle, which classifies that content as perspective rather than reporting. Liveblogs covering city council votes or major local events use LiveBlogPosting.
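
For reported and opinion pieces, the subtype is simply the value of @type. A liveblog has a different shape, so a short sketch may help. Everything below (URL, headlines, times, and update text) is invented for illustration; coverageStartTime, coverageEndTime, and liveBlogUpdate are the Schema.org properties specific to LiveBlogPosting.

```json
{
  "@context": "https://schema.org",
  "@type": "LiveBlogPosting",
  "headline": "Live: City Council Budget Vote",
  "url": "https://bridgewaterbeacon.org/live/council-budget-vote",
  "datePublished": "2026-05-15T18:00:00-05:00",
  "dateModified": "2026-05-15T19:32:00-05:00",
  "coverageStartTime": "2026-05-15T18:00:00-05:00",
  "coverageEndTime": "2026-05-15T21:00:00-05:00",
  "liveBlogUpdate": [
    {
      "@type": "BlogPosting",
      "headline": "Council votes 5-4 to reduce library hours",
      "datePublished": "2026-05-15T19:30:00-05:00",
      "articleBody": "The motion passed after roughly ninety minutes of debate."
    },
    {
      "@type": "BlogPosting",
      "headline": "Public comment period opens",
      "datePublished": "2026-05-15T18:10:00-05:00",
      "articleBody": "Residents have begun addressing the council."
    }
  ]
}
```

Each update is its own BlogPosting with its own timestamp, so the block has to be regenerated as entries are added; in practice that is usually left to the CMS or a plugin.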

Result: The Beacon's reported stories are better positioned for Google News surfaces and the "Top Stories" carousel in Google Search, although markup alone does not guarantee placement. The author markup gives AI-generated summaries and knowledge panels an unambiguous byline to attribute. The NewsMediaOrganization data and publishingPrinciples link give the outlet a documented record of editorial accountability readable by both search engines and AI systems evaluating source credibility. The subtype differentiation between reported and opinion content keeps aggregators from collapsing the outlet's mixed portfolio into a single undifferentiated signal.

The JSON-LD: How It Looks in Practice

Below is a simplified example of what the article-level markup for a ReportageNewsArticle looks like in JSON-LD — the format that goes inside a <script type="application/ld+json"> tag in the page's <head>.

Example: ReportageNewsArticle JSON-LD markup

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ReportageNewsArticle",
  "headline": "City Council Votes to Cut Library Hours Amid Budget Shortfall",
  "description": "Bridgewater's city council voted 5-4 to reduce library hours by 20 percent, citing a $2.3M budget deficit.",
  "datePublished": "2026-05-15T19:30:00-05:00",
  "dateModified": "2026-05-16T08:15:00-05:00",
  "author": {
    "@type": "Person",
    "name": "Maria Solis",
    "url": "https://bridgewaterbeacon.org/author/maria-solis"
  },
  "publisher": {
    "@type": "NewsMediaOrganization",
    "name": "Bridgewater Beacon",
    "url": "https://bridgewaterbeacon.org",
    "publishingPrinciples": "https://bridgewaterbeacon.org/about/editorial-standards",
    "correctionsPolicy": "https://bridgewaterbeacon.org/about/corrections",
    "logo": {
      "@type": "ImageObject",
      "url": "https://bridgewaterbeacon.org/logo-600x60.png",
      "width": 600,
      "height": 60
    }
  },
  "image": {
    "@type": "ImageObject",
    "url": "https://bridgewaterbeacon.org/wp-content/uploads/2026/05/council-vote.jpg",
    "width": 1200,
    "height": 628
  },
  "url": "https://bridgewaterbeacon.org/news/city-council-library-hours-2026",
  "isAccessibleForFree": true
}
</script>

Schema.org and AI Discoverability

The significance of Schema.org has grown substantially with the rise of AI-powered search and generative AI systems. Google's AI Overviews, Perplexity, Bing Copilot, and other AI systems can draw on Schema.org structured data as one signal among many when evaluating source credibility, attributing content, and assembling summaries, though how heavily each system weighs it is not publicly documented. A news organization whose articles carry ReportageNewsArticle markup with named authors, ISO 8601 timestamps, and linked publishingPrinciples is providing AI systems with machine-readable evidence of its editorial character: explicit declarations that do not have to be inferred from text analysis alone. Google's John Mueller, in late 2025 guidance on schema priorities for 2026, specifically named Article and NewsArticle as "foundational for publishers and content-rich websites" among a small number of schema types worth maintaining regardless of changes to rich result features.

The isAccessibleForFree property also matters for nonprofit and free-access news organizations. Set to true on an article, it declares that the content is open-access. Set to false, and combined with hasPart (pointing at the gated sections) and isPartOf, it marks paywalled content that offers a free preview. For a free-access outlet, the property states explicitly that the content is publicly available, removing the ambiguity about crawlable-but-gated pages that surrounded Google's earlier "first click free" paywall policy.
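
For contrast with a free-access outlet, the sketch below shows how a hypothetical subscription publisher (not the Beacon) might mark a gated article, following the general pattern in Google's paywalled-content documentation. The headline, date, and the .paywall class name are assumptions; the selector has to match whatever class the publisher's own HTML puts on the gated section.

```json
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example Subscriber-Only Investigation",
  "datePublished": "2026-05-15T09:00:00-05:00",
  "isAccessibleForFree": false,
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": false,
    "cssSelector": ".paywall"
  }
}
```

A fully open article needs none of the hasPart machinery: a single isAccessibleForFree value of true on the article is enough.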

Access and Participation

Schema.org's vocabulary is freely accessible and browsable at schema.org. The News markup guide is at schema.org/docs/news.html. Google's implementation documentation for Article and NewsArticle structured data, which specifies which Schema.org properties Google supports for rich results in its search products, is at developers.google.com/search/docs/appearance/structured-data/article. The W3C Schema.org Community Group is open to public participation at w3.org/community/schemaorg. The schema's source files and issue tracker are on GitHub at github.com/schemaorg/schemaorg. Google's Rich Results Test, for validating markup, is at search.google.com/test/rich-results.