Newsjunkie.net is a resource guide for journalists. We show who's behind the news, and provide tools to help navigate the modern business of information.

There is a sentence I keep returning to from a recent bulletin sent by our friends at the Internet Archive: "The public record is now at risk of disappearing."
Since late 2025, a wave of major news publishers has begun blocking the Wayback Machine's crawlers. Among them are The New York Times, The Atlantic, and USA Today Co., which alone controls hundreds of local newspaper titles. The Times implemented what the Wayback Machine director Mark Graham characterized as a "hard block." USA Today's action effectively erased hundreds of local publications from the living historical record in a single decision.
The stated reason is AI. Publishers are concerned that the Internet Archive's enormous repository of structured, historical web content could be scraped for free by commercial AI companies to train their models. That concern is not entirely unreasonable: the web's content is being consumed at industrial scale by the AI industry, and publishers are right to resist being exploited without compensation or consent.
Blocking the Wayback Machine is the wrong response, and the people who will suffer the most will be researchers, historians, journalists, and the public, who rely on non-governmental archiving as a fail-safe backup of institutional memory and accountability. At this writing, there is nothing conclusive showing that AI companies are training on the Wayback Machine’s news archives. What is certain is that when a publisher blocks the Internet Archive, a gap in the record occurs.
The Wayback Machine has preserved more than one trillion web pages over nearly 30 years of continuous operation. It is used daily by librarians, scholars, legal teams, and working journalists. It maintains permanent citations for millions of news articles referenced across Wikipedia in 249 languages. Courts have allowed archived pages into evidence. Journalists have used Wayback Machine snapshots to prove that official statements have been altered or removed. This tool for holding power to account is being weakened.
The internet is not self-preserving. A 2024 Pew Research Center study found that 38 percent of web pages that existed in 2013 were no longer accessible a decade later. Data drops away casually, without notice. The Wayback Machine has been the corrective to that inherent amnesia—not a perfect one, but the most comprehensive solution we currently have.
The direct predecessor of the internet was the U.S. Department of Defense initiative ARPANET. The systems that underpin the modern internet emerged from federally funded university research. Today, DNS (the domain name system that makes every web address resolvable) is overseen by ICANN, a nonprofit working under a framework negotiated with the U.S. government and other global internet players. SMTP (Simple Mail Transfer Protocol), also developed through ARPANET, is maintained by the IETF (Internet Engineering Task Force). Every email sent anywhere in the world depends on it. Standards bodies like W3C are consensus-driven institutions.
The internet is a commons—built by public investment, maintained through shared governance, and dependent on norms of open access that commercial actors have benefited from enormously. When a publisher blocks a nonprofit archive acting in the community interest from crawling public web pages, it is selectively privatizing something the commons made possible.
Message to publishers: If you don't like the open nature of the internet, don't use it.
For the community that reads this newsletter (academics, researchers, scientists, archivists, journalists, social activists, and the organizations that fund or house them), this cannot be an abstract dispute. The Wayback Machine is a foundational piece of the knowledge system. It is the kind of shared resource that must be protected from privatization, from paywalling, and from unfounded attacks on its reputation.
Fight for the Future, an energetic digital rights organization, has launched an open letter and petition calling on news publishers to stop blocking the Internet Archive and commit to working with it to preserve journalism in the Wayback Machine. The petition has already been signed by more than 200 working journalists, including Rachel Maddow. The EFF has added its voice. The argument is straightforward: the Wayback Machine does not engage in irresponsible behavior; it has been an effective partner to journalism for decades. The sensible response to AI fears is to work with the Internet Archive on protections, not to deny preservation outright.
Newsjunkie has worked with the Internet Archive before, to promote awareness of the End-of-Term Harvest campaign that sought to preserve critical federal data as the Trump administration took office. We believe deeply that we must treat archiving as an urgent, present-tense act rather than a technical afterthought. We are asking our readers to do the same now.
Take action
The Internet Archive and Fight for the Future are asking the public to sign an open letter to news publishers. The letter calls on publishers to stop blocking the Wayback Machine and to commit to working with the Archive to preserve the journalistic record. If you are a researcher who has relied on archived pages, a journalist who has used the Wayback Machine to verify a story, or an archivist or librarian who understands what institutional memory costs when it is lost, this letter needs your signature.
The same Internet Archive bulletin raises a second threat to knowledge access: ebook licensing constraints for public libraries. A coalition of library organizations, including the Public Library Association, the Association for Rural & Small Libraries, the Urban Libraries Council, and others representing public libraries across North America, has issued a joint letter to the publishing industry calling for sustainable ebook pricing models.
Libraries typically pay three times the consumer price for an ebook license, and those licenses expire, meaning the public investment simply vanishes. Nothing is left to lend or preserve. Acquiring ebooks "has become unsustainable, and for many small libraries, impossible," according to Kate Laughlin, executive director of the Association for Rural & Small Libraries.
Unlike print collections owned by libraries, digital collections are leased, and the rights-holders set the terms. The net effect is to shift community control over knowledge channels to corporations. That is a pattern our readers will recognize across the sector right now, and it demands resistance.
Newsjunkie reported on the news archiving problem in "The IA problem with AI." Nieman Lab has reported on how and why the blocking began. The Internet Archive's full bulletin on both the Wayback Machine petition and the ebook pricing campaign is available at blog.archive.org.
Across archiving, ebook licensing, and federal research, the same logic is at work: public knowledge is being enclosed, monetized, suppressed, or disappeared. The knowledge sector's job is to call out these violations, resist them, and build the alternatives.
Sign the letter. Tell a colleague. Keep fighting.
— Gordon J. Whiting, Publisher
Who’s behind the news is published by Newsjunkie, an independent news company.
© 2026 Newsjunkie.net