Newsjunkie

Why the Wayback Machine matters

Gordon Whiting on AI, news publishers, public archives, and the fight to preserve the journalistic record

Peter Landau interviews Gordon Whiting / Jun 12, 2026

Newsjunkie publisher Gordon Whiting noted in his last newsletter that major news publishers, including The New York Times, The Atlantic, and USA Today Co., have begun blocking the Internet Archive’s Wayback Machine from archiving their stories. The stated reason is concern over AI scraping, but the move threatens the public record and weakens one of journalism’s most important accountability tools. Newsjunkie managing editor Peter Landau spoke with Whiting about the dispute, why the internet should be treated as a public commons, and what publishers should do instead.

You write that publishers are wounding the Internet Archive to keep AI at bay, or to jockey for a better deal. What do you think is really driving these blocks: fear of AI, business strategy, or something else?

I don’t know precisely, but there’s a lot of money at play. Let’s just take this: Jeff Bezos owns The Washington Post. He would certainly be a person you would assume knows all the dimensions of the matter at hand.

Thinking along those lines, I would say publishers are probably trying to engineer some sort of rights and payment-for-rights deal. I don’t know exactly what it is, but it has to be something like that.

What they’re doing through legal maneuvers with the Internet Archive and the Wayback Machine is closing off the back door. The less leaky you make it, the more power you have at the table when you’re negotiating with the other side.

You acknowledge that publishers have a legitimate concern regarding AI companies scraping their work. Where do you think their argument stops making sense?

They’re acting unilaterally. There are certainly ways to cooperate here, especially for the purposes of the Internet Archive, which serves a public good.

It doesn’t make sense for publishers to simply say, “You can’t do that anymore because that’s our stuff.” It’s disingenuous.

Why is the Wayback Machine different from a commercial AI scraper? What do you think publishers are missing when they treat preservation and extraction as the same thing?

That is a really good question because, functionally, they are exactly the same. But it’s really about intention.

We have long-standing laws about how libraries can make copyrighted information available to the public. A library can lend a book without making further compensation to the publisher after buying the book.

There are also fair use laws. You can make quotations within another document under fair use. So we’ve gotten along with statutes, policies, and regulation. We just need to bring those principles forward, because the walls of the building have shifted. The playing field has changed, whatever metaphor you want to use.

That’s what Congress is there for: to adapt us into the modern age. Each age is the modern age. That’s the job at hand.

Speaking of the modern age, you describe the internet as a commons. What does that mean in practical terms? How should that idea shape the responsibility of publishers, archives, and platforms?

Not everything is common. That’s the first thing to understand.

But it’s not hard to make the case that the internet was built with public money and a great amount of cooperation — not just American cooperation, but cooperation from interested parties, governments, organizations, universities, and individuals all over the world.

A railroad, originally, was not like that. There are other things that are done privately, where you make the investment and you own the system. This is not that.

First, we need to grasp the definition of those different types of distribution, storage, and access systems. But this is a commons.

A corporation or private entity is entitled to use it. However, you can’t set all the rules.

You could go in a lot of directions and say, “Yes, but it’s illegal to use somebody else’s material.” Yes, that’s true. In this case, publishers are telling the Wayback Machine, “Don’t do this anymore.”

But it’s kind of selective prosecution, if you will. There are a lot of ways that The New York Times or The Washington Post material gets out there. By hitting the Internet Archive and the Wayback Machine, they’re causing a lot of harm when maybe sitting down at the table would be a better solution.

A lot of people might think that if The New York Times wants to control its own archives, why shouldn’t it? What’s your answer to that?

They should control their own archives. But we have copyright laws. Things go into the public domain. An archive may still be held digitally or even on paper by a company or institution, but once copyright expires, the public is entitled to use it without restriction.

That’s another thing to understand: it’s not forever. Even though the company, or the inheritors of that company, paid to have it made, after a period of time, under copyright law, it becomes public.

So The New York Times and others are entitled to protect their work. But nobody forced them to use the internet.

They didn’t use the internet very much in the ’90s. It was very selective. Why did publishers not use it, even though it was there? Well into the early 2000s, they were very wary about using the internet because they found it too leaky in terms of getting money out of it.

Now it’s their main channel. But nobody forced them. They use it ubiquitously, and now they want to set policies and rules that change the nature of an open-access system.

How do journalists themselves rely on the Wayback Machine in ways the public might not realize? Can you talk about why archived pages matter for verification, accountability, and corrections to the public record?

This is a really interesting one.

Digital isn’t paper, obviously, but it’s something we’ve all experienced. You write a paper in high school or college and turn it in, but then the deadline hasn’t passed, so you go back and make a change. It’s easy to do.

That’s a simple example, but there are a million ways words get changed, paragraphs get changed, things get taken out, and things get put in after publication — sometimes long after publication.

This happens all the time. In journalism, it’s a very dangerous thing. When something makes it into print, there will be a print copy in a library or archive someplace, but a digital version is not necessarily preserved the same way.

That’s why journalists use the Internet Archive for accountability. You can see a version that maybe is no longer there, or a statement from somebody that has changed. What was the original?

It could be that the original was wrong and a correction was made. But there’s also the possibility that a change was made to hide something, cover something up, or alter the political message.

The Internet Archive and the Wayback Machine are there to capture that. With paper, the problem is not of the same proportion. With digital, everything is very ephemeral. A snapshot freezes documentation at that moment.

In the newsletter, you connect this fight to ebook pricing and library access. What’s the common thread between blocking web archives and restrictive ebook licensing?

Corporate copyright hegemony is sort of what we’re talking about here: creeping restrictions, using technology to encase materials and create stronger gatekeeping, so that you have to pay and pay again.

That’s what happened with ebooks. Some publishers will not publish a title in print; it’s only available as an ebook. And the ebook is not something you own.

When a library bought a physical book, it owned that book and could loan it out as much as it wanted and make it accessible as much as it wanted. But with an ebook, the rights-holders set the parameters. After a while, the license expires — sometimes after only a year — and the library has to re-up.

It’s a way of forcing more money into these companies. That’s the parallel.

Copyrights — whether for music, film, books, artwork, or photographs — really become something when they mass. If you have 10 films in your catalog, that isn’t really much of a catalog. But if you’re Warner Bros., which is in play right now, you have thousands and thousands of movie titles. That mass means something.

If you can get an extra few cents per use, that’s a lot of money. That’s the thinking. That’s the strategy.

It’s not wrong-headed to make a profit. But there’s a public good involved here.

Disney famously fought to extend copyright as long as possible. Eventually, they couldn’t get it extended any longer, and Steamboat Willie, the original Mickey Mouse cartoon, went into the public domain last year or the year before. Versions of Mickey Mouse are now free to use, and they didn’t like that. They still don’t like that, I guess.

They don’t like the idea that even after 100 years, they have to give up revenue. And control of the market, too. That’s very important to them.

Do you see this as part of a larger pattern of public knowledge being privatized or enclosed? How does this relate to the work Newsjunkie has been doing around disappearing data, libraries, and public records?

I don’t quite know how to say that. There is an attempt to control the flow of it more, but it’s ironic because the internet is this open-access, wide-use system where everybody is a publisher, reader, or consumer. There’s this amazing flow of information, and yet we’re seeing attempts to corral and control it.

There are also attempts at disinformation, deception, suppression, and distortion. Let’s just put it in those terms: the distortion of information.

But in terms of corralling, limiting, and privatizing — yes. I’d also like to hear someone else’s insight into what this actually is. It’s not exactly walling off because the material is still available. If you want to go to the dark web, you can get anything.

So are they reducing convenience and therefore creating the ability to charge more? I don’t know. It’s a constant conundrum about this system that we love and hate: the internet.

What would a responsible compromise between publishers and the Internet Archive look like? Are there protections against AI scraping that would preserve journalism without breaking the public record?

Yes. They could sit down. The Internet Archive is a responsible actor. They could limit abusive scraping, and they already do. If they see inbound links coming in and see something combing through everything, they’ll limit it. They can, and they do.

The idea that it’s somehow open season for AI through the Wayback Machine — there may be problems, I don’t know what they are, but what publishers are saying is not right. That is just a pretext.

They can sit down. Maybe a news story is not available immediately. That’s generally true now, but maybe the archive waits longer before it’s available through the Wayback Machine — a week, 10 days, three weeks.

But to say never? That’s wrong. It should be there. It really is part of history. Come on.

In the newsletter, you call on the knowledge sector to speak out. Who exactly needs to be louder right now: journalists, librarians, researchers, funders, universities?

Journalists are there to have a voice, but all of us should be louder as a chorus. Let’s get together.

Drop in at Prairie Fire at Newsjunkie, sign up for the newsletter, and stay up on this. There are developments going on throughout the knowledge sector, whether it’s research and science, voter activism, protection of voting access, or pure and applied science.

The attacks have been both pure and applied. What’s the difference? Theoretical physics is theoretical. You’re not doing fieldwork on theoretical physics. But sampling water in Flint, Michigan, is applied. You have scientists working on that.

Water and air issues are being worked on by organizations all over the world, both governmental and nongovernmental. There are also soil and agriculture issues.

There are people you wouldn’t think are of the same stripe, but they are because of this matter of access to information and threats to that access — threats to funding, threats to the preservation of science, and threats to further science.

We’re all in this together. Let’s be a loud chorus challenging these attacks.

You may have already answered my next question, but outside of archivists or journalists, why should people care if a newspaper blocks the Wayback Machine?

I’m sure we could find a cute metaphor. How would you feel if half the streets in your neighborhood were suddenly closed to you, but other people could still drive on them? Or if the magazine and history sections of your library were closed off unless you had a special access pass?

But I don’t think we need that kind of cute metaphor. The real question is: Why, in the internet age, should we settle for less access to knowledge?

Finally, if you could say one thing directly to major publishers blocking the Wayback Machine, what would it be?

It’s in your interest to work with the Wayback Machine.

You want your information to be talked about, read, listened to, and viewed. You want that. Now, you need to make a profit. We all understand that. But don’t block it. If you do, you’re walling yourself off.

What was the Edgar Allan Poe story? “The Masque of the Red Death.” They tried to wall themselves off. It doesn’t work. You can’t really do it.

Remember the Soviet Union. It doesn’t work. Don’t be the Soviet Union.

Sign the petition to support the Wayback Machine. Go to Fight for the Future.

Source

Newsjunkie interview, June 10, 2026
