Newsjunkie.net is a resource guide for journalists. We show who's behind the news, and provide tools to help navigate the modern business of information.



We've been surveying the journalism and news scene in India closely. The thing that happened there some years ago is now happening here. The consolidation of the media allowed the government to apply pressure via tax or licensing threats, or have a crony buy them out. You’re seeing that here with CBS. The aim of a government, an oligarch or an autocrat is to get a boot on the neck of the media and have them conform.
We have this with corporations hoovering up and consolidating. That level of downward control happens naturally within corporate structures. You don't need authoritarian governments, but it certainly is teed up for them. Libraries are funded locally in general. Why don't we have libraries play their normal function of supporting local news, local authors, local musicians, local filmmakers? That’s what the library system has been good at for hundreds of years, and we've lost it. There's not another thing on offer yet.
I think we need a game with many winners, where there aren't central points of control. Take Wikipedia, one of the great marvels of the internet. Why did it have to write those articles? They had to be written from scratch to avoid infringing on held copyrights and allow the articles to be shared. I thought that the judiciaries, especially in the United States, would remember what libraries were for and back up the Internet Archive and libraries. I was wrong.
The United States has shifted to be completely corporate: anti-library and pro-corporation. One hundred years ago we had the Carnegie libraries. Publishers hated libraries because they lent books. At that time, the legislatures and the judiciaries supported libraries. There was a need to educate people in the United States. But at the end of the 20th century, we saw the fall of antitrust and the rise of dominant global corporations. Europe and Japan didn’t have user-contributed content, social media or a search engine. Why not? Because their publishers wouldn't allow it.
Let me shift gears. I want to talk about the Internet Archive as a repository. You have a facility called Archive-It, and your organization and mine have been talking about and working a bit on End of Term (EOT) [a project preserving U.S. federal government websites during presidential transitions for historical record]. You are championing and leading the EOT project. Talk about the Internet Archive as an actual primary repository of original work.
People are using the Internet Archive as a repository digitally, because where else do you put things? You can put them in the cloud, but what if you stop paying? It’s gone. Put it on a hard drive? Hard drives are amazing now. They can store 30 terabytes, which is a lot of material. But try to keep them indexed. How do you do that? There are large-scale platforms, but they blink on and off, or get pressured by somebody to delete what’s on them.
The Internet Archive tries to play the role of a cloud provider for the open world. If you want to store something for a long time, we can help. I mean, there are certainly things that we can be pressured to take off. We get that too, but we will try to make things available.
Go to archive.org and hit the Upload button. Lots of people add books, music, video, webpages, photographs, whatever they want to share with the next generation. We help them organize it and make it into collections that are easy to look through, use and download. We want to make it so that if you give something to the public it doesn't cost you. In fact, you get a tax deduction and a pat on the back. Thank you for donating to the public.
We have been interviewing scientists who are former employees of the federal government. Their work has been taken offline. The Internet Archive can be a place where research can live and be continued. You can provide some hope at a time when so much scientific information is being lost.
I hope that's happening. Since 2004, the Internet Archive, working with a bunch of other organizations, has crawled government websites to take a snapshot at the end of the outgoing presidential term and the beginning of the new one. There are always changes, and there are massive changes this time. Libraries are there to try and be repositories of government documents.
But asking a publisher, especially the government, to have their old materials available, is not going to work very well. That's why we need libraries. Unfortunately, the way the World Wide Web works, there's not an easy way to go and say, I want the newer edition of this. If it's not on the same URL, then where are you going to find it, right?
The web was awesome because it was simple. But it's also problematic because of its simplicity. Publishers would make copies and they would end up in libraries or people's hands. The web didn't work that way. There's only one place where things are from, and once those go away, they’re gone.
The Wayback Machine was a kludge to fix this problem. Really, what we want is a more decentralized system the way publishing used to work, where people sold things and they would end up in many libraries. That way, if one library burns down, or a publisher goes out of business or changes priorities, you'd still have a continuity of publications that people can depend on. You can still stand on the shoulders of giants, as Newton said, which is not working very well because of technological issues, compounded by corporate and government problems.
I'm delighted to hear you use the word kludge. I haven't heard that one in a while. What's the pie chart of your user base? Who are the regulars? How does it break out?
I think we largely have the Wikipedia generation. It's the people who turn to their screens to answer questions. That is journalists, lawyers, academics, and kids these days. We’re also woven into Wikipedia. We've fixed almost 25 million broken links in Wikipedia. There are millions of book links on Wikipedia that go to the Internet Archive. Even if people don't think that they've used the Internet Archive, they probably have. We're up for being infrastructure, reliable infrastructure.
We get a lot of fan mail from journalists, from people who lose their old websites, people hitting 404, documents not found, trying to do research in the middle of the night and finding it in books. Genealogists love us, not just for the Wayback Machine, but because of the periodicals, newspapers, things that people have uploaded over the years.
We're about the 200th most popular website. Wikipedia, I guess, is number five or six. I'm a bit envious, but we're pretty popular. We get used by maybe two or three million users a day. Often they don't even go to the Wayback Machine or the website. They just use resources that are linked into other things that they're doing. We're just part of the infrastructure.
You’ve recently opened Internet Archive Europe.
Reopened. It was started in 2004. We launched Internet Archive Canada in 2006, but we're reinvigorating Internet Archive Europe.
If the web is worldwide, why do you have different editions of the Internet Archive?
It's because we wanted to be within the European value system. There are certain things that aren't allowed in Europe. We're trying to adapt Internet Archive Europe to the European value set. I think we can't depend on the United States. We can't depend on one thing. We really want many. And so the idea of internet archives, plural, just makes all the sense in the world.
Any future ones planned?
Internet Archive Switzerland has just gotten its nonprofit tax status. In the next month or two, they will be able to start moving forward.
We just launched a new section of our site, called Guide to Public Archives. It’s our attempt to wrangle into a single directory, all of the major repositories of the world with essential information. We've been blown away at how many deep and well-organized ones there are. We thought there were a few hundred. We've got 4,000 collections in 800 archives. And that's just the beginning.
Let us know how we can be helpful, because they may be having some struggles. We would like to work with them. I'm glad you're doing directories. Let's make it so all ships float.
How many books do you have available?
I'm not sure, five, six million, something like that. We want more, and to make them as available as we can. Using the Wayback Machine is not the easiest thing. Using the Internet Archive is not the easiest thing. I'd say full-text searching is something I do a lot on the archive. If you go to archive.org, there are several search bars. Go to the main one and then do a full-text search.
The television collection is probably one of the least used, which I'm really shocked about, because it's awesome. Jon Stewart, back in the days of The Daily Show, made such good use of archives like that. It wasn't ours, but we’re hoping that people quote from television to augment their argument in their journalism or blogs.
I've had questions from journalists about where they can find blogs. I've pointed them to you.
We have them. You have to know what the URLs are. That makes it hard. You can do a few keyword searches.
Is it true that Senator Padilla (D-CA) has made you a Federal Depository Library?
Okay, it's really nerdy, but it's kind of important. When the government needs to publish things, they give it to the United States Government Publishing Office, which gives it to Federal Depository Libraries.
We launched the Democracy’s Library project. The idea is to have all the works of all democracies at the federal, provincial and municipal levels. We launched it with the United States and Canada. We also officially joined the Federal Depository Library Program to get direct access to the materials and make them publicly available for free.
We also work with the other federal depository libraries, a lot of which are deaccessioning materials. We want to flip everything digital and hold on to a copy.
What is the open web?
The original vision of the internet that I signed on to was to build the library at your fingertips, to bake it so that you have access to the library and that personal publishing would be ubiquitous. That everybody had something to share and pass on to another generation. No matter who you are, everyone has something that's worthy of being in the library, and the library should be permanent and accessible to everyone. That was the vision of what I wanted the Internet to be back in 1980.
In many ways, we've gotten there, as evidenced by a billion people's voices being on the open internet and by people wanting to share what they know. What I didn't foresee were the forces against this. These are things that came out of the dropping of antitrust in the ’80s. With the multinational publishing conglomerates, I didn't foresee antitrust being completely tossed. My vision was based on the New Deal America I grew up in, and the idea of public education.
We have some countervailing forces to that now. The works that are on the internet are vast, but we're missing a lot of the published works of humankind. Mike Lesk, the father of digital libraries, was worried about the 20th century. He was confident we would be able to access the works of the 19th century because they’re out of copyright. And he was confident we were going to have access to the 21st century, because people would adapt to the internet. I think he was wrong there. But he said he was worried about the 20th century, and the confusion we're seeing in the digital learner generation, anybody under the age of 40, about what happened in the 20th century is attributable to not having a library that's as good as the library I grew up with.
That's because copyright holders constrain access.
They use a whole army of tools to constrain access. Copyright is one, license is another. It's often not even for monetary reasons. As a top attorney for a publishing conglomerate told me, "It's not about money, it's about control." During periods of authoritarianism, those motivations are even more apparent.
© newsjunkie.net 2025, illustrations © Gordon Henderson 2025
The Internet Archive is a nonprofit digital library with the ambitious goal of achieving "universal access to all knowledge." It recently met a milestone of preserving its 1 trillionth webpage, but it also archives books, music, software, and video.
Best known for its Wayback Machine, which lets users see how websites looked in the past, the site functions as a massive, free online library providing public access to cultural heritage and acting as a vital research and backup tool for the digital world.
Gordon Whiting, publisher of newsjunkie.net, sat down with Internet Archive founder and Digital Librarian Brewster Kahle to discuss the future of information gathering at this pivotal time, when the delete button can erase data faster than a book burning.
Before we dive in, give us a sense of the services and infrastructure that make up the Internet Archive.
This month marks a milestone: 1 trillion webpages archived for posterity. That's maybe 1 billion voices from around the world over almost the last 30 years. That's an astonishing achievement for humanity. The Internet Archive works with 1,250 libraries from around the world to figure out which webpages are worth collecting, when and how often, and to make sure all of that ends up on servers that are in the United States and somewhat replicated in Canada, Europe, and in Alexandria, Egypt.
Alexandria, a significant name in the history of libraries.
Absolutely, and it's been a pleasure to work with the Library of Alexandria in Egypt, the new one. There are a couple hundred people at the Internet Archive building these collections, working with people, keeping the servers all going. But also archiving television, books, music, and video. These collections make up a library that people can use for free by going to archive.org or the website openlibrary.org, which focuses on books. People can use it in many ways, like the libraries that I went to and used when I was growing up.
A lot of our readers are primarily news people who are starting to see that the things that we've taken for granted, like your neighborhood library or the Library of Congress, are not unbreakable. What does it mean to keep the Internet Archive going now, where there’s no guarantee it will be there in the future?
Libraries are under attack in many ways. We haven’t seen this happen in our lifetime. We have book bannings, but also massive defunding of libraries going on. We have a school district in Texas that shuttered their whole library system, junior high school and high school.
Structurally, a bigger problem is that consolidated publishing conglomerates, which control most of the media ecosystem, are not allowing libraries to own anything. They don't sell books or music or movies. They just let libraries license access on behalf of their patrons to publisher database products that can be surveilled and changed at any time. Libraries don't own the ebooks they lend, which can be changed or removed, and you'd never know it.
Even news organizations aren't selling to libraries. Libraries have often been a major supporter of local authorship, local creativity, by being socialized funding structures. A lot of that is gone. We have a fragility to our media ecosystem, which is really unwarranted. Libraries are still massively supported by people. We should be buying materials so that we can preserve them.
Comment on the culture of convenience. We download an app, we agree to all the terms, and don't really worry about it. We’re agreeing to things that are similar to what you just laid out. We don't own things. Third parties surveil us. We've come to just accept a lot of these things, and I take it that you don't feel that that's a positive development.
Yes, it's horrible. The current problem is that people can't figure out what's true. So I like the line from Current Affairs magazine: The truth is paywalled. But the lies are free. Basically, propagandists are running the show. Anybody under the age of 40 was brought up on screens. They have a library that is in many ways thinner and worse than the library system that I grew up with. I could walk into a library and be able to read past issues of periodicals or books. You could get things on loan.
It was slow. It wasn’t a panacea, but the opportunity in going digital was to make things better. We have a trillion webpages, which is awesome, but a lot of the materials that are in publishers’ hands are not available to this generation. What’s missing is information from the 20th century. Why do so many people not know about the Holocaust? People are losing access to information.
A substantial amount of the 20th century still is under copyright protection and not freely accessible.
Correct. Copyright protection was 14 years at the time of Ben Franklin. But now it takes 100 years for a work to move into the public domain. It's absurd. We extended copyright to almost 100 years or more. The corporate level of control over access to information is acute. If you have both consolidation of media and control over the past, and make that available to government control, it's not good.
All it takes to make this completely turn around, and this is not a big revolution to suggest, is just to sell things. Go back to actually having capitalism work, not feudalism, where everything is licensed and you never get to own anything. Buy ebooks, buy newspapers, buy periodicals in electronic form or not. Yes, some people make copies and do bad things. They always have. But can we have a fervent flowering ecosystem of journalism and authorship? Yes.
I think more people would be interested in buying ebooks than using awkward, non-integrated web reading experiences. We have Wikipedia, which is fantastic, but try to link from a Wikipedia article to a book page on a Kindle. It's not going to work.
Ted [Nelson, a mutual friend] would be jumping for joy to hear you say that things should be sold. He corrected me in 1992 when I asked, could a book be downloaded? He said, not downloaded, bought.
There are lots of solutions to this. You can have many authors, some of whom sell their books, many publishers, many booksellers, many libraries. It has worked for hundreds of years, why stop now? There's no central point of control. No one can lean on a publisher and say, make this disappear. The idea of media control by governments and by large corporations is all on the rise. We have the opportunity to do something, which is, sell things. Just sell things.