10.26.2015 Research
Arnold Foundation Announces $1.9 Million Grant to Develop Internet Archive Search Engine
Houston — The Laura and John Arnold Foundation (LJAF) today announced a $1.9 million grant to the Internet Archive, the world’s largest public digital library, to develop a search engine that will provide unprecedented access to its extensive collection of webpages, also known as the Wayback Machine. The search engine will allow researchers, historians, and others to retrieve data and information from the billions of webpages and websites stored in the Wayback Machine and will ensure that there is a comprehensive, open record of the Internet that is accessible to all.
Each day, more than 600,000 people use the Wayback Machine to access digital records of significant events ranging from the Iraq War to NASA’s exploration of Mars. However, users are currently limited in their ability to find content because the portal’s search technology is outdated. The Wayback Machine also lacks an effective cataloging system. While partners such as public libraries and universities curate a small portion of the archive’s content, most of it is stored only by URL and date, and visitors must enter an exact website address in order to access content. The new search engine will enable users to perform more robust searches, including finding websites by entering a topic or keyword. This enhanced functionality will allow individuals to uncover an extensive selection of relevant content and will create unparalleled access to our digital history.
“It is important that as our methods of communicating evolve, our methods of preserving information also change,” LJAF Vice President of Venture Development Kelli Rhee explained. “We need an open, free, public record that can be used to hold governments accountable, to ensure that cited evidence is accurate and complete, and to guide decision-making in the digital age. By making it easier to search the Wayback Machine, the Internet Archive is helping to preserve information within the infrastructure of the Web.”
Studies show that, on average, a webpage is altered or deleted after only 100 days of being online. When pages are erased, edited, or abruptly moved, there can be wide-ranging consequences. For example, public figures can retract or modify statements after issuing them, and annotations in scientific studies can be eliminated. The Internet Archive has worked to address this issue by preserving more than 439 billion Web captures (including webpages, video, and images) through data donations, automated archiving of millions of websites, and collaboration with a thousand scholars and librarians.
“The Web exists in a land of the perpetual present,” explained Brewster Kahle, the Internet Archive’s founder and digital librarian. “It needs a memory, a historical record for scholars, journalists and the public to reference our digital past. That’s why we first built the Wayback Machine back in 2001. By expanding the Wayback Machine’s capabilities so that users are able to search for sites, we will dramatically enhance its practical value.”
In addition to upgrading its search capabilities, the grant will allow the Internet Archive to optimize the quality of the billion webpages that are captured each week and improve the playback of media. The group will pilot a beta version of its new search feature in 2016 before publicly releasing the technology in 2017.
About the Internet Archive
The Internet Archive is a non-profit digital library founded by Brewster Kahle in 1996 with the mission to provide “Universal access to all Knowledge.” The organization seeks to preserve the world’s cultural heritage and to provide open access to our shared knowledge in the digital era, supporting the work of historians, scholars, journalists, students, and the blind and reading disabled, as well as the general public. The Internet Archive’s digital collections include more than 25 petabytes of unique data: Web pages (439 billion), moving images (a million films), audio (2 million recordings; 100,000 concerts), texts (2.7 million digital books), software (100,000 items), and television (3 million hours). Each day, 2 to 3 million visitors use or contribute to the archive, making it one of the world’s top 250 sites. It has created new models for digital conservation by forging alliances with more than 400 libraries, universities, and national archives around the world. The Internet Archive champions the public benefit of online access to our cultural heritage and open standards for its preservation, discovery, and presentation. https://archive.org/