Library manages digital archiving

Jan. 23, 2012, 2:01 a.m.

Facing challenges posed by the rise of social media, archivists at the Stanford University Libraries have adopted new technologies to digitally archive a traditionally community-driven collection of Stanford documents.

Library Archivist Daniel Hartwig said that documents have traditionally been collected from Stanford staff and alumni who feel they have something to contribute to the study of Stanford’s history. The collection includes personal letters of former University President Donald Tresidder, lecture notes taken by students in the 1960s and materials from the controversial work of psychology professor Philip Zimbardo.

(Photo: OLLIE KHAKWANI/The Stanford Daily)

“We’re lucky in that our focus is limited to Stanford so we have a kind of built-in mechanism there for alumni to donate things,” Hartwig said.

However, now that correspondence and official documents are often in digital form, collecting materials for the archives has become more difficult.

“We’ve known how to archive paper for quite some time,” said Andrew Herkovic, director of communications and development for the Stanford Libraries. “There’s a lot of new art and science that needs to be developed for digital archiving. We’ve been doing a lot of web archiving for Stanford websites. There’s no copyright issues there…so we’ve captured well over 300 discrete websites totaling, I think, close to half a terabyte of data.”

Major challenges in digitizing the archives include collecting documentation of student life from email and from websites such as Facebook and Twitter.

“Right now we’ve set up an electronic drop box so rather than, say, mailing things physically to us, you can just log in, put something in a folder and you’re done,” Hartwig said.

However, legal issues complicate digital archiving when personal correspondence takes place on third-party websites such as Facebook or Twitter. Content a member posts to Facebook, for example, is subject to the site’s terms of service, which grant the company broad rights to use it.

“Definitely for those third-party proprietary products [such as social networks and image sharing sites], the main issue is copyright,” Hartwig said. “We would like to capture those [websites], but we don’t really have permission.”

“Copyright colors everything we do,” Herkovic said.

The library has recognized the role social networking played in driving recent movements such as the Arab Spring, which has motivated archivists to seek out this data.

“When things pop up like [the Occupy Movement], we try to be proactive and go after them,” Hartwig said. “These tools really aren’t set up for outsiders to capture easily.”

The Stanford University Archives is now using tools such as Archive Facebook to save records of users’ Facebook activity.

“It’s a plug-in you install through your Firefox browser,” Hartwig said. “It creates a local copy of basically your entire Facebook presence and then you can copy that over to our drop box.”

Patti Hanlon-Baker, a lecturer in the Program in Writing and Rhetoric (PWR), has encountered these challenges in the Stanford archives firsthand through her research.

Hanlon-Baker said that having first-person material has often been more helpful in her research than official documents or newspaper articles, “just to understand the depth of the conversation.”

“As someone who teaches rhetoric and thinks about structure of rhetoric…to not have first person documents like fliers I find troubling,” Hanlon-Baker said.

Other large organizations, such as Google and the Library of Congress, are on the cutting edge of archiving digital information.

According to Hartwig, Google is “building these tools to get your data or your material out of their services and to import them into someplace else. They’re a little bit ahead of the curve in terms of portability of material.”

The Library of Congress is acquiring Twitter’s entire archive of public tweets.

The archives are also using digital forensics, techniques normally used in law enforcement, to “authenticate and make readable things that may be of interest from a particular computer or server,” according to Herkovic.

“It’s always a catch-up game, a reactive effort,” Hartwig said. “[Students] are always going to use what is easiest or what makes most sense to them, so we kind of have to bite the bullet and see if it can work for us.”

The amount of digital information available to archivists is increasing exponentially, Herkovic said.

“This also has the effect that it will swamp a lot of researchers because there’s so much more stuff available,” he added.

Herkovic attributed this increase to the fact that people save more information on computers than they would on paper.

“Archiving is getting more challenging…fast,” Herkovic said.

Hartwig and Herkovic agreed that the digitization of information will have a major effect on the work of archivists and researchers.

“I think this all will have a profound effect on the way people do research,” Herkovic said.
