When you look at online pictures of an illuminated manuscript or other historic documents, do you see a data set? Will Noel, noted medieval manuscript specialist and open data advocate, does - and hopes you will too. On Tuesday, March 27, the Friends of the Lehigh University Libraries sponsored a talk on historical documents in a technological world given by Noel, the director of the Kislak Center for Special Collections, Rare Books and Manuscripts at the University of Pennsylvania.
Noel has made it his personal mission to bring digital technologies and data analysis to manuscript studies. Technology lets a wider audience experience the materials, as he demonstrated with UPenn's Scribes of the Cairo Geniza project, which crowdsourced tagging to analyze a massive collection of digitized manuscripts. Thousands of volunteers viewed and sorted the images, and their work gave scholars new insights - for example, that manuscripts illustrated with flowers tended to be marriage contracts.
But Noel argues that the images’ underlying data should be as thoroughly detailed and openly available as the images themselves. Currently, digital images aren’t treated like unique reproductions of historic materials, despite differences in image quality, source, and image manipulations that would distinguish two digital images of the same original. Two pictures of one document taken ten years apart would be quite different - but the digital images themselves wouldn’t tell us how.
“This is a humanities problem, not a geek problem,” Noel said. Established scholarly practice explores in detail the differences between an original manuscript and its copies, but digital facsimiles often lack information about when and how they were created. As a result, scholars who wish to study the images may not have enough descriptive information, or metadata, about the images themselves to make an accurate assessment.
For example, faculty and students at Trinity University in Texas created a digital edition of The Lament of St Anselm using UPenn’s digital images, without ever touching the original manuscript. But, Noel warned, the images have no permanent identifier that would still point to them if something happened to the OPenn repository where they are housed. Nor does the new edition provide information about the conditions under which the manuscript was digitized.
The raw data, Noel argued, is analogous to provenance records that scholars use to trace the ownership and management of art and historic materials. Keeping and sharing data was a priority during Noel’s own work with the Archimedes Palimpsest. He took advantage of the flexibility of digital files, manipulating and processing multi-spectral images to reveal text that had been scraped off when the parchment was reused. By sharing the raw data and open source code for his project, as well as the images he manipulated, Noel has made a scholarly contribution that also allows others to replicate his work or derive new insights from looking at the data themselves.
So how can digital humanities and special collections scholars respond? By documenting and cataloging the images as though they were original objects themselves and, with an eye to the future, by working toward an equivalent of a DOI (digital object identifier) for digital images. This, Noel explained, would remain faithful to methodological practice while allowing for unanticipated advances by future scholars.
To learn more about Lehigh’s collaborative medieval manuscript digitization project with Will Noel, visit the Bibliophilly site.