Big Data and Digital Humanities

In the past month I have been introduced to the concept of digital humanities, and my first, faint understanding of the subject can be summed up as the application of computational methods to digital data in the field of humanities to facilitate research. As I delved deeper into the subject, it became clearer that the definition of digital humanities is broader than what I had initially understood. Its applications in arts and cultural institutions, museums, social media platforms, literary circles and the performing arts have given the subject an interdisciplinary character. The subject is an amalgamation of new media, technologies like virtual reality, augmented reality and artificial intelligence, and the use of analytics and algorithms. It also dawned upon me that there is one central element to this entire digital humanities ecosystem that cannot be ignored: big data. Without it, none of this would have been achievable, and big data holds the key to shaping future developments.

The term ‘big data’ was first documented in 1997, in a paper by scientists at NASA describing a problem they had with visualisation. Since then, the term has been used and defined in different contexts and across various fields dealing with huge quantities of digital data. At present it is a buzzword, with data visualisation, analytics and data science gaining prominence across different fields in the digital world, including humanities research. So one might wonder: where did all this data in the humanities come from?

The association of big data with digital humanities in the literary domain can be traced back to 1949, when the Italian Jesuit priest Father Roberto Busa attempted to digitise the works of St Thomas Aquinas. The project was sponsored by IBM and is considered the beginning of the preservation and curation of literary material in digital form. By the 1960s, the digitisation and indexing of textual corpora had gained momentum in parts of Europe and the United States. This is also when quantitative analysis of textual data began, in the form of machine calculations of commonly used words and expressions. But there was not much progress, owing to the limitations of the technology of the time.

The next fifteen years, from 1970 to the mid-1980s, have been termed the age of “consolidation”. More digitisation projects were undertaken as scholars understood the scope of computers in academics. Another significant milestone was the establishment of the Oxford Text Archive (OTA) in 1976. Researchers deposited their electronic texts, and the OTA archived and maintained them and made them available to others for academic purposes. This was the beginning of digital libraries. Then, in 1986, SGML (Standard Generalised Markup Language) resolved the confusion caused by the different and conflicting encoding schemes used for electronic texts in the humanities. The arrival of the Internet in the early 1990s provided scholars, students and the general public with much-needed connectivity and access to this huge collection of digital data. Finding information and sharing and publishing scholarly content now happened within a larger community, no longer limited to academia. Electronic resources were also no longer limited to text, and information sharing in the form of images, audio and video became a reality.

The interpretation and utilisation of big data has manifested in user experiences that were beyond imagination a decade ago. Data visualisation has evolved into a great tool for aesthetic presentation and insightful research: new media is the platform, and computational methods are the tools used to execute these projects. Museums and art galleries have employed data visualisation to create interactive and innovative presentations that have increased user participation. The data visualisation produced by the British Museum in collaboration with Google (https://britishmuseum.withgoogle.com) is a good example. The application of virtual reality has changed how people experience being in a museum or an art gallery. From walking through a historical city or seeing historical characters come alive in a VR headset to interactive art exhibitions, new media has made it all possible.

On the scholarly front, digital data has opened up avenues for researchers to engage with literary content in innovative ways. The Blake Archive has given a global audience access to the works of William Blake, thereby enabling further research. An interesting observation by Franco Moretti is that scholars work on only a small part of the literary field, and that close reading does not help cover the rest. He adds that “it’s not a matter of time but of method: a field this large cannot be understood by stitching together separate bits of knowledge about individual cases…”. So the question that pops up is: what is the right approach? While there are no absolute answers, there have been attempts at analysing quantitative data. Representations in the form of maps, timelines and graphs have helped in noticing patterns and trends that would not be visible through detailed reading alone.
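To make this kind of quantitative analysis a little more concrete, here is a minimal sketch of what a very basic “distant reading” pass might look like: counting commonly used words across a corpus and tracking a single keyword over publication years. The titles, years and excerpts below are invented placeholders, not real data; an actual project would run something like this over thousands of digitised texts and pair the counts with graphs, maps or timelines.

```python
import re
from collections import Counter

# A toy corpus: in practice this would be thousands of digitised texts,
# e.g. plain-text files from an archive. Titles, years and excerpts here
# are placeholders for illustration only.
corpus = [
    {"title": "Novel A", "year": 1810,
     "text": "The city was vast and the crowd moved like a single creature."},
    {"title": "Novel B", "year": 1850,
     "text": "Smoke from the factories hung over the city and its railways."},
    {"title": "Novel C", "year": 1890,
     "text": "The village seemed untouched, far from the noise of any city."},
]

def tokenise(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

# 1. Corpus-wide word frequencies: the kind of "commonly used words" count
#    that early humanities computing performed by machine.
overall = Counter()
for work in corpus:
    overall.update(tokenise(work["text"]))
print("Most common words:", overall.most_common(5))

# 2. A crude timeline: occurrences of one keyword per work, ordered by
#    publication year. Plotted over a large corpus, such counts can surface
#    trends that close reading of individual works would miss.
keyword = "city"
for work in sorted(corpus, key=lambda w: w["year"]):
    count = tokenise(work["text"]).count(keyword)
    print(f"{work['year']} {work['title']}: '{keyword}' appears {count} time(s)")
```

Simple as it is, the sketch shows why this approach is only a means and not an end: the counts reveal where a pattern might be, but interpreting what it signifies still requires a human reader.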

Big data has played an immense role in sculpting the landscape of digital humanities, and the ever-increasing data pool is only widening the scope of research. There is a growing demand for better algorithms for data analysis that would help in reaching more conclusive results in the future. Research in the humanities, however, cannot be limited to quantitative analysis, and computer-assisted methods are yet to yield insights that are qualitative in nature. It can be said that quantitative analysis through computational methods is just a means and not the goal itself. For now, I seem to agree with Dr Lev Manovich’s approach. He says: “we need to carefully understand what is possible in practice, as opposed to in principle. We also need to be clear about what skills digital humanists need to take advantage of the new scale of human data”.

Bibliography:

Websites referred to:

https://britishmuseum.withgoogle.com/

http://www.blakearchive.org/
