International Journal of Internet Science


A peer reviewed open access journal for empirical findings, methodology, and theory of social and behavioral science concerning the Internet and its implications for individuals, social groups, organizations, and society.

Volume 11, Issue 1 (2016)

Digitizing a Large Corpus of Handwritten Documents Using Crowdsourcing and Cultural Consensus Theory
Prutha S. Deshpande, Sean Tauber, Stephanie M. Chang, Sergio Gago, & Kimberly A. Jameson
University of California, Irvine, USA

Abstract: We investigated using internet-based procedures to convert information from a large handwritten archive of ethnographic survey data into a computer-addressable database. Rather than manually transcribing the archive's estimated 23,000 pages of handwritten data, we sought to develop novel crowdsourcing task designs and to use an innovative variation of Cultural Consensus Theory (CCT) to objectively aggregate crowdsourced responses based on a formal process model of shared knowledge. Experiment 1 used simulated internet-based tasks conducted with human subject pool participants in a university laboratory. Experiment 2 used a similar design, except that it was implemented on an internet-based research platform (i.e., Amazon Mechanical Turk). Results from these investigations shed light on several uncertainties concerning the utility of CCT analyses with crowdsourced transcription data. For example, they clarify (1) whether crowdsourced tasks are practical as a method for automating the transcription of the archive's handwritten material, (2) whether responses from perceptually-based tasks inherent to transcribing handwritten documents can be analyzed using CCT, and (3) whether, if CCT is appropriate as a model of the transcription challenge, the results produce accurate answer-key estimates that could serve as correct transcriptions of the archive's data. Our results address these issues and convey how CCT modeling can be modified and made appropriate for aggregating such data. Implications of these analyses and uses of CCT in large-scale crowdsourced data collection platforms are discussed.

Keywords: Crowdsourcing, cultural consensus theory, shared knowledge, handwriting transcription, individual differences


The article is published under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.