Historizing topic models: A distant reading of topic modeling texts within historical studies

Conference contribution (made public, but not published by a publisher)
Authors: Rene Brauer, Mats Fridlund
Published in: Cultural Research in the Context of "Digital Humanities", Proceedings of International Conference, 3-5 October 2013, St. Petersburg, Russia
Publisher: Asterion
Place of publication: St. Petersburg
Year of publication: 2013
Published at: Department of Earth Sciences (Institutionen för geovetenskaper)
Language: en
Links: https://www.academia.edu/6385340/Hi...
Keywords: topic modeling, digital history, digital humanities, historical methodology, Latent Dirichlet Allocation
Subject categories: Linguistics, History of Ideas, History

Abstract

Topic modeling (TM) is a method used within the new ‘digital history’. It represents a data-driven methodology that may come closest to fulfilling literary historian Franco Moretti’s promise of making possible the ‘distant reading’ of large quantities of text. Inspired by this promise, TM has been used for historical studies since the early 2000s, and this study surveys the state of the art of TM in historical studies by giving a historical and methodological introduction to the use of TM within historically minded research. TM was first developed for data mining within natural language processing and machine learning in the 1990s, and its overwhelming benefit was its ability to cover orders of magnitude more data than traditional methods. The primary topic model used is Latent Dirichlet Allocation, which allows TM to be used as a search function, a quantitative check of intuition, or a summarization tool for large corpora of texts. With many competing theories and assumptions that are constantly being challenged and developed, TM itself currently represents a very active area of research within computer science. The survey of historical texts takes as its starting point the first peer-reviewed historical article in 2006 and as its end point the publication of the first research monograph in 2013, and it identified 23 historical studies employing TM. To provide a general overview of the field, the studies were examined using a quantitative distant-reading approach and analyzed according to the authors’ academic background, gender, academic seniority, and country of academic institution, and the corpora’s type, language, chronology, and geographical focus. The results showed most authors to be junior, untenured male researchers, primarily affiliated with US universities, and the texts to consist of a substantial number of non-standard online texts.
Despite its application within historical studies, TM still comes across as a technology-driven approach, with the majority of authors having a background in technical disciplines. Corpora were primarily focused on English texts with a US or global focus and with an emphasis on recent history. All in all, TM appears to be an emergent rather than an established historical methodology.
