Workshop: "Subject Indexing Early Modern Dissertations: Towards a Methodology for ML-based Text Classification Using Metadata", 17 February 2021

More information

Workshop held by Stefan Heßbrüggen-Walter and Jörg Walter at “Development and Application of Category Systems for Text Research”, the 4th forTEXT expert workshop, 17 February 2021

For a digital historian of philosophy, VD17, Germany’s national bibliography of prints published between 1601 and 1700, is an immensely valuable data source: it provides metadata for thousands of dissertations that were defended at a philosophical faculty. The printing of dissertations was linked to the practice of public disputations involving a praeses, a respondent, and one or two opponents. The metadata for these prints allow for a bird’s-eye view of teaching and scholarship in academic German philosophy of the 17th century.

We will discuss how to classify these texts according to the subdiscipline they belong to, e.g. as a metaphysical, ethical, or philological dissertation. Our approach was originally based on the hypothesis that the subject matter of dissertations mirrors the internal structure of a faculty of philosophy, e.g. with one professor for metaphysics and logic, another for poetics and eloquence, etc. We also presumed that machine learning algorithms are a good fit for this classification task, since many dissertation titles exhibit similar patterns, containing a ‘genre’ term (e.g. ‘dissertatio’), a disciplinary ‘label’ (e.g. ‘physica’), and a topic indicated through n-grams starting with ‘de’ (e.g. ‘de meteoris’).

This working hypothesis was, however, only partially confirmed: the algorithms we used achieved an average precision of approx. 70%, which is not good enough to draw any substantial conclusions about the number of dissertations published in the various disciplines. In our presentation we will show how precision could be significantly increased for those dissertations which in fact exhibited the pattern we were searching for, and what factors were responsible for the loss of precision in the ‘long tail’, i.e. titles that exhibited identifying features to a lesser degree. This finding prompted us to add another classification criterion, namely the reliability of our first-order classification: we now distinguish a ‘core’ of titles with high precision from ‘non-core’ titles with significantly lower precision and ‘unclassifiable’ titles that cannot be identified as belonging to a discipline. Our approach can lead to a more differentiated understanding of machine learning ‘features’ used as criteria for classification.
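
To make the title pattern and the reliability tiers more concrete, here is a minimal Python sketch. It is not the presenters' implementation: the vocabularies `GENRE_TERMS` and `DISCIPLINE_LABELS`, the helper names, and the rule that maps features to ‘core’, ‘non-core’, and ‘unclassifiable’ are all assumptions made for illustration. The actual workshop used machine learning classifiers, whereas this sketch only shows how the three title features could be extracted and how their presence might be read as a rough reliability signal.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies for illustration; not the lists used in the workshop.
GENRE_TERMS = {"dissertatio", "disputatio", "exercitatio"}
DISCIPLINE_LABELS = {
    "physica": "physics",
    "metaphysica": "metaphysics",
    "ethica": "ethics",
    "logica": "logic",
    "philologica": "philology",
}


@dataclass
class TitleFeatures:
    genre: Optional[str]   # e.g. 'dissertatio'
    label: Optional[str]   # e.g. 'physica'
    topic: Optional[str]   # n-gram starting with 'de', e.g. 'de meteoris'


def extract_features(title: str) -> TitleFeatures:
    """Pull the genre term, disciplinary label, and 'de' n-gram from a title."""
    tokens = re.findall(r"[^\W\d_]+", title.lower())
    genre = next((t for t in tokens if t in GENRE_TERMS), None)
    label = next((t for t in tokens if t in DISCIPLINE_LABELS), None)
    topic = None
    if "de" in tokens:
        start = tokens.index("de")
        following = tokens[start + 1 : start + 3]  # up to two words after 'de'
        if following:
            topic = " ".join(["de", *following])
    return TitleFeatures(genre, label, topic)


def reliability_tier(features: TitleFeatures) -> str:
    """Assumed rule of thumb: the more of the pattern is present, the more
    reliable a first-order classification of the title is likely to be."""
    if features.genre and features.label:
        return "core"
    if features.label or features.topic:
        return "non-core"
    return "unclassifiable"


if __name__ == "__main__":
    for title in ("Dissertatio physica de meteoris", "De anima", "Theses miscellaneae"):
        features = extract_features(title)
        print(title, "|", features, "|", reliability_tier(features))
```

In this sketch a title such as ‘Dissertatio physica de meteoris’ exhibits all three features, while a title consisting only of a ‘de’ phrase falls into the long tail. In practice the tiers would more plausibly be derived from measured precision on labelled data than from a fixed rule like this one.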

Stefan Heßbrüggen-Walter is Associate Professor at the School of Philosophy and Cultural Studies at HSE University in Moscow. Jörg Walter is an independent software developer in Velbert, Germany.

Registration

If you would like to participate, please register here: https://us02web.zoom.us/webinar/register/WN_8M6wZ5JyTB2A04gOxR-DMw

For questions, please contact: fortext[at]linglit.tu-darmstadt.de.
