Spring 2023 Course on Natural Language Processing and the Human Record

Tufts University will introduce a new course in spring 2023: “Natural Language Processing and the Human Record.” Students at Boston College and Boston University can already cross-register to take this course for credit but, insofar as space allows, it will be open to others in person and to a wider audience participating online. This project-based course will provide opportunities not only for students of Greek and Latin but also for students of other historical languages. It also addresses a major gap between the curricula to which most students of historical languages have access and the realities of doing research in a digital age.

When Princeton, for example, announced a tenure-track job at the rank of Assistant Professor in Ancient Mediterranean Languages and Cultures to begin in Fall 2023, it specifically asked for someone “who can help us expand and diversify our offerings, for example by adding a language to those we already teach, and/or using digital methods and resources, and/or harnessing the insights of linguistics to illuminate broader cultural issues in the study of ancient Greece, Rome, and related ancient and later cultures.”

Language technologies allow students of the Greco-Roman world to address all three of the intellectual goals that this job posting requests. In particular, students of Ancient Greek and Latin who take advantage of contemporary digital methods will be positioned to work with a variety of languages. The figure below illustrates how we can, for example, now offer dense linguistic annotations for a growing number of sources in historical (and contemporary) languages.

Screenshot from the NEH-funded Beyond Translation Project, which is building a next-generation reading environment for the Perseus Digital Library.

A reading environment such as the one above depends upon a hybrid approach that integrates automated analysis, based on both machine learning and traditional procedural programming, with contributions by human experts on the particular source. Knowledge of the language (whether in the form of annotated training data or heuristic rules) provides the starting point for computation, and the computation can improve based on expert feedback. We need participants with strengths on both the computational and the content sides (and ideally some participants who can contribute to both sides of this process).

Few programs (if any) are, however, designed to provide students of Greco-Roman culture with the skills that they need to apply digital methods. Those students who do acquire such skills often do so as undergraduates in Computer Science, in a job involving computation before they begin graduate school, or on the side during graduate school.

In the Tufts Department of Classical Studies, we will be teaching a new course with the title “Natural Language Processing and the Human Record.” It will be taught for the first time in spring 2023 as CLS 191 (and will, in subsequent years, appear with its own number as CLS 162).

This class will be taught on Monday nights from 6:00-8:30 pm so that people who are not regularly on the Tufts campus (such as in-service teachers and students from other local institutions) can participate in person. An existing agreement allows students from Boston University and Boston College to cross-register, and, if space allows, others are welcome to join. The course is officially listed as in person, but we will work to make it accessible remotely as well.

Any intellectually determined student could profitably take this class. First, those who wish to focus on the computational side will need core skills in programming with Python and in working with Jupyter Notebooks. Given the variety of online tutorials available, acquiring these skills between the end of classes in fall 2022 and the beginning of this class is certainly doable. The work required would be non-trivial, but effective computational scholarship has long required, and will probably long require, a great deal of informal, ongoing, self-directed study.

Second, although the ability to apply a range of Python-based libraries will always be a big help and will extend a student's intellectual range, students with an interest in how to organize philological data could take this class and focus on the application and assessment of methods such as morpho-syntactic analysis (particularly the Universal Dependencies framework), coreference resolution, translation alignment, named entity recognition and linking, topic modeling, sentiment analysis, and ontology development. Technical terms such as these may be unfamiliar to most practitioners (and do evoke blank stares from most senior scholars), but they represent foundational new building blocks upon which the study of historical languages must be based in a digital age. Early career researchers, as well as students who wish to use their study of the past to prepare them to flourish in the modern world, face very different challenges and opportunities than those who were fashioned by the limitations and assumptions of late print culture.
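For readers who have not yet seen a morpho-syntactic annotation, here is a minimal sketch in Python of what one looks like in practice. It assumes the ten-column CoNLL-U format used by Universal Dependencies. The three-word Latin sentence and its annotation are hand-written for illustration and are not drawn from any published treebank, and the helper names `parse_conllu` and `describe` are hypothetical. The sketch also ignores multiword tokens and comment metadata that real treebank files contain.

```python
# Hypothetical, hand-written CoNLL-U annotation of "puella rosam amat"
# ("the girl loves the rose"); illustrative only, not from a real treebank.
CONLLU = (
    "1\tpuella\tpuella\tNOUN\t_\tCase=Nom|Gender=Fem|Number=Sing\t3\tnsubj\t_\t_\n"
    "2\trosam\trosa\tNOUN\t_\tCase=Acc|Gender=Fem|Number=Sing\t3\tobj\t_\t_\n"
    "3\tamat\tamo\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act\t0\troot\t_\t_\n"
)

# The ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Turn CoNLL-U lines into a list of dicts, one per token.

    Simplification: assumes plain integer token IDs (no multiword
    token ranges such as "1-2" and no empty nodes such as "1.1").
    """
    tokens = []
    for line in text.splitlines():
        # Skip blank lines (sentence breaks) and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        token = dict(zip(FIELDS, line.split("\t")))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])
        tokens.append(token)
    return tokens


def describe(tokens):
    """Render each token with its lemma, part of speech, and syntactic head."""
    forms = {t["id"]: t["form"] for t in tokens}
    lines = []
    for t in tokens:
        # A head of 0 marks the root of the sentence.
        head = "ROOT" if t["head"] == 0 else forms[t["head"]]
        lines.append(
            f'{t["form"]} ({t["lemma"]}, {t["upos"]}) --{t["deprel"]}--> {head}'
        )
    return lines


tokens = parse_conllu(CONLLU)
for line in describe(tokens):
    print(line)
```

Each row records a word's dictionary form, its part of speech, its morphological features, and the word on which it depends. Here the nominative "puella" is the subject and the accusative "rosam" the object of the verb "amat". A student focusing on the content side could assess or correct annotations of this kind without writing any code.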

The course description specifically mentions Ancient Greek and Latin because those are two languages that we know we can support but students are welcome to focus on any language where we have independent expertise to help guide and evaluate their work.

Strong contributions from the course will have an opportunity to be published, with credits for individual authors and/or each member of the team, both in the Tufts Dataverse and (as appropriate) in the new Perseus.

CLS 191 01 Seminar on Current Topics in Digital Humanities: Natural Language Processing and the Human Record
Cross listed with GRK 191 01 and LAT 191 01
G. Crane
3 SHU
In Person (although we will make remote participation possible)
Mondays, 6:00 – 8:30 pm (local time in Boston, MA, USA)

This class explores the application of natural language processing to the study of the human record and serves two complementary audiences. First, students who are familiar with, or able quickly to develop familiarity with, Python and related technologies can use these skills to develop course projects. Second, students who do not yet have this technical background but who wish to focus on how to publish born-digital versions of historical sources can take this class to develop new ways of reading, organizing and analyzing texts. Students of Greek or Latin who wish to focus on the language can take this class as GRK 191 or LAT 191. Students who wish to focus on another language (including sources in English) are welcome but should consult the instructor. We will cover recent publications and examine current applications that define the state of the art in digital humanities and digital publication. While students will be able to work on their own, we will particularly support the development of collaborative projects in which students with complementary skill sets work together.

Recommendations: CLS 162 – CLS 161, CS 10, CS 11;

For those enrolling in GRK 191 or LAT 191: three or more semesters of study recommended.
