Course on Arabic Sources by West African scholars about the Mali and Songhai empires: Digital Humanities and a new model of Classical Studies at Tufts

At Tufts University, the course on Classical Historians (Classics 141; details in the departmental course booklet) will focus on Classical Arabic sources composed in, and about, pre-colonial West Africa. While we will consider Arabic sources produced outside of West Africa and accounts of European travelers, we will focus primarily on two historical sources from West Africa itself: the Tārīkh al-Sūdān and (what has traditionally been called) the Tārīkh al-Fattāsh. Our goal is not just to learn about the Mali and Songhai empires but to use what we learn to create openly licensed digital sources of various kinds that will help others explore a major historical period that has attracted far too little attention in teaching and research.

Students will have an opportunity to explore emerging, digitally enabled methods by which global audiences can begin exploring the human record. In particular, we will exploit techniques by which we can begin to make the Arabic source text itself accessible to a general audience. We will begin publishing sections of these sources in the new version of the Perseus Digital Library that we are developing with support from the NEH. The development site for this is Beyond Translation and will be augmented between now and the fall semester.


Figure 1: Conclusion of an unpublished historical source in Arabic from Mali, preserved by the Yaro family collection and hosted by the British Library – one of more than 2,000 West African manuscripts that the British Library has made available.

The course itself will meet during Tufts’ fall semester Monday evenings from 6:00-8:30. Space allowing, we hope to see students from other institutions participate, whether by direct cross-registration or by getting credit through a directed study authorized by a faculty member at their own institution.

We will also offer a weekly reading group for those who wish to go over sections of the Arabic. This can be taken as an optional addition to the Monday class or as a separate class. The Arabic reading group would be 1 credit (vs. 3 for the Monday class). Any students taking both would receive 4 credits.

During the summer, I will also be working on the digital edition of these two histories and of other sources. If others are interested in learning more and in possibly contributing, they should contact me. There are a number of ways to contribute that match a range of skillsets. The basic requirement would be an ability to read English carefully, but there are also clearly opportunities for those with knowledge of French, of various aspects of Computer and Data Science, and of Classical Arabic.

I am hoping this summer to resuscitate my own Arabic and to see how far that helps me with the language of these Islamic scholars from Timbuktu. I will be using tools such as the suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi to extend my (not very advanced) knowledge of Arabic. The goals are to create exhaustive annotations, including translations aligned at the word and phrase levels, for (1) a small but extensible set of passages and (2) sets of sentences that allow readers to trace the meaning of Arabic words which cannot properly be translated.

The larger goal of this class and the larger project that it represents is to create openly licensed digital materials that are not only of immediate use but that also can be modified and, wherever possible, reused under a Creative Commons CC-BY license. Such a license is not suitable for publications that seek to represent a particular scholarly voice at a particular time. We are, however, managing the sources in Github and so each particular contribution is recorded in the versioning history. We are supporting a collaborative model of authorship that may be familiar to readers from Wikipedia. That said, individuals will be able to use the Github versioning history to document what they have done and to create hybrid publications that contain their own accounts of what they did and what critical decisions they needed to take.

The Tārīkh al-Sūdān and the Tārīkh al-Fattāsh

The fall course will focus primarily on two histories — tarikh, pl. tewarikh — composed by Islamic scholars in West Africa.

 (1) The Tārīkh al-Sūdān (TS) by al-Sa’di (1594-1655+ CE) focuses on the history of the Songhay Empire from the mid-fifteenth century until 1591 and then the Moroccan invasions and subsequent administration down to 1655. Houdas (Sa’dī 1900 and 1900a) published the Arabic text and a French translation. In 1999, John Hunwick published a scholarly translation with notes that covered the 28 of 35 chapters most relevant to Timbuktu and the Songhay Empire.

Figure 2: Map of places mentioned in Chapter 5 of the Tarikh al-Fettash: interactive map developed with the Tufts Datalab.

(2) Thirteen years later, Delafosse (1913 and 1913a) published the Arabic text and a French translation for a far more challenging source that has been known as the Tārīkh al-Fattāsh (TF), the “Chronicle of the Researcher,” an account of the Songhay Empire through 1599 that thus includes the early years of the Moroccan occupation. Almost a century later, Wise and Taleb (2011) published an English translation of the Tārīkh al-fattāsh based on Delafosse’s French translation and the Arabic text. The Tārīkh al-fattāsh is a novel chronicle written in the 19th century, and not a work produced by three generations of scholars starting in the early 16th century and later interpolated in the 19th century, as most scholars previously held. This 19th century TF was composed as a substantial reworking of a 17th century anonymous work. With support from the NEH Translation Program, Mauro Nobili and Ali Diakite are publishing a new edition of this work that contains an English translation, the Arabic text and clear indications of how the 16th and 19th century texts relate to each other.

We will, however, also consider other sources. We have a reasonably accurate transcription for the Arabic of Ibn Battuta’s description of West Africa and an accompanying French translation, long in the public domain. We can use DeepL or Google Translate to create a quick first English version and then edit this as we align it to the Arabic original.

Where the project stands.

Work on this project began in summer 2021, when Ayah Aboelela (UMass Boston CS ’21) led preliminary work. We found not only PDF versions of the public domain Arabic/French editions of the TS and TF but also text automatically generated by Optical Character Recognition.

The French OCR-generated text was good enough as a base for further work. We applied DeepL and Google Translate to convert the French into English. The results were surprisingly good: we did find ourselves making occasional changes to the English but such correction did not materially add to the overall work of adding base TEI XML tags, marking footnote markers in the text and footnotes on the bottom of the page, adding occasional Arabic words in the notes, and adding Arabic numbers in the translation that pointed to the corresponding pages in the Arabic edition. Readers can examine a sample of such work, chapters 21-22 of the TS on Github.

The Arabic text was more problematic and, in two different OCR-workflows, had a character error rate of c. 5%. The text is still sufficiently accurate to support a range of text mining (such as topic modeling and text reuse detection). It is also good enough as a starting text if the goal is to add extensive annotation to relatively short passages and to create an initial reader. Ayah produced initial versions of curated passages where she took the time to correct errors in the OCR-generated text.
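For readers unfamiliar with the measure: a character error rate is conventionally the edit distance between the OCR output and a hand-corrected transcription, divided by the length of the corrected text. The sketch below assumes that definition. The function names and the short sample strings are illustrative only and are not taken from the workflows Ayah used.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """Edit distance from OCR output to the reference, per reference character."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(ocr_text, reference) / len(reference)


# Illustrative strings only: one letter differs between the two.
reference = "تاريخ السودان"
ocr_output = "تاريح السودان"
print(f"CER: {character_error_rate(ocr_output, reference):.1%}")
```

Counting by Unicode code point, as Python strings do, treats each Arabic letter and each diacritic as one character; a project might reasonably normalize the text first, which this sketch does not do.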

A few thousand words of carefully edited Arabic with aligned translation and annotation would be a useful start and allow readers at an intermediate level to familiarize themselves with the style and content of these sources before moving to passages without curated annotations and translations.

Figure 3: word and phrase alignments between the Persian poetry of Hafez and an English translation, with words left in red that cannot be aligned to the original

Ayah published exploratory work using natural language processing tools for Arabic available from spaCy and CAMeL on Github. She tested services for morphological analysis and disambiguation and dependency parsing for the Arabic.

Figure 4: Dependency parsing of Arabic text from the Chronicles, produced by Stanza and visualized with Displacy (NB: this system writes text out left to right rather than right to left).

For named entity recognition (NER), which determines whether names are people, places or groups, we decided to apply tools from spaCy to English generated from the French with machine translation. A relatively modest amount of training improved the accuracy of NER from 60% to 82%.

Figure 5: NER visualization, using spaCy’s visualizer displaCy, on passages from the chronicles

Not only do these two histories constantly refer to places with which most readers in the US are probably not familiar, but there are many personal names, often complex in form, and rarely familiar. Identifying these names and creating links between related characters, and linking this information directly to the source text will, we hope, make it easier for readers to see who is who and to identify characters on whom they should focus their attention.

Figure 6: Social networks for Roman history developed by Zach Sowerby, MA student in Digital Humanities in the Tufts Classics Department

Figure 6 illustrates social networks derived from primary sources about Roman history. We use simple collocation to posit connections. Our hope is to include more complex information about relationships (son-of, occupation, etc.). Zach Sowerby was working with a much larger text corpus than the two histories and his relatively rapid prototyping already reveals basic patterns of who is important and which characters are connected. We can get useful information starting with this fairly rapid work.
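One simple way to read "collocation" here is to link two names whenever they appear in the same passage and to count how often that happens. The sketch below assumes that reading and assumes the names in each passage have already been identified (for instance by NER). The function name and the toy input are invented for illustration and are not drawn from the sources or from Zach Sowerby's code.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(passages):
    """Count how many passages mention each unordered pair of names."""
    edges = Counter()
    for names in passages:
        # sorted(set(...)) drops repeats and gives each pair one fixed order
        for pair in combinations(sorted(set(names)), 2):
            edges[pair] += 1
    return edges


# Invented toy input: each inner list holds the names found in one passage.
toy_passages = [
    ["Askia Muhammad", "Sunni Ali"],
    ["Askia Muhammad", "Askia Dawud", "Sunni Ali"],
    ["Askia Dawud"],
]

for (name_a, name_b), weight in cooccurrence_edges(toy_passages).most_common():
    print(f"{name_a} -- {name_b}: {weight}")
```

The resulting weighted pairs can be loaded into any graph tool as an edge list. Typed relationships such as son-of would need information that counting alone cannot supply.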

In fall 2021, I taught the first iteration of this class. We spent much of our time reading Michael Gomez’s 2018 book African Dominion, now the standard English account of early and medieval West Africa, and then examining the accounts in the two histories on which Gomez bases much of his work (and which he carefully cites). The map in Figure 2 (above) was produced for this class by members of the Tufts Data Lab as a model for additional student work. I will be adding results from student projects in this class during the summer.

Let’s see what we can do in the summer and then in the fall!

Acknowledgements

This work was made possible by the Beyond Translation Project, funded by NEH HAA-266462-19 and by support from the Data Intensive Studies Center at Tufts University, and by collaboration with Eldarion.


Greco-Roman Studies and the Future of Europe

Gregory Crane

The study of Greco-Roman culture can play a purposeful and transformative role in Europe’s development of a more just multinational and multiethnic society. This is a topic about which I have thought and on which I have spoken for years. Technology has begun to change the ways in which we can relate to sources in different languages from different cultural contexts. Those nascent changes are by no means predetermined: the technology can develop in more and less helpful directions and its development can move under very different influences. Likewise, the study of Greece and Rome can be appropriated by groups such as white supremacists or more narrowly Eurocentric nationalists (for whom whiteness is a necessary but far from sufficient condition). My own work therefore seeks constructive ways by which emerging technologies for reading and the study of Greco-Roman culture can together help foster a society that is more just and affords greater happiness to its citizens.

The rest is published on the Classical Continuum, a publication venue supported by the New Alexandria Foundation and the Harvard Comparative Literature Department.

Acknowledgements

This work was made possible by the Beyond Translation Project, funded by NEH HAA-266462-19 and by support from the Data Intensive Studies Center at Tufts University.


Perseus and new, enhanced introductions to Ancient Greek: Fall 2022

Gregory Crane
Tufts University
March 25
Gregory.Crane@tufts.edu

Tufts University will offer two different sections of introductory Ancient Greek in fall 2022, each of which takes a complementary approach. Both sections of the class have been designed to exploit increasingly powerful digital tools for understanding Ancient Greek and other languages. The skills that you learn will also help you exploit, and go far beyond, what you can do with translations, whether those are literary translations by human beings or the product of systems such as Google Translate or DeepL. Both sections build directly on an emerging new version of the Perseus Digital Library. Neither section has any prerequisites.

The first section will follow a textbook and will teach you to produce, as well as to understand, ancient Greek. It will, however, also give students far more exposure to ancient Greek source texts from the opening weeks of the semester. The second section, which will be online at a time to be determined, will focus on exploiting increasingly sophisticated digital tools to analyze ancient Greek sources.

Figure 1: the first line of the Iliad with exhaustive annotation in a new reading environment being developed for the Perseus Digital Library, with translations and glosses into English and Persian. More than 1 million words of Greek have this level of linguistic annotation.

The first section follows a traditional textbook but exploits a range of digital methods to enhance the experience of learning Ancient Greek, providing substantial immediate feedback as you practice traditional exercises. Instead of translating Greek into English or English into Greek and then waiting days for correction, you will be able to receive substantial feedback right away. We will also spend as much time as possible seeing how the vocabulary and grammar are used in actual Greek sources and minimize use of artificial textbook Greek. The goal is to give you active as well as passive command of the Greek. This section is better suited to your needs if you feel you may wish to go beyond first year Greek. It will meet Mondays and Wednesdays 1:30-2:45 PM local time.

This section will be primarily in person but will be open to those who wish to participate remotely. If you are at an institution where you can cross-register with Tufts (such as Boston College or Brandeis), you would not have to travel across town. Scheduling may prevent you from taking your local introduction to Greek, or you may wish to participate in this novel approach. Those seeking credit should be able to do so through Tufts’ University College.

Figure 2: A translation of Iliad 1 by Amelia Parrish (Tufts ’21) designed to be aligned at the word and phrase level with the Greek original to expose the working of the source language.

The second section will meet online at a time to be determined. It will focus entirely on reading and is designed for those who may have only one year, or even one semester, to study Ancient Greek. This second section represents a more radical departure from traditional approaches as it focuses on annotated texts themselves and could be applied to any corpus with sufficient annotation. After one semester, practice with digitally enabled tools will allow you to compare a translation of Homeric epic to the original Greek, to explore what the words really mean in Homeric Greek (and not just how they are translated), and to engage with the epics on your own. In the second semester, you will be able to move on to more syntactically complex sources such as Plato.

If space allows, we would particularly encourage participation in this online section by students from outside of Tufts. We want to understand how to apply this more radical departure from traditional pedagogy. We are building on work done by Farnoosh Shamsian, PhD student at the University of Leipzig, and participants in this class will be contributing not only to her research but to an ongoing reimagining of how we work with historical languages.

Figure 3: Metrical analysis for the Iliad and Odyssey (and much else) published by David Chamberlain, with a recording of Chamberlain reading those lines: see the original with recording at Hypotactic.com.

We are aware of no modern language programs that will provide such transferable skills. You will not only learn how to work with sources in Ancient Greek but will have tools to analyze Latin as well as modern languages such as French, German, and Italian but also Croatian and Latvian, Arabic and Mandarin. Our goal is not to help you check into a hotel or order dinner. Our goal is to allow you to work directly and quickly with not only Ancient Greek primary sources but with scholarship about these sources in a variety of modern languages. Our goal is to transform who can participate in traditional scholarship about the Greco-Roman world and then to enable new forms of scholarship and new intellectual communities that were never possible in print culture.

Figure 4: Automatically generated map of place names mentioned in Odyssey 4, annotations thanks to Josh Kemp, Furman University ’23 (Beyond Translation: Building Better Greek Scholars)

Description of this version of Greek 1 as it appears in the course book for the Department of Classical Studies at Tufts University.

Greek 1: Fall 2022
Introduction to Ancient Greek
Section 1: Monday/Wednesday 1:30-2:45
Section 2: To be scheduled.
Gregory Crane, Professor of Classical Studies, Editor-in-Chief, Perseus Digital Library
Christopher Petrik, Tufts ’24
Farnoosh Shamsian, PhD Candidate, Leipzig University

The rise of digital methods and, increasingly, of machine learning has begun to enable a transformation in the study of Ancient Greek. What you can learn in an introduction to Ancient Greek can be far greater now than was ever possible before. At the same time, what you can do with what you learn will take you much farther now than was possible before. Tufts University has been at the forefront of this transformation. In taking Ancient Greek, you not only can benefit from this work but will have an opportunity to contribute yourself, creating during the course of first year Greek materials that will serve other language learners and advanced researchers alike.

You will have more exposure to authentic Greek in this introductory class than has ever been possible. Exhaustive annotation exists explaining the function of more than a million words of Ancient Greek, while a new generation of translations, designed to clarify the working of Greek for those who do not know the language, makes it possible to see how grammar and vocabulary actually work in some of the most famous works of Greek literature, from the time you learn your first words. The very same methods that you learn to begin working with Ancient Greek have been applied to dozens of other languages.

A major barrier to learning historical languages has been the slow pace and limited reach of the feedback that you receive. You do an assignment one day, hand it in the next, and then see how you did in the next class, two days or more later.  When you practice what you have learned, you will often be able to get immediate feedback and then be able to practice what you have learned until you have mastered it. 

We offer two different sections, each with a complementary approach aimed to serve different audiences. The first section builds off of a traditional textbook, offering all exercises online with immediate feedback. Class time will be devoted to questions that you cannot resolve on your own and to seeing how what we have learned in class helps us begin to understand real texts. Students will also begin working with short passages from the Iliad and Odyssey, Sophocles, Thucydides, Plato, and the New Testament. The second section is designed to support those who may be able to devote only a year or even a semester to the study of Ancient Greek. You will learn enough of the grammar to understand the basic working of highly inflected languages such as Ancient Greek (and Latin and Russian and many other languages) but you will spend most of your time learning how to apply the rich set of tools available to help you read Ancient Greek – and many other languages. If you do choose to continue your study beyond the first year, we will provide you with a framework by which you can do that.

Acknowledgements

This work was made possible by the Beyond Translation Project, funded by NEH HAA-266462-19, by support from the Data Intensive Studies Center at Tufts University, and by collaboration with Eldarion.


Thoughts on Classical Studies in the 21st Century United States

Abstract: This paper consists of three complementary parts. The first section describes three instances where very technical scholarship on Greek literature overlaps with, and draws attention to, particularly dramatic historical contexts. This section describes an aspect of Greco-Roman studies that is both too demanding and too narrow — too demanding because it assumes that anglophone researchers work with scholarship in languages such as French, German, and Italian, but too narrow because it does not engage with scholarship that is not in a major European language. The second section talks about the general need for Classics and Classical Studies in a country such as the United States to extend beyond Greece and Rome. This section builds on work that I have published in the past distinguishing Greco-Roman from Classical Studies. The third section describes a more concerted attempt to expand beyond North Africa and to include sources from Sub-Saharan Africa. I report on developing for a spring 2021 course on Epic Poetry a 10,000-line Mandinka/English corpus of stories produced by West African Griots. I will also briefly discuss the use of Classical Arabic to explore locally produced sources about West African history and culture. As a first step, the fall 2021 course on Classical historians at Tufts University will center not only on sources such as Herodotus, Thucydides, Livy and Tacitus but on two histories that focus on the Songhay Empire: the Tarikh al-Fattâsh, begun c. 1593 by Mahmud Kati, and the Tarikh as-Sudân, composed by al-Sadi (c. 1594–1655). This class will expand the role of Classical Arabic in Classical Studies at Tufts.

The full text is available here.


What Is a CTS URN?

By Michael Konieczny

Cross posted at the Open Greek and Latin blog.

Visitors to the Open Greek and Latin digital library often ask us about what appear to be fragments of “code” alongside the list of authors in the library’s collection:

If you expand the list of works by a given author, you’ll notice that a similar line of “code” appears next to the title of each work, but with an extra element added at the end:

And once you actually go to read a work, you’ll notice an even longer sequence of characters in the right sidebar, and an identical one in your browser’s address window:

These character sequences are called CTS URNs (Canonical Text Services Uniform Resource Names), and they are an essential component of the Open Greek and Latin infrastructure. Simply put, CTS URNs are unique identifiers that make it possible to retrieve a specific passage of text from a database. In this blog post we’ll take a closer look at how CTS URNs work, and why they are so important to building the digital Classics library of the future.

Needle in a Digital Haystack: Uniform Resource Names

Suppose you pay a visit to your local library to check out a copy of your favourite Jane Austen novel. If your library is very small—say, 200 items or less—you will probably be able to locate the book quite easily just by scanning your eyes over the shelf. But this method would quickly become impractical in a library with thousands or, in some cases, millions of items in its collection.

In order to simplify the search and retrieval process, libraries assign each book a unique call number, and then use the call numbers to arrange books in a logical order across floors and shelves. Armed with a call number and a floorplan of the library, you can easily find a specific book from among millions of others—assuming no one has misplaced or stolen it!

Call numbers are an example of metadata: information about an object, such as its location, size, or creation date, that is separate from the object’s contents. Metadata is important for keeping track of items in a collection and understanding how they relate to one another. Good metadata also makes it possible to perform statistical analyses that can yield insights into the collection as a whole.

In many ways, Uniform Resource Names, or URNs, are analogous to the call numbers in a library. Each item in a digital collection is assigned a unique URN that distinguishes it from every other item. When you log on to the collection, your computer downloads an inventory containing the URN of every available text—this is what you see when you browse the OGL library. The inventory is updated whenever a new text is added to the database, so that you never end up with “dead” links or an incomplete catalog.

When you select a text you want to read, your computer sends the URN of that text to the OGL server, which responds by sending back a copy of the text in the form of an XML document (on which more in a future post).

Finding Your Way: Canonical Text Services

In theory, a URN could be any random sequence of characters, as long as no two URNs are the same. This kind of system would tell you what texts are available to read, but nothing about the way in which the collection is organized or how different texts relate to one another. In particular, it would be difficult to group together different texts by the same author, an essential feature of both physical and digital libraries.

To solve this problem, projects such as OGL use a system known as Canonical Text Services. Despite the name, CTS has nothing to do with labeling texts as either “canonical” or “non-canonical”. Rather, CTS provides a set of rules for generating URNs that reflects the logical organization of texts into groups and subgroups.

If you examine the list of works by Lucian of Samosata in the screenshot above, you will notice that each URN begins with the same sequence of characters: urn:cts:greekLit:tlg0062:. The letters urn:cts: appear in every URN, and indicate that we are employing the CTS citation format. greekLit locates the text within one of OGL’s main subcollections, Greek Literature (other subcollections include Latin Literature, latinLit, and Hebrew Literature, hebLit). Finally, tlg0062 is the sequence that has been assigned to the author Lucian. In fact, urn:cts:greekLit:tlg0062: is a complete URN on its own: it distinguishes the author Lucian of Samosata from all other authors in the OGL library. Individual works are identified by appending a suffix to the URN of the author: tlg001, tlg002, and so forth. This way, all works by Lucian appear together as a single text group.

This sort of system, in which smaller categories are nested within larger ones, is an example of a hierarchy. In addition to grouping together works by the same author, the hierarchical format of CTS URNs makes it possible to identify a specific passage within a text in a way that mirrors the text’s internal structure.
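For readers who like to see structure as code, the hierarchy just described can be unpacked with a few string operations. This is a minimal sketch, not the parser that OGL or the Scaife Viewer actually uses. The class and function names are hypothetical, and the optional version and passage fields are included only so that longer URNs do not break it.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str            # e.g. greekLit
    textgroup: str            # e.g. tlg0062 (the author Lucian)
    work: Optional[str]       # e.g. tlg001
    version: Optional[str]    # a particular edition or translation, if given
    passage: Optional[str]    # a citation within the text, if given


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN on its colons, then split the work component on dots."""
    parts = urn.split(":")
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "cts" or not parts[3]:
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work_part = parts[2], parts[3]
    # A trailing colon, as in urn:cts:greekLit:tlg0062:, leaves an empty passage.
    passage = parts[4] if len(parts) > 4 and parts[4] else None
    levels = work_part.split(".")
    levels += [None] * (3 - len(levels))  # pad missing work/version levels
    textgroup, work, version = levels[:3]
    return CtsUrn(namespace, textgroup, work, version, passage)


print(parse_cts_urn("urn:cts:greekLit:tlg0062:"))
print(parse_cts_urn("urn:cts:greekLit:tlg0062.tlg001"))
```

Grouping all works by one author then amounts to comparing the namespace and textgroup fields, which is the point of the hierarchical format.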

Navigating the Text

Classicists will be familiar with the system of citing passages of text by canonical reference. Depending on their genre and length, most ancient works are divided into segments such as books, chapters, or, in the case of poetry, line numbers. Longer segments, such as books, are themselves usually divided into shorter ones, so that the result once again is a nested hierarchy. For example, the citation “Thuc. 5.84.1” refers to book 5, chapter 84, section 1 of Thucydides’ History of the Peloponnesian War, which happens to be the opening scene of the famous Melian Dialogue. Longer passages can be identified by using a range: to cite the Melian Dialogue as a whole, we can write “Thuc. 5.84-116,” that is, Thucydides book 5, chapters 84 to 116.

The advantage of canonical references is that, unlike page numbers, they remain valid no matter what edition of a work is being used. They are also more suitable for citing texts in a digital environment, where the concept of physical page numbers is no longer very meaningful.

To identify a specific passage of text within the CTS framework, the URN of the text can be extended in a way that resembles the canonical references above. Here is the URN for book 5, chapter 84, section 1 of Thucydides: urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:5.84.1. In this example, the sequence perseus-grc2 identifies a particular version of the text stored in the OGL database, while 5.84.1 points to the specific passage. A longer passage can likewise be expressed as a range: urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:5.84-5.116. Note that the same hierarchical levels must be included on either side of the range: 5.84-5.116, not 5.84-116, which would prompt an error message.
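The rule about ranges can be checked mechanically. The sketch below follows the rule exactly as stated above (both ends of a range must carry the same levels) and is not a full implementation of the CTS specification. The function name is hypothetical, and citation components are kept as strings because not every citation scheme is purely numeric.

```python
def passage_range(urn: str):
    """Return (start, end) tuples of citation levels for the passage in a URN."""
    parts = urn.split(":")
    if len(parts) < 5 or not parts[4]:
        raise ValueError(f"URN has no passage component: {urn!r}")
    # Only the passage component is split on "-"; the version label
    # (e.g. perseus-grc2) sits in an earlier component and is untouched.
    start_text, _, end_text = parts[4].partition("-")
    start = tuple(start_text.split("."))
    end = tuple(end_text.split(".")) if end_text else start
    if len(start) != len(end):
        raise ValueError(
            f"range ends differ in depth: {start_text!r} vs {end_text!r}"
        )
    return start, end


print(passage_range("urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:5.84.1"))
print(passage_range("urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:5.84-5.116"))
```

Passing the malformed range 5.84-116 raises a ValueError in this sketch, mirroring the error message mentioned above.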

When you access a text on the OGL library, your computer is provided with information about the text’s citation structure. As you navigate through the text, your computer sends a URN of the target passage to the OGL server, which returns a copy of the passage, again in the form of an XML document. While you can progress forwards and backwards through the text sequentially, you can also enter a specific URN into your browser’s address window: assuming the URN is formatted correctly, this will take you directly to whatever passage you are interested in.

Planning for the Future

We have seen that CTS URNs provide a logical way of retrieving texts and passages of texts that reflects the organization of a collection and the internal structure of the texts themselves. But perhaps the question remains: Why is such a system important in the first place?

While a simple reading environment is possible without URNs, the CTS framework allows us to unlock the full potential of a digital edition. By assigning a unique identifier to every passage of every text, CTS URNs make possible large-scale textual analysis that in the past would have required hundreds of hours of manual tabulation. With the proper software, we can easily find out how word frequencies, metrical patterns, and even syntactical structures vary within and across texts. The discovery of statistically significant variations might help resolve disputes over the authorship of a text, for example, or precisely quantify the way in which an author’s style developed over the course of their career.
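As a small illustration of why passage-level identifiers help, the sketch below tallies word frequencies keyed by URN and rolls them up to any level of the citation hierarchy. It assumes the passages are already available as a mapping from URN to plain text. The function name is hypothetical and the passage texts are placeholder tokens, not Thucydides.

```python
from collections import Counter


def frequencies_by_level(passages, depth):
    """Sum word counts over all passages that share the first `depth` levels."""
    totals = {}
    for urn, text in passages.items():
        work, _, reference = urn.rpartition(":")
        key = work + ":" + ".".join(reference.split(".")[:depth])
        totals.setdefault(key, Counter()).update(text.lower().split())
    return totals


# Placeholder tokens stand in for the real text of each passage.
prefix = "urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:"
toy_passages = {
    prefix + "5.84.1": "alpha beta alpha",
    prefix + "5.84.2": "beta gamma",
    prefix + "5.85.1": "alpha",
}

for key, counts in frequencies_by_level(toy_passages, depth=2).items():
    print(key, dict(counts))
```

Changing depth from 2 to 1 regroups the same counts by book rather than by chapter, with no change to the underlying data.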

In addition to this, the CTS framework helps protect a digital repository from becoming obsolete in the face of changing technology. Since URNs are just strings of characters, they will remain valid no matter how the technology for processing and displaying texts evolves in the future. By investing in this system, the Open Greek and Latin Project is positioning itself to take advantage of exciting innovations in the field of digital humanities, and to serve as an invaluable resource to Classicists for generations to come.

Posted in Features, Research, Technology

Center for Hellenic Studies Team Converts Plutarch’s Moralia for Open Greek & Latin

The Perseus Digital Library is pleased to acknowledge the recent contributions of the Center for Hellenic Studies (CHS) 2019 digital humanities summer interns and research team on the corpus of Plutarch’s Moralia. As a part of the First Thousand Years of Greek initiative, a project of Open Greek & Latin, the CHS provided support for the conversion of older Perseus data into files compliant with Canonical Text Services (CTS) and EpiDoc.

All of these files will be made available in Open Greek & Latin via the Scaife Viewer.

The 2019 CHS Summer Digital Humanities Interns, Karina Cooper, Ethan Della Rocca, Sophia Elzie, and Lucy Parr, worked on proofreading files, updating the TEI-XML markup, and managing the workflow via the Perseus GitHub Greek text repository. Improvements included file naming consistency, structural review, and header standardization. Perseus and Open Greek & Latin/First Thousand Years of Greek are grateful for their efforts and attention to detail.

Angelia Hanhardt, Editorial Assistant at the CHS, trained and supervised the interns in keeping with her work on open access publication.

In the fall of 2019, Michael Konieczny, PhD, Classical Philology and CHS Library Assistant for the academic year, joined Lia in completing and reviewing the interns’ work (and has blogged about his experience).

This work added over 1.8 million words to Open Greek & Latin — over half a million in Greek. The CHS team created over 75 metadata files and converted over 200 editions and translations, accounting for over 12% of the entire Perseus Greek open-source primary text corpus.

Perseus and our collaborators are grateful to Ethan, Karina, Lucy, Sophia, Michael and Lia for their hard work on this corpus, and we thank the CHS for funding their efforts.

Posted in Contribution, Release, Scaife Viewer

Looking for a new language? Consider Ancient Greek.

Gregory Crane
Tufts University — Fall 2019
Greek 1: Introduction to Ancient Greek and the Homeric Iliad
MWF: 9:00-10:15
Eaton Hall 209

Are you interested in studying a new language in fall 2019? Whether you want to try something different for your language requirement or you have a year (or even a semester), you have an opportunity to travel thousands of years into the past and to confront the oldest sources in the continuous tradition of European literature. This is not your parents’ language class, and it is not high school Latin. You have an opportunity to participate in the reinvention of an ancient field and the development of a new track within the humanities as a whole. And you also have a chance to begin developing a research agenda of your own, one that can bring together the humanities and emerging fields such as data science.

No language is poised to benefit more profoundly from disruptive technologies than Ancient Greek. This disruption is fundamentally changing the ways in which we can interact with this and other ancient languages. And while the changes can enhance the questions that specialists can pose, those who are just starting to learn Ancient Greek are poised to benefit the most. A growing range of reading support tools allows students of the language to interact with the primary sources immediately. The more understanding of the language and the culture students internalize, the richer the experience will, of course, be: there is a big payoff to sustained study. But the tools now becoming available mean that even a semester of study would allow students to engage with sources in Greek directly, at a depth that would have been impossible before.

Disruptive new reading tools challenge us to rethink our understanding of historical languages and cultures such as those of Ancient Greece. None of the scholarly infrastructure that has emerged over generations of modern academic study, over centuries of print culture, and over millennia of sustained scholarship has been organized to match the opportunities and challenges of an interactive reading environment that exploits computational methods in general and machine learning in particular to serve a global audience (not just those familiar with the languages of European scholarship). 

The creative disruption opens up a wide range of opportunities, especially for ambitious students who wish to contribute and to develop a research portfolio. Participation in the 2018/2019 introductory Greek class at Tufts University enabled three students to begin research projects of their own. Two, Madeleine Harris (Tufts ’21) and Pearl Spear (Tufts ’22), applied emerging research from cognitive science and natural language processing to begin transforming our understanding of ancient emotions, while Bella Hwang (Tufts ’22) combined what she learned in Greek and in her courses on computer science to begin modelling the linguistic features that students need to master if they are to read Homeric Greek with fluency. Such projects allow students to make contributions to the study of Ancient Greek, to spend extensive time posing new questions of poems that have fascinated humans for thousands of years, and to begin cultivating the most dynamic analytical methods of the 21st century.


The 2019/2020 academic year will be the second step in a multi-year process of re-engineering the way our students engage with and master Ancient Greek. We will focus on the Homeric Iliad, the earliest surviving literary source in the continuous tradition of European literature. No secular literary work from Europe has commanded the attention of more audiences, from more cultural contexts, over more time than the Iliad. As you begin to study Homeric Greek, you travel back thousands of years, hear the voices of oral poets who worked with a tradition that was already thousands of years old, and engage with a world that is both alien and familiar.

Posted in Ancient Greek, Course(s)

Digital Editions in Practice, A Two-Day Workshop

Call for applications: Digital Editions in Practice, A Two-Day Workshop

The Perseus Digital Library at Tufts University will host a two-day workshop that provides an overview of a sample, practical workflow for creating digital editions. The workshop will feature both an open-lecture component led by developers and expert users of advanced technologies and “hands-on” sessions for participants that offer in-depth demonstrations of select tools and technologies as well as discussions tailored to the attendees.

For the full announcement, please see: https://goo.gl/BGbWxy

Posted in Call for Participation, Digital Humanities, Workshop

Individual Developments and Systematic Change in Philology

Gregory Crane
May 1, 2018

At the end of March 2018, my collaborators and I finished enjoying five years of support, 5,000,000 EUR(!), from an Alexander von Humboldt Professorship, support which allowed young researchers from many different countries to work both as a team and on their own. Documenting all that work will be a significant task and requires its own publication(s). Work, at Leipzig, Tufts and elsewhere, on Open Greek and Latin (OGL) and on the Canonical Text Services (CTS) protocol upon which OGL builds provides the starting point for much of the work described below. A tremendous amount of support for OGL came from the European Social Fund and the Alexander von Humboldt Foundation, but collaborators at Perseus at Tufts University, at Mount Allison University in Canada, at the University of Virginia, at the Harvard Library and Harvard’s Center for Hellenic Studies (CHS) have contributed time and significant sums as well. As a group, they have made 37 million words of Greek and Latin available in CTS-compliant EpiDoc TEI XML via GitHub.

This paper, however, does not focus primarily upon what happened at Leipzig but takes note of a number of events that have taken place in the opening months of 2018 and that have some connection to, but also depend upon efforts outside of, the Digital Humanities Chair at Leipzig. Each taken separately is important. All of these events taken together reflect a broader, systematic change — and change for the better — in Ancient Greek and Latin philology in particular and, ultimately, for all philology.

For the full article, see https://goo.gl/zG4yT4.

Posted in Essays

It’s alive! Perseus and the Scaife Digital Library Viewer

On March 15, Eldarion released the initial version of the Scaife Digital Library Viewer. The release is, of course, a first step, but this first step changes the world in at least two fundamental ways: (1) Perseus is alive — it can finally include new materials on an on-going basis; (2) the Scaife Digital Library Viewer provides a foundation for an environment that can publish a growing range of born-digital, openly licensed, and networked (and fully networkable because they are openly licensed) annotations and micro-publications that cannot be represented in the incunabular digital publication systems that still internalize the limitations of print publication.

First, Perseus can now be configured so that it can include new materials almost immediately. We have not yet established a regular workflow (the initial Scaife Digital Library Viewer still runs on a server maintained by Eldarion rather than Tufts), but updates on a weekly and even a daily basis, if not in real time, would be quite reasonable. New content does not even have to be in Greek or Latin: we already include a Persian edition of the Divan of Hafez. More importantly, if someone outside of the extended network of Perseus collaborators puts their content in the right format (for now CapiTainS-compliant EpiDoc TEI XML), we can include it. Thus, Neven Jovanovic was able to publish the first of what is expected to be a series of early modern Latin texts in Perseus (Scaliger’s Latin translation of Sophocles’ Ajax). Prof. Hayim Lapin from the University of Maryland converted his CC-licensed version of the Hebrew Old Testament, Talmud, and Mishnah. At present, anything that ends up as visible in the Scaife DL can be (because we require an open license) a permanent part of the Perseus collections. We need to think through a general process of content submission (and however open we wish to be, there are obviously some limits), but there are enough established collaborators with content to add and enough CC-licensed material that we would like to add that we already have enough materials to test a workflow for updates.

Second, the use cases of Perseus and of Digital Classics are not only more varied than those of print but involve so many data types and so many implicit use cases that they represent an emergent system. These include born-digital critical editions (with variants classified and dynamically configurable), diplomatic editions with alignments between transcription and source images, alignments between different versions of the same text in the same language, bilingual alignments between source texts and translations, morphological and syntactic analyses, co-reference resolution, and other categories of linguistic annotation, social networks, geospatial data, representations of digital intertextuality (including annotations estimating the probability that a given word or phrase represents a paraphrase or direct quotation from a lost source text), and an unbounded set of potential new annotation classes. Use cases include not only specialists posing new kinds of questions (e.g., searching a corpus for instances of “future less vivid conditionals” or for a semantically clustered list of verbs associated with male vs. female agents) but a fundamentally new mode of interaction that we might term language wrangling or language hacking, where readers have such dense networks of explanatory annotations that they can engage immediately, at some level of precision, with any annotated source in any language, whether or not they have any prior knowledge of that language. Such reading is a new form of engagement that lies between the experience of experts who have spent their 10,000+ hours immersed in a subject and the passivity that a print modern language translation, with no mechanisms to get past its surface and into the source text, imposes upon the reading mind.

Looking at the first release of the Scaife Digital Library Viewer, it is easy to see all the work that needs to be done. Indeed, for me, the steady progress towards a Perseus 5.0 only deepens my appreciation for what went into the development of Perseus 4.0 (the Java-based version, initially developed by David Mimno more than fifteen years ago and still in use at www.perseus.tufts.edu) and Perseus 3.0 (the Perl-based version that David A. Smith initially developed on the side to give Perseus its first web presence back in 1995). More than a decade ago, we solved another, less immediately obvious problem for having Perseus emerge as a place in which to publish content. In March 2006 (after being badgered by Ross Scaife, as well as Chris Blackwell, Gabby Bodard, Tom Elliott, Neel Smith and others), we began to apply a Creative Commons license to content that had no legal entailments. As soon as we decided that we would create collections that only contained CC-licensed content, we solved the bottleneck problem: so long as we actually made the content available, we could never use exclusive control over that content to restrict the development of services that we could not provide (say hello to Perseus Philologic, Alpheios.net).

The Scaife Viewer of March 2018 may only be a beginning. It may have a great deal more to do (e.g., integrating treebanks and source text/translation alignments). But the code is open and the possibilities are almost unbounded.

Posted in Features, General, Release, Scaife Viewer