Abstract: This paper consists of three complementary parts. The first section describes three instances where highly technical scholarship on Greek literature overlaps with, and draws attention to, particularly dramatic historical contexts. This section identifies an aspect of Greco-Roman studies that is both too demanding and too narrow: too demanding because it assumes that anglophone researchers work with scholarship in languages such as French, German, and Italian, and too narrow because it does not engage with scholarship that is not in a major European language. The second section addresses the general need for Classics and Classical Studies in a country such as the United States to extend beyond Greece and Rome. This section builds on work that I have published in the past distinguishing Greco-Roman from Classical Studies. The third section describes a more concerted attempt to expand beyond North Africa to include sources from Sub-Saharan Africa. I report on the development, for a spring 2021 course on Epic Poetry, of a 10,000-line Mandinka/English corpus of stories produced by West African griots. I will also briefly discuss the use of Classical Arabic to explore locally produced sources about West African history and culture. As a first step, the fall 2021 course on Classical historians at Tufts University will center not only on sources such as Herodotus, Thucydides, Livy, and Tacitus but also on two histories that focus on the Songhay Empire: the Tarikh al-Fattâsh, begun c. 1593 by Mahmud Kati, and the Tarikh as-Sudân, composed by al-Sadi (c. 1594–1655). This class will expand the role of Classical Arabic in Classical Studies at Tufts.
Visitors to the Open Greek and Latin digital library often ask us about what appear to be fragments of “code” alongside the list of authors in the library’s collection.
If you expand the list of works by a given author, you’ll notice that a similar line of “code” appears next to the title of each work, but with an extra element added at the end.
And once you actually go to read a work, you’ll notice an even longer sequence of characters in the right sidebar, and an identical one in your browser’s address window.
These character sequences are called CTS URNs (Canonical Text Services Uniform Resource Names), and they are an essential component of the Open Greek and Latin infrastructure. Simply put, CTS URNs are unique identifiers that make it possible to retrieve a specific passage of text from a database. In this blog post we’ll take a closer look at how CTS URNs work, and why they are so important to building the digital Classics library of the future.
Needle in a Digital Haystack: Uniform Resource Names
Suppose you pay a visit to your local library to check out a copy of your favourite Jane Austen novel. If your library is very small (say, 200 items or fewer), you will probably be able to locate the book quite easily just by scanning the shelves. But this method would quickly become impractical in a library with thousands or, in some cases, millions of items in its collection.
In order to simplify the search and retrieval process, libraries assign each book a unique call number, and then use the call numbers to arrange books in a logical order across floors and shelves. Armed with a call number and a floorplan of the library, you can easily find a specific book from among millions of others—assuming no one has misplaced or stolen it!
Call numbers are an example of metadata: information about an object, such as its location, size, or creation date, that is separate from the object’s contents. Metadata is important for keeping track of items in a collection and understanding how they relate to one another. Good metadata also makes it possible to perform statistical analyses that can yield insights into the collection as a whole.
In many ways, Uniform Resource Names, or URNs, are analogous to library call numbers. Each item in a digital collection is assigned a unique URN that distinguishes it from every other item. When you log on to the collection, your computer downloads an inventory containing the URN of every available text; this is what you see when you browse the OGL library. The inventory is updated whenever a new text is added to the database, so that you never end up with “dead” links or an incomplete catalog.
When you select a text you want to read, your computer sends the URN of that text to the OGL server, which responds by sending back a copy of the text in the form of an XML document (on which more in a future post).
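The request-and-response cycle described above can be sketched in a few lines of Python. The CTS protocol defines a GetPassage request that takes a URN as a parameter; this sketch only builds such a request URL and does not contact any server. The endpoint `https://cts.example.org/api/cts` is a hypothetical placeholder, not the real OGL server address, and whether a given server accepts exactly this query format is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint used only for illustration; this is NOT the
# actual address of the Open Greek and Latin server.
CTS_ENDPOINT = "https://cts.example.org/api/cts"


def get_passage_url(urn: str, endpoint: str = CTS_ENDPOINT) -> str:
    """Build a CTS GetPassage request URL that asks for the text of `urn`."""
    # Keep ":" unescaped so the URN stays human-readable in the query string.
    query = urlencode({"request": "GetPassage", "urn": urn}, safe=":")
    return f"{endpoint}?{query}"


url = get_passage_url("urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:5.84.1")
print(url)

# A server parsing the query string recovers the URN unchanged.
params = parse_qs(urlsplit(url).query)
print(params["urn"][0])
```

Leaving the colons unescaped is purely a readability choice: standard query-string parsing decodes `%3A` and `:` to the same URN.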
Finding Your Way: Canonical Text Services
In theory, a URN could be any random sequence of characters, as long as no two URNs are the same. This kind of system would tell you what texts are available to read, but nothing about the way in which the collection is organized or how different texts relate to one another. In particular, it would be difficult to group together different texts by the same author, an essential feature of both physical and digital libraries.
To solve this problem, projects such as OGL use a system known as Canonical Text Services. Despite the name, CTS has nothing to do with labeling texts as either “canonical” or “non-canonical”. Rather, CTS provides a set of rules for generating URNs that reflects the logical organization of texts into groups and subgroups.
If you examine the list of works by Lucian of Samosata in the screenshot above, you will notice that each URN begins with the same sequence of characters: urn:cts:greekLit:tlg0062. The letters urn:cts: appear in every URN, and indicate that we are employing the CTS citation format. greekLit locates the text within one of OGL’s main subcollections, Greek Literature (other subcollections include Latin Literature, latinLit, and Hebrew Literature, hebLit). Finally, tlg0062 is the sequence that has been assigned to the author Lucian. In fact, urn:cts:greekLit:tlg0062: is a complete URN on its own: it distinguishes the author Lucian of Samosata from all other authors in the OGL library. Individual works are identified by appending a work identifier to the author identifier, separated by a period: urn:cts:greekLit:tlg0062.tlg001, urn:cts:greekLit:tlg0062.tlg002, and so forth. This way, all works by Lucian appear together as a single text group.
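To make the structure of a URN concrete, here is a minimal Python sketch that splits a CTS URN into the levels described above. The `CtsUrn` type and `parse_cts_urn` function are hypothetical names for illustration, not part of any OGL library; the sketch assumes at most three levels (text group, work, version) and ignores finer CTS features such as subreferences.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str            # e.g. "greekLit"
    textgroup: str            # author-level identifier, e.g. "tlg0062"
    work: Optional[str]       # e.g. "tlg001"
    version: Optional[str]    # e.g. "perseus-grc2"
    passage: Optional[str]    # e.g. "5.84.1" or "5.84-5.116"


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into its hierarchical components."""
    parts = urn.split(":")
    if not 4 <= len(parts) <= 5 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    # Text group, work, and version are nested with periods.
    ids = parts[3].split(".")
    # An empty final component (a trailing colon) means "no passage".
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(
        namespace=parts[2],
        textgroup=ids[0],
        work=ids[1] if len(ids) > 1 else None,
        version=ids[2] if len(ids) > 2 else None,
        passage=passage,
    )


print(parse_cts_urn("urn:cts:greekLit:tlg0062:"))
print(parse_cts_urn("urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:5.84.1"))
```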
This sort of system, in which smaller categories are nested within larger ones, is an example of a hierarchy. In addition to grouping together works by the same author, the hierarchical format of CTS URNs makes it possible to identify a specific passage within a text in a way that mirrors the text’s internal structure.
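Because the author identifier is the leading part of every work URN, grouping an inventory by author needs nothing more than string handling. A sketch, using a hypothetical `group_by_textgroup` helper and a small illustrative inventory (these URNs are examples, not an excerpt of the actual OGL catalog):

```python
from collections import defaultdict


def textgroup_urn(urn: str) -> str:
    """Reduce a work-level CTS URN to its text-group (author) URN."""
    scheme, protocol, namespace, work_ids = urn.split(":")[:4]
    return ":".join([scheme, protocol, namespace, work_ids.split(".")[0]])


def group_by_textgroup(urns: list[str]) -> dict[str, list[str]]:
    """Collect work URNs under the URN of the author they belong to."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for urn in urns:
        groups[textgroup_urn(urn)].append(urn)
    return dict(groups)


inventory = [
    "urn:cts:greekLit:tlg0062.tlg001",
    "urn:cts:greekLit:tlg0062.tlg002",
    "urn:cts:greekLit:tlg0003.tlg001",
]
print(group_by_textgroup(inventory))
```

The hierarchy does the work here: no separate author table is needed, because each URN already carries its place in the collection.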
Navigating the Text
Classicists will be familiar with the system of citing passages of text by canonical reference. Depending on their genre and length, most ancient works are divided into segments such as books, chapters, or, in the case of poetry, line numbers. Longer segments, such as books, are themselves usually divided into shorter ones, so that the result once again is a nested hierarchy. For example, the citation “Thuc. 5.84.1” refers to book 5, chapter 84, section 1 of Thucydides’ History of the Peloponnesian War, which happens to be the opening scene of the famous Melian Dialogue. Longer passages can be identified by using a range: to cite the Melian Dialogue as a whole, we can write “Thuc. 5.84-116,” that is, Thucydides book 5, chapters 84 to 116.
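Canonical references compare correctly once each level is treated as a number. The sketch below uses hypothetical helper names and handles numeric references only (citations containing letters, such as Stephanus pages, are out of scope). It checks whether a reference falls within a range such as “Thuc. 5.84-116”:

```python
def parse_ref(ref: str) -> tuple[int, ...]:
    """Turn a dotted canonical reference such as "5.84.1" into (5, 84, 1)."""
    return tuple(int(level) for level in ref.split("."))


def in_range(ref: str, start: str, end: str) -> bool:
    """True if `ref` falls within the inclusive range start-end.

    `start` and `end` must have the same depth; `ref` may be deeper,
    in which case only its leading levels are compared.
    """
    lo, hi = parse_ref(start), parse_ref(end)
    if len(lo) != len(hi):
        raise ValueError("range endpoints must have the same depth")
    return lo <= parse_ref(ref)[: len(lo)] <= hi


# "Thuc. 5.84-116": book 5, chapters 84 to 116.
print(in_range("5.84.1", "5.84", "5.116"))   # True
print(in_range("5.117.1", "5.84", "5.116"))  # False
```

Comparing the references as plain strings would go wrong: "5.116" sorts before "5.84" character by character, whereas the tuples (5, 116) and (5, 84) compare the way readers expect.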
The advantage of canonical references is that, unlike page numbers, they remain valid no matter what edition of a work is being used. They are also more suitable for citing texts in a digital environment, where the concept of physical page numbers is no longer very meaningful.
To identify a specific passage of text within the CTS framework, the URN of the text can be extended in a way that resembles the canonical references above. Here is the URN for book 5, chapter 84, section 1 of Thucydides: urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:5.84.1. In this example, the sequence perseus-grc2 identifies a particular version of the text stored in the OGL database, while 5.84.1 points to the specific passage. A longer passage can likewise be expressed as a range: urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:5.84-5.116. Note that the same hierarchical levels must be included on either side of the range: 5.84-5.116, not 5.84-116, which would prompt an error message.
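The same-depth rule for ranges can be enforced with a simple check. In this sketch, `passage_range` is a hypothetical helper, not part of any OGL or CTS library; it assumes a five-part URN whose last component is the passage:

```python
def passage_range(urn: str) -> tuple[str, str]:
    """Return the (start, end) references cited by a passage-level CTS URN.

    A single reference such as "5.84.1" is returned as a one-item range.
    Both sides of a range must cite the same number of hierarchical levels.
    """
    parts = urn.split(":")
    if len(parts) != 5 or not parts[4]:
        raise ValueError(f"URN has no passage component: {urn!r}")
    passage = parts[4]
    start, sep, end = passage.partition("-")
    if not sep:
        return passage, passage
    if start.count(".") != end.count("."):
        raise ValueError(
            f"range {passage!r} mixes citation depths "
            "(write 5.84-5.116, not 5.84-116)"
        )
    return start, end


base = "urn:cts:greekLit:tlg0003.tlg001.perseus-grc2"
print(passage_range(base + ":5.84-5.116"))
print(passage_range(base + ":5.84.1"))
```

Only the passage component is split on the hyphen, so version identifiers such as perseus-grc2, which also contain hyphens, are not mistaken for ranges.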
When you access a text in the OGL library, your computer is provided with information about the text’s citation structure. As you navigate through the text, your computer sends the URN of the target passage to the OGL server, which returns a copy of the passage, again in the form of an XML document. While you can progress forwards and backwards through the text sequentially, you can also enter a specific URN into your browser’s address window: assuming the URN is formatted correctly, this will take you directly to whatever passage you are interested in.
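Sequential navigation amounts to stepping through the text’s ordered list of valid references. The sketch below works on a hypothetical, truncated reference list (not the real citation structure of the edition), and `prev_next` is an illustrative name. In the CTS protocol itself, requests such as GetValidReff and GetPrevNextUrn let a server do this work; this code does not call them.

```python
from typing import Optional


def prev_next(urn: str, valid_refs: list[str]) -> tuple[Optional[str], Optional[str]]:
    """Return the URNs of the passages before and after `urn`.

    `valid_refs` lists every citable reference at one level, in reading
    order. None marks the start or end of the text.
    """
    base, _, ref = urn.rpartition(":")
    i = valid_refs.index(ref)  # raises ValueError for an unknown reference

    def to_urn(r: Optional[str]) -> Optional[str]:
        return None if r is None else f"{base}:{r}"

    prev_ref = valid_refs[i - 1] if i > 0 else None
    next_ref = valid_refs[i + 1] if i + 1 < len(valid_refs) else None
    return to_urn(prev_ref), to_urn(next_ref)


# Hypothetical, truncated citation structure for illustration only.
refs = ["5.84.1", "5.84.2", "5.84.3", "5.85.1"]
urn = "urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:5.84.2"
print(prev_next(urn, refs))
```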
Planning for the Future
We have seen that CTS URNs provide a logical way of retrieving texts and passages of texts that reflects the organization of a collection and the internal structure of the texts themselves. But perhaps the question remains: Why is such a system important in the first place?
While a simple reading environment is possible without URNs, the CTS framework allows us to unlock the full potential of a digital edition. By assigning a unique identifier to every passage of every text, CTS URNs make possible large-scale textual analysis that in the past would have required hundreds of hours of manual tabulation. With the proper software, we can easily find out how word frequencies, metrical patterns, and even syntactic structures vary within and across texts. The discovery of statistically significant variations might, for example, help resolve disputes over the authorship of a text or precisely quantify the way in which an author’s style developed over the course of their career.
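As a toy illustration of passage-level analysis, the sketch below counts word forms per passage, keyed by CTS URN, and then totals them across passages. The passage texts are placeholder strings rather than real data, and the tokenization (NFC normalization plus a `\w+` regular expression) is a simplifying assumption, not a method used by OGL.

```python
import re
import unicodedata
from collections import Counter


def word_frequencies(passages: dict[str, str]) -> dict[str, Counter]:
    """Count word forms in each passage, keyed by the passage's CTS URN."""
    counts = {}
    for urn, text in passages.items():
        # NFC composes combining accents onto their base letters where a
        # precomposed form exists (as in polytonic Greek), so that \w+
        # does not split a word at the accent.
        text = unicodedata.normalize("NFC", text).lower()
        counts[urn] = Counter(re.findall(r"\w+", text))
    return counts


# Placeholder "texts" standing in for retrieved passages; not real data.
passages = {
    "urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:5.84.1": "alpha beta alpha",
    "urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:5.84.2": "beta gamma alpha",
}
per_passage = word_frequencies(passages)
total = sum(per_passage.values(), Counter())
print(total.most_common())  # [('alpha', 3), ('beta', 2), ('gamma', 1)]
```

Because every count stays attached to a URN, the same results can be regrouped by chapter, book, work, or author without re-reading the texts.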
In addition to this, the CTS framework helps protect a digital repository from becoming obsolete in the face of changing technology. Since URNs are just strings of characters, they will remain valid no matter how the technology for processing and displaying texts evolves in the future. By investing in this system, the Open Greek and Latin Project is positioning itself to take advantage of exciting innovations in the field of digital humanities, and to serve as an invaluable resource to Classicists for generations to come.
All of these files will be made available in Open Greek & Latin via the Scaife Viewer.
The 2019 CHS Summer Digital Humanities Interns, Karina Cooper, Ethan Della Rocca, Sophia Elzie, and Lucy Parr, worked on proofreading files, updating the TEI-XML markup, and managing the workflow via the Perseus GitHub Greek text repository. Improvements included file naming consistency, structural review, and header standardization. Perseus and Open Greek & Latin/First Thousand Years of Greek are grateful for their efforts and attention to detail.
Angelia Hanhardt, Editorial Assistant at the CHS, trained and supervised the interns in keeping with her work on open access publication.
This work added over 1.8 million words to Open Greek & Latin — over half a million in Greek. The CHS team created over 75 metadata files and converted over 200 editions and translations, accounting for over 12% of the entire Perseus Greek open-source primary text corpus.
Perseus and our collaborators are grateful to Ethan, Karina, Lucy, Sophia, Michael and Lia for their hard work on this corpus and we thank the CHS for funding their efforts.
Gregory Crane, Tufts University, Fall 2019
Greek 1: Introduction to Ancient Greek and the Homeric Iliad
MWF 9:00-10:15, Eaton Hall 209
Are you interested in studying a new language in fall 2019? Whether you want to try something different for your language requirement or you have a year, or even a semester, to spare, you have an opportunity to travel thousands of years into the past and to confront the oldest sources in the continuous tradition of European literature. This is not your parents’ language class, and it is not high school Latin. You have an opportunity to participate in the reinvention of an ancient field and the development of a new track within the humanities as a whole. And you also have a chance to begin developing a research agenda of your own, one that can bring together the humanities and emerging fields such as data science.
No language is poised to benefit more profoundly from disruptive technologies than Ancient Greek. This disruption is fundamentally changing the ways in which we can interact with this and other ancient languages. And while the changes can expand the range of questions that specialists can pose, those who are just starting to learn Ancient Greek are poised to benefit the most. A growing range of reading support tools allows students of the language to interact with the primary sources immediately. The more understanding of the language and the culture students internalize, the richer the experience will, of course, be; there is a big payoff to sustained study. But the tools now becoming available mean that even a semester of study would allow students to engage with sources in Greek directly, at a depth that would have been impossible before.
Disruptive new reading tools challenge us to rethink our understanding of historical languages and cultures such as those of Ancient Greece. None of the scholarly infrastructure that has emerged over generations of modern academic study, over centuries of print culture, and over millennia of sustained scholarship has been organized to match the opportunities and challenges of an interactive reading environment that exploits computational methods in general and machine learning in particular to serve a global audience (not just those familiar with the languages of European scholarship).
The creative disruption opens up a wide range of opportunities, especially for ambitious students who wish to contribute and to develop a research portfolio. Participation in the 2018/2019 introductory Greek class at Tufts University enabled three students to begin research projects of their own. Two of them, Madeleine Harris (Tufts ’21) and Pearl Spear (Tufts ’22), applied emerging research from cognitive science and natural language processing to begin transforming our understanding of ancient emotions, while Bella Hwang (Tufts ’22) combined what she learned in Greek and in her computer science courses to begin modelling the linguistic features that students need to master if they are to read Homeric Greek with fluency. Such projects allow students to make contributions to the study of Ancient Greek, to spend extensive time posing new questions of poems that have fascinated humans for thousands of years, and to begin cultivating the most dynamic analytical methods of the 21st century.
The 2019/2020 academic year will be the second step in a multi-year process re-engineering the way our students engage with and master Ancient Greek. We will focus on the Homeric Iliad — the earliest surviving literary source in the continuous tradition of European literature. No secular literary work from Europe has commanded the attention of more audiences, from more cultural contexts, over more time than the Iliad. As you begin to study Homeric Greek, you travel back thousands of years, hear the voices of oral poets who worked with a tradition that was already thousands of years old and engage with a world that is both alien and familiar.
Call for applications: Digital Editions in Practice, A Two-Day Workshop
The Perseus Digital Library at Tufts University will host a two-day workshop that provides an overview of a sample, practical workflow for creating digital editions. This will feature both an open lecture component led by developers and expert users of advanced technologies and “hands-on” sessions that offer participants in-depth demonstrations of selected tools and technologies, as well as discussions tailored to the attendees.
At the end of March 2018, my collaborators and I finished enjoying five years of support, 5,000,000 EUR(!), from an Alexander von Humboldt Professorship, support which allowed young researchers from many different countries to work both as a team and on their own. Documenting all that work will be a significant task and requires its own publication(s). Work at Leipzig, Tufts, and elsewhere on Open Greek and Latin (OGL), and on the Canonical Text Services (CTS) protocol upon which OGL builds, provides the starting point for much of the work described below. A tremendous amount of support for OGL came from the European Social Fund and the Alexander von Humboldt Foundation, but collaborators at Perseus at Tufts University, at Mount Allison University in Canada, at the University of Virginia, at the Harvard Library and Harvard’s Center for Hellenic Studies (CHS) have contributed time and significant sums as well. As a group, they have made 37 million words of Greek and Latin available in CTS-compliant EpiDoc TEI XML via GitHub.
This paper, however, does not focus primarily upon what happened at Leipzig but takes note of a number of events that took place in the opening months of 2018 and that have some connection to, but also depend upon efforts outside of, the Digital Humanities Chair at Leipzig. Each, taken separately, is important. All of these events taken together reflect a broader, systematic change, and a change for the better, in Ancient Greek and Latin philology in particular and, ultimately, in all philology.
On March 15, Eldarion released the initial version of the Scaife Digital Library Viewer. The release is, of course, a first step, but this first step changes the world in at least two fundamental ways: (1) Perseus is alive — it can finally include new materials on an on-going basis; (2) the Scaife Digital Library Viewer provides a foundation for an environment that can publish a growing range of born-digital, openly licensed, and networked (and fully networkable because they are openly licensed) annotations and micro-publications that cannot be represented in the incunabular digital publication systems that still internalize the limitations of print publication.
First, Perseus can now be configured so that it can include new materials almost immediately. We have not yet established a regular workflow (the initial Scaife Digital Library Viewer still runs on a server maintained by Eldarion rather than Tufts), but updates on a weekly or even a daily basis, if not in real time, would be quite reasonable. New content does not even have to be in Greek or Latin: we already include a Persian edition of the Divan of Hafez. More importantly, if someone outside of the extended network of Perseus collaborators puts their content in the right format (for now, CapiTainS-compliant EpiDoc TEI XML), we can include it. Thus, Neven Jovanovic was able to publish the first of what is expected to be a series of early modern Latin texts in Perseus (Scaliger’s Latin translation of Sophocles’ Ajax). Prof. Hayim Lapin from the University of Maryland converted his CC-licensed version of the Hebrew Old Testament, Talmud, and Mishnah. At present, anything that becomes visible in the Scaife DL can be (because we require an open license) a permanent part of the Perseus collections. We need to think through a general process of content submission (and however open we wish to be, there are obviously some limits), but there are enough established collaborators with content to add, and enough CC-licensed material that we would like to add, that we already have enough materials to test a workflow for updates.
Second, the use cases of Perseus and of Digital Classics are not only more varied than those of print but involve so many data types and so many implicit use cases that they represent an emergent system. These include born-digital critical editions (with variants classified and dynamically configurable), diplomatic editions with alignments between transcription and source images, alignments between different versions of the same text in the same language, bilingual alignments between source texts and translations, morphological and syntactic analyses, co-reference resolution and other categories of linguistic annotation, social networks, geospatial data, representations of digital intertextuality (including annotations estimating the probability that a given word or phrase represents a paraphrase of, or direct quotation from, a lost source text), and an unbounded set of potential new annotation classes. Use cases include not only specialists posing new kinds of questions (e.g., searching a corpus for instances of “future less vivid conditionals” or generating a semantically clustered list of verbs associated with male vs. female agents) but also a fundamentally new mode of interaction that we might term language wrangling or language hacking, where readers have such dense networks of explanatory annotations that they can engage immediately, at some level of precision, with any annotated source in any language, whether or not they have any prior knowledge of that language. Such reading is a new form of engagement that lies between the experience of experts who have spent their 10,000+ hours immersed in a subject and the passivity that a print modern-language translation, with no mechanisms to get past its surface and into the source text, imposes upon the reading mind.
Looking at the first release of the Scaife Digital Library Viewer, it is easy to see all the work that needs to be done. Indeed, for me, the steady progress towards a Perseus 5.0 only deepens my appreciation for what went into the development of Perseus 4.0 (the Java-based version, initially developed by David Mimno more than fifteen years ago and still in use at www.perseus.tufts.edu) and Perseus 3.0 (the Perl-based version that David A. Smith initially developed on the side to give Perseus its first web presence back in 1995). More than a decade ago, we solved another, less immediately obvious problem that stood in the way of Perseus emerging as a place in which to publish content. In March 2006 (after being badgered by Ross Scaife, as well as Chris Blackwell, Gabby Bodard, Tom Elliott, Neel Smith and others), we began to apply a Creative Commons license to content that had no legal entailments. As soon as we decided that we would create collections that only contained CC-licensed content, we solved the bottleneck problem: so long as we actually made the content available, we could never use exclusive control over that content to restrict the development of services that we could not provide (say hello to Perseus Philologic, Alpheios.net).
The Scaife Viewer of March 2018 may only be a beginning. It may have a great deal more to do (e.g., integrating treebanks and source text/translation alignments). But the code is open and the possibilities are almost unbounded.
I am pleased to announce the first release of the Scaife Digital Library Viewer, a reading environment for source texts that follow the Canonical Text Services (CTS) data model. Our initial focus is on pre-modern sources, but the underlying approach applies to source texts of all periods. CTS provides a framework within which we can cite particular words in particular versions of particular texts — whether a version is a papyrus, manuscript, or a critical edition, whether versions of that text derive from a single lost original (as is the case for many ancient Greek and Latin texts) or the text itself appears in many versions, each of which has comparable authority (as is the case for many medieval sources). For those interested in more information, James Tauber, lead developer for this release, will present the Scaife Digital Library Viewer online on April 26 at 5 pm CEST as part of Sunoikisis Digital Classics. The presentation will be recorded and available, along with any other course materials, on the SunoikisisDC website after the class itself.
Ross Scaife (1960-2008) was a pioneer in reinventing the study of Greco-Roman culture to exploit the possibilities of a digital age. He was among the first — if not the first — to get tenure for a purely digital publication, Diotima: Materials for the Study of Women and Gender in the Ancient World, a project that he and Suzanne Bonefas launched in 1995 (the same year that David A. Smith, now an Assistant Professor of Computer Science at Northeastern, established the first web presence for Perseus). Ross was a colleague and he was a friend, whom we mourn still and will always miss. We lost him on March 15, 2008 and it is with fond memory that we announce the first version of the Scaife Digital Library in his honor, on March 15, 2018, ten years later.
Who is using Clarin (https://www.clarin.eu/) and/or Dariah (https://www.dariah.eu/), and particularly the German subprojects https://www.clarin-d.net/de/ and https://de.dariah.eu/, to work with historical languages?
If so, are you doing so as a funded member of Clarin or Dariah?
Who has used the Dariah repository (https://de.dariah.eu/repository) to store data? I have found documentation about how to add data, but I have not yet found any collections stored within the Dariah repository.
What other services offered by Clarin and/or Dariah have you used? Are you planning to use any?
Ideally, this would generate a public discussion in forums such as Twitter, Humanist and the Digital Classicist mailing list but feel free to email me directly on gmail (gcrane2008).
The Perseus Digital Library at Tufts University invites applications to “Digital Editions, Digital Corpora, and new Possibilities for the Humanities in the Academy and Beyond,” a two-week NEH Institute for Advanced Technology in the Digital Humanities (July 16-27, 2018). This institute will provide participants the opportunity to spend two intensive weeks learning about a range of advanced new methods for annotating textual sources, including but not limited to Canonical Text Services protocols, linguistic and other forms of textual annotation, and named entity analysis. By the end of the institute, participants will have concrete experience applying all of these techniques not just to provided texts and corpora but to their own source material as well.
Faculty, graduate students, and library professionals are all encouraged to apply and international participants are welcome. Applications are due by February 1, 2018. In order to apply for the institute, applicants need to 1) complete the online registration form; 2) concurrently send a CV and statement of purpose by email to email@example.com