Publishing Text for a Digital Age


Update: Submissions now being accepted!

March 27-30, 2014

As a follow-on to “Working with Text in a Digital Age,” an NEH-funded Institute for Advanced Technologies in the Digital Humanities, and in collaboration with the Open Philology Project at the University of Leipzig, Tufts University announces a 2-day workshop on publishing textual data that is available under an open license, that is structured for machine analysis as well as human inspection, and that is in a format that can be preserved over time. The purpose of this workshop is to establish specific guidelines for digital publications that publish and/or annotate textual sources from the human record. Registration for the workshop will be free, but space will be limited. Some support for travel and expenses will be available. We particularly encourage contributions from students and early-career researchers.



New courses on Digital Philology at the University of Leipzig

October 2013 – January 2014: Overview of Digital Philology (5 credits)

April – July 2014: Current Topics in Digital Philology (10 credits)

[Please re-circulate]

*Research assistantships are available to students enrolled in these classes*

The Humboldt Chair of Digital Humanities at the University of Leipzig is developing a sequence of English-language courses on digital philology that will begin in the Wintersemester and Sommersemester of the 2013/2014 academic year. The courses may be taken in sequence or individually. We particularly encourage participation by graduate students, not only from Leipzig but from elsewhere in Europe and beyond, who are preparing to begin careers as researchers, teachers or library professionals. A semester or an academic year at Leipzig can help you transform your career and acquire the skills with which you can flourish in an intensively networked, profoundly global intellectual world.

These courses are particularly unusual in that they are offered within a Computer Science department and provide students with an opportunity to connect more directly with experts in advanced technologies than is often feasible. Germany also is unusual in that Computer Science and the Humanities are both instances of Wissenschaft — we do not face the boundaries between funding for research in the Humanities and in Computer Science that many in the English-speaking world face. If you wish to acquire the full range of skills needed for both teaching and research, these courses in this environment provide you with an excellent space in which to develop.

Note: particularly promising students enrolled in these classes will have an opportunity to work as research assistants, where they can apply the skills that they acquire in their classes. We particularly encourage ambitious students from outside Leipzig to consider this option to help support their stay.

An Overview of Digital Philology (5 credits, Wintersemester) provides students with the programming skills needed to work with text in a digital age. We particularly focus upon the integration of methods from computational and especially corpus linguistics, both of which are fundamental to the study of language and critical to all who wish to develop flourishing careers as teachers and researchers in philology. The course is organized so that students can also take the Leipzig eHumanities Seminar (5 credits). In 2013, the course will focus particularly upon familiarizing students with XML and with the use of associated technologies (e.g., XSLT, XQuery).

While students who have taken the Overview of Digital Philology will be able to build on their knowledge in developing course projects, the Sommersemester course, Current Topics in Digital Philology (10 credits, Sommersemester), is open to anyone with advanced experience in either computer science or philology. Current Topics in Digital Philology provides a framework within which students of language from various backgrounds can develop projects informed by new advances in corpus and computational linguistics and in the digital humanities. In 2014, students will develop skills in the use of Python to work with richly annotated linguistic corpora and then use these skills in course projects.




Mellon Funds Perseids Project

The Perseus Digital Library is pleased to announce new support from the Andrew W. Mellon Foundation for the development of the Perseids Platform for collaborative editing, learning and publication, under the direction of Perseus’ Associate Editor, Professor Marie-Claire Beaulieu. The Perseids platform integrates open source software and standards from a variety of projects including Perseus, the Homer Multitext Project, the Alpheios Project, and the Open Annotation Collaboration. This $600,000 grant funds the expansion of Perseids to support new use cases for classroom collaboration on digital editions; scholarly curation of texts from scan through publication; and development of dynamic syllabi using managed resources. The grant will also enable continued collaboration across the variety of integrated projects and outreach to scholars interested in using the platform.

We are now accepting applications for an Associate Research Programmer to help with development of Perseids. Interested applicants are asked to follow the above link. Follow up questions may be emailed to

More information can be found on the Perseids project blog and documentation site at


Announcing The Perseus Catalog, release 1.0

The Perseus Digital Library is pleased to announce the 1.0 Release of the Perseus Catalog.

The Perseus Catalog is an attempt to provide systematic catalog access to at least one online edition of every major Greek and Latin author (both surviving and fragmentary) from antiquity to 600 CE. Still a work in progress, the catalog currently includes 3,679 individual works (2,522 Greek and 1,247 Latin), with over 11,000 links to online versions of these works (6,419 in Google Books, 5,098 to the Internet Archive, 593 to the Hathi Trust). The Perseus interface now includes links to the Perseus Catalog from the main navigation bar, and also from within the majority of texts in the Greco-Roman collection.

The metadata contained within the catalog uses the MODS and MADS standards developed by the Library of Congress as well as the Canonical Text Services and CTS-URN protocols developed by the Homer Multitext Project. The Perseus catalog interface uses the open source Blacklight Project interface and Apache Solr. Stable, linkable canonical URIs have been provided for all textgroups, works, editions and translations in the Catalog for both HTML and ATOM output formats. The ATOM output format provides access to the source CTS, MODS and MADS metadata for the catalog records. Subsequent releases will make all catalog data available as RDF triples.
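For readers unfamiliar with CTS URNs, the following is a minimal Python sketch of how such an identifier breaks down into the levels the catalog exposes (textgroup, work, and edition or translation). It is not the catalog's own code: the names `CtsUrn` and `parse_cts_urn` are hypothetical, the sample URN is used purely for illustration, and the sketch ignores the optional exemplar level of the work component.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CtsUrn:
    """Hypothetical container for the parts of a CTS URN."""
    namespace: str
    textgroup: str
    work: Optional[str] = None
    version: Optional[str] = None   # an edition or a translation
    passage: Optional[str] = None


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split 'urn:cts:<namespace>:<textgroup>.<work>.<version>[:<passage>]'."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work_component = parts[2], parts[3]
    levels = work_component.split(".")
    if not namespace or not levels[0]:
        raise ValueError(f"missing namespace or textgroup: {urn!r}")
    return CtsUrn(
        namespace=namespace,
        textgroup=levels[0],
        work=levels[1] if len(levels) > 1 else None,
        version=levels[2] if len(levels) > 2 else None,
        passage=parts[4] if len(parts) == 5 and parts[4] else None,
    )


# Illustrative URN: a passage in one version of one work of one textgroup.
example = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.1")
print(example)
```

A URN that stops at the textgroup or work level parses the same way, with the lower levels left as `None`, which mirrors the way the catalog assigns identifiers at each of those levels.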

Other major plans for the future of the catalog include not only adding more authors, works, and links to online versions but also opening up the catalog to contributions from users. Currently the catalog does not include any user contribution or social features other than standard email contact information, but the goal is to soon support the creation of user accounts and the contribution of recommendations, corrections and/or new metadata.

The Perseus Catalog blog features documentation, a user guide, and contact information as well as comments from Editor-in-Chief Gregory Crane on the history and purpose of the catalog.

The Perseus Digital Library Team


Jobs at the Humboldt Chair in Digital Humanities


University of Leipzig

[Please re-post]

In February 2013, the Humboldt Chair of Digital Humanities announced possible jobs. Funding from the European Social Fund has now been finalized, and we are pleased to announce two positions: one for someone to supervise systems and text processing workflow; the other for someone with expertise in interactive design.

Applicants should have completed their most recent degree after January 4, 2011. Positions will begin June 1, 2013 or as soon as a suitable candidate is found, and will run through December 2014. Pay will be commensurate with experience under Saxon and ESF regulations.

System and workflow manager

We are looking for an addition to our team who will develop and administer scalable systems and workflows for processing and visualizing billions of words. Our text analysis includes the latest technologies in OCR, linguistic annotation, named entity identification, text reuse and topic modeling. You will be working in an international and interdisciplinary group of young scientists, aiming to create a new generation of tools and methods for learning, analyzing and interacting with languages. The job will range from general systems administration to the planning, implementing, managing and monitoring of completely novel software systems.


Required skills and experience:

  • Linux/Unix System Administration
  • Build management system (at least one of Maven, Gradle or Ant)
  • Scripting language (at least one of Perl, Python, Ruby or Clojure)
  • Version control (at least one of Git, Mercurial or Subversion)
  • Unit test development

Desired skills and experience:

  • Experience with workflow tools such as Taverna
  • Object oriented programming language (e.g. Java, Python, C++)
  • Multi-threaded / distributed programming
  • RDF, XML and Linked Data concepts
  • Continuous Integration environments and test-driven development


  • Ability to prioritise
  • Attention to detail
  • Ability to take initiative (self-driven), to work independently and as part of a team
  • Forward planner
  • Clear focus on high quality

    Interactive designer

    We are looking for an addition to our team who will join us in developing new methods by which users can interact with historical sources in general and with the collections in the Perseus Digital Library in particular. In this position, you will build ‘gamified’ user interfaces for eLearning applications which enable students to contribute to current research, to receive and give feedback, and to track and analyze their learning progress. You will be working in an international and interdisciplinary group of young scientists, aiming to create a new generation of tools and methods for learning, analyzing and interacting with languages.


    Required skills and experience:

  • Graphic design
  • Javascript, CSS and HTML
  • RDF, TEI XML and Linked Data concepts

    Desired skills and experience:

  • HTML5 and/or mobile application development skills
  • Knowledge of Ancient Greek or Latin
  • Experience with linguistic annotation


  • Strong attention to detail
  • Ability to take initiative (self-driven), to work independently and as part of a team
  • Empathic communicator
  • Creative and forward planner
  • Clear focus on high quality

    Please send a CV and a (short) cover letter to


    “Reinventing Humanities Publication Project” receives €1.1 million grant from the Saxon Ministry of Science and European Social Fund

    The Saxon Ministry for Science and Art has awarded the University of Leipzig a €1.1 million grant, with support from the European Social Fund and from the State of Saxony, to form an early career research group to help develop new methods of publication, predicated upon open data and open access, for the Humanities in general and for students of historical languages such as Greek and Latin in particular. This grant provides a first step towards reestablishing Leipzig as an international center for humanities publication, especially in technologically challenging areas such as ancient languages and music – and doing so in a framework that is born-digital and that assumes the constraints and possibilities of an open, digital environment.

    An English version of the press release from the State of Saxony follows:

    Early Career Research Group starts at the University of Leipzig
    On May 1, 2013, an early career research group, “the reinvention of humanities publishing in a digital age,” began work at the University of Leipzig.

    Under the leadership of Humboldt Professor Gregory Crane, the team of early-career researchers will draw on emerging technologies such as Natural Language Processing, Text Mining, and Digital Libraries to help reinvent publication in the Humanities. “The goal is the creation of a comprehensive, openly accessible collection of data for Greek and Latin. In this era of digital publication and with this new collection as a foundation, Leipzig has a chance to reestablish its traditional role as a center for the publication of historical sources in languages such as Greek and Latin”, says classical philologist Professor Gregory Crane. Under his leadership seven researchers from the Institute of Computer Science are working on the project.

    The Ministry for Science and Art has provided support for this project with resources from the European Social Fund (ESF) and the State of Saxony. The Early Career Research Group has received roughly €1.1 million in support. The project runs through 2014.

    “The early career researchers pursue an ambitious and important research goal. The use of new information technologies can carry humanities research into new dimensions. In addition, research in the area of Digital Humanities is an important building block for further increasing the University of Leipzig’s profile in humanities research in general,” explained State Science Minister Sabine von Schorlemer.

    As Matthias Schwarz, Prorector for Research and Support of Early Researchers at the University of Leipzig, points out, it is not simply a question of making the University’s rich humanities resources accessible through new, IT-enabled methods of research, but it is also significant from the standpoint of employment policy. “Our main goal is to establish new career paths for humanists with IT-skills. On the one hand, the University of Leipzig is seizing the opportunity to strengthen the underlying structure of the humanities – an area in which Leipzig sees itself as establishing a particular strength among the various institutions of higher learning in Saxony. At the same time, through this ESF-support, early career development in the humanities is particularly well adapted for the modern job market,” Prorector Schwarz said.

    A number of early career research groups are being founded in the area of Digital or e-Humanities. As a comprehensive university with a particular strength in the humanities and outstanding competence in Computer Science, the University of Leipzig has long been focusing on the relatively new field where researchers aggressively exploit information technologies to take their research in the humanities to a higher level. Professor Gregory Ralph Crane is a pioneer of the e-Humanities. On 1st April he began his Alexander von Humboldt Professorship at the University of Leipzig.


    Rediscovering Philology

    Gregory Crane
    Alexander von Humboldt Professor of Digital Humanities
    University of Leipzig
    Professor of Classics
    Tufts University
    Editor in Chief, Perseus Project

    This paper began as a contribution to the debate on whether or not the APA should change its name. A hundred and forty years after its founding, the central leadership of the American Philological Association (APA) has resolved to abandon the name of philology and proposed to adopt for the association the name “Society for Classical Studies.” I would argue against this on three grounds. First, we need to retain a qualifier in our name that reflects the fact that the APA is the organization to which most professional students of Greco-Roman culture in the United States turn. Second, classics and classical studies are now problematic names for a group that focuses primarily upon Greco-Roman culture because the term “classics” has been used to assert the primacy of Greek and Latin and of Western culture in general.

    Most of what follows, however, focuses more generally upon a third point, the nature and role of philology. The challenge for students of Greco-Roman culture is not to run away from, but to make the case for, philology. The members of the American Philological Association may draw upon the material record and upon methods from around the academic world, but they combine these sources and methods with the written record to understand the Greco-Roman world as broadly and deeply as possible. If few now in the English speaking world understand what philology is, then that presents an opportunity for those of us who have the privilege to work with Greek and Latin for a living. We should blow the dust off the ancient and (I believe) easily explained term philology — easily explained and easily justified if we use the term in its broadest and most dynamic sense. Philology entails — or should entail — everything that we can learn about the past from the linguistic record. Philology is neither narrow nor antiquated. It is an expansive set of practices, now undergoing a rebirth as students of the past adapt to the new opportunities of a digital space. If the twentieth century saw the rise of Classics in modern language translation, the digital technologies already at our disposal allow us to make the Greek and Latin sources directly accessible to a global audience.

    For the full discussion, see Rediscovering Philology.


    The Open Philology Project and Humboldt Chair of Digital Humanities at Leipzig

    Initial Research Plan (April 2013)
    Alexander von Humboldt Chair of Digital Humanities
    The University of Leipzig

    Abstract: The Humboldt Chair of Digital Humanities at the University of Leipzig sees in the rise of digital technologies an opportunity to re-assess and re-establish how the humanities can advance the understanding of the past and support a dialogue among civilizations. Philology, which uses surviving linguistic sources to understand the past as deeply and broadly as possible, is central to these tasks, because languages, present and historical, are central to human culture. To advance this larger effort, the Humboldt Chair focuses upon enabling Greco-Roman culture to realize the fullest possible role in intellectual life. Greco-Roman culture is particularly significant because it contributed to both Europe and the Islamic world, and the study of Greco-Roman culture and its influence thus entails Classical Arabic as well as Ancient Greek and Latin. The Humboldt Chair inaugurates an Open Philology Project with three complementary efforts that produce open philological data, educate a wide audience about historical languages, and integrate open philological data from many sources: the Open Greek and Latin Project organizes content (including translations into Classical Arabic and modern languages); the Historical Language e-Learning Project explores ways to support learning across barriers of language and culture as well as space and time; the Scaife Digital Library focuses on integrating cultural heritage sources available under open licenses.

    The Humboldt Chair of Digital Humanities at Leipzig will create the Open Philology Project. In this we advance a digital successor to that philology which sees in language a source for what Augustus Boeckh in 1822 termed “the understanding of all antiquity, including the events of both the physical and intellectual world.”[1] Philology brings the past to life as deeply and as broadly as possible through the use of surviving linguistic sources. From the human perspective, philology constitutes a set of language-based critical scholarly skills: not only annotating (annotation is the basic genre), but also comparing, connecting, interpreting, proving or rejecting hypotheses, and finding evidence. Critical apparatuses and commentaries often preserve the condensed fruits of such reasoning. Open Philology does not let this scholarly heritage of manuscript and print culture vanish: it converts that heritage into digital form and uses it as a training ground for the next generations.

    The Open Philology Project will initially focus particularly upon pre-modern society but its methods and goals apply to any society for whom traces of their languages survive. Philology provides an opportunity to advance the intellectual life of individual societies and, equally important, dialogue across civilizations, transcending not only barriers of space and time but of language and culture. Digital technology plays a critical role as a catalyst because — and only because — it allows us to re-imagine how we can more fully achieve, and indeed transform our ability to achieve, these ancient goals of philology. This is not a digital philology or digital humanities project. The Open Philology Project is about philology.

    To address the vast challenge of an Open Philology that embraces all historical languages, the Humboldt Chair begins by advancing within a European and a global space the role of that Greco-Roman culture out of which Europe largely emerged. Greco-Roman culture has also contributed significantly to the Islamic world and Europe depended upon Arabic sources. Our goal in this activity is not only to increase the intellectual accessibility of European cultural heritage but also to foster exchange of cultural heritage sources such as Persian, Sanskrit, Classical Chinese, Egyptian from the earliest forms through Coptic, and the Cuneiform Languages of the Ancient Near East, and Classical Mayan from the Western Hemisphere. As a platform for this activity, the Open Philology Project builds upon, and helps develop, the Perseus Digital Library, working with colleagues in Europe, North America and elsewhere to expand open collections and services and to reach an increasingly global audience.

    The greatest challenge of humanistic scholarship lies, in our view, in making available the human cultural heritage to the global community. Digitization is a necessary but, by itself, insufficient step in this process. Human cultural heritage must be represented in a way that supports intellectual access across barriers of language and culture. This requirement in turn has implications for the technologies but also for the rights regime that we choose. Open data provides the best strategy by which to promote the circulation of sources within a global context. Collections that are protected behind subscription barriers may serve the interests of specialist communities. Collections that cannot be freely modified and re-circulated may be useful for reference. But scholarship in general and philology in particular must build upon open data if it is to realize its intellectual and social obligations to advance the common understanding of human culture. The Humboldt Chair is therefore committed to open source publication, with machine-actionable Creative Commons licenses requiring attribution and sharing of data and allowing commercial reuse (CC-BY-SA) as the preferred mode of distribution.

    The larger Open Philology Project begins with three specific, complementary activities: creating comprehensive open resources, providing the education needed to understand and to contribute to those resources, and bringing open resources from many different sources into a single computational framework for analysis, annotation, and preservation.

    First, the Open Greek and Latin Project makes Greek and Latin sources freely accessible, both digitally and intellectually, to a global public. Second, the Historical Language e-Learning Project provides distributed e-learning of historical languages such as Greek and Latin so that as many as possible may penetrate as deeply as they choose into the sources from which the present has been fashioned. Third, support from the Humboldt Foundation allows us to contribute, after years of planning, to the Scaife Digital Library. The SDL develops methods to aggregate and integrate from various sources open data, textual and archaeological alike, in any medium, about human cultural heritage, including, but not limited to, the Greco-Roman world.

    All three of these projects focus on the production, analysis, and preservation of machine-actionable annotations. All data about historical records is based upon transcriptions, whether from text-bearing objects or from sound recordings, which are themselves annotations that describe the textual content from a region of a written surface or a time interval in a recording. We will continue to make arguments in the digital successors to notes, articles and monographs, but we should increasingly integrate into those arguments, and use as their foundation, machine-actionable links to the sources upon which they are based. These links include not only citations to particular sources (e.g., a machine-actionable link to a particular reading in a particular edition of Aeschylus) but also links to aggregate data (e.g., the results of a search as they appeared at a particular time). In the end, born-digital notes, articles and monographs (if they preserve labels inherited from the form of the book) may preserve a family resemblance to their predecessors, but they will surely evolve into something qualitatively different as they adapt to the different gravity, if not fundamentally different physics, of a digital space.

    1. The Open Greek and Latin Project.

    The ultimate goal is to represent every source text produced in Classical Greek or Latin from antiquity through the present, including texts preserved in manuscript tradition as well as on inscriptions, papyri, ostraca and other written artifacts. Over the course of the next five years, we will focus upon converting as much as possible of the Greek and Latin available as scanned printed books into an open, dynamic corpus, continuously augmented and improved by a combination of automated processes and human contributions of many kinds. The focus upon Greek and Latin reflects both the belief that we have an obligation to disseminate European cultural heritage and the observation that recent advances in OCR technology for Greek and Latin make these intertwined languages ready for large-scale work. This focus also builds upon years of work by many projects in Greek and Latin, including the Homer Multitext Project, the Inscriptions of Aphrodisias, and CIL Open Access.

    The Open Greek and Latin Project aims at providing at least one version for all Greek and Latin sources produced during antiquity (through c. 600 CE) and a growing collection from the vast body of post-classical Greek and Latin that still survives. Perhaps 150 million words of Greek and Latin, preserved in manuscripts, on stone, on papyrus or other writing surface, survive from antiquity. Analysis of 10,000 books in Latin, downloaded from, identified more than 200 million words of post-classical Latin. With 70,000 public domain books listed in the Hathi Trust as being in Ancient Greek or Latin, the amount of Greek and Latin already available will almost certainly exceed 1 billion words.
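The step from these figures to "more than 1 billion words" can be made explicit with a rough back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the 70,000 books listed in the Hathi Trust yield on average as many words per book as the 10,000-book Latin sample did. The text above does not state this assumption, and the 200 million figure is itself given only as a lower bound.

```python
# Figures taken from the paragraph above.
books_sampled = 10_000
words_in_sample = 200_000_000      # "more than 200 million words"
books_listed = 70_000              # public domain books listed in the Hathi Trust

# Assumption (not from the source): the sample's average holds for all listed books.
words_per_book = words_in_sample // books_sampled
projected_words = words_per_book * books_listed

print(f"average words per sampled book: {words_per_book:,}")
print(f"projected words across listed books: {projected_words:,}")
```

Under that assumption the projection comes to 1.4 billion words, which is consistent with the estimate that the total "will almost certainly exceed 1 billion words."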

    Where existing corpora of Greek and Latin have generally included one edition of a work, the Open Greek and Latin Corpus is designed to manage multiple versions of, and to represent the complete textual history of, a work: every manuscript, every papyrus fragment, and every printed edition are all versions within the history of a text. In the short run, this involves using OCR technology optimized for Classical Greek and Latin to create an open corpus that is reasonably comprehensive for the c. 150 million words produced through c. 600 CE and that begins to make available the billions of words produced after 600 CE in Greek and Latin that survive.

    The Open Greek and Latin Project assumes the following modules:

    A. The Philological Workflow Module enables a digital representation of a written source, available in a 2D or 3D form, to be converted into machine-actionable text, corrected, and annotated with an increasing range of information: named entities; morphology, syntax, and other linguistic features; alignments between different versions of the same text, whether in the same language or translated across multiple languages; and text re-use, including quotation, paraphrase and citation. Automated methods include Optical Character Recognition, Text Alignment, Syntactic Parsing, etc. In each case, human annotation can augment automated annotations or substitute for them altogether where automated methods are not yet able to produce adequate initial results (e.g., manual transcription of inscriptions and medieval manuscripts).

    B. The Distributed Review Module provides a range of options by which to assess and represent the reliability of the data produced, whether by automated systems or by human contributors, as part of the Philological Workflow. In many cases annotations can be released even when their reliability is not necessarily high (e.g., noisy OCR-generated text). The point is to identify annotations that most require subsequent attention, whether manual correction or action of some other kind (e.g., poor OCR data may reflect the need to create a new scan of a printed book). The Distributed Review Module assumes that multiple annotations may be equally trustworthy (i.e., experts back different interpretations) and can track inter-annotator disagreement among experts. The Distributed Review Module provides default values but also allows for different weights to be placed upon different validations (e.g., include all readings in a particular version of a text, whether these are readings in a particular manuscript or the readings chosen and emendations proposed by a particular editor; include all prosopographical identifications proposed by one particular scholar). The Distributed Review Module should support searching by text characteristics (specific passages, authors), annotator characteristics (expert, novice, native language, etc.), and annotation characteristics (emendations, grammatical or interpretive comments, degree of inter-annotator disagreement, etc.). It should also permit browsing the history of annotation by passage, annotator, magnitude of disagreement, etc.
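One way to picture weighted validations and the tracking of disagreement is the small Python sketch below. It is an illustration under stated assumptions, not the module's design: the function names, the annotator categories, the weights, and the choice of disagreement measure (one minus the share of weight behind the most-supported value) are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def weighted_tally(
    annotations: Iterable[Tuple[str, str]],
    weights: Dict[str, float],
    default_weight: float = 1.0,
) -> Dict[str, float]:
    """Sum the weight behind each proposed value.

    `annotations` holds (annotator_kind, proposed_value) pairs for one
    passage; `weights` maps an annotator kind to the weight a reader has
    chosen to give it.
    """
    totals: Dict[str, float] = defaultdict(float)
    for kind, value in annotations:
        totals[value] += weights.get(kind, default_weight)
    return dict(totals)


def disagreement(totals: Dict[str, float]) -> float:
    """Return 0.0 when all weight backs one value; higher means more spread."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return 1.0 - max(totals.values()) / total


# Hypothetical case: three annotators propose a grammatical case for one word.
proposals = [
    ("expert", "nominative"),
    ("expert", "accusative"),
    ("novice", "accusative"),
]
tally = weighted_tally(proposals, weights={"expert": 2.0, "novice": 1.0})
print(tally, round(disagreement(tally), 2))
```

The sketch deliberately reports the tally and a disagreement score without declaring a winner, since the module assumes that competing annotations may be equally trustworthy. A high score simply flags a passage that may deserve further attention.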

    C. The Philological Repository Module can preserve all published philological data, including the transcriptions and all subsequent annotations (e.g., identifying a transcribed word as being in Latin, a place name, in the accusative case etc.) as well as the provenance of each annotation (e.g., the annotation is born-digital and was published by a particular individual at a given time or the annotation was extracted from a print book by a particular author and published at a given time, with or without human verification, and with an estimated accuracy). The repository is based upon the Canonical Text Services/CITE Architecture for textual sources developed by researchers at the Center for Hellenic Studies within the larger framework developed by the

    D. The e-Portfolio Module aggregates and distributes particular subsets of user contributions for particular audiences. The e-Portfolio Module can identify any published contributions according to type, date, and author (e.g., all syntactic analyses published by a particular person during a particular time interval). The e-Portfolio Module can also make selected materials that are not yet published available to selected audiences (e.g., an editorial board or the admission committee for a degree program). The Perseids Project from Tufts University provides a starting point for this work.
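The kind of selection described for the e-Portfolio Module (published contributions by type, date, and author) can be sketched in Python as a filter over contribution records. The `Contribution` record, its field names, and the `select_published` function are hypothetical and stand in for whatever data model the module eventually uses. The sample data is invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Contribution:
    """Hypothetical record of one user contribution."""
    author: str
    kind: str                    # e.g. "syntax", "translation"
    target: str                  # identifier of the annotated passage
    published: Optional[date]    # None means not yet published


def select_published(
    contributions: Iterable[Contribution],
    *,
    author: Optional[str] = None,
    kind: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[Contribution]:
    """Return published contributions matching every filter that is given."""
    selected = []
    for c in contributions:
        if c.published is None:
            continue
        if author is not None and c.author != author:
            continue
        if kind is not None and c.kind != kind:
            continue
        if start is not None and c.published < start:
            continue
        if end is not None and c.published > end:
            continue
        selected.append(c)
    return selected


# Invented sample data.
records = [
    Contribution("student_a", "syntax", "passage-1", date(2013, 5, 2)),
    Contribution("student_a", "translation", "passage-1", date(2013, 6, 9)),
    Contribution("student_a", "syntax", "passage-2", None),
    Contribution("student_b", "syntax", "passage-1", date(2013, 5, 20)),
]
portfolio = select_published(
    records, author="student_a", kind="syntax",
    start=date(2013, 1, 1), end=date(2013, 12, 31),
)
print(portfolio)
```

Sharing unpublished material with a selected audience, which the module also envisages, would need an access-control layer that this sketch does not attempt to model.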

    2. The Historical Language e-Learning Project.

    Anyone, anywhere, regardless of their linguistic or cultural background, whether they are a student in a formal curriculum or not, should be able to learn as much of a historical language as they need to work directly with original-language primary materials. Work in this context entails not only learning but contributing early and in increasingly sophisticated ways. Students can add new data, or correct existing data, as they learn to type in an unfamiliar language. They can then, in the language of gaming, “level up” to tasks such as linguistic annotation of new materials and the production of aligned, modern language translations, and see their growing proficiency concretely visualized in a way that permits them to compare it to that of others and documents it for use in e-portfolios and other records of their achievement.

    In the short run, building upon existing collections and services, we will support students working with Greek, Latin and Classical Arabic texts in a system readily localized for speakers of multiple modern languages (with Croatian, English, German and French emerging as initial languages of interest). The Historical Language e-Learning Project is based upon the existence of extensible richly annotated corpora. Learners draw from the start on existing richly annotated corpora and on images of sources such as manuscripts and inscriptions. They use morpho-syntactic annotation, dictionary links, and aligned modern language translations, so that they immediately work with primary sources in the original. They learn grammar by comparing their morpho-syntactic analyses with vetted analyses already available, by creating their own aligned translations, and by using annotations and alignments to develop active as well as passive mastery of morphology, syntax, and vocabulary. They demonstrate advanced ability by expanding the corpus of richly annotated materials, proposing new annotations of their own and reviewing annotations proposed by others.

    Ancient Greek, Latin and Classical Arabic

    Large collections such as Gallica, Google Books, and the Internet Archive have already made billions of words in Greek and Latin available to a global audience: a far larger collection than the small handful of advanced researchers can document and a far broader collection in terms of genre and style than the classical corpora on which current programs in Greek and Latin still focus. While the amount of openly licensed Classical Arabic is not yet as extensive, more than enough sources are available and require documentation and analysis. We need to train a new generation of students, who can directly analyze sources in the original languages and make substantive contributions earlier and on a wider range of sources than has previously been feasible.

    Traditional programs of Ancient Greek and Latin are not designed to support students who first develop an interest in these languages during their undergraduate careers — by the time students are able to begin interacting proficiently with the primary sources, they are ready to graduate. Traditional class schedules are rigid and rarely can an institution offer more than one section of an ancient language. As for Classical Arabic, few institutions offer any formal instruction at all — Modern Language Association statistics report only 285 students enrolled in Classical Arabic in the United States in 2009.

    At the same time, Ancient Greek, Latin, and Classical Arabic must also compete for students with fields where students regularly contribute as members of laboratory teams and can often expect to develop their own research projects as undergraduates. The strongest academic programs not only demand that students master complex disciplinary knowledge but also provide students with an opportunity to use that knowledge to make substantive contributions and to develop significant research projects of their own.

    Open Greek and Latin creates an inexhaustible range of substantive activity to which any student of these languages can aspire — whether working on manuscripts of well-known authors (e.g., the Homer Multitext Project), creating the first modern language translations of Greek and Latin sources (e.g., Tufts’ Medieval Latin), or adding critical linguistic annotation (e.g., the Perseus Greek and Latin Treebanks).

    The Historical Language e-Learning Project depends upon the following:

    A. Global Editions of Historical Languages include all features of a traditional edition (including textual notes) but are designed to make primary sources available to the widest possible audience. Global editions are richly encoded source materials that include enough annotation so that readers with a general understanding of grammar and of language are able to work directly with primary sources in a historical language that they have not studied. Core elements of this infrastructure include morphological and syntactic analyses, links to machine readable dictionaries (ideally with data about the sense of a given word in a given context), and one or more aligned modern language translations that themselves have substantial annotation and are designed to facilitate machine translation into many other modern languages.

    B. Preliminary Source Texts are digital texts that do not yet have the mature annotations needed for global editions. Students of a language can aspire to begin adding new annotations within the opening weeks of study, working at first with each other and with their instructors but ultimately working to level up to roles with more trust and responsibility as they demonstrate their increasing skills. It is both a goal and a necessity to engage students as collaborators, because we believe that this is a good thing in itself, because we believe that this increases learning, and because so many historical sources are already available that we cannot depend upon a handful of professionals to analyze and annotate them all.

    C. Machine Actionable Models of Language Competence provide methods by which to assess knowledge of historical languages at every level, from introductory exposure to the language through standardized examinations (e.g., the US-based National Latin Exam, the German Graecum and Latinum) to the various PhD-level examinations (e.g., US PhD programs in Greek and Latin commonly have combined reading lists of between 500,000 and 1,000,000 words of Greek and Latin). Where the city of Arpino holds the Certamen Ciceronianum Arpinas, a multinational competition in which students can each compete in their own national language, we can create an ongoing contest in which students from around the world and from widely disparate backgrounds meet to compare their skills and compete to shed light upon Greco-Roman culture. Machine Actionable Models of Language Competence can be configured for various purposes and pedagogical perspectives. The Competence Models also provide mechanisms for evaluating competence across national languages: examinations on morphology and syntax provide a powerful measure of competence and can be effectively localized in various national languages, whether the student speaks Arabic or Croatian, English or Lithuanian. The Distributed Review Module provides an environment for assessment of language competence as well as for advanced publications.

    D. Localized Learning Materials include grammars, lexica, and translations in a national language. Localized Learning Materials need to be able to be shared across, and customized for, many different languages. Within Europe alone, Greek and Latin, for example, are taught in more than thirty different national languages.[2] We need not only to maintain learning materials in dozens of languages but also to provide learning materials in languages where Greek and Latin are not part of formal academic curricula. To accomplish this we must represent as much information about the language as possible in a machine actionable form that can be efficiently rendered in many languages. We also need to provide an architecture that supports customization for particular languages, especially the creation of aligned translations that contain from the start links between the source text and the modern language translation.

    E. Dynamic Syllabi can be analyzed to track the linguistic phenomena that students have encountered (e.g., vocabulary, grammar) and the content that they have covered. As students pursue different dynamic syllabi at different times, they can track both the background they have accumulated and the background they still need in order to pursue subsequent courses. Instructors in structured classes can generate personalized background readings and examinations that reflect both what students brought with them to, and what they covered in, the class. The e-Portfolio Module uses Dynamic Syllabi to accomplish these goals.

    F. Personalized E-learning Tools analyze individual behaviors of particular learners and provide personalized analyses and suggestions, reflecting the strengths and learning styles of particular students. Personalized e-learning tools allow learners not only to track their progress towards target proficiencies but also to personalize the target proficiencies themselves: students of Homer will have different targets than those of the New Testament or of Plato, while students aspiring to fluent comprehension will have different needs than intellectual historians who wish to explore word usage or linguists interested in syntactic phenomena. The goal is to provide as much feedback as possible, as quickly as possible, and as closely adapted to the needs and interests of each learner as possible.
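
    To make the idea of tracking what a learner has encountered more concrete, the following sketch shows one possible way that Dynamic Syllabi might feed personalized preparation for a new text. It is an illustration under stated assumptions, not a description of an existing tool: the function names, the representation of a syllabus as lists of lemmas, and the toy Latin data are all hypothetical.

```python
from collections import Counter


def lemmas_seen(syllabi):
    """Union of the lemmas a learner has met across all completed syllabi."""
    seen = set()
    for readings in syllabi.values():
        for lemmas in readings:
            seen.update(lemmas)
    return seen


def coverage(seen, target_lemmas):
    """Share of running words in a target passage whose lemma is already known."""
    if not target_lemmas:
        return 0.0
    known = sum(1 for lemma in target_lemmas if lemma in seen)
    return known / len(target_lemmas)


def background_list(seen, target_lemmas, limit=10):
    """Most frequent unseen lemmas in the target, as a preparation list."""
    unseen = Counter(lemma for lemma in target_lemmas if lemma not in seen)
    return [lemma for lemma, _ in unseen.most_common(limit)]


# Toy data: each syllabus maps to readings, each reading is a list of lemmas.
syllabi = {
    "caesar-intro": [["gallia", "sum", "omnis", "divido", "in", "pars", "tres",
                      "qui", "unus", "incolo", "belgae"]],
    "vocabulary-drill": [["vir", "primus", "ab"]],
}
# Lemmatized opening of a new target text the learner wants to read next.
target = ["arma", "vir", "que", "cano", "troia", "qui", "primus", "ab", "ora"]

seen = lemmas_seen(syllabi)
print(f"known share of target: {coverage(seen, target):.2f}")
print("to prepare:", background_list(seen, target))
```

    The same bookkeeping could in principle be applied to grammatical constructions instead of lemmas, which is how an instructor might generate background readings that reflect what each student brought to a class.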

    3. The Scaife Digital Library (SDL)

    The Scaife Digital Library (SDL) commemorates Ross Scaife (March 31, 1960 – March 15, 2008), who did pioneering work for the study of Greco-Roman culture in a digital age, who was committed to collaborative scholarship, and who was a champion of open data. The SDL is designed as a service, as an experiment, and as a space for research. An increasing number of Ancient Greek, Latin, Classical Arabic and other sources are available under appropriate open licenses. The SDL builds upon the services and collections listed above. The SDL provides a mechanism by which to compare these services and collections with those available elsewhere, allowing research at the Humboldt Chair to explore new methods while making its own work more visible.

    As a service, the SDL will aggregate as much content in these languages as possible, converting, where necessary and feasible, into interoperable formats. The goal of this service is to provide a single space to represent all published Ancient Greek, Latin, and Classical Arabic. In this context, publication entails release under an open license. Proprietary collections are neither public nor published.

    As an experiment, the SDL will track how many sources in how many historical languages and of how many types it can identify and integrate and then track this data over time. This experiment attempts to measure both our ability to find materials and the change in what is available. It is our hope that the growth in available resources in languages beyond Greek, Latin, and Arabic will greatly outstrip the ability of the SDL to aggregate and analyze them.

    As a research space, the SDL collects not only metadata but also source materials into a single environment. Within this space researchers can explore customized collections (e.g., all available versions and translations of the Odes of Horace or the aligned corpus of Classical Arabic translations and Greek sources) or simply analyze all available Greek. While the SDL may collect as widely as possible from open collections representing cultures from around the world, the SDL intends to provide the most comprehensive possible coverage and services for students of Ancient Greek, Latin, and Classical Arabic.

    [1] Augustus Boeck, “Oratio nataliciis Friderici Guilelmi III.” (1822): “Itaque ubi, quae et qualis philologia meo iudicio sit, quaeritis, simplicissima ratione respondeo, si non latiore, quae in ipso vocabulo inest, potestate accipitur, sed ut solet ad antiquas litteras refertur, universae antiquitatis cognitionem historicam et philosophicam.”

