Nick White, a visiting research scholar at Perseus, has released a new version of Ancient Greek OCR, free software that accurately converts scans of printed Ancient Greek into Unicode text and PDF files, which can be easily searched, copied, archived, and transformed. It uses the excellent Tesseract OCR engine, tailored for Ancient Greek typography, syntax and vocabulary. Please visit the project website for more information.
Many different linguistic services and tools depend on lexical information of the kind commonly found in Latin and Greek dictionaries. Most of these applications rely on their own implementations of dictionaries, stem databases, etc., but there is no centralized open-access resource on which these services can draw for supporting data. The Perseus Digital Library is releasing its lexical data as an open linked data set, starting with Latin and to be followed by Greek, in the hope that it may eventually become such a resource. Work on producing this data set has been a collaborative effort, and would not have been possible without the guidance of Neel Smith of Holy Cross and Helma Dik of the University of Chicago.
The core of the Perseus Lexical Inventory is a CITE collection of Lexical Entity URIs. Each Lexical Entity identifier has associated properties, including a normalized form of the lexical entity (or lemma) and a short definition. The accompanying linked data set includes links between the Lexical Entity URIs, Morpheus lemmas, and entries in the Lewis and Short lexicons on Perseus, Alpheios and Logeion. A VoID file describing the data set is available at http://data.perseus.org/ds/lexical/void and a SPARQL endpoint for querying the data set is at http://services.perseus.tufts.edu/fuseki/sparql.html. There is also a simple demonstration query form that looks up entries based on the Latin form at http://perseids.org/tools/lexical/query.html. The Tufts Morphology Service (currently available at http://services.perseids.org/bsp/morphologyservice ) also supplies the corresponding Lexical Entity URIs for lemmas returned by Morpheus.
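For readers who want to query the data set programmatically rather than through the web form, the following is a minimal sketch in Python of how a request could be built under the standard SPARQL 1.1 Protocol (an HTTP GET with a `query` parameter). The address given above is a query form page, so the machine-readable query URL used here (`ASSUMED_ENDPOINT`) is a hypothetical placeholder that should be checked against the service, and the function names are illustrative only. The query is deliberately generic and assumes nothing about the vocabulary used in the data set.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# ASSUMPTION: hypothetical query URL. The announcement only gives the HTML
# form page (.../fuseki/sparql.html); confirm the real endpoint path first.
ASSUMED_ENDPOINT = "http://services.perseus.tufts.edu/fuseki/ds/query"


def generic_triples_query(limit: int = 10) -> str:
    """Return a vocabulary-neutral query that lists arbitrary triples."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"SELECT ?s ?p ?o WHERE {{ ?s ?p ?o }} LIMIT {limit}"


def build_sparql_url(endpoint: str, query: str) -> str:
    """Build a SPARQL 1.1 Protocol GET URL carrying the query string."""
    return endpoint + "?" + urlencode({"query": query})


def run_query(endpoint: str, query: str) -> dict:
    """Send the query and parse a SPARQL JSON results document."""
    request = Request(
        build_sparql_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the request URL; call run_query() once the endpoint
    # address has been confirmed.
    print(build_sparql_url(ASSUMED_ENDPOINT, generic_triples_query(5)))
```

Inspecting a handful of arbitrary triples in this way, together with the VoID description, is one way to learn which properties the data set actually uses before writing more specific queries.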
Subsequent updates to the data set will include links to ontologies and other collections of uniquely identifiable entities, including part of speech, lexical tokens or forms, stems, prefixes and suffixes, morphological analyses, metrical data, orthographical variants, and named entities. The lexical entities and tokens will also be linked to their occurrences in dictionaries and other lexica, texts (i.e. of the Perseus corpus, among others), treebanks, etc. Finally we expect to link to other established and emerging data sets, including the Pleiades Gazetteer and the SNAP dataset of ancient prosopography, among others.
Our ultimate goal is for the lexical data sets to be completely open with various channels, including both user interfaces and service-based APIs, through which people and systems can contribute new data and corrections.
In keeping with the approach we have been taking with the release of our data (see the Perseus Catalog’s Roadmap towards Linked Data standards compliance), we are releasing the data knowing that we still have much work to do, and will make progress towards the larger vision in incremental steps. Our next steps will include the release of a companion Greek Lexical Inventory, followed by the addition of the stem and lexical token data sets and the development of APIs and interfaces for using and contributing to the data.
Pelagios, Pleiades, and Perseids workshop took place at a week-long hackathon
On Monday, March 3, students in Marie-Claire Beaulieu’s Medieval Latin class and Maxim Romanov’s Geography of the Classical Islamic World class held a workshop together with the Pelagios team. Leif Isaksen (University of Southampton), Elton Barker (Open University), and Rainer Simon (Austrian Institute of Technology) directed the students in using the Pelagios interface to annotate place names in Latin, English, and Arabic documents. We were also fortunate to have Tom Elliott (New York University Institute for Studies of the Ancient World), the co-managing editor of the Pleiades Gazetteer used by Pelagios, participating in the workshop.
Read about Perseids, the Bodin Project, Corpora and Digital Humanities at Tufts in the article, Tufts Reimagines Humanities Using Tools of the Digital Age.
The Humboldt Chair of Digital Humanities at the University of Leipzig is pleased to announce a new effort within the Open Philology Project: the Leipzig Open Fragmentary Texts Series (LOFTS). In the first phase of LOFTS we invite public discussion as we finalize the goals, technological methods and editorial practices.
The Leipzig Open Fragmentary Texts Series is a new effort to establish open editions of ancient works that survive only through quotations and text re-uses in later texts (i.e., those pieces of information that humanists call “fragments”).
As a first step in this process, the Humboldt Chair announces the Digital Fragmenta Historicorum Graecorum (DFHG) Project, whose goal is to produce a digital edition of the five volumes of Karl Müller’s Fragmenta Historicorum Graecorum (FHG) (1841-1870), the first large collection of fragments of the Greek historians ever produced.
For further information, please visit: http://www.dh.uni-leipzig.de/wo/open-philology-project/the-leipzig-open-fragmentary-texts-series-lofts/
We are pleased to announce that records from the Perseus Catalog have been loaded into VIAF as “the first set of personal names from a scholarly resource.”
More information on this milestone may be found here: http://hangingtogether.org/?p=3455
Our thanks to OCLC Research!
Publishing Text for a Digital Age
Update: Submissions now being accepted!
March 27-30, 2014
As a follow-on to “Working with Text in a Digital Age,” an NEH-funded Institute for Advanced Technologies in the Digital Humanities, and in collaboration with the Open Philology Project at the University of Leipzig, Tufts University announces a 2-day workshop on publishing textual data that is available under an open license, that is structured for machine analysis as well as human inspection, and that is in a format that can be preserved over time. The purpose of this workshop is to establish specific guidelines for digital publications that publish and/or annotate textual sources from the human record. Registration for the workshop will be free, but space will be limited. Some support for travel and expenses will be available. We particularly encourage contributions from students and early-career researchers.
More information at the link above.
The position will start September 1, 2013.
Interested applicants are asked to follow the above link for further information.
Follow up questions may be emailed to firstname.lastname@example.org.