Design Sprint for Perseus 5.0/Open Greek and Latin

Tuesday, August 15, 2017: Leipzig has published the official RFP: https://goo.gl/EoxFPT. If you compare the English below with the final version in German, you will see that they did quite a bit of work to streamline our draft. The key point is that the deadline for submissions is August 24, 2017. The RFP is in German, but our purchasing office worked hard to facilitate the process of applying. For questions, the contact at Leipzig is Herr Christoph Sedlaczek.

Scheduling. We are done with the English version of the RFP and will begin producing a German version (which is apparently a requirement). The German translation will be done quickly. Nothing, of course, is official, final or binding until the Leipzig administration publishes the RFP.

A DRAFT German version is now available at https://goo.gl/1Zoxeg.

[DRAFT] Request for Proposals for work on the Scaife Digital Library Viewer.

The following document presents a Request for Proposals (RFP) for the Scaife Digital Library Viewer. The initial version of the Scaife Digital Library Viewer must support searching and reading of the Open Greek and Latin collection within a new version of Perseus (Perseus 5.0). The RFP solicits proposals for a three-month sprint (October – December 2017), subsequent testing (January – March 2018) and a formal rollout tentatively scheduled for March 15, 2018, ten years after Ross Scaife passed away at an all too early age.

We announced in June that the Center for Hellenic Studies (CHS) had signed a contract with Intrepid.io to conduct a design sprint that would support Perseus 5.0 and the Open Greek and Latin collection that it will include. Our goal was to provide a sample model for a new interface that would support searching and reading of Greek, Latin, and other historical languages. The report from that sprint was handed over to CHS on Friday, July 21, 2017, and on July 22 we, in turn, made these materials available, including both the summary presentation and associated materials. The goal was to solicit comment and to provide potential applicants to the planned RFP with access to this work as soon as possible.

The sprint took just over two weeks and was an intensive effort. An evolving Google Doc with commentary on the Intrepid Wrap-up slides for the Center for Hellenic Studies has been visible since July 24. Readers of the report will see that questions remain to be answered. How will we represent Perseus, Open Greek and Latin, Open Philology, and other efforts? One thing that we have added and that will not change is the name of the system that this planned implementation phase will begin to build: whether it is Perseus, Open Philology or some other name, it will be powered by the Scaife Digital Library Viewer, a name that commemorates Ross Scaife, pioneer of Digital Classics and a friend whom many of us will always miss.

The Intrepid report also includes elements that we will wish to develop further — students of Greco-Roman culture may not find “relevance” a helpful way to sort search reports. The Intrepid Sprint greatly advanced our own thinking and provided us with a new starting point. Anyone may build upon the work presented here — but they can also suggest alternate approaches.

In developing our plans we work closely with the Alpheios Project. Alpheios developed the best reading environment for Greek with which we at the Humboldt Chair of Digital Humanities are familiar, and did so almost a decade ago. Alpheios is now preparing to update its tools, and Perseus 5.0 will work as closely as possible with Alpheios to minimize duplication of effort. Those submitting a proposal for the Leipzig RFP should familiarize themselves with Alpheios and especially with the reading environment that Alpheios has provided for the first book of the Odyssey. This environment only runs under Firefox, and it depends upon Firefox features that are supposed to disappear. The upcoming rewrite will address this problem, but the environment still runs on my MacBook as of July 22, 2017. Source code for this reading environment is available at https://sourceforge.net/projects/alpheios/.

In general, the goal is to create a new version of Perseus that integrates the additional features long offered by Alpheios and that provides users with an opportunity to establish basic profiles. Contractors can assume access to a CTS-compliant API. An initial browsing environment based upon the http://capitains.org/ API is visible at http://cts.dh.uni-leipzig.de/ and http://cts.perseids.org/, but contractors are free to develop their own frontends on top of the CTS API.
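
To illustrate what building on a CTS API involves, here is a minimal Python sketch that splits a CTS URN into its components and builds a GetPassage request URL. It is an illustration under stated assumptions, not part of the RFP: the names CtsUrn, parse_cts_urn and get_passage_url are hypothetical, the endpoint is a placeholder, and the URN is only an illustrative identifier.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass(frozen=True)
class CtsUrn:
    """Components of a CTS URN: urn:cts:NAMESPACE:TEXTGROUP.WORK.VERSION:PASSAGE"""
    namespace: str
    textgroup: str
    work: Optional[str] = None
    version: Optional[str] = None
    passage: Optional[str] = None


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into its components (hypothetical helper)."""
    parts = urn.split(":", 4)
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    # The work component is dot-separated: textgroup[.work[.version]]
    work_parts = parts[3].split(".")
    return CtsUrn(
        namespace=parts[2],
        textgroup=work_parts[0],
        work=work_parts[1] if len(work_parts) > 1 else None,
        version=work_parts[2] if len(work_parts) > 2 else None,
        passage=parts[4] if len(parts) > 4 and parts[4] else None,
    )


def get_passage_url(endpoint: str, urn: str) -> str:
    """Build a CTS GetPassage request URL for a given endpoint.

    The endpoint is supplied by the caller; no particular server is assumed.
    """
    return f"{endpoint}?{urlencode({'request': 'GetPassage', 'urn': urn})}"


if __name__ == "__main__":
    # Illustrative URN for a passage of the Odyssey; the version label is an assumption.
    example = "urn:cts:greekLit:tlg0012.tlg002.perseus-grc2:1.1-1.10"
    print(parse_cts_urn(example))
    # Placeholder endpoint, not a real service.
    print(get_passage_url("https://example.org/cts/api", example))
```

Because every passage is addressed by a URN of this shape, a frontend can treat any CTS server as interchangeable as long as it answers the same requests.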

The deliverables below distinguish between results that are required (“must”) and those that are desirable if possible (“should”). If proposals can guarantee more than the required features, they should indicate so. If proposals do not feel that they can guarantee all the requirements, they should indicate which they can and cannot guarantee.

  1. The contractor must provide a new reading environment that captures the basic functionality of the Perseus 4.0 reading environment but that is more customizable and that can be localized efficiently into multiple modern languages, with Arabic, Persian, German and English as the initial target languages. The overall Open Greek and Latin team is, of course, responsible for providing the non-English content. The Scaife DL Viewer should make it possible for us to localize into multiple languages as efficiently as possible.
  2. The reading environment should be designed to support any CTS-compliant collection and should be easily configured with a look and feel for different collections.
  3. The reading environment should contain a lightweight treebank viewer — we don’t need to support editing of treebanks in the reading environment. The functionality that the Alpheios Project provided for the first book of the Odyssey would be more than adequate. Treebanks are available under the label “diagram” when you double-click on a Greek word.
  4. The reading environment should support dynamic word/phrase level alignments between source text and translation(s). Here again, the functionality that the Alpheios Project provided for the first book of the Odyssey would be adequate. More recent work implementing this functionality is visible in Tariq Yousef’s work at http://divan-hafez.com/ and http://ugarit.ialigner.com/.
  5. The system must be able to search for both specific inflected forms and for all forms of a particular word (as in Perseus 4.0) in CTS-compliant EpiDoc TEI XML. The search will build upon the linguistically analyzed texts available in https://github.com/gcelano/CTSAncientGreekXML. This will enable searching by dictionary entry, by part of speech, and by inflected form. For Greek, the base collection is visible at the First Thousand Years of Greek website (which now has begun to accumulate a substantial amount of later Greek). CTS-compliant EpiDoc Latin texts can be found at https://github.com/OpenGreekAndLatin/csel-dev/tree/master/data and https://github.com/PerseusDL/canonical-latinLit/tree/master/data.
  6. The system should be able to search Greek and Latin that is available only as uncorrected OCR-generated text in hOCR format. Here the results may follow the image-front strategy familiar to academics from sources such as JSTOR. If it is not feasible to integrate this search within the three months of core work, then we need a plan for subsequent integration that Leipzig and OGL members can implement later.
  7. The new system must be scalable. While these collections may not be large by modern standards, they are substantial. Open Greek and Latin currently has c. 67 million words of Greek and Latin at various stages of post-processing and c. 90 million words of additional translations from Greek and Latin into English, French, German and Italian, while the Lace Greek OCR Project has OCR-generated text for 1100 volumes. Use of Elasticsearch appears desirable, but proposals may suggest other directions.
  8. The system must integrate translations and translation alignments into the searching system, so that users can search either in the original or in modern language translations where we provide this data. This goes back to work by David Bamman in the NEH-funded Dynamic Lexicon Project (when he was a researcher at Perseus at Tufts). For more recent examples of this, see http://divan-hafez.com/ and Ugarit. Note that one reason to adopt CTS URNs is to simplify the task of displaying translations of source texts: the system is only responsible for displaying translations insofar as they are available via the CTS API.
  9. The system must provide initial support for a user profile. One benefit of the profile is that users will be able to define their own reading lists, and the Scaife DL Viewer will then be able to provide personalized reading support, e.g., word X already showed up in your reading at places A, B, and C, while word Y, which is new to you, will appear 12 times in the rest of your planned readings (i.e., you should think about learning that word). By adopting the CTS data model, we can make very precise reading lists, defining precise selections from particular editions of particular works. We also want to be able to support an initial set of user contributions that are (1) easy to implement technically and (2) easy for users to understand and perform. Thus we would support fixing residual data entry errors, creating alignments between source texts and translations, and improving automated part of speech tagging and lemmatization, but users would go to external resources to perform more complex tasks such as syntactic markup (treebanking).
  10. Bids should include a specific component for design work to plan next steps after the current phase of work. We were very pleased with the Design Sprint that took place in July 2017 and would like to include a follow-up Design Sprint in early 2018 that will consider (1) next steps for Greek and Latin and (2) generalizing our work to other historical languages. This Design Sprint might well go to a separate contractor (thus providing us also with a separate point of view on the work done so far).
  11. Work must be built upon the Canonical Text Services Protocol. Bids should be prepared to build upon https://github.com/Capitains, but should also be able to build upon other CTS servers (e.g., https://github.com/ThomasK81/LightWeightCTSServer and cts.informatik.uni-leipzig.de).
  12. All source code must be available on GitHub under an appropriate open license so that third parties can freely reuse and build upon it.
  13. Source code must be designed and documented to facilitate actual (not just theoretically possible) reuse.
  14. The contractor will have the flexibility to get the job done but will be expected to work as closely as possible with, and to draw wherever possible upon the on-going work done by, the collaborators who are contributing to Open Greek and Latin. The contractor must have the right to decide how much collaboration makes sense.
  15. We would welcome bids that bring to bear expertise in the EPUB format and that could help develop a model for representing CTS-compliant Greek and Latin sources in EPUB as a mechanism to make these materials available on smartphones. We can already convert our TEI XML into EPUB. The goal here is to exploit the easiest ways to optimize the experience. We can, for example, convert one or more of our Greek and Latin lexica into the EPUB Dictionary format and use our morphological analyses to generate links from particular forms in a text to the right dictionary entry or entries. Can we represent syntactically analyzed sentences with SVG? Can we include dynamic translation alignments?
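
The personalized reading support described in deliverable 9 can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: it presumes that each passage on a reading list is identified by a CTS URN and has already been lemmatized, and the function name reading_support, the URNs and the lemma lists are all hypothetical sample data rather than output from any existing system.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# A passage is a (CTS URN, list of lemmas) pair; lemmatization is assumed
# to have been done upstream.
Passage = Tuple[str, List[str]]


def reading_support(
    already_read: List[Passage], planned: List[Passage]
) -> Tuple[Dict[str, List[str]], Dict[str, int]]:
    """Compare planned readings against what a user has already read.

    Returns two mappings covering the lemmas in the planned readings:
    - known: lemma -> passages (URNs) where the user has already met it
    - new:   lemma -> number of occurrences in the planned readings
    """
    seen_at: Dict[str, List[str]] = defaultdict(list)
    for urn, lemmas in already_read:
        for lemma in set(lemmas):  # record each passage once per lemma
            seen_at[lemma].append(urn)

    upcoming = Counter(lemma for _, lemmas in planned for lemma in lemmas)
    known = {lemma: seen_at[lemma] for lemma in upcoming if lemma in seen_at}
    new = {lemma: n for lemma, n in upcoming.items() if lemma not in seen_at}
    return known, new


if __name__ == "__main__":
    # Hypothetical sample data; URNs and lemma lists are illustrative only.
    read = [
        ("urn:cts:latinLit:phi0690.phi003.perseus-lat2:1.1", ["arma", "vir", "cano"]),
        ("urn:cts:latinLit:phi0690.phi003.perseus-lat2:1.2", ["vir", "fatum"]),
    ]
    plan = [
        ("urn:cts:latinLit:phi0690.phi003.perseus-lat2:1.3", ["vir", "litus", "litus"]),
        ("urn:cts:latinLit:phi0690.phi003.perseus-lat2:1.4", ["litus", "terra"]),
    ]
    known, new = reading_support(read, plan)
    print("already seen:", known)
    print("new, with counts in planned readings:", new)
```

Keying everything on CTS URNs is what lets the "places A, B, and C" in such a report point to exact selections from particular editions.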

We will draw upon the following criteria in selecting a proposal.

  1. Price. The cost of the contract is important but will be by no means the most important factor.
  2. A credible plan that reflects the available portfolio of work by the contractor.
  3. A demonstrated understanding of the work and its goals. The Intrepid plan with commentary listed above provides a blueprint, but we welcome proposals that suggest alternatives or add additional critiques. Even if such alternatives are not adopted, they can illustrate an understanding of the work that we propose.
  4. Demonstrated experience with the issues involved in searching and analyzing Greek and Latin would be highly desirable. Such experience is not by itself sufficient and not absolutely essential (Perseus and Open Greek and Latin collaborators can provide leadership here), but documented expertise in searching and analyzing Greek and Latin in a digital medium would be a major advantage to the work proposed here.
  5. A credible commitment to work with the CTS API and to build upon existing code.
  6. The degree to which the proposed work indicates that academic and support staff at Leipzig, Tufts and elsewhere will be able to maintain and enhance the work done under this contract.
  7. The degree to which the proposed work helps us develop a detailed plan for future work that we can use as the basis for proposals to raise additional support.
  8. The degree to which the proposed work appears suited to languages other than Greek and Latin. While the brief period of the proposed work means that we will focus upon Greek and Latin, we want to see other collections served by the Scaife DL Viewer. These include the emerging Open Islamicate Texts Initiative, the CTS-compliant texts from the Croatian Latin Authors project, and Perseus collections besides Greek and Roman Materials.
  9. Ability to communicate, in both written and spoken form, in English. Proposals will be reviewed by international experts and must be in English. Likewise, academic collaborators are international and the working language of the contracted work must be English.