Progress towards Perseus 6

Gregory Crane
November 15, 2024

A great deal of work towards the new Perseus (Perseus 6) has been going on behind the scenes and that work is now beginning to become visible. James Tauber of Signum University has led the development, with collaboration from Charles Pletcher and Clifford Wulfman. In most cases, the “we” in this narrative means “James Tauber did the work with the rest of us making comments from the peanut gallery.”

 

Figure: the ATLAS backend architecture

We are not yet in the final stage of implementation, but we are finishing up a major, and probably the biggest, task: creating a sustainable backend to manage a growing range of machine-actionable textual data from a widening set of open data projects in multiple countries. As we refine the backend data, we can begin to move to the final stage of work: implementing within the Scaife Viewer the frontend services that were developed in Beyond Translation and that support treebanks, grammatical annotations, aligned translations, metrical analyses, automatic mapping, and links from editions to manuscripts.

For now I point out work done to make the existing Scaife/Perseus Greek and Latin texts accessible in the ATLAS architecture.

Every Greek and Latin text available in Scaife is now available in the new ATLAS architecture. If you start with the Scaife ATLAS homepage (https://atlas.perseus.tufts.edu/) and drill down into the CTS library (https://atlas.perseus.tufts.edu/library/), you can find your way into a list of all editions currently available in Scaife (a dozen versions of Thucydides in various languages are available at https://atlas.perseus.tufts.edu/library/urn:cts:greekLit:tlg0003.tlg001/), and then down to each of the lowermost citable nodes (e.g., https://atlas.perseus.tufts.edu/library/passage/urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:1.1.1/). At this point, you can call for either plain text or XML by adjusting the final argument: https://atlas.perseus.tufts.edu/library/passage/urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:1.1.1/text/ vs. https://atlas.perseus.tufts.edu/library/passage/urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:1.1.1/xml/.
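For those who want to script against these pages, here is a minimal Python sketch that assembles the URLs from a CTS URN. The URL layout is inferred only from the example links in this post, and the function names (work_url, passage_url) are made up for illustration; they are not part of any ATLAS client library.

```python
"""Sketch: build ATLAS library URLs from CTS URNs.

The URL layout is an assumption inferred from the example links in the post.
"""
from typing import Optional

ATLAS_LIBRARY = "https://atlas.perseus.tufts.edu/library/"

# The two output formats mentioned in the post.
FORMATS = ("text", "xml")


def work_url(work_urn: str) -> str:
    """URL listing the editions and translations of a work."""
    return f"{ATLAS_LIBRARY}{work_urn}/"


def passage_url(passage_urn: str, fmt: Optional[str] = None) -> str:
    """URL for a citable passage, optionally as plain text or XML."""
    if fmt is not None and fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    url = f"{ATLAS_LIBRARY}passage/{passage_urn}/"
    return url if fmt is None else f"{url}{fmt}/"


if __name__ == "__main__":
    urn = "urn:cts:greekLit:tlg0003.tlg001.perseus-grc2:1.1.1"
    print(passage_url(urn))
    print(passage_url(urn, "text"))
    print(passage_url(urn, "xml"))
```

The sketch only builds strings and makes no network requests, so fetching the pages is left to whatever HTTP client you prefer.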

If you explore the Github repository, you will find not only the full text library stored in the simpler ATLAS form (identifier+textchunk) but also complete morphosyntactic analyses for all 40 million words of Greek (using GreCy) and 16 million words of Latin (using LatinCy). Navigating to particular authors is a bit cumbersome because we had too much data and had to split it across multiple Github repositories. The information you need to find the right repository for any given Scaife Greek or Latin author is at https://github.com/scaife-viewer/tagging-pipeline/tree/main/data.

We now have three layers of morpho-syntactic data.

  1. Actively curated Treebanks (e.g., Francesco Mambrini’s Daphne).
  2. The growing repository of curated and automatically generated treebanks from Alek Keersmaekers’s Glaux Trees (currently 20 million words of Greek, including some works that are not yet in Perseus).
  3. The comprehensive morpho-syntactic data generated for any texts added to Perseus.

The opening sections of Thucydides (source here):

The identifier+textchunk format is all that we need to add new texts into Perseus. While the greater structure of the TEI XML format that we have used in Scaife has advantages, we can now add much more content much more quickly and work with larger corpora than has been practical in the more demanding XML framework. The goal is not to abandon XML but to provide the XML as time allows while also being able to work with larger, less structured collections.
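To give a sense of how little machinery such a format needs, here is a small Python sketch that reads identifier+textchunk lines into a dictionary. The exact file layout is an assumption: the sketch supposes one chunk per line, with the identifier separated from the text by whitespace, and the sample strings are placeholders rather than real Perseus data. Check the actual files in the repository before relying on it.

```python
"""Sketch: read identifier+textchunk lines into a dictionary.

Assumed layout (not confirmed by the post): one chunk per line,
identifier first, then whitespace, then the text of the chunk.
"""
from typing import Dict, Iterable


def parse_chunks(lines: Iterable[str]) -> Dict[str, str]:
    """Map each identifier (e.g. '1.1.1') to its text chunk."""
    chunks: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)  # split on the first run of whitespace
        identifier = parts[0]
        text = parts[1] if len(parts) == 2 else ""
        chunks[identifier] = text
    return chunks


if __name__ == "__main__":
    # Placeholder content, not actual text from the library.
    sample = [
        "1.1.1 first chunk of text",
        "1.1.2 second chunk of text",
    ]
    for identifier, text in parse_chunks(sample).items():
        print(identifier, "->", text)
```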

The morpho-syntactic analysis (link here):

A draft of a fuller write-up for this work is available in Zenodo: The Sixth Generation of the Perseus Digital Library and a Workflow for Open Philology.

Posted in Uncategorized

August 5, 2024: update on Perseus 4

Gregory Crane

As far as I can tell, things have settled down on Perseus 4. As I understand it, there were two issues.

First, Tufts Technology Services moved Perseus 4 to a new hardware platform. In the older platform, various checks had been added so that we would automatically reboot Perseus servers if they hung. This automatic rebooting feature was added a long time ago — probably at least ten years in the past. Replacing this automatic feature seems to have slipped our attention (and I should have checked that). It looks like we have made progress on this. It is possible we will need to tweak this reboot feature but for now I am finding that the site works (at least for me). We monitor email to the Perseus webmaster (and try to respond in a timely fashion).

Second (as noted before), we were hit with scraping and high-volume use that differed from previous patterns over the many years that Perseus 4 has been running. Measures have been put in place to mitigate these behaviors. We will monitor this situation and respond as needed.

I would like to thank Tufts Technology Services for their on-going help.

Posted in Uncategorized

Towards a new Perseus: Update

Gregory Crane
Editor-in-Chief, Perseus Digital Library
Update as of January 19, 2024.

We have now completed work on Beyond Translation (a draft white paper on this work that has been submitted to the NEH Office of Digital Humanities is available here) and are focused on using the Beyond Translation work as part of an update to the Scaife Viewer. The resulting system will finally allow us to replace Perseus 4. We are calling the new version Perseus 6 (rather than 5) to reflect the amount of work embedded in the Scaife Viewer and now Beyond Translation (which we view collectively as Perseus 5). A grant from the NEH Collections and References program in the NEH Division of Preservation and Access for Perseus on the Web — preparing for the next thirty years provides the primary support for this phase of work, with additional support from the Tufts Data Intensive Studies Center, the School of Arts and Sciences, Tufts Technology Services and Google.

Our main collaborator in this phase of development is James Tauber, who is now working with Signum University. We are also waiting on Tufts administrative paperwork to finalize a contract with another group to help us reorganize the Perseus home (and associated sub pages) and replace this WordPress-based blog with the Pubpub Publishing Platform (which we already began using in documenting Beyond Translation).

A draft outline of the work that we are doing is now available here.

For now, the focus of work is to fold the services visible on Beyond Translation into the Scaife Viewer. The first results from that work will probably be documentation, with changes to the Scaife Viewer following.

Posted in Uncategorized

Draft NEH White Paper for Beyond Translation

GREGORY CRANE

The Office of Digital Humanities at the National Endowment for the Humanities asks its projects to submit white papers after completing a project. We have posted a draft of the White Paper for Beyond Translation here and invite suggestions as we finalize our work.

Next up: a formal description of plans to replace Perseus 4. We secured the last chunk of funding needed shortly before the winter break. Once we finalize this report on what we have done, we will provide more details on what we are doing.

Posted in Uncategorized

Towards the Next Perseus: mid-fall update

A lot of work continues to go on behind the scenes as we move to replace the current Perseus 4.0 (the Hopper). Our goal is to finish the transition by fall 2024, with new functionality folded into the Scaife Viewer until this can fully take the place of the now venerable system. Those who are more technically inclined can follow much of what is being done by tracking issues on the Beyond Translation Github site.

New functionality includes support for new kinds of annotations such as treebanks, translations aligned at the word and phrase level, automatic mapping, visualization of meter etc. You can see a summary of these new features and enhancements here.

New functionality for Scaife also includes the addition of services to which users have long been accustomed in Perseus, with support for commentaries being at the top of the list. We are also finishing a long-term backlog of texts for which the structural markup requires some manual intervention (as well as programmatic reformatting).

At the moment we are preparing to sign a contract to replace the Perseus home page and associated data. Our plan is to replace the WordPress platform (which I am currently using) with a different publishing platform, which is much better suited to academic publication. It supports footnotes, automatically generates citation information and allows us to include interactive visualizations. We will have more to say as soon as the contract is signed.

Our plan is to have more substantive information about what we are doing by the end of December (i.e., a few weeks after a busy semester starts).

Posted in Announcement, General, Release, Scaife Viewer, Technology

Yet more Lucian: translations by Emily James Smith

In 1892, at the age of 27, while serving as teacher of Greek at the Packer Collegiate Institute, Emily James Smith published translations of selected works of Lucian. She later served as dean (1894-1900) and then trustee (1900-1905) of Barnard College. She provides readable translations for a number of Lucian’s works. We added the section numbers, and attentive readers will note missing sections. Smith chose to leave out those passages that could not be translated under the standards of the time because of their sexual nature.

Her translations have the identifier perseus-eng5 (e.g., tlg0062.tlg029.perseus-eng5 for “the Dream”). She includes both works that have been ascribed to Lucian (with the identifier tlg0062, although some are clearly not by him) and two that are labelled as “Pseudo-Lucian” (tlg0061).

tlg0062.tlg029 The Dream
tlg0062.tlg018 Zeus the Tragedian
tlg0062.tlg024 The Sale of Lives
tlg0062.tlg019 The Cock
tlg0062.tlg016 The Ferry
tlg0062.tlg012 A True History
tlg0062.tlg044 Toxaris; Or, Friendship
tlg0061.tlg001 Loukios; Or, the Ass
tlg0061.tlg004 The Halcyon
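The identifiers above follow a simple dotted pattern (text group, work, and optionally a version such as perseus-eng5). As an illustration, here is a short Python sketch that splits such an identifier and prefixes it into a URN of the urn:cts:greekLit:... form used elsewhere on this blog. The names VersionId, parse_version_id and to_urn are made up for this example.

```python
"""Sketch: split a Perseus identifier such as tlg0062.tlg029.perseus-eng5."""
from typing import NamedTuple, Optional


class VersionId(NamedTuple):
    textgroup: str          # e.g. tlg0062 (Lucian) or tlg0061 (Pseudo-Lucian)
    work: str               # e.g. tlg029
    version: Optional[str]  # e.g. perseus-eng5; None for a bare work identifier


def parse_version_id(identifier: str) -> VersionId:
    """Split 'textgroup.work[.version]' into its parts."""
    parts = identifier.split(".")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"unexpected identifier: {identifier!r}")
    version = parts[2] if len(parts) == 3 else None
    return VersionId(parts[0], parts[1], version)


def to_urn(identifier: str, namespace: str = "greekLit") -> str:
    """Prefix an identifier with the CTS URN scheme and namespace."""
    return f"urn:cts:{namespace}:{identifier}"


if __name__ == "__main__":
    vid = parse_version_id("tlg0062.tlg029.perseus-eng5")
    print(vid.textgroup, vid.work, vid.version)
    print(to_urn("tlg0062.tlg029.perseus-eng5"))
```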

Posted in Release

More Lucian: the Fowler brothers 1904 translation

Gregory Crane

Henry Watson Fowler (1858-1933) and his younger brother Francis George Fowler (1871-1918) are best known for their 1906 publication, The King’s English, and for the 1926 Modern English Usage, composed by Henry after the 1918 death of his brother. In 1904, however, the brothers had published The works of Lucian of Samosata, coyly described as “complete with exceptions specified in the preface.” The exceptions included works that did not fit with Victorian sensibilities (such as the Dialogues of the Sex Workers) or that did not seem worthy of Lucian (as they understood him). They also left out, sadly, On the Syrian Goddess, which Harmon would translate into an archaizing form of English that many contemporary readers would find unbearable.

Nevertheless, the Fowler brothers provide a second translation to complement those by Harmon, Kilburn, Macleod and others. Our goal in Perseus is to work towards providing, as often as possible, two or more translations so that readers can begin to get a sense of how differently the same text can be represented. For now we are adding more translations, but we do so in part because new services have emerged (in particular automatic translation alignment and rich linguistic annotation) that allow readers without knowledge of Greek to begin seeing how the source text and translations are related.

The Fowler translations have the label “perseus-eng4” and their XML source files can be found (where they are available) in Github in the various work directories here.

Posted in Release

Lucian: Updating Greek and adding English

Gregory Crane

Another update for our NEH-funded Next Thirty Years of Perseus work. We have now updated Lucian. First, we have fixed issues in the Greek for Lucian works 1-52 as edited by A. M. Harmon. These were originally entered years ago (c. 2010) with a version of ABBYY FineReader that only knew modern Greek. There were some residual OCR errors as well as incorrectly accented words (usually because we did not account for enclitics). We also added the textual notes. There were two versions of this Greek up until now but they have been consolidated.

We have also added the corresponding English translations by Harmon. These will all appear in the next upload to the Scaife Viewer, from work 1 (Phalaris) through work 52 (Disowned/Abdicatus).

Translations for all of Lucian are ready to be added, with more than one translation for most of Lucian’s works soon to be available.

Lucian text files are here.

To examine this work by work, you can use URLs of the form: https://github.com/PerseusDL/canonical-greekLit/tree/master/data/tlg0062/tlg001.
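If you want to step through the works programmatically, a Python sketch along the following lines generates URLs of that form. It assumes that works 1 through 52 map onto directories tlg001 through tlg052 (zero-padded to three digits, as in the example URL); that mapping is an assumption, and a directory may not exist for every number. The function names are made up for this example.

```python
"""Sketch: generate GitHub directory URLs for works in canonical-greekLit."""
from typing import List

REPO_DATA = "https://github.com/PerseusDL/canonical-greekLit/tree/master/data"


def work_directory_url(textgroup: str, work: str) -> str:
    """URL of the directory holding the files for one work."""
    return f"{REPO_DATA}/{textgroup}/{work}"


def lucian_work_urls(first: int = 1, last: int = 52) -> List[str]:
    """URLs for Lucian (tlg0062) works first..last.

    Assumes work numbers are zero-padded to three digits (tlg001 ... tlg052).
    """
    return [
        work_directory_url("tlg0062", f"tlg{n:03d}")
        for n in range(first, last + 1)
    ]


if __name__ == "__main__":
    for url in lucian_work_urls(1, 3):
        print(url)
```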

Posted in Release

New translations of Thucydides added

Gregory Crane

Under our new Perseus the Next Thirty Years NEH grant, we have added a set of new translations for Thucydides, including translations in English, French, German, Italian and Latin. These are now available on Github and (with the exception of two German translations of part of Thucydides) can now be viewed in the Scaife Viewer. The opening books of Thucydides in the Zevort translation have been available. We now have the complete translation.

These are in addition to translations that have been available in Perseus for many years.

More Thucydides materials should appear in the coming months.

Posted in Release