In preparation for our Grammatical Treebank Analysis workshop, Vanessa and Bob Gorman have produced a series of videos introducing Arethusa and Treebanking. If you want to prepare for the workshop ahead of time and would like some guidance on working with treebanks on your own, these videos are a great place to start.
Fernando Rios, Data Management Services, The Sheridan Libraries, Johns Hopkins University
Bridget Almas, Perseids Project, Tufts University
Software is an important part of many kinds of scholarship. However, it is often an invisible part of the knowledge generation process. As a result, software’s lack of visibility within the scholarly record inhibits the understanding and future use of the scholarship which is dependent on it. One way to mitigate that outcome is to preserve not only the final result but also the actual platform, services and tools upon which it depends.
In order to guide preservation of these platforms and services, Data Management Services at Johns Hopkins University is exploring several aspects of software preservation, one of which is investigating how preservation needs can be determined for particular projects such as Perseids. The Perseids Project at Tufts University is a web-based platform that is being used to produce new forms of digital scholarship for the humanities. Consequently, examining how this scholarship might be preserved by preserving the underlying software is of practical importance.
One of the outputs of the Perseids Project has been a series of prototypes of new forms of data-driven publications and digital editions. The data for these online publication prototypes have been produced through the use of a variety of software tools and services that combine dynamically provided data through orchestrated calls to web services. The software tools and underlying services have gone through several iterations of development throughout the lifetime of the project and publications have been produced at different stages of that development. This scenario poses a series of interesting challenges for preservation of these digital publications, the underlying data, and the tools and services that are intrinsic to them.
This exploratory project had two objectives. The first was to give structure to thinking about how the data-driven publications and digital editions enabled by Perseids could be preserved. The primary concerns were what should be considered in determining how to adequately capture the collection of services and tools that comprise Perseids? Should the entire collection even be captured? The second objective was to develop and trial a set of questions, presented in the form of a questionnaire, that could be used to elicit information to help address the first objective.
The Perseids platform and the publications produced on it rely upon complex pieces of software with many moving parts. In order to begin addressing the question of how such a platform and its publications might be preserved, we had several informal discussions of what the major parts of Perseids were, along with general approaches to preservation and the associated challenges. We focused our investigation on a prototype digital publication that was developed on an early version of the platform and that used versions of the annotation tools and services from Perseids that have been significantly updated or replaced since the prototype was first produced.
In order to understand how we might proceed with a potential software preservation activity, we decided it was important to answer three questions. First, we agreed it was important to have clarity on what the purpose of preservation is and who would benefit. Second, we determined that understanding what the pieces of the software are and how they are interdependent was critical. Third, we decided that being clear on what the costs versus benefits of preserving the Perseids software were, in relation to alternative approaches (e.g., website capture), was the most important question to address, from a practical perspective.
To structure the information, we used two questionnaires developed by Fernando for the purpose of providing consulting services for software archiving by the Data Management Services group at Johns Hopkins University. The first questionnaire asked very general questions in order to appraise the state of the software and gauge any potential gaps which may hinder its preservation and future reuse. Questions included asking the purpose of the Perseids project, its intended audience, the state of user- and developer-oriented documentation, general information about external software dependencies, and questions meant to gauge the general attitude with respect to software preservation and credit. After Bridget completed the questionnaire, we decided to move forward with determining what might need to be done in order to preserve the scholarship that the target use case represents (i.e., the prototype digital publication) and how it might be carried out.
To do this, a second, more focused questionnaire was developed (by Fernando, using feedback given by Bridget on the first questionnaire) in order to get us thinking about the specifics of preservation, including, most importantly, the why. The figure below shows the sequence in which different aspects of preservation were addressed. The questions are loosely grouped by what kind of information they capture: why, what, when, how long, who, and how.
Although the questionnaires are still undergoing refinement and are not (yet) publicly distributed, a brief description of the information captured by the questionnaire we used is shown in the table below.
| Section | Information captured |
|---|---|
| Why | Questions in this part focused on the true purpose of preserving the software (e.g., enabling reproducibility, reuse, or continued access to scholarship) as well as the intended audience. |
| What | This section helped us think through two things. First, at what level of granularity should the software be described and preserved in order to fulfil the preservation goal? This matters because different goals may require different levels of granularity in the description of the software. A highly granular description documents not only the software as a whole but also the individual pieces that comprise it and their interrelationships. Once an appropriate level of granularity was determined, a series of questions elicited information on those pieces. |
| When | This section attempts to determine an appropriate time to preserve the software. For typical grant-funded projects, this will likely be at the end of the project or at the time of publication. |
| How long | This part asks how long, at a minimum, the software should be preserved. It is a simple question with a potentially difficult answer. Ideally, the answer is ‘a long time’, but the longer the time span, the more effort must be made to ensure the software remains not only accessible but also usable. It is therefore important to settle on a number based on available resources. |
| Who | This section determines who is responsible not only for the software itself but also for archiving it, making it citable, assigning unique identifiers, etc. It also helps in identifying a suitable archive where the software may be stored. |
| How | This section elicits what approach seems reasonable for preserving the software (e.g., archiving the source code as-is, using virtualization or emulation technology, or continued development). In addition, it determines the kind of documentation that will be included and how it will be attached to the software (e.g., readme file, wiki, structured metadata). Although not part of the questionnaire, the Pathways of Research Software Preservation (Rios, 2016) gives an overview of how different parts of research software might be preserved and how different approaches are related. |
We learned, first, the importance of sorting through the “why” and “what” to identify those pieces of software which warrant preservation activity and to define exactly what approach to take to preservation. Having the framework of the questionnaire to guide our thinking about those issues helped to focus what felt at the beginning like a daunting task.
Bridget entered into the discussions with Fernando with a pragmatic motivation: as development progresses on Perseids, having to support multiple earlier versions of services in order to support the prototype publications becomes increasingly unmanageable. We wanted to be able to retire the earlier service versions that these prototypes depend upon, but the cost versus benefit ratio for upgrading prototype code does not always allow for that. In considering the options for preserving a functioning version of a prototype, some of which themselves imply a fair amount of work (such as creating and preserving a Docker container image of all the supporting pieces), thinking about the true purpose of preservation helped to put the problem in perspective and also to identify gaps in our planning and preservation capabilities.
While each of the suggested motivations from Fernando’s questionnaire could be considered to be an ideal to which to aspire in general, when held up against the specific software, they didn’t all make practical sense. For example, while in theory, reproducibility of the exact display of the annotations and textual data from our target use case seemed desirable, we had to ask if that was essential for preserving and reproducing the scholarship. The answer to that might have been yes if we had amassed large quantities of data for the use case, and expanded it beyond the initial prototype. But as we have not yet been able to do that, and the tools and services in question have since evolved, the small dataset we have accumulated for our publication would be better reproduced and expanded via newer tools. With this consideration in mind, it seems the remaining value of the prototype code would be as a demonstration of a methodology for annotation and a proposed service-based infrastructure to support that methodology. The code itself is of less consequence than a documentation of the ideas and dependencies would be.
This problem is discussed in the context of scientific workflows in “Techniques for Preserving Scientific Software Executions: Preserve the Mess or Encourage Cleanliness?” (Thain, Ivie and Meng, 2015). The authors found that preservation of distributed environments is still very much an open question and they suggest various approaches. In our case, a Docker image would allow an end-user to see the prototype functioning as it did when published but would provide little insight into the methodology or infrastructure. As we don’t intend to reproduce this environment exactly, we might consider just preserving the “working principle”, providing a description of the setup, using a controlled vocabulary.
It also became clear, in reviewing the questionnaire, that simply having code in GitHub or another open source versioning repository is not sufficient. All code we write is available in the project’s GitHub repository. However, because of the complex history and dependencies of open source software development, what exists in the repository represents, in many cases, only the tip of the iceberg. In addition, the GitHub repository, as it currently stands, doesn’t present a true picture of all the people who contributed intellectually to these efforts, because the code is just one piece of the puzzle. As discussed in Matthew Turk’s excellent post, “The Royal ‘We’ in Scientific Software Development”, we need to do a much better job of recording and crediting this intellectual work. Further, we need to be cognizant of the need to do this as the work takes place. An ontology such as TaDiRAH would be worth considering here.
The “who” section of the questionnaire also raised some interesting questions. Where does the responsibility for preservation lie, between the software developer and the scholar? Many of the use cases we work on in Perseids are not explicitly funded projects in and of themselves. Our approach has been to try to do as much as possible to serve as many real scholarly workflow needs as possible. This has provided the opportunity for us to explore various questions around what humanities infrastructure needs to support, while hopefully still also providing real value to our users. At the same time, we have learned that without adequate planning for governance and sustainability, things can and do fall through the cracks. Prototype code which we have developed, such as for the use case we examined here, does not always have a clear owner. For future projects of this nature, we need to take the time at the beginning to ask ourselves these questions about who will take ownership and responsibility for ensuring the preservation in order to eliminate this ambiguity.
Conclusions and Next Steps
Although data preservation and sharing have received much attention from funders, publishers, libraries, and research communities in the past 10 years or so, methods, tools, and best practices for preserving and curating the associated software have not been as fully developed. The evaluation of the Perseids project served to contextualize some of the ideas and workflows around capturing information to enable the archiving of research software that are being developed in the Data Management Services group at Johns Hopkins University. From the Perseids Project’s perspective, the iterative approach we took gave us a clearer idea of the unique requirements and challenges of preserving the scholarship embedded in this software.
We learned that while having an ideal to shoot for is good, the ideal isn’t always the best or most practical approach. We have, however, identified some concrete next steps we can take to move closer to where we would like to be with preservation of the platform components and outputs.
First, we will explore ontologies and approaches for describing the distributed infrastructure we have envisioned for our publications. We have started with an analysis of the OntoSoft ontology, although at first glance, it does not seem possible to express with it all the layers of intent and dependencies in our environment. We also intend to explore the Linked Resource Model ontology developed by the Pericles-EU project for this purpose.
In order to preserve the end-user experience of our publications, we expect to use the Webrecorder.io service to create web archive snapshots of their current state. This will allow us to preserve the visual representation of the scholarly output without depending on the underlying software being available in perpetuity.
Finally, we hope to do a better job planning for the sustainability and stewardship of future undertakings on the platform from the outset, including identifying all participants and the nature of their contributions.
Teach the Teachers Workshop
Tufts University Boston MA August 14-16th, 2017
The Perseids Project in conjunction with the Department of Classics at Tufts University is calling for participants in the second Teach the Teachers workshop.
This three-day workshop aims to showcase the Perseids platform and explore the uses of these tools in a classroom setting. Registration for this workshop will be free, and financial support for travel and lodging will be provided. We are looking for participants who teach at the high school or secondary school level, as well as PhD candidates and graduate students.
Treebanks are large collections of syntactically parsed sentences. Although originally designed to improve computational linguistic analysis, treebank annotations have proven to be valuable tools for pedagogy and traditional philological pursuits. Treebanking projects are also valuable for students because they provide targeted assessment and feedback. In addition, treebanking allows students to contribute to a growing collection of ancient language treebanks.
The workshop will contain seminars on how to use the tools available via Perseids, in particular the Alpheios Alignment editor and the Arethusa Treebank editor. These seminars will include comprehensive guidelines so that any user at any level of digital literacy will be able to use the tools to their full potential. This will include:
- Use of translation alignments for language and non-language students
- Use of treebank annotations in the classroom, including Prof. Matthew Harrington’s treebanks of the AP Latin Curriculum
The purpose of this workshop is to facilitate the exchange of new ideas for the implementation of the Perseids Platform in the classroom. We encourage you to experiment with our tools before attending the workshop, so that you can bring your own ideas about implementations in the classroom for discussion.
Participants should submit a statement of 500-700 words. Submissions will be accepted until December 16th. Statements should demonstrate that an applicant has a strong desire to work with new and experimental teaching techniques. No experience with digital methods is required, but those with experience will be supported at their own level. Although we work primarily with Greek or Latin teachers, we encourage educators who work with other ancient languages to apply. An ideal candidate needs to be willing to approach teaching these subjects in new ways and should be prepared to implement them in the classroom.
Send submissions in the form of a pdf to firstname.lastname@example.org
A free two-day workshop sponsored by the Perseids Project
January 4-5th, 2017, 9AM-5PM
THE WESTIN HARBOUR CASTLE, TORONTO
1 Harbour Square
Toronto, ON M5J 1A6
This two-day workshop aims to present some of the work currently being done in digital pedagogy for classical studies. As the field of classical studies continues to evolve, technology is playing an even larger role both in educating a new generation of scholars and in opening new approaches to data-driven humanities research.
The workshop will include hands-on seminars on how to use the tools available via Perseids, in particular the Alpheios Translation Alignment editor and the Arethusa Treebank editor. Treebanking (morpho-syntactic diagramming) allows a user to identify all the dependency relationships in a sentence as well as the morphology of each word. Translation alignments allow a user to identify corresponding words between an original text and its translation. With both methods, the resulting data is automatically compiled in an xml file which can be further queried for research.
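To give a concrete sense of what it means for that XML to be "further queried for research", here is a short Python sketch. The fragment below is modeled loosely on the Arethusa/AGDT treebank format, but the attribute values and the tiny Greek sentence are illustrative, not taken from an actual Perseids publication.

```python
import xml.etree.ElementTree as ET

# A simplified treebank fragment in the style of the Arethusa/AGDT XML
# format: each <word> records its surface form, the id of its syntactic
# head, and its dependency relation. (Illustrative, not authoritative.)
SAMPLE = """
<treebank>
  <sentence id="1">
    <word id="1" form="ὁ" head="2" relation="ATR"/>
    <word id="2" form="ἀνὴρ" head="3" relation="SBJ"/>
    <word id="3" form="γράφει" head="0" relation="PRED"/>
  </sentence>
</treebank>
"""

def subjects(xml_text):
    """Return (sentence id, word form) pairs for every word tagged SBJ."""
    root = ET.fromstring(xml_text)
    return [(s.get("id"), w.get("form"))
            for s in root.iter("sentence")
            for w in s.iter("word")
            if w.get("relation") == "SBJ"]

print(subjects(SAMPLE))  # → [('1', 'ἀνὴρ')]
```

Because every annotation is stored as structured data rather than as a drawing, the same file supports research questions (all subjects of a given verb, frequency of a construction) that a static diagram could not.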
Participants should plan on attending all sessions of the two-day workshop, from 9AM-5PM on January 4th and 5th. Participation is open to college professors, high school teachers, and graduate students. Participants should bring laptop computers. Since we will be working in Latin and Greek, participants should have a basic knowledge of either language. Wifi will be provided, as well as coffee breaks and lunch. Participation is free, but seats are limited to 40.
The workshop will be led by Marie-Claire Beaulieu (Tufts University), Tim Buckingham (Perseids Project), Vanessa Gorman (University of Nebraska-Lincoln), and Robert Gorman (University of Nebraska-Lincoln).
Follow this link for more information and to sign up for the workshop.
Keep checking out the landing page, as we will keep adding more information and more content in the future.
As discussed in previous posts in this series, navigating the waters of the scholarly and technical assumptions each of us bring to the Perseids collaboration is not always simple. Some of this disconnect has been beneficial to the project — when we each stick to our respective roles and areas of expertise we have very little redundancy of effort.
But, when it comes to joint decisions about the direction of the project, our Hacker and Professor run into some disagreements. Bridget regularly has to remind the team that the inclusion of new unplanned features and workflows mean that other things we had hoped for would have to wait or be dropped entirely. But Marie-Claire can be frustrated by the “workplan-waving.” This recurring issue stems in part from Marie-Claire not being able to fully assess the complexity of the technical solutions, and Bridget not understanding what drives the scholarly and pedagogical requests. These misunderstandings make it difficult for them to decide which things should remain in the workplan and which new avenues should be pursued with students.
So, in keeping with the experimental nature of Perseids, The Hacker and the Professor have embarked on a skills exchange as an experiment of their own. Bridget has been coaching Marie-Claire through a self-initiated journey into programming and web design. Marie-Claire has been mentoring Bridget through an assignment she normally gives to her Greek mythology classes, which aims to analyze the transmission of a classical Greek myth through its representation on an ancient artifact.
It has been a truly fascinating journey so far. What follows are some of the thoughts they have about their skills exchange.
First, I have to confess that my interest in helping Marie-Claire obtain some more technical skills is not entirely altruistic … I hate the part of my job that requires that I be realistic about timeframes and the effort needed to develop code. I want Marie-Claire to gain these skills for her own growth, but also so that when we prioritize the work the burden for understanding what takes time is more fully shared. I am not a natural teacher though and I am incredibly thankful for the outstanding free resources available for this. The Khan Academy site in particular has been great in providing a logical order to tackle topics, exercises, and examples to work through. (A side-benefit of this for the project is that it has been allowing us to think more concretely about certain features of the ePortfolio and self-assessment tools that we hope to make available on Perseids). We then take those examples and Marie-Claire applies them in the context of work she is doing with her students using the Perseids platform.
I do believe that I have the better end of the bargain here though. Marie-Claire is a world-class teacher who cares tremendously about her students and her subject, and I could not ask for a better mentor. As a young college student I was focused on getting out into the real world as quickly as possible to save time and money and didn’t take advantage of my education to explore some of the topics in ancient religion and myth that serve as the underpinnings for our society. I have passed by thousands of objects in museums and public spaces without thinking about what they say about our social history and our internal perceptions of ourselves, our human relationships, and our culture. I have tried over the years to be more well-read and informed in a self-directed, and often misguided, sort of way, but doing so without context makes it hard to get engaged with the material. Reading the primary and secondary sources with a specific question in mind changes that. What I find particularly interesting about this experiment is that when we first embarked on it, I found myself getting distracted by thinking about superficial aspects of the digital tools that could enhance presentation of the material or my eventual reporting on it. But as I delve deeper into the actual content and discuss the questions I have on it with Marie-Claire, aspects of digital presentation and publication are actually quite far from my mind. I am very curious to see if and how they reenter the picture as I get closer to producing the results of my little research project.
Learning programming has been an exhilarating experience so far. Let me be clear: I am not saying that it all comes easy and everything is great. Quite the contrary. I struggle through the basics and often get stuck on little things. I also often get it into my head to undertake projects that are too difficult at my current level and I sink into quagmires. Yet, every small success is a reward, and Bridget’s support, patience, and encouragement is a constant motivation. In fact, I feel that I’m getting the better end of the bargain in our skills exchange, because I have access to Bridget’s advice and experience, without which it would be very difficult not to be intimidated by the material. The excellent Khan Academy tutorials also do a great job of rewarding every bit of progress. I am constantly reminded of the very similar effort I had to make when I was learning Greek and Latin, and the immense joy of discovery I experienced as I got better. As a teacher, I never want to lose sight of the challenge of learning.
In fact, becoming a better teacher motivates me through this learning experience. Anything I learn, my students will get to learn too. So as I make my way through my lessons and the sessions with Bridget, my head is buzzing with ideas for student projects that will take advantage of these skills and transmit them to my students. As a Classics professor, I strive for my discipline to be taught better and more widely, so that the wealth of wisdom and beauty that we inherited from the ancient world be made accessible to as broad an audience as possible. In today’s world, that includes code and programming. These techniques enable us to study our field in deeper and more meaningful ways than we ever could before and to disseminate the results in sustainable ways. Technology also makes our discipline more inclusive than ever before, because it allows us to approach the Humanities from a common middle ground that crosses cultural and social gaps.
As you can see, I am the dreamer in the Perseids team… For me, programming is very much like fine arts, music, or languages. It is creative, yet also exacting, and forces me to think in a disciplined fashion. Hopefully, that will help me stick to the workplan.
Alright, enough musing. Can we talk about code now?
Professing is a rather mysterious activity. We teach, we write, we read, we muse, we talk… seemingly in no particular order. Understandably horrified, our hacker friends wave their workplan at us and tell us that we need to stay in scope, and that such and such feature is not to be released until the second quarter of next year, and what are the requirements please?
There is a method to the madness, I assure my hacker friends. We do have well-defined research and teaching agendas, and our progress (especially for junior professors) is meticulously charted by our institutions. Yet, the flexibility of our schedules and work culture means that we often have the opportunity (and occasionally the obligation) to take up an unplanned project. We are also responsible for mentoring student theses, the topics of which may vary quite widely, and are renewed with every cohort, every academic year.
So how do we build tools and infrastructure with equal parts of professing and hacking? Obviously, this requires true intellectual engagement on both sides, so that the result is not an immediate means to an end, but rather a process whereby Humanities questions and technology are explored and developed at the same time. Writing user stories and requirements allows us to think about our objectives, not only from the user perspective but also from the inside out. What do we want the data to do? And more importantly, how should it do it?
As Bridget Almas pointed out in her latest post, the wires-exposed nature of Perseids is helpful in the course of this experimentation because it lets us think concretely about the objects we are manipulating, namely the data and the technology itself. And yes, we acquire skills that we never thought we would have when we signed up to be Humanists.
Now, does this change professing? Yes and no. No, because experimentation is built into research and teaching. Hitting roadblocks or dead-ends is a natural part of discovery, and the process of learning is one of trial and error. Yes, because we are now placed in a global environment where we must produce data and tools that can be reused in order to ensure any degree of longevity and sustainability. Explaining this to our administrators is not always easy, since expectations for Humanities faculty are centered on single-author publications, especially books and journal articles. Even so, the highly individualized practices of our profession are eroding to make way for teamwork, which in turn requires us to stick to the workplan.
And that’s not half-bad. In my opinion, one of the greatest benefits of this method is the built-in review system. As we think through our projects with our team and scope out the requirements, we go through a back and forth that helps us all to refine our work. Then, when we release new features and workflows, we try it all out in class or in our offices. In the process, we gather a wealth of feedback that we can immediately (or as soon as the workplan allows) put to use in a new iteration. This differs radically from traditional publishing models in the Humanities, in which most feedback is received after any changes can be made. Although our Frankenstein must deal with all sorts of growing pains, at least we piece him together in a positive and forward-looking environment.
Marie-Claire Beaulieu, Perseids Professor
Teach the Teachers, Leipzig April 18-19th, 2016
The Perseids Project, in collaboration with the Humboldt Chair of Digital Humanities at the University of Leipzig and the Department of Classics at Tufts University, is calling for participants in the first Teach the Teachers workshop. The two-day workshop aims to present and develop lesson plans and syllabi that incorporate digital methods into the high school and university Humanities curriculum. Registration for the workshop will be free, and financial support for travel and lodging will be provided. We are looking for participants who teach at the high school or secondary school level, as well as PhD candidates and graduate students.
As the field of classical studies continues to evolve, technology is playing a larger and larger role both in the interpretation of data and in the education of a new generation of scholars. As people begin to use these tools to teach Greek and Latin, it is important that we come together and share our experiences, strategies, and ideas. Moreover, this workshop will offer educators who are unfamiliar with newer digital tools and their use in the classroom the chance to learn from fellow educators the best techniques for their implementation.
The workshop will contain seminars on how to use the tools available via Perseids, in particular the Alpheios Alignment editor and the Arethusa Treebank editor. These seminars will include comprehensive guidelines so that any user at any level of digital literacy will be able to use the tools to their full potential. This will include, but is not limited to:
- Use of translation alignments for non-language students
- Use of treebank annotations to assess understanding of grammar and morpho-syntax
- Use of the gold standard review functionality and the board review systems of Perseids
- The Perseids Social network annotation workflow
- Assessment strategies for digital assignments
The purpose of this workshop is to facilitate the exchange of new ideas for the implementation of the Perseids Platform in the classroom. During the two-day workshop we will produce resources for digital projects in a classroom setting. We will incorporate these resources into a growing collection of shared resources which will help future educators integrate Perseids into their classroom practice. Those resources may include:
- Syllabi for high school and college level courses
- Lesson plans for in-class digital projects
- Collaborative, inter-institutional workflows and project plans
Contributors should submit statements of 500-700 words. The submission deadline has been extended to Monday, January 18th.
Send submissions in the form of a PDF to email@example.com. Statements can include:
- Plans or ideas for the implementation of digital tools in the classroom
- Description of experience or interest in digital methods
- Other experience involving the use of digital tools in an educational setting
Treebanking, and the collaborative environment that surrounds it, allows undergraduates to participate in research and scholarship in new ways. This fall I am a teaching assistant, or peer-instructor, for an intermediate Greek course. Although I am a junior in college, currently taking my fifth Greek course, treebanking enables me to instruct and assess students with one year less Greek than I have.
My role in the class is to lead close reading sessions and a treebanking lab, and to grade treebanking assignments. In the close reading sections, I go over the readings, drawn from Plato’s Apology, and answer questions. In the treebanking lab, I demonstrate how to treebank different Greek constructions. For instance, near the start of the semester a student asked about the difference between coordinating and subordinating conjunctions. I walked the class through how these conjunctions are treebanked and how the tree shows us the relationship between conjunctions and the words they govern and are governed by. In general, during this hour-and-twenty-minute lab session students work independently on treebanking assignments while I go around the room and answer questions. When students submit their treebanks, I grade them for accuracy using the review tools built into the treebanking program we use, Arethusa. It is possible to automatically compare my treebank with a student’s treebank, and also to manually enter feedback on a particular word, sentence, or assignment.
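The automatic comparison Arethusa performs can be thought of as checking, word by word, whether the student assigned the same head and the same syntactic relation as the reference tree. The sketch below illustrates that idea in Python; the data format (word id mapped to a head id and relation label) is an assumption for illustration, not Arethusa's actual internal representation.

```python
# Hypothetical sketch of comparing a student's dependency tree against a
# reference tree. The dict format {word_id: (head_id, relation)} is an
# invented representation, not Arethusa's real data model.

def compare_trees(reference, student):
    """Return the share of words whose head and relation both match."""
    matches = sum(
        1 for word_id, (head, rel) in reference.items()
        if student.get(word_id) == (head, rel)
    )
    return matches / len(reference)

# Toy three-word sentence: each entry is word_id: (head_id, relation).
reference = {1: (2, "SBJ"), 2: (0, "PRED"), 3: (2, "OBJ")}
student   = {1: (2, "SBJ"), 2: (0, "PRED"), 3: (1, "OBJ")}  # wrong head on word 3

print(compare_trees(reference, student))  # 2 of 3 words match
```

A grader could then combine this automatic score with manual comments on the specific words where the trees diverge.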
I see real benefits for the students in the class and myself from this practice. The students in the class are able to ask every question they have about treebanking (and by extension Greek syntax) and get an answer immediately. I spent hundreds of hours treebanking Plato and Xenophon over the past summer, so I can refer back to my trees or to the resources I learned to use while I worked on that project if questions come up that I cannot answer off-hand. I think that small things, like understanding the relationship between the μέν and δέ of the classic μέν…δέ (“on the one hand… on the other hand”) construction, that would otherwise fall through the cracks are brought to the surface and addressed in this type of class. Before I started treebanking I could not have explained that the μέν…δέ construction is a type of coordination, because the structure is not explained that way in traditional Greek textbooks. That is, the traditional explanation of “on the one hand…on the other hand,” while useful for beginners to translate, does not explain the grammatical role of these words the way a treebank does. So in this way I can address the holes in students’ grammatical understanding and hopefully give them more of the tools they need to really read Greek.
But in terms of the benefits for me as a peer-instructor, I have never thought more about Greek than while answering questions from the students in this class. People say that you never learn something until you teach it. I could not agree more – I had never thought about just what is going on syntactically with the “extra” ἤ in an ἤ…ἤ (“either…or”) construction. It is perfectly clear that one of the ἤ’s is the coordinator and the other only sets up the construction, but until I actually had to explain this common construction I had never thought much about it. It’s the sort of thing grammar books normally gloss over but that you have to know to understand the syntax. In short, treebanking with students has provided me with a rigorous review of syntax. But the main reason this opportunity is so important is that before treebanking, an undergraduate serving as a teaching assistant in Greek would have been almost unheard of. The opportunity to really learn something through teaching has been given to me because of my experience with treebanking and the hands-on, collaborative work that treebanking encourages.
Tufts University, Greek and Latin Major
Class of 2017
Marie-Claire Beaulieu and I have a running joke between us playing on the difference between “professing” and “hacking.” When we have a Perseids Project hackathon, the hackers “hack” and the professors “profess.” This is meant to be funny, but it also exposes a more serious question — how, in a digital humanities project, do you manage the different approaches and expectations developers and humanists bring to a project in a way that is productive and encourages rather than discourages continued collaboration?
The inspiration for this blog post comes from our presentation at DH 2015 in Sydney, “Building Tools that Build Digital Humanists.” This paper was actually mistitled in a way that highlights this very point of potential disconnect — what we should have called it was something like “Building tools while building Digital Humanists” because otherwise it implies that the tools themselves provide the magic solution of how to develop digital skills. Eliminating the tool as black box is a primary motivator behind many of the design decisions we make on Perseids. Exposing humanities scholars, researchers and students to the inner workings of the software tools we build for them, the raw data on which they operate, and the development process itself has resulted in some unexpected benefits and challenges.
We have been following what I call an Agile-inspired approach to building tools for Perseids. We engage the users at every step of the process, releasing features for them to work with well before the tools are fully finished. This is a fairly common practice in software development, but it introduces some unique challenges in academic environments, where programming and other resources are typically in much shorter supply, and where we often cannot finish what we start. Agile methods take for granted that there will be another sprint.
It sometimes feels a bit like looking for the silver lining in the clouds, but we have found that using tools mid-development with constant, and maybe even a bit too in-your-face, access to the raw data structures can open a window into exactly how the tools let users manipulate and shape the data. The process has allowed both students and researchers to understand their role in the creation, curation and annotation of texts through the scientific process of creating and using the data. It has also exposed the critical role the humanist plays as product designer and tester of the tools we develop to support the research and publication process. Or, as Stephen Ramsay says, as “builders and makers.” (Ramsay, Stephen. (2011). “On Building.” http://stephenramsay.us/text/2011/01/11/on-building/ )
It also builds their digital skills. We have scholars working with us who, prior to becoming involved in the project, had no programming background or experience with XML files, and are now developing their own analytical tools and services in XQuery and managing their JSON configuration files through GitHub. One researcher, initially very hesitant about her computer skills and afraid of breaking the system, is now our star quality assurance expert who eagerly tests new releases and can be counted on to find and report the bugs within hours if not minutes. Enterprising students and professors have become emboldened to develop additional solutions that cleverly work around current limitations in the system and which inform our design process and feature list.
It’s not all roses, though. Our rapid-prototyping approach has led to many misunderstandings about exactly how experimental certain features are, and to dashed hopes about how soon the things we trial can be brought to readiness for actual use.
Another real challenge is that, for projects funded in spurts, you want to ensure that the code you write can be easily taken up and extended by new developers. But with code that changes so rapidly, keeping documentation up to date can be daunting, and it is often the first thing to go when time runs short.
I am trying to be better about managing expectations about what we can realistically do by when, and about imposing discipline in the form of user stories and well-defined requirements for new features. But as a developer in this environment it is not always easy to say no, especially when it is the shiny new features that grab attention, while the work needed to make them real is not always easily justifiable.
I think it’s worth it though. One of my favorite quotes from the group of scholars who make up our ad-hoc software development team on Perseids is the following, from Drs. Robert and Vanessa Gorman of the University of Nebraska Lincoln:
“The principal power of the Perseids/Arethusa system is the information it puts into the hands of student and instructor.”
It is this perspective that keeps me going when I’ve had to explain for the umpteenth time why the system looks and feels a bit like Frankenstein. We have quite enjoyed working with the monster in our lab here, and we feel that it is precisely this wires-exposed nature of Perseids that is inspiring and fosters learning and experimentation.
The Perseids team at Tufts University joined the Université Lyon II, l’École Française d’Athènes and Brown University for a three-week field workshop in Greece this May. The workshop included 12 graduate students from both sides of the Atlantic and a team of faculty composed of professors, professionals, and information technology specialists (see our Participants list). The workshop addressed current issues in the practice of digital epigraphy, especially with respect to prosopography. Faculty and students examined stones and sites in Athens, Larissa, and Thasos. Daily blog entries created by students are available on the workshop website. At each site, we produced digital editions of texts and used a variety of digital tools to extract information from the data we created.
In particular, we used Timemapper to test a new reconstruction of the famous Mur des Théores in Thasos proposed by our colleague Michèle Brunet. The inscription in question is a long list of the names of yearly magistrates in Thasos, spanning at least seven centuries of local history. As the wall has crumbled over time, the reconstitution of the arrangement of the blocks is crucial in understanding the chronological organization of the inscription. Entering the data in Timemapper and thus reconstituting the proposed sequence of magistracies has allowed us to verify the chronological succession and arrangement of the blocks. We enhanced the Timemapper workflow by creating a CITE Image Collection of drawings of the blocks and including links to specific regions of interest on these images, referenced by stable CITE URN.
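Because TimeMapper builds its timelines from tabular (spreadsheet-style) data, one simple way to test a proposed block arrangement is to generate those rows from the block sequence and check that the magistracy dates run in chronological order. The sketch below illustrates this in Python; the block identifiers, names, dates, and column headings are all invented for illustration and are not the project's actual data.

```python
import csv
import io

# Illustrative sketch: verify that a proposed sequence of wall blocks
# yields a chronological succession of magistrates, then emit spreadsheet
# rows in a TimeMapper-friendly shape. All data below is invented.

blocks = [  # (block id, magistrate name, proposed year; BCE as negative)
    ("block-07", "Akeratos", -360),
    ("block-08", "Leodamas", -359),
    ("block-09", "Herakleitos", -358),
]

# The reconstruction is only consistent if the years increase block by block.
years = [year for _, _, year in blocks]
assert years == sorted(years), "block order breaks the chronology"

# Emit timeline rows (Title / Start / Description columns).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["Title", "Start", "Description"])
for block_id, name, year in blocks:
    writer.writerow([name, year, f"Magistrate inscribed on {block_id}"])
print(out.getvalue())
```

In the actual workflow, each row could also carry the stable CITE URN linking the entry to the region of the block drawing where the name appears.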
We also began drafting a social network of the inscription using the SNAP prosopographical standards in order to understand the relationships among the persons listed on the stone (so far, only father/son relationships are represented). The results are displayed in a prototype of a social network visualization plugin for the Arethusa annotation framework. (This plugin was developed for Visible Words with the additional support of the Humboldt Chair for the Digital Humanities at Leipzig.) We used the Hypothes.is annotation tool to annotate the relationships and identities according to a controlled workflow and simplified tagging conventions. We used stable URI identifiers from the Lexicon of Greek Personal Names (LGPN) to annotate the identities. We then submitted the Hypothes.is annotations to Perseids for stabilization and preservation. Upon ingest, the Perseids system tested the annotations to ensure they adhered to the tagging conventions and converted the tool-specific Hypothes.is annotation data according to the standard Open Annotation data model, and converted the simplified tags for the social relationships to adhere to the stable SNAP ontology.
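The ingest step described above — validating a simplified tag against the controlled vocabulary and rewriting it as a standards-based annotation — can be sketched roughly as follows. This is a hedged illustration of the general pattern only: the tag vocabulary, the SNAP property URI, and the LGPN-style person URNs below are assumptions, not Perseids' actual identifiers or data model.

```python
import json

# Hypothetical sketch of converting a simplified Hypothes.is relationship
# tag into an Open Annotation-style structure referencing a SNAP term.
# The mapping and all URIs/URNs are illustrative.

TAG_TO_SNAP = {
    "son": "http://onto.snapdrgn.net/snap#SonOf",  # assumed URI form
}

def convert(annotation):
    """Validate the tag and rewrite the annotation against the ontology."""
    tag = annotation["tag"]
    if tag not in TAG_TO_SNAP:
        raise ValueError(f"tag {tag!r} is not in the controlled vocabulary")
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "body": {"relationship": TAG_TO_SNAP[tag]},
        "target": [annotation["person"], annotation["parent"]],
    }

raw = {"tag": "son", "person": "urn:lgpn:person:1", "parent": "urn:lgpn:person:2"}
print(json.dumps(convert(raw), indent=2))
```

Annotations whose tags fall outside the vocabulary would be rejected at ingest, which is how the workflow keeps the simplified classroom tagging consistent with the stable ontology.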
In addition, this data can be further queried and presented in the Timemapper interface in order to compare it against traditional prosopographical resources such as the Lexicon of Greek Personal Names (LGPN). The names of the magistrates had also been encoded with TEI/EpiDoc in the Perseids Platform in reference to the blocks on which they were inscribed. As the LGPN also provides a TEI serialization of its data, it is possible to enrich the TEI/EpiDoc transcription with information about the persons whom the students recognized from the encoded names. Emmanuelle Morlock (HISoMA research center in Lyon) showed the students how they could re-use their encoded transcriptions to automatically produce – with some bits of XSLT – another Timemapper visualization displaying the inscribed names face-to-face with the information taken from the LGPN about the identified persons. A rough calculation of the age each magistrate would have had in the year of his block is also possible, allowing us to detect some incorrect identifications through inconsistencies in the dates. In this way, our work contributed to creating a better understanding of this complex ancient inscription while furthering the development of digital tools and methods.
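The age check mentioned above amounts to simple arithmetic: given an estimated birth year from a prosopographical resource and the year of the block naming the person as magistrate, an implausible age signals a wrong identification. The sketch below shows the idea in Python; the plausibility thresholds and the example dates are invented for illustration.

```python
# Hedged sketch of the date-consistency check: years BCE are represented
# as negative numbers, so plain subtraction works. The age bounds (25-90)
# are invented thresholds, not a historical claim.

def age_at_block(birth_year, block_year):
    """Age the person would have had in the year of the block."""
    return block_year - birth_year

def plausible(birth_year, block_year, minimum=25, maximum=90):
    """Flag identifications whose implied age falls outside the bounds."""
    return minimum <= age_at_block(birth_year, block_year) <= maximum

# A magistrate born c. 400 BCE attested on a block dated to 350 BCE:
print(age_at_block(-400, -350))  # 50
print(plausible(-400, -350))     # True: a credible magistrate's age
print(plausible(-400, -250))     # False: he would have been 150 years old
```

Run over every identified name in the transcription, a check like this surfaces the candidate misidentifications for the epigraphists to review.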
Our workshop in Greece was also the occasion to participate in SunoikisisDC, an international consortium of Digital Classics programs with a shared interest in digital methods, led by the Humboldt Chair of Digital Humanities in Leipzig. The Université Lyon II is an active participant in the consortium. On May 13, Michèle Brunet and Marie-Claire Beaulieu led the 6th Sunoikisis common session from Thasos, focusing on Thasian involvement in the Peloponnesian War. On May 19, Michèle Brunet and her graduate students Nicolas Genis, Adeline Levivier, and Élise Pampanay led the 7th Sunoikisis common session from Thasos, focusing on the history of the walls surrounding the ancient city.