Creating a New Treebank Annotation

  1. Log in to Perseids.
  2. Click the “New Treebank Annotation” link on the Perseids home screen.
  3. Enter text for annotation via one of the following methods:
    1. Text Input: You can enter plain text or XML directly into the text input box.
    2. Text Selector widget: If you don’t know the URI for the text you want to treebank, click the “Toggle text selector” link, which will provide a CTS-based selector for the available texts. We will continue to add to the texts available via this selector; they have been vetted for CTS compatibility and, for the immediate term, are generally more reliable than using Perseus stable URIs directly. This selector works very much like the one in the annotation editor: after you’ve made your selections in the drop-downs, click “Retrieve Text”. This will populate the text box with the text you chose (the output of the CTS request), and you can confirm that it’s correct before clicking “Edit” to create your template.
    3. Text URI: You can specify a URL that retrieves the text for which you want to create a treebank. The URL should return either valid XML or plain text. This should work for CTS API request URLs as well as for data.perseus.org citation URLs (to the extent that those work). When you leave the text URI input field after making a change, the application will attempt to load the text at that location into the text input area automatically. Wait for this to finish before continuing.
  4. The application will attempt to automatically determine whether the text supplied via any of the above methods is plain text or XML. You should verify that the automatic determination is correct, and override it if necessary, by checking or unchecking the “Input is XML” checkbox that appears below the text input area.
  5. The application will also attempt to automatically determine the language of the text supplied via any of the above methods. You should verify that the automatic determination is correct, and override it if necessary, by selecting the appropriate Language radio button (Greek, Latin, and Arabic are currently supported).
  6. The application will also attempt to automatically determine the direction of the text supplied via any of the above methods. You should verify that the automatic determination is correct, and override it if necessary, by selecting the appropriate Text Direction radio button.
  7. You can access advanced formatting and tokenization options by clicking the “Click to toggle advanced options” link. This exposes options for choosing a format (tag set) for your treebank; for merging, shifting, and splitting enclitics; and, if your text input is XML, for specifying the root node, the namespace, and any elements that should be ignored when tokenizing the text.
    1. The defaults for the XML options should work for most Perseus-supplied documents, but because a wide variety of formats is still in use, there may be exceptions. All element names specified here (for the root or for elements to ignore) should be given as local names, without prefixes. If the document is namespaced, you must supply the namespace URI. Referencing elements from multiple different namespaces within a single document is not currently supported.
  8. Click Edit. After a few moments (or longer depending upon the length of your text) the new treebank annotation should be created in your Perseids workspace and automatically opened in the editor.
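Steps 4 through 7 rely on automatic guesses and on the XML options. Perseids’ own detection and tokenization code is not described here, so the following Python sketch is only a hypothetical illustration of the kind of checks involved. It treats input as XML only if it parses as well-formed; it guesses the language from the dominant Unicode script (Greek, Arabic, otherwise Latin letters); it derives the text direction from that language; and it collects the text to tokenize under a root element named by local name plus a single namespace URI, skipping ignored elements. All function names, the language codes `lat`/`grc`/`ara`, and the TEI sample are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def looks_like_xml(text: str) -> bool:
    """Hypothetical check: treat input as XML only if it parses as well-formed XML."""
    stripped = text.strip()
    if not stripped.startswith("<"):
        return False
    try:
        ET.fromstring(stripped)
    except ET.ParseError:
        return False
    return True


def guess_language(text: str) -> str:
    """Hypothetical heuristic: pick the script with the most letters (ties favor Latin)."""
    counts = {"lat": 0, "grc": 0, "ara": 0}  # assumed ISO 639 language codes
    for ch in text:
        cp = ord(ch)
        if 0x0370 <= cp <= 0x03FF or 0x1F00 <= cp <= 0x1FFF:  # Greek and Greek Extended
            counts["grc"] += 1
        elif 0x0600 <= cp <= 0x06FF:  # basic Arabic block
            counts["ara"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["lat"] += 1
    return max(counts, key=counts.get)


def guess_direction(language: str) -> str:
    """Arabic is written right to left; Greek and Latin left to right."""
    return "rtl" if language == "ara" else "ltr"


def tokenizable_text(xml: str, root_local: str, ns_uri: str, ignore: set[str]) -> str:
    """Collect text under the root element, skipping ignored elements.

    Names are local names; one namespace URI (or "" for none) qualifies them all,
    mirroring the one-namespace-per-document restriction described above.
    """
    def qname(local: str) -> str:
        return f"{{{ns_uri}}}{local}" if ns_uri else local

    doc = ET.fromstring(xml)
    root = doc if doc.tag == qname(root_local) else doc.find(f".//{qname(root_local)}")
    if root is None:
        raise ValueError(f"root element {root_local!r} not found")
    skip = {qname(name) for name in ignore}
    parts: list[str] = []

    def walk(el: ET.Element) -> None:
        if el.tag in skip:
            return  # drop this element's own text and all of its descendants
        if el.text:
            parts.append(el.text)
        for child in el:
            walk(child)
            if child.tail:  # text after a child belongs to the parent, so keep it
                parts.append(child.tail)

    walk(root)
    return " ".join(" ".join(parts).split())  # collapse whitespace runs


tei = ('<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><title>Aeneid</title></teiHeader>'
       '<text><body><p>arma <note>gloss</note>virumque cano</p></body></text></TEI>')
lang = guess_language("μῆνιν ἄειδε θεά")
print(looks_like_xml(tei), lang, guess_direction(lang))  # True grc ltr
print(tokenizable_text(tei, "body", "http://www.tei-c.org/ns/1.0", {"note"}))  # arma virumque cano
```

Note how the `<note>` element’s content is dropped while the text following it is kept, and how the header is excluded because only the `body` root is walked. Real documents may need more care, for example where words are split across elements.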

Alternate path – File Upload

If you already have a treebank XML file adhering to the Perseus Ancient Language Dependency Treebank (ALDT) schema that you want to upload and work on in Perseids, you can use the File Upload button at the bottom of the input form (labelled “Browse” or “Choose File” depending on the browser). Click the button and select your local file; the upload will begin immediately. If it succeeds, after a few moments (or longer depending upon the file size) the file will be uploaded to your Perseids workspace and opened automatically in the editor. Note that the default editor is currently Alpheios. If you want to specify Arethusa as the default editor for the file, you can add the following to the XML of your file before upload (or afterwards, using the EditXML feature in Perseids):

<annotator>
<short>arethusa</short>
<name>arethusa</name>
<address/>
<uri>http://github.com/latin-language-toolkit/arethusa</uri>
</annotator>

To specify Alpheios as the editor (i.e. once Arethusa has been made the default), you would add:

<annotator>
<short>alpheios</short>
<name>alpheios</name>
<address/>
<uri>http://github.com/alpheios-project/treebank-editor</uri>
</annotator>
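If you have several files to prepare, the `<annotator>` block could also be added programmatically before upload. The following Python sketch is a hypothetical helper, not part of Perseids. It assumes the ALDT file’s elements are not in a namespace and that the editor’s `<annotator>` belongs among the children of the root `<treebank>` element, before the first `<sentence>`. It also removes any existing Arethusa or Alpheios annotator entry so that only one editor is specified.

```python
import xml.etree.ElementTree as ET

# URIs copied from the <annotator> snippets above.
EDITOR_URIS = {
    "arethusa": "http://github.com/latin-language-toolkit/arethusa",
    "alpheios": "http://github.com/alpheios-project/treebank-editor",
}


def set_editor(treebank_xml: str, editor: str) -> str:
    """Hypothetical helper: add an <annotator> naming `editor` to an ALDT treebank string."""
    root = ET.fromstring(treebank_xml)

    # Drop any existing editor annotator so only one editor is specified.
    for old in [el for el in root if el.tag == "annotator" and el.findtext("short") in EDITOR_URIS]:
        root.remove(old)

    annotator = ET.Element("annotator")
    for tag, value in (("short", editor), ("name", editor), ("address", None), ("uri", EDITOR_URIS[editor])):
        ET.SubElement(annotator, tag).text = value

    # Assumption: the annotator goes before the first <sentence> child of <treebank>.
    index = next((i for i, el in enumerate(root) if el.tag == "sentence"), len(root))
    root.insert(index, annotator)
    return ET.tostring(root, encoding="unicode")


aldt = '<treebank xml:lang="grc" format="aldt" version="1.5"><sentence id="1"/></treebank>'
print(set_editor(aldt, "arethusa"))
```

ElementTree does not keep the XML declaration, comments, or unused namespace declarations, so compare the output with your original file (or make the change by hand with EditXML) before relying on it.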

Alternate path – Create Template Link

You can use the new Create Link feature at the bottom of the input form to create a link to a template treebank file on any server; the link can then be used as the base for new treebank annotations. This is essentially a mechanism that allows anyone to create links into Perseids with templates of their choosing (e.g. for a class assignment), along the lines of the prototype links that I put into the Treebank Pilot Links page. More information on using this feature is available at http://sites.tufts.edu/perseids/instructions/instructions-for-using-templates-for-treebanking-with-students/