Data Entry Errors
Requiring Manual Correction:
1) The following bit of XML had been copied from the English version and pasted at the top of each chapter in the Latin Book 1 Word document, but it should have appeared only once, at the top of the document. Only the DIV3 for type="chapter" should have been repeated at the top of each chapter:
<DIV1 TYPE="text">
<DIV2 N="1" TYPE="book">
<PB N="1" REF="3"/>
<HEAD>THE FIRST BOOKE OF A COMMONWEALE.</HEAD>
2) An invalid character, 0x1F (a control character that XML 1.0 does not allow), appeared several times in the document.
3) The biggest problem was a variety of errors in the use of the <NOTE PLACE="marg"></NOTE> tag:
- missing closing and/or opening tags
- missing = between PLACE and its value
- curly quotes (“ and ”) instead of straight quotes (") around the PLACE attribute value
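A script can take some of the drudgery out of the manual fixes above. What follows is a minimal Python sketch, assuming the Word file has already been saved as plain text and that every pasted copy of the header block is identical to the first (straight quotes, same line breaks). It keeps the first copy of the block, strips the 0x1F characters, and lists suspicious NOTE tags for a person to correct. The function names are hypothetical, not part of any existing tool. Comparing opening and closing tag counts only catches an overall imbalance: a missing opening tag and a missing closing tag elsewhere cancel out, so the flagged list still needs a human read-through.

```python
import re

# Hypothetical: the block that was pasted at the top of every chapter,
# assumed to be identical in each copy (straight quotes, same line breaks).
REPEATED_BLOCK = (
    '<DIV1 TYPE="text">\n'
    '<DIV2 N="1" TYPE="book">\n'
    '<PB N="1" REF="3"/>\n'
    '<HEAD>THE FIRST BOOKE OF A COMMONWEALE.</HEAD>\n'
)


def keep_first_block(text: str, block: str = REPEATED_BLOCK) -> str:
    """Keep the first copy of `block` and delete every later copy."""
    first = text.find(block)
    if first == -1:
        return text
    end = first + len(block)
    return text[:end] + text[end:].replace(block, "")


def strip_invalid_chars(text: str) -> str:
    """Remove C0 control characters that XML 1.0 forbids (everything below
    U+0020 except tab, newline and carriage return), including U+001F."""
    return re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", text)


def flag_note_problems(text: str) -> list[str]:
    """Return human-readable warnings about NOTE tags; nothing is changed."""
    problems = []
    opens = len(re.findall(r"<NOTE\b", text))
    closes = text.count("</NOTE>")
    if opens != closes:
        problems.append(f"{opens} <NOTE> opening tags but {closes} closing tags")
    for match in re.finditer(r"<NOTE\b[^>]*>", text):
        tag = match.group(0)
        # PLACE followed directly by a quote means the '=' is missing.
        if re.search(r'PLACE\s*["“”]', tag):
            problems.append(f"missing '=' in {tag}")
        if "“" in tag or "”" in tag:
            problems.append(f"curly quotes in {tag}")
    return problems
```

The first two functions are safe to run over the whole text; the third only reports, so each flagged NOTE tag is still fixed by hand.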
Correctable via automatic methods:
- Page xx markers should become <PB n="xx"/>
- In several places the page marker read XXX instead of Page XXX
- All chapter DIV3 elements used N="1" rather than the correct chapter number
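The three automatic fixes above are simple regular-expression substitutions. This is a minimal Python sketch, assuming each page marker sits on its own line (either "Page 12" or, for the faulty instances, a bare "12") and that chapter DIV3 tags carry TYPE="chapter" as in the examples; the function names are hypothetical. Treating a bare number on its own line as a page marker is an assumption worth checking against the text before running it, since any other line consisting only of digits would also be converted.

```python
import itertools
import re

# Assumption: page markers sit on their own line, "Page 12" or a bare "12".
# [ \t] is used instead of \s so a match never swallows a newline.
PAGE_LINE = re.compile(r"^[ \t]*(?:Page[ \t]+)?(\d+)[ \t]*$", re.MULTILINE)

# Chapter divisions, e.g. <DIV3 N="1" TYPE="chapter">.
CHAPTER_DIV = re.compile(r'<DIV3\b[^>]*\bTYPE="chapter"[^>]*>')


def page_markers_to_pb(text: str) -> str:
    """Turn 'Page xx' (or a bare page number) lines into <PB n="xx"/>."""
    return PAGE_LINE.sub(lambda m: f'<PB n="{m.group(1)}"/>', text)


def renumber_chapters(text: str) -> str:
    """Rewrite N on each chapter DIV3 as 1, 2, 3, ... in document order."""
    counter = itertools.count(1)

    def renumber(match):
        # Replace only the first N="..." inside this one tag.
        return re.sub(r'\bN="\d+"', f'N="{next(counter)}"', match.group(0), count=1)

    return CHAPTER_DIV.sub(renumber, text)
```

Renumbering simply counts chapter tags in document order, so it should run after the repeated header block has been removed and before any chapters are reordered.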
Steps to convert to TEI-Analytics:
- Saved the Word document as plain text
- Added a wrapping <TEI/> root element
- Corrected the errors above
- Added closing tags for <DIV3> elements
- Replaced & with &amp;
- Lower-cased all element and attribute names
- Changed divX elements to div
- Added the TEI header (teiHeader), text, body, and p elements
- Replaced [A], [B], etc. with <milestone n="A"/> etc.
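Most of the scripted steps above can be chained in one pass. This is a minimal Python sketch, assuming the errors have already been corrected, that chapters are never nested, that the book closes with </DIV2>, and that [A], [B], etc. are always a single capital letter in square brackets; the function names are hypothetical. It adds the <TEI> wrapper last rather than first, because lower-casing every element name after wrapping would turn <TEI> into <tei>. Paragraph (<p>) detection and teiHeader content are not attempted, and the empty <teiHeader/> is only a placeholder: it is not valid TEI as it stands.

```python
import re
import xml.etree.ElementTree as ET


def escape_ampersands(text: str) -> str:
    """Replace bare & with &amp;, leaving existing entity references alone."""
    return re.sub(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)", "&amp;", text)


def close_div3(text: str) -> str:
    """Insert </DIV3> before each later <DIV3 ...> and before </DIV2>.
    Assumes chapters are never nested and the book ends with </DIV2>."""
    out = []
    chapter_open = False
    # The capturing group keeps the split-on tags in the result list.
    for piece in re.split(r"(<DIV3\b[^>]*>|</DIV2>)", text):
        if piece.startswith("<DIV3") or piece == "</DIV2>":
            if chapter_open:
                out.append("</DIV3>")
            chapter_open = piece.startswith("<DIV3")
        out.append(piece)
    if chapter_open:
        out.append("</DIV3>")
    return "".join(out)


TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")


def lowercase_markup(text: str) -> str:
    """Lower-case element names and attribute names, not attribute values."""
    def fix(match):
        attrs = re.sub(r"([A-Za-z][A-Za-z0-9]*)(\s*=)",
                       lambda a: a.group(1).lower() + a.group(2), match.group(3))
        return f"<{match.group(1)}{match.group(2).lower()}{attrs}>"
    return TAG.sub(fix, text)


def to_tei(text: str) -> str:
    """Apply the scripted conversion steps in order."""
    text = escape_ampersands(text)
    text = close_div3(text)
    text = lowercase_markup(text)
    text = re.sub(r"<(/?)div[1-7]\b", r"<\1div", text)           # divX -> div
    text = re.sub(r"\[([A-Z])\]", r'<milestone n="\1"/>', text)   # [A] -> milestone
    # Wrapper added last so lower-casing cannot touch <TEI> or <teiHeader>.
    return "<TEI>\n<teiHeader/>\n<text>\n<body>\n" + text + "\n</body>\n</text>\n</TEI>\n"


def is_well_formed(xml: str) -> bool:
    """True if the standard-library parser accepts the string as XML."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False
```

is_well_formed only confirms that the output parses; it says nothing about validity against the TEI-Analytics schema, which still needs a proper validator.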
Additional XML cleanup needed:
- Fill in teiHeader info (bibl, publication statement, etc.)
- Correct the chapter <HEAD/> text (still the same as the English)
- If the [A], [B], etc. sections are to become part of the citation scheme, they need to be converted to divs