High-performance computing (HPC) clusters dramatically reduce the time required to process large problems and complex tasks by connecting many local computers (nodes) via a high-speed interconnect to provide a single shared resource. Such a distributed processing system “allows complex computations to run in parallel as the tasks are shared among the individual processors and memory.”
Screen shot of the Perseus project web site.
When people think of cluster computing, they usually come up with examples from the hard sciences such as physics, mathematics or biology. However, humanities research can also benefit from cluster computing.
Professor Gregory Crane of the Tufts classics department has created the Perseus Digital Library “to make the full record for humanity – linguistic sources, physical artifacts, historical spaces – as intellectually accessible as possible to every human being.” Its flagship collection, under development since 1987, covers the history, literature and culture of the Greco-Roman world.
The Perseus project has been using UIT’s new research HPC cluster for two main purposes:
- parallel text alignment (aligning all of the words in a Latin or Greek text like the Aeneid or the Odyssey)
- training probabilistic syntactic parsers on their treebank data.
The Perseus Project has developed the Latin Dependency Treebank, a 53,143-word collection of syntactically parsed Latin sentences. Currently in version 1.5, the treebank comprises excerpts from eight authors, including Caesar, Cicero and Vergil.
David Bamman, a Senior Researcher in Computational Linguistics for the Perseus Project, explains how he benefitted from using UIT’s new research HPC cluster for parallel text alignment and training probabilistic syntactic parsers:
“Both of these are computationally expensive processes – even aligning 1M words of Greek and English takes about 8 hours on a single-core desktop, and for my end result, I need to do this 4 separate times. Using a multithreaded version of the algorithm (to take advantage of each cluster computer’s 8 cores), has let me scale up the data to quantities (5M words) that I simply could not have done on our existing desktop computers.”
At about 8 hours per run, the four alignments add up to roughly 32 hours on a desktop. On UIT’s new research HPC cluster, a task that took 32 hours on a desktop took only 9 minutes, a speedup of more than 200-fold (1,920 minutes versus 9).
Most importantly, David found that using the cluster environment to run multiple instances of these algorithms in parallel greatly helped in testing optimization parameters for both tasks. For the text alignment task, he was able to run four alignments simultaneously. Therefore, using the cluster let him work not just faster but more accurately as well.
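For readers curious what running several instances with different parameters looks like in practice, the following is a minimal sketch in Python, not the Perseus Project's actual code. The scoring function `toy_alignment_score` and its `smoothing` parameter are hypothetical stand-ins for a real alignment run, and a thread pool stands in for the cluster; real CPU-heavy runs would more likely be submitted as separate processes or cluster jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_alignment_score(smoothing: float) -> float:
    """Hypothetical stand-in for one alignment run.

    Returns a made-up quality score that peaks at smoothing = 0.1.
    A real run would align the texts and evaluate the result.
    """
    return 1.0 - abs(smoothing - 0.1)


def sweep(settings, max_workers=4):
    """Run one job per parameter setting concurrently.

    Returns the best-scoring setting and a dict of all scores.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `settings`
        scores = list(pool.map(toy_alignment_score, settings))
    results = dict(zip(settings, scores))
    best = max(results, key=results.get)
    return best, results


if __name__ == "__main__":
    best, results = sweep([0.01, 0.1, 0.5, 1.0])
    print("best setting:", best)
    print("all scores:", results)
```

Because all four settings are evaluated side by side, the best one can be chosen from a single batch of runs rather than from four runs carried out one after another.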
Rebecca Sholes, Senior Faculty Development Consultant, UIT Academic Technology