The Viral Texts project examines pre-Civil War American newspapers to see what made texts “go viral,” or get widely reprinted, during the nineteenth century. This visualization shows reprints shared among newspapers from 1836 to 1860.
Sample Data Sets
- Internet Archive
- Project Gutenberg
- Google Books
- HathiTrust (Hathi Download Helper)
- NLP-Ready texts
- Early Modern Texts Datasets
- List of Tufts-accessible collections of texts
NOTE: Contact Martha Kelehan, Associate Director of Tisch Library, before embarking on a large-scale text analysis project. She’ll put you in touch with the librarians who will help negotiate license agreements to give you permission to do web scraping and text analysis on collections of texts.
- Voyant Tools – word frequencies, concordance, word clouds, visualizations
- TAPoRware – directory of data cleaning, annotating, and analysis tools
- Lexos – easy web-based tool for visualizing text files, includes clickable preprocessing settings
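To give a sense of what tools like these compute, here is a minimal Python sketch of two of the operations named above: word frequencies and a keyword-in-context concordance. It is an illustration only, not how Voyant Tools, TAPoRware, or Lexos work internally. The function names (`tokenize`, `word_frequencies`, `concordance`), the simple regular-expression tokenizer, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, top_n=5):
    """Return the top_n most common words as (word, count) pairs."""
    return Counter(tokenize(text)).most_common(top_n)


def concordance(text, keyword, width=2):
    """Return each occurrence of keyword with `width` words of context on each side."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical sample text, made up for this illustration.
sample = "The press reprinted the poem, and the poem spread from paper to paper."

print(word_frequencies(sample, top_n=3))
print(concordance(sample, "poem"))
```

For a real project, preprocessing choices such as removing stop words or normalizing punctuation will change the counts, which is why a tool like Lexos exposes them as settings.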