Learning Text Processing

How can I learn more about text processing and NLP?

  • General NLP
    • This is an excellent collection of lecture slides from the Accelerated Natural Language Processing course at the University of Edinburgh. These cover all of the most important subfields of NLP in a thorough yet concise way. This material is easier to digest if you have some solid math and probability theory skills under your belt.
    • This book (yes, an actual book) has served, and continues to serve, as the first point of reference for many scholars working in NLP. It is the place to look for an in-depth explanation of the theory behind every subfield of NLP and its applications, covering both speech and text processing. It is available at Tisch Library. The draft of the third edition (which will contain extra material on machine translation and chatbots, among other topics) is available online.
  • Basic Text Processing
    • Here is a video introduction and accompanying code for NLTK (a Python library for text processing) and basic text processing/visualization.
    • This is a really awesome tool for exploratory text analysis that doesn’t require any coding! You can read more about the tool and the reasons for using it here.
    • Voyant and NVivo are also good tools for exploratory analysis and for generating simple visualizations without writing any code. Be aware that NVivo can take some getting used to because of its rather complex interface. Voyant, by contrast, is very easy to start using, but it doesn't really let you "look under the hood" to see how it works.
    • If you prefer to use R, you can check out this tutorial for text analysis.
  • Computational Semantics
    • This is an awesome guide with an accompanying notebook to get you started with sentiment analysis, assuming you have a solid grasp of basic Python.
    • Here is a walkthrough of how to train word embeddings in Python using word2vec.
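NLTK bundles tools for this kind of basic processing, such as `nltk.word_tokenize` and `nltk.FreqDist`. The sketch below shows the same idea using only Python's standard library, so there is nothing to install. It lowercases a text, splits it into word tokens with a regular expression, drops a small stopword list and counts the remaining words. The regex, the `STOPWORDS` set and the function names `tokenize` and `top_words` are illustrative assumptions. They are not part of NLTK or of the tutorials listed above.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list for illustration only;
# NLTK's stopwords corpus is far more complete.
STOPWORDS = {"the", "a", "and", "of", "to", "is", "in", "it"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (keeping contractions)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def top_words(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent non-stopword tokens with their counts."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    # Counter.most_common keeps ties in the order they were first seen.
    return Counter(tokens).most_common(n)


if __name__ == "__main__":
    sample = "The cat sat on the mat. The cat is happy, and the mat is red."
    print(top_words(sample, 3))
    # → [('cat', 2), ('mat', 2), ('sat', 1)]
```

A regex tokenizer like this is a rough approximation. Real tokenizers, including NLTK's, handle punctuation, numbers and non-English text far more carefully.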
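The core idea of word2vec's skip-gram variant is that each word is trained to predict its neighbors within a fixed-size context window. The sketch below is a simplification that illustrates two pieces around that idea, not the training itself. First, it builds the (target, context) training pairs from a list of tokens. Second, it compares two vectors with cosine similarity, the usual way to measure how close two embeddings are. Actual training is left to a library such as gensim (its `Word2Vec` class), and the walkthrough above may use a different setup. The function names and the toy three-dimensional vectors are hypothetical, invented for illustration.

```python
import math


def skipgram_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    """Pair each token with every other token at most `window` positions away.

    These (target, context) pairs are the kind of training examples that
    word2vec's skip-gram variant learns from.
    """
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


if __name__ == "__main__":
    print(skipgram_pairs(["the", "cat", "sat"], window=1))
    # → [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]

    # Hypothetical "embeddings", made up for illustration only.
    cat = [0.9, 0.1, 0.3]
    dog = [0.8, 0.2, 0.35]
    car = [0.1, 0.9, 0.0]
    print(cosine_similarity(cat, dog) > cosine_similarity(cat, car))  # → True
```

In a trained model, words that appear in similar contexts end up with vectors that point in similar directions. This is why cosine similarity is the usual yardstick for comparing them.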

And don’t forget to check out the workshops available through the DataLab! There will be an introductory text analysis workshop as well as a sentiment analysis workshop in the Fall semester.