Brandeis Design and Innovation

Analyzing Patterns in Texts

Imagine you’re reading a book, and you notice that one character is cheerful and upbeat, while the second character is pessimistic and never has anything positive to say. If you wanted to share your observation, you might write a paper that integrates quotations and paraphrased examples from the text. That’s text analysis.

What if you wanted to compare 50+ texts? Or count the number of times certain phrases were said? You could do it on your own, but it would take a long time.

Digital text analysis can help. It uses computer scripts to “read” a text and identify patterns. It provides a range of outputs, including numerical counts of particular words or phrases and the identification of positive/negative language (i.e. sentiment analysis). With additional temporal and/or geographic data, we can even identify how language trends changed over time and space.
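
As a rough illustration of the counting described above, here is a minimal Python sketch that uses only the standard library. The sample sentence, the function names (tokenize, word_counts, phrase_count), and the simple rule for splitting text into words are all assumptions made for this example; they are not taken from any toolkit named in this guide.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase words (ASCII letters and straight apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def word_counts(text):
    """Count how often each word appears in the text."""
    return Counter(tokenize(text))

def phrase_count(text, phrase):
    """Count occurrences of a multi-word phrase, ignoring case and punctuation."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return 0
    # Slide a window of n words across the text and compare it to the phrase.
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)

# Illustrative sample sentence, not a real dataset.
sample = "It was the best of times, it was the worst of times."
counts = word_counts(sample)
print(counts["times"])                     # how often "times" appears
print(phrase_count(sample, "it was the"))  # how often the phrase appears
```

A script like this handles one short passage; the ready-made toolkits apply the same kind of counting across much larger collections.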

There are a ton of ready-made toolkits for text analysis. This guide shares some good toolkits to try, tutorials, plus tips on preparing your dataset. 

Tools like Constellate, Scopus, and HathiTrust Digital Library entail searching for published articles/reviews and analyzing the results. Users aren’t required to provide a dataset; instead, they pull texts from a database. These tools are great for learning the basics of text analysis and can be great hands-on tools for the classroom. Learn the basics of Constellate and Scopus with this tutorial.

Unlike the tools above, which pull texts from databases like JSTOR, a number of other tools allow users to upload their own dataset. Datasets can be any type of text – whole books (of any genre), scripts, etc. You could create a dataset by distributing a survey to respondents, or by scraping (i.e. pulling from a specific website) Tweets or product/place reviews.

Prepping this dataset will take some time, and specific formatting decisions will depend on 1) what tool you’re using and 2) your research questions/intended analyses. Preparing the dataset is called “cleaning”. You can learn the basics of cleaning text data, but be prepared for future tweaks once you start analyzing.
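
To give a sense of what cleaning can involve, here is a minimal Python sketch, assuming a plain-text dataset. The function name clean_text and the specific steps (lowercasing, converting curly quotes to straight ones, removing punctuation, collapsing whitespace) are illustrative assumptions; the right steps depend on your tool and your research questions.

```python
import re
import string

def clean_text(raw):
    """Lowercase, straighten curly quotes, strip ASCII punctuation, collapse whitespace."""
    text = raw.lower()
    # Convert curly quotes to straight ones so they are removed with other punctuation.
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    # Delete every character listed in string.punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Replace any run of spaces, tabs, or newlines with a single space.
    return re.sub(r"\s+", " ", text).strip()

# Illustrative messy input, not a real dataset.
raw = "  \u201cHello,\u201d   she said.\nIt\u2019s a FINE day!  "
print(clean_text(raw))
```

Note that removing apostrophes turns “It’s” into “its”. This is the kind of decision you may need to revisit once you start analyzing.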

Before you can choose an analytical tool/platform, you should ask yourself some questions:

  1. Am I analyzing trends within a single text, or am I interested in comparing texts? Check out the examples below to figure this out:
    • Single Text Example: What are the twenty most popular terms in Pride and Prejudice?
    • Comparing Texts Example: Does Jane Austen’s vocabulary expand over the course of her six major novels?
  2. Make a list of some analytical questions you want to answer. Make sure it’s possible to answer those questions with the data you have.
    • Am I interested in change over time? Does my dataset record any dates/times?
    • Do I want to show change across space? Are there locations in my dataset?
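
The questions above can be turned into small computations. The following Python sketch is one hedged way to do so: top_terms addresses a single-text question, and vocabulary_size, applied to texts sorted by date, addresses a comparing-texts question. The function names, the placeholder titles and years, and the tiny sample texts are all hypothetical; a real dataset would need its own dates recorded for the time comparison to work.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase words (ASCII letters and straight apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def top_terms(text, n=20):
    """Single-text question: return the n most frequent words with their counts."""
    return Counter(tokenize(text)).most_common(n)

def vocabulary_size(text):
    """Comparing-texts question: count the distinct words used in a text."""
    return len(set(tokenize(text)))

# Hypothetical placeholder dataset: (title, year, text). Not real data.
texts = [
    ("Text B", 1910, "the quick brown fox jumps over the lazy dog"),
    ("Text A", 1900, "the cat sat on the mat"),
]

print(top_terms(texts[1][2], n=1))

# Sort by year so any change over time is read in chronological order.
for title, year, text in sorted(texts, key=lambda row: row[1]):
    print(year, title, vocabulary_size(text))
```

Raw vocabulary size grows with text length, so a fair comparison between texts of different lengths would need some form of normalization.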

With a clean dataset and a list of some analytical questions, you can make an informed decision about what tool/platform to choose.

The chart below summarizes key features across a few text analysis platforms. Click here for an accessible PDF version. You may need to experiment with a few to ensure the platform meets your needs.

If you’re curious about whether text analysis is appropriate for your work, I recommend looking at Voyant and Orange first. Voyant is a great option for beginners and you can learn it quickly using this tutorial. Orange has a steeper learning curve but is great for beginners who need a powerful interface with a wide array of functions. 

If you want to get an in-depth understanding of text analysis in an instructional setting, Dr. Margarita Corral teaches an entry-level workshop, Text Analysis using R (R is a coding language). If you are working on any projects using political data, survey responses, or any other type of Social Sciences data, it’s highly recommended that you reach out to Dr. Corral to hear all of the options and best practices in your field.

 

| Tool | Cost | Compare | Sentiment | Quantitative | Frequency | Time | Networks | Spatial | Learn |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Voyant (recommended) | Free | No | No | No | Yes | No | No | No | Voyant Tutorial |
| Orange (recommended) | Free | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Orange Learning Resources |
| R (recommended, especially for Social Sciences data) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Brandeis Library Guide about R |
| Atlas.ti (recommended, especially for Social Sciences data) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Brandeis Library Guide about Atlas.ti |
| ConText | Free | No | Yes | No | Yes | No | Yes | No | ConText Learning Resources |
| Palladio | Free | No | Yes | Yes | Yes | Yes | Yes | Yes | Palladio Learning Resources |
| Meaning Cloud | Limited Free Analyses | Yes | Yes | No | Yes | No | Yes | No | Meaning Cloud Learning Resources |
| Nocode Functions | Free | Yes | Yes | No | Yes | Yes | Yes | Yes | |
| WordStat | Paid | Yes | Yes | No | Yes | Yes | Yes | No | |
| Sci2 | Free | No | Yes | Yes | Yes | No | Yes | Yes | Sci2 Learning Resources |
| Textal (App) | Free | No | Yes | No | Yes | No | Yes | No | |