Distant Reading in R: Analyse the Text & Visualize the Data

Simone Rebora & Giovanni Pietro Vitali

1.0 The Workshop 

Distant reading is one of the best-known methodological approaches in the digital humanities, and it has been in constant use since Franco Moretti formalised it in the article "Conjectures on World Literature" (2000). Distant reading benefits greatly from the use of computational tools. For this reason, we are proposing a course based on R, one of the most popular programming languages in the scientific community today.

The course is suitable for beginners who want to start digital humanities training with a complete overview of the most common tools used for distant reading. 

The philosophy of the course is "analyse the text & visualize the data", and the course is structured around these two parts.

The objective of the course is to provide participants with methodological and practical tools that they can use in their own research. At the end of the two weeks, they will be able to use R and RStudio to carry out textual and spatial analysis. The results of an analysis in R can easily be presented graphically, for example as graphs, trees, or maps. For this reason, part of the course will be dedicated to open-source programs such as Gephi, GIMP and Inkscape, which are suited to reworking vector and graphics files.

2.0 Schedule 

The course takes place over two weeks so that participants can choose to attend one or both parts. However, participation in the entire course is strongly advised.

The first week is dedicated to the basics of R and natural language processing, to three of the most common methods used for distant reading (sentiment analysis, topic modeling, and stylometry), and to a brief introduction to machine learning. The objective of this first week is to provide a basic theoretical and methodological understanding of distant reading techniques, together with the practical tools to analyse texts in an R environment.
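To give a flavour of what analysing a text in an R environment can look like, here is a minimal sketch that counts word frequencies using base R only. The sample sentence and the variable names are hypothetical and chosen purely for illustration; this is not taken from the course material, and real projects would typically rely on dedicated text-analysis packages.

```r
# Hypothetical sample text, used only for illustration
sample_text <- "The cat sat on the mat. The dog sat on the log."

# Lower-case the text and split it on runs of non-letter characters
tokens <- strsplit(tolower(sample_text), "[^a-z]+")[[1]]

# Drop any empty strings left over from the split
tokens <- tokens[tokens != ""]

# Count each word and sort from most to least frequent
freq <- sort(table(tokens), decreasing = TRUE)

# Show the most frequent words
print(head(freq, 3))
```

A simple regular expression is used here instead of a proper tokenizer, so this approach only suits plain English text without accented characters.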

The second week is dedicated to data visualization. In this module, participants will focus on mapping, network analysis and graphics. The objective of this week is to give participants the tools to organise the visualisation of data graphically, chronologically, and spatially. Participants who are interested in the second week only are assumed to have more than a basic knowledge of the R programming language.

At the beginning of the course, the workshop leaders will divide the class into two groups according to the participants' research interests. Each group will carry out a small research project, using one of the methodologies introduced during the week, and present it on the last day of the workshop.

Week 1: Analyse the text

| | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 |
|---|---|---|---|---|---|
| 1st hour | Introduction to the course | Natural Language Processing | Sentiment analysis | Machine learning | Topic modeling |
| 2nd hour | Introduction to R and RStudio | Natural Language Processing | Sentiment analysis | Machine learning | Topic modeling |
| 3rd hour | Introduction to R and RStudio | Natural Language Processing | Sentiment analysis | Stylometry | Hands-on |
| 4th hour | Introduction to R and RStudio | Natural Language Processing | Sentiment analysis | Stylometry | Hands-on |

Week 2: Visualize the data

| | Day 6 | Day 7 | Day 8 | Day 9 | Day 10 |
|---|---|---|---|---|---|
| 1st hour | Network analysis (Gephi) | Network analysis | Named-entity Recognition | Mapping | Mapping |
| 2nd hour | Network analysis (Gephi) | Network analysis | Named-entity Recognition | Mapping | Mapping |
| 3rd hour | Network analysis (Gephi) | Inkscape & Gimp | Mapping (Coordinates) | Mapping | Mapping |
| 4th hour | Network analysis (Gephi) | Inkscape & Gimp | Mapping (Coordinates) | Mapping | Mapping |

3.0 Technical Requirements 

  1. Participants should have their own computer with at least 5-10 GB of available disk space.
  2. Operating system: Windows (preferably 7 or later), Linux or Mac OS X.
  3. Java 8 for your operating system. You may need to create an Oracle account to download Java 8.
  4. A zip/unzip program to manage compressed folders (most computers have one by default; 7-Zip and WinZip are examples for Windows).
  5. Browser: Mozilla Firefox and Google Chrome.
  6. A simple text editor for txt and csv files, such as Sublime Text 3 (available for Windows, Linux and Mac).
  7. Google account
  8. RStudio and XQuartz (the latter for Mac only)
  9. OpenOffice
  10. Gephi
  11. Inkscape
  12. GIMP