Statistics and Programming Resources

There is a wealth of resources out there for learning statistics and programming. I would specifically recommend Kaggle.com if you are looking to get your hands on some easy datasets for analysis projects. Tufts also has a great set of database subscriptions accessible to students here. In addition, we’ve compiled some programming and statistics resources that our members have found useful themselves. We will keep updating this list with any new resources that we find.


Tufts Data Lab Resources

The Tufts Data Lab has a huge compilation of resources for many different statistical software packages and data visualization programs, including R, Python, SAS, STATA, SPSS, Mathematica, MATLAB, Excel, Tableau, and NVivo, as well as for High Performance Computing.

R/RStudio Resources
R for Data Science by Hadley Wickham and Garrett Grolemund
An excellent introductory book for those looking to learn the basics of R, write reproducible code, wrangle data using the tidyverse, visualize and explore data, and build models in R
Big Book of R by Oscar Baruffa
A passion-project compilation of links to books and other resources pertaining to R. If you have a question on a specific topic in R programming and/or statistics, this book likely has a resource that can help
STHDA by Alboukadel Kassambara
A website focused on R, providing tutorials on data visualization, summary statistics, hypothesis testing, and more advanced topics like clustered data analysis, survival analysis, and certain machine learning algorithms
Advanced R by Hadley Wickham
A great book for those looking to dive deeper into R and learn more about data structures, functions, functional programming, and the harder programming side of R
STATA Resources
Regression Methods in Biostatistics by Vittinghoff et al.
This is the course book for Statistical Methods II at the Friedman School and has a lot of very helpful example code in STATA. In addition, it takes you through many basic statistics concepts that every biostatistician should know
SAS Resources
The Little SAS Book, 6th edition
The course book for Intro to SAS at Friedman, this short textbook is an excellent guide for anyone just starting out with programming in SAS. It covers reading in data, data wrangling, merging and concatenating datasets, and outputting data in neat formats for reports. This book can be found through the Hirsh online library.
Tufts DISC
The Data Intensive Studies Center at Tufts
The Data Intensive Studies Center at Tufts hosts regular short classes on topics like regression, Bayesian analysis, probability and machine learning, and deep learning. DISC tends to focus on Python programming, as Python is one of the most prominent data science languages in use today.