This post was originally scheduled for publication last Friday but was delayed due to weather.

Stata: The New Norm in Academic Statistics

Spring semester brings heightened pressure for freshmen, sophomores, and juniors to find lucrative internships, and for seniors about to enter the “real world” to land full-time employment. One piece of advice commonly heard from human resources representatives is to have an understanding of statistics, no matter your field.


Thankfully, classes can often help students build this skill. One of the core programs professors use to teach students how to manipulate and understand statistical data is Stata. In this post, I will provide a brief introduction to Stata’s uses and capabilities while touching upon a few of its flaws.

According to Stata’s website, the program supports data management, graphic displays, modeling, regressions, and much more. In my experience, Stata does all of these exceptionally well. For example, if you would like to determine whether levels of atmospheric carbon dioxide have any relationship to surface temperature, you simply import your data and run a correlation. Stata will do all of the math for you, and can provide graphic displays of your information. Your graph might look something like this:

Of course, Stata can handle much more complex datasets with multiple variables, time scales and output measures. One example on the program website involves the predicted miles-per-gallon of both foreign and domestic cars as it relates to the vehicle’s weight. Plotting that data could give you a graph like this:

The most useful attribute of the program, in my opinion, is its intuitive command structure. If I want to calculate the correlation coefficient between two variables, for instance, I just type “corr variable1 variable2” (without the quotation marks). Want to find a summary of your data, with the mean and standard deviation? Simply type “summarize variable1”.
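To make those commands concrete, here is a minimal sketch you could type into Stata’s command window. It assumes the “auto” example dataset that ships with Stata, and the variables mpg, weight, and foreign come from that dataset rather than from anything in this post. With your own data you would substitute your own variable names.

```stata
* Load the example car dataset bundled with Stata (replaces any data in memory)
sysuse auto, clear

* Summary statistics (observations, mean, standard deviation, min, max)
summarize mpg weight

* Correlation coefficient between fuel economy and vehicle weight
* ("corr" is an accepted abbreviation of "correlate")
corr mpg weight

* Scatter plot of mpg against weight, with separate panels for
* domestic and foreign cars
twoway scatter mpg weight, by(foreign)
```

Each line is a complete command, so you can run them one at a time and look at the output before moving on.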

Though I have only been using Stata for a short period of time, I have noticed a few shortcomings. First, the spreadsheet that Stata provides to edit and manipulate data lacks the functionality of an Excel or OpenOffice spreadsheet, so users often run both programs simultaneously. Additionally, the offline product lacks a basic tutorial, so someone just starting out might have trouble at first. That being said, it has a vast command glossary, so if you know a command’s name but do not understand what it does, you can easily look it up.
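As a rough illustration of working around both issues, the sketch below looks up a command in Stata’s built-in help and then pulls in data that was edited in a spreadsheet program. The file name “mydata.xlsx” is hypothetical, so replace it with the path to your own workbook.

```stata
* Open the built-in help file for a command you already know by name
help correlate

* Search the documentation by keyword when you do not know the command name
search correlation

* Import a spreadsheet edited elsewhere; "mydata.xlsx" is a hypothetical file.
* firstrow treats the first row as variable names; clear replaces data in memory.
import excel using "mydata.xlsx", firstrow clear
```

Editing in a spreadsheet and importing the result is only one possible workflow, but it avoids relying on Stata’s own data editor for heavy cleanup.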

No matter what career field students decide to pursue (engineering, politics, academia), a basic knowledge of statistics will be vital to their success. Do computer programs like Stata teach statistics? Unfortunately not, but they allow students to apply their classroom knowledge to real data and create an experiential learning opportunity. Having that skill can only help them in their future endeavors.


