Quick Tips On Writing with Statistics
This handout explains how to write with statistics including quick tips, writing descriptive statistics, writing inferential statistics, and using visuals with statistics.
Last Edited: 2010-04-21 07:46:27
1. Never calculate or use a statistical procedure you don't fully understand. If you need a statistical procedure, and you don't understand it, then you need to consult someone or learn how to do it properly.
2. Never attempt to interpret the results of a statistical procedure you don't fully understand. If you need to interpret a particular statistic, talk with a professional statistician and make sure you understand the proper interpretation. Unlike descriptive statistics, inferential statistics is anything but black and white. There may be several valid interpretations of a given statistic, and you need to be aware of which ones are better under which circumstances.
3. If you are using statistics in a paper, consider your audience. Will they understand the statistics you are using? If not, you may need to explain the procedure that you are using in detail. This is not inappropriate! It is better to include too much information than too little. Depending on your field, this may be done using an appendix, footnotes, or directly in the text.
4. Present as much information as needed so that your reader can make his or her own interpretation of your data. Certainly, your job is to help them interpret your data, but most statistics are used to support a persuasive argument. You need to give your reader enough information that they can reconstruct your argument from your statistics. If you don't give enough information, people will think that you are being deceptive, which can damage your credibility. You can't convince someone of anything if they are convinced that you are misleading them!
5. Use graphics and tables. Statistics can contain a lot of information, and visuals can display that information in a manner that can be quickly understood. See the section on visuals and statistics for more information.
6. If it's applicable, and you can calculate it, do include some measure of variability; typically this is a standard deviation. Even if you aren't doing any inferential statistics, this statistic provides excellent information about your data set.
7. Be wary of using statistics from other places that are not peer-reviewed. Popular magazines are notorious for including bad statistics. Oftentimes their 'sample' is a group of people who chose to respond to some online query. Their sample often includes mostly women or mostly men (depending on the magazine) but rarely do they have a good representation from both genders, and many times the magazines imply that the results generalize to the entire population. While some might, many do not. If it's not from a reliable source, then don't use it.
8. Speaking of sources, if you used a statistic, you need to provide your audience with additional information including where the statistic came from. You should be wary of statistics that seem to appear out of nowhere.
- A poor example: The ten largest cities in the U.S. comprised 54% of the total U.S. population.
- A good example: According to the United States Census Bureau, in 2000, the ten largest cities in the U.S. comprised 54% of the total U.S. population.
In the second example, your audience knows exactly where the statistic comes from (if they don't believe your statistic, they can go and check themselves) and it comes from a reputable source (the U.S. Census Bureau).
9. If you calculated a statistic, how did you calculate it? In some fields, you don't need to tell your readers how you calculated some statistics. For example, in psychology, you don't need to explain how you did an ANOVA or a t-test, but in other areas you might need to explain this in more detail.
10. Be clear as to what population(s) your statistic is meant to generalize to. If your sample included only male college students, you should be very careful if you want to generalize your results to female lawyers. Don't imply that your sample generalizes to everyone if your statistic was calculated from a more specific population.
11. If you are using inferential statistics, try to speak as plainly as possible, and put the statistics at the end of the sentence. See the Writing Inferential Statistics section for more information.
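Tips 6 and 11 can be illustrated with a short sketch. The Python below is only one possible way to do it, and the exam scores are invented for illustration; the function name `describe` is hypothetical. It reports a mean together with a standard deviation and places the statistics at the end of a plain sentence.

```python
import statistics

# Hypothetical exam scores, invented purely for illustration.
scores = [72, 85, 90, 66, 78, 95, 88, 70]

def describe(label, values):
    """Return a plain sentence with the statistics placed at the end."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return f"{label} (M = {m:.1f}, SD = {s:.1f}, n = {len(values)})."

print(describe("Students scored about 80 points on the exam", scores))
```

Reporting the sample size and the standard deviation next to the mean gives readers what they need to judge how spread out the data are.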
I get asked this question fairly often, so I thought I would do a few posts on it. The most common problem is that a student who is new to statistics has no idea where to even start.
These examples use SAS but you could use any package you like.
My recommendation to students beginning to learn statistics is to start with some type of publicly available data set, getting some experience with real data.
1. IDENTIFY THE VARIABLES YOU HAVE AVAILABLE
The first thing to do is examine the contents of the dataset. Look at the variables you have available. With SAS, you would do this with PROC CONTENTS.
Your program at this point is super simple:
LIBNAME mydata "path to where your data are" ;
PROC CONTENTS DATA = mydata.datasetname ;
RUN ;
Normally, you would come up with a hypothesis first and then collect the data. The advantage of working with public use data sets is you don’t have to go to the time and expense of interviewing 40,000 people. The disadvantage is that you are limited to the variables collected.
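If you are not using SAS, the same first look at your variables can be sketched in Python. This is a rough stand-in for PROC CONTENTS, not a replacement for it. The rows below are invented, the helper name `list_variables` is hypothetical, and only the column names are borrowed from the code in this post.

```python
import csv
import io

# Invented rows standing in for a real extract; only the column names
# (race, obese, srsex, srage_p) are taken from the code in this post.
SAMPLE_CSV = """race,obese,srsex,srage_p
White,0,Female,61
Black,1,Male,47
Hispanic,0,Female,
"""

def list_variables(text):
    """Return (name, non-missing count, guessed type) for each column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    summary = []
    for name in reader.fieldnames:
        values = [r[name] for r in rows if r[name] != ""]
        # Crude type guess: treat a column as numeric if every value parses as a number.
        is_numeric = all(v.replace(".", "", 1).lstrip("-").isdigit() for v in values)
        summary.append((name, len(values), "num" if is_numeric else "char"))
    return summary

for name, n, kind in list_variables(SAMPLE_CSV):
    print(f"{name:10s} {kind:5s} non-missing = {n}")
```

With a real file you would pass the file contents instead of `SAMPLE_CSV`. The point is the same as in SAS: know what variables you have before you write a hypothesis.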
2. GENERATE A HYPOTHESIS
Looking at the California Health Interview Survey data, I came up with the following null hypothesis:
There is no difference in obesity among Caucasians, African-Americans and Latinos.
3. RUN DESCRIPTIVE STATISTICS
You need descriptive statistics for three reasons. First, if you don’t have enough variance on the variables of interest, you can’t test your null hypothesis. If everyone is white or no one is obese, you don’t have the right dataset for your study. Second, you are going to need to include a table of sample statistics in your paper. This should include standard demographic variables – age, sex, education, income and race are the main ones. Last, and not necessarily least, descriptive statistics will give you some insight into how your data are coded and distributed.
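/* Frequency distributions for the categorical variables */
proc freq data = mydata.coh602 ;
tables race obese srsex aheduc ;
where race ne "" ;
run ;
/* N, mean and standard deviation for the numeric variables */
proc means data = mydata.coh602 ;
var ak22_p srage_p ;
where race ne "" ;
run ;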
You can see the results from the code above here.
Notice something about the code above: the WHERE statement. My hypothesis only mentioned three groups: Caucasians, African-Americans and Latinos. Those were the only three groups that had a value for the race variable. (This example uses a modified subset of the CHIS, if you are really into that sort of thing and want to know.) Since that is the population I will be analyzing, I do not want to include people who don't fall into one of those three groups in my computation of the frequency distributions and means.
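For readers following along in another package, here is a minimal Python sketch of the same idea: filter to the analysis population first, then compute frequencies and means. The records are invented and do not come from CHIS, and `frequency_table` is a hypothetical helper. Only the variable names mirror the SAS code.

```python
from collections import Counter
import statistics

# Invented records; an empty race string marks a respondent outside
# the three groups named in the hypothesis.
records = [
    {"race": "White", "obese": 0, "srage_p": 61},
    {"race": "Black", "obese": 1, "srage_p": 47},
    {"race": "", "obese": 0, "srage_p": 35},
    {"race": "Hispanic", "obese": 1, "srage_p": 52},
    {"race": "White", "obese": 0, "srage_p": 70},
]

# Same role as WHERE race NE "": keep only the analysis population.
analysis = [r for r in records if r["race"] != ""]

def frequency_table(rows, variable):
    """Map each value of a variable to (count, percent of non-excluded rows)."""
    counts = Counter(r[variable] for r in rows)
    total = sum(counts.values())
    return {value: (n, round(100 * n / total, 1)) for value, n in counts.items()}

race_freq = frequency_table(analysis, "race")
ages = [r["srage_p"] for r in analysis]

print(race_freq)
print(f"age: N = {len(ages)}, mean = {statistics.mean(ages):.1f}, SD = {statistics.stdev(ages):.1f}")
```

Because the filter is applied before both summaries, the percentages and the mean describe the same set of people, which is what the WHERE statement guarantees in the SAS version.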
4. PUT TOGETHER YOUR FIRST TABLE
Using the results from your first analysis, you are all set to write up your sample section, like this:
The sample consisted of 38,081 adults who were part of the 2009 California Health Interview Survey. Sample demographics are shown in Table 1.
<Then you have a Table 1>
Variable        N        %
- Black         2,181    5.7
- Hispanic      4,926   13.0
- White        30,974   81.3
- Male         15,751   41.4
- Female       22,330   58.6

Variable    N        Mean      SD
Age         38,081   55.4      18.0
Income      37,686   $69,888   $63,586
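If you would rather not type table rows by hand, the N and % columns can be produced from the counts. The sketch below is one assumed way to do it in Python, using the sex counts reported in Table 1; the helper name `table_rows` and the column widths are my own choices for illustration.

```python
# Counts for the sex variable as reported in Table 1.
sex_counts = {"Male": 15751, "Female": 22330}

def table_rows(counts):
    """Format each category as a row: label, N with thousands separator, percent."""
    total = sum(counts.values())
    return [
        f"- {label:<8}{n:>8,}{100 * n / total:>7.1f}"
        for label, n in counts.items()
    ]

for row in table_rows(sex_counts):
    print(row)
```

Computing the percentages from the counts, instead of copying them from the output, also makes it easy to check that the categories add up to the sample size you report in the text.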
I’ll try to write more soon, but for now The Invisible Developer is pointing out that it is past 1 a.m. and I should get off my computer.