Introduction to Statistics

Before you start trying to figure out which tests to run, I always recommend roughly graphing your data. Graphing also helps you identify which of your columns are categorical and which are continuous.

Using the iris data, here is how I would categorize the data:
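Here is a minimal sketch of that first look in R, assuming the built-in iris data set (150 flowers with four numeric measurements and a Species factor). The name column_types is just my own label for this example, not a base R function. It tags numeric columns as continuous and everything else as categorical.

```r
# iris ships with base R (datasets package), so no install is needed
data(iris)

# Structure overview: "num" columns are continuous, "Factor" columns are categorical
str(iris)

# column_types is a hypothetical name chosen for this example:
# label each column by whether R stores it as a number
column_types <- sapply(iris, function(col) {
  if (is.numeric(col)) "continuous" else "categorical"
})
print(column_types)

# Rough graphs: every pair of columns, coloured by species...
pairs(iris, col = as.integer(iris$Species))
# ...and one continuous variable split by the categorical one
boxplot(Petal.Length ~ Species, data = iris)
```

The four measurement columns come out as continuous and Species as categorical, which matches what the graphs show: scatter clouds for the measurements and distinct groups for the species.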

Before you even collected the data, you would have had a question. Never forget your question! Stats can be overwhelming, and they are even more so if you forget your question. If you need a reminder of what a question, a hypothesis, and a prediction are, check out my post on writing here.

Now that you have your question, picking the statistical analysis is easy! Here is a chart of the most common statistical analyses for each type of data:

This is example data. It is all made up.
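If it helps to see the chart's logic written out, here is a small sketch in R. The function choose_test is hypothetical (it is not from any package), and it assumes the four pairings used in the examples in this post: continuous response and continuous predictor for linear regression, binary response and continuous predictor for logistic regression, continuous response and categorical predictor for ANOVA, and two categorical variables for chi-square. It decides "continuous" purely with is.numeric(), so a response coded 0/1 has to be wrapped in factor() to be treated as categorical.

```r
# Hypothetical helper (not from any package): suggest one of the four tests
# from the types of the response (y) and the explanatory variable (x).
# Assumption: numeric = continuous, anything else (e.g. a factor) = categorical.
choose_test <- function(y, x) {
  y_continuous <- is.numeric(y)
  x_continuous <- is.numeric(x)
  if (y_continuous && x_continuous) {
    "linear regression"
  } else if (!y_continuous && x_continuous) {
    "logistic regression"   # assumes y has exactly two levels
  } else if (y_continuous && !x_continuous) {
    "ANOVA"
  } else {
    "chi-square"
  }
}

# Usage with built-in data sets
print(choose_test(iris$Petal.Length, iris$Sepal.Length))  # two continuous
print(choose_test(iris$Petal.Length, iris$Species))       # continuous ~ categorical
print(choose_test(factor(mtcars$am), mtcars$wt))          # binary ~ continuous
print(choose_test(factor(mtcars$am), factor(mtcars$cyl))) # two categorical
```

This only checks variable types. It is still your question that decides which variable is the response and which is the explanatory one.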

Using the examples above, here is the code for each of the statistical analyses in R:

  • Linear Regression = lm(y ~ x, data = YOUR_DATA)
    • e.g. lm(Rings ~ Years, data = tree_ring_data)
    • lm = linear model
  • Logistic Regression = glm(y ~ x, family = "binomial", data = YOUR_DATA)
    • e.g. glm(Cannibalism ~ Age_months, family = "binomial", data = cannibalism_data)
    • glm = generalized linear model. This function has an extra argument, family, where you tell R that you want the model to be binomial, i.e. logistic, so y must be a two-level (yes/no) variable (look here to see what a logistic graph looks like)
  • ANOVA = aov(y ~ x, data = YOUR_DATA)
    • e.g. aov(CallTime_s ~ Species, data = frog_data)
    • aov = analysis of variance (ANOVA); run summary() on the result to see the F test and p-value
  • Chi-square = chisq.test(YOUR_DATA$y, YOUR_DATA$x)
    • e.g. chisq.test(cat_data$sex, cat_data$colour)
    • chisq.test = chi-square test
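To try the four calls on real numbers, here is a sketch using R's built-in mtcars and iris data sets as stand-ins for the made-up examples. The variable choices are mine, picked only because they have the right types: mpg and wt (weight) are continuous, am (0 = automatic, 1 = manual) and vs (engine shape, 0/1) are binary, and Species is categorical. Wrapping each model in summary() prints the coefficients or F test with p-values.

```r
# Stand-in data: built-in mtcars and iris (not the made-up data from the chart)

# Linear regression: continuous y (mpg) ~ continuous x (weight)
linear_model <- lm(mpg ~ wt, data = mtcars)
print(summary(linear_model))

# Logistic regression: binary y (am, already coded 0/1) ~ continuous x (weight)
logistic_model <- glm(am ~ wt, family = "binomial", data = mtcars)
print(summary(logistic_model))

# ANOVA: continuous y (Petal.Length) ~ categorical x (Species)
anova_model <- aov(Petal.Length ~ Species, data = iris)
print(summary(anova_model))

# Chi-square: two categorical variables (engine shape vs transmission);
# chisq.test() converts the two vectors to factors and cross-tabulates them
chi_result <- chisq.test(mtcars$vs, mtcars$am)
print(chi_result)
```

R fits the logistic model here because am is already coded 0/1; a yes/no column stored as text should be converted with factor() first.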

If you want to read more on these specific tests and how to do them in R, Google is your friend!