Before you start trying to figure out what tests you should do, I always recommend roughly graphing your data. This will also help you identify which of your columns are categorical or continuous.
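As a minimal sketch of what "roughly graphing" could look like, the snippet below uses the iris data that ships with R. The choice of plots (a scatterplot matrix and a boxplot) is only an assumption about what is useful here, not a fixed recipe.

```r
# Scatterplot matrix of the four measurement columns, coloured by species
# (as.integer turns the three species into colour codes 1, 2, 3)
pairs(iris[, 1:4], col = as.integer(iris$Species))

# Boxplot of one measurement split by the grouping column
boxplot(Sepal.Length ~ Species, data = iris)
```

Columns that spread out along an axis are continuous, while a column that only splits the data into a few groups is categorical.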
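Using the iris data, here is how I would categorize the data: the four measurement columns (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) are continuous, and Species is categorical.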
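If you would rather have R tell you, one quick check is to ask for the class of each column. This is a small sketch using the built-in iris data, and `col_types` is just a name picked for this example.

```r
# Class of each column: "numeric" columns are continuous,
# "factor" columns are categorical
col_types <- sapply(iris, class)
print(col_types)

# str() gives a similar overview, plus the first few values of each column
str(iris)
```

For iris, the four measurement columns come back as numeric and Species comes back as a factor.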
Before you even collected the data, you had a question. Never forget your question! Stats can be overwhelming, and even more so if you forget your question. If you need a reminder of what a question, hypothesis, and prediction are, check out my post on writing here.
Now that you have your question, picking out the statistical analysis is easy! Here is a chart of the most common statistical analyses, given the type of data you have (categorical or continuous):
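The same decision can be sketched as a small helper that looks at whether the response (y) and the predictor (x) are categorical or continuous. Note that `suggest_test` is a hypothetical function written for this illustration, not something built into R, and it only covers the four tests discussed in this post.

```r
# Hypothetical helper: suggests a test from the types of y (response)
# and x (predictor). Factors, characters and logicals count as categorical.
suggest_test <- function(y, x) {
  y_cat <- is.factor(y) || is.character(y) || is.logical(y)
  x_cat <- is.factor(x) || is.character(x) || is.logical(x)
  if (!y_cat && !x_cat) {
    "linear regression: lm(y ~ x)"
  } else if (!y_cat && x_cat) {
    "ANOVA: aov(y ~ x)"
  } else if (y_cat && !x_cat) {
    # Assumes y has exactly two categories (e.g. yes/no)
    'logistic regression: glm(y ~ x, family = "binomial")'
  } else {
    "chi-square: chisq.test(y, x)"
  }
}

suggest_test(iris$Sepal.Length, iris$Petal.Length)  # continuous ~ continuous
suggest_test(iris$Sepal.Length, iris$Species)       # continuous ~ categorical
```

This is only a rough guide: it ignores things like sample size and test assumptions, which still need checking for your own data.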
Using the examples above, here is the code for each of the statistical analyses in R:
- Linear Regression = lm(y ~ x, data = YOUR_DATA)
  - e.g. lm(rings ~ time_years, data = tree_ring_data)
  - lm = linear model
- Logistic Regression = glm(y ~ x, family = "binomial", data = YOUR_DATA)
  - e.g. glm(cannibalism ~ age_months, family = "binomial", data = cannibalism_data)
  - glm = generalized linear model. This function takes an extra argument, family, which tells R that the response is binomial, i.e. that you want a logistic model (look here to see what a logistic graph looks like)
- ANOVA = aov(y ~ x, data = YOUR_DATA)
  - e.g. aov(call_time_s ~ species, data = frog_data)
  - aov = analysis of variance
- Chi-square = chisq.test(data$y, data$x)
  - e.g. chisq.test(cat_data$sex, cat_data$colour)
  - chisq.test = chi-square test
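To see all four calls actually run, here is a sketch that uses datasets that ship with R. The tree ring, cannibalism, frog and cat datasets from the examples are not available here, so iris and mtcars stand in for them. The column choices are assumptions made purely for illustration, not meaningful research questions.

```r
# Linear regression: continuous y, continuous x
fit_lm <- lm(Petal.Length ~ Sepal.Length, data = iris)
summary(fit_lm)

# Logistic regression: binary (0/1) y, continuous x
# am = transmission type (0 or 1), wt = car weight
fit_glm <- glm(am ~ wt, family = "binomial", data = mtcars)
summary(fit_glm)

# ANOVA: continuous y, categorical x
fit_aov <- aov(Sepal.Length ~ Species, data = iris)
summary(fit_aov)

# Chi-square: categorical y, categorical x
# vs = engine shape (0 or 1), am = transmission type (0 or 1)
chi <- chisq.test(mtcars$am, mtcars$vs)
chi
```

Storing each result in an object and then calling summary() on it (or just printing it, for chisq.test) is what shows you the coefficients, test statistics and p-values.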
If you want to read more on these specific tests and how to do them in R, Google is your friend!