7.2 Step 2: Check Assumptions

After looking at the data, the next step is to check whether the statistical test is appropriate for the data.

Statistical tests are built on certain expectations about the data. These expectations are called assumptions. If assumptions are badly violated, the results may be inaccurate or misleading.

Note: Why Assumptions Matter

Checking assumptions is not an annoying hoop to jump through after choosing a test. It is part of deciding whether the test you planned to use actually fits your data.

Assumptions Are Test-Specific

Different tests have different assumptions. That is why later chapters will walk you through assumption checks separately for each inferential statistic.

For now, you only need the general idea: before interpreting a statistical test, you need to know whether the test is reasonable for your variables, design, and data.

Most of the parametric tests in this book involve some combination of the following assumptions.

Continuous Dependent Variable

Many parametric tests require a continuous dependent variable.

In our Bobo doll example, the dependent variable is the number of aggressive behaviors. Although that is technically a count, we treat it as a continuous outcome because it is a quantitative score. That makes it appropriate for the kind of test we will eventually use.

If the dependent variable were categorical, such as whether a child showed any aggressive behavior, we would need a different kind of test.

Independence

Independence means that one observation should not influence, or be systematically related to, another observation.

In our example, each child is observed separately. One child’s score should not determine another child’s score. That supports the assumption of independence.

Independence is usually a design issue, not something we can fix by clicking an option in jamovi. For example, students nested in classrooms or employees nested in organizations may be more similar to each other because of the group they belong to. That kind of structure requires more advanced methods than we cover in this book.

Normality

Normality refers to the shape of the distribution. For many parametric tests, we want the dependent variable, or sometimes a quantity derived from it (such as residuals or difference scores), to be approximately normally distributed.

This does not mean real data must be perfectly bell-shaped. Real data are messy. The question is whether the data are close enough for the test to work reasonably well.

Later, Chapter 9 will show you how to examine normality using visualizations, skew and kurtosis, Shapiro-Wilk tests, and Q-Q plots.
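To make skew and kurtosis a little more concrete before Chapter 9, here is a minimal Python sketch that computes both by hand. The scores are hypothetical values invented for illustration, not data from the Bobo doll example, and the function names are our own. It uses the simple population formulas; statistical software such as jamovi typically applies a small-sample correction, so its values may differ slightly.

```python
from statistics import mean, pstdev


def skewness(scores):
    """Mean of cubed z-scores: 0 for a symmetric distribution,
    positive when the right tail is longer."""
    m = mean(scores)
    sd = pstdev(scores)
    return mean([((x - m) / sd) ** 3 for x in scores])


def excess_kurtosis(scores):
    """Mean of z-scores to the fourth power, minus 3 so that a
    normal distribution would score about 0."""
    m = mean(scores)
    sd = pstdev(scores)
    return mean([((x - m) / sd) ** 4 for x in scores]) - 3


# Hypothetical scores, made up for illustration only.
symmetric_scores = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 8]
skewed_scores = [0, 0, 1, 1, 1, 2, 2, 3, 5, 9, 14]

for label, scores in [("symmetric", symmetric_scores), ("skewed", skewed_scores)]:
    print(label, round(skewness(scores), 2), round(excess_kurtosis(scores), 2))
```

The symmetric set has a skewness of essentially zero, while the second set, with a few very high scores, has a clearly positive skew. Numbers like these are one input into a judgment about whether the data are "close enough," not a pass/fail verdict on their own.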

Homogeneity of Variance

Homogeneity of variance means that groups have similar variability on the dependent variable.

In our example, we would ask whether the aggressive-video and calm-video groups have roughly similar variability in aggressive behavior.

This assumption matters for tests that compare groups on a continuous outcome, such as t-tests and ANOVAs. If variability is very different across groups, we may need a different version of the test.
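As a rough illustration of what "similar variability" means, the sketch below compares the sample variances of two groups. The scores and variable names are hypothetical, invented for this example. In practice you would usually rely on a formal check (such as Levene's test) rather than this ratio alone.

```python
from statistics import variance

# Hypothetical aggressive-behavior counts, made up for illustration only.
aggressive_video = [8, 10, 12, 9, 11, 13, 7, 10]
calm_video = [3, 4, 5, 4, 6, 3, 5, 4]

# Sample variance (n - 1 in the denominator) for each group.
var_aggressive = variance(aggressive_video)
var_calm = variance(calm_video)

# Ratio of the larger variance to the smaller one.
# A ratio near 1 means the groups vary about equally;
# a larger ratio means one group is more spread out.
variance_ratio = max(var_aggressive, var_calm) / min(var_aggressive, var_calm)

print("Variance (aggressive video):", round(var_aggressive, 2))
print("Variance (calm video):", round(var_calm, 2))
print("Variance ratio:", round(variance_ratio, 2))
```

In these made-up data, the aggressive-video group is noticeably more spread out than the calm-video group, which is the kind of pattern that would prompt a closer look at this assumption.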

What If an Assumption Is Violated?

A violated assumption does not automatically ruin everything. It means you need to make a thoughtful decision.

Depending on the test and the assumption, you may need to:

  • use a different version of the test;
  • use a non-parametric alternative;
  • transform or adjust the data carefully;
  • report the violation and interpret results cautiously; or
  • choose a different analysis.
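To show what "a different version of the test" can look like, here is a minimal sketch of Welch's t-test, a version of the independent-samples t-test that does not assume equal variances. This is offered as one common example, not necessarily the approach later chapters will take. The scores and the function name are hypothetical, and the sketch stops at the test statistic and degrees of freedom rather than computing a p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom.

    Each group contributes its own variance to the standard error,
    instead of pooling the two variances into one estimate.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error contributed by each group.
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical aggressive-behavior counts, made up for illustration only.
aggressive_video = [8, 10, 12, 9, 11, 13, 7, 10]
calm_video = [3, 4, 5, 4, 6, 3, 5, 4]

t_stat, df = welch_t(aggressive_video, calm_video)
print("Welch's t:", round(t_stat, 2))
print("Approximate df:", round(df, 2))
```

Notice that the degrees of freedom are no longer simply the total sample size minus two. When group variances differ, Welch's version adjusts the degrees of freedom downward to account for it.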

Chapter 9 will cover these decisions more carefully. For now, tuck this idea into your brain: the test you run depends not only on the research question, but also on whether your data meet the assumptions for that test.

Tip: Check Your Understanding
  1. What is a statistical assumption?
  2. Why is independence usually a design issue?
  3. What does normality refer to?
  4. What does homogeneity of variance refer to?

Answers

  1. A statistical assumption is a condition that should be reasonably met for a statistical test to work as intended.
  2. Independence depends on how the data were collected, such as whether observations are separate or nested within groups.
  3. Normality refers to the shape of a distribution and whether it is approximately bell-shaped.
  4. Homogeneity of variance means that groups have similar variability on the dependent variable.

Looking Ahead

In the next step, we perform the statistical test. For now, remember that “performing the test” is not just pressing buttons. It means using an analysis that fits the research question, variables, design, and assumptions.