7.4 Step 4: Interpret the Results

The final step is to interpret the results and connect them back to the research question.

This is where students often want to write something too strong, such as “the alternative hypothesis was proven” or “there is no effect.” Slow down here. Hypothesis testing gives us a decision about the null hypothesis, not certainty about the real world.

Reject or Fail to Reject the Null

In null hypothesis significance testing, we make one of two decisions:

reject the null hypothesis; or
fail to reject the null hypothesis.

Note my language carefully: reject or fail to reject the null hypothesis.

I am not saying that we prove the alternative hypothesis. I am not saying that we accept the null hypothesis. NHST is built around testing the null hypothesis, so our formal decision is about the null.

Result	Statistical decision	Plain-language meaning
p < α	Reject the null hypothesis	The data are surprising enough under the null that we reject it.
p ≥ α	Fail to reject the null hypothesis	The data are not surprising enough under the null, so we do not have enough evidence to reject it.
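The decision rule in the table is mechanical enough to write down directly. The Python sketch below is illustrative only: the function name `nhst_decision` and its default of α = .05 are our own choices, not part of any statistics library. Notice that a p-value exactly equal to α lands on the "fail to reject" side, matching the table.

```python
def nhst_decision(p_value: float, alpha: float = 0.05) -> str:
    """Return the formal NHST decision for a p-value and significance level.

    Hypothetical helper for illustration. It encodes the rule
    p < alpha -> reject H0, and p >= alpha -> fail to reject H0.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    # The wording matters: we never "accept" H0 or "prove" H1.
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(nhst_decision(0.0004))  # a very small p-value
print(nhst_decision(0.05))    # p equal to alpha is not below alpha
print(nhst_decision(0.20))
```

The first call prints "reject the null hypothesis" and the other two print "fail to reject the null hypothesis". The function never returns "accept", because NHST does not offer that decision.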

Interpreting the Bobo Doll Example

Suppose we analyze the Bobo doll data and get p < .001.

If α = .05, then p < α. The result is statistically significant, so we reject the null hypothesis.

In context, we might say:

Children who watched the aggressive video showed more aggressive behaviors than children who watched the calm video. The difference was statistically significant, so we reject the null hypothesis that children in the aggressive-video condition show the same or fewer aggressive behaviors than children in the calm-video condition.

That is a statistical conclusion. We should still interpret it alongside the study design, effect size, measurement quality, sample, and broader research literature.

Statistical Significance and Practical Significance

Statistical significance tells us whether a result crossed the threshold for rejecting the null hypothesis.

Practical significance asks whether the result matters in context.

These are related, but they are not the same.

A result can be statistically significant but practically trivial. For example, a huge study might find a tiny difference that does not matter much in real life.

A result can also be practically meaningful but not statistically significant. For example, a small pilot study might show a potentially important pattern but not have enough power to detect it clearly.

This is why we care about both p-values and effect sizes.
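One way to see why both numbers matter: for two equal-sized groups, the independent-samples t statistic can be written in terms of Cohen's d as t = d × √(n/2), where n is the number of participants in each group. The effect size stays the same as the sample grows, but t keeps climbing. The Python sketch below assumes equal group sizes and uses a made-up effect of d = 0.05; it demonstrates the formula and does not use data from any study.

```python
import math


def t_from_d(d: float, n_per_group: int) -> float:
    """Independent-samples t statistic implied by Cohen's d.

    Assumes two groups of equal size n_per_group and a pooled SD,
    so t = d * sqrt(n / 2).
    """
    return d * math.sqrt(n_per_group / 2)


tiny_effect = 0.05  # hypothetical, practically trivial effect size
for n in (20, 200, 2_000, 20_000, 200_000):
    t = t_from_d(tiny_effect, n)
    print(f"n per group = {n:>7,}: d = {tiny_effect:.2f}, t = {t:.2f}")
```

With 20 per group, t is about 0.16; with 20,000 per group, the same tiny effect gives t = 5.00, far past the usual cutoff for significance. The p-value reflects both the effect and the sample size, while d describes only the size of the effect.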

Type I and Type II Errors

Hypothesis testing involves uncertainty. We can make the best decision available from the data and still be wrong.

A Type I error occurs when we reject the null hypothesis even though the null hypothesis is actually true. This is sometimes called a false positive.

A Type II error occurs when we fail to reject the null hypothesis even though the alternative hypothesis is actually true. This is sometimes called a false negative.

Reality	We reject H₀	We fail to reject H₀
H₀ is true	Type I error	Correct decision
H₁ is true	Correct decision	Type II error

In real research, we do not know with certainty whether H₀ or H₁ is true. If we knew, we would not need to collect data. This table helps us remember that hypothesis testing is decision-making under uncertainty.
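A simulation lets us peek at "reality," which real research never can. The Python sketch below repeatedly draws two samples and tests them. In the first scenario H₀ is true (both groups come from the same population), so every rejection is a Type I error. In the second, a true difference exists, so every failure to reject is a Type II error. To stay within Python's standard library, it uses a two-sided z-test with a known population SD of 1 rather than a t-test; the group size of 20, the true difference of 0.5, and the 10,000 repetitions are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, fmean

STANDARD_NORMAL = NormalDist()


def z_test_p_value(a: list[float], b: list[float], sigma: float = 1.0) -> float:
    """Two-sided p-value for a difference in means, assuming a known population SD."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def rejection_rate(true_diff: float, n: int, alpha: float, reps: int,
                   rng: random.Random) -> float:
    """Proportion of simulated studies in which H0 is rejected."""
    rejections = 0
    for _ in range(reps):
        group_a = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(group_a, group_b) < alpha:
            rejections += 1
    return rejections / reps


rng = random.Random(2024)  # fixed seed so the run is reproducible
ALPHA, N, REPS = 0.05, 20, 10_000

# Scenario 1: H0 is true, so any rejection is a Type I error.
type_i_rate = rejection_rate(true_diff=0.0, n=N, alpha=ALPHA, reps=REPS, rng=rng)
# Scenario 2: H1 is true, so any failure to reject is a Type II error.
power = rejection_rate(true_diff=0.5, n=N, alpha=ALPHA, reps=REPS, rng=rng)
type_ii_rate = 1 - power

print(f"H0 true: rejected in {type_i_rate:.1%} of studies (Type I errors)")
print(f"H1 true: failed to reject in {type_ii_rate:.1%} of studies (Type II errors)")
```

The first rate should land near α = .05, which is what α means: the long-run Type I error rate when H₀ is true. Under these assumptions the second rate is much higher, roughly 65% in theory, because 20 per group gives little power to detect a moderate difference.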

Another Careful Language Moment

A statistically significant result does not guarantee that the null hypothesis is false. A non-significant result does not guarantee that the null hypothesis is true. Either decision can be wrong.

A Brief APA-Style Preview

Later, Chapter 10 will focus on writing results in APA style. For now, it is useful to see what a complete result might look like.

For our Bobo doll example, a write-up could look something like this:

Children who viewed the aggressive video (M = 11.20, SD = 3.10) exhibited more aggressive behavior than children who viewed the calm video (M = 5.40, SD = 2.80), t(28) = 5.39, p < .001, d = 1.97.
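If you are curious where t and d come from, the Python sketch below recomputes them from the reported means and standard deviations, using the pooled-variance independent-samples t-test and Cohen's d based on the pooled SD. The write-up reports only df = 28, so we assume 15 children per group; that group size is our assumption, not something stated in the example. The sketch does not compute the p-value, which requires the t distribution (available in libraries such as SciPy).

```python
import math


def pooled_t_and_d(m1: float, sd1: float, n1: int,
                   m2: float, sd2: float, n2: int) -> tuple[float, int, float]:
    """Independent-samples t (pooled variance), its df, and Cohen's d from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    pooled_sd = math.sqrt(pooled_var)
    mean_diff = m1 - m2
    se_diff = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    t = mean_diff / se_diff
    d = mean_diff / pooled_sd
    return t, df, d


# Summary statistics from the example; n = 15 per group is an assumption (df = 28).
t, df, d = pooled_t_and_d(11.20, 3.10, 15, 5.40, 2.80, 15)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")  # t(28) = 5.38, d = 1.96
```

Because the means and SDs are rounded to two decimals, the recomputed values differ from the write-up in the last digit. That is ordinary rounding noise; results computed from the raw data would match more closely.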

You do not need to understand every symbol in that sentence yet. The later test-specific chapters and the APA results chapter will walk you through the details. For now, notice that the write-up includes descriptive statistics, the inferential test result, the p-value, and the effect size.

Visualizing the Results

A complete analysis should usually include a graph. For this example, a box plot with individual data points would help show the difference between conditions, the spread within each condition, and any possible outliers.

The graph does not replace the statistical test. It helps readers see the pattern behind the numbers.

Check Your Understanding

What are the two formal decisions we make in NHST?
Why should we avoid saying that we “accept” the null hypothesis?
What is a Type I error?
What is a Type II error?
Why is practical significance different from statistical significance?

Answers

We reject the null hypothesis or fail to reject the null hypothesis.
Failing to reject the null does not prove the null is true. It only means we do not have enough evidence to reject it.
A Type I error occurs when we reject the null hypothesis even though it is true.
A Type II error occurs when we fail to reject the null hypothesis even though the alternative hypothesis is true.
Statistical significance tells us whether a result crosses a threshold for rejecting the null. Practical significance asks whether the effect matters in context.