Statistical Analysis with R: Exploring Drug and Stress Data with ANOVA and T-Tests

October 24, 2024

Welcome Back!

Statistical analysis is a powerful tool for understanding the effects of interventions. In this post, we'll work through two key statistical methods, ANOVA (Analysis of Variance) and t-tests, using R. We'll analyze data from two separate experiments: one on how a drug's effect varies with stress level, and one on infant motor development (the Zelazo dataset). Along the way, we'll interpret the output of each test, including t-values, p-values, and what statistical significance tells us about differences between groups.

1. The key question is: Does the drug's effectiveness change depending on how much stress the person is under?

To answer this, we used a statistical method called ANOVA (Analysis of Variance). This test helps us figure out whether there’s a significant difference in drug effectiveness across the different stress levels.

ANOVA compares the means (average scores) across multiple groups to determine whether the differences between them are statistically significant, i.e., larger than we would expect from random variation alone if the groups truly had the same mean.

In our case, the groups are:

High Stress
Moderate Stress
Low Stress

We’re looking at whether the stress reactions (the numbers you see in the table) are significantly different between these groups after taking the drug.

The Data

The steps in R are simple:

I first created a table (or dataframe) to store the stress reaction scores.
Then, I performed the ANOVA test to see if there was a significant difference in the drug's effect between the three stress groups.
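As a rough sketch of those two steps, the code could look something like the following. The scores below are hypothetical placeholders, not the data from this experiment, so the numbers it prints will not match the results discussed in this post. The names stress_data, reaction and stress are also just illustrative choices. Six observations per group are assumed so that the degrees of freedom come out the same (2 and 15).

```r
# Hypothetical placeholder scores (NOT the experiment's real data).
# Six observations per group gives 18 rows in total.
stress_data <- data.frame(
  reaction = c(12, 14, 11, 13, 15, 12,   # High Stress
               10,  9, 11,  8, 10,  9,   # Moderate Stress
                7,  6,  8,  5,  7,  6),  # Low Stress
  stress = factor(rep(c("High", "Moderate", "Low"), each = 6),
                  levels = c("High", "Moderate", "Low"))
)

# One-way ANOVA: does the mean reaction score differ across stress levels?
fit <- aov(reaction ~ stress, data = stress_data)

# summary() returns a list; its first element is the ANOVA table
anova_table <- summary(fit)[[1]]
print(anova_table)
```

Storing the group as a factor matters here: aov treats it as a categorical predictor and produces one row for the stress factor and one for the residuals.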

INPUT:

OUTPUT:

What Do These Results Mean?

Let’s break it down:

Df: This stands for Degrees of Freedom. For the stress factor, it’s the number of groups minus one (3 - 1 = 2). The degrees of freedom for the residuals (or error) is the total number of observations minus the number of groups (18 - 3 = 15).
Sum Sq: This is the Sum of Squares, which measures how much of the variation in the stress reaction scores comes from each source. The stress row (98.67) is the variation between the group means, and the residuals row (72.33) is the variation within the groups. The two are based on different degrees of freedom, so they are compared through the mean squares rather than directly.
Mean Sq: This is the Mean Square, calculated by dividing the sum of squares by the degrees of freedom. It helps us understand how much variation there is per degree of freedom.
F value: This is the ratio of the variance between the groups (the stress Mean Sq) to the variance within the groups (the residual Mean Sq). A high F value (like 10.24 here) means the group averages differ by much more than the within-group scatter would explain, so the drug’s effect appears to depend on stress level.
Pr(>F): This is the p-value: the probability of seeing an F value at least this large if all three groups really had the same mean. In this case, the p-value is 0.00235, well below 0.05 (our standard threshold for significance), so we reject the hypothesis of equal means and conclude that stress reaction scores differ significantly between the groups. The ANOVA alone does not tell us which groups differ from each other; that takes follow-up pairwise comparisons.
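To make the link between the columns concrete, here is a small sketch that rebuilds the Mean Sq and F value columns by hand from the Sum Sq and Df values reported above. The variable names are illustrative. Because the inputs are rounded table entries, the F value it produces is close to, but not exactly, the reported 10.24.

```r
# Values copied from the ANOVA table discussed above
ss_between <- 98.67   # Sum Sq for the stress factor
df_between <- 2       # number of groups minus one
ss_within  <- 72.33   # Sum Sq for the residuals
df_within  <- 15      # observations minus number of groups

# Mean Sq = Sum Sq / Df
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F value = between-group mean square / within-group mean square
f_value <- ms_between / ms_within

print(round(c(ms_between = ms_between,
              ms_within  = ms_within,
              f_value    = f_value), 2))
```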

2.1 Convert the data for use in lm and perform t-tests

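The zelazo dataset (available in R through the ISwR package) records the age at walking, in months, for four groups of infants who received different types of intervention (active, passive, none, and an 8-week control, ctr.8w). It is stored as a list of four numeric vectors, so the goal is to convert it into a long-format data frame suitable for linear modeling (the lm function) and then calculate relevant t-tests comparing subgroups.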
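One way the conversion could be done is with stack(), which turns a named list into a two-column data frame. In the sketch below the values are typed in by hand so that it runs without extra packages; they are assumed to match the zelazo list shipped with ISwR and should be checked against data(zelazo) after library(ISwR). The column names age and group are illustrative choices.

```r
# Age at walking (months) per group; assumed to match ISwR's zelazo list
zelazo <- list(
  active  = c(9.00, 9.50, 9.75, 10.00, 13.00, 9.50),
  passive = c(11.00, 10.00, 10.00, 11.75, 10.50, 15.00),
  none    = c(11.50, 12.00, 9.00, 11.50, 13.25, 13.00),
  ctr.8w  = c(13.25, 11.50, 12.00, 13.50, 11.50)
)

# stack() gives one row per infant: a value column and a group factor
zelazo_long <- stack(zelazo)
names(zelazo_long) <- c("age", "group")
str(zelazo_long)

# Linear model with group as a categorical predictor
fit <- lm(age ~ group, data = zelazo_long)

# Each non-intercept row is a t-test of that group against the
# reference level (the first factor level, here "active")
print(summary(fit)$coefficients)
```

Note that the t-tests in the lm summary use a pooled residual variance across all four groups, so they will not match a two-sample Welch t-test on the same pair of groups exactly.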

INPUT:

OUTPUT:

Interpretation of Results:

t = -3.0449: This is the t-statistic. It measures how far the observed difference between the means is from zero, in units of the standard error. A larger absolute value of t indicates stronger evidence of a difference between the groups.
df = 8.6632: This is the degrees of freedom. It is not a whole number because R's t.test uses the Welch test by default, which does not assume equal variances and adjusts the degrees of freedom for the sample sizes and variances of the two groups.
p-value = 0.01453: The p-value is less than 0.05, meaning the difference between the active and ctr.8w groups is statistically significant. A difference this large would be unlikely to arise from random chance alone if the two groups had the same true mean.
Confidence Interval: The 95% confidence interval for the difference in means (active minus ctr.8w) ranges from -3.89 to -0.56. Since zero is not within this interval, it reinforces the conclusion that the difference is significant. The interval is entirely negative, meaning ctr.8w has higher scores than the active group.
Mean of x (Active) = 10.125: The average score for the active group is 10.13.
Mean of y (ctr.8w) = 12.35: The average score for the control group (ctr.8w) is 12.35.
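The figures above can be reproduced with a two-sample t-test on the active and ctr.8w groups. This is a minimal sketch that assumes the hand-typed values below match the zelazo data in the ISwR package; the variable names active, ctr_8w and welch are illustrative.

```r
# Age at walking (months); assumed to match zelazo$active and zelazo$ctr.8w
active <- c(9.00, 9.50, 9.75, 10.00, 13.00, 9.50)
ctr_8w <- c(13.25, 11.50, 12.00, 13.50, 11.50)

# t.test() defaults to the Welch test (var.equal = FALSE),
# which is why the degrees of freedom are fractional
welch <- t.test(active, ctr_8w)
print(welch)

# Individual pieces of the result can be pulled out by name
welch$statistic   # t
welch$parameter   # df
welch$p.value     # p-value
welch$conf.int    # 95% confidence interval for mean(x) - mean(y)
welch$estimate    # mean of x, mean of y
```

Passing var.equal = TRUE instead would run the classical pooled-variance t-test, with whole-number degrees of freedom (6 + 5 - 2 = 9).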

2.2 Apply ANOVA to the Zelazo dataset
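A sketch of how this step could be run is shown below: fit a linear model with group as the predictor and pass it to anova(). As before, the values are typed in by hand and assumed to match the zelazo list in the ISwR package. The default column names from stack() (values and ind) are kept here.

```r
# Age at walking (months) per group; assumed to match ISwR's zelazo list
zelazo <- list(
  active  = c(9.00, 9.50, 9.75, 10.00, 13.00, 9.50),
  passive = c(11.00, 10.00, 10.00, 11.75, 10.50, 15.00),
  none    = c(11.50, 12.00, 9.00, 11.50, 13.25, 13.00),
  ctr.8w  = c(13.25, 11.50, 12.00, 13.50, 11.50)
)

# Long format: `values` holds the ages, `ind` holds the group labels
zelazo_long <- stack(zelazo)

# One-way ANOVA via a linear model
fit <- lm(values ~ ind, data = zelazo_long)
anova_table <- anova(fit)
print(anova_table)
```

With four groups and 23 infants, the table has 3 degrees of freedom for the group factor and 19 for the residuals, and its columns read the same way as in the stress example.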

INPUT:

OUTPUT:
