Exploring Mean, Median, and More: Descriptive Statistics Made Easy with R

Welcome Back!

 In this post, I will explore some basic concepts of descriptive statistics by looking at two different sets of data. I'll calculate key measures like the mean, median, mode, range, interquartile range, variance, and standard deviation to see how these sets differ. Using simple R commands, I'll show how these statistics can help us understand data better. This post is perfect for anyone wanting to learn data analysis basics using R.

The data sets we will be using:

  • Set 1: 10, 2, 3, 2, 4, 2, 5
  • Set 2: 20, 12, 13, 12, 14, 12, 15

Measures of central tendency for the two data sets:

Set 1:
  • Mean: 4.0
  • Median: 3.0
  • Mode: 2

Set 2:
  • Mean: 14.0
  • Median: 13.0
  • Mode: 12
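
The figures above can be reproduced with a few lines of base R. This is a minimal sketch: `stat_mode()` is a hypothetical helper name introduced here, because base R's own `mode()` reports an object's storage type rather than its most frequent value.

```r
# The two data sets from this post
set1 <- c(10, 2, 3, 2, 4, 2, 5)
set2 <- c(20, 12, 13, 12, 14, 12, 15)

# Hypothetical helper (not part of base R): the most frequent value(s).
# Returns more than one value if several are tied for the highest count.
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

mean(set1)       # 4
median(set1)     # 3
stat_mode(set1)  # 2

mean(set2)       # 14
median(set2)     # 13
stat_mode(set2)  # 12
```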

Measures of variation for the two data sets:

Set 1:
  • Range: 8
  • Interquartile Range (IQR): 2.5
  • Variance: 8.33
  • Standard Deviation: 2.89

Set 2:
  • Range: 8
  • Interquartile Range (IQR): 2.5
  • Variance: 8.33
  • Standard Deviation: 2.89
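
As a rough check on these numbers, the sketch below computes each measure for Set 1 with base R. It also works the variance out by hand to show that `var()` gives the sample variance, which divides by n - 1 rather than n. That is why the result is 50 / 6 (about 8.33) and not 50 / 7. The variable names are illustrative.

```r
set1 <- c(10, 2, 3, 2, 4, 2, 5)

# range() returns c(min, max); diff() turns that pair into a single width
diff(range(set1))   # 8

# IQR() uses R's default quantile method (type 7)
IQR(set1)           # 2.5

# Sample variance by hand: squared deviations from the mean, divided by n - 1
squared_dev <- (set1 - mean(set1))^2
manual_var  <- sum(squared_dev) / (length(set1) - 1)

manual_var          # about 8.33
var(set1)           # same value
sd(set1)            # about 2.89, the square root of the variance
```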

Now let's compare the results!

Central Tendency:
Set 2 has a much higher mean (14.0) than Set 1 (4.0), indicating that its values are generally larger. The median of Set 2 (13.0) is likewise higher than that of Set 1 (3.0), reflecting a higher central value. The most frequent value also differs: the mode is 12 for Set 2 and 2 for Set 1.

Variation:
Both sets have the same range (8) and interquartile range (2.5), so their values are spread out identically. The variance and standard deviation also match (8.33 and 2.89, respectively), which means both sets vary by the same amount around their own means.

While both sets show the same variability, every value in Set 2 is exactly 10 more than the corresponding value in Set 1. Adding a constant to each value shifts the mean, median, and mode by that constant but leaves the range, IQR, variance, and standard deviation unchanged. Set 2 is therefore Set 1 moved up the number line: larger numbers, same spread and shape.
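
The relationship between the two sets can be checked directly in R. The short sketch below, with illustrative variable names, confirms that Set 2 is Set 1 plus 10, and that the shift moves the centre without changing the spread.

```r
set1 <- c(10, 2, 3, 2, 4, 2, 5)
set2 <- c(20, 12, 13, 12, 14, 12, 15)

# Every element of set2 is the matching element of set1 plus 10
all(set2 == set1 + 10)            # TRUE

# The centre moves by the same constant
mean(set2) - mean(set1)           # 10
median(set2) - median(set1)       # 10

# The spread does not move at all
spread_matches <- c(
  range = diff(range(set1)) == diff(range(set2)),
  iqr   = IQR(set1) == IQR(set2),
  var   = isTRUE(all.equal(var(set1), var(set2))),
  sd    = isTRUE(all.equal(sd(set1), sd(set2)))
)
spread_matches                    # all TRUE
```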

Code description:

In this code, we start by creating two sets of numbers, set1 and set2, using the c() function, which combines the numbers into a numeric vector for each set. Next, we calculate some key statistics to understand these sets better. First, we find the mean (average) of each set using the mean() function, which adds up all the numbers and divides by the total count. Then, we find the median with the median() function, which gives us the middle value when the numbers are sorted. R has no built-in function for the statistical mode (the most frequent number), so we use a combination of table() and sort() to identify the value that appears most often. To understand how spread out the numbers are, we calculate the range: the range() function returns the smallest and largest values, and we subtract the smallest from the largest. We also calculate the interquartile range (IQR) with the IQR() function to see the spread of the middle 50% of the numbers. Finally, we calculate the sample variance with the var() function and the standard deviation using sd(), both of which help us understand how much the numbers vary from the mean. The cat() function is used at the end to print all these results in a readable format.
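
A script along the lines described above might look like the following. This is a sketch under assumptions rather than the original code: the helper name `describe_set()`, the rounding to two decimals, and the output layout are all introduced here for illustration.

```r
set1 <- c(10, 2, 3, 2, 4, 2, 5)
set2 <- c(20, 12, 13, 12, 14, 12, 15)

# describe_set() is a hypothetical helper, not taken from the original post.
# It prints the statistics with cat() and invisibly returns them as a vector.
describe_set <- function(label, x) {
  # Mode: tabulate the values, sort the counts in descending order,
  # and take the name of the first entry. If several values are tied
  # for the highest count, only one of them is kept.
  mode_value <- as.numeric(names(sort(table(x), decreasing = TRUE))[1])

  stats <- c(
    mean     = mean(x),
    median   = median(x),
    mode     = mode_value,
    range    = diff(range(x)),  # largest value minus smallest value
    iqr      = IQR(x),
    variance = var(x),          # sample variance (divides by n - 1)
    sd       = sd(x)
  )

  cat(label, "\n")
  cat("  Mean:", stats[["mean"]], "\n")
  cat("  Median:", stats[["median"]], "\n")
  cat("  Mode:", stats[["mode"]], "\n")
  cat("  Range:", stats[["range"]], "\n")
  cat("  IQR:", stats[["iqr"]], "\n")
  cat("  Variance:", round(stats[["variance"]], 2), "\n")
  cat("  Standard Deviation:", round(stats[["sd"]], 2), "\n")

  invisible(stats)
}

summary1 <- describe_set("Set 1", set1)
summary2 <- describe_set("Set 2", set2)
```

Returning the statistics invisibly keeps the printed output clean while still letting the values be reused, for example `summary2[["mean"]] - summary1[["mean"]]`.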

