Class 2 picked up where class 1 left off. Box plots were discussed in detail, including the whiskers, the median and outliers (which may be marked with symbols such as asterisks).
All About Box Plots
In statistical analysis, a box plot is a graph that can be a valuable, easy-to-interpret source of information about a study sample. A box plot can show a sample's range, its median, whether the distribution looks normal and how the distribution is skewed. It can also identify and plot extreme cases within the sample.
A box plot shows a box flanked by two outer lines known as whiskers. The box represents the middle 50% of the data sample: half of all cases fall within it. The remaining 50% of the sample lies between the box and the ends of the whiskers, with some exceptions (these exceptions are called outliers and are discussed in more detail later). For example, consider a sample of 100 IQ scores. The bottom 25% of the scores would be represented by the lower whisker, running from the lowest score up to the box; the middle 50% would lie within the box; and the remaining 25% would lie along the upper whisker, between the box and the highest score.
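The quarters described above correspond to five values: the minimum, the lower quartile (Q1, bottom edge of the box), the median, the upper quartile (Q3, top edge of the box) and the maximum. The sketch below computes them with Python's standard `statistics.quantiles`. It assumes the whiskers run all the way to the minimum and maximum (no outliers), and it uses the "inclusive" quartile method; Excel and SPSS can use slightly different quartile conventions, so their box edges may not match exactly.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumption: whiskers reach the minimum and maximum (no outliers flagged).
    Quartiles use the 'inclusive' method; other software may differ slightly.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Set A from these notes
print(five_number_summary([2, 2, 3, 5, 5, 7, 8]))  # (2, 2.5, 5.0, 6.0, 8)
```

Reading the output as a box plot: the lower whisker spans 2 to 2.5, the box spans 2.5 to 6.0 with the median line at 5.0, and the upper whisker spans 6.0 to 8.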
Median Line:
Inside the box there is a single line. This line represents the median, the middle value of the entire sample; trace the line across to the axis to read its value. If the median line is noticeably shifted away from the center of the box, the distribution may be skewed.
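One way to put a number on "noticeably shifted" is to measure where the median sits between the two edges of the box, from 0 (at Q1) to 1 (at Q3), with 0.5 meaning a centered line. This is a minimal sketch using the "inclusive" quartile method; the notes give no fixed cutoff for a noticeable shift, so the scale is only a visual aid, not a formal skewness test.

```python
from statistics import quantiles

def median_position_in_box(values):
    """Return where the median sits inside the box: 0 = at Q1, 1 = at Q3.

    A value near 0.5 means the median line is roughly centered.
    This is only a rough visual cue for skew, not a formal test.
    """
    q1, median, q3 = quantiles(sorted(values), n=4, method="inclusive")
    if q3 == q1:
        return 0.5  # degenerate (zero-width) box: treat the line as centered
    return (median - q1) / (q3 - q1)

print(median_position_in_box([2, 2, 3, 5, 5, 7, 8]))
```

For Set A the median sits about 0.71 of the way from Q1 to Q3, so the line is closer to the upper edge of the box. With only seven values, such cues should be read cautiously.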
Box Position:
The location of the box between the whiskers can give insight into whether the sample's distribution is normal. When the box is not centered between the whiskers, the sample may be positively or negatively skewed. If the box is shifted significantly toward the low end (so the upper whisker is much longer), the distribution is positively skewed; if the box is shifted significantly toward the high end (so the lower whisker is much longer), it is negatively skewed.
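The box-position rule can be sketched by comparing the two whisker lengths. The sketch assumes the whiskers reach the minimum and maximum, and its "shift" test is a hypothetical, simplified criterion (any difference in whisker length counts); in practice only a clear difference should be read as skew.

```python
from statistics import quantiles

def box_position_skew(values):
    """Classify skew from where the box sits between the whiskers.

    Follows the rule in these notes: box toward the low end suggests positive
    skew; box toward the high end suggests negative skew.
    Assumption: whiskers run to the minimum and maximum of the data.
    """
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    lower_whisker = q1 - data[0]   # length from minimum up to the box
    upper_whisker = data[-1] - q3  # length from the box up to maximum
    if upper_whisker > lower_whisker:
        return "box toward low end: possibly positively skewed"
    if lower_whisker > upper_whisker:
        return "box toward high end: possibly negatively skewed"
    return "box centered"

print(box_position_skew([1, 2, 3, 4, 20]))  # long upper whisker
```

Here the box spans 2 to 4 while the upper whisker stretches to 20, so the box sits toward the low end and the sketch reports possible positive skew.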
Box Size:
The size of the box relative to the whiskers gives a rough indication of the kurtosis (the peakedness) of the distribution. A very thin box relative to the whiskers means that half of all cases are packed into a very small segment of the sample's range, which signifies a distribution with a sharper, narrower peak. A wider box relative to the whiskers indicates a broader peak; the wider the box, the closer the distribution comes to a U shape.
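The "box size relative to the whiskers" idea can be expressed as one ratio: the box width (Q3 minus Q1, the interquartile range) divided by the full span from minimum to maximum. This is a minimal sketch assuming whiskers run to the minimum and maximum; the ratio is a descriptive aid, not a standard kurtosis statistic.

```python
from statistics import quantiles

def box_to_range_ratio(values):
    """Return box width (Q3 - Q1) as a fraction of the whole min-to-max span.

    Near 0: half the cases are packed into a narrow band (thin box, sharp peak).
    Near 1: the box fills most of the span (wide box, toward a U shape).
    """
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    span = data[-1] - data[0]
    return 0.0 if span == 0 else (q3 - q1) / span

peaked = [1, 5, 5, 5, 5, 5, 9]     # most values piled on one point
u_shaped = [1, 1, 1, 5, 9, 9, 9]   # values piled at both ends
print(box_to_range_ratio(peaked), box_to_range_ratio(u_shaped))  # 0.0 1.0
```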
Outliers:
Outliers are not present in every box plot. When they are present, they appear as points, circles or asterisks beyond the ends of the whiskers. These are extreme values that deviate significantly from the rest of the sample, and they can lie above or below the whiskers.
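The notes do not say how far from the box a point must be to count as an outlier. A common convention, used for example in SPSS box plots, measures distance in "box lengths" (the interquartile range): points more than 1.5 box lengths beyond the box are drawn as circles (outliers) and points more than 3 box lengths beyond it as asterisks (extreme values). The sketch below assumes that convention and the "inclusive" quartile method; SPSS computes its quartiles differently, so borderline points may be classified differently there.

```python
from statistics import quantiles

def classify_outliers(values):
    """Split points beyond the box into outliers (circles) and extremes (asterisks).

    Assumed convention: > 1.5 box lengths (IQR) beyond the box = outlier,
    > 3 box lengths beyond the box = extreme value.
    """
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    outliers, extremes = [], []
    for x in data:
        # Distance outside the box; zero or negative if x lies inside it.
        distance = max(q1 - x, x - q3)
        if distance > 3 * iqr:
            extremes.append(x)
        elif distance > 1.5 * iqr:
            outliers.append(x)
    return outliers, extremes

print(classify_outliers([10, 11, 12, 13, 14, 15, 16, 25, 40]))  # ([25], [40])
```

Here the box spans 12 to 16 (one box length is 4), so 25 lies 9 units above the box (an outlier) and 40 lies 24 units above it (an extreme value).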
Next, calculating the mean, median and mode was taught in both Excel and SPSS.
Calculating Mean, Median & Mode
The worked examples below show how to calculate the mean, median and mode for two sets of data.
Set A contains the numbers 2, 2, 3, 5, 5, 7, 8 and Set B contains the numbers 2, 3, 3, 4, 6, 7.
Each measure below is worked out first for Set A and then for Set B.
The Mean
To find the mean, add up all the values, then divide the total by the number of values.
Set A: 2 + 2 + 3 + 5 + 5 + 7 + 8 = 32. There are 7 values, so divide the total by 7: 32 ÷ 7 = 4.571... So the mean is 4.57 (2 d.p.).
Set B: 2 + 3 + 3 + 4 + 6 + 7 = 25. There are 6 values, so divide the total by 6: 25 ÷ 6 = 4.166... So the mean is 4.17 (2 d.p.).
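The mean calculation can be sketched in a few lines of Python: add up the values, divide by how many there are, and round to 2 decimal places as in the worked example. Excel's AVERAGE function and Python's `statistics.mean` return the same unrounded value; the function name `mean_2dp` is just an illustrative choice.

```python
def mean_2dp(values):
    """Add up all the values, divide the total by how many there are,
    and round the result to 2 decimal places (as in the worked example)."""
    total = sum(values)
    return round(total / len(values), 2)

set_a = [2, 2, 3, 5, 5, 7, 8]
set_b = [2, 3, 3, 4, 6, 7]
print(sum(set_a), mean_2dp(set_a))  # 32 4.57
print(sum(set_b), mean_2dp(set_b))  # 25 4.17
```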
The Median
To find the median, put the values in order, then find the middle value. If there are two middle values, the median is the mean of these two values.
Set A: in order, the values are 2, 2, 3, (5), 5, 7, 8. The middle value, marked in brackets, is 5. So the median is 5.
Set B: in order, the values are 2, 3, (3, 4), 6, 7. This time there are two middle values, marked in brackets, so the median is their mean: (3 + 4) ÷ 2 = 3.5. So the median is 3.5.
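The median rule (sort, take the middle value, or average the two middle values when the count is even) translates directly into code. This sketch uses a hypothetical helper name, `find_median`; Python's `statistics.median` and Excel's MEDIAN function follow the same rule.

```python
def find_median(values):
    """Sort the values; return the middle one, or the mean of the two middle ones."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                       # odd count: single middle value
    return (data[mid - 1] + data[mid]) / 2     # even count: mean of the two middle values

print(find_median([2, 2, 3, 5, 5, 7, 8]))  # 5
print(find_median([2, 3, 3, 4, 6, 7]))     # 3.5
```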
The Mode
The mode is the value that appears most often in the data. A data set can have more than one mode if several values tie for the most appearances.
Set A: the values are 2, 2, 3, 5, 5, 7, 8. The values 2 and 5 each appear twice, more often than any other value. So the modes are 2 and 5.
Set B: the values are 2, 3, 3, 4, 6, 7. This time only one value appears most often: the number 3, which appears twice. So the mode is 3.
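Because a data set can have more than one mode, a sketch of the mode should return every value that ties for the highest count rather than a single number. This version counts values with `collections.Counter`; Python's `statistics.multimode` and Excel's MODE.MULT function likewise return all modes. Note one design consequence: if every value appears exactly once, this sketch returns all of them, whereas some textbooks would say such data has no mode.

```python
from collections import Counter

def modes(values):
    """Return every value that appears the most times, in sorted order."""
    counts = Counter(values)        # value -> number of appearances
    top = max(counts.values())      # highest number of appearances
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 2, 3, 5, 5, 7, 8]))  # [2, 5]
print(modes([2, 3, 3, 4, 6, 7]))     # [3]
```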
Presented by,
Pallavi Gupta 2013187
Group Members:
Priya Jain 2013210
Prerna Bansal 2013209
Piyush Mittal 2013197
Neeraj Garg 2013166