Saturday, 31 August 2013

T-test and Crosstabs

We were required to carry out a small project based on statistical analysis of data. Our topic was ‘Month-wise production of cement in India’. The data was collected from www.indiastat.com. The following is a snapshot of the data.
We were asked to compute the following statistics on this data.

1.       Mean
The mean is what most people commonly refer to as the average. It is the number you obtain when you sum a given set of numbers and then divide this sum by how many numbers the set contains. More precisely, it is called the arithmetic mean.
mean = (sum of elements in set) / (number of elements in set)
Example.
To find the mean of the set of numbers below 
3, 4, -1, 22, 14, 0, 9, 18, 7, 0, 1
The first step is to count how many numbers there are in the set, which we shall call n,
n=11
The next step is to add up all the numbers in the set
sum= 77
The last step is to find the actual mean by dividing the sum by n,
mean=77/11=7
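
The three steps can also be checked with a few lines of Python. This is only a minimal sketch; the variable names (data, n, total) are our own choice for illustration. Note that the list holds 11 values, so the mean is 77/11 = 7.

```python
# The example set from the text
data = [3, 4, -1, 22, 14, 0, 9, 18, 7, 0, 1]

n = len(data)       # step 1: count the numbers
total = sum(data)   # step 2: add them up
mean = total / n    # step 3: divide the sum by the count

print(n, total, mean)  # 11 77 7.0
```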

2.       Median
The median is the middle number of a given set of numbers arranged in order of increasing magnitude: when you list the numbers from the lowest to the highest, the median is the one positioned in the exact middle. If the set has an even number of elements, the median is the mean of the two middle numbers. Like the mean, the median is a measure of central tendency (a kind of average), not a measure of dispersion. The median is useful because it is much less affected by extreme values than the mean.
 Example.
To find the median in the set of numbers given below
15, 16, 15, 7, 21, 18, 19, 20, 11
From the definition of median, we should be able to tell that the first step is to rearrange the given set of numbers in order of increasing magnitude, i.e. from the lowest to the highest
7, 11, 15, 15, 16, 18, 19, 20, 21
Then we inspect the set to find that number which lies in the exact middle.
median=16
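
The same procedure can be sketched in Python. The function name median is our own choice for illustration, and the data below is simply the nine values of the ordered list above, entered unsorted. The sketch also covers the case of an even number of elements by averaging the two middle values.

```python
def median(values):
    """Return the middle value of the sorted data (mean of the two middle values if the count is even)."""
    ordered = sorted(values)          # arrange from lowest to highest
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:         # odd count: one exact middle
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

data = [15, 16, 15, 7, 11, 18, 19, 20, 21]
print(median(data))  # 16
```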

3.       Mode
The mode is the element that appears most frequently in a given set of elements. If the frequency of an element is the number of times it occurs in the data set, the mode is the element with the largest frequency. A data set can have more than one mode: if several elements share the same highest frequency, they are all modal elements of the data set.
Example.
To find the Mode of the following data set.
3, 12, 15, 3, 15, 8, 20, 19, 3, 15, 12, 19, 9
Mode = 3 and 15
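
Counting frequencies by hand is error-prone, so here is a small Python sketch that does it with collections.Counter from the standard library. The function name modes is our own choice for illustration; it returns every element that shares the highest frequency.

```python
from collections import Counter

def modes(values):
    """Return all elements that occur with the highest frequency, in ascending order."""
    counts = Counter(values)              # frequency of each element
    highest = max(counts.values())        # the largest frequency
    return sorted(v for v, c in counts.items() if c == highest)

data = [3, 12, 15, 3, 15, 8, 20, 19, 3, 15, 12, 19, 9]
print(modes(data))  # [3, 15]
```

Here 3 and 15 each occur three times, while 12 and 19 occur only twice.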

4.       T-test
We use this test to compare the means of two samples (or treatments), even if they have different numbers of replicates. In simple terms, the t-test expresses the actual difference between the two means relative to the variation in the data (measured by the standard error of the difference between the means).
The formula given below is used to compute the t-test statistic:
T = (x1 - x2)/√((S1²/N1) + (S2²/N2))
Where,
x1 is the mean of the first data set, x2 is the mean of the second data set
S1² is the variance (the square of the standard deviation) of the first data set, S2² is the variance of the second data set
N1 is the number of elements in the first data set, N2 is the number of elements in the second data set

Example.
Calculate the t-test value for the two data sets 10, 20, 30, 40, 50 and 1, 29, 46, 78, 99.
First calculate the mean and the standard deviation of each data set,

For 10, 20, 30, 40, 50 
Total Inputs(N)=5
Means(xm)= 30 
SD =15.8114

For 1, 29,46,78,99 
Total Inputs(N) = 5
Means(xm) = 50.6 
SD=38.8626 

To Perform T Test 
From above we know that, 
x1 = 30, x2 = 50.6, S1² = 250, S2² = 1510.3, N1 = 5, N2 = 5 (S1² and S2² are the squares of the two standard deviations)
Substitute these values in the above formula, 
  T = (30 - 50.6)/√((250/5) + (1510.3/5)) 
= -20.6/√352.06
= -1.0979
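
The worked example can be reproduced in Python with the standard statistics module. This is a sketch under the assumption that the sample variance (dividing by N - 1) is used, which matches the standard deviations quoted above. The function name t_statistic is our own choice; this unpooled form of the statistic is often called Welch's t.

```python
import math
import statistics

def t_statistic(first, second):
    """Difference of the two means divided by the standard error of that difference."""
    mean1, mean2 = statistics.mean(first), statistics.mean(second)
    # Sample variances (N - 1 in the denominator), i.e. the squared standard deviations
    var1, var2 = statistics.variance(first), statistics.variance(second)
    standard_error = math.sqrt(var1 / len(first) + var2 / len(second))
    return (mean1 - mean2) / standard_error

first = [10, 20, 30, 40, 50]
second = [1, 29, 46, 78, 99]
t = t_statistic(first, second)
print(round(t, 4))  # -1.0979
```

The sign is negative only because the first mean is smaller than the second; swapping the two data sets gives +1.0979.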

5.       Cross Tabulation
Cross tabulation is a statistical process that summarises categorical data in a contingency table. Such tables are heavily used in survey research, business intelligence, engineering and scientific research. They give a basic picture of the interrelation between two variables and can help find interactions between them.
Example.

We examine the 3 rows for the unit J1. This unit needs t-shirts in both adult and child sizes.

A total of 8 adult (A) t-shirts (Total Of ID):
   2 medium (M), 3 small (S), 3 extra large (X)
A total of 8 child (C) t-shirts (Total Of ID):
   5 large (L), 2 medium (M), 1 extra large (X)
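
To show how such a table is built, here is a minimal Python sketch using only the standard library. The individual records are hypothetical: they are reconstructed from the counts quoted above for unit J1, one (type, size) pair per t-shirt, and the names records and crosstab are our own.

```python
from collections import Counter

# Hypothetical raw records for unit J1, rebuilt from the counts in the text:
# each entry is (type, size) for one t-shirt.
records = (
    [("A", "M")] * 2 + [("A", "S")] * 3 + [("A", "X")] * 3
    + [("C", "L")] * 5 + [("C", "M")] * 2 + [("C", "X")] * 1
)

def crosstab(pairs):
    """Count how often each (row, column) combination occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Combinations that never occur get a count of 0
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return table, cols

table, cols = crosstab(records)
print("type", *cols, "total")
for row, cells in table.items():
    print(row, *cells.values(), sum(cells.values()))
```

Each row of the printed table is one t-shirt type, each column is one size, and the last column is the row total (8 for both A and C).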

By Pallavi Gupta (2013187)
Group Members:
Piyush (2013197)
Prerna Bansal (2013209)
Priya Jain (2013210)
Neeraj Garg (2013318)

