Sunday, 21 July 2013

CHI-SQUARE ANALYSIS

CONTD. SESSION 5 (Lecture 10)

In the 10th lecture we studied hypothesis testing and the chi-square test.


A chi-squared test is a statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Also considered a chi-squared test is a test in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-squared distribution as closely as desired by making the sample size large enough.

Chi-Square Test Requirements:
1. Quantitative data.
2. One or more categories.
3. Independent observations.
4. Adequate sample size (at least 10).
5. Simple random sample.
6. Data in frequency form.
7. All observations must be used.


Chi Square Goodness of Fit Test:

This test allows us to compare a collection of categorical data with some theoretical expected distribution. This test is often used in genetics to compare the results of a cross with the theoretical distribution based on genetic theory. Suppose you performed a simple monohybrid cross between two individuals that were heterozygotes for the trait of interest.

Results of a monohybrid cross between two heterozygotes for the 'a' gene.

          A      a      Totals
A         10     42     52
a         33     15     48
Totals    43     57     100


The observed phenotypic ratio is 85 of the A-type (10 + 42 + 33) to 15 of the a-type. In a monohybrid cross between two heterozygotes, however, we would have predicted a 3:1 ratio of phenotypes. In other words, we would have expected to get 75 A-type and 25 a-type. Are our results different?


Calculate the chi square statistic χ² by completing the following steps:
  1. For each observed number in the table, subtract the corresponding expected number (O - E).
  2. Square the difference [ (O - E)² ].
  3. Divide the squares obtained for each cell in the table by the expected number for that cell [ (O - E)² / E ].
  4. Sum all the values for (O - E)² / E. This is the chi square statistic.
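The four steps above can be sketched in a few lines of Python for the monohybrid cross counts (85 and 15 observed, 75 and 25 expected). This is only an illustrative sketch: the function name chi_square_statistic and the dictionary layout are my own choices, not something from the lecture.

```python
# Observed and expected phenotype counts from the monohybrid cross above.
observed = {"A-type": 85, "a-type": 15}
expected = {"A-type": 75, "a-type": 25}  # 3:1 ratio applied to 100 offspring


def chi_square_statistic(observed_counts, expected_counts):
    """Sum (O - E)^2 / E over all categories (steps 1 to 4)."""
    total = 0.0
    for category, o in observed_counts.items():
        e = expected_counts[category]
        total += (o - e) ** 2 / e
    return total


chi_square = chi_square_statistic(observed, expected)
print(f"Chi square = {chi_square:.3f}")  # prints "Chi square = 5.333"
```

With two categories there is 1 degree of freedom, and the chi square distribution table later in this post lists 3.841 as the critical value at alpha = 0.05, so a statistic of about 5.33 would suggest that the observed counts do differ from a 3:1 ratio.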

Chi Square Test of Independence

For a contingency table that has r rows and c columns, the chi square test can be thought of as a test of independence. In a test of independence the null and alternative hypotheses are:
Ho: The two categorical variables are independent.
Ha: The two categorical variables are related.
We can use the equation Chi Square = Σ (fo - fe)² / fe, where the sum runs over all cells of the table.
Here fo denotes the observed frequency and fe the expected frequency in each cell. The general table would look something like the one below:


                Category I    Category II    Category III    Row Totals
Sample A        a             b              c               a+b+c
Sample B        d             e              f               d+e+f
Sample C        g             h              i               g+h+i
Column Totals   a+d+g         b+e+h          c+f+i           a+b+c+d+e+f+g+h+i=N


Now we need to calculate the expected value for each cell in the table. We can do that by multiplying the row total by the column total and dividing by the grand total (N). For example, for cell a the expected value would be (a+b+c)(a+d+g)/N.
Once the expected values have been calculated for each cell, we can use the same procedure as before for a simple 2 x 2 table.
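As a rough sketch of the "row total times column total divided by N" rule, the snippet below computes the expected count for every cell of a table. The helper name expected_counts is hypothetical, and the 2 x 2 counts are simply the monohybrid cross table from earlier in this post, reused here only to show the arithmetic.

```python
def expected_counts(table):
    """Return expected counts (row total * column total / N) for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# 2 x 2 counts from the monohybrid cross table earlier in this post,
# reused here only to illustrate the arithmetic.
observed = [[10, 42],
            [33, 15]]

for row in expected_counts(observed):
    print([round(e, 2) for e in row])
```

For the top-left cell this gives 52 x 43 / 100 = 22.36, which matches the hand calculation.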


Observed    Expected    |O - E|    (O - E)²    (O - E)² / E
     


Suppose you have the following categorical data set.


Table . Incidence of three types of malaria in three tropical regions.


             Asia    Africa    South America    Totals
Malaria A    31      14        45               90
Malaria B    2       5         53               60
Malaria C    53      45        2                100
Totals       86      64        100              250


We could now set up the following table:


Observed    Expected    |O - E|    (O - E)²    (O - E)² / E
31          30.96       0.04       0.0016      0.0000516
14          23.04       9.04       81.72       3.546
45          36.00       9.00       81.00       2.25
2           20.64       18.64      347.45      16.83
5           15.36       10.36      107.33      6.99
53          24.00       29.00      841.00      35.04
53          34.40       18.60      345.96      10.06
45          25.60       19.40      376.36      14.70
2           40.00       38.00      1444.00     36.10


Chi Square = 125.516
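To double-check the hand calculation, here is a small Python sketch that recomputes the statistic straight from the malaria counts. The variable names are my own. It works with unrounded expected values, so the result (about 125.52) differs very slightly from 125.516, which is the sum of the rounded entries in the table.

```python
# Observed counts from the malaria table: rows are Malaria A, B, C;
# columns are Asia, Africa, South America.
observed = [[31, 14, 45],
            [2, 5, 53],
            [53, 45, 2]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected count for this cell
        chi_square += (o - e) ** 2 / e

print(f"Chi square = {chi_square:.2f}")  # prints "Chi square = 125.52"
```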


Degrees of Freedom = (c - 1)(r - 1) = 2(2) = 4

Table 3. Chi Square distribution table.

probability level (alpha)

Df    0.5      0.10     0.05      0.02      0.01      0.001
1     0.455    2.706    3.841     5.412     6.635     10.827
2     1.386    4.605    5.991     7.824     9.210     13.815
3     2.366    6.251    7.815     9.837     11.345    16.268
4     3.357    7.779    9.488     11.668    13.277    18.465
5     4.351    9.236    11.070    13.388    15.086    20.517


Reject Ho because 125.516 is greater than 9.488 (for alpha = 0.05)
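The decision rule can be sketched as a simple table lookup. The names CRITICAL_005, degrees_of_freedom and reject_null are hypothetical helpers of my own, and the critical values are just the alpha = 0.05 column copied from the chi square distribution table above.

```python
# Critical values copied from the chi square distribution table above
# (alpha = 0.05 column), keyed by degrees of freedom.
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for an r x c contingency table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def reject_null(chi_square, n_rows, n_cols, critical=CRITICAL_005):
    """Return True when the statistic exceeds the tabulated critical value."""
    df = degrees_of_freedom(n_rows, n_cols)
    return chi_square > critical[df]


df = degrees_of_freedom(3, 3)          # 3 malaria types x 3 regions
decision = reject_null(125.516, 3, 3)  # statistic from the table above
print(df, decision)                    # prints "4 True"
```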


Thus, we would reject the null hypothesis that there is no relationship between location and type of malaria. Our data tell us there is a relationship between type of malaria and location, but that is all they tell us.

Publisher : Nitin Kumar Shukla

Group 8
Nitin Kumar Shukla
Nishanth Aggarwal
Neeraj Ramadoss
Prakhar Swami
Prerna Arora
Praveen Iyer


