The chi-square distribution is simply the distribution of
the sum of the squares of a set of independent, standard normally distributed
random variables. Its value stems from the central limit theorem: the sum of
independent random variables from almost any distribution becomes closely
approximated by a normal distribution as the number of samples in the sum
grows. Test statistics built from such sums are therefore approximately
chi-squared, which makes the test widely applicable regardless of the
underlying distribution of the data.
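As a quick sanity check on this definition (a minimal simulation sketch, not part of the original text), we can draw many sums of k squared standard normals and confirm that their mean and variance match a chi-squared distribution with k degrees of freedom, which has mean k and variance 2k:

```python
import random

random.seed(42)

k = 3        # number of squared standard normals per draw
n = 100_000  # number of simulated draws

# Each draw is the sum of k squared standard-normal variables,
# which by definition follows a chi-squared distribution with k df.
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(n)]

mean = sum(draws) / n
var = sum((d - mean) ** 2 for d in draws) / (n - 1)

# Expect mean close to k = 3 and variance close to 2k = 6.
print(mean, var)
```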
A chi-squared test, also referred to as a chi-square test or χ² test, is any statistical
hypothesis
test in which the sampling distribution of the test
statistic is a chi-squared distribution when the null
hypothesis is true. Also considered a chi-squared test is a test in
which this is asymptotically true, meaning that the sampling distribution (if
the null hypothesis is true) can be made to approximate a chi-squared
distribution as closely as desired by making the sample size large enough.
One important set of statistical tests allows us to test for
deviations of observed frequencies from expected frequencies. To introduce
these tests, we will start with a simple, non-biological example. We want to
determine if a coin is fair. In other words, are the odds of flipping the coin
heads-up the same as tails-up? We collect data by flipping the coin 200 times.
The coin landed heads-up 108 times and tails-up 92 times. At first glance, we
might suspect that the coin is biased because heads resulted more often than tails.
However, we have a more quantitative way to analyse our results, a chi-squared
test.
To perform a chi-square test (or any other statistical
test), we first must establish our null hypothesis. In this example, our null
hypothesis is that the coin should be equally likely to land heads-up or
tails-up every time. The null hypothesis allows us to state expected
frequencies. For 200 tosses, we would expect 100 heads and 100 tails.
The next step is to prepare a table as follows.
           Heads   Tails   Total
Observed     108      92     200
Expected     100     100     200
Total        208     192     400
The Observed values are those we gather ourselves. The
expected values are the frequencies expected, based on our null hypothesis. We
total the rows and columns as indicated. It's a good idea to make sure that the
row totals equal the column totals (both total to 400 in this example).
Using probability theory, statisticians have devised a way
to determine if a frequency distribution differs from the expected
distribution. To use this chi-square test, we first have to calculate
chi-squared.
Chi-squared = sum of (observed − expected)^2 / expected, summed over all classes
We have two classes to consider in this example, heads
and tails.
Chi-squared = (108 − 100)^2/100 + (92 − 100)^2/100 = (8)^2/100
+ (−8)^2/100 = 0.64 + 0.64 = 1.28
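The arithmetic above can be reproduced in a few lines of Python (a minimal sketch; the helper name chi_squared_stat is our own, not a standard library function):

```python
def chi_squared_stat(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Coin example: 200 tosses, 108 heads and 92 tails observed,
# 100 of each expected under the null hypothesis of a fair coin.
chi2 = chi_squared_stat([108, 92], [100, 100])
print(chi2)  # 1.28
```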
The next step is to compare this value against a table of the chi-squared
distribution. The left-most column of such a table lists the degrees of
freedom (df). We determine the degrees of freedom by subtracting one from the
number of classes. In this example, we have two classes (heads and tails), so
we have 1 degree of freedom. Our chi-squared value is 1.28. Moving across the
row for 1 df, we find the critical values that bound our value: 1.07
(corresponding to a probability of 0.30) and 1.64 (corresponding to a
probability of 0.20). Interpolating for our value of 1.28 gives an estimated
probability of about 0.26. This does not mean there is a 74% chance that the
coin is biased; it means that, if the coin were fair, a deviation from the
expected 100:100 split at least as large as the one we observed (108 heads out
of 200 tosses) would arise by chance about 26% of the time. In biological
applications, a probability ≤ 5% is usually adopted as the standard for
significance; at that threshold, the chance of an observed deviation arising
by chance alone is only 1 in 20. Because the probability we obtained in the
coin example (about 0.26) is greater than 0.05, we fail to reject the null
hypothesis and conclude that the data are consistent with a fair coin.
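For the special case of 1 degree of freedom, the probability can be computed exactly rather than interpolated from a table: if X follows a chi-squared distribution with 1 df, then P(X > x) = erfc(sqrt(x/2)). A minimal sketch using only the standard library (note this erfc identity holds only for 1 df):

```python
import math

# Chi-squared statistic for the coin example.
chi2 = (108 - 100) ** 2 / 100 + (92 - 100) ** 2 / 100  # 1.28

# For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(p_value, 2))  # 0.26

# p > 0.05, so we fail to reject the null hypothesis of a fair coin.
assert p_value > 0.05
```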
T-test
A t-test is any statistical hypothesis test
in which the test statistic follows a Student's t distribution if the null
hypothesis is supported. It can be used to determine if two sets of
data are significantly different from each other, and is most commonly applied
when the test statistic would follow a normal distribution if the value of a scaling
term in the test statistic were known. When the scaling term
is unknown and is replaced by an estimate based on the data, the test statistic
(under certain conditions) follows a Student's t distribution.
Student's t-test
When to use it
Use Student's t-test when you have one nominal variable and one measurement variable, and you want to compare the mean values of the measurement variable. The nominal variable must have only two values, such as "male" and "female" or "treated" and "untreated."
Null hypothesis
The statistical null hypothesis is that the means of the measurement variable are equal for the two categories.
How the test works
The test statistic, ts, is calculated using a formula that has the difference between the means in the numerator; this makes ts get larger as the means get further apart. The denominator is the standard error of the difference in the means, which gets smaller as the sample variances decrease or the sample sizes increase. Thus ts gets larger as the means get farther apart, the variances get smaller, or the sample sizes increase.
The probability of getting the observed ts value under the null hypothesis is calculated using the t-distribution. The shape of the t-distribution, and thus the probability of getting a particular ts value, depends on the number of degrees of freedom. The degrees of freedom for a t-test is the total number of observations in the groups minus 2, or n1+n2−2.
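The description above can be written out directly (a minimal sketch from first principles, assuming the equal-variance form of the test; the function name two_sample_t and the toy samples below are our own, purely illustrative):

```python
import statistics

def two_sample_t(a, b):
    """Student's t statistic for two independent samples, pooled variance."""
    n1, n2 = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((n1 - 1) * statistics.variance(a) +
              (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    # Standard error of the difference in means: the denominator of ts.
    se = (pooled * (1 / n1 + 1 / n2)) ** 0.5
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, n1 + n2 - 2  # statistic and degrees of freedom

# Hypothetical toy samples, just to exercise the formula:
t, df = two_sample_t([5.1, 4.9, 5.3, 5.0], [4.2, 4.5, 4.1, 4.4])
print(t, df)
```

As the formula predicts, pushing the two sample means apart, shrinking the variances, or enlarging the samples all make t grow.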
Assumptions
The t-test assumes that the observations within each group are normally distributed and the variances are equal in the two groups. It is not particularly sensitive to deviations from these assumptions, but if the data are very non-normal, the Mann-Whitney U-test can be used. Welch's t-test can be used if the variances are unequal.
Example
In fall 2004, students in the 2 p.m. section of my Biological Data Analysis class had an average height of 66.6 inches, while the average height in the 5 p.m. section was 64.6 inches. Are the average heights of the two sections significantly different? Here are the data:

2 p.m.   5 p.m.
69       68
70       62
66       67
63       68
68       69
70       67
69       61
67       59
62       62
63       61
76       69
59       66
62       62
62       62
75       61
62       70
72
63
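Under the equal-variance assumption, the reported result can be reproduced from these data with the Python standard library (a minimal sketch; no third-party packages). The reported P = 0.21 requires the t-distribution's CDF, which is not in the standard library, so the sketch checks only the statistic and degrees of freedom:

```python
import statistics

# Heights (inches) from the 2 p.m. and 5 p.m. sections.
pm2 = [69, 70, 66, 63, 68, 70, 69, 67, 62, 63, 76, 59, 62, 62, 75, 62, 72, 63]
pm5 = [68, 62, 67, 68, 69, 67, 61, 59, 62, 61, 69, 66, 62, 62, 61, 70]

n1, n2 = len(pm2), len(pm5)
df = n1 + n2 - 2  # 18 + 16 - 2 = 32

# Pooled variance, then the standard error of the difference in means.
pooled = ((n1 - 1) * statistics.variance(pm2) +
          (n2 - 1) * statistics.variance(pm5)) / df
se = (pooled * (1 / n1 + 1 / n2)) ** 0.5

t = (statistics.mean(pm2) - statistics.mean(pm5)) / se
print(round(t, 2), df)  # 1.29 32
```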
There is one
measurement variable, height, and one nominal variable, class section. The null
hypothesis is that the mean heights in the two sections are the same. The
results of the t-test (t=1.29, 32 d.f., P=0.21) do not reject the null
hypothesis.

Submitted by:
Prachee Kasera
Nitesh Beriwal
Raghav Bhattar
Parthojit Sar
Neha Gupta