Summary of 13th August Session
A t-test is any statistical hypothesis test in which the test statistic follows a Student's t-distribution if the null hypothesis is supported. It can be used to determine whether two sets of data are significantly different from each other, and it is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic (under certain conditions) follows a Student's t-distribution.
Among the most frequently used t-tests are:
· A one-sample location test of whether the mean of a normally distributed population has a value specified in a null hypothesis.
· A two-sample location test of the null hypothesis that the means of two normally distributed populations are equal. All such tests are usually called Student's t-tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t-test. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.[6]
· A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the "paired" or "repeated measures" t-test:[6][7] see paired difference test.
· A test of whether the slope of a regression line differs significantly from 0.
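The last of these, the slope test, can be sketched with SciPy's `linregress`, which reports a two-sided p-value for the null hypothesis that the slope is zero. The data below are invented purely for illustration:

```python
from scipy import stats

# Hypothetical data: does y increase with x?
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 3.8, 5.1, 5.8, 6.9, 7.2]

result = stats.linregress(x, y)
print(f"slope = {result.slope:.3f}, p-value = {result.pvalue:.4f}")
# A small p-value means the slope differs significantly from 0.
```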
Types Of T-test
· Single Sample T-test - The one-sample t-test compares the mean score of a sample to a known value, usually the population mean (the average for the outcome of some population of interest). The basic idea of the test is a comparison of the average of the sample (observed average) and the population (expected average), with an adjustment for the number of cases in the sample and the standard deviation of the average.
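As a minimal sketch, this comparison can be run with `scipy.stats.ttest_1samp`; the sample values and the population mean of 100 below are made up for illustration:

```python
from scipy import stats

# Hypothetical sample of scores; the known population mean is 100.
sample = [96, 104, 110, 99, 107, 112, 101, 98, 105, 108]

t_stat, p_value = stats.ttest_1samp(sample, popmean=100)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```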
· Independent Sample T-test - The Independent-Samples T Test procedure compares means for two groups of cases. Ideally, for this test, the subjects should be randomly assigned to the two groups, so that any difference in response is due to the treatment (or lack of treatment) and not to other factors. This is not the case if you compare average income for males and females: a person is not randomly assigned to be male or female. In such situations, you should ensure that differences in other factors are not masking or enhancing a significant difference in means. Differences in average income may be influenced by factors such as education.
Example. Patients with high blood pressure are randomly assigned to a placebo group and a treatment group. The placebo subjects receive an inactive pill, and the treatment subjects receive a new drug that is expected to lower blood pressure. After the subjects are treated for two months, the two-sample t-test is used to compare the average blood pressures for the placebo group and the treatment group. Each patient is measured once and belongs to one group.
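A sketch of this example with `scipy.stats.ttest_ind` (the blood-pressure readings below are invented; `equal_var=False` selects Welch's t-test, which does not assume equal population variances):

```python
from scipy import stats

# Hypothetical systolic blood pressures after two months.
placebo   = [142, 138, 150, 145, 139, 148, 141, 147]
treatment = [130, 128, 136, 125, 133, 129, 138, 131]

t_stat, p_value = stats.ttest_ind(placebo, treatment, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```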
· Paired Sample T-test - This procedure compares the means of two variables for a single group. The procedure computes the difference between the values of the two variables for each case and tests whether the average difference differs from 0.
Example. In a study on high blood pressure, all patients are measured at the beginning of the study, given a treatment, and measured again. Thus, each subject has two measures, often called before and after measures. An alternative design for which this test is used is a matched-pairs or case-control study, in which each record in the data file contains the response for the patient and also for his or her matched control subject. In a blood pressure study, patients and controls might be matched by age (a 75-year-old patient with a 75-year-old control group member).
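The before/after design can be sketched with `scipy.stats.ttest_rel`; the paired readings below are fabricated for illustration:

```python
from scipy import stats

# Hypothetical before/after blood pressures for the same eight patients.
before = [150, 142, 148, 155, 139, 160, 147, 152]
after  = [143, 138, 141, 150, 136, 151, 143, 145]

t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```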
Frequency Distributions
One important set of statistical tests allows us to test for deviations of observed frequencies from expected frequencies. To introduce these tests, we will start with a simple, non-biological example. We want to determine if a coin is fair. In other words, are the odds of flipping the coin heads-up the same as tails-up? We collect data by flipping the coin 200 times. The coin landed heads-up 108 times and tails-up 92 times. At first glance, we might suspect that the coin is biased because heads resulted more often than tails. However, we have a more quantitative way to analyze our results: a chi-squared test.
To perform a chi-squared test (or any other statistical test), we first must establish our null hypothesis. In this example, our null hypothesis is that the coin should be equally likely to land heads-up or tails-up every time. The null hypothesis allows us to state expected frequencies. For 200 tosses, we would expect 100 heads and 100 tails.
The next step is to prepare a table as follows.

         | Heads | Tails | Total
Observed |  108  |   92  |  200
Expected |  100  |  100  |  200
Total    |  208  |  192  |  400
The Observed values are those we gather ourselves. The Expected values are the frequencies expected, based on our null hypothesis. We total the rows and columns as indicated. It's a good idea to make sure that the row totals equal the column totals (both total to 400 in this example).
Using probability theory, statisticians have devised a way to determine if a frequency distribution differs from the expected distribution. To use this chi-squared test, we first have to calculate chi-squared.
Chi-squared = Σ (observed − expected)² / expected
We have two classes to consider in this example, heads and tails.

Chi-squared = (108 − 100)²/100 + (92 − 100)²/100 = (8)²/100 + (−8)²/100 = 0.64 + 0.64 = 1.28
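The arithmetic above can be checked with a few lines of Python:

```python
observed = [108, 92]
expected = [100, 100]

# Chi-squared = sum of (observed - expected)^2 / expected over all classes.
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_squared)  # 1.28
```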
Now we have to consult a table of critical values of the chi-squared distribution. Here is a portion of such a table.
df/prob. | 0.99    | 0.95   | 0.90  | 0.80  | 0.70 | 0.50 | 0.30 | 0.20 | 0.10 | 0.05
1        | 0.00013 | 0.0039 | 0.016 | 0.064 | 0.15 | 0.46 | 1.07 | 1.64 | 2.71 | 3.84
2        | 0.02    | 0.10   | 0.21  | 0.45  | 0.71 | 1.39 | 2.41 | 3.22 | 4.60 | 5.99
3        | 0.12    | 0.35   | 0.58  | 1.00  | 1.42 | 2.37 | 3.66 | 4.64 | 6.25 | 7.82
4        | 0.3     | 0.71   | 1.06  | 1.65  | 2.20 | 3.36 | 4.88 | 5.99 | 7.78 | 9.49
5        | 0.55    | 1.14   | 1.61  | 2.34  | 3.00 | 4.35 | 6.06 | 7.29 | 9.24 | 11.07
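With one degree of freedom (two classes minus one), the computed value of 1.28 falls between the tabled critical values 1.07 (p = 0.30) and 1.64 (p = 0.20), so the probability is well above 0.05 and we cannot reject the null hypothesis that the coin is fair. The same lookup can be sketched with `scipy.stats.chisquare`, which returns the statistic and an exact p-value:

```python
from scipy import stats

# Observed and expected frequencies from the coin example above.
chi2, p_value = stats.chisquare(f_obs=[108, 92], f_exp=[100, 100])
print(f"chi-squared = {chi2:.2f}, p = {p_value:.4f}")
# p is well above 0.05, so the data do not show the coin is biased.
```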
written by Prateek Jain