Introduction
There were two sessions taken on Chi-Square and T-tests. Both methods deal with the comparison of two samples: the Chi-square test is used for discrete (categorical) variables, whereas the T-test is used for continuous variables. The Chi-Square distribution is simply the distribution of the sum of the squares of a set of independent, standard normally distributed random variables. Its usefulness stems from the fact that, by the central limit theorem, the sum of random variables from almost any distribution is closely approximated by a normal distribution as the number of terms in the sum grows. This is why the test is so widely applicable.
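As a quick illustration of this definition, here is a minimal Python sketch (the simulation setup and sample sizes are our own assumptions, chosen only for illustration) that draws sums of squares of k standard normal variables and checks that they behave like a chi-squared distribution with k degrees of freedom.

import numpy as np
from scipy import stats

k = 3                                          # number of squared standard normals
samples = np.random.standard_normal((100000, k))
sums_of_squares = (samples ** 2).sum(axis=1)   # each row gives one chi-square(k) draw

# The simulated mean and variance should be close to the theoretical k and 2k
print("mean:", sums_of_squares.mean(), "expected:", k)
print("var :", sums_of_squares.var(), "expected:", 2 * k)

# A Kolmogorov-Smirnov test against the chi-square(k) CDF should not reject
print(stats.kstest(sums_of_squares, stats.chi2(df=k).cdf))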
A chi-squared test, also referred to as a chi-square test or χ² test, is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is
true. Also considered a chi-squared test is a test in which this is asymptotically true, meaning that the sampling
distribution (if the null hypothesis is true) can be made to approximate a chi-squared
distribution as closely as desired by making the sample size large enough.
For a test of independence, the null hypothesis is rejected if χ² is large, because this means that the observed and expected frequencies are far apart. The chi-square curve is used to judge whether the calculated test statistic is large enough. We reject H0 if the test statistic is so large that the area beyond it, under the chi-square curve with (r-1)(c-1) degrees of freedom (where r and c are the numbers of rows and columns of the contingency table), is less than 0.05. The P-value is the area to the right of χ² under the chi-square curve with (r-1)(c-1) degrees of freedom.
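For concreteness, the following is a minimal Python sketch of such a test of independence using scipy's chi2_contingency; the 2x3 table of counts is made up purely for illustration.

import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[30, 45, 25],
                     [35, 40, 25]])          # rows = groups, columns = categories

chi2_stat, p_value, dof, expected = chi2_contingency(observed)
print("chi-square statistic:", chi2_stat)
print("degrees of freedom  :", dof)          # (r-1)(c-1) = (2-1)(3-1) = 2
print("p-value             :", p_value)      # reject H0 of independence if p < 0.05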
Distributions where Chi-square can be used:
·         Discrete uniform distribution
·         Other distributions
When testing whether observations are random variables whose distribution belongs to a given family of distributions, the "theoretical frequencies" are calculated using a distribution from that family fitted in some standard way. The reduction in the degrees of freedom is calculated as p = s + 1, where s is the number of co-variates used in fitting the distribution. For instance, when checking a three-co-variate Weibull distribution, p = 4, and when checking a normal distribution (where the parameters are the mean and standard deviation), p = 3. In other words, there will be n - p degrees of freedom, where n is the number of categories.
It should be noted that the degrees of freedom are not based on the number of observations, as they are with a Student's t or F-distribution. For example, when testing a fair six-sided die, there are five degrees of freedom because there are six categories (one for each face). The number of times the die is rolled has no effect on the number of degrees of freedom.
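As a small illustration of the fair-die case (the counts below are hypothetical), scipy's chisquare function compares the observed counts against equal expected frequencies and uses 6 - 1 = 5 degrees of freedom:

from scipy.stats import chisquare

observed_counts = [9, 11, 10, 8, 12, 10]    # hypothetical counts from 60 rolls
result = chisquare(observed_counts)          # expected frequencies default to uniform
print(result.statistic, result.pvalue)       # statistic is compared to chi-square with 5 df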
Goodness of fit
For example, to test the
hypothesis that a random sample of 100 people has been drawn from a population
in which men and women are equal in frequency, the observed number of men and
women would be compared to the theoretical frequencies of 50 men and 50 women.
If there were 44 men in the sample and 56 women, then χ² = (44 - 50)²/50 + (56 - 50)²/50 = 1.44.
If the null hypothesis is true
(i.e., men and women are chosen with equal probability), the test statistic
will be drawn from a chi-squared distribution with one degree of freedom (if the male frequency is known, then the female frequency is determined).
Consultation of the chi-squared distribution for 1 degree
of freedom shows that the probability of
observing this difference (or a more extreme difference than this) if men and
women are equally numerous in the population is approximately 0.23. This
probability is higher than conventional criteria for statistical significance (0.001–0.05),
so normally we would not reject the null hypothesis that the number of men in
the population is the same as the number of women (i.e., we would consider our
sample within the range of what we'd expect for a 50/50 male/female ratio.)
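The same calculation can be reproduced with scipy; this minimal sketch simply plugs in the 44/56 counts from the example above.

from scipy.stats import chisquare

observed = [44, 56]
expected = [50, 50]
result = chisquare(observed, f_exp=expected)
print(result.statistic)   # (44-50)^2/50 + (56-50)^2/50 = 1.44
print(result.pvalue)      # about 0.23, so H0 is not rejected at the 0.05 level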
T-test
A t-test is any statistical hypothesis test in
which the test statistic follows
a Student's t distribution if
the null hypothesis is
supported. It can be used to determine if two sets of data are significantly
different from each other, and is most commonly applied when the test statistic
would follow a normal distribution if
the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate
based on the data, the test statistic (under certain conditions) follows a
Student's t-distribution. Among the most frequently used t-tests are:
·         A one-sample location test of whether the mean of a normally distributed population has a value specified in a null hypothesis.
·         A two-sample location test of the null hypothesis that the means of two normally distributed populations are equal. All such tests are usually called Student's t-tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t-test. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.[6]
·         A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the "paired" or "repeated measures" t-test:[6][7] see paired difference test.
·         A test of whether the slope of a regression line differs significantly from 0.
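As a rough sketch of this last case (the data below are made up), scipy's linregress reports a two-sided p-value for the null hypothesis that the slope is zero.

import numpy as np
from scipy.stats import linregress

x = np.arange(20)
y = 2.0 * x + np.random.normal(scale=3.0, size=20)   # noisy linear relationship

fit = linregress(x, y)
print("slope  :", fit.slope)
print("p-value:", fit.pvalue)   # small p-value -> slope differs significantly from 0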
Types Of T-test
·         Single Sample T-test - The one-sample t-test compares the mean score of a sample to a known value, usually the population mean (the average for the outcome of some population of interest). The basic idea of the test is a comparison of the average of the sample (observed average) and the population (expected average), with an adjustment for the number of cases in the sample and the standard deviation of the average. Working through an example can help to highlight the issues involved and demonstrate how to conduct a t-test using actual data.
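A minimal sketch of such a one-sample test in Python, using hypothetical scores and an assumed known population mean of 100, looks like this:

import numpy as np
from scipy.stats import ttest_1samp

sample = np.array([102, 98, 105, 110, 97, 101, 104, 99, 103, 106])  # hypothetical scores
result = ttest_1samp(sample, popmean=100)                           # H0: population mean = 100
print(result.statistic, result.pvalue)                              # reject H0 if p < 0.05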
·         Independent Sample T-test - The Independent-Samples T Test procedure compares means for two groups of cases. Ideally, for this test, the subjects should be randomly assigned to two groups, so that any difference in response is due to the treatment (or lack of treatment) and not to other factors. This is not the case if you compare average income for males and females: a person is not randomly assigned to be male or female. In such situations, you should ensure that differences in other factors are not masking or enhancing a significant difference in means. Differences in average income may be influenced by factors such as education.
Example. Patients with high blood pressure are randomly assigned
to a placebo group and a treatment group. The placebo subjects receive an
inactive pill, and the treatment subjects receive a new drug that is expected
to lower blood pressure. After the subjects are treated for two months, the
two-sample t test is used to compare the average blood
pressures for the placebo group and the treatment group. Each patient is
measured once and belongs to one group.
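A minimal Python sketch of this two-sample comparison, with made-up blood-pressure readings, could look like the following; equal_var=False gives Welch's form of the test mentioned earlier.

import numpy as np
from scipy.stats import ttest_ind

placebo   = np.array([150, 148, 155, 160, 152, 149, 158, 151])   # made-up readings
treatment = np.array([140, 142, 138, 145, 139, 143, 137, 141])

result = ttest_ind(placebo, treatment, equal_var=False)           # Welch's t-test
print(result.statistic, result.pvalue)                            # small p -> means differ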
·         Paired Sample T-test - The Paired-Samples T Test procedure compares the means of two variables for a single group. The procedure computes the differences between values of the two variables for each case and tests whether the average differs from 0.
Example. In a study on high blood pressure, all patients are
measured at the beginning of the study, given a treatment, and measured again.
Thus, each subject has two measures, often called before and after measures.
An alternative design for which this test is used is a matched-pairs or
case-control study, in which each record in the data file contains the response
for the patient and also for his or her matched control subject. In a blood
pressure study, patients and controls might be matched by age (a 75-year-old
patient with a 75-year-old control group member).
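A minimal sketch of the paired design described above, with made-up before/after readings, uses scipy's ttest_rel:

import numpy as np
from scipy.stats import ttest_rel

before = np.array([150, 148, 155, 160, 152, 149, 158, 151])   # made-up readings
after  = np.array([142, 145, 150, 151, 147, 146, 150, 148])

result = ttest_rel(before, after)
print(result.statistic, result.pvalue)   # tests whether the mean difference is 0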
Participants
Poulami Sarkar
Pragya Singh
Priyanka Doshi
Nilay Kohaley
Pawan Agarwal
This blog has been written by Poulami Sarkar.