"Chi-Square and T-Test"
Today's lecture by Prof Uday Bhate explains how to conduct a chi-square test for independence. The test is applied when you have two categorical variables from a single population. It is used to determine whether there is a significant association between the two variables.
For example, in an election survey, voters might be classified by gender (male or female) and voting preference (Democrat, Republican, or Independent). We could use a chi-square test for independence to determine whether gender is related to voting preference.
When to Use the Chi-Square Test for Independence
The test procedure described in this blog is appropriate when the following conditions are met:
- The sampling method is simple random sampling.
- Each population is at least 10 times as large as its respective sample.
- The variables under study are each categorical.
- If sample data are displayed in a contingency table, the expected frequency count for each cell of the table is at least 5.
State the Hypotheses:
Suppose that Variable A has r levels, and Variable B has c levels. The null hypothesis states that knowing the level of Variable A does not help you predict the level of Variable B. That is, the variables are independent.
H0: Variable A and Variable B are independent.
Ha: Variable A and Variable B are not independent.
Note: Support for the alternative hypothesis suggests that the variables are related, but the relationship is not necessarily causal, in the sense that one variable "causes" the other.
Formulate an Analysis Plan
The analysis plan describes how to use sample data to accept or reject the null hypothesis. The plan should specify the following elements.
- Significance level: Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
- Test method: Use the chi-square test for independence to determine whether there is a significant relationship between two categorical variables.
Analyze Sample Data
Using sample data, find the degrees of freedom, expected frequencies, test statistic, and the P-value associated with the test statistic; a short worked sketch of these steps appears after this list.
- Degrees of freedom. The degrees of freedom (DF) is equal to:
DF = (r - 1) * (c - 1)
where r is the number of levels for one categorical variable, and c is the number of levels for the other categorical variable.
- Expected frequencies. The expected frequency counts are computed separately for each level of one categorical variable at each level of the other categorical variable. Compute r * c expected frequencies, according to the following formula:
Er,c = (nr * nc) / n
where Er,c is the expected frequency count for level r of Variable A and level c of Variable B, nr is the total number of sample observations at level r of Variable A, nc is the total number of sample observations at level c of Variable B, and n is the total sample size.
- Test statistic. The test statistic is a chi-square random variable (Χ²) defined by the following equation:
Χ² = Σ [ (Or,c - Er,c)² / Er,c ]
where Or,c is the observed frequency count at level r of Variable A and level c of Variable B, and Er,c is the expected frequency count at level r of Variable A and level c of Variable B.
- P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a chi-square, use the Chi-Square Distribution Calculator to assess the probability associated with the test statistic. Use the degrees of freedom computed above.
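As a worked illustration of these steps, the sketch below computes the degrees of freedom, expected frequencies, test statistic, and P-value in Python. The contingency table uses made-up counts for the gender and voting-preference example (not real survey data), and it assumes NumPy and SciPy are installed.

import numpy as np
from scipy.stats import chi2

# Observed counts: rows = gender (male, female),
# columns = voting preference (Democrat, Republican, Independent).
# These numbers are illustrative only.
observed = np.array([[120, 150, 30],
                     [180, 120, 40]])

n = observed.sum()                    # total sample size
row_totals = observed.sum(axis=1)     # nr for each level of Variable A
col_totals = observed.sum(axis=0)     # nc for each level of Variable B

# Expected frequencies: Er,c = (nr * nc) / n
expected = np.outer(row_totals, col_totals) / n

# Degrees of freedom: DF = (r - 1) * (c - 1)
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Test statistic: X² = Σ (O - E)² / E over all cells
chi_square = ((observed - expected) ** 2 / expected).sum()

# P-value: upper-tail probability of the chi-square distribution with df degrees of freedom
p_value = chi2.sf(chi_square, df)

print(df, chi_square, p_value)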
Interpret Results
If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
Source: http://stattrek.com/chi-square-test/independence.aspx?tutorial=ap
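In practice, SciPy's chi2_contingency function carries out the same computation in a single call, which makes the decision rule easy to apply. The sketch below reuses the illustrative table from the previous example; the 0.05 significance level is just one common choice.

import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[120, 150, 30],
                     [180, 120, 40]])   # illustrative counts, as above

# Returns the test statistic, P-value, degrees of freedom, and expected counts.
chi_square, p_value, df, expected = chi2_contingency(observed)

alpha = 0.05   # chosen significance level
if p_value < alpha:
    print("Reject H0: the variables appear to be related.")
else:
    print("Fail to reject H0: no significant association detected.")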
T-TEST
The t-test assesses whether the means of two groups (measured on a continuous or summary variable) are statistically different from each other. This analysis is appropriate whenever you want to compare the means of two groups, and it is especially appropriate as the analysis for the posttest-only two-group randomized experimental design.
Figure 1: T-test depiction through various diagrams.
Figure 2: Idealized distributions for treated and comparison group posttest values.
Figure 2 shows the distributions for the treated (blue) and control (green) groups in a study. Actually, the figure shows the idealized distributions; the actual distributions would usually be depicted with a histogram or bar graph. The figure indicates where the control and treatment group means are located. The question the t-test addresses is whether the means are statistically different.
There are 3 types of t-test:
- Single sample t-test
- Independent sample t-test
- Paired sample t-test
Single Sample t-test:
The single-sample t-test is used when we want to know whether our sample comes from a particular population but we do not have full population information available to us. For instance, we may want to know if a particular sample of college students is similar to or different from college students in general. The single-sample t-test is used only for tests of the sample mean. All parametric statistics have a set of assumptions that must be met in order to properly use the statistics to test hypotheses. The assumptions of the single-sample t-test are listed below.
1. Random sampling from a defined population.
2. Interval or ratio scale of measurement.
3. Population is normally distributed.
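A minimal sketch of a single-sample t-test in Python is shown below, assuming SciPy is installed. The exam scores and the comparison value of 70 are made up for illustration; ttest_1samp compares the sample mean against the hypothesized population mean.

from scipy.stats import ttest_1samp

# Illustrative scores from one sample of college students
sample_scores = [72, 68, 75, 71, 69, 74, 70, 73, 66, 77]
hypothesized_mean = 70   # assumed population mean to test against

t_stat, p_value = ttest_1samp(sample_scores, hypothesized_mean)
print(t_stat, p_value)   # compare p_value to the chosen significance level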
Independent Sample t-test:
This is a hypothesis testing procedure that uses a separate sample for each treatment condition. Use this test when the population mean and standard deviation are unknown and two separate groups are being compared.
Example: Do males and females differ in terms of their exam scores? Take a sample of males and a separate sample of females and apply the hypothesis testing steps to determine if there is a significant difference in scores between the groups (see the sketch after the assumptions below). Assumptions for the independent t-test:
1. Independence: Observations within each sample must be independent (they don't influence each other).
2. Normal distribution: The scores in each population must be normally distributed.
3. Homogeneity of variance: The two populations must have equal variances (the degree to which the distributions are spread out is approximately equal).
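The sketch below runs an independent-samples t-test for the exam-score example, assuming SciPy is installed; the scores are made up for illustration.

from scipy.stats import ttest_ind

male_scores = [78, 85, 69, 74, 81, 77, 72]
female_scores = [82, 88, 75, 79, 84, 80, 86]

# equal_var=True matches the homogeneity-of-variance assumption above;
# use equal_var=False (Welch's t-test) if that assumption is doubtful.
t_stat, p_value = ttest_ind(male_scores, female_scores, equal_var=True)
print(t_stat, p_value)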
Paired Sample t-test:
A paired sample t-test is used to determine whether there is a significant difference between the average values of the same measurement made under two different conditions. Both measurements are made on each unit in a sample, and the test is based on the paired differences between these two values. The usual null hypothesis is that the difference in the mean values is zero. For example, the yield of two strains of barley is measured in successive years in twenty different plots of agricultural land (the units) to investigate whether one crop gives a significantly greater yield than the other, on average.
The null hypothesis for the paired sample t-test is H0: d = µ1 - µ2 = 0, where d is the mean value of the paired differences. This null hypothesis is tested against one of the following alternative hypotheses, depending on the question posed:
H1: d ≠ 0
H1: d > 0
H1: d < 0
The paired sample t-test is a more powerful alternative to a two sample procedure, such as the two sample t-test, but can only be used when we have matched samples.
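A minimal sketch of a paired-sample t-test for the barley example is shown below, assuming SciPy is installed; the yields are made up for illustration, and only a few of the twenty plots are listed.

from scipy.stats import ttest_rel

# Yields of the two barley strains measured on the same plots (paired observations)
strain_a = [4.2, 3.9, 4.5, 4.1, 4.4, 3.8, 4.0, 4.3]
strain_b = [3.9, 3.7, 4.4, 3.8, 4.1, 3.6, 3.9, 4.0]

# H0: the mean of the paired differences is zero (two-sided alternative by default)
t_stat, p_value = ttest_rel(strain_a, strain_b)
print(t_stat, p_value)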
Session taken by: Professor Uday Bhate
Blog written by: Nitin Boratwar
Roll No. 2013179
Group no.:7
Group members:
Nidhi
Nitesh Singh Patel
Pallavi bizoara
Palak Jain