Wednesday, 14 August 2013

How to Perform and Interpret Chi-Square and T-Tests

"Chi-Square and T-Test"
  
Today's lecture by Prof. Uday Bhate explained how to conduct a chi-square test for independence. The test is applied when you have two categorical variables from a single population, and it is used to determine whether there is a significant association between the two variables.
For example, in an election survey, voters might be classified by gender (male or female) and voting preference (Democrat, Republican, or Independent). We could use a chi-square test for independence to determine whether gender is related to voting preference.

When to Use Chi-Square Test for Independence

The test procedure described in this blog is appropriate when the following conditions are met:
  • The sampling method is simple random sampling.
  • Each population is at least 10 times as large as its respective sample.
  • The variables under study are each categorical.
  • If sample data are displayed in a contingency table, the expected frequency count for each cell of the table is at least 5.
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

Suppose that Variable A has r levels, and Variable B has c levels. The null hypothesis states that knowing the level of Variable A does not help you predict the level of Variable B. That is, the variables are independent.
H0: Variable A and Variable B are independent.
Ha: Variable A and Variable B are not independent.
The alternative hypothesis is that knowing the level of Variable A can help you predict the level of Variable B.
Note: Support for the alternative hypothesis suggests that the variables are related, but the relationship is not necessarily causal, in the sense that one variable "causes" the other.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to decide whether to reject the null hypothesis. The plan should specify the following elements.
  • Significance level. Researchers often choose a significance level of 0.01, 0.05, or 0.10, but any value between 0 and 1 can be used.
  • Test method. Use the chi-square test for independence to determine whether there is a significant relationship between two categorical variables.

Analyze Sample Data

Using sample data, find the degrees of freedom, expected frequencies, test statistic, and the P-value associated with the test statistic.
  • Degrees of freedom. The degrees of freedom (DF) is equal to:
DF = (r - 1) * (c - 1)
where r is the number of levels for one categorical variable, and c is the number of levels for the other categorical variable.
  • Expected frequencies. The expected frequency counts are computed separately for each level of one categorical variable at each level of the other categorical variable. Compute r * c expected frequencies, according to the following formula.
$$E_{r,c} = \frac{n_r \cdot n_c}{n}$$
where $E_{r,c}$ is the expected frequency count for level r of Variable A and level c of Variable B, $n_r$ is the total number of sample observations at level r of Variable A, $n_c$ is the total number of sample observations at level c of Variable B, and $n$ is the total sample size.
  • Test statistic. The test statistic ($\chi^2$) is defined by the following equation. When the null hypothesis is true, it approximately follows a chi-square distribution.
$$\chi^2 = \sum_{r,c} \frac{(O_{r,c} - E_{r,c})^2}{E_{r,c}}$$
where $O_{r,c}$ is the observed frequency count at level r of Variable A and level c of Variable B, and $E_{r,c}$ is the expected frequency count at level r of Variable A and level c of Variable B.
  • P-value. The P-value is the probability of observing a test statistic at least as extreme as the one computed from the sample, assuming the null hypothesis is true. Because the test statistic follows a chi-square distribution, use a chi-square distribution table or calculator to find the probability in the upper tail beyond the test statistic. Use the degrees of freedom computed above.
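The four calculations in the list above can be combined into a short Python sketch. It uses the standard library only and is not material from the lecture. The function names `chi_square_independence` and `chi2_sf` are hypothetical. The counts are made up for a gender by voting-preference table like the election example. The P-value uses the exact upper-tail formula for a chi-square distribution with a whole number of degrees of freedom, which is always the case for a contingency table.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for X ~ chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed forms exist for df = 1 (odd case) and df = 2 (even case).
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Step up two degrees of freedom at a time (a = df / 2) using
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1))
        a += 1.0
    return q


def chi_square_independence(observed):
    """Chi-square test for independence on an r x c table of observed counts."""
    r, c = len(observed), len(observed[0])
    row_totals = [sum(row) for row in observed]        # n_r for each level of A
    col_totals = [sum(col) for col in zip(*observed)]  # n_c for each level of B
    n = sum(row_totals)                                # total sample size
    # Expected count E_{r,c} = (n_r * n_c) / n for every cell
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    # Sum (O - E)^2 / E over all r * c cells
    stat = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(r)
        for j in range(c)
    )
    df = (r - 1) * (c - 1)
    return stat, df, chi2_sf(stat, df), expected


# Made-up counts: rows = male, female; columns = Democrat, Republican, Independent
observed = [[150, 200, 50],
            [300, 250, 50]]
stat, df, p_value, expected = chi_square_independence(observed)
alpha = 0.05  # significance level chosen in the analysis plan
print(f"chi-square = {stat:.2f}, df = {df}, P = {p_value:.5f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Before trusting the P-value, check that every entry of `expected` is at least 5, as the conditions above require. If SciPy is available, `scipy.stats.chi2_contingency` performs the same calculation. By default it applies Yates' continuity correction when there is one degree of freedom (a 2 × 2 table), so its result can differ from this uncorrected sketch in that case.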

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
Source: http://stattrek.com/chi-square-test/independence.aspx?tutorial=ap
      
T-TEST 
The t-test assesses whether the means of two groups (on a continuous or summary variable) are statistically different from each other. This analysis is appropriate whenever you want to compare the means of two groups, and it is especially appropriate for the posttest-only two-group randomized experimental design.



Figure 1: T-test depiction through various diagrams

Figure 2. Idealized distributions for treated and comparison group posttest values.
Figure 2 shows the distributions for the treated (blue) and control (green) groups in a study. Strictly speaking, the figure shows idealized distributions; the actual distributions would usually be depicted with a histogram or bar graph. The figure indicates where the control and treatment group means are located. The question the t-test addresses is whether these means are statistically different.

There are three types of t-test:
  • Single Sample t-test
  • Independent Sample t-test
  • Paired Sample t-test

Single Sample t-test:
The single-sample t-test is used when we want to know whether our sample comes from a particular population whose mean is known (or hypothesized), but the population standard deviation is not available to us. For instance, we may want to know whether a particular sample of college students is similar to or different from college students in general. The single-sample t-test is used only for tests of the sample mean. All parametric statistics have a set of assumptions that must be met in order to use them properly to test hypotheses. The assumptions of the single-sample t-test are listed below.
1. Random sampling from a defined population.
2. Interval or ratio scale of measurement.
3. Population is normally distributed.
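The single-sample t statistic compares the sample mean with a hypothesized population mean mu0, scaled by the estimated standard error: t = (sample mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom. Below is a minimal standard-library sketch of that formula. The function name `one_sample_t`, the scores and mu0 = 70 are all hypothetical. It returns only the statistic and its degrees of freedom; the P-value would then come from a t table or a statistics package.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
    standard_error = s / math.sqrt(n)   # estimated standard error of the mean
    return (mean - mu0) / standard_error, n - 1


# Hypothetical exam scores for a sample of college students
scores = [72, 75, 68, 80, 77, 74, 71, 79]
t, df = one_sample_t(scores, mu0=70)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

If SciPy is installed, `scipy.stats.ttest_1samp(scores, 70)` computes the same statistic together with a two-sided P-value.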

Independent Sample t-test:
The independent sample t-test is a hypothesis testing procedure that uses a separate sample for each treatment condition. Use this test when the population means and standard deviations are unknown and two separate groups are being compared.
Example: Do males and females differ in terms of their exam scores? Take a sample of males and a separate sample of females, and apply the hypothesis testing steps to determine whether there is a significant difference in scores between the groups.
Assumptions for the independent t-test:
1. Independence: Observations within each sample must be independent (they don’t influence each other).
2. Normal Distribution: The scores in each population must be normally distributed.
3. Homogeneity of Variance: The two populations must have equal variances (the degree to which the distributions are spread out is approximately equal).
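Because the assumptions include equal variances, the matching statistic is the pooled-variance t: the two sample variances are combined into one estimate, and the test has n1 + n2 - 2 degrees of freedom. Below is a minimal standard-library sketch of that formula. The function name `independent_t` and the exam scores are hypothetical, and the P-value lookup is again left to a t table or a package.

```python
import math
import statistics


def independent_t(group1, group2):
    """Pooled-variance t statistic, assuming equal population variances."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance, weighted by each group's df
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / standard_error
    return t, df


# Hypothetical exam scores
males = [65, 70, 72, 68, 74]
females = [78, 74, 80, 76, 82]
t, df = independent_t(males, females)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

If the equal-variance assumption looks doubtful, Welch's t-test is the usual alternative. In SciPy that is `scipy.stats.ttest_ind(males, females, equal_var=False)`, while the default `equal_var=True` corresponds to the pooled version sketched here.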


Paired Sample t-test:
A paired sample t-test is used to determine whether there is a significant difference between the average values of the same measurement made under two different conditions. Both measurements are made on each unit in a sample and the test is based on the paired differences between these two values. The usual null hypothesis is that the difference in the mean values is zero. For example, the yield of two strains of barley is measured in successive years in twenty different plots of agricultural land (the units) to investigate whether one crop gives a significantly greater yield than the other, on average.
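The null hypothesis for the paired sample t-test is
$$H_0: \mu_d = \mu_1 - \mu_2 = 0$$
where $\mu_d$ is the mean value of the difference between the paired measurements.
This null hypothesis is tested against one of the following alternative hypotheses, depending on the question posed:
$$H_1: \mu_d \neq 0 \quad \text{(two-sided)}$$
$$H_1: \mu_d > 0 \quad \text{(one-sided)}$$
$$H_1: \mu_d < 0 \quad \text{(one-sided)}$$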
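When the paired measurements are positively correlated, as they usually are, the paired sample t-test is more powerful than a two-sample procedure such as the independent sample t-test, because pairing removes the variation between units (here, between plots of land). However, it can only be used when we have matched samples.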
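Because the test is based on the paired differences, it reduces to a single-sample t-test of those differences against zero: t = (mean difference) / (s_d / sqrt(n)), with n - 1 degrees of freedom. Below is a minimal standard-library sketch. The function name `paired_t` and the yields are hypothetical, with five plots used instead of twenty to keep it short.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic: a one-sample t-test of the within-pair differences vs 0."""
    diffs = [a - b for a, b in zip(first, second)]  # one difference per unit (plot)
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_d / standard_error, n - 1


# Hypothetical yields of two barley strains on the same five plots
strain_a = [4.2, 3.9, 4.5, 4.1, 4.8]
strain_b = [3.9, 3.8, 4.1, 4.0, 4.4]
t, df = paired_t(strain_a, strain_b)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

With SciPy, `scipy.stats.ttest_rel(strain_a, strain_b)` gives the same statistic and a two-sided P-value.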

Session taken by: Professor Uday Bhate
Blog written by: Nitin Boratwar
Roll No. 2013179
Group no.: 7

Group members
Nidhi
Nitesh Singh Patel
Pallavi bizoara
Palak Jain



