Summary of 13th August Session
A t-test is any statistical hypothesis test in which the test statistic follows a Student's t-distribution if the null hypothesis is supported. It can be used to determine whether two sets of data are significantly different from each other, and it is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic (under certain conditions) follows a Student's t-distribution.
Among the most frequently used t-tests are:
· A one-sample location test of whether the mean of a normally distributed population has a value specified in a null hypothesis.
· A two-sample location test of the null hypothesis that the means of two normally distributed populations are equal. All such tests are usually called Student's t-tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t-test. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.[6]
· A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the "paired" or "repeated measures" t-test:[6][7] see paired difference test.
· A test of whether the slope of a regression line differs significantly from 0.
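The last of these, the slope test, can be sketched with SciPy's `linregress`, which reports a two-sided p-value for the null hypothesis that the slope is zero. The data below are invented purely for illustration:

```python
from scipy import stats

# Hypothetical data: does y increase with x?
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 3.8, 5.1, 5.8, 6.9, 7.2]

result = stats.linregress(x, y)
print(f"slope = {result.slope:.3f}, p-value = {result.pvalue:.4f}")
# A small p-value means the slope differs significantly from 0.
```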
Types Of T-test
· Single Sample T-test - The one-sample t-test compares the mean score of a sample to a known value, usually the population mean (the average for the outcome of some population of interest). The basic idea of the test is a comparison of the average of the sample (observed average) and the population (expected average), with an adjustment for the number of cases in the sample and the standard deviation of the average.
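As a minimal sketch, this comparison can be run with `scipy.stats.ttest_1samp`; the sample values and the population mean of 100 below are made up for illustration:

```python
from scipy import stats

# Hypothetical sample of scores; the known population mean is 100.
sample = [96, 104, 110, 99, 107, 112, 101, 98, 105, 108]

t_stat, p_value = stats.ttest_1samp(sample, popmean=100)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```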
· Independent Sample T-test - The Independent-Samples T Test procedure compares means for two groups of cases. Ideally, for this test, the subjects should be randomly assigned to the two groups, so that any difference in response is due to the treatment (or lack of treatment) and not to other factors. This is not the case if you compare average income for males and females: a person is not randomly assigned to be male or female. In such situations, you should ensure that differences in other factors are not masking or enhancing a significant difference in means. Differences in average income may be influenced by factors such as education.
Example. Patients with high blood pressure are randomly assigned to a placebo group and a treatment group. The placebo subjects receive an inactive pill, and the treatment subjects receive a new drug that is expected to lower blood pressure. After the subjects are treated for two months, the two-sample t-test is used to compare the average blood pressures for the placebo group and the treatment group. Each patient is measured once and belongs to one group.
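A sketch of this example with `scipy.stats.ttest_ind` (the blood-pressure readings below are invented; `equal_var=False` selects Welch's t-test, which does not assume equal population variances):

```python
from scipy import stats

# Hypothetical systolic blood pressures after two months.
placebo   = [142, 138, 150, 145, 139, 148, 141, 147]
treatment = [130, 128, 136, 125, 133, 129, 138, 131]

t_stat, p_value = stats.ttest_ind(placebo, treatment, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```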
· Paired Sample T-test - This procedure compares the means of two variables for a single group. The procedure computes the difference between the values of the two variables for each case and tests whether the average difference differs from 0.
Example. In a study on high blood pressure, all patients are measured at the beginning of the study, given a treatment, and measured again. Thus, each subject has two measures, often called before and after measures. An alternative design for which this test is used is a matched-pairs or case-control study, in which each record in the data file contains the response for the patient and also for his or her matched control subject. In a blood pressure study, patients and controls might be matched by age (a 75-year-old patient with a 75-year-old control group member).
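The before/after design can be sketched with `scipy.stats.ttest_rel`; the paired readings below are fabricated for illustration:

```python
from scipy import stats

# Hypothetical before/after blood pressures for the same eight patients.
before = [150, 142, 148, 155, 139, 160, 147, 152]
after  = [143, 138, 141, 150, 136, 151, 143, 145]

t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```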
Frequency Distributions
One important set of statistical tests allows us to test for deviations of observed frequencies from expected frequencies. To introduce these tests, we will start with a simple, non-biological example. We want to determine if a coin is fair. In other words, are the odds of flipping the coin heads-up the same as tails-up? We collect data by flipping the coin 200 times. The coin landed heads-up 108 times and tails-up 92 times. At first glance, we might suspect that the coin is biased because heads resulted more often than tails. However, we have a more quantitative way to analyze our results: a chi-squared test.
To perform a chi-squared test (or any other statistical test), we first must establish our null hypothesis. In this example, our null hypothesis is that the coin should be equally likely to land heads-up or tails-up every time. The null hypothesis allows us to state expected frequencies. For 200 tosses, we would expect 100 heads and 100 tails.
The next step is to prepare a table as follows.

         | Heads | Tails | Total
Observed |  108  |   92  |  200
Expected |  100  |  100  |  200
Total    |  208  |  192  |  400
The Observed values are those we gather ourselves. The Expected values are the frequencies expected, based on our null hypothesis. We total the rows and columns as indicated. It's a good idea to make sure that the row totals equal the column totals (both total to 400 in this example).
Using probability theory, statisticians have devised a way to determine if a frequency distribution differs from the expected distribution. To use this chi-squared test, we first have to calculate chi-squared.
Chi-squared = Σ (observed − expected)² / expected
We have two classes to consider in this example, heads and tails.

Chi-squared = (108 − 100)²/100 + (92 − 100)²/100 = (8)²/100 + (−8)²/100 = 0.64 + 0.64 = 1.28
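The arithmetic above can be checked with a few lines of Python:

```python
observed = [108, 92]
expected = [100, 100]

# Chi-squared = sum of (observed - expected)^2 / expected over all classes.
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_squared)  # 1.28
```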
Now we have to consult a table of critical values of the chi-squared distribution. Here is a portion of such a table.
df/prob. | 0.99    | 0.95   | 0.90  | 0.80  | 0.70 | 0.50 | 0.30 | 0.20 | 0.10 | 0.05
1        | 0.00013 | 0.0039 | 0.016 | 0.064 | 0.15 | 0.46 | 1.07 | 1.64 | 2.71 | 3.84
2        | 0.02    | 0.10   | 0.21  | 0.45  | 0.71 | 1.39 | 2.41 | 3.22 | 4.60 | 5.99
3        | 0.12    | 0.35   | 0.58  | 1.00  | 1.42 | 2.37 | 3.66 | 4.64 | 6.25 | 7.82
4        | 0.3     | 0.71   | 1.06  | 1.65  | 2.20 | 3.36 | 4.88 | 5.99 | 7.78 | 9.49
5        | 0.55    | 1.14   | 1.61  | 2.34  | 3.00 | 4.35 | 6.06 | 7.29 | 9.24 | 11.07
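With one degree of freedom (two classes minus one), the computed value of 1.28 falls between the tabled critical values 1.07 (p = 0.30) and 1.64 (p = 0.20), so the probability is well above 0.05 and we cannot reject the null hypothesis that the coin is fair. The same lookup can be sketched with `scipy.stats.chisquare`, which returns the statistic and an exact p-value:

```python
from scipy import stats

# Observed and expected frequencies from the coin example above.
chi2, p_value = stats.chisquare(f_obs=[108, 92], f_exp=[100, 100])
print(f"chi-squared = {chi2:.2f}, p = {p_value:.4f}")
# p is well above 0.05, so the data do not show the coin is biased.
```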
written by Prateek Jain