Wednesday, 14 August 2013

METHODS OF COMPARISONS OF SAMPLE SPACES IN STATISTICS

Introduction: Two sessions were held on Chi-Square and t-tests. Both methods deal with the comparison of two sample spaces; Chi-square is used for discrete variables, whereas the t-test is used for continuous variables. The Chi-Square distribution is the distribution of the sum of the squares of a set of independent standard normal random variables. Its usefulness stems from the central limit theorem: the sum of random variables from almost any distribution is closely approximated by a normal distribution as the number of samples in the sum grows. Thus the test is widely applicable across distributions.
A chi-squared test, also referred to as a chi-square test or χ² test, is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Also considered a chi-squared test is a test in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-squared distribution as closely as desired by making the sample size large enough.
The null hypothesis of independence is rejected if χ² is large, because this means that observed frequencies and expected frequencies are far apart. The chi-square curve is used to judge whether the calculated test statistic is large enough. We reject H0 if the test statistic is large enough so that the area beyond it (under the chi-square curve with (r−1)(c−1) degrees of freedom) is less than .05.
The P-value is the area greater than χ² under the chi-square curve with (r−1)(c−1) degrees of freedom.

Distributions where Chi square can be used-
Discrete uniform distribution
In this case N observations are divided among n cells. A simple application is to test the hypothesis that, in the general population, values would occur in each cell with equal frequency. The "theoretical frequency" for any cell (under the null hypothesis of a discrete uniform distribution) is thus calculated as

E_i = N / n,

and the reduction in the degrees of freedom is p = 1, notionally because the observed frequencies O_i are constrained to sum to N.
Other distributions
When testing whether observations are random variables whose distribution belongs to a given family of distributions, the "theoretical frequencies" are calculated using a distribution from that family fitted in some standard way. The reduction in the degrees of freedom is calculated as p=s+1, where s is the number of co-variates used in fitting the distribution. For instance, when checking a three-co-variate Weibull distribution, p=4, and when checking a normal distribution (where the parameters are mean and standard deviation), p=3. In other words, there will be n-p degrees of freedom, where n is the number of categories.
It should be noted that the degrees of freedom are not based on the number of observations as with a Student's t or F-distribution. For example, if testing for a fair, six-sided die, there would be five degrees of freedom because there are six categories (one for each face). The number of times the die is rolled has absolutely no effect on the number of degrees of freedom.
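To make the degrees-of-freedom point concrete, here is a minimal sketch in Python (scipy assumed available); the roll counts are invented for illustration:

```python
# Hedged sketch: goodness-of-fit test for a fair six-sided die.
# The observed counts below are hypothetical.
from scipy.stats import chisquare, chi2

observed = [13, 8, 12, 7, 11, 9]       # hypothetical counts from 60 rolls
expected = [sum(observed) / 6] * 6     # uniform null: 10 expected per face

stat, p = chisquare(observed, f_exp=expected)

# Degrees of freedom depend only on the number of categories,
# not on the number of rolls: 6 faces - 1 = 5.
df = len(observed) - 1
critical = chi2.ppf(0.95, df)          # 95% critical value, about 11.07

print(f"chi2 = {stat:.2f}, p = {p:.3f}, critical (df={df}) = {critical:.2f}")
```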
Goodness of fit
For example, to test the hypothesis that a random sample of 100 people has been drawn from a population in which men and women are equal in frequency, the observed number of men and women would be compared to the theoretical frequencies of 50 men and 50 women. If there were 44 men in the sample and 56 women, then

χ² = (44 − 50)²/50 + (56 − 50)²/50 = 0.72 + 0.72 = 1.44

If the null hypothesis is true (i.e., men and women are chosen with equal probability), the test statistic will be drawn from a chi-squared distribution with one degree of freedom. If the male frequency is known, then the female frequency is determined.

Consultation of the chi-squared distribution for 1 degree of freedom shows that the probability of observing this difference (or a more extreme difference than this) if men and women are equally numerous in the population is approximately 0.23. This probability is higher than conventional criteria for statistical significance (0.001–0.05), so normally we would not reject the null hypothesis that the number of men in the population is the same as the number of women (i.e., we would consider our sample within the range of what we'd expect for a 50/50 male/female ratio.)
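As a quick check, the example above can be reproduced with a few lines of Python (scipy assumed):

```python
# Sketch reproducing the 44 men / 56 women example from the text.
from scipy.stats import chisquare

stat, p = chisquare([44, 56], f_exp=[50, 50])
print(f"chi2 = {stat:.2f}, p = {p:.3f}")   # chi2 = 1.44, p ≈ 0.23
```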


T-test

A t-test is any statistical hypothesis test in which the test statistic follows a Student's t-distribution if the null hypothesis is supported. It can be used to determine if two sets of data are significantly different from each other, and is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic (under certain conditions) follows a Student's t-distribution.

Among the most frequently used t-tests are:
·         A one-sample location test of whether the mean of a normally distributed population has a value specified in a null hypothesis.
·         A two-sample location test of the null hypothesis that the means of two normally distributed populations are equal. All such tests are usually called Student's t-tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t-test. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.[6]
·         A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the "paired" or "repeated measures" t-test:[6][7] see paired difference test.
·         A test of whether the slope of a regression line differs significantly from 0.
Types Of T-test
·         Single Sample T-test- The one-sample t-test compares the mean score of a sample to a known value, usually the population mean (the average for the outcome of some population of interest). The basic idea of the test is a comparison of the average of the sample (observed average) and the population (expected average), with an adjustment for the number of cases in the sample and the standard deviation of the average. Working through an example can help to highlight the issues involved and demonstrate how to conduct a t-test using actual data.
·         Independent Sample T-test- The Independent-Samples T Test procedure compares means for two groups of cases. Ideally, for this test, the subjects should be randomly assigned to two groups, so that any difference in response is due to the treatment (or lack of treatment) and not to other factors. This is not the case if you compare average income for males and females. A person is not randomly assigned to be a male or female. In such situations, you should ensure that differences in other factors are not masking or enhancing a significant difference in means. Differences in average income may be influenced by factors such as education.
Example. Patients with high blood pressure are randomly assigned to a placebo group and a treatment group. The placebo subjects receive an inactive pill, and the treatment subjects receive a new drug that is expected to lower blood pressure. After the subjects are treated for two months, the two-sample t test is used to compare the average blood pressures for the placebo group and the treatment group. Each patient is measured once and belongs to one group.
·         Paired Sample T-test- This procedure compares the means of two variables for a single group. It computes the differences between values of the two variables for each case and tests whether the average differs from 0.
                Example. In a study on high blood pressure, all patients are measured at the beginning of the study, given a treatment, and measured again. Thus, each subject has two measures, often called before and after measures. An alternative design for which this test is used is a matched-pairs or case-control study, in which each record in the data file contains the response for the patient and also for his or her matched control subject. In a blood pressure study, patients and controls might be matched by age (a 75-year-old patient with a 75-year-old control group member)
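The two designs above map directly onto library calls. Below is a hedged sketch in Python (scipy and numpy assumed); the blood-pressure readings are simulated, not real study data:

```python
# Sketch: independent-samples vs. paired t-tests on simulated blood pressures.
import numpy as np
from scipy.stats import ttest_ind, ttest_rel

rng = np.random.default_rng(0)

# Independent samples: placebo vs. treatment groups (different patients).
placebo   = rng.normal(150, 10, size=30)
treatment = rng.normal(142, 10, size=30)
t_ind, p_ind = ttest_ind(placebo, treatment, equal_var=False)  # Welch's form

# Paired samples: the same patients measured before and after treatment.
before = rng.normal(150, 10, size=30)
after  = before - rng.normal(8, 5, size=30)    # simulated post-treatment drop
t_rel, p_rel = ttest_rel(before, after)

print(f"independent: t = {t_ind:.2f}, p = {p_ind:.4f}")
print(f"paired:      t = {t_rel:.2f}, p = {p_rel:.4f}")
```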


Participants
  Poulami Sarkar
  Pragya Singh
  Priyanka Doshi
  Nilay Kohaley
   Pawan Agarwal






SESSION 11 AND SESSION 12

What is a variable?

A variable is any characteristic, number, or quantity that can be measured or counted. A variable may also be called a data item. Age, sex, business income and expenses, country of birth, capital expenditure, class grades, eye colour and vehicle type are examples of variables. It is called a variable because the value may vary between data units in a population, and may change in value over time.

For example; 'income' is a variable that can vary between data units in a population (i.e. the people or businesses being studied may not have the same incomes) and can also vary over time for each data unit (i.e. income can go up or down). 


What are the types of variables?


There are different ways variables can be described according to the ways they can be studied, measured, and presented.

Numeric variables have values that describe a measurable quantity as a number, like 'how many' or 'how much'. Therefore numeric variables are quantitative variables.

Numeric variables may be further described as either continuous or discrete:
  • A continuous variable is a numeric variable. Observations can take any value between a certain set of real numbers. The value given to an observation for a continuous variable can include values as small as the instrument of measurement allows. Examples of continuous variables include height, time, age, and temperature.
  • A discrete variable is a numeric variable. Observations can take a value based on a count from a set of distinct whole values. A discrete variable cannot take the value of a fraction between one value and the next closest value. Examples of discrete variables include the number of registered cars, number of business locations, and number of children in a family, all of which are measured as whole units (i.e. 1, 2, 3 cars).

The data collected for a numeric variable are quantitative data.



NULL HYPOTHESIS :

There is no relation between the two variables.

Chi-Square

A chi-squared test, also referred to as a chi-square test or χ² test, is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Also considered a chi-squared test is a test in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-squared distribution as closely as desired by making the sample size large enough.

Chi-Square statistic: χ² = Σ ((O − E)² / E)

O = Observed frequency
E = Expected frequency
Σ = sum of the above across all cells
To find the probability value (p) associated with the obtained Chi-square statistic:
a.         Calculate degrees of freedom (df): df = (# rows − 1)(# columns − 1)
b.         Use the abbreviated table of Critical Values for the Chi-square test to find the p value.
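A one-line way to carry out this lookup, sketched in Python (scipy assumed), with a made-up statistic:

```python
# Sketch: converting an obtained chi-square statistic and df into a p-value.
from scipy.stats import chi2

chi_sq = 1.44                 # example statistic (made up for illustration)
df = (2 - 1) * (2 - 1)        # df = (# rows - 1)(# columns - 1) for a 2x2 table

p = chi2.sf(chi_sq, df)       # survival function: area to the right of chi_sq
print(f"p = {p:.3f}")
```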

T-test

A t-test is any statistical hypothesis test in which the test statistic follows a Student's t-distribution if the null hypothesis is supported. It can be used to determine if two sets of data are significantly different from each other, and is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic (under certain conditions) follows a Student's t-distribution.

Independent samples

The independent samples t-test is used when two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared. For example, suppose we are evaluating the effect of a medical treatment, and we enroll 100 subjects into our study, then randomize 50 subjects to the treatment group and 50 subjects to the control group. In this case, we have two independent samples and would use the unpaired form of the t-test. The randomization is not essential here—if we contacted 100 people by phone and obtained each person's age and gender, and then used a two-sample t-test to see whether the mean ages differ by gender, this would also be an independent samples t-test, even though the data are observational.

Paired samples

Paired samples t-tests typically consist of a sample of matched pairs of similar units, or one group of units that has been tested twice (a "repeated measures" t-test).
A typical example of the repeated measures t-test would be where subjects are tested prior to a treatment, say for high blood pressure, and the same subjects are tested again after treatment with a blood-pressure lowering medication. By comparing the same patient's numbers before and after treatment, we are effectively using each patient as their own control. That way the correct rejection of the null hypothesis (here: of no difference made by the treatment) can become much more likely, with statistical power increasing simply because the random between-patient variation has now been eliminated. Note however that an increase of statistical power comes at a price: more tests are required, each subject having to be tested twice. Because half of the sample now depends on the other half, the paired version of Student's t-test has only 'n/2 - 1' degrees of freedom (with 'n' being the total number of observations). Pairs become individual test units, and the sample has to be doubled to achieve the same number of degrees of freedom
How It Works
1. The null hypothesis is that the two population means are equal to each other. To test the null hypothesis, you need to calculate the following values: the means of the two samples (x̄1 and x̄2), the variances of the two samples (s1² and s2²), the sample sizes (n1 and n2), and k (the degrees of freedom).
2. Compute the t-statistic (the unpooled form, consistent with the note below that the variances and sample sizes need not be equal):
t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
3. Compare the calculated t-value, with k degrees of freedom, to the critical t-value from the t-distribution table at the chosen confidence level and decide whether to accept or reject the null hypothesis.
Reject the null hypothesis when: calculated t-value > critical t-value.
Note: This procedure can be used when the distribution variances from the two populations are not equal, and the sample sizes are not equal.
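The steps above can be followed literally in code. This is a sketch in Python (numpy/scipy assumed) using the unpooled form, with the Welch-Satterthwaite approximation supplying k, since the text does not specify how k is obtained; the data are invented:

```python
# Sketch: two-sample t-test computed step by step (unpooled / Welch form).
import numpy as np
from scipy.stats import t as t_dist

x1 = np.array([12.1, 13.4, 11.8, 12.9, 13.1])       # invented sample 1
x2 = np.array([10.9, 11.5, 12.0, 11.2, 10.7, 11.8]) # invented sample 2

m1, m2 = x1.mean(), x2.mean()                 # sample means
v1, v2 = x1.var(ddof=1), x2.var(ddof=1)       # sample variances s1^2, s2^2
n1, n2 = len(x1), len(x2)                     # sample sizes

t_stat = (m1 - m2) / np.sqrt(v1 / n1 + v2 / n2)

# Welch-Satterthwaite approximation for k (an assumption on our part).
k = (v1/n1 + v2/n2)**2 / ((v1/n1)**2/(n1-1) + (v2/n2)**2/(n2-1))

critical = t_dist.ppf(0.975, k)               # two-sided test, 95% confidence
print(f"t = {t_stat:.3f}, k = {k:.1f}, critical = {critical:.3f}")
print("reject H0" if abs(t_stat) > critical else "fail to reject H0")
```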



BY :  Priyadarshi Tandon(2013211)

GROUP MEMBERS : 

1) PRIYADARSHI TANDON
2) P.PRIYATHAM KIREETI 
3) NISHIDH LAD
4) P.S.V.P.S.G. KARTHEEKI
5) P. KALYANI

How to Perform and Interpret Chi-Square and T-Tests

"Chi-Square and T-Test"
  
  Today's lecture by Prof Uday Bhate explains how to conduct a chi-square test for independence. The test is applied when you have two categorical variables from a single population. It is used to determine whether there is a significant association between the two variables.
For example, in an election survey, voters might be classified by gender (male or female) and voting preference (Democrat, Republican, or Independent). We could use a chi-square test for independence to determine whether gender is related to voting preference.

When to Use Chi-Square Test for Independence

The test procedure described in this blog is appropriate when the following conditions are met:
  • The sampling method is simple random sampling.
  • Each population is at least 10 times as large as its respective sample.
  • The variables under study are each categorical.
  • If sample data are displayed in a contingency table, the expected frequency count for each cell of the table is at least 5.
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses:

Suppose that Variable A has r levels, and Variable B has c levels. The null hypothesis states that knowing the level of Variable A does not help you predict the level of Variable B. That is, the variables are independent.
H0: Variable A and Variable B are independent.
Ha: Variable A and Variable B are not independent.
The alternative hypothesis is that knowing the level of Variable A can help you predict the level of Variable B.
Note: Support for the alternative hypothesis suggests that the variables are related; but the relationship is not necessarily causal, in the sense that one variable "causes" the other.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. The plan should specify the following elements.
  • Significance level: Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
  • Test method. Use the chi-square test for independence to determine whether there is a significant relationship between two categorical variables.

Analyze Sample Data

Using sample data, find the degrees of freedom, expected frequencies, test statistic, and the P-value associated with the test statistic. The approach described in this section is illustrated in the sample problem at the end of this lesson.
  • Degrees of freedom. The degrees of freedom (DF) is equal to:
DF = (r - 1) * (c - 1)
where r is the number of levels for one categorical variable, and c is the number of levels for the other categorical variable.
  • Expected frequencies. The expected frequency counts are computed separately for each level of one categorical variable at each level of the other categorical variable. Compute r * c expected frequencies, according to the following formula:
E(r,c) = (n_r * n_c) / n
where E(r,c) is the expected frequency count for level r of Variable A and level c of Variable B, n_r is the total number of sample observations at level r of Variable A, n_c is the total number of sample observations at level c of Variable B, and n is the total sample size.
  • Test statistic. The test statistic is a chi-square random variable (χ²) defined by the following equation:
χ² = Σ [ (O(r,c) − E(r,c))² / E(r,c) ]
where O(r,c) is the observed frequency count at level r of Variable A and level c of Variable B, and E(r,c) is the expected frequency count at level r of Variable A and level c of Variable B.
  • P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a chi-square, use the Chi-Square Distribution Calculator to assess the probability associated with the test statistic. Use the degrees of freedom computed above.

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
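Putting the four steps together, here is a hedged sketch in Python (scipy assumed) for the gender-by-voting-preference example; the counts are invented for illustration:

```python
# Sketch: chi-square test for independence on an invented 2x3 table.
import numpy as np
from scipy.stats import chi2_contingency

#                     Democrat  Republican  Independent
observed = np.array([[120,      90,         40],    # male
                     [110,      100,        60]])   # female

chi2_stat, p, df, expected = chi2_contingency(observed)

print(f"chi2 = {chi2_stat:.2f}, df = {df}, p = {p:.4f}")
if p < 0.05:
    print("Reject H0: gender and voting preference appear related.")
else:
    print("Fail to reject H0: no significant association detected.")
```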
http://stattrek.com/chi-square-test/independence.aspx?tutorial=ap
      
T-TEST 
       The t-test assesses whether the means of two groups (of a continuous or summary variable) are statistically different from each other. This analysis is appropriate whenever you want to compare the means of two groups, and especially appropriate as the analysis for the posttest-only two-group randomized experimental design.



Figure 1: t-test depicted through various diagrams.

Figure 2. Idealized distributions for treated and comparison group posttest values.
Figure 2 shows the distributions for the treated (blue) and control (green) groups in a study. Actually, the figure shows the idealized distribution -- the actual distribution would usually be depicted with a histogram or bar graph. The figure indicates where the control and treatment group means are located. The question the t-test addresses is whether the means are statistically different.

There are 3 types of t-test:
·         Single Sample t-test
·         Independent Sample t-test
·         Paired Sample t-test

Single Sample t-test :
The single-sample t-test is used when we want to know whether our sample comes from a particular population but we do not have full population information available to us. For instance, we may want to know if a particular sample of college students is similar to or different from college students in general. The single-sample t-test is used only for tests of the sample mean. All parametric statistics have a set of assumptions that must be met in order to properly use the statistics to test hypotheses. The assumptions of the single-sample t-test are listed below. 
1.      Random sampling from a defined population.
2.      Interval or ratio scale of measurement.
3.      Population is normally distributed
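As a sketch of the college-students idea above (Python/scipy assumed; the scores and the population mean are invented):

```python
# Sketch: single-sample t-test of invented exam scores against a known mean.
from scipy.stats import ttest_1samp

scores = [72, 81, 65, 90, 77, 84, 69, 75, 88, 71]   # hypothetical sample
population_mean = 70                                 # assumed known value

t_stat, p = ttest_1samp(scores, population_mean)
print(f"t = {t_stat:.2f}, p = {p:.4f}")
```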

Independent Sample t-test:
A hypothesis testing procedure that uses separate samples for each treatment condition. Use this test when the population mean and standard deviation are unknown and two separate groups are being compared.
Example: Do males and females differ in terms of their exam scores? Take a sample of males and a separate sample of females and apply the hypothesis testing steps to determine if there is a significant difference in scores between the groups.
Assumptions for the Independent t-Test:
1.      Independence: Observations within each sample must be independent (they don’t influence each other).
2.      Normal Distribution: The scores in each population must be normally distributed
3.      Homogeneity of Variance: The two populations must have equal variances (the degree to which the distributions are spread out is approximately equal)


Paired Sample t-test:
A paired sample t-test is used to determine whether there is a significant difference between the average values of the same measurement made under two different conditions. Both measurements are made on each unit in a sample and the test is based on the paired differences between these two values. The usual null hypothesis is that the difference in the mean values is zero. For example, the yield of two strains of barley is measured in successive years in twenty different plots of agricultural land (the units) to investigate whether one crop gives a significantly greater yield than the other, on average.
The null hypothesis for the paired sample t-test is H0: d = µ1 − µ2 = 0,
where d is the mean value of the difference.
This null hypothesis is tested against one of the following alternative hypotheses, depending on the question posed:
H1: d ≠ 0
H1: d > 0
H1: d < 0
The paired sample t-test is a more powerful alternative to a two sample procedure, such as the two sample t-test, but can only be used when we have matched samples.
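A sketch of the barley example in Python (numpy/scipy assumed); the yields are simulated rather than real field data:

```python
# Sketch: paired t-test on simulated barley yields from twenty shared plots.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)
strain_a = rng.normal(5.0, 0.5, size=20)              # strain A yield per plot
strain_b = strain_a + rng.normal(0.3, 0.2, size=20)   # strain B on same plots

t_stat, p = ttest_rel(strain_a, strain_b)  # tests H0: mean paired difference = 0
print(f"t = {t_stat:.2f}, p = {p:.4f}")
```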

Session taken by: Professor Uday Bhate
Blog written by : Nitin Boratwar
Roll No. 2013179
Group no.:7

Group member
Nidhi
Nitesh Singh Patel
Pallavi bizoara
Palak Jain




Tuesday, 13 August 2013

                        SESSION 11 AND SESSION 12
Types of Variables
Variables may be of two types, continuous and categorical:
Continuous variables -- A continuous variable has numeric values such as 1, 2, 3.14, -5, etc. The relative magnitude of the values is significant (e.g., a value of 2 indicates twice the magnitude of 1). Examples of continuous variables are blood pressure, height, weight, income, age, and probability of illness. Some programs call continuous variables "ordered" or "monotonic" variables.

Categorical variables -- A categorical variable has values that function as labels rather than as numbers. Some programs call categorical variables "nominal" variables. For example, a categorical variable for gender might use the value 1 for male and 2 for female. The actual magnitude of the value is not significant; coding male as 7 and female as 3 would work just as well. As another example, marital status might be coded as 1 for single, 2 for married, 3 for divorced and 4 for widowed. DTREG allows you to use non-numeric (character string) values for categorical variables. So your dataset could have the strings "Male" and "Female" or "M" and "F" for a categorical gender variable. Because categorical values are stored and compared as string values, a categorical value of 001 is different than a value of 1. In contrast, values of 001 and 1 would be equal for continuous variables.

NULL HYPOTHESIS: There is no relation between the two variables.

A chi-squared test, also referred to as chi-square test or χ² test, is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Also considered a chi-squared test is a test in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-squared distribution as closely as desired by making the sample size large enough.

Chi-Square statistic: χ² = Σ ((O − E)² / E)

Based on this statistic you accept or reject the hypothesis.

Degrees of freedom: the number of values needed to predict the other variables.
Formula for degrees of freedom = (rows − 1)(columns − 1)

Confidence of 95% is used for normal business decisions; confidence approaching 100% is required for critical decisions.

We can see this in the chi-square table.
Chi-square table:
With the help of the chi-square table and the degrees of freedom we can find the critical value of χ² at 95% confidence.
If the value we calculate is greater than the value from the chi-square table we reject the null hypothesis; otherwise we accept it.

Observed frequencies (O):

Gender   Females  Males  Total
3        2        3      5
5        2        1      3
6        1        1      2
Total    5        5      10

Expected frequencies (E = row total × column total / N):

Gender   Females  Males  Total
3        2.5      2.5    5
5        1.5      1.5    3
6        1        1      2
Total    5        5      10

Chi-square contributions ((O − E)² / E):

Gender   Females  Males  Total
3        .1       .1     .2
5        .167     .167   .333
6        0        0      0
Total                    .533


Degrees of freedom = (3 − 1)(2 − 1) = 2

We can check this with the chi-square table using the degrees of freedom and the required 95% confidence: the critical value for 2 degrees of freedom is 5.99. The calculated value (≈ 0.53) is smaller than the table value, so we fail to reject the hypothesis.
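The worked table can be re-run in a few lines of Python (scipy assumed), which confirms that the statistic is far below the critical value:

```python
# Sketch: chi-square test on the 3x2 table worked out above.
import numpy as np
from scipy.stats import chi2, chi2_contingency

observed = np.array([[2, 3],    # row "3": females, males
                     [2, 1],    # row "5"
                     [1, 1]])   # row "6"

stat, p, df, expected = chi2_contingency(observed, correction=False)
critical = chi2.ppf(0.95, df)   # 95% critical value for df = 2, about 5.99

print(f"chi2 = {stat:.3f}, df = {df}, critical = {critical:.2f}, p = {p:.3f}")
# chi2 ≈ 0.53 < 5.99, so we fail to reject the null hypothesis.
```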








The t-table is used for continuous variables.

1 Single sample T-test:
  Example: To measure the tolerance of a bolt, first we set a benchmark and compare the significance value against it. Accordingly we decide whether to accept the lot or not.
2 Independent sample T-test:
  Used when data are taken from the same population, or when the means of the variables are the same but the data are not.
  Example: If we take different cities, calculate their respective means, and then do an independent t-test, we can find out which city is performing well.


  CITIES      MEAN      DEGREES OF FREEDOM  SIGNIFICANCE VALUE
  DELHI       321.9980  15                  .602
  MUMBAI      322.0145  15                  .080
  PUNE        321.9983  15                  .522
  BANGLORE    321.9954  15                  .020
  JAIPUR      322.0042  15                  .085
  NOIDA       322.0025  15                  .274
  CALCUTTA    322.0062  15                  .018
  CHANDIGARH  321.9967  15                  .101

  If we set the benchmark at 322, then 4 cities are performing above it and 4 below it.
  With the help of the significance values we find that the means of Calcutta, Mumbai and Banglore are not the same as the benchmark.
  So we reach the conclusion that Mumbai and Calcutta are performing well, as their values are greater than the benchmark of 322, but Banglore is not performing well, as its value is less than 322.


 
3 Paired sample T-test:
  Example: In medicine, some parameters are measured before a treatment is given and the same parameters are measured again afterwards; the paired differences are then tested.


BY :  RAGHAV KABRA(2013217)
GROUP MEMBERS : RAGHAV KABRA(2013217)
                                    ABHISHEK PANWALA(2013190)    
                                      PARITA MANDHANA(2013192)
                                      POORVA SABOO(2013200)
                                      PAREENA NEEMA(2013191)






Applied_Business_Statistics 13_08_2013

Calculating Chi-Square manually:

1. Draw a contingency table.
2. Enter the Observed frequencies or counts (O).
3. Calculate totals (in the margins).
4. Calculate the Expected frequencies (E):
a. For each cell: (Column total / N) times Row total.
b. Write the Expected frequency into the appropriate box in the table.
CHECK: the marginal totals for the Expected frequencies (E) are the same as for the Observed frequencies (O).
Eyeball the contingency table, noting where the differences between O (Observed) and E (Expected) values occur. If they are close to each other, the levels of the independent (predictor) variable are not having an effect.
5. Calculate the Chi-square statistic:
χ² = Σ ((O − E)² / E)
O = Observed frequency
E = Expected frequency
Σ = sum of the above across all cells
6. Find the probability value (p) associated with the obtained Chi-square statistic
a.         Calculate degrees of freedom (df)
df = (# rows - 1)(# columns - 1)
b.         Use the abbreviated table of Critical Values for Chi-square test to find the p value.
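The manual procedure above translates line for line into code. A sketch in Python (numpy/scipy assumed), with invented observed counts:

```python
# Sketch: steps 1-6 of the manual chi-square calculation.
import numpy as np
from scipy.stats import chi2

O = np.array([[30, 10],
              [20, 40]], dtype=float)       # step 2: observed frequencies

row_totals = O.sum(axis=1, keepdims=True)   # step 3: marginal totals
col_totals = O.sum(axis=0, keepdims=True)
N = O.sum()

E = row_totals * col_totals / N             # step 4: expected frequencies
stat = ((O - E) ** 2 / E).sum()             # step 5: chi-square statistic

df = (O.shape[0] - 1) * (O.shape[1] - 1)    # step 6a: degrees of freedom
p = chi2.sf(stat, df)                       # step 6b: p-value for the statistic
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")
```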
We use the t-test for comparing the means of two samples (or treatments), even if they have different numbers of replicates. In simple terms, the t-test compares the actual difference between two means in relation to the variation in the data (expressed as the standard deviation of the difference between the means).





                                                                           
T-test

There are three types of t-test:
1.       Paired sample t-test
2.       Independent sample t-test
3.       Single sample t - test


Procedure to conduct a t – test:
1. We need to construct a null hypothesis - an expectation - which the experiment was designed to test. For example:

If we are analysing the heights of pine trees growing in two different locations, a suitable null hypothesis would be that there is no difference in height between the two locations. The Student's t-test will tell us if the data are consistent with this or depart significantly from this expectation. [NB: the null hypothesis is simply something to test against. We might well expect a difference between trees growing in a cold, windy location and those in a warm, protected location, but it would be difficult to predict the scale of that difference - twice as high? Three times as high? So it is sensible to have a null hypothesis of "no difference" and then to see if the data depart from this.]
2. List the data for sample 1.

3. List the data for sample 2.

4. Record the number (n) of replicates for each sample (the number of replicates for sample 1 being termed n1 and the number for sample 2 being termed n2)

5. Calculate mean of each sample (1 and 2).

6. Calculate the sample variance s² for each sample; call these s1² and s2². [Note that actually we are using s² as an estimate of the population variance σ² in each case.]

7. Calculate the variance of the difference between the two means (sd²).

8. Calculate sd (the square root of sd²).

9. Calculate the t value:

t = (x̄1 − x̄2) / sd

(When doing this, transpose x̄1 and x̄2 if x̄2 > x̄1 so that you always get a positive value.)

10. Enter the t-table at (n1 + n2 − 2) degrees of freedom; choose the level of significance required (normally p = 0.05) and read the tabulated t value.

11. If the calculated t value exceeds the tabulated value we say that the means are significantly different at that level of probability.

12. A significant difference at p = 0.05 means that if the null hypothesis were correct (i.e. the samples or treatments do not differ) then we would expect to get a t value as great as this on less than 5% of occasions. So we can be reasonably confident that the samples/treatments do differ from one another, but we still have nearly a 5% chance of being wrong in reaching this conclusion.

Now compare your calculated t value with tabulated values for higher levels of significance (e.g. p = 0.01). These levels tell us the probability of our conclusion being correct. For example, if our calculated t value exceeds the tabulated value for p = 0.01, then there is a 99% chance of the means being significantly different (and a 99.9% chance if the calculated t value exceeds the tabulated value for p = 0.001). By convention, we say that a difference between means at the 95% level is "significant", a difference at 99% level is "highly significant" and a difference at 99.9% level is "very highly significant".

What does this mean in "real" terms? Statistical tests allow us to make statements with a degree of precision, but cannot actually prove or disprove anything. A significant result at the 95% probability level tells us that our data are good enough to support a conclusion with 95% confidence (but there is a 1 in 20 chance of being wrong). In biological work we accept this level of significance as being reasonable.
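For completeness, here is a sketch of steps 2-11 in Python (numpy/scipy assumed), using the pooled-variance form implied by the n1 + n2 − 2 degrees of freedom in step 10; the pine-tree heights are invented:

```python
# Sketch: manual two-sample t-test with pooled variance (pine-tree example).
import numpy as np
from scipy.stats import t as t_dist

x1 = np.array([6.2, 5.8, 7.1, 6.5, 6.9, 5.9])   # heights, warm site (invented)
x2 = np.array([4.9, 5.3, 5.1, 4.7, 5.6, 5.0])   # heights, cold site (invented)

n1, n2 = len(x1), len(x2)                        # replicates
m1, m2 = x1.mean(), x2.mean()                    # sample means
s1_sq, s2_sq = x1.var(ddof=1), x2.var(ddof=1)    # sample variances

# Pooled variance of the difference between the means, then sd.
pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
sd = np.sqrt(pooled * (1 / n1 + 1 / n2))

t_stat = abs(m1 - m2) / sd                       # kept positive, as in step 9

df = n1 + n2 - 2
critical = t_dist.ppf(0.975, df)                 # tabulated t at p = 0.05, two-sided
print(f"t = {t_stat:.2f}, tabulated = {critical:.2f}")
print("significantly different" if t_stat > critical else "not significantly different")
```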

Written By: Priyesh Bhadauriya
Priyesh Bhadauriya 2013214

Group members:
Nikita Agarwal 2013171
Nimisha Agarwal 2013173
Parth Mehta 2013193
Nihal Moidu