Applied Business Statistics: Statistics

In today's class we discussed the mean, median, mode, t-test, correlation, regression and cross tabulation. Below is a brief summary of each topic.

Mean: The mean is often confused with the median, mode or range. The mean is the arithmetic average of a set of values, or distribution; however, for skewed distributions, the mean is not necessarily the same as the middle value (median) or the most likely value (mode). For example, mean income is skewed upwards by a small number of people with very large incomes, so that the majority have an income lower than the mean. By contrast, the median income is the level at which half the population is below and half is above. The mode income is the most likely income, and favors the larger number of people with lower incomes. The median or mode is often a more intuitive measure of such data.

For example, the arithmetic mean of the five values 4, 36, 45, 50 and 75 is

$\frac{4 + 36 + 45 + 50 + 75}{5} = \frac{210}{5} = 42.$
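The arithmetic above can be checked with Python's standard `statistics` module. This is only an illustrative sketch; comparing the mean with the median of the same five values also shows the skew point made earlier, since the single low value 4 pulls the mean (42) below the middle value (45).

```python
from statistics import mean, median

values = [4, 36, 45, 50, 75]

# Arithmetic mean: sum of the values divided by how many there are
manual_mean = sum(values) / len(values)

print(manual_mean)                  # 42.0
print(mean(values) == manual_mean)  # True
print(median(values))               # 45  (the middle value after sorting)
```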

Median: The median is the numerical value separating the higher half of a data sample, a population, or a probability distribution from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one (e.g., the median of {3, 5, 9} is 5). If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values, which corresponds to interpreting the median as the fully trimmed mid-range. The median is of central importance in robust statistics, as it is the most resistant statistic, having a breakdown point of 50%: as long as no more than half the data is contaminated, the median will not give an arbitrarily large result.
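The procedure just described (sort, take the middle value, and average the two middle values when the count is even) can be sketched in plain Python. The function name `median` and the sample lists are illustrative only; the last lines show the robustness point by replacing one value with a huge outlier.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)       # arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]        # odd count: the single middle value
    # even count: mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 5, 9]))       # 5
print(median([3, 5, 9, 11]))   # 7.0

# Robustness: one huge outlier leaves the median unchanged,
# while the mean is dragged far away from the bulk of the data
clean = [3, 5, 9, 11, 12]
contaminated = [3, 5, 9, 11, 1_000_000]
print(median(clean), median(contaminated))   # 9 9
print(sum(clean) / len(clean), sum(contaminated) / len(contaminated))
```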

Mode: The mode of a set of data values is the value (or values) that occurs most often. The mode has applications in printing. For example, it is important to print more copies of the most popular books, because printing different books in equal numbers would cause a shortage of some books and an oversupply of others. Likewise, the mode has applications in manufacturing. For example, it is important to manufacture more of the most popular shoes, because manufacturing different shoes in equal numbers would cause a shortage of some shoes and an oversupply of others.

For example,

48 44 48 45 42 49 48

The mode is 48, as it appears most often (three times).
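Finding the mode amounts to counting how often each value occurs. A minimal sketch using Python's `collections.Counter` and `statistics.multimode` (Python 3.8 or later is assumed for `multimode`), applied to the same seven values:

```python
from collections import Counter
from statistics import multimode

data = [48, 44, 48, 45, 42, 49, 48]

# Count how often each value occurs and take the most frequent one
counts = Counter(data)
print(counts.most_common(1))       # [(48, 3)]

# multimode returns every value tied for the highest count,
# which covers data sets with more than one mode
print(multimode(data))             # [48]
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```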

T-test: A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is true. It can be used to determine whether two sets of data are significantly different from each other, and is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic (under certain conditions) follows a Student's t distribution.

Uses:

A one-sample location test of whether the mean of a population has a value specified in a null hypothesis.

A two-sample location test of the null hypothesis that the means of two populations are equal. All such tests are usually called Student's t-tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t-test. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.

A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the "paired" or "repeated measures" t-test.

A test of whether the slope of a regression line differs significantly from 0.
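The test statistics behind the first three uses can be computed with the standard library alone. This is a sketch, not a full testing procedure: the function names and the tumor measurements are hypothetical, and it stops at the t statistic and degrees of freedom, which would then be compared with a critical value from a t table (the standard library has no t distribution).

```python
import math
from statistics import mean, stdev, variance

def one_sample_t(xs, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(xs)
    se = stdev(xs) / math.sqrt(n)   # estimated standard error of the mean
    return (mean(xs) - mu0) / se, n - 1

def paired_t(before, after):
    """Paired t-test: a one-sample test of the differences against 0."""
    diffs = [b - a for b, a in zip(before, after)]
    return one_sample_t(diffs, 0)

def welch_t(xs, ys):
    """Welch's two-sample t statistic (equal variances not assumed)."""
    a = variance(xs) / len(xs)
    b = variance(ys) / len(ys)
    t = (mean(xs) - mean(ys)) / math.sqrt(a + b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (a + b) ** 2 / (a ** 2 / (len(xs) - 1) + b ** 2 / (len(ys) - 1))
    return t, df

# Hypothetical tumor sizes (cm) for five patients, before and after treatment
before = [5.0, 4.2, 6.1, 3.8, 5.5]
after = [4.1, 4.0, 5.2, 3.9, 4.6]
t, df = paired_t(before, after)
print(f"paired t = {t:.3f} with {df} degrees of freedom")
```

If SciPy is available, `scipy.stats.ttest_1samp`, `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind` (with `equal_var=False` for Welch's test) also return p-values.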

Correlation

When two sets of data are strongly linked together, we say they have a high correlation.

Correlation is positive when the values increase together.
Correlation is negative when one value decreases as the other increases.

The correlation coefficient can take any value from -1 to 1:

1 is a perfect positive correlation
0 is no correlation (the values don't seem linked at all)
-1 is a perfect negative correlation

The value shows how good the correlation is (not how steep the line is), and if it is positive or negative.
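The notes do not name a specific coefficient; assuming the common Pearson correlation coefficient, a small sketch makes the last point concrete. The helper `pearson_r` is hypothetical (Python 3.10+ also offers `statistics.correlation`); lines with very different slopes both give exactly 1, a reversed line gives -1, and an unrelated pattern gives 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [0.5 * v for v in x]))  # 1.0  (shallow line)
print(pearson_r(x, [10 * v for v in x]))   # 1.0  (steep line, same r)
print(pearson_r(x, [-v for v in x]))       # -1.0 (perfect negative)
print(pearson_r(x, [1, 3, 2, 3, 1]))       # 0.0  (no linear link)
```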

Regression

Regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.

Regression analysis is widely used for prediction and forecasting, for example in weather forecasting.
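The simplest case, one independent variable and a straight-line regression function fitted by ordinary least squares, can be sketched as follows. The helper `fit_line` and the data are hypothetical, and other estimation methods exist; Python 3.10+ also provides `statistics.linear_regression`.

```python
def fit_line(xs, ys):
    """Least-squares estimates (intercept, slope) for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx     # the fitted line passes through (mx, my)
    return intercept, slope

# Hypothetical data: advertising spend (x) vs. units sold (y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(f"y = {a:.2f} + {b:.2f}x")    # y = 2.20 + 0.60x
print(a + b * 6)                    # predicted average y at x = 6
```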

Cross Tab

Cross tabulation (or crosstabs for short) is a statistical process that summarises categorical data to create a contingency table. Crosstabs are heavily used in survey research, business intelligence, engineering and scientific research. They provide a basic picture of the interrelation between two variables and can help find interactions between them.

Entries may be weighted; unweighted tables are commonly known as pivot tables.

Example:

Sample #	Gender	Handedness
1	Female	Right-handed
2	Male	Left-handed
3	Female	Right-handed
4	Male	Right-handed
5	Male	Left-handed
6	Male	Right-handed
7	Female	Right-handed
8	Female	Left-handed
9	Male	Right-handed
10	Female	Right-handed

Cross-tabulation leads to the following contingency table:

	Left-handed	Right-handed	Total
Males	2	3	5
Females	1	4	5
Total	3	7	10
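The same contingency table can be produced by counting each (gender, handedness) pair. A minimal sketch with `collections.Counter` and the ten samples listed above; with pandas installed, `pandas.crosstab(..., margins=True)` gives a similar table including totals.

```python
from collections import Counter

# The ten samples from the example: (gender, handedness)
samples = [
    ("Female", "Right-handed"), ("Male", "Left-handed"),
    ("Female", "Right-handed"), ("Male", "Right-handed"),
    ("Male", "Left-handed"), ("Male", "Right-handed"),
    ("Female", "Right-handed"), ("Female", "Left-handed"),
    ("Male", "Right-handed"), ("Female", "Right-handed"),
]

counts = Counter(samples)   # count of each (row, column) combination
rows = ["Male", "Female"]
cols = ["Left-handed", "Right-handed"]

print(f"{'':8}" + "".join(f"{c:>14}" for c in cols) + f"{'Total':>8}")
for r in rows:
    cells = [counts[(r, c)] for c in cols]   # missing pairs count as 0
    print(f"{r:8}" + "".join(f"{v:>14}" for v in cells) + f"{sum(cells):>8}")
col_totals = [sum(counts[(r, c)] for r in rows) for c in cols]
print(f"{'Total':8}" + "".join(f"{v:>14}" for v in col_totals) + f"{sum(col_totals):>8}")
```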

We had a clear understanding of all these concepts in today's class.

Written by: Neeraj Ramadoss (2013167)

Group Members

Nishanth Agarwal

Nitin Kumar Shukla

Prakar Swami

Prerana Arora

Praveen Iyer

Neeraj Ramadoss

Applied Business Statistics

Friday, 30 August 2013

Statistics - Session 17 & Session 18

