Saturday 31 August 2013

Mean:
In probability and statistics, mean and expected value are used synonymously to refer to one measure of the central tendency either of a probability distribution or of the random variable characterized by that distribution. In the case of a discrete probability distribution of a random variable X, the mean is equal to the sum over every possible value weighted by the probability of that value; that is, it is computed by taking the product of each possible value x of X and its probability P(x), and then adding all these products together, giving μ = Σ x P(x). An analogous formula applies to the case of a continuous probability distribution. Not every probability distribution has a defined mean; see the Cauchy distribution for an example. Moreover, for some distributions the mean is infinite: for example, when the probability of the value 2^n is 1/2^n for n = 1, 2, 3, ....
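The weighted sum μ = Σ x P(x) can be sketched in a few lines of Python; the fair-die distribution below is an invented example, not from the text above.

```python
# A minimal sketch: the mean of a discrete distribution
# as the probability-weighted sum mu = sum of x * P(x).
def distribution_mean(pmf):
    """pmf maps each possible value x to its probability P(x)."""
    return sum(x * p for x, p in pmf.items())

# Example: a fair six-sided die, P(x) = 1/6 for x = 1..6.
die = {face: 1 / 6 for face in range(1, 7)}
mu = distribution_mean(die)   # (1 + 2 + ... + 6) / 6 = 3.5
```

For the infinite-mean distribution mentioned above, each term of the sum is 2^n · (1/2^n) = 1, so the partial sums grow without bound.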

Mode:

The mode is the value that appears most often in a set of data. The mode of a discrete probability distribution is the value x at which its probability mass function takes its maximum value. In other words, it is the value that is most likely to be sampled. The mode of a continuous probability distribution is the value x at which its probability density function has its maximum value, so, informally speaking, the mode is at the peak.
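For a finite data set, the mode can be found by counting occurrences; a minimal sketch (the sample data is invented, and ties are broken by first occurrence):

```python
from collections import Counter

def mode(data):
    """Return the value that appears most often in data."""
    counts = Counter(data)
    # max over the distinct values, keyed by their frequency
    return max(counts, key=counts.get)

sample = [2, 3, 3, 5, 3, 7, 5]
m = mode(sample)   # 3 appears three times, more than any other value
```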

Median:

In statistics and probability theory, the median is the numerical value separating the higher half of a data sample, a population, or a probability distribution, from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one (e.g., the median of {3, 5, 9} is 5). If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values, which corresponds to interpreting the median as the fully trimmed mid-range. The median is of central importance in robust statistics, as it is the most resistant statistic, having a breakdown point of 50%: so long as no more than half the data is contaminated, the median will not give an arbitrarily large result.
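The sort-and-pick-the-middle procedure described above can be sketched directly; the example lists are invented:

```python
def median(values):
    """Middle value of the sorted data; for an even number of
    observations, the mean of the two middle values."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

m_odd = median([3, 9, 5])       # sorted: [3, 5, 9] -> middle value 5
m_even = median([1, 2, 3, 4])   # two middle values -> (2 + 3) / 2 = 2.5
```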

T-Test:

A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is supported. It can be used to determine if two sets of data are significantly different from each other, and is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic (under certain conditions) follows a Student's t distribution.
Two Sample t-Test
Purpose
To compare responses from two groups. These two groups can come from different experimental treatments, or different natural "populations".
Assumptions
  • each group is considered to be a sample from a distinct population
  • the responses in each group are independent of those in the other group
  • the distributions of the variable of interest are normal
How It Works
1.    The null hypothesis is that the two population means are equal to each other. To test the null hypothesis, you need to calculate the following values: x̄1 and x̄2 (the means of the two samples), s1² and s2² (the variances of the two samples), n1 and n2 (the sample sizes of the two samples), and k (the degrees of freedom).
2.    Compute the t-statistic:
t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
3.    Compare the calculated t-value, with k degrees of freedom, to the critical t-value from the t-distribution table at the chosen confidence level and decide whether to reject the null hypothesis.
*Reject the null hypothesis when: |calculated t-value| > critical t-value
4.    Note: This procedure can be used even when the variances of the two populations are not equal and the sample sizes are not equal.
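The steps above can be sketched as follows. This version uses the unpooled (Welch) form, which matches the note that variances and sample sizes need not be equal; the two samples are invented, and the Welch–Satterthwaite formula for k is an approximation.

```python
import math
from statistics import mean, variance

def two_sample_t(sample1, sample2):
    """Welch's two-sample t-statistic and approximate degrees of freedom k."""
    m1, m2 = mean(sample1), mean(sample2)
    v1, v2 = variance(sample1), variance(sample2)   # sample variances s1^2, s2^2
    n1, n2 = len(sample1), len(sample2)
    se2 = v1 / n1 + v2 / n2                          # squared standard error
    t = (m1 - m2) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    k = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, k

a = [20.1, 19.8, 21.2, 20.7, 19.9]
b = [22.4, 23.1, 21.9, 22.8, 23.5, 22.0]
t, k = two_sample_t(a, b)
# Compare |t| against the critical t-value at k degrees of freedom.
```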

Correlation:

In the world of finance, correlation is a statistical measure of how two securities move in relation to each other. Correlations are used in advanced portfolio management.
Correlation is summarized by the correlation coefficient, which ranges between -1 and +1. Perfect positive correlation (a correlation coefficient of +1) implies that as one security moves, either up or down, the other security will move in lockstep, in the same direction. Alternatively, perfect negative correlation means that if one security moves in either direction, the security that is perfectly negatively correlated will move in the opposite direction. If the correlation is 0, the movements of the securities are said to have no correlation; they are completely random.

In real life, perfectly correlated securities are rare; rather, you will find securities with some degree of correlation.
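One common way to compute the coefficient described above is Pearson's formula; a minimal sketch with invented series (in practice these would be security returns):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, ranging between -1 and +1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # moves in lockstep -> +1
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # opposite directions -> -1
```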

Regression:

In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed. 
Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. However, this can lead to illusory or spurious relationships, so caution is advisable; for example, correlation does not imply causation.
Regression Models:
Regression models involve the following variables:
  • The unknown parameters, denoted as β, which may represent a scalar or a vector.
  • The independent variables, X.
  • The dependent variable, Y.
In various fields of application, different terminologies are used in place of dependent and independent variables.
A regression model relates Y to a function of X and β.
Y ≈ f(X, β)
The approximation is usually formalized as E(Y | X) = f(X, β). To carry out regression analysis, the form of the function f must be specified. Sometimes the form of this function is based on knowledge about the relationship between Y and X that does not rely on the data. If no such knowledge is available, a flexible or convenient form for f is chosen.
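The simplest choice for f is a straight line, f(X, β) = β0 + β1·X, with the parameters estimated by least squares. A minimal sketch with an invented data set that lies exactly on the line y = 1 + 2x:

```python
def fit_line(xs, ys):
    """Least-squares estimates of beta0 and beta1 for Y ~ beta0 + beta1 * X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx   # the fitted line passes through (mx, my)
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])   # recovers y = 1 + 2x
```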

Cross Tab:
Cross tabulation (or crosstabs for short) is a statistical process that summarises categorical data to create a contingency table. Crosstabs are heavily used in survey research, business intelligence, engineering and scientific research. They provide a basic picture of the interrelation between two variables and can help find interactions between them.
Some entries may be weighted; unweighted tables are commonly known as pivot tables.
In statistics, a "crosstab" is another name for a contingency table, which is a type of table created by cross tabulation. In survey research (e.g., polling, market research), a "crosstab" is any table showing summary statistics. Commonly, crosstabs in survey research are concatenations of multiple different tables; for example, a single crosstab may combine multiple contingency tables and tables of averages.
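A contingency table is essentially a count of co-occurring category pairs; a minimal sketch with invented survey responses:

```python
from collections import Counter

def crosstab(rows, cols):
    """Contingency table: count of each (row category, column category) pair."""
    return Counter(zip(rows, cols))

# Invented survey data: respondent gender vs. a yes/no answer.
gender = ["F", "F", "M", "M", "F"]
answer = ["yes", "no", "yes", "yes", "yes"]
table = crosstab(gender, answer)
# table[("F", "yes")] counts respondents in that cell of the table.
```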

Pareena Neema

Group Members:
Pareena Neema (2013191)
Abhishek Panwala(2013190)
Raghav Kabra(2013217)
Poorva Saboo(2013200)