CORRELATION
In statistics, dependence refers to any statistical relationship
between two random variables or
two sets of data. Correlation refers to any of a broad class of
statistical relationships involving dependence.
Familiar
examples of dependent phenomena include the correlation between the physical statures of
parents and their offspring, and the correlation between the demand for a
product and its price. Correlations are useful because they can indicate a
predictive relationship that can be exploited in practice. For example, an
electrical utility may produce less power on a mild day based on the
correlation between electricity demand and weather. In this example there is a causal
relationship, because extreme weather causes people to
use more electricity for heating or cooling; however, statistical dependence is
not sufficient to demonstrate the presence of such a causal relationship (i.e., correlation does not imply causation).
Formally, dependence refers to any situation in which
random variables do not satisfy a mathematical condition of probabilistic independence. In loose usage, correlation can refer to any departure of two or
more random variables from independence, but technically it refers to any of
several more specialized types of relationship between mean
values. There are several correlation coefficients, often
denoted ρ or r,
measuring the degree of correlation. The commonest of these is the Pearson correlation coefficient, which is sensitive only to a linear relationship
between two variables (which may exist even if one is a nonlinear function of
the other). Other correlation coefficients have been developed to be more robust than
the Pearson correlation – that is, more sensitive to nonlinear
relationships. Mutual
information can also be applied to measure dependence between two
variables.
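The contrast between a linear-only measure and a more robust, rank-based one can be illustrated with a short sketch. This example assumes NumPy and SciPy are available (neither is named in the text above); the data are made up for illustration.

```python
import numpy as np
from scipy import stats

# Monotonic but nonlinear relationship: y = exp(x) plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 200)
y = np.exp(x) + rng.normal(scale=0.5, size=x.size)

# Pearson's r is sensitive only to the linear part of the association,
# while Spearman's rho (a rank-based, more robust coefficient) picks up
# any monotonic dependence.
pearson_r, _ = stats.pearsonr(x, y)
spearman_rho, _ = stats.spearmanr(x, y)
print(f"Pearson r:    {pearson_r:.3f}")    # noticeably below 1
print(f"Spearman rho: {spearman_rho:.3f}") # close to 1
```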
The most familiar measure of dependence between two quantities is the Pearson
product-moment correlation coefficient, or "Pearson's correlation coefficient", commonly called
simply "the correlation coefficient". It is obtained by dividing the covariance of the two variables by the product of
their standard deviations. The Pearson
correlation is defined only if both of the standard deviations are finite and
both of them are nonzero. It is a corollary of the Cauchy–Schwarz inequality that
the correlation cannot exceed 1 in absolute value. The correlation coefficient is symmetric: corr(X,Y) = corr(Y,X).
The Pearson correlation
is +1 in the case of a perfect positive (increasing) linear relationship
(correlation), −1 in the case of a perfect decreasing (negative) linear
relationship (anticorrelation), and some value between −1 and 1 in all other cases,
indicating the degree of linear
dependence between the variables. As it approaches zero there is less
of a relationship (closer to uncorrelated). The closer the coefficient is to
either −1 or 1, the stronger the correlation between the variables.
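A minimal sketch of this definition, assuming NumPy is installed (the data values are illustrative): the coefficient is simply the covariance rescaled by both standard deviations, so it always lies between −1 and 1.

```python
import numpy as np

def pearson_corr(x, y):
    """Pearson's r: covariance of x and y divided by the product
    of their standard deviations (both must be finite and nonzero)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # roughly y = 2x
print(pearson_corr(x, y))                 # close to +1
print(np.corrcoef(x, y)[0, 1])            # agrees with NumPy's built-in
print(pearson_corr(y, x))                 # symmetric: corr(X,Y) = corr(Y,X)
```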
Techniques for Determining Correlation
There are
several different correlation techniques. The Survey System's optional Statistics Module includes
the most common type, called the Pearson or product-moment correlation. The
module also includes a variation on this type called partial correlation. The
latter is useful when you want to look at the relationship between two
variables while removing the effect of one or two other variables.
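One common way to obtain a partial correlation is to regress both variables of interest on the control variable and then correlate the residuals. The sketch below assumes NumPy; the variable names and simulated data are illustrative, not part of the Survey System module described above.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y with the linear effect of z removed:
    regress x on z and y on z, then correlate the two residual series."""
    z_design = np.column_stack([np.ones_like(z), z])
    # Least-squares fits of x and y on the control variable z.
    bx, *_ = np.linalg.lstsq(z_design, x, rcond=None)
    by, *_ = np.linalg.lstsq(z_design, y, rcond=None)
    rx = x - z_design @ bx          # part of x not explained by z
    ry = y - z_design @ by          # part of y not explained by z
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(1)
z = rng.normal(size=500)
x = 2 * z + rng.normal(size=500)    # x and y are related mainly through z
y = -3 * z + rng.normal(size=500)
print(np.corrcoef(x, y)[0, 1])      # strongly negative (driven by z)
print(partial_corr(x, y, z))        # near zero once z is controlled for
```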
Like all
statistical techniques, correlation is only appropriate for certain kinds of
data. Correlation works for quantifiable data in which numbers
are meaningful, usually quantities of some sort. It cannot be used for purely
categorical data, such as gender, brands purchased, or favorite color.
REGRESSION
In statistics, regression
analysis is a statistical process for estimating the relationships
among variables. It includes many techniques for modeling and analyzing several
variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression
analysis helps one understand how the typical value of the dependent variable
changes when any one of the independent variables is varied, while the other
independent variables are held fixed. Most commonly, regression analysis
estimates the conditional expectation of the dependent variable given
the independent variables – that is, the average value of
the dependent variable when the independent variables are fixed. Less commonly,
the focus is on a quantile, or other location parameter of the conditional distribution
of the dependent variable given the independent variables. In all cases, the
estimation target is a function of the independent variables
called the regression function. In regression analysis, it is also
of interest to characterize the variation of the dependent variable around the
regression function, which can be described by a probability distribution.
Regression
analysis is widely used for prediction and forecasting,
where its use has substantial overlap with the field of machine learning.
Regression analysis is also used to understand which among the independent
variables are related to the dependent variable, and to explore the forms of
these relationships. In restricted circumstances, regression analysis can be
used to infer causal relationships between the independent and
dependent variables. However this can lead to illusions or false relationships,
so caution is advisable; for example, correlation does
not imply causation.
A large body
of techniques for carrying out regression analysis has been developed. Familiar
methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is
defined in terms of a finite number of unknown parameters that
are estimated from the data. Nonparametric regression refers to techniques that allow
the regression function to lie in a specified set of functions, which may be infinite-dimensional.
In statistics, linear regression is an approach to modelling the relationship between a
scalar dependent variable y and one or
more explanatory variables denoted X. The case of one
explanatory variable is called simple linear regression.
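As a brief sketch of simple linear regression (assuming NumPy; the data below are illustrative), the least-squares slope is cov(x, y)/var(x) and the intercept places the fitted line through the point of means:

```python
import numpy as np

def simple_linear_regression(x, y):
    """Ordinary least squares for one explanatory variable:
    slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Illustrative data: y is roughly 3 + 2x with some noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([5.2, 6.9, 9.1, 10.8, 13.2, 14.9])
slope, intercept = simple_linear_regression(x, y)
print(f"fitted line: y = {intercept:.2f} + {slope:.2f} * x")
print(np.polyfit(x, y, 1))   # NumPy's built-in degree-1 fit agrees
```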
Steps in such an investigation
- Plot the data. In many cases the plot can tell us
visually whether there seems to be a relationship: if there is some
correlation, do the variables increase or decrease together, or does one
decrease when the other increases? Is a straight line a
suitable model to describe the relationship between the two variables?
If we want to go beyond this qualitative level of analysis, then
simple linear regression is often a useful tool. This involves fitting a
straight line through our data and investigating the properties of the
fitted line. It is conventional to plot the response variable Y on the
vertical axis and the independent variable X on the horizontal axis.
- Plot the line of best fit. If the plot suggests a linear relationship, we proceed to quantify the relationship between the two variables by fitting a regression line through the data points, as sketched after this list.
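A short sketch of these two steps, assuming NumPy and Matplotlib are available (the data and file name are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

# Step 1: plot the data, response Y on the vertical axis, X on the horizontal.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.3, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 8.7])
plt.scatter(x, y, label="observed data")

# Step 2: fit and plot the regression line through the data points.
slope, intercept = np.polyfit(x, y, 1)
plt.plot(x, intercept + slope * x, color="red",
         label=f"fit: y = {intercept:.2f} + {slope:.2f}x")

plt.xlabel("X (independent variable)")
plt.ylabel("Y (response variable)")
plt.legend()
plt.savefig("regression_fit.png")   # or plt.show() in an interactive session
```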
Using regression, we can also fit many other types of models, including those where we
have more than one independent variable.
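For instance, with two independent variables the same least-squares idea extends directly. The sketch below assumes NumPy; the model coefficients and simulated data are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Illustrative model: y = 1 + 2*x1 - 0.5*x2 + noise.
y = 1 + 2 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

# Design matrix with an intercept column and both independent variables.
X = np.column_stack([np.ones(n), x1, x2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)   # approximately [1.0, 2.0, -0.5]
```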
Submitted By: Pragya Singh (2013203)
Group Members: Priyanka Doshi (2013212)
Poulami Sarkar (2013201)
Nilay Kohaley (2013172)
Pawan Agarwal (2013195)