Mean:
In probability and statistics, mean and expected value are used synonymously to refer to one measure of the central tendency either of a probability
distribution or of the random variable characterized by that
distribution. In the case of a discrete
probability distribution of a random variable X, the mean is equal to the sum
over every possible value weighted by the probability of that value; that is, it
is computed by taking the product of each possible value x of X and its probability P(x), and
then adding all these products together, giving E[X] = Σ x·P(x). An analogous formula applies to the
case of a continuous
probability distribution. Not every probability distribution has a defined
mean; see the Cauchy distribution for an example. Moreover, for some distributions the mean is infinite:
for example, when the probability of the value 2^n is 2^(-n) for n = 1, 2, 3, ....
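The weighted-sum definition above can be sketched directly; the fair six-sided die used here is a made-up illustration, not from the original text.

```python
# Mean of a discrete distribution: sum of each value times its probability.
# Hypothetical example: a fair six-sided die, P(x) = 1/6 for each face.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

mean = sum(x * p for x, p in zip(values, probs))
print(mean)  # 3.5
```

Each term x·P(x) is one product from the definition; summing them gives E[X].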
Mode:
The mode is the value that appears most often
in a set of data. The mode of a discrete
probability distribution is the value x at which its probability mass
function takes its maximum value. In
other words, it is the value that is most likely to be sampled. The mode of a continuous
probability distribution is the value x at which its probability density
function has its maximum value, so,
informally speaking, the mode is at the peak.
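For a data set, "the value that appears most often" can be found by counting occurrences; the sample data below is invented for illustration.

```python
from collections import Counter

# Hypothetical data set; 3 appears three times, more than any other value.
data = [2, 3, 3, 5, 3, 7, 2]

counts = Counter(data)               # tally of each value
mode = max(counts, key=counts.get)   # value with the highest count
print(mode)  # 3
```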
Median:
In statistics and probability theory, the median is the numerical value separating the
higher half of a data sample, a population, or a probability
distribution, from the lower half. The median of a finite list of numbers can be
found by arranging all the observations from lowest value to highest value and
picking the middle one (e.g., the median of {3, 5, 9} is 5). If there is an
even number of observations, then there is no single middle value; the median
is then usually defined to be the mean of the two middle values, which corresponds to interpreting the
median as the fully trimmed mid-range. The median is of central importance in robust statistics, as it is the most resistant statistic, having a breakdown point of 50%: so long as no more than half the data is contaminated, the
median will not give an arbitrarily large result.
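The sort-and-pick-the-middle procedure, including the mean-of-two-middle-values rule for even-length lists, can be sketched as follows (the second example list is made up):

```python
def median(xs):
    """Median of a finite list: middle of the sorted values,
    or the mean of the two middle values for an even count."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([3, 5, 9]))      # 5 (the example from the text)
print(median([3, 5, 9, 11]))  # 7.0 (mean of the two middle values 5 and 9)
```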
T-Test:
A t-test is any statistical
hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is supported. It can be used to determine if two sets of data are
significantly different from each other, and is most commonly applied when the
test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When
the scaling term is unknown and is replaced by an estimate based on the data, the test statistic (under certain conditions) follows
a Student's t distribution.
Two Sample t-Test
Purpose
To compare responses from two groups. These two
groups can come from different experimental treatments, or different natural
"populations".
Assumptions
- each group is considered to be a sample from a distinct population
- the responses in each group are independent of those in the other group
- the distributions of the variable of interest are normal
How It Works
1. The null hypothesis is that the two population means are equal to each other. To test the null hypothesis, you need to calculate the following values: x̄1, x̄2 (the means of the two samples), s1^2, s2^2 (the variances of the two samples), n1, n2 (the sample sizes of the two samples), and k (the degrees of freedom).
2. Compute the t-statistic.
3. Compare the calculated t-value, with k degrees of freedom, to the critical t-value from the t-distribution table at the chosen confidence level and decide whether to reject the null hypothesis.
* Reject the null hypothesis when: |calculated t-value| > critical t-value (for a two-tailed test)
4. Note: this procedure can be used even when the variances of the two populations are not equal and the sample sizes are not equal.
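Steps 1 and 2 above can be sketched for the equal-variance (pooled) form of the test; the sample values below are invented, and a library routine such as `scipy.stats.ttest_ind` would normally be used in practice.

```python
import math

def two_sample_t(sample1, sample2):
    """Pooled-variance two-sample t-statistic and its degrees of freedom.
    Assumes both populations have equal variance (the classic form)."""
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1                                  # sample means
    m2 = sum(sample2) / n2
    v1 = sum((x - m1) ** 2 for x in sample1) / (n1 - 1)     # sample variances
    v2 = sum((x - m2) ** 2 for x in sample2) / (n2 - 1)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)   # pooled variance
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    k = n1 + n2 - 2                                          # degrees of freedom
    return t, k

# Hypothetical measurements from two treatment groups.
t, k = two_sample_t([5.1, 4.9, 5.6, 5.2], [4.2, 4.0, 4.5, 4.3])
print(t, k)
```

The returned t is then compared against the critical value for k degrees of freedom, as in step 3.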
Correlation:
In the world of finance, correlation is a statistical measure of how two securities move in relation to each other. Correlations are used in advanced portfolio management.
Correlation is expressed as the correlation coefficient, which ranges between -1 and
+1. Perfect positive correlation (a correlation coefficient of +1) implies
that as one security moves, either up or down, the other security will move in
lockstep, in the same direction. Alternatively, perfect negative correlation
means that if one security moves in either direction the security that is
perfectly negatively correlated will move in the opposite direction. If the
correlation is 0, the movements of the securities are said to have no
correlation; they are completely random.
In real life, perfectly correlated securities are rare; rather, you will find securities with some degree of correlation.
Regression:
In statistics, regression analysis is a statistical process for
estimating the relationships among variables. It includes many techniques for
modeling and analyzing several variables, when the focus is on the relationship
between a dependent variable and one or more independent variables. More specifically,
regression analysis helps one understand how the typical value of the dependent
variable changes when any one of the independent variables is varied, while the
other independent variables are held fixed. Most commonly, regression analysis
estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed.
Regression
analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field
of machine learning. Regression analysis is also used to
understand which among the independent variables are related to the dependent
variable, and to explore the forms of these relationships. In restricted
circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. However, this can lead
to illusory or false relationships, so caution is advisable; for example, correlation
does not imply causation.
Regression Models:
Regression models involve the following
variables:
- The independent variables, X.
- The dependent variable, Y.
In various fields of application, different terminologies are used in
place of dependent and independent variables.
A regression model relates Y to a function of X and β.
The approximation is usually formalized as E(Y | X) = f(X, β). To carry out regression analysis, the form of the function f must be specified. Sometimes the form of this function is based on
knowledge about the relationship between Y and X that does not rely on the data. If no such knowledge is available,
a flexible or convenient form for f is chosen.
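A minimal sketch of fitting f for the simplest case, where f(X, β) = β0 + β1·X and β is estimated by ordinary least squares; the data points below are invented for illustration.

```python
def ols_fit(xs, ys):
    """Ordinary least squares for simple linear regression:
    E(Y | X) = f(X, beta) = b0 + b1 * X."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope: covariance of X and Y over variance of X.
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx  # intercept passes through the mean point
    return b0, b1

# Hypothetical (X, Y) observations with a roughly linear relationship.
b0, b1 = ols_fit([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(b0, b1)
```

The fitted b0 + b1·X is the estimated conditional expectation of Y given X described earlier.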
Cross Tab:
Cross tabulation (or crosstabs for short) is a statistical process
that summarises categorical data to
create a contingency table. They
are heavily used in survey research, business intelligence, engineering and
scientific research. They provide a basic picture of the interrelation between
two variables and can help find interactions between them.
Some entries may be weighted; unweighted tables are
commonly known as pivot tables.
In statistics, a "crosstab" is another name for a contingency table,
which is a type of table created by cross tabulation. In survey
research (e.g., polling, market research), a "crosstab" is any table
showing summary statistics.
Commonly, crosstabs in survey research are concatenations of multiple different
tables. For example, a single crosstab may combine multiple contingency tables
and tables of averages.
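Building a contingency table from categorical data can be sketched with a simple tally; the survey responses below are made up, and in practice a tool such as `pandas.crosstab` builds these tables directly.

```python
from collections import Counter

# Hypothetical survey responses: (group, answer) pairs.
responses = [
    ("M", "Yes"), ("F", "No"), ("M", "Yes"),
    ("F", "Yes"), ("M", "No"), ("F", "Yes"),
]

table = Counter(responses)  # count of each (row, column) combination
rows = sorted({r for r, _ in responses})
cols = sorted({c for _, c in responses})

# Print the contingency table: rows = group, columns = answer.
print("     " + "  ".join(cols))
for r in rows:
    print(r, [table[(r, c)] for c in cols])
```

Each cell counts how often one category pair occurs together, which is exactly the "interrelation between two variables" the table is meant to show.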
Pareena Neema
Group Members:
Pareena Neema (2013191)
Abhishek Panwala(2013190)
Raghav Kabra(2013217)
Poorva Saboo(2013200)