In today's lecture we learnt how to
work with retail store data. We learnt how to process the data, how to read the
variable view of the data, and how to examine the
relations between variables.
The classification of variables is as follows-
Nominal - It is the lowest level of data measurement. The
numbers are only labels and carry no quantitative meaning. They can only be used to classify or
categorize.
Ordinal - It is the second level of data measurement. The
numbers can be used to rank or order objects.
Next we learnt the concept of crosstabs-
Crosstabs - In statistics, a "crosstab" is another name
for a contingency table, which is a type of table
created by crosstabulation. In survey research (e.g.,
polling, market research), a "crosstab" is any table showing summary
statistics. Commonly, crosstabs in survey research are
concatenations of multiple different tables.
This is how we do a crosstab:
Analyse -> Descriptive Statistics -> Crosstabs
After examining the distribution of each of the variables, the
researcher’s next task is to look
for relationships among two or more of the variables. Some of the
tools that may be used
include correlation and regression, or derivatives such as
the t-test, analysis of variance, and
contingency table (crosstabulation) analysis. The type of analysis
chosen depends on the
research design, characteristics of the variables, shape of the
distributions, level of measurement,
and whether the assumptions required for a particular statistical
test are met.
A crosstabulation is a joint frequency
distribution of cases based on two or more categorical
variables. Displaying a distribution of cases by their values on
two or more variables is
known as contingency table analysis and is one of the more
commonly used analytic methods
in the social sciences.
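To make the idea of a joint frequency distribution concrete, here is a minimal sketch in plain Python. The records, the variable names (store type and satisfaction) and the `crosstab` helper are all made up for illustration; they are not the retail data or the tool used in the lecture.

```python
from collections import Counter

# Hypothetical survey records: (store_type, satisfied) pairs, invented for illustration.
records = [
    ("Mall", "Yes"), ("Mall", "No"), ("Mall", "Yes"),
    ("Street", "No"), ("Street", "No"), ("Street", "Yes"),
    ("Mall", "Yes"), ("Street", "No"),
]

def crosstab(pairs):
    """Count how many cases fall into each (row category, column category) cell."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur, so empty cells are handled.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = crosstab(records)
for label, row in zip(rows, table):
    print(label, dict(zip(cols, row)))
```

Each printed line is one row of the contingency table, with the count of cases for every column category.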
Assumptions: The assumptions for chi-square include:
1. Random sampling is not required, provided the sample is
not biased. However, the best
way to ensure the sample is not biased is random selection.
2. Independent observations. A critical assumption for chi-square
is independence of observations.
One person’s response should tell us nothing about another
person’s response.
Observations are independent if the sampling of one observation
does not affect the choice
of the second observation. (In contrast, consider an example in
which the observations are
not independent. A researcher wishes to estimate to what extent
students in a school engage
in cheating on tests and homework. The researcher randomly chooses
one student to interview.
At the completion of the interview the researcher asks the student
for the name of a
friend so that the friend can be interviewed, too).
3. Mutually exclusive row and column variable categories that
include all observations.
The chi-square test of association cannot be conducted when categories
overlap or do not
include all of the observations.
4. Large expected frequencies. The chi-square test is based on an
approximation that works
best when the expected frequencies are fairly large. No expected
frequency should be less
than 1 and no more than 20% of the expected frequencies should be
less than 5.
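The expected-frequency rule in assumption 4 can be checked before running the test. The sketch below is one way to do it in plain Python, assuming the usual formula (row total times column total, divided by the grand total) for each expected cell count. The function names and the counts are hypothetical.

```python
def expected_frequencies(table):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def meets_expected_frequency_rule(table):
    """True if no expected count is below 1 and at most 20% are below 5."""
    expected = [e for row in expected_frequencies(table) for e in row]
    if min(expected) < 1:
        return False
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    return share_below_5 <= 0.20

# Hypothetical observed counts, invented for illustration.
observed = [[30, 10], [20, 40]]
print(expected_frequencies(observed))
print(meets_expected_frequency_rule(observed))
```

A table with very small counts, such as `[[1, 3], [3, 1]]`, fails the check because every expected count is below 5.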
Hypotheses: The null hypothesis is that the k classifications
are independent (i.e., no relationship between
classifications). The alternative hypothesis is that the k classifications
are dependent (i.e.,
that a relationship or dependency exists).
Next we learned about the null hypothesis and the chi-square test.
The null hypothesis in a crosstab says that there is no relationship between the two
variables we are testing.
Significance value -
< 0.05: reject the null hypothesis
> 0.05: do not reject (accept) the null hypothesis
When it is more than 0.05 we do not reject the null hypothesis, which means the data show no evidence of a relation between the
two variables.
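The decision rule can be sketched for the simplest case, a 2x2 table. The code below computes the Pearson chi-square statistic without a continuity correction, so the value may differ from a corrected figure that statistics software also reports. With 1 degree of freedom the p-value can be taken from `math.erfc`, because a chi-square variable with 1 degree of freedom is the square of a standard normal. The counts and the function name are hypothetical.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table (df = 1)."""
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("this sketch only handles 2x2 tables")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed_count - expected) ** 2 / expected
    # For df = 1: P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical observed counts, invented for illustration.
observed = [[30, 10], [20, 40]]
stat, p = chi_square_2x2(observed)
if p < 0.05:
    print("Reject the null hypothesis: the variables appear related.")
else:
    print("Do not reject the null hypothesis: no evidence of a relation.")
```

Larger tables have more degrees of freedom, (rows - 1) x (columns - 1), and need the general chi-square distribution instead of this shortcut.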
Then we computed the correlations between the various satisfaction scores.
If the correlation value is high and positive, then when one variable is high the other
variable tends to be high as well.
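As a rough illustration of what a high correlation looks like, the sketch below computes the Pearson correlation coefficient by hand. The two score lists are invented, not the satisfaction data from the lecture, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical 1-5 satisfaction ratings from five respondents.
service_satisfaction = [1, 2, 3, 4, 5]
overall_satisfaction = [2, 2, 3, 5, 5]

r = pearson_r(service_satisfaction, overall_satisfaction)
print(round(r, 2))
```

A value close to 1 means respondents who rate one item high tend to rate the other high too. A value close to -1 means the opposite, and a value near 0 means no linear relation.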
When the cell counts get too low, stop the analysis.
For any deep analysis, take a large sample.
Submitted by-
Pranav Sharma (2013206)
Group Members-
Payal Singh
Nupur Mandhyan
Omkar
Radhika Agarwall