SBD – SESSION 9 & 10
Different responses from customers
1) Strongly negative
2) Somewhat negative
3) Neutral
4) Somewhat positive
5) Strongly positive
CROSS TABULATION
Cross-tabulation is one of the most useful analytical tools and is a mainstay of the market research industry. Cross-tabulation analysis, also known as contingency table analysis, is most often used to analyze categorical (nominal measurement scale) data. A cross-tabulation is a table of two (or more) dimensions that records the number (frequency) of respondents that have the specific characteristics described in the cells of the table. Cross-tabulation tables provide a wealth of information about the relationship between the variables. In simple terms, cross-tabulation is a presentation of data about categorical variables in tabular form to aid in identifying a relationship between the variables.
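As a rough illustration of the idea above, the following sketch builds a small cross-tabulation using only the Python standard library. The survey records, the `segment` label, and the helper name `crosstab` are all made up for the example; they are not taken from the session.

```python
from collections import Counter

# Hypothetical survey records: (customer segment, response) pairs.
responses = [
    ("new", "positive"), ("new", "negative"), ("new", "positive"),
    ("returning", "positive"), ("returning", "neutral"),
    ("returning", "positive"), ("new", "neutral"),
]

def crosstab(pairs):
    """Count how many cases fall in each (row, column) cell."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Cells with no cases get a count of 0 (Counter returns 0 for missing keys).
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table

rows, cols, table = crosstab(responses)

# Print the table with row labels on the left and column labels on top.
print("segment".ljust(12) + "".join(c.ljust(10) for c in cols))
for r in rows:
    print(r.ljust(12) + "".join(str(table[r][c]).ljust(10) for c in cols))
```

Each cell holds the number of respondents with that combination of segment and response, which is the joint frequency the notes describe.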
After examining the distribution of each of the variables, the researcher’s next task is to look for relationships among two or more of the variables. Some of the tools that may be used include correlation and regression, or derivatives such as the t-test, analysis of variance, and contingency table (crosstabulation) analysis. The type of analysis chosen depends on the research design, characteristics of the variables, shape of the distributions, level of measurement, and whether the assumptions required for a particular statistical test are met.
A crosstabulation is a joint frequency distribution of cases based on two or more categorical variables. Displaying a distribution of cases by their values on two or more variables is known as contingency table analysis and is one of the more commonly used analytic methods in the social sciences. The joint frequency distribution can be analyzed with the chi-square statistic (χ²) to determine whether the variables are statistically independent or associated. If a dependency between variables does exist, then other indicators of association, such as Cramér’s V, gamma, Somers’ d, and so forth, can be used to describe the degree to which the values of one variable predict or vary with those of the other variable. More advanced techniques such as log-linear models and multinomial regression can be used to clarify the relationships contained in contingency tables.
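To make the chi-square statistic and Cramér’s V concrete, here is a minimal sketch that computes both by hand for a two-way table. The observed counts are hypothetical, and the function names `chi_square` and `cramers_v` are chosen for this example only. It uses the standard Pearson formula, the sum over all cells of (observed − expected)² / expected, where each expected count is row total × column total / N.

```python
import math

# Hypothetical 2x3 table of observed counts (rows: group, columns: response).
observed = [
    [20, 30, 50],
    [30, 30, 40],
]

def chi_square(table):
    """Return the Pearson chi-square statistic, degrees of freedom, and N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, n

def cramers_v(table):
    """Cramér's V: chi-square rescaled to lie between 0 and 1."""
    stat, _, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(stat / (n * k))

stat, dof, n = chi_square(observed)
v = cramers_v(observed)
print(f"chi-square = {stat:.3f}, df = {dof}, Cramér's V = {v:.3f}")
```

A value of V near 0 suggests little association and a value near 1 a strong one. The chi-square statistic on its own only addresses whether an association exists, not how strong it is.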
Considerations: Type of variables. Are the variables of interest continuous or discrete (e.g., categorical)? Categorical variables contain integer values that indicate membership in one of several possible categories. The range of possible values for such variables is limited, and whenever the range of possible values is relatively circumscribed, the distribution is unlikely to approach the Gaussian distribution. Continuous variables, in contrast, have a much wider range, no limiting categories, and have the potential to approximate the Gaussian distribution, provided their range is not artificially truncated. Whenever you encounter a categorical or a nominal, discrete variable, be aware that the assumption of normality is likely violated.
Shape of the distribution. Categorical variables often have such a small number of possible values that one cannot even pretend that the assumption of normality is approximated. Consider, for example, the possible values for sex, grade levels, and so forth. Statistical tests that require the assumption of normality cannot be used to analyze such data. (Of course, a statistical program such as SPSS will process the numbers without complaint and yield results that may appear to be interpretable, but only to those who ignore the necessity of examining the distributions of each variable first, and who fail to check whether the assumptions were met.) Because the assumption of normality is a requirement for the t-test, analysis of variance, correlation and regression, these procedures cannot be used to analyze count data.
Assumptions: The assumptions for chi-square include:
1. Random sampling is not required, provided the sample is not biased. However, the best way to ensure the sample is not biased is random selection.
2. Independent observations. A critical assumption for chi-square is independence of observations. One person’s response should tell us nothing about another person’s response. Observations are independent if the sampling of one observation does not affect the choice of the second observation. (In contrast, consider an example in which the observations are not independent. A researcher wishes to estimate to what extent students in a school engage in cheating on tests and homework. The researcher randomly chooses one student to interview. At the completion of the interview the researcher asks the student for the name of a friend so that the friend can be interviewed, too.)
3. Mutually exclusive row and column variable categories that include all observations. The chi-square test of association cannot be conducted when categories overlap or do not include all of the observations.
4. Large expected frequencies. The chi-square test is based on an approximation that works best when the expected frequencies are fairly large. No expected frequency should be less than 1, and no more than 20% of the expected frequencies should be less than 5.
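The expected-frequency rule in assumption 4 can be checked mechanically before running the test. The sketch below is one way to do it, assuming a two-way table of observed counts; the helper names `expected_counts` and `check_expected` and both example tables are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def check_expected(table):
    """Apply the rule: no expected count below 1, at most 20% below 5."""
    expected = [e for row in expected_counts(table) for e in row]
    below_one = sum(e < 1 for e in expected)
    share_below_five = sum(e < 5 for e in expected) / len(expected)
    ok = below_one == 0 and share_below_five <= 0.20
    return ok, below_one, share_below_five

# Hypothetical tables: one sparse, one with plenty of cases per cell.
sparse = [[2, 8], [3, 37]]
dense = [[20, 30], [30, 20]]

for name, tbl in (("sparse", sparse), ("dense", dense)):
    ok, below_one, share = check_expected(tbl)
    print(f"{name}: ok={ok}, cells<1={below_one}, share<5={share:.0%}")
```

Note that the rule applies to the expected counts, not the observed ones. In the sparse table the expected counts are 1, 9, 4 and 36, so half of the cells fall below 5 and the rule is not satisfied.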
Hypotheses: The null hypothesis is that the k classifications are independent (i.e., no relationship between classifications). The alternative hypothesis is that the k classifications are dependent (i.e., that a relationship or dependency exists).
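As a hedged sketch of how these hypotheses turn into a decision, the example below tests a hypothetical 2x2 table. It relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so the p-value can be written with `math.erfc`. This shortcut holds only for 2x2 tables. The significance level of 0.05 is an assumption made for the example, and no continuity correction is applied.

```python
import math

# Hypothetical 2x2 table of observed counts.
observed = [
    [30, 20],
    [20, 30],
]

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

alpha = 0.05  # assumed significance level, not specified in the notes
stat = chi_square_2x2(observed)
p = p_value_df1(stat)
reject_null = p < alpha

print(f"chi-square = {stat:.3f}, p = {p:.4f}")
print("Reject independence" if reject_null else "Fail to reject independence")
```

For larger tables the degrees of freedom are (rows − 1) × (columns − 1), and the p-value would need a general chi-square distribution function from a statistics package rather than this one-line formula.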
Submitted by: Nihal Moidu (2013170)
Group Members:
Nikita Agarwal
Nimisha Agarwal
Parth Mehta
Priyesh Bhadauriya