Sunday, 21 July 2013

SBD – SESSION 9 & 10




Different responses from customers
        1) Strongly negative
        2) Somewhat negative
        3) Neutral
        4) Somewhat positive
        5) Strongly positive

CROSS TABULATION

Cross-tabulation is one of the most useful analytical tools and is a mainstay of the market research industry. Cross-tabulation analysis, also known as contingency table analysis, is most often used to analyze categorical (nominal measurement scale) data. A cross-tabulation is a table of two (or more) dimensions that records the number (frequency) of respondents that have the specific characteristics described in the cells of the table. Cross-tabulation tables provide a wealth of information about the relationship between the variables. In simple terms, cross-tabulation is a presentation of data about categorical variables in tabular form to aid in identifying a relationship between the variables.
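
As a rough illustration of the definition above, the sketch below counts how many respondents fall into each combination of two categorical variables. The survey records, the variable names and the `crosstab` helper are all hypothetical, invented only for this example.

```python
from collections import Counter

# Hypothetical survey records as (gender, response) pairs, invented for illustration.
responses = [
    ("Female", "Positive"), ("Female", "Negative"), ("Female", "Positive"),
    ("Male", "Negative"), ("Male", "Positive"), ("Male", "Negative"),
    ("Female", "Positive"), ("Male", "Negative"),
]

def crosstab(pairs):
    """Return (row_labels, col_labels, table); table[i][j] is a cell frequency."""
    counts = Counter(pairs)                  # joint frequency of each (row, col) pair
    rows = sorted({r for r, _ in pairs})     # distinct categories of the row variable
    cols = sorted({c for _, c in pairs})     # distinct categories of the column variable
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = crosstab(responses)

# Print the table with row and column labels.
print(" " * 8 + "".join(f"{c:>10}" for c in cols))
for label, row in zip(rows, table):
    print(f"{label:<8}" + "".join(f"{n:>10}" for n in row))
```

Each cell holds the number of respondents sharing that row and column category, which is the joint frequency the notes describe.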

After examining the distribution of each of the variables, the researcher’s next task is to look
for relationships among two or more of the variables. Some of the tools that may be used
include correlation and regression, or derivatives such as the t-test, analysis of variance, and
contingency table (crosstabulation) analysis. The type of analysis chosen depends on the
research design, characteristics of the variables, shape of the distributions, level of measurement,
and whether the assumptions required for a particular statistical test are met.
A crosstabulation is a joint frequency distribution of cases based on two or more categorical
variables. Displaying a distribution of cases by their values on two or more variables is
known as contingency table analysis and is one of the more commonly used analytic methods
in the social sciences. The joint frequency distribution can be analyzed with the chi-square
statistic (χ²) to determine whether the variables are statistically independent or whether
they are associated. If a dependency between variables does exist, then other indicators of
association, such as Cramér’s V, gamma, Somers’ d, and so forth, can be used to describe
the degree to which the values of one variable predict or vary with those of the other variable.
More advanced techniques such as log-linear models and multinomial regression can be
used to clarify the relationships contained in contingency tables.
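
A minimal sketch of how the chi-square statistic and Cramér's V could be computed from a table of observed counts follows. It uses the standard Pearson formula (sum over cells of (observed − expected)² / expected, with expected = row total × column total / N). The counts in `observed` are hypothetical, and the function names are chosen only for this example.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence: row_total * col_total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Pearson chi-square statistic; assumes every expected count is non-zero."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (N * (k - 1))), with k = min(rows, columns)."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (k - 1)))

# Hypothetical 2 x 2 table of observed counts (rows: group, columns: response).
observed = [[30, 10],
            [20, 40]]

print("chi-square:", round(chi_square(observed), 3))
print("Cramér's V:", round(cramers_v(observed), 3))
```

Cramér's V ranges from 0 (no association) to 1 (perfect association), so it describes the strength of a relationship that the chi-square statistic only detects.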
Considerations: Type of variables. Are the variables of interest continuous or discrete (e.g., categorical)?
Categorical variables contain integer values that indicate membership in one of several possible
categories. The range of possible values for such variables is limited, and whenever the
range of possible values is relatively circumscribed, the distribution is unlikely to approach
that of the Gaussian distribution. Continuous variables, in contrast, have a much wider
range, no limiting categories, and have the potential to approximate the Gaussian distribution,
provided their range is not artificially truncated. Whenever you encounter a categorical
or a nominal, discrete variable, be aware that the assumption of normality is likely violated.
Shape of the distribution. Categorical variables often have such a small number of possible
values that one cannot even pretend that the assumption of normality is approximated.
Consider for example, the possible values for sex, grade levels, and so forth. Statistical tests
that require the assumption of normality cannot be used to analyze such data. (Of course, a
statistical program such as SPSS will process the numbers without complaint and yield
results that may appear to be interpretable — but only to those who ignore the necessity of
examining the distributions of each variable first, and who fail to check whether the
assumptions were met). Because the assumption of normality is a requirement for the t-test,
analysis of variance, correlation and regression, these procedures cannot be used to analyze
count data.

Assumptions: The assumptions for chi-square include:
1. Random sampling is not required, provided the sample is not biased. However, the best
way to ensure the sample is not biased is random selection.
2. Independent observations. A critical assumption for chi-square is independence of observations.
One person’s response should tell us nothing about another person’s response.
Observations are independent if the sampling of one observation does not affect the choice
of the second observation. (In contrast, consider an example in which the observations are
not independent. A researcher wishes to estimate to what extent students in a school engage
in cheating on tests and homework. The researcher randomly chooses one student to interview.
At the completion of the interview the researcher asks the student for the name of a
friend so that the friend can be interviewed, too).
3. Mutually exclusive row and column variable categories that include all observations.
The chi-square test of association cannot be conducted when categories overlap or do not
include all of the observations.
4. Large expected frequencies. The chi-square test is based on an approximation that works
best when the expected frequencies are fairly large. No expected frequency should be less
than 1 and no more than 20% of the expected frequencies should be less than 5.
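
The rule of thumb in assumption 4 can be checked before running the test. The sketch below is one possible way to do it; the function names and both example tables are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row_total * col_total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def expected_counts_ok(table):
    """True if no expected count is below 1 and at most 20% are below 5."""
    cells = [e for row in expected_counts(table) for e in row]
    share_below_5 = sum(1 for e in cells if e < 5) / len(cells)
    return min(cells) >= 1 and share_below_5 <= 0.20

# Hypothetical tables: the first is comfortably large, the second is too sparse.
large_table = [[30, 10], [20, 40]]
sparse_table = [[1, 2], [3, 40]]

print(expected_counts_ok(large_table))
print(expected_counts_ok(sparse_table))
```

When the check fails, common remedies are to merge sparse categories or to collect more data before relying on the chi-square approximation.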
Hypotheses: The null hypothesis is that the k classifications are independent (i.e., no relationship between
classifications). The alternative hypothesis is that the k classifications are dependent (i.e.,
that a relationship or dependency exists).
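
To show how these hypotheses lead to a decision, here is a small sketch for a 2 x 2 table. It assumes a 5% significance level and hypothetical counts. The p-value shortcut holds only for one degree of freedom, where a chi-square variable is the square of a standard normal variable; larger tables need the general chi-square distribution from a statistics package.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic; assumes every expected count is non-zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(table))
        for j in range(len(table[0]))
    )

def degrees_of_freedom(table):
    """Degrees of freedom for a two-way table: (rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

def p_value_df1(chi2):
    """Upper-tail p-value, valid ONLY for 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical 2 x 2 table of observed counts.
observed = [[30, 10],
            [20, 40]]
alpha = 0.05  # assumed significance level

chi2 = chi_square(observed)
df = degrees_of_freedom(observed)
p = p_value_df1(chi2)
reject_null = p < alpha

print(f"chi-square = {chi2:.3f}, df = {df}, p = {p:.5f}")
print("Reject independence" if reject_null else "Do not reject independence")
```

Rejecting the null hypothesis supports the alternative that the classifications are dependent. It says nothing about how strong the dependency is, which is what measures such as Cramér's V are for.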

Submitted by: Nihal Moidu (2013170)
Group Members:
Nikita Agarwal
Nimisha Agarwal
Parth Mehta
Priyesh Bhadauriya

