Applied Business Statistics

SBD – SESSION 9 & 10

Different responses from customers

1) Strongly negative
2) Somewhat negative
3) Neutral
4) Somewhat positive
5) Strongly positive

CROSS TABULATION

Cross-tabulation is one of the most useful analytical tools and is a mainstay of the market research industry. Cross-tabulation analysis, also known as contingency table analysis, is most often used to analyze categorical (nominal measurement scale) data. A cross-tabulation is a table of two (or more) dimensions that records the number (frequency) of respondents that have the specific characteristics described in the cells of the table. Cross-tabulation tables provide a wealth of information about the relationship between the variables. In simple terms, cross-tabulation is a presentation of data about categorical variables in tabular form to aid in identifying a relationship between the variables.
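As a minimal sketch of how such a table is built, the snippet below counts hypothetical survey records into a two-way table. The customer segments ("New", "Returning"), the records themselves, and the helper name `cross_tabulate` are illustrative assumptions, not data from the session; only the response labels come from the list above. In practice a library routine such as `pandas.crosstab` would normally be used instead.

```python
from collections import Counter

# Hypothetical survey records: (customer segment, response).
# Illustrative data only; the segments are an assumption for this example.
records = [
    ("New", "Strongly positive"),
    ("New", "Neutral"),
    ("New", "Somewhat positive"),
    ("Returning", "Strongly positive"),
    ("Returning", "Strongly positive"),
    ("Returning", "Somewhat negative"),
    ("New", "Neutral"),
    ("Returning", "Somewhat positive"),
]


def cross_tabulate(pairs):
    """Return (row_labels, col_labels, table); table[i][j] is a joint count."""
    counts = Counter(pairs)
    row_labels = sorted({r for r, _ in pairs})
    col_labels = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


rows, cols, table = cross_tabulate(records)
print(cols)
for label, row in zip(rows, table):
    print(label, row)
```

Each cell holds the number of respondents who fall into that row category and that column category at the same time, which is the joint frequency the text describes.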

After examining the distribution of each of the variables, the researcher’s next task is to look for relationships among two or more of the variables. Some of the tools that may be used include correlation and regression, or derivatives such as the t-test, analysis of variance, and contingency table (cross-tabulation) analysis. The type of analysis chosen depends on the research design, characteristics of the variables, shape of the distributions, level of measurement, and whether the assumptions required for a particular statistical test are met.

A cross-tabulation is a joint frequency distribution of cases based on two or more categorical variables. Displaying a distribution of cases by their values on two or more variables is known as contingency table analysis and is one of the more commonly used analytic methods in the social sciences. The joint frequency distribution can be analyzed with the chi-square statistic (χ²) to determine whether the variables are statistically independent or whether they are associated. If a dependency between variables does exist, then other indicators of association, such as Cramér’s V, gamma, Somers’ d, and so forth, can be used to describe the degree to which the values of one variable predict or vary with those of the other variable. More advanced techniques such as log-linear models and multinomial regression can be used to clarify the relationships contained in contingency tables.
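For reference, the standard formulas behind the chi-square statistic and Cramér’s V can be written as follows. The notation is an assumption introduced here, not taken from the session notes: O_ij is the observed count in row i and column j, R_i and C_j are the row and column totals, n is the total number of cases, and r and c are the numbers of rows and columns.

```latex
E_{ij} = \frac{R_i \, C_j}{n}
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
\qquad
df = (r - 1)(c - 1)
\qquad
V = \sqrt{\frac{\chi^2}{n \, \min(r - 1,\; c - 1)}}
```

E_ij is the count expected in a cell if the two variables were independent, so χ² grows as the observed table departs from independence. V rescales χ² to lie between 0 (no association) and 1 (perfect association).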

Considerations: Type of variables. Are the variables of interest continuous or discrete (e.g., categorical)? Categorical variables contain integer values that indicate membership in one of several possible categories. The range of possible values for such variables is limited, and whenever the range of possible values is relatively circumscribed, the distribution is unlikely to approach the Gaussian distribution. Continuous variables, in contrast, have a much wider range, no limiting categories, and the potential to approximate the Gaussian distribution, provided their range is not artificially truncated. Whenever you encounter a categorical (nominal, discrete) variable, be aware that the assumption of normality is likely violated.

Shape of the distribution. Categorical variables often have such a small number of possible values that one cannot even pretend that the assumption of normality is approximated. Consider, for example, the possible values for sex, grade level, and so forth. Statistical tests that require the assumption of normality cannot be used to analyze such data. (Of course, a statistical program such as SPSS will process the numbers without complaint and yield results that may appear to be interpretable, but only to those who ignore the necessity of examining the distribution of each variable first and who fail to check whether the assumptions were met.) Because the assumption of normality is a requirement for the t-test, analysis of variance, correlation and regression, these procedures cannot be used to analyze count data.

Assumptions: The assumptions for chi-square include:

1. Random sampling is not required, provided the sample is not biased. However, the best way to ensure the sample is not biased is random selection.

2. Independent observations. A critical assumption for chi-square is independence of observations. One person’s response should tell us nothing about another person’s response. Observations are independent if the sampling of one observation does not affect the choice of the second observation. (In contrast, consider an example in which the observations are not independent. A researcher wishes to estimate to what extent students in a school engage in cheating on tests and homework. The researcher randomly chooses one student to interview. At the completion of the interview the researcher asks the student for the name of a friend so that the friend can be interviewed, too.)

3. Mutually exclusive row and column variable categories that include all observations. The chi-square test of association cannot be conducted when categories overlap or do not include all of the observations.

4. Large expected frequencies. The chi-square test is based on an approximation that works best when the expected frequencies are fairly large. No expected frequency should be less than 1, and no more than 20% of the expected frequencies should be less than 5.
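The expected-frequency rule in assumption 4 can be checked mechanically before running the test. The sketch below is one possible way to do it; the function names and both tables of counts are hypothetical and only serve to illustrate the rule as stated above.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_expected_frequency_rule(table):
    """True if no expected count is below 1 and at most 20% are below 5."""
    cells = [e for row in expected_frequencies(table) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_below_5 <= 0.20


# Hypothetical observed counts (rows and columns are two categorical variables).
large_sample = [[30, 20], [20, 30]]   # every expected count is 25
small_sample = [[3, 1], [2, 4]]       # expected counts are 2, 2, 3, 3

print(meets_expected_frequency_rule(large_sample))  # True
print(meets_expected_frequency_rule(small_sample))  # False
```

When the check fails, common remedies are to collect more data or to merge sparse categories where that makes substantive sense.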

Hypotheses: The null hypothesis is that the k classifications are independent (i.e., there is no relationship between classifications). The alternative hypothesis is that the k classifications are dependent (i.e., that a relationship or dependency exists).
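To make the decision between the two hypotheses concrete, here is a minimal sketch of a Pearson chi-square test on a hypothetical 2 x 2 table. The counts, the 0.05 significance level, and the function name are assumptions chosen for illustration. No continuity correction is applied, and the p-value shortcut used here is valid only for one degree of freedom. A library routine such as `scipy.stats.chi2_contingency` handles the general case.

```python
import math


def chi_square_statistic(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical observed counts for two binary classifications.
observed = [[30, 20], [20, 30]]
alpha = 0.05  # assumed significance level

stat, dof = chi_square_statistic(observed)

# For exactly one degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
if dof != 1:
    raise ValueError("this p-value shortcut only applies when df = 1")
p_value = math.erfc(math.sqrt(stat / 2))

reject_null = p_value < alpha
print(f"chi-square = {stat:.2f}, df = {dof}, p = {p_value:.4f}")
print("reject independence" if reject_null else "fail to reject independence")
```

With these counts the statistic is 4.00 on one degree of freedom and the p-value is about 0.0455, so at the assumed 0.05 level the null hypothesis of independence would be rejected.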

Submitted by: Nihal Moidu (2013170)

Group Members:

Nikita Agarwal

Nimisha Agarwal

Parth Mehta

Priyesh Bhadauriya


Sunday, 21 July 2013
