Friday 30 August 2013

STATISTICS SESSION - 17 & 18    [ 29th & 30th AUGUST ]


Today we began discussing the project assigned to us by reviewing some statistics concepts: mean, median, mode, t-test, correlation, regression and cross-tabulation. A brief summary of each follows.


MEAN, MEDIAN, MODE- Mean, median, and mode are three kinds of "averages". There are many "averages" in statistics, but these are, I think, the three most common, and are certainly the three you are most likely to encounter in your pre-statistics courses, if the topic comes up at all. The "mean" is the "average" you're used to, where you add up all the numbers and then divide by the number of numbers. The "median" is the "middle" value in the list of numbers. To find the median, your numbers have to be listed in numerical order, so you may have to rewrite your list first. The "mode" is the value that occurs most often. If no number is repeated, then there is no mode for the list. The "range" is just the difference between the largest and smallest values.
  • Find the mean, median, mode, and range for the following list of values:
    13, 18, 13, 14, 13, 16, 14, 21, 13
    The mean is the usual average, so:
    (13 + 18 + 13 + 14 + 13 + 16 + 14 + 21 + 13) ÷ 9 = 15
    Note that the mean isn't a value from the original list. This is a common result. You should not assume that your mean will be one of your original numbers.
    The median is the middle value, so I'll have to rewrite the list in order:
    13, 13, 13, 13, 14, 14, 16, 18, 21
    There are nine numbers in the list, so the middle one will be the (9 + 1) ÷ 2 = 10 ÷ 2 = 5th number:
    13, 13, 13, 13, 14, 14, 16, 18, 21
    So the median is 14.
    The mode is the number that is repeated more often than any other, so 13 is the mode.
    The largest value in the list is 21, and the smallest is 13, so the range is 21 – 13 = 8.
    mean: 15
    median: 14
    mode: 13
    range: 8
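The same worked example can be checked with a few lines of Python. This is only a minimal sketch using the standard library's statistics module; the variable names are my own.

```python
from statistics import mean, median, multimode

values = [13, 18, 13, 14, 13, 16, 14, 21, 13]

avg = mean(values)                    # add up all the numbers, divide by how many there are
mid = median(values)                  # middle value once the list is sorted
modes = multimode(values)             # list of the value(s) that occur most often
spread = max(values) - min(values)    # range: largest minus smallest

print("mean:", avg)
print("median:", mid)
print("mode:", modes)
print("range:", spread)
```

Note that multimode returns every value when nothing is repeated, so the "no mode" case described above has to be recognised separately (for example, when the returned list is as long as the set of distinct values).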

    T TEST- We use this test to compare the means of two samples (or treatments), even if they have different numbers of replicates. In simple terms, the t-test compares the actual difference between the two means with the variation in the data (expressed as the standard error of the difference between the means, that is, the standard deviation of that difference).
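As a rough illustration of that ratio, here is a sketch of the classic two-sample t statistic with a pooled variance, assuming both samples have similar spread. The function name and the two small data sets are hypothetical, made up for this example, and the sketch stops at the t value and degrees of freedom without looking up a p-value.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for two independent samples.

    Assumes equal population variances (pooled-variance t-test).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / dof
    # Standard error of the difference between the two means
    se_diff = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / se_diff, dof

# Hypothetical measurements; the samples have different numbers of replicates
treatment_a = [5.1, 4.9, 5.6, 5.8, 6.0]
treatment_b = [4.2, 4.5, 4.1, 4.8]

t_value, dof = pooled_t_statistic(treatment_a, treatment_b)
print("t =", round(t_value, 3), "with", dof, "degrees of freedom")
```

A large absolute t value means the difference between the means is large compared with the variation in the data.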

    REGRESSION -  Regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model.

Before attempting to fit a linear model to observed data, a modeler should first determine whether or not there is a relationship between the variables of interest. This does not necessarily imply that one variable causes the other (for example, higher SAT scores do not cause higher college grades), but that there is some significant association between the two variables. A scatterplot can be a helpful tool in determining the strength of the relationship between two variables. If there appears to be no association between the proposed explanatory and dependent variables (i.e., the scatterplot does not indicate any increasing or decreasing trends), then fitting a linear regression model to the data probably will not provide a useful model. A valuable numerical measure of association between two variables is the correlation coefficient, which is a value between -1 and 1 indicating the strength of the association of the observed data for the two variables.
A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0).
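To make the line Y = a + bX and the correlation coefficient concrete, here is a small sketch that fits a line by ordinary least squares, which is the usual fitting method and an assumption on my part since the lecture did not name one. The function name and the height/weight figures are hypothetical.

```python
from math import sqrt
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of Y = a + bX; returns (a, b, r)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx               # slope
    a = y_bar - b * x_bar       # intercept: value of Y when X = 0
    r = sxy / sqrt(sxx * syy)   # correlation coefficient, between -1 and 1
    return a, b, r

# Hypothetical heights (cm) and weights (kg)
heights = [150, 160, 165, 170, 180, 185]
weights = [52, 58, 63, 66, 75, 80]

a, b, r = fit_line(heights, weights)
print("weight =", round(a, 2), "+", round(b, 3), "* height; r =", round(r, 3))
```

An r close to 1 or -1 suggests the straight line describes the data well; an r near 0 suggests it will not be a useful model.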

    CROSS TAB - 
    Cross-tabulation is one of the most useful analytical tools and is a mainstay of the market research industry. One estimate is that single-variable frequency analysis and cross-tabulation analysis account for more than 90% of all research analyses.
    Cross-tabulation analysis, also known as contingency table analysis, is most often used to analyze categorical (nominal measurement scale) data. A cross-tabulation is a two (or more) dimensional table that records the number (frequency) of respondents that have the specific characteristics described in the cells of the table. Cross-tabulation tables provide a wealth of information about the relationship between the variables.
    Cross-tabulation analysis has its own unique language, using terms such as “banners”, “stubs”, “Chi-Square Statistic” and “Expected Values.”
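A tiny example may help with these terms. The sketch below builds a two-way table of counts from hypothetical survey responses, then computes the expected value of each cell (row total × column total ÷ grand total) and the Chi-Square statistic. The data and variable names are invented for illustration.

```python
from collections import Counter

# Hypothetical responses: (gender, preferred brand)
responses = [
    ("male", "A"), ("male", "A"), ("male", "A"), ("male", "B"),
    ("female", "A"), ("female", "B"), ("female", "B"), ("female", "B"),
]

counts = Counter(responses)                  # observed frequency of each cell
rows = sorted({g for g, _ in responses})     # row labels (the "stub")
cols = sorted({b for _, b in responses})     # column headings (the "banner")
n = len(responses)

row_totals = {r: sum(counts[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(counts[(r, c)] for r in rows) for c in cols}

# Expected count in each cell if the two variables were unrelated
expected = {(r, c): row_totals[r] * col_totals[c] / n
            for r in rows for c in cols}

# Chi-Square statistic: sum of (observed - expected)^2 / expected over all cells
chi_square = sum((counts[(r, c)] - expected[(r, c)]) ** 2 / expected[(r, c)]
                 for r in rows for c in cols)

for r in rows:
    print(r, [counts[(r, c)] for c in cols])
print("chi-square =", chi_square)
```

The larger the Chi-Square statistic, the further the observed counts are from what we would expect if the two variables were unrelated.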
This was a quick review of what we studied in today's lecture.


WRITTEN BY- Nishant Aggarwal


GROUP MEMBERS- Neeraj Ramadoss, Nitin Kumar Shukla, Prakar Swami, Prerana Arora, Praveen Iyer, Nishant Aggarwal

