Saturday, 31 August 2013

Use of Statistical Tools for Project on Rural Urban Distribution of Sex Ratio

The 17th and 18th sessions began with us incorporating some very important statistical tools into our projects, such as measures of central tendency (mean, median and mode), the t-test, regression, correlation and cross-tabulation (Xtab).


Measures of Central Tendency

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. As such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. The mean, median and mode are all valid measures of central tendency, but under different conditions, some measures of central tendency become more appropriate to use than others.

·        Mean - The mean (or average) is the most popular and well-known measure of central tendency. It can be used with both discrete and continuous data, although it is most often used with continuous data. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set.

·        Median - The median is the middle score for a set of data that has been arranged in order of magnitude. To calculate it, we first rearrange the data in order of magnitude (smallest first); the median is then the middle score. If there is an even number of scores, the median is the average of the two middle scores. Because it depends only on the middle of the ordered data, the median is less affected by outliers and skewed data than the mean.

·        Mode - The mode is the most frequent score in our data set. On a bar chart or histogram, it corresponds to the highest bar. For this reason, the mode is sometimes thought of as the most popular option.
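A minimal Python sketch of the three definitions above, using only the standard library. The function names and the sample scores are hypothetical, chosen for illustration. The median averages the two middle values when the count is even, and `modes` returns a list because a data set can have more than one mode. For real work, Python's built-in `statistics` module already provides `mean`, `median` and `multimode`.

```python
from collections import Counter


def mean(values):
    """Sum of all the values divided by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data; for an even count, the average of the two middle values."""
    ordered = sorted(values)  # arrange in order of magnitude, smallest first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """All values that occur most often (a data set can have more than one mode)."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


# Hypothetical scores, for illustration only.
scores = [2, 3, 3, 5, 7, 10]
print(mean(scores))    # 5.0
print(median(scores))  # 4.0
print(modes(scores))   # [3]
```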

                       


Summary of when to use the mean, median and mode

Type of variable                 Best measure of central tendency
Nominal                          Mode
Ordinal                          Median
Interval/Ratio (not skewed)      Mean
Interval/Ratio (skewed)          Median
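The summary table can be turned into a small lookup, sketched below in Python under a few assumptions: the key strings, the `central_value` helper and the sample values (including the "rural"/"urban" labels) are hypothetical, and deciding whether data are skewed is left to the user. The example shows why the table prefers the median for skewed interval/ratio data: a single large outlier drags the mean far above most of the values, while the median stays near the typical ones.

```python
import statistics

# The summary table above as a lookup (the key strings are hypothetical labels).
BEST_MEASURE = {
    "nominal": "mode",
    "ordinal": "median",
    "interval/ratio (not skewed)": "mean",
    "interval/ratio (skewed)": "median",
}


def central_value(values, variable_type):
    """Return the measure of central tendency the table recommends for this variable type."""
    measure = BEST_MEASURE[variable_type]
    if measure == "mode":
        return statistics.mode(values)
    if measure == "median":
        return statistics.median(values)
    return statistics.fmean(values)


# Hypothetical ratio-scale values with one large outlier, so the data are skewed.
values = [10, 12, 13, 14, 15, 200]
print(statistics.fmean(values))                               # 44.0, dragged upward by the outlier
print(central_value(values, "interval/ratio (skewed)"))       # 13.5, close to the typical values
print(central_value(["rural", "urban", "rural"], "nominal"))  # rural
```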



T-Test

A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution under the null hypothesis (that is, if the null hypothesis is true). It can be used to determine whether two sets of data are significantly different from each other, and it is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic (under certain conditions) follows a Student's t distribution.
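For the one-sample case, the idea in the paragraph above can be written out as follows, in standard textbook notation (the symbols are introduced here and are not from the original session notes). If the population standard deviation σ were known, the statistic z would be standard normal under the null hypothesis H0: μ = μ0. Replacing the unknown scaling term σ/√n with its estimate s/√n gives t, which follows Student's t distribution with n − 1 degrees of freedom when the data come from a normal population.

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}, \qquad
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```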




Among the most frequently used t-tests are:
  • A one-sample location test of whether the mean of a population has a value specified in a null hypothesis.
  • A two-sample location test of the null hypothesis that the means of two populations are equal.
  • A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero.
  • A test of whether the slope of a regression line differs significantly from 0.
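To make the four cases above concrete, here is a minimal standard-library Python sketch that computes each test statistic by hand. The function names and example values are hypothetical. The two-sample version is the classic Student's (pooled-variance) test, which assumes equal population variances. Turning a t value into a p-value still needs the t distribution (for example via SciPy's `scipy.stats.ttest_1samp`, `ttest_ind` and `ttest_rel`), which this sketch does not do.

```python
import math
import statistics


def one_sample_t(xs, mu0):
    """t statistic for H0: the population mean equals mu0 (df = n - 1)."""
    n = len(xs)
    return (statistics.fmean(xs) - mu0) / (statistics.stdev(xs) / math.sqrt(n))


def two_sample_t(xs, ys):
    """Student's t statistic for H0: the two population means are equal.

    Assumes equal population variances and pools them (df = n1 + n2 - 2).
    """
    n1, n2 = len(xs), len(ys)
    pooled_var = ((n1 - 1) * statistics.variance(xs)
                  + (n2 - 1) * statistics.variance(ys)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.fmean(xs) - statistics.fmean(ys)) / se


def paired_t(before, after):
    """Paired t statistic: a one-sample test that the mean difference is zero."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0.0)


def slope_t(xs, ys):
    """t statistic for H0: the least-squares regression slope is zero (df = n - 2).

    Needs n > 2 and points that do not all lie exactly on one line.
    """
    n = len(xs)
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return slope / se_slope


# Hypothetical example values, for illustration only.
print(one_sample_t([1, 2, 3, 4, 5], mu0=2))  # about 1.414
print(two_sample_t([1, 2, 3], [4, 5, 6]))    # about -3.674
print(paired_t([1, 2, 3, 4], [2, 4, 5, 7]))  # about 4.899
print(slope_t([1, 2, 3, 4], [2, 4, 5, 4]))   # about 1.460
```

The paired test is written as a one-sample test on the differences, which mirrors the third bullet's definition directly.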

We learned many new things in those three hours, which further enriched our knowledge of statistics. Applying these formulas in the project gave us an idea of how these tools are used in a corporate setting and how to analyse and interpret data and the various graphs. Overall, it was a very informative and interesting session.





Submitted By:- Priyanka Doshi - 2013212

Group members:-
Nilay Kohaley – 2013172
Pawan Agarwal  – 2013195
Poulami Sarkar  – 2013201
Pragya Singh – 2013203
