Wednesday, 3 July 2013

DAY 2 - Whisker & 3 M's

Part 1: Analysis of given data.

Data on registered two-wheelers and city populations was given for the analysis. It covered the years 2009, 2010 and 2011, along with city and state names. The analysis included:
  1. Finding the mean, median and mode, defined as follows:
     • Mean: the sum of a collection of numbers divided by how many numbers are in the collection.
     • Median: the median of a finite list of numbers is found by arranging all the observations from lowest to highest value and picking the middle one (e.g., the median of {3, 5, 9} is 5). If there is an even number of observations, there is no single middle value; the median is then usually defined as the mean of the two middle values.
     • Mode: the value that appears most often in a set of data.
  2. Frequency analysis: we classified all states and cities into five regions: North, South, East, West and Central.
  3. Box plot: a convenient way of graphically depicting groups of numerical data through their quartiles. Box plots may also have lines (whiskers) extending from the box to indicate variability outside the upper and lower quartiles, hence the names box-and-whisker plot and box-and-whisker diagram. Outliers are plotted as individual points.
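A minimal Python sketch of steps 1 and 2, using only the standard library. The city names, counts and region assignments below are hypothetical placeholders, not the data set from the session.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical numbers for illustration only (NOT the data set from the session):
# registered two-wheelers per city, in thousands, and each city's region.
cities = {
    # city: (two_wheelers_thousands, region)
    "City A": (120, "North"),
    "City B": (45, "South"),
    "City C": (45, "South"),
    "City D": (300, "West"),
    "City E": (60, "East"),
    "City F": (45, "Central"),
}

counts = [n for n, _ in cities.values()]

# 1. Mean, median and mode of the vehicle counts.
avg = mean(counts)          # sum of values / number of values
mid = median(counts)        # n is even here, so this is the mean of the two middle values
modes = multimode(counts)   # most frequent value(s); needs Python 3.8+

# 2. Frequency analysis: how many cities fall in each region.
region_freq = Counter(region for _, region in cities.values())

print(f"mean={avg}, median={mid}, mode={modes}")  # mean=102.5, median=52.5, mode=[45]
print(region_freq)
```

multimode returns a list so that ties (several equally common values) are not silently dropped.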
 
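We first tried to find the cities with the highest vehicle population using the conventional measures (mean, median and mode), and then using a box plot. The box plot showed that relying on the conventional measures was misleading, because the data was not normally distributed: in skewed data, a few extreme values pull the mean away from the typical value. So the right strategy has to be chosen before interpreting data.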
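One way to see why the conventional measures can mislead is to compare the mean and the median on skewed data. The numbers below are invented for illustration (assumption: one metro far larger than every other city); they are not our results.

```python
from statistics import mean, median

# Hypothetical, right-skewed data: vehicles (in thousands) in nine cities,
# where a single metro dwarfs the rest. Not the session's actual data.
vehicles = [12, 15, 18, 20, 22, 25, 28, 30, 900]

avg = mean(vehicles)    # pulled upward by the single extreme value
mid = median(vehicles)  # the 5th of 9 sorted values, unaffected by how big 900 is

# Cities "above average" by the mean: only the metro qualifies,
# because the one extreme value drags the mean above every typical city.
above_mean = [v for v in vehicles if v > avg]

print(f"mean = {avg:.1f}, median = {mid}")  # mean = 118.9, median = 22
print("above mean:", above_mean)            # above mean: [900]
```

The mean sits above eight of the nine cities, so it describes none of them well; the median stays with the bulk of the data.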

Part 2: Explanation of Boxplot.

  • A boxplot is a way of summarizing a set of data measured on an interval scale. It is used in exploratory data analysis to show the shape of a distribution, its central value and its variability.
  • It is built from three points: the middle point of the whole data set (the median) and the middle points of the lower and upper halves (what I call the "sub-medians"). These three points are called quartiles, and together they divide the data set into four quarters.
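The "sub-median" description above can be sketched in Python as follows. This assumes one particular convention: when the count is odd, the overall median is left out of both halves. Other conventions exist (for example, statistics.quantiles and spreadsheet functions interpolate differently), so the quartiles may differ slightly from another tool's.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the 'sub-median' approach:
    Q1 and Q3 are the medians of the lower and upper halves.
    Assumption: for an odd count, the overall median is excluded from both halves."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values to split into halves")
    half = n // 2
    lower = xs[:half]                                 # values below the median
    upper = xs[half + 1:] if n % 2 else xs[half:]     # values above the median
    return median(lower), median(xs), median(upper)


print(quartiles(range(1, 10)))  # (2.5, 5, 7.5)
print(quartiles(range(1, 9)))   # (2.5, 4.5, 6.5)
```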

Reading a Boxplot:


Let's say we ask 1,000 people (and they miraculously all respond) how many burgers they ate in the past week. We sort those responses from least to greatest and then graph them with our box-and-whisker plot. The top 50% of the group (500 people), who ate more burgers, are represented by everything above the median (the white line). The top 25% of burger eaters (250 people) are shown by the upper whisker and the dots above it. The dots are outliers: people who ate far more or far less than is typical. If more than one outlier ate the same number of burgers, their dots are placed side by side.
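The post does not say where the whiskers end. A common choice, and the default in many plotting tools, is Tukey's rule: the whiskers reach the most extreme points within 1.5 × IQR (the interquartile range, Q3 − Q1) of the box, and anything further out is drawn as a dot. The sketch below assumes that rule; box_summary is a hypothetical helper and the burger counts are invented.

```python
from statistics import quantiles


def box_summary(data, k=1.5):
    """Hypothetical helper: the numbers a box-and-whisker plot draws, assuming
    Tukey's rule (whiskers end at the most extreme points within k * IQR of the box;
    points beyond that are outliers, drawn as individual dots)."""
    q1, q2, q3 = quantiles(data, n=4)  # Python 3.8+, default 'exclusive' method
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return {"q1": q1, "median": q2, "q3": q3,
            "whisker_low": min(inside), "whisker_high": max(inside),
            "outliers": outliers}


# Invented burgers-per-week answers from a small sample (not real survey data).
burgers = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 14, 14]
summary = box_summary(burgers)
print(summary)
```

Here the box runs from 1.25 to 4.75 burgers, the whiskers stop at 0 and 5, and the two people who ate 14 burgers become outliers. Because they share the same value, a plot would show their two dots side by side.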
 Written By: Nikita Agarwal 2013171

Group Members:
Priyesh Bhadauriya
Nihal Moidu
Parth Mehta
Nimisha Agarwal
 
