Thursday, 4 July 2013

Second Day: SPSS


In the second session we were taught the following techniques for representing data.

TECHNIQUES OF REPRESENTATION OF DATA:
  
·       Bubble Chart: A bubble chart is a type of chart that displays three dimensions of data. Each entity with its triplet (v1, v2, v3) of associated data is plotted as a disk that expresses two of the three values through the disk's xy location and the third through its size. Bubble charts can facilitate the understanding of social, economic, medical, and other scientific relationships.
·       Histogram: A histogram is a graphical representation of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson. A histogram is a representation of tabulated frequencies, shown as adjacent rectangles erected over discrete intervals (bins), with the area of each rectangle proportional to the frequency of the observations in that interval.





·       Boxplot: A boxplot is a way of summarizing a set of data measured on an interval scale. It is often used in exploratory data analysis. It is a type of graph that shows the shape of the distribution, its central value, and its variability. The picture produced consists of the most extreme values in the data set (the maximum and minimum), the lower and upper quartiles, and the median.
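The histogram and boxplot descriptions above can be made concrete with a small Python sketch. It is only an illustration, not what we did in SPSS: the data values, the bin width, and the helper names `histogram_counts` and `five_number_summary` are all made up for this example, and the quartile method used here ("inclusive") is one of several conventions, so another tool may report slightly different quartiles.

```python
import statistics

def histogram_counts(values, bin_width, start):
    """Count how many values fall into each bin [left, left + bin_width)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)   # index of the bin containing v
        left = start + k * bin_width        # left edge of that bin
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))

# Made-up observations, used only to demonstrate the two helpers.
data = [2, 3, 5, 6, 7, 9, 12, 15]

bins = histogram_counts(data, bin_width=5, start=0)
summary = five_number_summary(data)

print(bins)     # frequency per bin, keyed by the bin's left edge
print(summary)  # the five values a boxplot is drawn from
```

The dictionary of bin counts is what a histogram draws as rectangles, and the five-value tuple is what a boxplot draws as the whiskers, the box edges, and the line inside the box.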







MEASURES OF CENTRAL TENDENCY
  • Mean: The sum of all the observations divided by the number of observations. The mean of 20, 30, 40 is (20+30+40)/3 = 30.
  • Median: The middle value in an ordered sequence of observations. To find the median, we order the data set and then take the middle value. For example, to find the median of {9, 3, 6, 7, 5}, we first sort the data, giving {3, 5, 6, 7, 9}, then choose the middle value, 6. If the number of observations is even, e.g., {9, 3, 6, 7, 5, 2}, the median is the average of the two middle values of the sorted sequence {2, 3, 5, 6, 7, 9}, in this case (5 + 6) / 2 = 5.5.
  • Mode: The value that is observed most frequently. The mode is undefined for sequences in which no observation is repeated.
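The three definitions above can be checked with Python's standard `statistics` module. This is a minimal sketch using the same numbers as the examples; `mode_or_none` is a helper written for this post (not a library function) so that the mode comes out as undefined when no observation repeats, and the two lists passed to it are made up.

```python
import statistics
from collections import Counter

def mode_or_none(values):
    """Return the most frequent value, or None when no value is repeated."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > 1 else None

mean_value = statistics.mean([20, 30, 40])             # (20+30+40)/3
median_odd = statistics.median([9, 3, 6, 7, 5])        # middle of 3,5,6,7,9
median_even = statistics.median([9, 3, 6, 7, 5, 2])    # average of 5 and 6
mode_value = mode_or_none([3, 5, 5, 6, 7])             # 5 appears twice
no_mode = mode_or_none([3, 5, 6, 7, 9])                # nothing repeats

print(mean_value, median_odd, median_even, mode_value, no_mode)
```

If several values tie for the highest count, this helper returns only the first one it encountered; `statistics.multimode` returns all of them.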


We were given data on the population of cities in different states and the number of 2-wheelers bought in those cities in 2009, 2010 and 2011. Using this data, we were asked to find the cities with the maximum number of 2-wheelers in each Region (North, South, East, West and Central).
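The task itself, picking the city with the most 2-wheelers in each region, can be sketched in a few lines of Python. The rows below are placeholders invented for this illustration (they are not the data we were given), and `top_city_by_region` is a hypothetical helper name.

```python
# Placeholder rows: (city, region, two_wheelers). Not the class data.
rows = [
    ("City A", "North", 1200),
    ("City B", "North", 1500),
    ("City C", "South", 900),
    ("City D", "South", 700),
    ("City E", "West", 400),
]

def top_city_by_region(rows):
    """Map each region to the (city, two_wheelers) pair with the largest count."""
    best = {}
    for city, region, two_wheelers in rows:
        # Keep this city if the region is new or it beats the current best.
        if region not in best or two_wheelers > best[region][1]:
            best[region] = (city, two_wheelers)
    return best

top = top_city_by_region(rows)
for region, (city, count) in top.items():
    print(region, city, count)
```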

A. We loaded the Excel sheet into SPSS. The steps are as follows:
1. In SPSSWIN click on FILE > OPEN > DATA. The OPEN DATA FILE dialog box will appear.
2. Locate the file of interest: use the "Look In" pull-down list to identify the folder containing the Excel file of interest.
3. From the FILE TYPE pull-down menu select EXCEL (*.xls).
4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.
5. Keep the box checked that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains the variable names. [If the data resided in a different worksheet in the Excel file, that worksheet would need to be specified.]
6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.

B. The State variable was transformed into a new variable, Statenum.
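One way to picture this step is as replacing each state name with a number code. The sketch below assumes that codes are assigned in alphabetical order of the names, which is an assumption made for this illustration and may not match exactly what SPSS produced for us; the state names and the helper `recode_to_numbers` are made up.

```python
def recode_to_numbers(labels):
    """Give each distinct label an integer code (1, 2, ...) in sorted order."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)), start=1)}
    return [codes[label] for label in labels], codes

# Placeholder state names, not the class data.
states = ["State B", "State A", "State B", "State C"]

statenum, code_table = recode_to_numbers(states)
print(statenum)    # one code per row, in the original row order
print(code_table)  # lookup from state name to its code
```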




C. We calculated the ratio of 2-wheelers to population for each city and sorted the cities in ascending order of this ratio.
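As a rough sketch of this step outside SPSS, the ratio can be computed per city and the cities sorted by it. The figures below are invented placeholders, and the field names and the helper `add_ratio_and_sort` are hypothetical.

```python
# Placeholder records, not the class data.
cities = [
    {"city": "City A", "population": 100000, "two_wheelers": 25000},
    {"city": "City B", "population": 50000, "two_wheelers": 10000},
    {"city": "City C", "population": 120000, "two_wheelers": 60000},
]

def add_ratio_and_sort(records):
    """Add a 'ratio' field (two_wheelers / population) and sort ascending by it."""
    with_ratio = [
        dict(r, ratio=r["two_wheelers"] / r["population"]) for r in records
    ]
    return sorted(with_ratio, key=lambda r: r["ratio"])

ranked = add_ratio_and_sort(cities)
for r in ranked:
    print(r["city"], r["ratio"])
```

Sorting in ascending order puts the city with the fewest 2-wheelers per person first and the city with the most last.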



D. The mean (average), median and mode were then calculated in Excel.


Written by: Parita Mandhana

Group Members:
Abhishek Panwala
Poorva Saboo
Raghav Kabra
Pareena Neema
