In the second session, we were taught the following techniques for representing data.
TECHNIQUES OF DATA REPRESENTATION:
- Bubble chart: A bubble chart is a type of chart that displays three dimensions of data. Each entity, with its triplet (v1, v2, v3) of associated data, is plotted as a disk that expresses two of the values through the disk's x-y location and the third through its size. Bubble charts can facilitate the understanding of social, economic, medical, and other scientific relationships.
- Histogram: A histogram is a graphical representation of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson. A histogram represents tabulated frequencies as adjacent rectangles erected over discrete intervals (bins), each with an area equal to the frequency of the observations in that interval.
- Boxplot: A boxplot is a way of summarizing a set of data measured on an interval scale. It is often used in exploratory data analysis to show the shape of the distribution, its central value, and its variability. The picture produced consists of the most extreme values in the data set (the maximum and minimum), the lower and upper quartiles, and the median.
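The numbers behind a histogram and a boxplot can be sketched in a few lines of Python. This is only an illustration: the sample data, the bin edges, and the helper names `histogram_bins` and `five_number_summary` are made up for this example, and the quartiles use the "inclusive" convention of Python's `statistics.quantiles`, which may differ slightly from what SPSS or Excel report.

```python
import statistics

def histogram_bins(values, edges):
    """Return (left, right, count, height) for each bin.

    Height is count / width, so each rectangle's area equals its frequency.
    Bins are half-open [left, right); the last bin also includes its right edge.
    """
    bins = []
    last = len(edges) - 2
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        if i == last:
            count = sum(1 for v in values if left <= v <= right)
        else:
            count = sum(1 for v in values if left <= v < right)
        bins.append((left, right, count, count / (right - left)))
    return bins

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum (what a boxplot draws)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

# Hypothetical sample data and bin edges, chosen only for illustration
data = [2, 3, 5, 6, 7, 9, 12, 15, 18]
edges = [0, 5, 10, 20]

bins = histogram_bins(data, edges)
for left, right, count, height in bins:
    print(f"[{left}, {right}): frequency={count}, bar height={height}")

summary = five_number_summary(data)
print("min, Q1, median, Q3, max =", summary)
```

Note that the last bin is twice as wide as the others, so its bar is drawn lower even though it holds three observations; this is what "area equal to the frequency" means in practice.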
MEASURES OF CENTRAL TENDENCY
- Mean: The sum of all the observations divided by the number of observations. For example, the mean of 20, 30, 40 is (20+30+40)/3 = 30.
- Median: The middle value in an ordered sequence of observations. To find the median, we order the data set and then take the middle value. For example, to find the median of {9, 3, 6, 7, 5}, we first sort the data, giving {3, 5, 6, 7, 9}, then choose the middle value, 6. If the number of observations is even, e.g., {9, 3, 6, 7, 5, 2}, the median is the average of the two middle values from the sorted sequence {2, 3, 5, 6, 7, 9}, in this case (5 + 6) / 2 = 5.5.
- Mode: The value that is observed most frequently. The mode is undefined for sequences in which no observation is repeated.
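The three measures above can be checked with Python's standard `statistics` module, using the same numbers as in the notes. The helper `mode_or_none` is a hypothetical name introduced here; it follows the convention stated above that the mode is undefined (returned as `None`) when no observation repeats, and it returns a list because more than one value can tie for most frequent.

```python
import statistics
from collections import Counter

def mode_or_none(values):
    """Return the most frequent value(s) as a list, or None if nothing repeats."""
    if not values:
        return None
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return None  # no observation is repeated, so the mode is undefined
    return [v for v, c in counts.items() if c == top]

mean_value = statistics.mean([20, 30, 40])
median_odd = statistics.median([9, 3, 6, 7, 5])
median_even = statistics.median([9, 3, 6, 7, 5, 2])

print("mean:", mean_value)             # (20 + 30 + 40) / 3
print("median (odd n):", median_odd)   # middle of 3, 5, 6, 7, 9
print("median (even n):", median_even) # average of 5 and 6
print("mode:", mode_or_none([4, 7, 7, 9, 4, 7]))
print("mode with no repeats:", mode_or_none([9, 3, 6, 7, 5]))
```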
We were given data on the population of cities in different states and the number of 2-wheelers bought in those cities for the years 2009, 2010 and 2011. Using this data, we were asked to find the cities having the maximum number of 2-wheelers in a given region (North, South, East, West and Central).
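As a rough sketch of this task outside SPSS, the city with the most 2-wheelers in each region can be found with a single pass over the records. The city names, figures, and the function name `top_city_per_region` below are invented for illustration and are not the class data.

```python
# Hypothetical records: (city, region, two_wheelers). Figures are invented.
records = [
    ("City A", "North", 12000),
    ("City B", "North", 18000),
    ("City C", "South", 25000),
    ("City D", "South", 9000),
    ("City E", "West", 14000),
]

def top_city_per_region(rows):
    """Map each region to the (city, two_wheelers) pair with the highest count."""
    best = {}
    for city, region, count in rows:
        if region not in best or count > best[region][1]:
            best[region] = (city, count)
    return best

top = top_city_per_region(records)
for region, (city, count) in sorted(top.items()):
    print(f"{region}: {city} ({count} two-wheelers)")
```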
A. We loaded the Excel sheet into SPSS. The steps are as follows:
1. In SPSSWIN, click on FILE ⇒ OPEN ⇒ DATA. The OPEN DATA FILE dialog box will appear.
2. Locate the file of interest: use the "Look In" pull-down list to identify the folder containing the Excel file of interest.
3. From the FILE TYPE pull-down menu, select EXCEL (*.xls).
4. Click on the file name of interest and click on OPEN, or simply double-click on the file name.
5. Keep the box checked that reads "Read variable names from the first row of data". This presumes that the first row of the Excel data file contains variable names. [If the data resided in a different worksheet in the Excel file, this would need to be entered.]
6. Click on OK. The Excel data file will now appear in the SPSSWIN Data Editor.
B. State was transformed into Statenum.
C. We calculated the ratio of 2-wheelers to population and sorted the values in ascending order.
D. The mean (average), median and mode were then calculated in Excel.
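Steps B and C can be sketched in Python as follows. The rows are invented for illustration, and the notes do not say how Statenum was coded in SPSS, so numbering the states alphabetically from 1 is purely an assumption made for this example.

```python
# Hypothetical rows: (city, state, population, two_wheelers). Figures are invented.
rows = [
    ("City A", "State X", 500000, 100000),
    ("City B", "State Y", 200000, 60000),
    ("City C", "State X", 800000, 120000),
]

# Step B (assumed scheme): give each state a numeric code in alphabetical order.
state_codes = {
    state: code
    for code, state in enumerate(sorted({state for _, state, _, _ in rows}), start=1)
}

# Step C: ratio of 2-wheelers to population, then sort ascending by that ratio.
table = [
    {"city": city, "statenum": state_codes[state], "ratio": two_wheelers / population}
    for city, state, population, two_wheelers in rows
]
table.sort(key=lambda row: row["ratio"])

for row in table:
    print(f'{row["city"]}: statenum={row["statenum"]}, ratio={row["ratio"]:.3f}')
```

Sorting by the ratio rather than by the raw count puts cities of very different sizes on a comparable footing.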
Written by: Parita Mandhana
Group Members:
Abhishek Panwala
Poorva Saboo
Raghav Kabra
Pareena Neema




 