Friday 19 July 2013

A Day Well Spent With Statistics (Session 7 & 8, 19th July, 2013)

The session started with a new statistics application named PERMAP. Let's begin with an introduction to it.

An Introduction to PERMAP
PERMAP is a program that uses multidimensional scaling (MDS) to reduce multiple pairwise relationships to 2-D pictures, commonly called perceptual maps. The Churchill data are in the form of correlation coefficients that show the relationships between 10 factors that influence the image of a department store. These correlation coefficients were calculated from responses to semantic differential scale questions given to a random selection of shoppers.
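To make the idea of squeezing pairwise relationships into a 2-D picture more concrete, here is a minimal sketch of the kind of badness-of-fit number that MDS programs try to minimize. It assumes a generic Kruskal-style stress formula; the exact objective function used by PERMAP is not given in these notes, and the function name `stress` and the toy data are made up for illustration.

```python
import math
from itertools import combinations

def stress(points, dissimilarities):
    """Kruskal-style stress of a 2-D configuration (an assumed, generic formula).

    points: list of (x, y) positions on the map.
    dissimilarities: dict mapping an index pair (i, j), i < j, to the
    target dissimilarity between objects i and j.
    """
    squared_error = 0.0
    squared_distance = 0.0
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])        # distance on the map
        squared_error += (d - dissimilarities[(i, j)]) ** 2
        squared_distance += d ** 2
    return math.sqrt(squared_error / squared_distance)

# Toy data (hypothetical): three objects whose dissimilarities form a 3-4-5 triangle.
target = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}

perfect_map = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]    # reproduces the targets exactly
distorted_map = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0)]  # object 2 placed badly

perfect_stress = stress(perfect_map, target)
distorted_stress = stress(distorted_map, target)
print(perfect_stress, distorted_stress)
```

A stress of zero means the map distances match the input dissimilarities exactly; the larger the value, the more the 2-D picture distorts the original relationships.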

Purpose of PERMAP: The use of MDS for the construction of perceptual maps is well developed and several computer programs are available. In fact, MDS was one of the earliest uses of high-speed computers in psychology and the social sciences. The purpose of PERMAP is to provide a particularly convenient method of producing perceptual maps and to do so in a way that helps the researcher avoid a number of common mistakes, as described in following sections.

Usefulness of perceptual maps: A major advantage of MDS and perceptual maps is that they deal with problems associated with substantiating and communicating results based on data involving more than two dimensions. Graphical communication is important here because the eye plays a central role in interpreting and distinguishing object (factor, stimulus, characteristic) groupings.
Although experts may be able to extract the subtle relationships represented in a matrix of numbers, this skill is not widespread. Another important aspect of perceptual maps is that they are forgiving of missing or imprecise data points. Whereas some analytical techniques cannot tolerate missing elements in the input matrix, MDS results are often unaffected. This is because it is not uncommon for there to be much redundancy in the information given by a complete matrix of dissimilarities.

Existing perceptual mapping difficulties: Although the theory behind making perceptual maps is well developed, its application has been controversial. There are four major concerns that PERMAP can help alleviate. These include avoiding local minima (i.e., configurations that are optimal with respect to small changes in configuration but not optimal with respect to all possible changes), proving complete convergence, minimizing the influence of outliers, and combining multiple correlation matrices. With care, batch-operated programs can be used in such a manner that all of these difficulties are properly addressed, but moving to a visually interactive program makes these difficulties easier to deal with.
PERMAP provides an interactive, visual system for the construction of perceptual maps from multidimensional dissimilarity data. It can treat up to 30 objects and can aggregate an unlimited number of matrices (cases) describing the pairwise differences or similarities among the objects. Aggregation can be accomplished using any of three methods, and the use of weighting factors is available.
PERMAP was designed to be simple and easy to use by a novice and to offer enough advanced features that it would be of value to the expert. Its major improvement over existing perceptual mapping programs is that it was designed specifically to combat certain common errors associated with multidimensional scaling. For instance, it is particularly effective at showing incomplete convergence, trapping in a local minimum, and outlier influence. It is also effective at revealing the importance, or lack of importance, of the choice of the distance metric used in the objective function. Overall, the program provides a means for the researcher to go beyond just finding a solution to developing a feel for the suitability, stability, and variability of the solution.

Then we started working on a new concept known as Z-scores, which follows directly from our previous learnings about the mean and standard deviation.
So, what are Z-scores?

Sometimes we want to do more than summarize a bunch of scores. Sometimes we want to talk about particular scores within the bunch. We may want to tell other people about whether or not a score is above or below average. We may want to tell other people how far away a particular score is from average. We might also want to compare scores from different bunches of data. We will want to know which score is better. Z-scores can help with all of this.

They Tell Us Important Things

Z-Scores tell us whether a particular score is equal to the mean, below the mean or above the mean of a bunch of scores. They can also tell us how far a particular score is away from the mean. Is a particular score close to the mean or far away?

If a Z-Score…

✓      Has a value of 0, it is equal to the group mean.
✓      Is positive, it is above the group mean.
✓      Is negative, it is below the group mean.
✓      Is equal to +1, it is 1 Standard Deviation above the mean.
✓      Is equal to +2, it is 2 Standard Deviations above the mean.
✓      Is equal to -1, it is 1 Standard Deviation below the mean.
✓      Is equal to -2, it is 2 Standard Deviations below the mean.
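The rules in this list all follow from one formula: subtract the group mean from the raw score and divide by the standard deviation. Here is a minimal sketch in Python; the function name `z_score` and the group mean of 50 and standard deviation of 10 are made-up values for illustration.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations that a raw score lies from the mean."""
    return (raw - mean) / sd

# Hypothetical group with a mean of 50 and a standard deviation of 10.
mean, sd = 50.0, 10.0

print(z_score(50, mean, sd))   # 0.0  -> equal to the group mean
print(z_score(60, mean, sd))   # 1.0  -> 1 Standard Deviation above the mean
print(z_score(30, mean, sd))   # -2.0 -> 2 Standard Deviations below the mean
```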

Z-scores also tell us how typical a particular score is within a bunch of scores. If data are normally distributed, approximately 95% of the data should have Z-scores between -2 and +2. Z-scores that do not fall within this range may be less typical of the data in a bunch of scores.
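The 95% figure can be checked with a few lines of Python. This sketch assumes a perfectly normal distribution and uses the error function from the standard `math` module; the helper name `share_within` is made up for illustration.

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

within_one = share_within(1)
within_two = share_within(2)

print(f"within 1 SD: {within_one:.4f}")   # about 0.68
print(f"within 2 SD: {within_two:.4f}")   # about 0.95
```

Real data are rarely perfectly normal, so the 95% figure is best treated as a rule of thumb.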

Z-scores also let us compare individual scores from different bunches of data. We can use Z-scores to standardize scores from different groups of data. Then we can compare the standardized scores even though the raw scores come from different bunches of data.
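Here is a small sketch of such a comparison. The two tests, their means and their standard deviations are invented numbers used only to show the idea.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations that a raw score lies from the mean."""
    return (raw - mean) / sd

# Hypothetical test A: class mean 70, standard deviation 10, your score 80.
z_a = z_score(80, mean=70, sd=10)

# Hypothetical test B: class mean 50, standard deviation 5, your score 65.
z_b = z_score(65, mean=50, sd=5)

print(z_a, z_b)

# The raw score on test A is higher (80 vs 65), but the z-score on test B
# is higher, so the test B result is better relative to its own class.
better_test = "B" if z_b > z_a else "A"
print(better_test)
```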


We worked through an example about cars with different engine sizes, horsepower and mileages, and calculated the Z-scores in Excel. Here we will discuss how to calculate Z-scores using SPSS software.


I’m going to use this example to help you understand how to enter the data. You can follow along first and then enter your own data by using the same steps. Just change the data points of course. Suppose you want to know how well you are doing relative to everyone else in your class on your first test. Here are all the test scores for your class, including yours. 

·         Nidhi = 89%
·         Nikita = 75%
·         Pallavi = 50%
·         Palak = 90%
·         Nitin = 81%
·         Nitesh = 65%
·         You = 98%
·         Nishidh = 94%
·         Nihal = 84%
·         Neha = 70%

Enter all test scores into the first column of cells


Simply type in each test score and hit the Enter key. Every time you hit Enter, the cursor will move to the cell below the one you are currently working in.



In this example, you can see all of the test scores in one column. For example, the number 89 in the top cell is the test score of Nidhi. The number 70 in the bottom cell is the test score of Neha.

Naming your variable


It’s really important to name variables for data that you enter. This helps you keep track of them later on when you are trying to analyse them. Your current variable name will appear at the top of the column of data. To rename your variable, double click on the box that says var00001.

A Define Variable box will appear


This box will let you modify some things about your variable. This includes the variable name. Look in the Variable Name box. You can see that right now, the variable name is still var00001.



Give your variable a meaningful name


To change the name, type another name into the box. Make sure the name is meaningful to you and that it describes your variable. Since I am giving an example about grades data, I will name my variable grades.




Check out what happens to your variable name


When you click the OK button, your variable name will change. In this example, the variable name ‘grades’ now appears where ‘var00001’ used to be.



Success


Congrats! You have just entered data that can be transformed into Z-Scores. But wait. Most students think that they are finished when they name the variable. Not so. There is one last thing to do and it is so important.

Save your data to a meaningful place with a meaningful name


Always remember to save your data file before doing an analysis. I am going to choose the name “Z-scores for grades in my class.sav” because it most accurately describes my data. While a bit long, this name will help me find my SPSS data file in the future. I am going to save this file to a folder named MySPSSdocuments because that will help me find the file in the future as well.



Analyze



Click “Analyze,” “Descriptive Statistics,” and then “Descriptives.”




This box will appear. There will be two big windows in this box, one on the right and one on the left. You should see your variable name in the box on the left. Your goal will be to move your variable name to the Variables box on the right. To do this, click on your variable name to highlight it. Next, click the arrow button.


  

Your variable name should move to the box on the right. The next step will be for you to check the box labeled “Save standardized values as variables.” To put a check in this box, simply click the box with your mouse.

Click OK


When you are finished, click the OK button and wait a few seconds for processing.



You will see some descriptive statistics for your data set like the number of scores (N), minimum and maximum score, the mean and the Standard Deviation. This file will NOT contain your Z-scores. However, you may want to remember the mean so write it down if that’s the case.
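If you want to double-check these descriptive statistics outside SPSS, here is a minimal sketch in Python using the ten test scores from our example. It assumes the Standard Deviation in the output is the sample standard deviation (divide by N - 1), which is consistent with the z-scores shown later in this post.

```python
import statistics

# The ten test scores from the example, in the order they were entered.
grades = [89, 75, 50, 90, 81, 65, 98, 94, 84, 70]

n = len(grades)
minimum = min(grades)
maximum = max(grades)
mean = statistics.mean(grades)     # 79.6
sd = statistics.stdev(grades)      # sample standard deviation, about 14.78

print(f"N={n}  Min={minimum}  Max={maximum}  Mean={mean:.2f}  SD={sd:.2f}")
```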


So where are the Z-scores


It’s interesting. The Z-scores do not appear in the output file. They actually appear instead in the data file that you had created earlier. This data file should still be open. Click this data file to view it.

In your data file


You will see two columns in your data file. The first column is the one that you created. It contains the name that you gave it and the scores that you entered. In our example, the first column is still “grades” and still contains all of the exam grades from students in the class. The second column is new. It is something that was generated by SPSS when we conducted our analysis. The second column contains Z-scores. SPSS will name the second column for you. It will give the second column the same name as the first column with a letter ‘z’ in front of it. In our example, the second column is named ‘zgrades’ to tell you that it contains all of the Z-scores for the grades column.


Save the Data file again


Your data file has changed since you conducted this analysis. So, it’s a good idea to save this file again. Click ‘File’ and then ‘Save.’ If you do this, your file will be saved under the same name that you chose originally.


Look in your data file


The data that you typed in will appear in the left most column. Your z-scores will appear in a second column of data with the letter ‘z’ in front of its name. In our example, the second column is named ‘zgrades’ to tell you that it contains all of the Z-scores for the grades column.

Identifying Z-Scores


Each data point that you entered in the column on the left will have a corresponding z-score printed in the column just next to it. In our example, the first score that we entered was a grade of 89. You can see this score at the top of the left most column. The z-score for this raw score of 89 is 0.64 (when rounded). You can see the 0.63594 at the top of the second column. So each z-score will be printed right next to each raw score.
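The same column of z-scores can be reproduced by hand. This is a minimal sketch in Python, assuming the sample standard deviation (N - 1) is used; the variable names are just for illustration.

```python
import statistics

# The ten test scores from the example, in the order they were entered.
grades = [89, 75, 50, 90, 81, 65, 98, 94, 84, 70]

mean = statistics.mean(grades)
sd = statistics.stdev(grades)      # sample standard deviation (N - 1)

# One z-score per raw score, in the same order as the grades column.
zgrades = [(g - mean) / sd for g in grades]

for raw, z in zip(grades, zgrades):
    print(f"{raw:3d}  {z: .5f}")
```

The first line printed pairs the raw score of 89 with 0.63594, the same value that appears at the top of the second column.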

Positive and Negative Z-Scores


Some z-scores will be positive whereas others will be negative. If a z-score is positive, its corresponding raw score is above (greater than) the mean. If a z-score is negative, its corresponding raw score is below (less than) the mean.

If a Z-Score has a Positive Value…


This means that it is above the group mean. See all of the positive z-scores in our example? For example, look at the top most z-score, 0.64. It is positive because it does not have a negative sign in front of it. This z-score corresponds with the exam score of 89%. Because the z-score is positive, we can conclude that the exam score of 89% is above the group mean. This means that the person who scored an 89% performed better than average.

If a Z-Score has a Negative Value…


This means that it is below the group mean. See the four negative z-scores in our example? They are the ones with the negative sign in front of them. For example, look at the bottom most z-score, -0.65. This z-score corresponds with the exam score of 70%. Because the z-score is negative, we can conclude that the exam score of 70% is below the group mean. This means that the person who scored a 70% performed below average.
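To see who falls on each side of the mean, here is a small sketch in Python using the names and scores from the class list. It assumes the sample standard deviation (N - 1); the dictionary and list names are made up for illustration.

```python
import statistics

# Names and test scores from the class list in the example.
scores = {
    "Nidhi": 89, "Nikita": 75, "Pallavi": 50, "Palak": 90, "Nitin": 81,
    "Nitesh": 65, "You": 98, "Nishidh": 94, "Nihal": 84, "Neha": 70,
}

mean = statistics.mean(scores.values())
sd = statistics.stdev(scores.values())     # sample standard deviation (N - 1)

z = {name: (score - mean) / sd for name, score in scores.items()}

above_mean = [name for name, value in z.items() if value > 0]
below_mean = [name for name, value in z.items() if value < 0]

print("Positive z-scores (above the mean):", above_mean)
print("Negative z-scores (below the mean):", below_mean)
```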

Z-Scores and Standard Deviation


The absolute value of the z-score tells you how many standard deviations you are away from the mean. If a z-score is equal to 0, it is on the mean. If a Z-Score is equal to +1, it is 1 Standard Deviation above the mean. If a z-score is equal to +2, it is 2 Standard Deviations above the mean. If a z-score is equal to -1, it is 1 Standard Deviation below the mean. If a z-score is equal to -2, it is 2 Standard Deviations below the mean. In our example, your score was 98%. This raw score had a corresponding z-score of +1.24. The value of this z-score tells us that your raw score of 98% was 1.24 standard deviations away from the mean. Further, if we consider the positive sign, we can see that your raw score is 1.24 standard deviations above the group mean. This means that raw score of 98% is pretty darn good relative to the rest of the students in your class. 

Z between -2 and +2


If the data are normally distributed, about 95% of scores will be no more than 2 standard deviation units away from the mean. That means that most scores will fall between z = -2 and z = +2. However, some scores will have an absolute value greater than 2. You can interpret these scores to be very far from the mean.
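Here is a minimal sketch in Python of how such far-away scores could be flagged in our class example. It assumes the sample standard deviation (N - 1); the cut-off of 2 comes from the rule above, and the name `far_from_mean` is made up for illustration.

```python
import statistics

# The ten test scores from the example.
grades = [89, 75, 50, 90, 81, 65, 98, 94, 84, 70]

mean = statistics.mean(grades)
sd = statistics.stdev(grades)      # sample standard deviation (N - 1)

# Keep only the raw scores whose z-score is more than 2 units from the mean.
far_from_mean = [g for g in grades if abs((g - mean) / sd) > 2]

print(far_from_mean)
```

With the class data, only the score of 50 lands just outside the range, with a z-score of about -2.003.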

The second session of the day started at 3:30 P.M., and the main agenda was to learn about the bubble graph and how to use it.

So, what is a bubble graph?

A bubble chart is a type of chart that displays three dimensions of data. Each entity with its triplet (v1, v2, v3) of associated data is plotted as a disk that expresses two of the vi values through the disk's xy location and the third through its size. Bubble charts can facilitate the understanding of social, economic, medical, and other scientific relationships.
Bubble charts can be considered a variation of the scatter plot, in which the data points are replaced with bubbles. As the documentation for Microsoft Office explains, "this type of chart can be used instead of a Scatter chart if your data has three data series, each of which contains a set of values".
According to Berman (2007), bubble charts can "be used in project management to compare the risk and reward among projects. In a chart each project can be represented by a bubble, the axis can represent the net present value and probability of success and the size of the bubble can represent the overall cost of the project".
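To show how a triplet of values turns into a bubble, here is a minimal sketch in Python that draws a tiny chart as an SVG image using only the standard library. The three projects and their numbers are invented, the function names are made up for illustration, and making the bubble area (rather than the radius) proportional to cost is an assumed convention, not something stated in the sources above.

```python
import math

# Hypothetical projects: (name, net present value, probability of success, cost).
projects = [
    ("A", 20.0, 0.9, 4.0),
    ("B", 60.0, 0.5, 16.0),
    ("C", 90.0, 0.2, 36.0),
]

def bubble_radius(value, scale=5.0):
    """Radius chosen so that the bubble AREA is proportional to value."""
    return scale * math.sqrt(value)

def to_svg(rows, width=400, height=300):
    """Return an SVG string with one circle per (name, x, y, size) row."""
    max_x = max(row[1] for row in rows)
    circles = []
    for name, x, y, size in rows:
        cx = x / max_x * width            # x axis: net present value
        cy = height - y * height          # y axis: probability, 0 at the bottom
        r = bubble_radius(size)           # bubble size: cost
        circles.append(
            f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="{r:.1f}" '
            f'fill-opacity="0.5"><title>{name}</title></circle>'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(circles)
        + "</svg>"
    )

svg = to_svg(projects)
print(svg)
```

Saving the printed text to a file with an .svg extension and opening it in a browser should show three bubbles of increasing size.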

Example: 

This bubble chart displays a fictitious project portfolio. Individual project bubbles are distinguished by their colors and patterns. The chart is divided into equal quadrants to identify relative project attractiveness. Larger bubbles in the upper left quadrant represent the most attractive projects while smaller bubbles in the lower right quadrant represent the least attractive projects. Bubbles with an "X" indicate that the bubble size represents a negative value for NPV. This chart was created using Bubble Chart Pro™ software.


Written By: Nitesh Singh Patel

Team No.: 7

Team Members: 1) Nidhi
                          2) Nitin Boratwar
                          3) Palak Jain
                          4) Pallavi Bizoara

