Saturday, 20 July 2013

Encounter With Statistics (Sessions 7 & 8), 19th July 2013



The class started with an introduction to PERMAP, which was unfamiliar to us at first, but as the session progressed we understood the significance of this software.


PERMAP is an interactive program for making perceptual maps. The program, PERMAP, uses conventional metric multidimensional scaling techniques. That is, it uses pairwise numerical values (correlations, proximities, dissimilarities, etc.) to construct a map showing the relationship between objects. A unique feature of PERMAP is that it embeds the mapping techniques in an interactive, graphical system that minimizes several difficulties associated with multidimensional scaling practices. It is particularly effective at exposing artefacts due to local minima, incomplete convergence, and the effects of outliers. It can associate various attributes with the resultant groupings and provide line-linking options to help the researcher identify the nature of perceived relationships. Problems involving multiple matrices can be treated using three different aggregation methods. The optional use of weighting factors is available.


Its fundamental purpose is to uncover any "hidden structure" that might reside in a complex data set. PERMAP takes object-to-object proximity values (similarities, dissimilarities, correlations, distances, interactions, psychological distances, dependencies, confusability, preferences, joint or conditional probabilities, etc.), or up to 30 object attribute values, and uses multidimensional scaling (MDS) to make a map that shows the relationships between the objects. Succinctly, it performs classical metric and nonmetric MDS analyses in one to eight dimensions, for one-mode two-way or two-mode two-way data, with up to 1000 objects and with missing values allowed. In addition, it can perform several new types of MDS analyses involving error bounds or boundary conditions, and it can show the effect of degrading the similarity information.
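PERMAP's own algorithm is not described here, so as a rough illustration of what "metric MDS" means, here is a sketch of classical (Torgerson) metric MDS in plain Python. It is not PERMAP's implementation: the function name classical_mds is hypothetical, and it assumes the dissimilarities behave like Euclidean distances. It squares the dissimilarities, double-centres them, and extracts the largest eigenvectors by power iteration to get map coordinates.

```python
import math
import random


def classical_mds(dist, k=2, iters=500, seed=0):
    """Classical (Torgerson) metric MDS (hypothetical helper, not PERMAP's code).

    dist: n x n symmetric matrix of dissimilarities with a zero diagonal.
    Returns n points in k dimensions whose distances approximate dist.
    """
    n = len(dist)
    # 1. Square the dissimilarities.
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    # 2. Double-centre: B = -1/2 * J * D2 * J, with J = I - (1/n) * ones.
    #    Row means equal column means because the matrix is symmetric.
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        # 3. Power iteration: repeatedly multiply by B to find its dominant eigenvector.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # The Rayleigh quotient v.Bv gives the matching eigenvalue.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        # 4. Coordinates on this axis = eigenvector * sqrt(eigenvalue);
        #    negative eigenvalues (non-Euclidean input) are clipped to zero.
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # 5. Deflate B so the next pass finds the next-largest eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return [tuple(p) for p in coords]


# Usage: a 3-4-5 right triangle is recovered up to rotation and reflection.
points = classical_mds([[0, 3, 4], [3, 0, 5], [4, 5, 0]], k=2)
print([round(math.dist(points[i], points[j]), 6) for i, j in ((0, 1), (0, 2), (1, 2))])
# → [3.0, 4.0, 5.0]
```

The map is only defined up to rotation, reflection and translation, which is why the check compares distances rather than raw coordinates.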

Then we were introduced to Z-scores.

Z-Scores:

Sometimes we want to do more than summarize a bunch of scores. Sometimes we want to talk about particular scores within the bunch. We may want to tell other people whether a score is above or below average, or how far a particular score is from average. We might also want to compare scores from different bunches of data to find out which score is better. Z-scores can help with all of this.

Z-Scores tell us whether a particular score is equal to the mean, below the mean or above the mean of a bunch of scores. They can also tell us how far a particular score is away from the mean. Is a particular score close to the mean or far away?

Interpreting Z-Scores

If a Z-Score...

  1. Has a value of 0, it is equal to the group mean.
  2. Is positive, it is above the group mean.
  3. Is negative, it is below the group mean.
  4. Is equal to +1, it is 1 Standard Deviation above the mean.
  5. Is equal to +2, it is 2 Standard Deviations above the mean.
  6. Is equal to -1, it is 1 Standard Deviation below the mean.
  7. Is equal to -2, it is 2 Standard Deviations below the mean.
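The interpretations above all follow from the Z-score formula, z = (x - mean) / standard deviation, i.e. the distance from the mean measured in standard deviations. A minimal sketch, assuming the population standard deviation (dividing by n) and a hypothetical bunch of scores:

```python
import statistics


def z_scores(scores):
    """Return the Z-score of every value: (x - mean) / standard deviation."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # population SD; statistics.stdev would give the sample SD
    return [(x - mean) / sd for x in scores]


# Hypothetical scores with mean 5 and population SD 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_scores(scores))  # → [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Here 5 has a Z-score of 0 (equal to the mean) and 9 has a Z-score of +2 (two standard deviations above the mean).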

Z-Scores can help us understand how typical a particular score is within a bunch of scores. If data are normally distributed, approximately 95% of the data should have Z-scores between -2 and +2. Z-scores that do not fall within this range may be less typical of the data in a bunch of scores.
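The "approximately 95%" figure can be checked with the standard normal distribution from Python's statistics module. The is_unusual helper and its default limit of 2 are illustrative assumptions, not a fixed rule:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # distribution of Z-scores for normal data

# Share of a normal distribution lying between Z = -2 and Z = +2.
within_two = std_normal.cdf(2) - std_normal.cdf(-2)
print(round(within_two, 4))  # → 0.9545


def is_unusual(z, limit=2.0):
    """Hypothetical helper: flag a Z-score outside the -limit..+limit band."""
    return abs(z) > limit


print(is_unusual(1.4), is_unusual(-2.7))  # → False True
```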

Z-Scores can help us compare individual scores from different bunches of data. We can use Z-scores to standardize scores from different groups of data; then we can compare the standardized scores, rather than the raw scores, across the bunches.
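A small worked comparison, using made-up class means and standard deviations: each raw score is standardized against its own group, and the larger Z-score is the more exceptional result even if its raw score is lower.

```python
def z_score(x, mean, sd):
    """Standardize one raw score against its own group's mean and SD."""
    return (x - mean) / sd


# Hypothetical example: the same student in two differently scored tests.
maths = z_score(80, mean=70, sd=5)     # 2.0 SDs above the maths class mean
english = z_score(85, mean=80, sd=10)  # 0.5 SDs above the English class mean
better = "maths" if maths > english else "english"
print(maths, english, better)  # → 2.0 0.5 maths
```

The English raw score (85) is higher, but relative to its own class the maths score (80) is much further above average.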

Normal distribution and skewed distribution were the concepts we learnt next.

Normal Distribution:

A normal distribution is a probability distribution that is symmetrical about its mean, with most of the values situated around the mean. Values are equally likely to fall above or below the mean: they cluster close to the mean and then tail off symmetrically away from it.
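The symmetry described above is visible in the normal density formula, f(x) = 1 / (σ√(2π)) · exp(−(x − μ)² / (2σ²)), which depends on x only through the squared distance from the mean. A sketch with a hypothetical mean and standard deviation:

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and SD sigma at x."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


mu, sigma = 50.0, 10.0  # hypothetical mean and standard deviation
for d in (0, 5, 10, 20):
    # Equal distances below and above the mean give equal densities (symmetry),
    # and the density shrinks as we move away from the mean (the tails).
    print(d, normal_pdf(mu - d, mu, sigma), normal_pdf(mu + d, mu, sigma))
```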

The normal distribution is the most commonly assumed distribution and is often used in stock market analysis. Given a large enough sample of observations, it is common to assume that returns follow a normal distribution, but real data can contradict this assumption.

Data can be "distributed" (spread out) in different ways.

http://www.mathsisfun.com/data/images/normal-distribution-skew-left.gif

It can be spread out more on the left or more on the right.


http://www.mathsisfun.com/data/images/normal-distribution-skew-right.gif

Or it can be all jumbled up 

http://www.mathsisfun.com/data/images/normal-distribution-random.gif                                                   


But there are many cases where the data tends to be around a central value with no bias left or right, and it gets close to a "Normal Distribution" like this:

http://www.mathsisfun.com/data/images/normal-distribution-1.gif

The yellow histogram shows some data that follows the curve closely, but not perfectly (which is usual).
It is often called a "Bell Curve" because it looks like a bell.

The Normal Distribution has:

  1. mean = median = mode
  2. 50% of values less than the mean and 50% greater than the mean
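Both properties can be checked quickly in Python. The sample below is hypothetical, chosen to be symmetric and bell-shaped, so its mean, median and mode coincide; for an exact normal distribution, the cumulative probability at the mean is exactly one half.

```python
import statistics
from statistics import NormalDist

# Hypothetical symmetric sample: the centre value is also the most common one.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(statistics.mean(sample), statistics.median(sample), statistics.mode(sample))  # → 3 3 3

# For a normal distribution, half of the probability lies below the mean.
dist = NormalDist(mu=3, sigma=1.2)  # hypothetical parameters
print(dist.cdf(dist.mean))  # → 0.5
```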

http://www.mathsisfun.com/data/images/normal-distribution-2.gif                                     






Data can be "skewed", meaning it tends to have a long tail on one side or the other.
[Figure: Negative Skew | No Skew | Positive Skew]


Negative Skew
It is called negative skew because the long "tail" is on the negative side of the peak. It is also referred to as "skewed to the left" (the long tail is on the left-hand side).
    

Positive Skew

Positive skew is when the long tail is on the positive side of the peak; it is also referred to as "skewed to the right". The mean is on the right of the peak value.
    

A Normal Distribution is not skewed: it has no skew, it is perfectly symmetrical, and the mean is exactly at the peak.
      
Calculating Skewness
"Skewness" (the amount of skew) can be calculated; for example, you could use the SKEW() function in Excel or OpenOffice Calc.
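As I understand Microsoft's documentation, Excel's SKEW() returns the adjusted sample skewness n / ((n − 1)(n − 2)) · Σ((x − x̄) / s)³, where s is the sample standard deviation. A Python sketch of that formula (the name excel_skew and the data are hypothetical): a long right tail gives a positive value, a long left tail a negative one, and symmetric data gives zero.

```python
import statistics


def excel_skew(values):
    """Adjusted sample skewness, assumed to match Excel's SKEW():

    n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3), with s the sample SD.
    """
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    total = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * total


print(excel_skew([1, 1, 1, 2, 2, 3, 10]) > 0)         # long right tail: positive skew
print(excel_skew([-1, -1, -1, -2, -2, -3, -10]) < 0)  # long left tail: negative skew
```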

Before the 7th session ended, we learnt about the Mean, Variance and Standard Deviation.
Mean:
The mean is just the average of the numbers. It is easy to calculate: add up all the numbers, then divide by how many numbers there are.
Variance:
The Variance is defined as the average of the squared differences from the mean.
To calculate the variance follow these steps:
  1. Work out the Mean (the simple average of the numbers)
  2. Then for each number: subtract the Mean and square the result (the squared difference).
  3. Then work out the average of those squared differences.
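The three steps translate directly into code. This sketch computes the population variance (dividing by n), matching the definition above; the values are hypothetical. A sample variance would divide by n − 1 instead.

```python
def variance(numbers):
    """Population variance: the average of the squared differences from the mean."""
    # Step 1: work out the mean.
    mean = sum(numbers) / len(numbers)
    # Step 2: for each number, subtract the mean and square the result.
    squared_diffs = [(x - mean) ** 2 for x in numbers]
    # Step 3: average those squared differences.
    return sum(squared_diffs) / len(squared_diffs)


values = [600, 470, 170, 430, 300]  # hypothetical values, mean 394
print(variance(values))  # → 21704.0
```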
Standard Deviation:
The Standard Deviation is a measure of how spread out numbers are. Its symbol is σ (the Greek letter sigma). The formula is easy: it is the square root of the Variance. 
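Taking the square root of the variance gives σ. A minimal sketch with hypothetical values, using Python's statistics module (pvariance/pstdev are the population versions; stdev would give the sample standard deviation):

```python
import math
import statistics

values = [600, 470, 170, 430, 300]  # hypothetical values
sigma = math.sqrt(statistics.pvariance(values))  # square root of the variance
print(round(sigma, 2))  # → 147.32
print(statistics.pstdev(values))  # the library computes the same value directly
```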

In the 8th session we learnt about Bubble Graphs and how to develop them using Excel.

Bubble Graphs:

Bubble charts or bubble graphs are extremely useful graphs for comparing the relationships between data objects in 3 numeric-data dimensions: the X-axis data, the Y-axis data, and data represented by the bubble size. Essentially, bubble charts are like XY scatter graphs except that each point on the scatter graph has an additional data value associated with it that is represented by the size of a circle or “bubble” centered on the XY point.

  

Bubble Chart: This chart shows the relationship between “Profit” (Y-Axis), “Cost” (X-Axis), and “Probability of Success (%)” (Bubble Size).
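Excel draws the chart itself; this sketch only shows how each row of data maps to an x value, a y value and a bubble size. It assumes one common convention, scaling bubble area (not radius) in proportion to the third value so that large values are not visually exaggerated. The project data and MAX_RADIUS are hypothetical.

```python
import math

# Hypothetical projects: (name, cost, profit, probability of success in %).
projects = [
    ("A", 20, 50, 80),
    ("B", 45, 30, 40),
    ("C", 10, 15, 20),
]

MAX_RADIUS = 30.0  # hypothetical radius (in points) for the largest bubble


def bubble_radii(sizes, max_radius=MAX_RADIUS):
    """Scale radii so that bubble AREA is proportional to each size value."""
    largest = max(sizes)
    return [max_radius * math.sqrt(s / largest) for s in sizes]


radii = bubble_radii([p[3] for p in projects])
for (name, cost, profit, prob), r in zip(projects, radii):
    # x = cost, y = profit, bubble radius encodes probability of success
    print(f"{name}: x={cost}, y={profit}, radius={r:.1f}")
```

Because area grows with the square of the radius, project A (80%) gets a bubble with four times the area of project C (20%), but only twice the radius.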



Bubble Charts in Business:

Bubble charts are often used in business to visualize the relationships between projects or investment alternatives in dimensions such as cost, value, and risk. By visualizing project portfolios using bubble charts, you can find clusters of relatively attractive projects in one area of the graph, such as areas of high value, low cost, and/or low risk, and compare them with relatively less attractive projects in a different area of the graph, such as an area of low value, high cost, and/or high risk.


Differentiating Bubbles in Bubble Charts

Bubbles are usually differentiated by colour, pattern, number labels, or a combination of these. Colours are usually adequate for a small number of bubbles, but subtle differences in colour become difficult to distinguish when there are many projects. In that case, number labels corresponding to a chart legend become a more useful way of distinguishing bubbles.

The session was very interesting and ended with a TED video in which the presenter, Hans Rosling, used bubble graphs to explain when India and China would catch up with the US and UK economies.


Written by 
Pakala Kalyani 

Group Members:
1. Nishidh Vilas Lad
2. P.Priyatham Kireeti
3. Kartheeki
4. Priyadarshi Tandon 
