Tuesday, 2 July 2013

Business Statistics_01_07_2013

                                                             

As future managers, we are expected to make decisions quickly. Even more important is that we make the correct decision. Be it a product launch, pricing policy, market expansion or asset acquisition, decisions must be made quickly and with a clear mind. Gone are the days when these decisions were based purely on intuition.
In today's world, where the sheer amount of data and the scale of operations are mind-boggling, we need tools to aid us in making these decisions. This is where business statistics steps in.

Statistics has been described as "the science of learning from data" (http://www.buseco.monash.edu.au/ebs/about/busstats.php).

Elaborating on the above definition and adding what we learned in the first session, we can define business statistics as a science in which we collect data, process it into meaningful information, and then analyze and observe it. Finally, we look for patterns and use them to make an informed decision.

Our session started by focusing on observation and on the things we take for granted. We learned that even the most obvious-looking things have interesting details attached to them, such as the orientation of the Roman numerals on a wrist watch.

There are two main components in any statistics experiment:

1 ) Cases : Cases are the entities being studied. Each case is described by a set of data and is made up of many variables.

2 ) Variable : A variable is a characteristic of a case which is under observation. A variable can take more than one value. Its value does not have to differ from case to case, but it can, depending on the nature of the characteristic.

Variables can be divided into two types :

1 ) Continuous : Variables which take numerical values are termed continuous variables. These can be further divided into two types :

   a ) Continuous continuous : Variables which can take any value between two discrete values are termed continuous continuous variables.
       ex : Time taken for water to boil
   b ) Continuous discrete : Variables which can only take separate, distinct values, with nothing in between, are termed continuous discrete variables.
       ex : Age in years

2 ) Categorical : A categorical variable is a variable which can only take one of a fixed number of values, such as gender or marital status.
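
To make the three kinds of variable concrete, here is a minimal Python sketch of one case and its variables. The field names, the values and the `variable_kind` helper are all invented for illustration, and guessing the kind from the Python type is a simplification: in practice the analyst decides what kind each variable is.

```python
# One hypothetical case; every name and value here is invented for illustration.
case = {
    "boiling_time_seconds": 187.42,  # can take any value in a range
    "age_years": 24,                 # whole numbers only
    "marital_status": "Single",      # one of a fixed set of labels
}

def variable_kind(value):
    """Rough guess at the kind of variable from a single observed value."""
    if isinstance(value, (bool, str)):
        return "categorical"
    if isinstance(value, int):
        return "discrete"
    if isinstance(value, float):
        return "continuous"
    raise TypeError(f"unsupported value: {value!r}")

kinds = {name: variable_kind(value) for name, value in case.items()}
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```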


There are four levels of data measurement in business statistics :

1 ) Nominal Level : This is the lowest level of data measurement. The numbers are only labels and carry no numerical meaning. They can only be used to classify or categorize.
    ex : Employee ID

2 ) Ordinal Level : This is the second level of data measurement. The numbers can be used to rank or order objects.

  Note - Nominal and ordinal data are non-metric data and are sometimes also called qualitative data. ( Quoted from book )

3 ) Interval Level : This is the second highest level. Here the distance between consecutive numbers has a meaningful interpretation. Zero is not fixed here.

4 ) Ratio Level : This is the highest level of data measurement. It is very similar to the interval level, but here zero is fixed. There is an absolute zero.
    A zero value in the data represents the absence of that characteristic. ( quoted from the book )
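
One way to remember the levels is by which operations make sense at each one. The sketch below is our own summary rather than something taken from the book; the operation names and the `allowed_operations` helper are made up for illustration. The temperature lines show why a non-fixed zero matters: Celsius is an interval scale, so dividing two Celsius readings gives a number with no physical meaning, while Kelvin has an absolute zero and the ratio is meaningful.

```python
# Levels from lowest to highest; each level keeps the operations of the levels below it.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Operations that first become meaningful at each level (illustrative names).
NEW_OPERATIONS = {
    "nominal": ["count", "mode"],
    "ordinal": ["rank", "median"],
    "interval": ["difference", "mean"],
    "ratio": ["ratio"],
}

def allowed_operations(level):
    """Return every operation that is meaningful at the given level."""
    index = LEVELS.index(level)
    operations = []
    for lower_level in LEVELS[: index + 1]:
        operations.extend(NEW_OPERATIONS[lower_level])
    return operations

print(allowed_operations("ordinal"))

# Interval versus ratio: the same two temperatures on two scales.
celsius_a, celsius_b = 10.0, 20.0
naive_ratio = celsius_b / celsius_a  # 2.0, but 20 C is not "twice as hot" as 10 C
kelvin_ratio = (celsius_b + 273.15) / (celsius_a + 273.15)  # meaningful ratio
print(naive_ratio, round(kelvin_ratio, 3))
```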




There can be three kinds of data analysis, based on the number of variables under observation :

1 ) Univariate : As the name suggests, only one variable is included in the analysis. The main purpose of univariate analysis is to describe the data rather than to find a relationship.
    There are many ways to represent univariate data. Some of them are :

    a ) Frequency tables
    b ) Bar Graphs
    c ) Pie Charts

2 ) Bivariate : Here two variables are included in the analysis. Bivariate analysis is very useful for establishing relationships between two different variables.

    Bivariate data can be presented in the following ways :

    a ) Cross Tabulation : It can be described as a table which simultaneously displays the frequency counts of two variables ( quoted from book ).

    b ) Scatter plot : It is a two-dimensional graph of points plotted from two numerical variables ( quoted from book ).


3 ) Multivariate : Multivariate data analysis is used when we have to observe and analyze data originating from more than two variables. In today's age of big data, multivariate analysis becomes very important for finding complex patterns in areas such as quality control and process optimization.

     Multivariate data analysis is a relatively recent development in business statistics. It is also very complex. One tool used in multivariate analysis is :

     OLAP ( Online Analytical Processing ) : It is an automated system which can be used to analyze different patterns arising from multi-dimensional data stored in a database. The main component of OLAP is the OLAP server, which acts as an interface between the client and the DBMS. (http://www.webopedia.com/TERM/O/OLAP.html)
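
To show what a cross tabulation looks like in practice, here is a minimal Python sketch using only the standard library. The cities, the carrier names and the records are invented for illustration and are not the data from our session.

```python
from collections import Counter

# Invented records: (city, carrier) for a handful of hypothetical subscribers.
records = [
    ("Mumbai", "Carrier A"),
    ("Mumbai", "Carrier B"),
    ("Delhi", "Carrier A"),
    ("Mumbai", "Carrier A"),
    ("Delhi", "Carrier B"),
    ("Delhi", "Carrier B"),
]

# Cross tabulation: count how often each (city, carrier) pair occurs.
crosstab = Counter(records)

cities = sorted({city for city, _ in records})
carriers = sorted({carrier for _, carrier in records})

# Print one row per city and one column per carrier.
print("city".ljust(10) + "".join(carrier.ljust(12) for carrier in carriers))
for city in cities:
    row = "".join(str(crosstab[(city, carrier)]).ljust(12) for carrier in carriers)
    print(city.ljust(10) + row)
```

Each cell is the number of cases that share both values, so the row totals and column totals give back the univariate frequency tables of the two variables.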


Data can also be differentiated on the basis of its source :

1 ) Primary Data : When data is collected by the researcher himself, it is termed primary data. Primary data is often collected when using quantitative methods.

2 ) Secondary Data : Data which originates somewhere else or is collected by someone else, but which the researcher can use in his research, is termed secondary data. Secondary data is often collected when using qualitative methods.

Ex : Government census, data from other professional research companies etc. (http://socialscience.stow.ac.uk/psychology/psych_A/george/primary_secondary.htm)


Steps for data analysis :

1 ) Analysis : This is the first step once data processing is done. It involves visually plotting different combinations of variables in order to find the one relationship we are finally looking for.

2 ) Observation : Once we have identified the correct variables to plot, the next step is to observe and search for patterns which might not be very obvious. Many times this step leads to a repetition of the first step in order to get better variable combinations.

3 ) Interpretation : When a clear pattern is visible, we have to interpret its meaning. We have to find out whether it is the problem or just a symptom. If some other factors are causing the pattern, we have to keep repeating the above steps, each time digging deeper, which is also termed data mining.

4 ) Strategy : Once the problem is clearly identified, it is finally time to find a solution. We have to plan a way to handle the problem, but we also have to be careful not to create new problems.




After the theory part, we moved on to practical application. We used SPSS ( Statistical Product and Service Solutions ). A major advantage of SPSS is that it has a visual interface. We analyzed some real market data from the telecommunication field. It described the usage habits of consumers of different carriers in different regions.
First we produced a frequency table with details about the fixed bill component in various cities. We thought of various hypotheses and cross-checked them against the available data.
 


       



After the frequency table, we plotted histograms of the same data in order to find patterns visually. We clearly saw a sharp rise in the number of consumers paying Rs 50 as the fixed component.
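
We did these steps through the SPSS menus, but the same idea can be sketched in a few lines of Python. The amounts below are invented so that the spike at Rs 50 is visible; they are not the session data. The row of `#` characters is only a crude stand-in for a histogram, since it draws one bar per distinct value instead of grouping values into bins.

```python
from collections import Counter

# Invented fixed bill components in rupees; NOT the data used in the session.
fixed_components = [0, 50, 50, 99, 50, 149, 50, 0, 50, 99, 50, 199]

frequency = Counter(fixed_components)
total = len(fixed_components)

# Frequency table with a text bar per value.
print("Rs    count  percent")
for value in sorted(frequency):
    count = frequency[value]
    percent = 100 * count / total
    print(f"{value:<5} {count:<6} {percent:5.1f}%  {'#' * count}")

# The tallest bar is the value paid by the most consumers.
most_common_value, most_common_count = frequency.most_common(1)[0]
print("Most common fixed component: Rs", most_common_value)
```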





Once we had established that, it was time to find the outliers, and for that we used a box plot.
A box plot includes a box with a median line and two whiskers. Each of its four parts represents 25 % of the data, and outliers are clearly shown as a circle or a *.
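
The numbers behind a box plot can be sketched in Python as below. This is an illustration under stated assumptions, not what SPSS does internally: the bill amounts are invented, the outlier fences use the common rule of 1.5 times the interquartile range (which we did not cover in the session), and different tools compute quartiles in slightly different ways, so the results may not match SPSS exactly.

```python
import statistics

# Invented monthly bill amounts in rupees; NOT the data used in the session.
bills = [120, 150, 160, 175, 180, 190, 200, 210, 230, 250, 900]

# The three cut points that split the sorted data into four parts of 25 % each.
q1, median, q3 = statistics.quantiles(bills, n=4, method="inclusive")

# Assumed rule: points beyond 1.5 * IQR from the box are treated as outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [bill for bill in bills if bill < lower_fence or bill > upper_fence]

# The whiskers end at the most extreme values that are not outliers.
inside = [bill for bill in bills if lower_fence <= bill <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("Q1:", q1, "median:", median, "Q3:", q3)
print("whiskers:", whisker_low, "to", whisker_high)
print("outliers:", outliers)
```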


Submitted by  :

Priyesh Bhadauriya 2013214

Group members :

Nikita Agarwal 2013171
Nimisha Agarwal 2013173
Parth Mehta 2013193
Nihal Moidu


   









