Saturday, 31 August 2013

Lecture 17th & 18th

In the 17th and 18th sessions we covered statistical tools that are important for our
group project: the t-test, regression and the measures of central tendency (mean, median, mode).
Each is explained below:

T-Test:
A statistical examination of two population means. A two-sample t-test examines whether the means of two samples differ significantly. It is commonly used when the variances
of two normal distributions are unknown and when an experiment uses a small
sample size. For example, a t-test could be used to compare the average floor
routine score of the U.S. women's Olympic gymnastics team to the average
floor routine score of China's women's team.
The test statistic in the t-test is known as the t-statistic. The t-test uses the
t-statistic, the t-distribution and the degrees of freedom to determine a p-value
(probability) that can be used to decide whether the population means
differ. The t-test is one of a number of hypothesis tests. To compare the means of three or
more groups, statisticians use an analysis of variance (ANOVA). If the
sample size is large, they use a z-test. Other hypothesis tests include the chi-square test and the F-test.
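
To make the t-statistic and degrees of freedom concrete, here is a minimal Python sketch of a two-sample t-test. It assumes the pooled-variance (Student's) form, which treats the two groups as having equal variance. The function name two_sample_t and the scores are made up for illustration and are not real gymnastics results. Turning the t-statistic into a p-value needs the t-distribution, which a statistics table or a library such as SciPy can supply.

```python
import math
import statistics


def two_sample_t(sample_a, sample_b):
    """Return the pooled-variance t-statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance gives the sample variance (divides by n - 1).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)

    df = n_a + n_b - 2
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Hypothetical floor routine scores for two teams (illustrative only).
team_a = [14.9, 15.1, 14.7, 15.3, 15.0]
team_b = [14.6, 14.8, 14.5, 14.9, 14.7]

t_stat, df = two_sample_t(team_a, team_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {df}")
```

A larger absolute t-statistic for the same degrees of freedom corresponds to a smaller p-value, that is, stronger evidence that the two population means differ.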


Regression:
A statistical measure that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables).
The two basic types of regression are linear regression and multiple regression.
Linear regression: Uses one independent variable to explain and/or predict the outcome of Y.
Multiple regression: Uses two or more independent variables to predict the outcome.
The general form of each type of regression is:
Linear regression: Y = a + bX + u
Multiple regression: Y = a + b1X1 + b2X2 + ... + bnXn + u
Where:
Y = the variable that we are trying to predict
X = the variable that we are using to predict Y
a = the intercept
b = the slope
u = the regression residual.
In multiple regression the separate variables are differentiated by using subscripted numbers (X1, X2 and so on).
Regression takes a group of random variables, thought to be predicting Y, and tries to find a mathematical relationship between them and Y. This relationship is typically in the form of a straight line (linear regression) that best approximates all the individual data points. Regression is often used to determine how much specific factors, such as the price of a commodity, interest rates, or particular industries or sectors, influence the price movement of an asset.
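
As a rough illustration of finding the straight line that best approximates the data points, the Python sketch below fits Y = a + bX by ordinary least squares and then computes the residual u for each point. Least squares is assumed here as the fitting method because it is the usual choice; the lecture did not name one. The function name fit_line and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b): intercept and slope of the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how X and Y vary together, divided by how much X varies.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    # The fitted line always passes through the point (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


# Made-up data: X is the predictor, Y is the value we want to predict.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

a, b = fit_line(xs, ys)
# Residual u: the part of each Y that the line does not explain.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
print("residuals:", [round(u, 2) for u in residuals])
```

The residuals of a least-squares line with an intercept sum to zero (up to rounding), which is a quick sanity check on the fit.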


Central Tendencies:
An average value of any distribution of data that best represents the middle. Also called centrality. There are three measures of central tendency, viz. MEAN, MEDIAN and MODE, so we need to choose the one that is most appropriate for the data set.

MEAN:
The mean is the average of the numbers: a calculated "central" value of a set of numbers. The mean is equal to the sum of all the values
in the data set divided by the number of values in the data set.
** MEAN is appropriate when the data set has neither repeated values nor extreme
values, because with such values the mean won’t give a good picture of the central
value of the data set.
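
The effect of an extreme value on the mean can be seen in a small Python sketch. The numbers are invented for illustration, and the function name mean is just a convenient label.

```python
def mean(values):
    """Sum of all the values divided by the number of values."""
    return sum(values) / len(values)


scores = [4, 5, 6, 7, 8]        # made-up data with no extreme values
with_outlier = scores + [90]    # the same data plus one extreme value

print(mean(scores))        # 6.0
print(mean(with_outlier))  # 20.0
```

One extreme value moves the mean from 6.0 to 20.0, a figure larger than every ordinary value in the set.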

MEDIAN:
The median is the "middle number": the middle number in a sorted list of numbers. To determine the median value in a sequence of numbers, the numbers must first be arranged in value order from lowest to highest. If there is an odd number of values, the median is the value in the middle, with the same number of values below and above it. If there is an even number of values in the list, the middle pair must be determined, added together and divided by two to find the median value. The median can be used to determine an approximate average.
** MEDIAN is appropriate when the data set has outliers, i.e. extreme values, because the mean in particular is pulled toward them and won't give an appropriate central value for the data set.
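
The sort-then-pick-the-middle procedure can be sketched in Python as follows. The function name median and the sample lists are illustrative assumptions.

```python
def median(values):
    """Middle value of the sorted data; mean of the middle pair if the count is even."""
    ordered = sorted(values)          # arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    # Even count: add the middle pair together and divide by two.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 9]))               # 7
print(median([7, 3, 9, 1]))            # 5.0
print(median([4, 5, 6, 7, 8, 90]))     # 6.5
```

In the last list the extreme value 90 barely affects the result: the median stays at 6.5, close to the bulk of the data.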

MODE:
The mode is the most frequently occurring value in a data set. It is found by collecting and organising the data in order to count the frequency of each result. The result with the highest number of occurrences is the mode of the set.
** MODE is appropriate when values are repeated in the data set.
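
Counting the frequency of each result, as described above, might look like this in Python. This is a sketch: the function name modes is hypothetical, and it returns a list because a data set can have more than one value tied for the highest frequency.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)              # frequency of each result
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes([2, 3, 3, 5, 3, 2]))   # [3]
print(modes([1, 1, 2, 2, 3]))      # [1, 2]
```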



Submitted by: Parth Mehta (2013193)

Group Members:
                               Nikita Agarwal  (2013171)
                               Nimisha Agarwal (2013173)
                               Nihal Moidu  (2013170)
                               Priyesh Bhadauriya (2013214)
