Applied Business Statistics: Session 13 and 14 (14 August 2013)

1. Sampling frame:

A sampling frame (synonyms: "sample frame", "survey frame") is the list of units, such as respondents, from which a sample is actually drawn. In a simple random sample, every unit in the sampling frame has an equal chance of being drawn and appearing in the sample. Ideally, the sampling frame coincides with the population of interest.
Consider, for example, a survey that aims to estimate the number of potential customers for a new service among the population of New York City. The research team draws 1000 numbers at random from a telephone directory for the city, makes 200 calls each day from Monday to Friday between 8am and 5pm, and asks some questions.
In this example, the population of interest is all inhabitants of the city; the sampling frame includes only those New York City dwellers who satisfy all of the following conditions:

· has a telephone;

· has a telephone number that is listed in the directory;

· is likely to be at home from 8am to 5pm, Monday to Friday;

· does not refuse to answer telephone surveys.
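
The gap between the population and the sampling frame can be illustrated with a small simulation. This is only a sketch: the four attribute rates below are invented for illustration and are not estimates for New York City, and the names `build_population` and `in_frame` are hypothetical helpers.

```python
import random


def build_population(n, seed=0):
    """Create n hypothetical residents; every rate here is an assumption."""
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "has_phone": rng.random() < 0.90,        # assumed rate
            "listed": rng.random() < 0.70,           # assumed rate
            "home_daytime": rng.random() < 0.40,     # assumed rate
            "answers_surveys": rng.random() < 0.60,  # assumed rate
        }
        for i in range(n)
    ]


def in_frame(person):
    """A resident is in the effective frame only if all four conditions hold."""
    return (
        person["has_phone"]
        and person["listed"]
        and person["home_daytime"]
        and person["answers_surveys"]
    )


population = build_population(10_000)
frame = [p for p in population if in_frame(p)]
coverage = len(frame) / len(population)
print(f"The frame covers {coverage:.1%} of the population")
```

Under these made-up rates the frame reaches only a small fraction of residents, which is why results from the survey cannot be assumed to describe the whole city.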

2. Probability sampling:

Probability samples are selected so as to be representative of the population. They give the most valid or credible results because they reflect the characteristics of the population from which they are drawn (e.g., residents of a particular community or students at an elementary school). Two types of probability sample are described here: random and stratified.

2.1 Random sample

The term random has a very precise meaning. Each individual in the population of interest has an equal likelihood of selection. This is a very strict meaning -- you can't just collect responses on the street and have a random sample.

The assumption of an equal chance of selection means that sources such as a telephone book or voter registration lists are not adequate for providing a random sample of a community. In both these cases there will be a number of residents whose names are not listed. Telephone surveys get around this problem by random-digit dialling -- but that assumes that everyone in the population has a telephone. The key to random selection is that there is no bias involved in the selection of the sample. Any variation between the sample characteristics and the population characteristics is only a matter of chance.
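
As a minimal sketch of equal-chance selection, the snippet below draws a simple random sample without replacement using Python's standard `random` module. The list of residents is a hypothetical stand-in for a complete list of the population, which is exactly the thing that is hard to obtain in practice.

```python
import random


def simple_random_sample(units, k, seed=None):
    """Draw k distinct units so that every unit is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(units, k)


# Hypothetical complete list of the population of interest.
residents = [f"resident_{i}" for i in range(1, 10_001)]

sample = simple_random_sample(residents, 1000, seed=42)
print(len(sample), "units drawn,", len(set(sample)), "distinct")
```

Fixing the seed only makes the draw reproducible; it does not change the fact that each resident has the same chance of selection.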

2.2 Stratified sample
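
A stratified sample is usually described as splitting the population into non-overlapping groups (strata) and then drawing a random sample within each group. The sketch below assumes proportional allocation (the same sampling fraction in every stratum) and a hypothetical list of elementary school students grouped by grade; neither detail comes from the session notes.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Sample the same fraction of units at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = max(1, round(fraction * len(members)))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical school: 500 students spread evenly over grades 1 to 5.
students = [{"id": i, "grade": 1 + i % 5} for i in range(500)]

chosen = stratified_sample(students, lambda s: s["grade"], 0.10, seed=1)
per_grade = {g: sum(1 for s in chosen if s["grade"] == g) for g in range(1, 6)}
print(per_grade)
```

Because every grade contributes in proportion to its size, no grade can be missed by chance, which is the main practical difference from a simple random sample.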

3. Convenience sampling: Convenience sampling is typically only justified if the researcher wants to study, for example, the characteristics of people passing a street corner at a certain point in time. It can also be used if other sampling methods are not possible. The researcher must take care not to generalize results from a convenience sample to a wider population.

4. Cluster sampling: Cluster sampling is a sampling technique where the entire population is divided into groups, or clusters, and a random sample of these clusters is selected. All observations in the selected clusters are included in the sample.

Cluster sampling is typically used when the researcher cannot get a complete list of the members of a population they wish to study but can get a complete list of groups or 'clusters' of the population. It is also used when a random sample would produce a list of subjects so widely scattered that surveying them would prove to be far too expensive, for example, people who live in different postal districts in the UK.

This sampling technique may well be more practical and/or economical than simple random sampling or stratified sampling.

Example:

Suppose that the Department of Agriculture wishes to investigate the use of pesticides by farmers in England. A cluster sample could be taken by treating the counties of England as clusters. A sample of these counties would then be chosen at random, and all farmers in the selected counties would be included in the sample. Visiting several farmers in the same county is easier than travelling to each farm in a simple random sample to observe the use of pesticides.

Psychology:

A psychologist wants to explore levels of stress in farmers in England around the time of the foot and mouth outbreak. A cluster sample could be taken by treating the counties of England as clusters. A sample of these counties would then be chosen at random, and all farmers in the selected counties would be included in the sample. Again, visiting several farmers in the same county is easier than travelling to each farm in a simple random sample to observe the farmers' stress levels.
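
The two examples follow the same procedure, which can be sketched as below. The county names and farm lists are placeholders invented for illustration, and `cluster_sample` is a hypothetical helper: it picks whole clusters at random and then keeps every unit inside them.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters clusters at random and return all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [unit for name in chosen for unit in clusters[name]]
    return chosen, members


# Placeholder data: counties are the clusters, farms are the units.
farms_by_county = {
    "County A": ["farm A1", "farm A2", "farm A3"],
    "County B": ["farm B1", "farm B2"],
    "County C": ["farm C1", "farm C2", "farm C3", "farm C4"],
    "County D": ["farm D1"],
}

chosen_counties, farms = cluster_sample(farms_by_county, 2, seed=7)
print(chosen_counties)
print(farms)
```

Only the list of counties is needed to make the random draw; a complete list of farms is required only for the counties that were selected.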

5. Snowball sampling: A snowball sample is a non-probability sampling technique that is appropriate when the members of a population are difficult to locate. The researcher collects data from the few members of the target population he or she can locate, then asks those individuals for the information needed to locate other members of that population whom they know.
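
The referral process can be sketched as a breadth-first walk over a "who knows whom" mapping. The participants `P1` to `P6` and their referrals are made up for illustration, and `snowball_sample` is a hypothetical helper, not a standard library function.

```python
from collections import deque


def snowball_sample(seeds, referrals, max_size):
    """Start from the seed participants and follow referrals wave by wave."""
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)
        # Ask this participant for other members of the population.
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sampled


# Hypothetical referral data: each participant names the people they know.
referrals = {
    "P1": ["P2", "P3"],
    "P2": ["P4"],
    "P3": ["P4", "P5"],
    "P4": [],
    "P5": ["P1"],
    "P6": ["P1"],
}

wave = snowball_sample(["P1"], referrals, max_size=10)
print(wave)
```

In this toy data `P6` is never named by anyone, so the sample never reaches that person. This illustrates why a snowball sample is a non-probability sample: who gets in depends on the social ties of the starting participants.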

6. Benford's law: A phenomenological law also called the first digit law, first digit phenomenon, or leading digit phenomenon. Benford's law states that in listings, tables of statistics, etc., the digit 1 tends to occur as the leading digit with a probability of about 30%, much greater than the expected 11.1% (i.e., one digit out of 9). Benford's law can be observed, for instance, by examining tables of logarithms and noting that the first pages are much more worn and smudged than later pages (Newcomb 1881). While Benford's law unquestionably applies to many situations in the real world, a satisfactory explanation has been given only recently through the work of Hill (1998).

Benford's law was used by the character Charlie Eppes as an analogy to help solve a series of high burglaries in the Season 2 "The Running Man" episode (2006) of the television crime drama NUMB3RS.

Benford's law applies to data that are not dimensionless, so the numerical values of the data depend on the units. If there exists a universal probability distribution $P(x)$ over such numbers, then it must be invariant under a change of scale, so

$$P(kx) = f(k)\,P(x). \tag{1}$$

If $\int P(x)\,dx = 1$, then $\int P(kx)\,dx = 1/k$, and normalization implies $f(k) = 1/k$. Differentiating with respect to $k$ and setting $k = 1$ gives

$$x\,P'(x) = -P(x), \tag{2}$$

having solution $P(x) = 1/x$. Although this is not a proper probability distribution (since it diverges), both the laws of physics and human convention impose cutoffs. For example, randomly selected street addresses obey something close to Benford's law.

If many powers of 10 lie between the cutoffs, then the probability that the first (decimal) digit is $D$ is given by a logarithmic distribution

$$P_D = \frac{\int_D^{D+1} P(x)\,dx}{\int_1^{10} P(x)\,dx} = \log_{10}\!\left(1 + \frac{1}{D}\right) \tag{3}$$

for $D = 1, \ldots, 9$, tabulated below.

D	P_D
1	0.30103
2	0.176091
3	0.124939
4	0.09691
5	0.0791812
6	0.0669468
7	0.0579919
8	0.0511525
9	0.0457575
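
The tabulated values can be reproduced from the logarithmic form log10(1 + 1/D). The sketch below also compares them with the leading digits of the first 1000 powers of 2, a data set chosen here only as a convenient illustration; the function names are hypothetical.

```python
import math


def benford_probability(d):
    """Probability that the leading decimal digit equals d (1 to 9)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


def leading_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])


probs = {d: benford_probability(d) for d in range(1, 10)}

# Observed leading-digit frequencies for 2**0, 2**1, ..., 2**999.
n_terms = 1000
counts = {d: 0 for d in range(1, 10)}
for k in range(n_terms):
    counts[leading_digit(2 ** k)] += 1

for d in range(1, 10):
    print(d, round(probs[d], 6), counts[d] / n_terms)
```

The nine probabilities sum to 1 because the terms log10((D + 1)/D) telescope to log10(10).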

However, Benford's law applies not only to scale-invariant data, but also to numbers chosen from a variety of different sources. Explaining this fact requires a more rigorous investigation of central limit-like theorems for the mantissas of random variables under multiplication. As the number of variables increases, the density function approaches that of the above logarithmic distribution. Hill (1998) rigorously demonstrated that the "distribution of distributions" given by random samples taken from a variety of different distributions is, in fact, Benford's law (Matthews).
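
The multiplicative effect described above can be seen in a small simulation. This is an illustration under stated assumptions, not a demonstration of Hill's result: the factors are drawn uniformly from (0, 1], the number of factors (20) and trials (20000) are arbitrary choices, and the helper names are hypothetical.

```python
import math
import random


def leading_digit(x):
    """First significant decimal digit of a positive float."""
    exponent = math.floor(math.log10(x))
    mantissa = x / 10 ** exponent
    # Correct for floating-point rounding near powers of ten.
    if mantissa < 1:
        mantissa *= 10
    elif mantissa >= 10:
        mantissa /= 10
    return int(mantissa)


def digit_frequencies(n_factors, n_trials, seed=0):
    """Leading-digit frequencies of products of n_factors uniform variables."""
    rng = random.Random(seed)
    counts = {d: 0 for d in range(1, 10)}
    for _ in range(n_trials):
        product = 1.0
        for _ in range(n_factors):
            product *= 1.0 - rng.random()  # uniform on (0, 1], never zero
        counts[leading_digit(product)] += 1
    return {d: c / n_trials for d, c in counts.items()}


single = digit_frequencies(n_factors=1, n_trials=20_000)
product = digit_frequencies(n_factors=20, n_trials=20_000)

for d in range(1, 10):
    print(d, round(single[d], 3), round(product[d], 3),
          round(math.log10(1 + 1 / d), 3))
```

With a single uniform factor each leading digit appears about equally often, while the products of many factors land close to the logarithmic distribution.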

Written By: Nikita Agarwal 2013171

Group Members:

Priyesh Bhadauriya

Nihal Moidu

Parth Mehta

Nimisha Agarwal

