1. Sampling frame:
A sampling frame (synonyms: "sample frame", "survey frame") is the actual set of units from which a sample is drawn; in practice it is usually a list of potential respondents. In a simple random sample, every unit in the sampling frame has an equal chance of being drawn and appearing in the sample. Ideally, the sampling frame coincides with the population of interest. Consider, for example, a survey that aims to estimate the number of potential customers for a new service in New York City. The research team draws 1000 numbers at random from a telephone directory for the city, makes 200 calls each day from Monday to Friday between 8am and 5pm, and asks some questions.
In this example, the population of interest is all inhabitants of the city; the sampling frame includes only those New York City dwellers who satisfy all of the following conditions:
· has a telephone;
· the telephone number is included in the directory;
· likely to be at home from 8am to 5pm from Monday to Friday;
· not a person who refuses to answer all telephone surveys.
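As a rough illustration of how far a sampling frame can fall short of the population of interest, the sketch below builds a toy population and keeps only the people who meet all four conditions. Every name and every probability in it (`has_phone`, `listed`, 0.9, 0.7 and so on) is an assumption made up for the example, not survey data.

```python
import random

def build_population(n, seed=0):
    """Create a toy population; all attribute probabilities are assumptions."""
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "has_phone": rng.random() < 0.9,
            "listed": rng.random() < 0.7,
            "home_daytime": rng.random() < 0.4,
            "answers_surveys": rng.random() < 0.6,
        }
        for i in range(n)
    ]

def in_sampling_frame(person):
    """A person can be reached by the survey only if all four conditions hold."""
    return (
        person["has_phone"]
        and person["listed"]
        and person["home_daytime"]
        and person["answers_surveys"]
    )

population = build_population(10_000)
frame = [p for p in population if in_sampling_frame(p)]
coverage = len(frame) / len(population)
print(f"frame covers {coverage:.1%} of the population")
```

With these made-up probabilities the frame covers only a small fraction of the population, which is why results from such a survey cannot simply be projected onto the whole city.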
2. Probability sampling:
Probability
samples are selected in such a way as to be representative of the population.
They provide the most valid or credible results because they reflect the
characteristics of the population from which they are selected (e.g., residents
of a particular community, students at an elementary school, etc.). There are
two types of probability samples: random and stratified.
2.1 Random sample
The
term random has a very precise meaning. Each individual in the population
of interest has an equal likelihood of selection. This is a very strict
meaning -- you can't just collect responses on the street and have a random
sample.
The
assumption of an equal chance of selection means that sources such as a
telephone book or voter registration lists are not adequate for providing a
random sample of a community. In both these cases there will be a number of
residents whose names are not listed. Telephone surveys get around this problem
by random-digit dialling -- but that assumes that everyone in the population
has a telephone. The key to random selection is that there is no bias involved
in the selection of the sample. Any variation between the sample
characteristics and the population characteristics is only a matter of chance.
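A minimal sketch of equal-chance selection, assuming a complete list of the population is available. The `residents` list and the helper name `simple_random_sample` are hypothetical; `random.sample` from the Python standard library draws without replacement, so each listed unit is equally likely to be selected.

```python
import random

def simple_random_sample(frame, k, seed=None):
    """Draw k distinct units from the frame, each with equal probability."""
    rng = random.Random(seed)
    return rng.sample(frame, k)

# Hypothetical complete list of the population of interest.
residents = [f"resident_{i}" for i in range(1000)]
sample = simple_random_sample(residents, 50, seed=42)
print(sample[:5])
```

Fixing the seed only makes the draw reproducible; it does not change the fact that every resident has the same chance of being selected.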
2.2 Stratified sample
3. Convenience sampling:
Convenience
sampling is typically only justified if the researcher wants to study the
characteristics of people passing by the street corner at a certain point in
time, for example. It can also be used if other sampling methods are not
possible. The researcher must also take care not to use results from a
convenience sample to generalize to a wider population.
4. Cluster sampling:
Cluster sampling
is a sampling technique where the entire population is divided into groups, or
clusters, and a random sample of these clusters is selected. All observations
in the selected clusters are included in the sample.
Cluster
sampling is typically used when the researcher cannot get a complete list of
the members of a population they wish to study but can get a complete list of
groups or 'clusters' of the population. It is also used when a random sample
would produce a list of subjects so widely scattered that surveying them would
prove to be far too expensive, for example, people who live in different postal
districts in the UK.
This
sampling technique may well be more practical and/or economical than simple
random sampling or stratified sampling.
Example:
Suppose
that the Department of Agriculture wishes to investigate the use of pesticides
by farmers in England. A cluster sample could be taken by identifying the
different counties in England as clusters. A sample of these counties
(clusters) would then be chosen at random, and all farmers in the selected
counties would be included in the sample. Visiting several farmers in the
same county is easier than travelling to each farm in a random sample to
observe the use of pesticides.
Psychology:
A
psychologist wants to explore levels of stress in farmers in England around the
time of the foot and mouth outbreak. A cluster sample could be taken by
identifying the different counties in England as clusters. A sample of these
counties (clusters) would then be chosen at random, and all farmers in the
selected counties would be included in the sample. Visiting several farmers
in the same county is easier than travelling to each farm in a random sample
to observe the levels of stress in farmers.
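The county-based designs in the two examples above can be sketched as follows. The county and farm names are placeholders generated for the illustration, and `cluster_sample` is a hypothetical helper: it picks whole clusters at random and then includes every unit inside each chosen cluster.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters clusters at random and return all units inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Placeholder data: 10 counties with between 5 and 14 farms each.
farms_by_county = {
    f"county_{c}": [f"farm_{c}_{i}" for i in range(5 + c)]
    for c in range(10)
}

chosen, units = cluster_sample(farms_by_county, 3, seed=1)
print(chosen, len(units))
```

Only the list of counties is needed to draw the sample; the full list of farms matters only inside the counties that were selected.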
5. Snowball sampling:
A snowball sample is a non-probability sampling technique that is appropriate
to use in research when the members of a population are difficult to locate. A
snowball sample is one in which the researcher collects data on the few members
of the target population he or she can locate, then asks those individuals to
provide information needed to locate other members of that population whom they
know.
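A minimal sketch of the referral process, assuming the researcher records who names whom in a simple dictionary. The `referrals` data and the `snowball_sample` helper are hypothetical; the sample grows wave by wave from the initially located members until a size limit is reached.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Grow a sample by following referrals outward from the seed members."""
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)
        # Ask this person for contacts; people with no entry refer nobody.
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sampled

# Hypothetical referral data: each person lists the members they know.
referrals = {
    "p1": ["p2", "p3"],
    "p2": ["p4"],
    "p3": ["p4", "p5"],
    "p5": ["p6"],
}
wave = snowball_sample(referrals, seeds=["p1"], max_size=5)
print(wave)
```

Because inclusion depends on who knows whom, members with no connection to the seeds can never enter the sample, which is one reason the technique is non-probabilistic.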
6. Benford's law:
A phenomenological law also called the first digit law, first
digit phenomenon, or leading digit phenomenon. Benford's law states that in
listings, tables of statistics, etc., the digit 1 tends to occur as the leading digit with probability of about 30%, much greater than the
expected 11.1% (i.e., one digit out of 9). Benford's law can be observed, for
instance, by examining tables of logarithms
and noting that the first pages are much more worn and smudged than
later pages (Newcomb 1881). While Benford's law unquestionably applies to many
situations in the real world, a satisfactory explanation has been given only
recently through the work of Hill (1998).
Benford's
law was used by the character Charlie Eppes as an analogy to help solve a
series of high burglaries in the Season 2 "The Running Man" episode (2006) of the television
crime drama NUMB3RS.
Benford's
law applies to data that are not dimensionless, so the numerical values
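of the data depend on the units. If there exists a universal probability
distribution $P(x)$ over such numbers, then
it must be invariant under a change of scale, so
$P(kx) = f(k)\,P(x)$. (1)
Normalization of $P$ requires $f(k) = 1/k$; differentiating (1) with respect to $k$ and setting $k = 1$ then gives
$x\,P'(x) = -P(x)$, (2)
having
solution $P(x) = 1/x$. Although this is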
not a proper probability distribution (since it diverges), both the laws of
physics and human convention impose cutoffs. For example, randomly selected
street addresses obey something close to Benford's law.
If many
powers of 10 lie between the cutoffs, then the probability that the first
(decimal) digit is $D$ is given by the logarithmic
distribution
$P_D = \log_{10}(1 + 1/D)$
for $D = 1, \ldots, 9$, tabulated below.
D | P_D | D | P_D
1 | 0.30103 | 6 | 0.0669468
2 | 0.176091 | 7 | 0.0579919
3 | 0.124939 | 8 | 0.0511525
4 | 0.09691 | 9 | 0.0457575
5 | 0.0791812 | |
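The tabulated values can be reproduced with a few lines of code, assuming the logarithmic distribution takes its standard form, P_D = log10(1 + 1/D). The function name `benford_probability` is chosen for this sketch only.

```python
import math

def benford_probability(d):
    """Probability that the leading decimal digit equals d (1 to 9)."""
    return math.log10(1 + 1 / d)

table = {d: benford_probability(d) for d in range(1, 10)}
for d, p in table.items():
    print(d, f"{p:.6f}")
```

The nine probabilities sum to 1 because the product of the terms (1 + 1/D) for D = 1, ..., 9 telescopes to 10.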
However,
Benford's law applies not only to scale-invariant data, but also to numbers
chosen from a variety of different sources. Explaining this fact requires a
more rigorous investigation of central limit-like theorems for the mantissas of random variables under multiplication.
As the number of variables increases, the density function approaches that of
the above logarithmic distribution. Hill (1998) rigorously demonstrated that
the "distribution of distributions" given by random samples taken
from a variety of different distributions is, in fact, Benford's law
(Matthews).
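The multiplication argument can be checked numerically with a small simulation. This is only an illustrative sketch: the choice of factors drawn uniformly from [1, 10], the sample sizes, and the helper names are all assumptions made for the example, not part of the cited results.

```python
import math
import random

def leading_digit(x):
    """First digit of a positive number once written in scientific notation."""
    return int(f"{x:e}"[0])

def digit_frequencies(n_samples, n_factors, seed=0):
    """Leading-digit frequencies of products of n_factors uniform variables."""
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(n_samples):
        product = 1.0
        for _ in range(n_factors):
            product *= rng.uniform(1.0, 10.0)
        counts[leading_digit(product)] += 1
    return {d: counts[d] / n_samples for d in range(1, 10)}

single = digit_frequencies(20_000, 1, seed=1)
many = digit_frequencies(20_000, 20, seed=1)
for d in range(1, 10):
    print(d, f"{single[d]:.3f}", f"{many[d]:.3f}", f"{math.log10(1 + 1 / d):.3f}")
```

With a single factor the leading digits are roughly equally likely, while products of many factors come close to the logarithmic distribution, in line with the limiting behaviour described above.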
Written By: Nikita Agarwal 2013171
Group Members:
Priyesh Bhadauriya
Nihal Moidu
Parth Mehta
Nimisha Agarwal