Statistics: gathering, organization, analysis, and presentation of numerical information

Raw Data: unprocessed information collected for a study

Variable: quantity being measured

Continuous Variable: can take any value within a given range

Discrete Variable: can take only certain separate values

Methods of Organization

Histogram: a special bar graph where areas are proportional to frequencies

When the number of measured values is large, data are usually grouped into:

Classes

Classes make tables and graphs easier to construct and interpret

It is usually convenient to use 5 to 20 equal intervals that cover the entire range

Range: the interval from the smallest to the largest value of the variable
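A minimal Python sketch of grouping raw data into equal-width classes that cover the range. The function name make_classes and the scores list are hypothetical, made up for illustration. Each class is assumed to include its lower boundary and exclude its upper one, except the last class, which also includes the largest value so that every data point is counted.

```python
def make_classes(data, num_classes):
    """Group data into num_classes equal-width classes covering the range.

    Returns a list of ((lower, upper), frequency) pairs. Each class includes
    its lower boundary; the last class also includes the maximum value.
    """
    low, high = min(data), max(data)
    if high == low:
        raise ValueError("all values are equal; the range has zero width")
    width = (high - low) / num_classes
    counts = [0] * num_classes
    for x in data:
        # Find the class that x falls in; clamp the maximum into the last class.
        i = min(int((x - low) / width), num_classes - 1)
        counts[i] += 1
    bounds = [(low + i * width, low + (i + 1) * width) for i in range(num_classes)]
    return list(zip(bounds, counts))


# Hypothetical raw data: 12 test scores
scores = [12, 15, 21, 22, 27, 30, 34, 38, 41, 45, 47, 52]
table = make_classes(scores, 5)
for (lower, upper), freq in table:
    print(f"{lower:g} to {upper:g}: {freq}")
```

Here the range runs from 12 to 52, so five classes each have a width of 8. With non-integer widths, floating-point rounding can occasionally put a value that sits exactly on a boundary into the neighbouring class.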

Intervals

Bar Graph: a chart or diagram that represents quantities with horizontal or vertical bars whose lengths are proportional to the quantities

Frequency Polygon: a graph of frequency vs. variable, formed by joining the midpoints of the tops of the histogram bars with straight line segments

Cumulative Frequency Graph: shows the running total of frequencies from the lowest values up

Relative Frequency: a table or diagram that shows the frequency of each data group as a fraction or percent of the total
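A short Python sketch of computing relative and cumulative frequencies from a list of class frequencies. The function name frequency_summary and the freqs values are hypothetical; relative frequencies are expressed as percents of the total, and cumulative frequencies are running totals starting from the lowest class.

```python
from itertools import accumulate


def frequency_summary(frequencies):
    """Return relative frequencies (as percents) and cumulative frequencies."""
    total = sum(frequencies)
    relative = [100 * f / total for f in frequencies]
    cumulative = list(accumulate(frequencies))  # running totals, lowest class first
    return relative, cumulative


# Hypothetical frequencies for five classes
freqs = [2, 3, 2, 2, 3]
relative, cumulative = frequency_summary(freqs)
print([round(r, 1) for r in relative])  # percent of the total in each class
print(cumulative)
```

The last cumulative frequency always equals the total number of values, and the relative frequencies add up to 100%.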

Categorical Data: uses labels rather than numbers to illustrate data

Examples of graphs for categorical data include circle graphs (pie charts) and pictographs

Indices: values used for summarizing data and recognizing trends

Time-series graph: used to show how indices change over time

plots variable values against time and joins the data points with straight lines.

Consumer Price Index: one of the most widely reported economic indices because it is an important measure of inflation

Inflation: a general increase in prices, which corresponds to a decrease in the value of money

Cost of Living Index: cost of maintaining a constant standard of living
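A simplified Python sketch of a fixed-basket price index (base period = 100) and of the inflation rate as the percent change in an index. Real indices such as the Consumer Price Index are produced by statistical agencies from large, weighted baskets of goods; the basket costs, index values, and function names below are hypothetical.

```python
def price_index(current_cost, base_cost):
    """Index for a fixed basket of goods, with the base period set to 100."""
    return 100 * current_cost / base_cost


def inflation_rate(earlier_index, later_index):
    """Percent change in a price index from one period to a later one."""
    return 100 * (later_index - earlier_index) / earlier_index


# Hypothetical basket: costs $200.00 in the base year and $230.00 now
index_now = price_index(230.00, 200.00)
print(index_now)  # 115.0: prices are 15% higher than in the base year

# Hypothetical index values for two consecutive years
rate = inflation_rate(110.0, 115.5)
print(rate)
```

An index above 100 means the basket costs more than it did in the base period, which corresponds to a decrease in the value of money.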

Sampling: the process of choosing specific individuals from the population being studied

Population: all individuals belonging to a group being studied

Example: A population would be all the students in your school

Sampling Frame: group of individuals who actually have a chance of being sampled

Simple Random Sample: every member of the population has an equal chance of being selected

the selection of any particular individual does not affect the chances of any other individual being chosen
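A minimal Python sketch of a simple random sample using the standard library's random.Random.sample, which picks distinct members so that every member has the same chance of being included. The function name simple_random_sample and the list of students are hypothetical; the optional seed only makes the example reproducible.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Choose n distinct members; every member has the same chance of selection."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical population: 500 students identified by number
students = [f"student_{i}" for i in range(1, 501)]
sample = simple_random_sample(students, 25, seed=1)
print(sample[:5])
```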

Systematic Sample: going through the population sequentially and selecting members at regular intervals

Interval = Population Size/Sample Size
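A Python sketch of a systematic sample built from the interval formula above. It assumes the interval is rounded down to a whole number and that the starting point is chosen at random within the first interval, which is a common choice but not stated in these notes; the function name and the customer list are hypothetical.

```python
import random


def systematic_sample(population, n, seed=None):
    """Select every k-th member after a random start, where k = N // n."""
    k = len(population) // n  # sampling interval, rounded down to a whole number
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    start = random.Random(seed).randrange(k)  # random start within the first interval
    return [population[start + i * k] for i in range(n)]


# Hypothetical population of 1000 numbered customers, sample of 50 (interval 20)
customers = list(range(1, 1001))
sample = systematic_sample(customers, 50, seed=7)
print(sample[:5])
```

With 1000 customers and a sample size of 50, the interval is 1000/50 = 20, so consecutive selections are always 20 positions apart.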

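Stratified Sample: the population is divided into groups (strata) whose members share common characteristics, and a random sample is drawn from each group, usually in proportion to the group's share of the population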

Examples of such characteristics: gender, age, or education level
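A Python sketch of a stratified sample, assuming proportional allocation: each group (stratum) receives a share of the selections proportional to its size, and members within each group are chosen by simple random sampling. The function name stratified_sample and the grade-based school data are hypothetical.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Proportional stratified sample.

    strata maps each group name to its list of members. Each group gets a
    share of the n selections proportional to its size (rounded to a whole
    number), filled by a simple random sample within the group.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    return {
        name: rng.sample(members, round(n * len(members) / total))
        for name, members in strata.items()
    }


# Hypothetical school of 1000 students stratified by grade
school = {
    "Grade 9": [f"g9_{i}" for i in range(300)],
    "Grade 10": [f"g10_{i}" for i in range(250)],
    "Grade 11": [f"g11_{i}" for i in range(250)],
    "Grade 12": [f"g12_{i}" for i in range(200)],
}
sample = stratified_sample(school, 40, seed=3)
print({grade: len(chosen) for grade, chosen in sample.items()})
```

Grade 9 makes up 30% of the school, so it receives 30% of the 40 selections (12 students). Because of rounding, the group sizes may not always add up exactly to n for other populations.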

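Cluster Sample: used when certain groups (clusters) are likely to be representative of the entire population; one or more whole groups are chosen at random and every member of the chosen groups is surveyed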
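A Python sketch of a cluster sample: whole groups are picked at random, and every member of each picked group is included. The function name cluster_sample and the homeroom data are hypothetical, and the sketch assumes each homeroom is reasonably representative of the whole school.

```python
import random


def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly pick whole clusters and include every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # random clusters, no repeats
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical school: 10 homerooms of 25 students each
homerooms = {
    f"Room {r}": [f"room{r}_student{s}" for s in range(25)] for r in range(101, 111)
}
sample = cluster_sample(homerooms, 2, seed=5)
print(sorted(sample), sum(len(m) for m in sample.values()))
```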

Multi-Stage Sample: uses several levels of random sampling

Voluntary-Response Sample: the researcher simply invites any member of the population to participate in the survey

Convenience Sample: sample is selected simply because it is easily accessible

Bias

Statistical Bias: any factor that favors certain outcomes or responses and hence systematically skews the survey results

Leading Questions: questions that prompt or encourage a desired answer

Loaded Questions: questions that contain wording or information intended to influence the respondents’ answers

Sampling Bias: occurs when the sampling frame does not reflect the characteristics of the population

Non-response Bias: occurs when particular groups are under-represented in a survey because they choose not to participate

Measurement Bias: occurs when the data collection method consistently either under- or overestimates a characteristic of the population

Response Bias: occurs when participants in a survey deliberately give false or misleading answers

Measures of Central Tendency: different ways to find values around which a set of data tends to cluster

Mean: defined as the sum of the values of a variable divided by the number of values

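Weighted Mean: a measure of central tendency that reflects the relative importance of the data; each value is multiplied by a weighting factor, and the sum of these products is divided by the sum of the weighting factors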

Median: the middle value of the data when they are ranked from highest to lowest (or lowest to highest); for an even number of values, it is the mean of the two middle values

Mode: the value that occurs most frequently in a distribution
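A Python sketch of the measures of central tendency, using statistics.mean, statistics.median, and statistics.mode from the standard library; the weighted mean is written out by hand. The marks and weights are hypothetical, and weighted_mean is a helper introduced for illustration.

```python
import statistics


def weighted_mean(values, weights):
    """Sum of each value times its weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical test marks
marks = [72, 85, 85, 90, 64, 78]
mean_mark = statistics.mean(marks)
median_mark = statistics.median(marks)  # mean of the two middle values (even count)
mode_mark = statistics.mode(marks)
print(mean_mark, median_mark, mode_mark)

# Hypothetical course mark: term work (weight 70) = 80, final exam (weight 30) = 70
course_mark = weighted_mean([80, 70], [70, 30])
print(course_mark)
```

The weighted course mark is (80 × 70 + 70 × 30) / (70 + 30) = 77, lower than the unweighted mean of 75 only if the weights favoured the exam; here the heavier weight on term work pulls the result toward 80.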

Outliers: values that are distant from the majority of the data

Measures of central tendency indicate the central values of a set of data. Often, you will also want to know how closely the data cluster around these centres

Measures of Spread

Deviation: the difference between an individual value in a set of data and the mean for the data

Quartiles: divide a set of ordered data into four groups with equal numbers of values, just as the median divides data into two equally sized groups

Interquartile Range: the range of the middle half of the data, found as the difference between the third and first quartiles

Box-and-Whisker Plot: illustrates the median, the quartiles, and the smallest and largest values of a set of data

Modified Box-and-Whisker Plot: often used when the data contain outliers

Semi-Interquartile Range: one half of the interquartile range

Percentiles: divide the data into 100 intervals that have equal numbers of values
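A Python sketch of the measures of spread. It assumes the median-of-halves method for quartiles (Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when the count is odd); other textbooks and calculators use slightly different methods. The 1.5 × IQR rule for flagging outliers is a common convention for modified box-and-whisker plots, assumed here rather than taken from these notes. The data set and the quartiles helper are hypothetical.

```python
import statistics


def quartiles(data):
    """Q1, median, Q3 by the median-of-halves method.

    Q1 and Q3 are the medians of the lower and upper halves of the ordered
    data; when the count is odd, the middle value is left out of both halves.
    """
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + len(s) % 2:]
    return statistics.median(lower), statistics.median(s), statistics.median(upper)


# Hypothetical data set with one unusually large value
data = [3, 5, 6, 7, 8, 9, 10, 11, 12, 30]

mean = statistics.mean(data)
deviations = [x - mean for x in data]  # deviation of each value from the mean
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                          # interquartile range
semi_iqr = iqr / 2                     # semi-interquartile range

# Assumed convention: flag values more than 1.5 IQRs below Q1 or above Q3
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

# 99 cut points dividing the data into 100 groups (Python's default method)
percentile_cuts = statistics.quantiles(data, n=100)

print(q1, q2, q3, iqr, semi_iqr, outliers)
```

The deviations from the mean always add up to zero (apart from rounding). Note that statistics.quantiles(data, n=4) uses a different default method and can give slightly different quartiles from the median-of-halves approach.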

Scatter Plots and Linear Correlation

Linear Correlation: a relationship in which changes in the dependent variable tend to be proportional to changes in the independent variable

Perfect Negative (or inverse) Linear Correlation: if Y decreases at a constant rate as X increases

Independent Variable: a variable that affects a dependent variable

Perfect Positive (or direct) Linear Correlation: if Y increases at a constant rate as X increases

Dependent Variable: a variable that is affected by an independent variable

Scatter Plot: shows such relationships graphically, usually with the independent variable as the horizontal axis and the dependent variable as the vertical axis

Line of Best Fit: the straight line that passes as close as possible to all of the points on a scatter plot

Regression: an analytic technique for determining the relationship between a dependent variable and an independent variable

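Linear Regression

Least-Squares Fit: an analytic method of linear regression that finds the line of best fit by minimizing the sum of the squares of the vertical distances between the data points and the line; it gives more accurate results than fitting a line by eye

Interpolation

estimating between data points

Extrapolation

estimating beyond the range of the data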
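A Python sketch of a least-squares fit, using the standard formulas slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope × x̄, followed by one interpolated and one extrapolated estimate. The study-hours data and the function name least_squares_line are hypothetical.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the line that minimizes the sum of the squared
    vertical distances from the points (x, y) to the line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours of study (x) vs. test mark (y)
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 67, 72]
m, b = least_squares_line(hours, marks)
print(f"y = {m:.2f}x + {b:.2f}")

interpolated = m * 2.5 + b  # x = 2.5 lies within the data: interpolation
extrapolated = m * 8 + b    # x = 8 lies beyond the data: extrapolation
print(round(interpolated, 2), round(extrapolated, 2))
```

The fitted line is y = 4.9x + 47.3. The extrapolated value assumes the linear trend continues beyond x = 5, which the data cannot confirm, so it is less reliable than the interpolated one.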

Non-Linear Regression

an analytical technique for finding a curve of best fit for data from non-linear relationships

Exponential Regression: produces equations with the form $y = ab^x$ or $y = ae^{kx}$

where $e \approx 2.71828$ is the base of the natural logarithm

Power Regression: the curve of best fit has an equation with the form $y = ax^b$
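A Python sketch of exponential and power regression done by taking logarithms and then fitting a straight line by least squares: ln y = ln a + x ln b for $y = ab^x$, and ln y = ln a + b ln x for $y = ax^b$. This log-transform approach is one common way to compute these fits, but it minimizes squared errors in ln y rather than in y, so results can differ slightly from a direct non-linear fit. All y values (and, for the power fit, all x values) must be positive. The function names and data are hypothetical.

```python
import math


def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def exponential_fit(xs, ys):
    """Fit y = a * b**x via a line through (x, ln y): ln y = ln a + x ln b."""
    slope, intercept = least_squares_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), math.exp(slope)  # a, b (k = ln b for y = a e^(kx))


def power_fit(xs, ys):
    """Fit y = a * x**b via a line through (ln x, ln y): ln y = ln a + b ln x."""
    slope, intercept = least_squares_line([math.log(x) for x in xs],
                                          [math.log(y) for y in ys])
    return math.exp(intercept), slope  # a, b


# Hypothetical data generated exactly from y = 3 * 2**x and y = 2 * x**3
a_exp, b_exp = exponential_fit([0, 1, 2, 3, 4], [3, 6, 12, 24, 48])
a_pow, b_pow = power_fit([1, 2, 3, 4], [2, 16, 54, 128])
print(a_exp, b_exp, a_pow, b_pow)
```

Because the sample data follow the models exactly, the fits recover a ≈ 3, b ≈ 2 and a ≈ 2, b ≈ 3 (up to floating-point rounding).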

Polynomial Regression: analytic technique used for finding the polynomial equation that best models the relationship between two variables

Cause-and-Effect Relationship: a change in X produces a change in Y

Common-Cause Factor

an external variable causes two variables to change in the same way

Reverse Cause-and-Effect Relationship

the dependent and independent variables are reversed in the process of establishing causality

Accidental Relationship

a correlation exists without any causal relationship between variables

Presumed Relationship

a correlation does not seem to be accidental even though no cause-and-effect relationship or common-cause factor is apparent

Critical Analysis

Although television networks and major newspapers are reasonably careful about how they present statistics, you should be particularly careful about accepting statistical evidence from sources that could be biased

To judge the conclusions of a study properly, you need information about its sampling and analytical methods

Statistics from some sources are sometimes flawed by unintentional or, occasionally, entirely deliberate bias

Hidden Variables and Lurking Variables

extraneous variables that are difficult to recognize

When evaluating claims based on statistical studies, you must assess the methods used for collecting and analyzing the data

- Is the sampling process free from intentional and unintentional bias?
- Could any outliers or extraneous variables influence the results?
- Are there any unusual patterns that suggest the presence of a hidden variable?
- Has causality been inferred with only correlational evidence?