Statistics
Testing a claim
confidence intervals and two-sided tests
a two-sided confidence interval cannot be used in place of a significance test when the test is one-sided
z test for the population mean when sd is known
statistical significance
significance level
significance test
p value
test statistic
hypothesis
alternative
null
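A minimal sketch of the z test for a population mean when the population sd is known. The function name `z_test`, its parameters, and the numbers in the usage line are illustrative assumptions, not part of these notes.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for H0: mu = mu0 with known population sd sigma."""
    # Test statistic: how many standard errors the sample mean lies from mu0.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf  # standard normal N(0,1)
    if alternative == "two-sided":
        p_value = 2 * (1 - cdf(abs(z)))
    elif alternative == "greater":
        p_value = 1 - cdf(z)
    elif alternative == "less":
        p_value = cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p_value


# Hypothetical numbers: sample mean 103, H0 mean 100, sigma 15, n 25.
z, p = z_test(103, 100, 15, 25)
print(z, p)
```

The result is statistically significant at significance level alpha when the P-value is at most alpha.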
Estimating with confidence
when population sd is known
independence
normality
srs
when population sd is unknown
t distribution, one sample
confidence level C
confidence interval: statistic ± margin of error
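The "statistic plus or minus margin of error" form can be sketched for the known-sd case as follows. The name `z_interval` and the example numbers are assumptions for illustration. The unknown-sd case would use a t critical value instead, which the Python standard library does not provide, so it is left out here.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    # Critical value z* leaves area (1 - C) / 2 in each tail of N(0,1).
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical numbers: sample mean 103, sigma 15, n 25, C = 95%.
low, high = z_interval(103, 15, 25, 0.95)
print(low, high)
```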
Sampling distribution
bias and variability
variability described by spread
larger samples give smaller spread
determined by sampling design
unbiased if the mean of the sampling distribution equals the parameter (μ_x̄ = μ)
sample proportions
distribution of values taken by the statistic in all possible samples of the same size from the same population
sample mean
sampling variability
statistic
parameters
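A small simulation can illustrate the points on bias and variability: the sample mean centers on the population mean, and larger samples give a smaller spread. The population (normal with mean 50, sd 10), the number of trials, and the seed are all arbitrary assumptions.

```python
import random
from statistics import mean, stdev


def spread_of_sample_means(n, trials=2000, mu=50.0, sigma=10.0, seed=0):
    """Simulate many samples of size n; return (center, spread) of the sample means."""
    rng = random.Random(seed)
    sample_means = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_means.append(sum(sample) / n)
    return mean(sample_means), stdev(sample_means)


for n in (10, 100):
    center, spread = spread_of_sample_means(n)
    print(n, center, spread)
```

In theory the spread of the sample mean is sigma divided by the square root of n, so the simulated spread for n = 100 should be close to 1 under these assumptions.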
Binomial & Geometric distribution
geometric distribution
mean and standard deviation
P(X=n)=(1-p)^(n-1)p
binomial distribution
normal approximation
condition
representations
mean and standard deviation
formulas
B(n,p)
conditions
continuity correction
Bernoulli distribution
x=1 success, x=0 failure
two outcomes of interest
random phenomena
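The binomial and geometric formulas can be sketched in a few lines. Function names are illustrative assumptions. The geometric variable here counts the trial on which the first success occurs, and the normal approximation uses the continuity correction.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1-p)) of B(n, p)."""
    return n * p, sqrt(n * p * (1 - p))


def geom_pmf(n, p):
    """P(X = n): first success on trial n, i.e. (1-p)^(n-1) * p."""
    return (1 - p) ** (n - 1) * p


def geom_mean_sd(p):
    """Mean 1/p and standard deviation sqrt(1-p)/p of the geometric distribution."""
    return 1 / p, sqrt(1 - p) / p


def binom_cdf_normal_approx(k, n, p):
    """Approximate P(X <= k) with a normal curve, using continuity correction k + 0.5."""
    mu, sd = binom_mean_sd(n, p)
    return NormalDist(mu, sd).cdf(k + 0.5)


print(binom_pmf(2, 4, 0.5), geom_pmf(3, 0.5))
```

A common rule of thumb is to use the normal approximation only when both np and n(1-p) are at least 10.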
More about relationships between two variables
relations in categorical data
Simpson's paradox: an association that holds for all groups can reverse when the groups are combined
two way table
conditional distribution
entry/row total
entry/column total
marginal distribution
column sums
row sums
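Simpson's paradox is easier to see with numbers. The counts below are hypothetical, invented only for this sketch: treatment A has the higher success rate within each group, yet treatment B has the higher rate once the groups are combined, because the treatments were applied to the two groups in very different proportions.

```python
# Hypothetical counts: (successes, total) for each treatment within each group.
groups = {
    "group 1": {"A": (8, 10), "B": (70, 100)},
    "group 2": {"A": (20, 100), "B": (1, 10)},
}


def rate(successes, total):
    """Conditional proportion of successes."""
    return successes / total


def combined(groups, treatment):
    """Add the counts for one treatment across all groups."""
    successes = sum(g[treatment][0] for g in groups.values())
    total = sum(g[treatment][1] for g in groups.values())
    return successes, total


for name, g in groups.items():
    print(name, rate(*g["A"]), rate(*g["B"]))
print("combined", rate(*combined(groups, "A")), rate(*combined(groups, "B")))
```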
establishing causation
criteria for causation
the alleged cause precedes the effect in time
strong association
larger values of x are associated with stronger responses in y
the alleged cause is plausible
consistent association
confounding effect z~y,x?y
common response z~x,y
causation usually from experiment x~y
transforming the variable
power law model
take logarithm of both sides: ln y = ln a + p ln x
y=ax^p
exponential growth
increase by a fixed percent
ln y = ln a + x ln b
linear growth
increases by a fixed amount
y=ab^x
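Both transformations reduce to fitting a straight line after taking logarithms. The sketch below assumes positive data and uses an ordinary least squares fit; the helper names are illustrative assumptions.

```python
from math import exp, log


def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line of ys on xs."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def fit_power_law(xs, ys):
    """Fit y = a * x^p by regressing ln y on ln x; returns (a, p)."""
    intercept, slope = least_squares([log(x) for x in xs], [log(y) for y in ys])
    return exp(intercept), slope


def fit_exponential(xs, ys):
    """Fit y = a * b^x by regressing ln y on x; returns (a, b)."""
    intercept, slope = least_squares(xs, [log(y) for y in ys])
    return exp(intercept), exp(slope)


# Data generated exactly from y = 2x^3, so the fit should recover a = 2, p = 3.
xs = [1, 2, 3, 4]
print(fit_power_law(xs, [2 * x**3 for x in xs]))
```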
Examining relationships
assessing model quality
residual plot
no obvious pattern
mean of residuals is always 0
assess how well the regression line fits the data
residuals against x
coefficient of determination
lurking variable: neither x nor y, but influences the interpretation of the relationship between x and y
regression line
extrapolation: predicting outside the range of observed values of x, often not accurate
predict y
r^2: the fraction of variation in y that can be explained by the least squares regression line relating y and x
line passes through (x̄, ȳ)
correlation
as r moves away from 0 toward ±1, the linear relation gets stronger
r takes values from -1 to 1; r=±1 means a perfect straight-line relation
r measures direction and strength of linear relationship
scatterplots
different colors or symbols for categories
direction, form, strength, overall pattern, association (+,-),linear,outliers
data
explanatory variable x, response variable y
categorical or quantitative
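The correlation, regression line, residuals and coefficient of determination can all be computed from the same sums. This is a minimal sketch; the function name and the five data points are made up for illustration.

```python
from math import sqrt


def regression_summary(xs, ys):
    """Return (intercept, slope, r, residuals) for the least squares line of y on x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)          # correlation, between -1 and 1
    slope = sxy / sxx
    intercept = ybar - slope * xbar    # forces the line through (xbar, ybar)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return intercept, slope, r, residuals


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope, r, residuals = regression_summary(xs, ys)
print(intercept, slope, r**2, sum(residuals))
```

Plotting `residuals` against `xs` gives the residual plot; a good linear fit shows no obvious pattern.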
Describing location in a distribution
density curve
median: equal-areas point
mean: balance point
area under the curve
proportion
total is 1
normal distribution
standard normal distribution
N(0,1)
no shape change for linear transformation
probability density function
empirical rule, N(μ,σ)
99.7% fall within 3σ of μ
95% fall within 2σ of μ
68% fall within 1σ of μ
symmetric, unimodal and bell-shaped
assessing normality
normal probability plot, linear/straight line
proportion of observation, empirical rule
graphical display-bell shaped
measures of relative standing
percentile (less than or equal to)
z-score
Chebyshev's inequality (holds for any distribution, even a skewed one): at least 100(1-1/k^2)% of observations fall within k standard deviations of the mean
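The measures of relative standing and the empirical rule can be checked numerically. The helper names below are assumptions, and the percentile uses the "less than or equal to" definition from these notes.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def percentile(data, x):
    """Percent of observations less than or equal to x."""
    return 100 * sum(v <= x for v in data) / len(data)


def normal_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    std = NormalDist()  # N(0,1)
    return std.cdf(k) - std.cdf(-k)


def chebyshev_percent(k):
    """Chebyshev lower bound (percent within k sd), valid for any distribution."""
    return 100 * (1 - 1 / k**2)


for k in (1, 2, 3):
    print(k, normal_within(k))
print(chebyshev_percent(2))
```

The normal proportions come out near 68%, 95% and 99.7%, while Chebyshev's bound is weaker because it makes no assumption about shape.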
Exploring Data
Comparing distributions
quantitative values
side by side boxplots
back to back stemplots
categorical data: side by side bar graph
Changing unit of measure
linear transformation of x: y=a+bx
IQR |b|·IQR
standard deviation |b|s
median a+bM
mean a+bx̄
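The effect of a linear transformation y = a + bx on the summary statistics can be verified directly. The data and the Celsius-to-Fahrenheit constants (a = 32, b = 1.8) are only an illustrative assumption.

```python
from statistics import mean, median, stdev


def transform(xs, a, b):
    """Apply the linear transformation y = a + b*x to every observation."""
    return [a + b * x for x in xs]


# Hypothetical temperatures in Celsius, converted to Fahrenheit.
xs = [0, 10, 20, 30, 100]
ys = transform(xs, 32, 1.8)

print(mean(ys), 32 + 1.8 * mean(xs))      # center shifts and scales
print(median(ys), 32 + 1.8 * median(xs))  # center shifts and scales
print(stdev(ys), 1.8 * stdev(xs))         # spread only scales, by |b|
```

Adding a moves measures of center but not measures of spread; multiplying by b scales both, with spread scaled by the absolute value of b.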
Describing graphical displays
Mean and standard deviation (for symmetric distribution, free of outliers)
standard deviation
spread, outliers & skewness
always positive or 0
variance
Five number summary: box plot (for skewed distribution)
outliers: below Q1-1.5IQR or above Q3+1.5IQR
Minimum
Maximum
range, IQR
median
Q3: 75%
Q1: 25%
shape
bell shaped (inverted bell)
uniform
skewed
symmetric
mode,center,spread,clusters,gaps,outliers
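The five-number summary and the 1.5 IQR rule for outliers can be sketched as below. Quartile conventions differ between textbooks and software; this sketch assumes the quartiles are the medians of the lower and upper halves of the sorted data, excluding the overall median when the count is odd. The data are made up.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # Skip the middle observation when n is odd.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def outliers(data):
    """Observations below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(five_number_summary(data))
print(outliers(data))
```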
Display
Time plot
time on the horizontal axis
variable on vertical axis
Quantitative data
ogive
cumulative frequency
histogram
relative frequency
frequency
stem plot
trimming
splitting stem
back to back
Categorical data
dotplot
bar chart
Pie chart