by Chengxu Wang, 15 years ago

Statistic2

These notes cover the statistical analysis of relationships between variables, focusing on correlation and regression techniques. The correlation r measures the direction and strength of a linear relationship and always takes a value between -1 and +1.

Statistic

Testing a claim

confidence intervals and two-sided tests
a confidence interval cannot be used in place of a significance test for a one-sided test
z test for the population mean when the population sd is known
statistical significance
significance level
significance test
p value
test statistics
hypothesis

alternative

null
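
As a rough illustration of the z test for a population mean with known sd, the sketch below computes the test statistic and the p value and compares the p value with a significance level. The function name `z_test` and all numbers are made up for this example; only the Python standard library is assumed.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """One-sample z test for a population mean when sigma is known."""
    # Test statistic: how many standard errors the sample mean lies from mu0.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf  # standard normal N(0, 1)
    if alternative == "two-sided":
        p_value = 2 * (1 - cdf(abs(z)))
    elif alternative == "greater":
        p_value = 1 - cdf(z)
    elif alternative == "less":
        p_value = cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p_value


# Made-up numbers: H0: mu = 50 vs Ha: mu != 50, sigma = 10, n = 25, x-bar = 52.
z, p_value = z_test(52, 50, 10, 25)
alpha = 0.05  # significance level
print(f"z = {z:.2f}, p value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

The result is statistically significant at level alpha only when the p value is below alpha.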

Estimating with confidence

when population sd is known

independence

normality

srs

when population sd is unknown
t distribution, one sample
confidence level C
confidence interval: statistic ± margin of error
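
The "statistic ± margin of error" form can be sketched for the known-sd case as follows. The helper name `z_interval` and the numbers are hypothetical; the unknown-sd case would use a t critical value instead, which this sketch does not cover.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population sd is known."""
    # Critical value z*: leaves (1 - C) / 2 in each tail of N(0, 1).
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_star * sigma / sqrt(n)  # margin of error
    return sample_mean - margin, sample_mean + margin


# Made-up numbers: x-bar = 52, sigma = 10, n = 25, confidence level C = 0.95.
low, high = z_interval(52, 10, 25, confidence=0.95)
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")
```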

Sampling distribution

bias and variability
variability described by spread

larger samples give smaller spread

determined by sampling design

unbiased if the mean of the sampling distribution equals the parameter (e.g. the mean of x̄ equals μ)
sample proportions
distribution of values taken by the statistic in all possible samples of the same size from the same population
sample mean
sampling variability
statistic
parameters
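
A small simulation can make the ideas of sampling variability, bias, and spread concrete. This is only a sketch: the population (Uniform(0, 1), so μ = 0.5), the seed, the sample sizes, and the helper name `sample_means` are all assumptions chosen for illustration.

```python
import random
from statistics import mean, pstdev


def sample_means(draw, n, repetitions, rng):
    """Simulate the sampling distribution of the sample mean for samples of size n."""
    return [mean([draw(rng) for _ in range(n)]) for _ in range(repetitions)]


rng = random.Random(0)          # fixed seed so the run is repeatable
draw = lambda r: r.random()     # assumed population: Uniform(0, 1), mu = 0.5

means_10 = sample_means(draw, 10, 2000, rng)
means_100 = sample_means(draw, 100, 2000, rng)

# Center: both simulated distributions sit near mu (the sample mean is unbiased).
# Spread: the larger sample size gives the smaller spread.
print("n = 10 :", round(mean(means_10), 3), round(pstdev(means_10), 3))
print("n = 100:", round(mean(means_100), 3), round(pstdev(means_100), 3))
```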

Binomial & Geometric distribution

geometric distribution
mean and standard deviation
P(X = n) = (1-p)^(n-1) p
binomial distribution
normal approximation

condition

representations

mean and standard deviation
formulas
B(n,p)
conditions
continuity correction
Bernoulli distribution
x = 1 for success, x = 0 for failure
two outcomes of interest
random phenomena
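
The formulas in this branch can be sketched in a few lines. The function names are made up for this example, and the normal approximation is shown with the continuity correction for P(X ≤ k) under the assumption that n is large enough for the approximation to be reasonable.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), summed exactly."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


def geometric_pmf(n, p):
    """P(X = n): the first success occurs on trial n."""
    return (1 - p) ** (n - 1) * p


def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k) with continuity correction."""
    mu = n * p                       # binomial mean
    sigma = sqrt(n * p * (1 - p))    # binomial standard deviation
    return NormalDist(mu, sigma).cdf(k + 0.5)


print(binomial_pmf(2, 4, 0.5))       # two successes in four fair trials
print(geometric_pmf(3, 0.5))         # first success on the third trial
print(binomial_cdf(55, 100, 0.5), normal_approx_cdf(55, 100, 0.5))
```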

More about relationships between two variables

relations in categorical data
Simpson's paradox: an association that holds for all groups can reverse direction when the groups are combined
two way table

conditional distribution

entry/row total

entry/column total

marginal distribution

column sums

row sums
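
The marginal and conditional distributions of a two-way table can be computed as below. The table, its labels, and the helper names are invented purely for illustration.

```python
# Hypothetical two-way table of counts: table[row][column].
table = {
    "row A": {"col 1": 30, "col 2": 20},
    "row B": {"col 1": 10, "col 2": 40},
}


def row_totals(t):
    """Row sums, used for the marginal distribution of the row variable."""
    return {r: sum(cols.values()) for r, cols in t.items()}


def column_totals(t):
    """Column sums, used for the marginal distribution of the column variable."""
    totals = {}
    for cols in t.values():
        for c, count in cols.items():
            totals[c] = totals.get(c, 0) + count
    return totals


def conditional_on_row(t, row):
    """Conditional distribution of the column variable: entry / row total."""
    total = sum(t[row].values())
    return {c: count / total for c, count in t[row].items()}


def conditional_on_column(t, column):
    """Conditional distribution of the row variable: entry / column total."""
    total = column_totals(t)[column]
    return {r: cols[column] / total for r, cols in t.items()}


grand_total = sum(row_totals(table).values())
marginal_rows = {r: s / grand_total for r, s in row_totals(table).items()}
print(marginal_rows)
print(conditional_on_row(table, "row A"))
print(conditional_on_column(table, "col 1"))
```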

establishing causation
criteria for causation

the alleged cause precedes the effect in time

strong association

larger values of x are associated with stronger responses in y

the alleged cause is plausible

consistent association

confounding effect z~y,x?y
common response z~x,y
causation usually from experiment x~y
transforming the variable
power law model

take the logarithm of both sides: ln y = ln a + p ln x

y=ax^p

exponential growth

increase by a fixed percent

ln y = ln a + x ln b

linear growth

increases by a fixed amount

y=ab^x
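
Both transformations turn a curved model into a straight line that least squares can fit. The sketch below assumes noise-free, made-up data generated from y = 2x^1.5 and y = 3(1.2)^x, and uses hypothetical helper names; real data would not recover the constants exactly.

```python
from math import exp, log


def fit_line(xs, ys):
    """Least-squares line ys ~ intercept + slope * xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_power_law(xs, ys):
    """Fit y = a * x^p via ln y = ln a + p ln x. Returns (a, p)."""
    intercept, slope = fit_line([log(x) for x in xs], [log(y) for y in ys])
    return exp(intercept), slope


def fit_exponential(xs, ys):
    """Fit y = a * b^x via ln y = ln a + x ln b. Returns (a, b)."""
    intercept, slope = fit_line(xs, [log(y) for y in ys])
    return exp(intercept), exp(slope)


xs = [1, 2, 3, 4, 5]
a_pow, p = fit_power_law(xs, [2 * x**1.5 for x in xs])
a_exp, b = fit_exponential(xs, [3 * 1.2**x for x in xs])
print(a_pow, p)   # close to 2 and 1.5
print(a_exp, b)   # close to 3 and 1.2
```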

Examining relationships

assessing model quality
residual plot

no obvious pattern

mean of residuals is always 0

assess how well the regression line fits the data

residuals plotted against x (or against the fitted values)

coefficient of determination
lurking variable: neither x nor y, but influences the interpretation of the relationship between x and y
regression line
extrapolation: predicting outside the range of observed x values; not reliable
predict y

r^2: the fraction of the variation in y that is explained by the least-squares regression line relating y and x

line passes through (x̄, ȳ)

correlation
as r moves away from 0 toward ±1, the linear relationship gets stronger
r always lies between -1 and 1; r = ±1 means a perfect straight-line relationship
r measures the direction and strength of a linear relationship
scatterplots
different colors or symbols for categories
direction, form, strength, overall pattern, association (+, -), linear, outliers
data
explanatory variable x, response variable y
categorical or quantitative
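
The quantities in this branch (r, the least-squares line, r^2, and the residuals) can be computed by hand as in the following sketch. The data and the function name `least_squares` are made up for illustration.

```python
from math import sqrt


def least_squares(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)              # correlation
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar      # forces the line through (x-bar, y-bar)
    return intercept, slope, r


# Made-up data: x is the explanatory variable, y is the response.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope, r = least_squares(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(f"y-hat = {intercept:.2f} + {slope:.2f} x")
print(f"r = {r:.4f}, r^2 = {r**2:.2f}")
print("sum of residuals:", round(sum(residuals), 10))
```

For a residual plot, the `residuals` list would be plotted against `xs` and checked for any obvious pattern.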

Describing location in a distribution

density curve
median: equal-areas point
mean: balance point
area under the curve

proportion

total is 1

normal distribution
standard normal distribution

N(0,1)

no shape change for linear transformation

probability density function
empirical rule, N(μ, σ)

99.7% fall within 3σ of μ

95% fall within 2σ of μ

68% fall within 1σ of μ

symmetric, unimodal, and bell-shaped
assessing normality
normal probability plot, linear/straight line
proportion of observation, empirical rule
graphical display-bell shaped
measures of relative standing
percentile (less than or equal to)
z-score
Chebyshev's inequality (the distribution may be skewed): at least 100(1 - 1/k^2)% of observations lie within k standard deviations of the mean
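
The measures of relative standing and the normal-curve percentages can be checked numerically. This is a sketch with hypothetical helper names and made-up inputs; the percentile uses the "less than or equal to" definition from the outline.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def percentile(data, x):
    """Percent of observations less than or equal to x."""
    return 100 * sum(v <= x for v in data) / len(data)


def chebyshev_percent(k):
    """Chebyshev lower bound: percent of observations within k sd of the mean."""
    return 100 * (1 - 1 / k**2)


# Area under the standard normal curve N(0, 1) within k sd of the mean.
std_normal = NormalDist()
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

print(z_score(130, 100, 15))
print(percentile([1, 2, 3, 4, 5], 3))
print({k: round(100 * area, 1) for k, area in within.items()})
print(chebyshev_percent(2), "percent at least, for any distribution")
```

Chebyshev's bound is much weaker than the normal-curve percentages because it makes no assumption about the shape of the distribution.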

Exploring Data

Comparing distributions
quantitative values

side by side boxplots

back to back stemplots

categorical data: side by side bar graph
Changing unit of measure
linear transformation of x: y = a + bx

IQR: |b| × IQR

standard deviation: |b|s

median: a + bM

mean: a + bx̄
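
The effect of a linear transformation y = a + bx on the summary statistics can be verified on a small data set. The example below assumes a Celsius-to-Fahrenheit conversion (a = 32, b = 1.8) on made-up temperatures, and computes quartiles as medians of the lower and upper halves, which is one of several common conventions.

```python
from math import isclose
from statistics import mean, median, stdev


def iqr(data):
    """IQR with quartiles taken as medians of the lower and upper halves."""
    s = sorted(data)
    half = len(s) // 2
    return median(s[-half:]) - median(s[:half])


a, b = 32, 1.8                       # assumed transformation: F = 32 + 1.8 C
xs = [0, 10, 20, 30, 40]             # made-up temperatures in Celsius
ys = [a + b * x for x in xs]         # the same temperatures in Fahrenheit

# Measures of center are shifted and scaled: a + b * (center of x).
print(mean(ys), a + b * mean(xs))
print(median(ys), a + b * median(xs))
# Measures of spread are only scaled: |b| * (spread of x); a has no effect.
print(stdev(ys), abs(b) * stdev(xs))
print(iqr(ys), abs(b) * iqr(xs))
```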

Describing graphical displays
Mean and standard deviation (for symmetric distribution, free of outliers)

standard deviation

spread, outliers & skewness

always positive or 0

variance

Five number summary: box plot (for skewed distribution)

outliers: below Q1 - 1.5 IQR or above Q3 + 1.5 IQR

Minimum

Maximum

range, IQR

median

Q3: 75%

Q1: 25%

shape

bell shaped (inverted bell)

uniform

skewed

symmetric

mode, center, spread, clusters, gaps, outliers
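
The five number summary and the 1.5 IQR rule for outliers can be sketched as follows. The data are made up, the function names are hypothetical, and quartiles are taken as medians of the lower and upper halves; other quartile conventions give slightly different values.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    s = sorted(data)
    half = len(s) // 2               # the middle value is excluded when n is odd
    return s[0], median(s[:half]), median(s), median(s[-half:]), s[-1]


def outliers(data):
    """Observations below Q1 - 1.5 IQR or above Q3 + 1.5 IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in data if v < low_fence or v > high_fence]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]   # made-up, right-skewed data
summary = five_number_summary(data)
flagged = outliers(data)
print("five number summary:", summary)
print("outliers:", flagged)
```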
Display
Time plot

time on the horizontal axis

variable on vertical axis

Quantitative data

ogive

cumulative frequency

histogram

relative frequency

frequency

stem plot

trimming

splitting stem

back to back

Categorical data

dotplot

bar chart

Pie chart