Ordinary Least Squares (OLS): Simple Regression
The workhorse model: explain an outcome y with a single regressor x via the line y = β₀ + β₁x + u. Use it to summarize relationships and, under conditions spelled out below, to draw causal conclusions.
Adapted in part from the course slide deck (Chapter 2).
Model & Terms
- Dependent variable (response, regressand): y
- Independent variable (explanatory, regressor): x
- Intercept β₀, slope β₁, error term u

A one-unit change in x changes the conditional mean of y by β₁.

Causal Meaning
To read β₁ causally, we need the **zero conditional mean** / **conditional mean independence** assumption: E[u | x] = 0. That is, x contains no information about the mean of the omitted factors collected in u.

Example: schooling → wage. If education correlates with unobserved ability, then E[u|x] ≠ 0 and the simple slope is biased.
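The bias can be made concrete with a small simulation. A minimal Python sketch (the course workflow is Stata; the parameter values here, a true return of 0.5 and an ability effect of 1.0, are made up for illustration):

```python
import random

random.seed(1)

def ols_slope(x, y):
    # Simple-regression slope: sample cov(x, y) / sample var(x).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

n = 100_000
ability = [random.gauss(0, 1) for _ in range(n)]
# Education correlates with unobserved ability, so ability ends up in u.
educ = [12 + 2 * a + random.gauss(0, 1) for a in ability]
# True model (made up): wage = 5 + 0.5*educ + 1.0*ability + noise.
wage = [5 + 0.5 * e + 1.0 * a + random.gauss(0, 1)
        for e, a in zip(educ, ability)]

# Regressing wage on educ alone recovers roughly 0.9, not the true 0.5:
# omitted-variable bias = 1.0 * cov(educ, ability) / var(educ) = 2/5 = 0.4.
print(round(ols_slope(educ, wage), 2))
```

With 100,000 draws the simple slope settles very close to 0.5 + 0.4 = 0.9, illustrating how an omitted correlated factor contaminates the coefficient.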
Population Regression Function (PRF)
Under E[u|x] = 0, the conditional mean is linear: E[y|x] = β₀ + β₁x. The PRF traces the average of y at each value of x.
OLS = Least Squares
Fit the line that minimizes the **sum of squared residuals**: SSR(β₀, β₁) = ∑ᵢ (yᵢ − β₀ − β₁xᵢ)². In the fitted sample, the residuals sum to zero and are uncorrelated with x.
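In the simple-regression case the minimizers have closed forms: β̂₁ = cov(x, y)/var(x) and β̂₀ = ȳ − β̂₁x̄. A minimal Python sketch with made-up data, checking the two residual properties just stated:

```python
# Toy data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
      / sum((xi - mx) ** 2 for xi in x))
b0 = my - b1 * mx

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(round(b0, 2), round(b1, 2))                  # 0.05 1.99
print(sum(resid))                                  # ≈ 0 (floating point)
print(sum(xi * ri for xi, ri in zip(x, resid)))    # ≈ 0 (floating point)
```

The last two lines confirm the OLS first-order conditions: fitted residuals have zero mean and zero sample covariance with the regressor.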
Goodness-of-Fit
Decompose the variation in y: SST = SSE + SSR, where SSE is the explained (model) sum of squares and SSR the residual sum of squares. Then R² = SSE/SST = 1 − SSR/SST is the share of the variation in y explained by the regression. A high R² does not imply causal validity.
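The decomposition can be verified numerically. A small Python sketch (same made-up toy data convention; the names sst/sse/ssr follow the text):

```python
# Toy data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
      / sum((xi - mx) ** 2 for xi in x))
b0 = my - b1 * mx
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - my) ** 2 for yi in y)                  # total variation
sse = sum((yh - my) ** 2 for yh in yhat)               # explained
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # residual

print(sst - (sse + ssr))        # ≈ 0: SST = SSE + SSR
print(round(sse / sst, 4))      # R², identical to 1 - ssr/sst
```

Note that R² near 1 here reflects only how tightly the toy points hug the line, not anything about causality.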
Functional Forms
- Semi-log (log-level): log y = β₀ + β₁x → 100·β₁ ≈ %Δy per one-unit change in x (exact: 100·(e^β₁ − 1)).
- Log-log: log y = β₀ + β₁ log x → β₁ is an elasticity (%Δy per 1% Δx).
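The approximation behind the log-level reading is easy to check. A quick Python sketch (the slope value 0.08 is made up):

```python
import math

b1 = 0.08                           # hypothetical log-level slope
approx = 100 * b1                   # quick reading: ~8% per unit of x
exact = 100 * (math.exp(b1) - 1)    # exact proportional change
print(round(approx, 2), round(exact, 2))   # 8.0 8.33
```

The gap between approximate and exact readings grows with |β₁|, which is why the 100·β₁ shortcut is reserved for small coefficients.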
Applied Examples
- CEO pay vs. firm performance (low R² can still matter).
- Wage vs. education (returns to schooling; beware ability bias).
- Consumption vs. income (high R²; not necessarily causal).
Standard Assumptions (SLR.1–SLR.5)
- SLR.1 Linearity in parameters
- SLR.2 Random sampling
- SLR.3 Sample variation in x
- SLR.4 Zero conditional mean: E[u|x] = 0 (⇒ unbiasedness)
- SLR.5 Homoskedasticity: Var(u|x) = σ² (needed for the classic SE formulas)

If heteroskedasticity is present (Var(u|x) varies with x), use **robust SEs**.
Sampling Variability & SEs
OLS estimates vary across samples. Their sampling variance increases with the noise in u and decreases with more variation in x and with larger n. Robust standard errors quantify this precision without assuming homoskedasticity.
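A Monte Carlo sketch in Python of the claim about n (the design, a true slope of 2 with standard normal x and noise, is made up; under it theory gives sd(β̂₁) ≈ 1/√n):

```python
import random
import statistics

random.seed(0)

def ols_slope(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def slope_spread(n, reps=500):
    # Refit y = 1 + 2x + u on fresh samples of size n; return the
    # standard deviation of the slope estimates across replications.
    ests = []
    for _ in range(reps):
        x = [random.gauss(0, 1) for _ in range(n)]
        y = [1 + 2 * xi + random.gauss(0, 1) for xi in x]
        ests.append(ols_slope(x, y))
    return statistics.stdev(ests)

# Expect roughly 0.20 at n=25 and 0.05 at n=400: quadrupling the
# information shrinks the sampling spread.
print(round(slope_spread(25), 2), round(slope_spread(400), 2))
```

The simulated spreads track the 1/√n rate, which is the sense in which more data and more regressor variation buy precision.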
Tiny Worked Example
Suppose regressing hourly wage on years of education yields wagê = 5.1 + 0.54·educ, with a robust SE of 0.20 on the slope.
- Interpretation: one more year of education is associated with +0.54 in hourly wage, on average.
- 95% CI: 0.54 ± 1.96 × 0.20 = [0.15, 0.93].
- Prediction at educ = 12: 5.1 + 0.54 × 12 = 11.58 (add prediction intervals in practice).
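The same arithmetic in Python (numbers copied from the worked example):

```python
b0, b1, se = 5.1, 0.54, 0.20   # fitted intercept, slope, robust SE

# 95% confidence interval for the slope.
lo, hi = b1 - 1.96 * se, b1 + 1.96 * se
print(round(lo, 2), round(hi, 2))     # 0.15 0.93

# Point prediction at educ = 12.
print(round(b0 + b1 * 12, 2))         # 11.58
```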
Stata (simple OLS, robust SE)
```stata
clear all

* Example data
sysuse auto, clear                     // or: use "mydata.dta", clear

* Scatter + fitted line (as in slides)
twoway (scatter price mpg) (lfit price mpg), ytitle("Price") legend(off)

* Simple OLS with robust SE
regress price mpg, vce(robust)

* Semi-log or log-log
* gen ln_price = ln(price)
* regress ln_price mpg, vce(robust)      // semi-log: 100*b ≈ %Δprice per 1-unit Δmpg
* gen ln_mpg = ln(mpg)
* regress ln_price ln_mpg, vce(robust)   // log-log: b is an elasticity
```
Where to find pieces in output: coefficients & SEs; SSE/SSR/SST and R²; residual diagnostics. (Replicates the slide workflow.)
Practice & Go Further
Move from simple to multiple regression; use robust SE; explore logs when relationships are nonlinear.