Ordinary Least Squares (OLS): Simple Regression
The workhorse model: explain an outcome y with one regressor x via the line y = β₀ + β₁x + u. Use it to summarize relationships and—under conditions—draw causal conclusions.
Adapted in part from the course slide deck.
Model & Terms
- Dependent variable (response, regressand): y
- Independent variable (explanatory, regressor): x
- Intercept: β₀; slope: β₁; error term: u
A one-unit increase in x shifts the conditional mean of y by β₁.
Causal Meaning
To read β₁ causally, we need the **zero conditional mean** / **conditional mean independence**: E[u | x] = 0. That is, x contains no information about the mean of omitted factors.
Example: schooling → wage. If education correlates with ability (unobserved), then E[u|x]≠0 and the simple slope is biased.
Population Regression Function (PRF)
Under E[u|x]=0, the conditional mean is linear: E[y|x] = β₀ + β₁x. The PRF traces the average of y at each x.
OLS = Least Squares
Fit the line that minimizes the **sum of squared residuals**: SSR(β₀,β₁)=∑(yᵢ−β₀−β₁xᵢ)². In the fitted sample, residuals sum to zero and are uncorrelated with x.
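The slides' code is in Stata, but these first-order conditions are easy to verify numerically. A minimal Python/numpy sketch on simulated data (a hypothetical DGP, not from the slides):

```python
import numpy as np

# Simulated data (hypothetical DGP for illustration): y = 2 + 0.5x + u.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=200)

# Closed-form OLS: slope = sample cov(x, y) / sample var(x); intercept from the means.
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

# First-order conditions of least squares: in the fitted sample,
# residuals sum to zero and are uncorrelated with x.
u_hat = y - b0 - b1 * x
```

Both `u_hat.sum()` and `(x * u_hat).sum()` come out at machine-precision zero, which is exactly the claim in the text.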
Goodness-of-Fit
Decompose variation: SST = SSE + SSR, where SSE is the explained (model) sum of squares and SSR the residual sum of squares. R² = SSE/SST = 1 − SSR/SST is the share of variation in y explained by the regression. High R² ≠ causal validity.
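The decomposition can be checked directly; a short Python sketch under an assumed simulated model:

```python
import numpy as np

# Verify SST = SSE + SSR and R² = SSE/SST = 1 − SSR/SST (simulated data).
rng = np.random.default_rng(1)
x = rng.normal(5, 2, size=300)
y = 1.0 + 0.8 * x + rng.normal(0, 1.5, size=300)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = ((y - y.mean()) ** 2).sum()      # total sum of squares
sse = ((y_hat - y.mean()) ** 2).sum()  # explained sum of squares
ssr = ((y - y_hat) ** 2).sum()         # residual sum of squares
r2 = sse / sst
```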
Functional Forms
- Semi-log (log-level): log y = β₀ + β₁x → 100·β₁ ≈ %Δy per one-unit increase in x.
- Log-log: log y = β₀ + β₁ log x → β₁ is an elasticity (%Δy per 1% Δx).
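A quick numeric check of the semi-log rule of thumb (my own numbers, for illustration):

```python
import math

# Log-level interpretation: if log y = β₀ + β₁x, a one-unit increase in x
# multiplies y by exp(β₁). The exact percent change is 100·(exp(β₁) − 1),
# which the approximation 100·β₁ tracks closely when β₁ is small.
b1 = 0.05
exact_pct = 100 * (math.exp(b1) - 1)   # ≈ 5.13%
approx_pct = 100 * b1                  # 5.0%
```

The gap between exact and approximate percent changes grows with |β₁|, which is why the approximation should only be used for small coefficients.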
Applied Examples
- CEO pay vs. firm performance (low R² can still matter).
- Wage vs. education (returns to schooling; beware ability bias).
- Consumption vs. income (high R²; not necessarily causal).
Standard Assumptions (SLR.1–SLR.5)
- SLR.1 Linearity in parameters
- SLR.2 Random sampling
- SLR.3 Sample variation in x
- SLR.4 Zero conditional mean: E[u|x] = 0 (⇒ unbiasedness)
- SLR.5 Homoskedasticity: Var(u|x) = σ² (for classic SE formulas)
If heteroskedasticity is present (Var(u|x) varies with x), use **robust SE**.
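Stata's `vce(robust)` uses the HC1 sandwich estimator; a Python/numpy sketch of that formula on simulated heteroskedastic data (my own DGP, for illustration):

```python
import numpy as np

# Simulate heteroskedastic errors: Var(u|x) grows with x.
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(0, 0.3 * x)

# OLS via the normal equations.
X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

# HC1 "sandwich": (X'X)^{-1} (Σ ûᵢ² xᵢxᵢ') (X'X)^{-1}, scaled by n/(n − k).
XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (X * resid[:, None] ** 2)
V = n / (n - 2) * XtX_inv @ meat @ XtX_inv
robust_se_slope = np.sqrt(V[1, 1])
```

Unlike the classic formula, this estimator stays valid when Var(u|x) varies with x, at the cost of some efficiency under true homoskedasticity.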
Sampling Variability & SEs
OLS estimates vary across samples. Under SLR.1–SLR.5, Var(β̂₁) = σ² / ∑(xᵢ − x̄)²: the variance increases with the noise in u and decreases with more sample variation in x and larger n. Robust standard errors quantify precision without assuming homoskedasticity.
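A small Monte Carlo illustrates the point; Python sketch with a hypothetical DGP (y = 1 + 0.5x + u):

```python
import numpy as np

# Draw many samples from a fixed DGP and compare the spread of the
# OLS slope estimate at two sample sizes.
rng = np.random.default_rng(3)

def slope_draws(n, reps=400):
    out = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0, 10, size=n)
        y = 1.0 + 0.5 * x + rng.normal(0, 2, size=n)
        out[r] = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return out

sd_small = slope_draws(50).std()   # more spread at n = 50
sd_large = slope_draws(500).std()  # less spread: SD shrinks roughly as 1/sqrt(n)
```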
Tiny Worked Example
Suppose hourly wage on years of education yields wagê = 5.1 + 0.54·educ with robust SE for 0.54 equal to 0.20.
- Interpretation: +1 year of education ↦ +0.54 in hourly wage (on average).
- 95% CI: 0.54 ± 1.96×0.20 = [0.15, 0.93].
- Prediction at educ = 12: 5.1 + 0.54×12 = 11.58 (add prediction intervals in practice).
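The arithmetic is quick to reproduce in Python (the exact CI endpoints are 0.148 and 0.932, i.e. [0.15, 0.93] rounded):

```python
# Worked-example numbers from the text: slope 0.54, robust SE 0.20.
b1, se = 0.54, 0.20
ci_low, ci_high = b1 - 1.96 * se, b1 + 1.96 * se   # 0.148, 0.932
pred_at_12 = 5.1 + b1 * 12                          # 11.58
```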
Stata (simple OLS, robust SE)
clear all
* Example data
sysuse auto, clear // or: use "mydata.dta", clear
* Scatter + fit line (as in slides)
twoway (scatter price mpg) (lfit price mpg), ytitle("Price") legend(off)
* Simple OLS with robust SE
regress price mpg, vce(robust)
* Semi-log or log-log
* gen ln_price = ln(price)
* regress ln_price mpg, vce(robust) // semi-log: 100·β ≈ %Δprice per 1 unit of mpg
* gen ln_mpg = ln(mpg)
* regress ln_price ln_mpg, vce(robust) // log-log: β is an elasticity
Where to find pieces in the output: coefficients & SEs; SSE/SSR/SST and R²; residual diagnostics. (Replicates the slide workflow.)
Practice & Go Further
Move from simple to multiple regression; use robust SE; explore logs when relationships are nonlinear.