Ordinary Least Squares (OLS): Simple Regression

The workhorse model: explain an outcome y with one regressor x via the line y = β₀ + β₁x + u. Use it to summarize relationships and—under conditions—draw causal conclusions.

Adapted in part from the course slide deck (Chapter 2).

Model & Terms

  • Dependent (response, regressand): y
  • Independent (explanatory, regressor): x
  • Intercept β₀, Slope β₁, Error u

Interpretation (holding other factors fixed): a 1-unit increase in x changes the conditional mean of y by β₁.

Causal Meaning

To read β₁ causally, we need the **zero conditional mean** / **conditional mean independence**: E[u | x] = 0. That is, x contains no information about the mean of omitted factors.

Example: schooling → wage. If education correlates with ability (unobserved), then E[u|x]≠0 and the simple slope is biased.
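A quick simulation makes the bias concrete. The sketch below (Python, purely illustrative; the data-generating process and coefficient values are made up) lets unobserved ability raise both education and wages, so the simple slope overstates the true return of 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

ability = rng.normal(size=n)                  # unobserved by the econometrician
educ = 12 + 2 * ability + rng.normal(size=n)  # education correlates with ability
wage = 1.0 + 0.5 * educ + 1.0 * ability + rng.normal(size=n)  # true return: 0.5

# Simple OLS slope of wage on educ: sample Cov(educ, wage) / Var(educ)
slope = np.cov(educ, wage)[0, 1] / np.var(educ, ddof=1)
print(round(slope, 2))  # ≈ 0.9 > 0.5: ability bias inflates the slope
```

Because ability sits in u and correlates with educ, E[u|educ] ≠ 0 and the slope picks up part of ability's effect.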

Population Regression Function (PRF)

Under E[u|x]=0, the conditional mean is linear: E[y|x] = β₀ + β₁x. The PRF traces the average of y at each x.

slope β₁ = Cov(x, y) / Var(x);   intercept β₀ = E[y] − β₁E[x]
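Replacing population moments with sample analogues gives the OLS estimates. A minimal Python sketch with toy numbers, checked against `np.polyfit`:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # slope: sample Cov / Var
b0 = y.mean() - b1 * x.mean()                        # intercept

# same answer as the least-squares fit
b1_check, b0_check = np.polyfit(x, y, 1)
```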

OLS = Least Squares

Fit the line that minimizes the **sum of squared residuals**: SSR(β₀,β₁)=∑(yᵢ−β₀−β₁xᵢ)². In the fitted sample, residuals sum to zero and are uncorrelated with x.
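These first-order conditions are easy to verify numerically. A small Python check with toy data (illustrative values only):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 2.9, 4.1, 6.3, 7.5])

b1, b0 = np.polyfit(x, y, 1)     # least-squares fit
resid = y - (b0 + b1 * x)

print(abs(resid.sum()) < 1e-10)        # residuals sum to zero
print(abs((resid * x).sum()) < 1e-10)  # and are orthogonal to x
```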

Goodness-of-Fit

Decompose variation: SST = SSE + SSR, where SSE is the explained (model) sum of squares and SSR the residual sum of squares. R² = SSE/SST = 1 − SSR/SST is the share of sample variation in y explained by the regression. A high R² does not establish causal validity.
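The decomposition can be confirmed directly on any fitted line; a Python sketch with toy data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.5, 3.1, 3.9, 6.2, 6.8, 9.0])

b1, b0 = np.polyfit(x, y, 1)
yhat = b0 + b1 * x

sst = ((y - y.mean()) ** 2).sum()     # total sum of squares
sse = ((yhat - y.mean()) ** 2).sum()  # explained (model) sum of squares
ssr = ((y - yhat) ** 2).sum()         # residual sum of squares

r2 = sse / sst  # equals 1 - ssr/sst
```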

Functional Forms

  • Semi-log (log-level): log y = β₀ + β₁x + u; 100·β₁ ≈ %Δy per 1-unit Δx.
  • Log-log: log y = β₀ + β₁ log x + u; β₁ is an elasticity (%Δy per 1% Δx).
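The log-level reading is an approximation; the exact percentage change is 100·(e^β₁ − 1), which is close for small slopes. A quick check with a hypothetical slope of 0.08:

```python
import math

b1 = 0.08                         # hypothetical log-level slope
approx = 100 * b1                 # approximate %change in y per 1-unit change in x
exact = 100 * (math.exp(b1) - 1)  # exact %change implied by the log model
print(round(approx, 1), round(exact, 1))  # 8.0 8.3 — close for small b1
```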

Applied Examples

  • CEO pay vs. firm performance (low R² can still matter).
  • Wage vs. education (returns to schooling; beware ability bias).
  • Consumption vs. income (high R²; not necessarily causal).

Standard Assumptions (SLR.1–SLR.5)

  • SLR.1 Linearity in parameters
  • SLR.2 Random sampling
  • SLR.3 Sample variation in x
  • SLR.4 Zero conditional mean: E[u|x]=0 (⇒ unbiasedness)
  • SLR.5 Homoskedasticity: Var(u|x)=σ² (for classic SE formulas)

If heteroskedasticity is present (Var(u|x) varies with x), use **robust SE**.

Sampling Variability & SEs

OLS estimates vary from sample to sample. Their variance increases with the noise in u and decreases with more variation in x and a larger n. Robust standard errors quantify this precision without assuming homoskedasticity.
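For simple regression, the robust (sandwich) variance of the slope can be written out directly. A Python sketch using the HC1 small-sample correction, on simulated heteroskedastic data (the data-generating process is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 10, n)
u = rng.normal(scale=0.5 + 0.3 * x, size=n)  # error variance grows with x
y = 2.0 + 1.0 * x + u                        # true slope: 1.0

b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)
dx = x - x.mean()

# HC1 robust variance of the slope: sandwich formula with n/(n-2) correction
var_b1 = (n / (n - 2)) * (dx**2 * resid**2).sum() / (dx**2).sum() ** 2
se_robust = np.sqrt(var_b1)

# classic SE, valid only under homoskedasticity, for comparison
s2 = (resid**2).sum() / (n - 2)
se_classic = np.sqrt(s2 / (dx**2).sum())
```

When the variance rises with x, the robust SE typically differs from (and here tends to exceed) the classic one.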

Tiny Worked Example

Suppose regressing hourly wage on years of education yields ŵage = 5.1 + 0.54·educ, with a robust SE of 0.20 for the slope 0.54.

  • Interpretation: +1 year of education ↦ +0.54 in hourly wage (on average).
  • 95% CI: 0.54 ± 1.96×0.20 = [0.15, 0.93].
  • Prediction at educ=12: 5.1 + 0.54×12 = 11.58 (add prediction intervals in practice).
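The arithmetic behind the interval and the fitted value, in a few lines (coefficients and SE taken from the worked example):

```python
b0, b1 = 5.1, 0.54
se = 0.20

ci = (b1 - 1.96 * se, b1 + 1.96 * se)  # 95% CI for the slope
pred_12 = b0 + b1 * 12                 # fitted hourly wage at educ = 12

print(round(ci[0], 2), round(ci[1], 2))  # 0.15 0.93
print(round(pred_12, 2))                 # 11.58
```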

Stata (simple OLS, robust SE)

clear all
* Example data
sysuse auto, clear    // or: use "mydata.dta", clear

* Scatter + fit line (as in slides)
twoway (scatter price mpg) (lfit price mpg), ytitle("Price") legend(off)

* Simple OLS with robust SE
regress price mpg, vce(robust)

* Semi-log or log-log
* gen ln_price = ln(price)
* regress ln_price mpg, vce(robust)   // semi-log: β ≈ %Δprice per 1 unit mpg
* gen ln_mpg = ln(mpg)
* regress ln_price ln_mpg, vce(robust)  // log-log: β is elasticity

Where to find pieces in output: coefficients & SEs; SSE/SSR/SST and R²; residual diagnostics. (Replicates the slide workflow.)

Practice & Go Further

Move from simple to multiple regression; use robust SE; explore logs when relationships are nonlinear.