Binary Outcome Link Functions

An interactive exploration of logit, probit, complementary log-log, and why choosing the right link matters (or not)

BIOS 910  ·  Department of Biostatistics & Data Science  ·  University of Kansas Medical Center

What Is a Link Function?

Generalized Linear Models (GLMs) extend linear regression to outcomes that aren't continuous and symmetric. For a binary outcome \(Y \in \{0,1\}\), we model the probability \(p = P(Y=1 \mid \mathbf{X})\) by connecting it to a linear predictor through a link function \(g\):

\[ g(p) = \eta = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k \]

The link function \(g\) maps the probability (which lives in \((0,1)\)) to the entire real line (where the linear predictor lives). Different choices of \(g\) produce different models with different assumptions, interpretations, and behaviors. The differences are most pronounced in the tails, where probabilities are close to 0 or 1.

Four Link Options
Name | Link \(g(p)\) | Inverse \(g^{-1}(\eta)\)
Logit | \(\log\!\left(\frac{p}{1-p}\right)\) | \(\frac{1}{1+e^{-\eta}}\)
Probit | \(\Phi^{-1}(p)\) | \(\Phi(\eta)\)
Cloglog | \(\log(-\log(1-p))\) | \(1 - e^{-e^\eta}\)
Cauchit | \(\tan(\pi(p - \tfrac{1}{2}))\) | \(\frac{1}{2} + \frac{\arctan(\eta)}{\pi}\)
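
The four link and inverse-link pairs in the table can be written down directly. The following is a minimal sketch using only the Python standard library; the dictionary name `LINKS` is an illustrative choice, not part of any package. It checks that each inverse undoes its link.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1

# Each entry pairs a link g(p) with its inverse g^{-1}(eta).
LINKS = {
    "logit": (
        lambda p: math.log(p / (1.0 - p)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "probit": (
        _STD_NORMAL.inv_cdf,
        _STD_NORMAL.cdf,
    ),
    "cloglog": (
        lambda p: math.log(-math.log(1.0 - p)),
        lambda eta: 1.0 - math.exp(-math.exp(eta)),
    ),
    "cauchit": (
        lambda p: math.tan(math.pi * (p - 0.5)),
        lambda eta: 0.5 + math.atan(eta) / math.pi,
    ),
}

# Round trip: p -> eta -> p for a few probabilities.
for name, (link, inverse) in LINKS.items():
    for p in (0.05, 0.5, 0.95):
        eta = link(p)
        print(f"{name:8s} p={p:.2f}  eta={eta:+.4f}  back={inverse(eta):.4f}")
```

Note that the same probability maps to very different values of \(\eta\) under each link, which is why coefficients from different links are not directly comparable.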
Key Question

The logit is by far the most popular because its coefficients exponentiate to odds ratios — a familiar quantity. But odds ratios aren't always the right "lens" for every data type. When is it worth giving up this interpretability?

The answer almost always comes down to what happens at the tails (very small or very large probabilities) and what the underlying process actually looks like.

Comparing the Link Functions

All four links map the linear predictor \(\eta\) to a probability. They look similar in the middle — but diverge substantially near 0 and 1. Use the controls below to explore how changing the intercept \(\beta_0\) and slope \(\beta_1\) (where \(\eta = \beta_0 + \beta_1 X\)) moves and stretches the curves. Hover over the chart to read off exact values for each model.

X-axis: covariate X (range −5 to 5). Y-axis: P(Y=1 | X). The linear predictor is η = β₀ + β₁·X.

Notice: In the center of the curve (probabilities between ~0.1 and ~0.9), logit and probit are nearly indistinguishable. The differences become dramatic as you zoom toward the tails. Cloglog is asymmetric — it reaches high probabilities faster than it reaches low ones. Cauchit has extremely heavy tails.
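
To see the tail behavior numerically, the sketch below evaluates each inverse link at the same values of \(\eta\), with no rescaling between links (so part of the difference reflects the different scales of the underlying distributions). The function name `inverse_links` is illustrative.

```python
import math
from statistics import NormalDist

def inverse_links(eta):
    """Return P(Y=1) at linear predictor eta under each of the four links."""
    return {
        "logit": 1.0 / (1.0 + math.exp(-eta)),
        "probit": NormalDist().cdf(eta),
        "cloglog": 1.0 - math.exp(-math.exp(eta)),
        "cauchit": 0.5 + math.atan(eta) / math.pi,
    }

for eta in (-4.0, -2.0, 0.0, 2.0, 4.0):
    row = inverse_links(eta)
    cells = "  ".join(f"{name}={p:.5f}" for name, p in row.items())
    print(f"eta={eta:+.1f}  {cells}")
```

Logit, probit, and cauchit are symmetric, so the probabilities at \(-\eta\) and \(+\eta\) sum to 1. Cloglog is not: at \(\eta = 0\) it already gives \(1 - e^{-1} \approx 0.63\), and it is much closer to 1 at \(\eta = 4\) than it is to 0 at \(\eta = -4\).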

Tolerance Framework

One of the most intuitive ways to understand link functions is through the individual tolerance model. Imagine each subject has an unknown threshold \(T_i\) — the dose at which they will experience the event. We observe \(Y_i = 1\) when the stimulus \(x\) exceeds that threshold.

\[ P(Y=1 \mid x) = P(T \le x) = F(x) \]

The distribution of tolerances across individuals informs the link function:

Distribution → Link Function
Tolerance distribution | Link function
Normal | Probit
Logistic | Logit
Minimum extreme value (Gumbel) | Cloglog
Cauchy | Cauchit
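
The correspondence between tolerance distributions and links can be checked by simulation. This is a sketch under stated assumptions: thresholds are drawn from the standard form of each distribution by inverse-CDF sampling, and the names `QUANTILE`, `CDF`, and `simulate_response_rate` are illustrative. The share of simulated subjects with \(T \le x\) should match \(F(x)\), the inverse link.

```python
import math
import random
from statistics import NormalDist

# Quantile functions F^{-1}(u) of each standard tolerance distribution.
QUANTILE = {
    "probit (normal)": NormalDist().inv_cdf,
    "logit (logistic)": lambda u: math.log(u / (1.0 - u)),
    "cloglog (Gumbel min)": lambda u: math.log(-math.log(1.0 - u)),
    "cauchit (Cauchy)": lambda u: math.tan(math.pi * (u - 0.5)),
}

# Matching CDFs F(x), which are the inverse links.
CDF = {
    "probit (normal)": NormalDist().cdf,
    "logit (logistic)": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "cloglog (Gumbel min)": lambda x: 1.0 - math.exp(-math.exp(x)),
    "cauchit (Cauchy)": lambda x: 0.5 + math.atan(x) / math.pi,
}

def simulate_response_rate(name, stimulus, n_subjects=100_000, seed=0):
    """Draw one threshold per subject; return the share with T <= stimulus."""
    rng = random.Random(seed)
    draw = QUANTILE[name]
    responders = 0
    for _ in range(n_subjects):
        u = rng.random()
        while u == 0.0:  # u = 0 would map to a threshold of minus infinity
            u = rng.random()
        if draw(u) <= stimulus:
            responders += 1
    return responders / n_subjects

x = 0.5  # hypothetical stimulus level on the standardized scale
for name in QUANTILE:
    rate = simulate_response_rate(name, x)
    print(f"{name:22s} simulated={rate:.4f}  F(x)={CDF[name](x):.4f}")
```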
Why Does This Matter?

The tolerance model gives you a justification for choosing a link function. Instead of asking "which link fits the data?", ask: "what distribution do individual thresholds follow in this population?" If thresholds are plausibly normally distributed (as in many psychophysical and genetic settings), the probit is the correct model — even if you can't report odds ratios.

Real-World Tolerance Examples

Each distribution shape can be motivated by a specific kind of biological or physical process. Each example below describes a setting where individual thresholds plausibly follow that distribution.

Normal → Probit: Auditory detection threshold in psychophysics

In a hearing test, each person has a threshold \(T_i\) — the faintest tone they can detect. Because threshold is determined by many small, additive biological factors (cochlear hair-cell density, neural firing rate, attention), the central limit theorem drives the population distribution of \(T_i\) toward normal.

The probit model is therefore the natural choice: \(P(\text{detect} \mid x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right)\), where \(x\) is sound intensity in dB. The same logic applies to LD50 dose-response curves in toxicology, where lethal thresholds across animals are approximately normally distributed, typically on the log-dose scale.

Logistic → Logit: Blood-pressure threshold for hypertensive crisis

Suppose each patient has a threshold systolic pressure \(T_i\) above which they experience a hypertensive crisis. One heuristic argument: if tolerance is shaped by a multiplicative cascade of many physiological odds (kidney function, vascular stiffness, renin–angiotensin activity), then the log-odds of exceeding the threshold accumulates additively, which motivates a logistic distribution for \(T_i\). This is a plausibility argument rather than a derivation.

The logistic distribution is nearly identical to the normal but has slightly heavier tails, making it more forgiving of patients with unusually high or low baseline tolerance. This is why logistic regression is a sensible default when no strong mechanistic argument favors probit.

Gumbel (min) → Cloglog: Infection onset after pathogen exposure

After exposure to a pathogen, infection occurs when at least one viable organism successfully evades immune defenses. If there are many independent chances for infection (each virion, each exposure site), the event occurs at the minimum of many independent random thresholds — a classic extreme-value (Gumbel min) scenario.

This produces a left-skewed tolerance distribution: a long lower tail of highly susceptible individuals who are infected at very low doses, and a sharp upper cutoff, so that almost no one resists a high dose. The complementary log-log link captures this asymmetry (it rises gradually from 0 at low doses and approaches 1 sharply), matching the biology of infectious disease dose-response.

Cauchy → Cauchit: Tumor drug sensitivity in targeted oncology

In targeted cancer therapy, most tumors respond at moderate drug concentrations, but a small fraction are dramatically hypersensitive (e.g., activating mutations) while others are completely resistant (e.g., bypass pathway activation). This pattern of extreme outliers in both directions produces very heavy tails in the tolerance distribution.

The Cauchy distribution — which has undefined mean and variance — captures exactly this shape. The cauchit link is rarely the first choice in practice because the heavy tails make estimates sensitive to extreme observations, but it is the appropriate model when you have mechanistic reason to expect a non-negligible fraction of extreme responders far from the population center.

Interactive: Visualizing Individual Tolerances

The curve below shows the distribution of individual thresholds in the population. The vertical line is the current stimulus level. The shaded area = P(Y=1) — the proportion of people whose threshold lies below the stimulus.

Note: the Gumbel (min) distribution is left-skewed (its median ≈ −0.37, below its mode at 0), which is why the cloglog model is asymmetric: it approaches 1 faster than it approaches 0. The Cauchy distribution has very heavy tails, making cauchit sensitive to extreme observations.

The Interpretability Tradeoff

The logit link's main advantage is concrete coefficient interpretation. When you exponentiate a logit coefficient, you get an odds ratio: a one-unit increase in X multiplies the odds of Y=1 by \(e^{\beta}\). Clinicians and epidemiologists recognize this quantity immediately.

So why would anyone give that up?

What the Probit Coefficient Means

In the probit model, \(\beta\) has a natural interpretation in the tolerance / latent variable framework: a one-unit increase in X shifts the underlying threshold distribution by \(\beta\) standard deviations. When individual thresholds follow a normal distribution, this is the correct effect size measure.

\[ \text{Logit:} \quad e^\beta = \text{odds ratio (multiplicative)} \qquad\qquad \text{Probit:} \quad \beta = \text{shift in latent variable (SD units)} \]
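
The practical meaning of "odds ratio" can be seen by computing it at several covariate values. The sketch below uses hypothetical coefficients (`beta0 = -1.0`, `beta1 = 0.5`) chosen only for illustration. Under the logit link the per-unit odds ratio equals \(e^{\beta_1}\) everywhere; under the probit link the same calculation gives a value that changes with \(X\), so no single odds ratio summarizes the effect.

```python
import math
from statistics import NormalDist

def odds(p):
    return p / (1.0 - p)

def odds_ratio_per_unit(inverse_link, beta0, beta1, x):
    """Odds ratio comparing x + 1 with x under a given inverse link."""
    p_low = inverse_link(beta0 + beta1 * x)
    p_high = inverse_link(beta0 + beta1 * (x + 1.0))
    return odds(p_high) / odds(p_low)

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

probit_inverse = NormalDist().cdf

beta0, beta1 = -1.0, 0.5  # hypothetical coefficients, for illustration only
for x in (-2.0, 0.0, 2.0):
    or_logit = odds_ratio_per_unit(logistic, beta0, beta1, x)
    or_probit = odds_ratio_per_unit(probit_inverse, beta0, beta1, x)
    print(f"x={x:+.1f}  logit OR={or_logit:.4f}  probit OR={or_probit:.4f}")
print(f"exp(beta1) = {math.exp(beta1):.4f}")
```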

The Comparison Problem in Multi-Group Studies

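Critical issue: In multilevel and multi-group studies, binary-outcome coefficients are scaled by the residual standard deviation of the latent variable, which can differ across groups even when the true effect is the same. This affects logit and probit coefficients alike. The practical advantage of probit is that its latent variable is normal, so group-specific residual variances can be modeled explicitly with standard latent-variable methods. This is one reason structural equation models for binary outcomes typically use probit.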
When logit is the right choice
  • Odds ratios are clinically meaningful in your field
  • Probabilities stay well away from 0 and 1
  • You need compatibility with existing literature
  • Case-control study designs
  • Logistic regression is standard in your field
When probit is the right choice
  • Latent normal threshold model is scientifically justified
  • Comparing coefficients across groups / models
  • GWAS and heritability analysis (liability threshold model)
  • Psychophysics and signal detection theory
  • Toxicology (LD50, bioassay) with normal assumptions

The Scale Factor Approximation

Because the logistic distribution has variance \(\pi^2/3\) while the standard normal has variance 1, the two models give related (but not identical) coefficient estimates. A rough conversion:

\[ \hat\beta_\text{logit} \approx \frac{\pi}{\sqrt{3}}\, \hat\beta_\text{probit} \approx 1.81\, \hat\beta_\text{probit} \]

This approximation holds well in the center of the distribution but breaks down at the tails — which is exactly where the choice of link function matters most.
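
A short numerical comparison makes the quality of this conversion concrete. The sketch below, with illustrative function names, compares the probit curve \(\Phi(\eta)\) with a logistic curve whose slope has been multiplied by \(\pi/\sqrt{3}\), at progressively more extreme values of \(\eta\).

```python
import math
from statistics import NormalDist

SCALE = math.pi / math.sqrt(3.0)  # SD of the standard logistic distribution

def probit_prob(eta):
    return NormalDist().cdf(eta)

def rescaled_logit_prob(eta):
    """Logistic curve with slope multiplied by pi / sqrt(3) to match variances."""
    return 1.0 / (1.0 + math.exp(-SCALE * eta))

print(f"scale factor = {SCALE:.4f}")
for eta in (0.0, -0.5, -1.0, -2.0, -3.0, -4.0):
    p_probit = probit_prob(eta)
    p_logit = rescaled_logit_prob(eta)
    print(f"eta={eta:+.1f}  probit={p_probit:.6f}  logit={p_logit:.6f}  "
          f"ratio={p_logit / p_probit:.2f}")
```

The absolute differences stay small throughout, but the ratio of the two probabilities grows quickly as \(\eta\) moves into the tail, which is what matters when estimating very rare or near-certain events.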

Real-World Examples

The dose-response plot below shows how probit and logit produce nearly identical fitted curves for observed data in the mid-range — but diverge sharply when predicting at extreme doses. Hover over the plot to compare predictions.

The probit model was developed precisely for bioassay and toxicology. At each dose, a group of animals is exposed and the number responding is recorded. The research question often involves estimating the LD50 (lethal dose for 50% of the population) and the LD1 / LD99 at extreme quantiles. At those extremes, the choice of link function changes the estimate substantially.

Simulated Dose-Response Study (n = 20 per group)

True model: probit with LD50 = 10 mg/kg, slope = 2 on log₁₀ scale. The logit fit uses ≈ 1.81× the probit slope (scale conversion). Both curves look almost identical on screen — until you look at extreme doses.

Prediction at extreme doses differs. At dose = 0.1 mg/kg (far below LD50): probit predicts ≈ 0.003%, logit predicts ≈ 0.07%. At dose = 1000 mg/kg (far above LD50): probit predicts ≈ 99.997%, logit predicts ≈ 99.93%. For LD1/LD99 calculations in drug safety, this difference can matter.
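
The quoted predictions can be reproduced from the stated parameters. This sketch assumes, as described above, that the linear predictor is the slope times \(\log_{10}(\text{dose}/\text{LD50})\) and that the logit curve simply uses the probit slope multiplied by \(\pi/\sqrt{3}\); the function names are illustrative.

```python
import math
from statistics import NormalDist

LD50 = 10.0          # mg/kg, from the simulated study
PROBIT_SLOPE = 2.0   # per unit of log10(dose)
LOGIT_SLOPE = PROBIT_SLOPE * math.pi / math.sqrt(3.0)  # about 1.81 x probit slope

def linear_predictor(dose, slope):
    # eta is zero at the LD50, where the response probability is 50%
    return slope * (math.log10(dose) - math.log10(LD50))

def probit_response(dose):
    return NormalDist().cdf(linear_predictor(dose, PROBIT_SLOPE))

def logit_response(dose):
    return 1.0 / (1.0 + math.exp(-linear_predictor(dose, LOGIT_SLOPE)))

for dose in (0.1, 1.0, 10.0, 100.0, 1000.0):
    print(f"dose={dose:7.1f} mg/kg  "
          f"probit={100 * probit_response(dose):9.4f}%  "
          f"logit={100 * logit_response(dose):9.4f}%")
```

At 0.1 mg/kg the two predictions differ by a factor of roughly 20, even though both are far below 1%.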

Psychophysics studies how the senses respond to stimuli. The psychometric function relates stimulus intensity to the probability of detection. Signal detection theory posits that the internal response to a stimulus is normally distributed, making the probit model the theoretically correct choice.

Example: Auditory Detection Threshold

A listener is presented tones at different intensities. At each intensity, the proportion of "detected" responses is recorded. The underlying model assumes the internal evidence (neural activation) is normally distributed with mean proportional to stimulus intensity.

\[ P(\text{detect} \mid x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right) \]

Here \(\mu\) is the threshold (just-detectable intensity) and \(\sigma\) is the spread of individual variation. The probit coefficient directly estimates \(1/\sigma\): how quickly sensitivity rises with intensity. This has direct physical meaning. The logit coefficient has no analogous physical interpretation.

d-prime (d′) in signal detection theory is exactly a probit-scale effect size. The logit scale has no equivalent. If you want to connect to SDT, you must use probit.
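
As a sketch of how these quantities relate, the code below converts probit coefficients to \((\mu, \sigma)\) and computes d′ from a hit rate and a false-alarm rate. Two assumptions are made that are not stated above: the fitted coefficients are hypothetical, and d′ is taken in its standard equal-variance form, \(d' = \Phi^{-1}(\text{hit rate}) - \Phi^{-1}(\text{false-alarm rate})\).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def threshold_and_spread(beta0, beta1):
    """Convert probit coefficients (eta = beta0 + beta1 * x) to (mu, sigma)."""
    sigma = 1.0 / beta1   # slope is 1 / sigma
    mu = -beta0 / beta1   # intercept is -mu / sigma
    return mu, sigma

def detection_probability(x, mu, sigma):
    return Z.cdf((x - mu) / sigma)

def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance SDT sensitivity: z(hit rate) - z(false-alarm rate)."""
    return Z.inv_cdf(hit_rate) - Z.inv_cdf(false_alarm_rate)

# Hypothetical probit fit: eta = -6.0 + 0.2 * (intensity in dB)
mu, sigma = threshold_and_spread(beta0=-6.0, beta1=0.2)
print(f"mu = {mu:.1f} dB, sigma = {sigma:.1f} dB")
print(f"P(detect | 35 dB) = {detection_probability(35.0, mu, sigma):.4f}")
print(f"d' for hits 0.80, false alarms 0.20: {d_prime(0.80, 0.20):.4f}")
```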

In genome-wide association studies (GWAS) and heritability analysis, the liability threshold model assumes a normally distributed underlying liability \(L\). Disease occurs when \(L\) exceeds a threshold — giving the probit model.

Why Genetics Prefers Probit

Heritability on the liability scale

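The transformation from observed-scale heritability to liability-scale heritability requires the probit (normal CDF). The formula uses \(\phi(t)\), the standard normal density at the liability threshold \(t = \Phi^{-1}(1-K)\), where \(K\) is disease prevalence; these quantities are defined only for the normal distribution. For a random population sample:

\[ h^2_\text{liab} = h^2_\text{obs} \cdot \frac{K(1-K)}{\phi(t)^2} \]

For an ascertained case-control sample in which a proportion \(P\) of subjects are cases, the right-hand side is multiplied by a further factor \(\frac{K(1-K)}{P(1-P)}\).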
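
A minimal sketch of the observed-to-liability conversion follows. It assumes the widely used form \(h^2_\text{obs} \cdot K(1-K)/\phi(t)^2\) for a random population sample, with the additional factor \(K(1-K)/(P(1-P))\) when the sample is ascertained so that a proportion \(P\) are cases. The function name and the input values are hypothetical.

```python
from statistics import NormalDist

def liability_scale_h2(h2_obs, prevalence, case_fraction=None):
    """Convert observed-scale heritability to the liability scale.

    prevalence     K, the population prevalence of the disease
    case_fraction  P, the share of cases in the sample; defaults to K,
                   i.e. a random population sample with no ascertainment
    """
    K = prevalence
    P = K if case_fraction is None else case_fraction
    z = NormalDist()
    t = z.inv_cdf(1.0 - K)  # liability threshold
    height = z.pdf(t)       # phi(t), normal density at the threshold
    return h2_obs * (K * (1.0 - K)) ** 2 / (height ** 2 * P * (1.0 - P))

# Hypothetical inputs: 1% prevalence, observed-scale h2 of 0.05
print(f"population sample:   {liability_scale_h2(0.05, 0.01):.3f}")
# Same prevalence, but a case-control sample with 50% cases and observed h2 of 0.5
print(f"case-control sample: {liability_scale_h2(0.5, 0.01, case_fraction=0.5):.3f}")
```

For rare diseases the multiplier is large, so small errors in the assumed prevalence can change the liability-scale estimate noticeably.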

Cross-group comparison

Comparing GWAS results between studies with different disease prevalence is straightforward on the liability scale. Logit-scale effect sizes (log-odds ratios) depend on prevalence in a non-linear way, making meta-analysis more complex.

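Tools such as LDSC report heritability for binary traits on the liability scale using this kind of conversion. Association-testing tools do not necessarily fit a probit model themselves: SAIGE, for example, fits a logistic mixed model and BOLT-LMM a linear mixed model.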

For rare events and survival-linked binary outcomes, neither logit nor probit may be best. The complementary log-log (cloglog) link arises naturally when the outcome is "did at least one event occur in a period?" with events following a Poisson process.

When cloglog is the natural choice

Grouped survival data

If individuals are followed over time and we record whether an event occurred in each interval, the probability of the event in interval \(t\) given the hazard \(h(t)\) follows:

\[ P(\text{event in } [t, t+1)) = 1 - e^{-h(t)} \]

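Writing the interval hazard as \(h(t) = e^{\eta}\), which keeps it positive, gives \(1 - e^{-e^{\eta}}\): exactly the cloglog inverse. The cloglog model is the discrete-time analogue of the continuous Cox proportional hazards model, and its coefficients are log hazard ratios that approximate Cox partial likelihood coefficients.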

Why not logit here?

Logit applied to grouped survival data doesn't correspond to any standard continuous-time survival model. The hazard ratios you recover won't match Cox model estimates. If you plan to compare your results to survival analysis literature, cloglog is the correct link.
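
The contrast between the two links on grouped survival data can be checked with a few lines of arithmetic. This sketch assumes a constant hazard ratio of 2 (a hypothetical treatment effect) and several hypothetical baseline interval hazards. On the cloglog scale the group difference is always \(\log 2\); on the logit scale it changes with the baseline.

```python
import math

def interval_event_prob(interval_hazard):
    """P(at least one event in the interval) for a given interval hazard."""
    return 1.0 - math.exp(-interval_hazard)

def cloglog(p):
    return math.log(-math.log(1.0 - p))

def logit(p):
    return math.log(p / (1.0 - p))

HAZARD_RATIO = 2.0  # hypothetical treatment effect, constant over time
for baseline in (0.01, 0.1, 0.5, 1.5):  # hypothetical baseline interval hazards
    p0 = interval_event_prob(baseline)
    p1 = interval_event_prob(baseline * HAZARD_RATIO)
    print(f"h0={baseline:4.2f}  "
          f"cloglog diff={cloglog(p1) - cloglog(p0):.4f}  "
          f"logit diff={logit(p1) - logit(p0):.4f}")
print(f"log(HR) = {math.log(HAZARD_RATIO):.4f}")
```

When events are rare in every interval the logit difference is close to the log hazard ratio, so the two links disagree mainly when interval event probabilities are not small.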

The cloglog model is also not symmetric around p = 0.5: it reaches high probabilities faster than low ones — appropriate for settings where most follow-up periods result in no event.

Decision Guide

Use this table to choose the appropriate link function for your analysis.

Link | Underlying model | Tail behavior | Coefficient interpretation | Best for
Logit | Logistic latent variable | Symmetric; heavier than normal | \(\beta\) = log-odds ratio; \(e^\beta\) = odds ratio | Case-control studies; when OR is the target; general epidemiology
Probit | Normal latent variable (liability) | Symmetric; lighter tails than logit | SD shift on latent scale; converts to d′ | Psychophysics; genetics/GWAS; multi-group comparisons; toxicology (LD50)
Cloglog | Poisson / Gumbel extreme value | Asymmetric; reaches 1 fast, 0 slowly | Log hazard ratio under proportional hazards; ≈ Cox log-HR | Grouped survival data; rare events; time-to-event binary outcomes
Cauchit | Cauchy latent variable | Extremely heavy tails; slowest to reach 0/1 | Limited; median shift on Cauchy scale | Robust alternatives when extreme observations dominate; rare in practice
Practical note: When probabilities are well within [0.1, 0.9] and you're not extrapolating to extremes, logit and probit will give nearly identical predictions. The choice matters most when: (1) you're extrapolating to very low or high probabilities, (2) the scientific mechanism implies a specific distribution, or (3) you need to compare coefficients across groups or models.