What Is a Link Function?
Generalized Linear Models (GLMs) extend linear regression to outcomes that aren't continuous and symmetric. For a binary outcome \(Y \in \{0,1\}\), we model the probability \(p = P(Y=1 \mid \mathbf{X})\) by connecting it to a linear predictor \(\eta\) through a link function \(g\):
\[g(p) = \eta = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k\]
The link function \(g\) maps the probability (which lives in the open interval \((0,1)\)) to the entire real line (where the linear predictor lives). Different choices of \(g\) produce different models with different assumptions, interpretations, and behaviors. The differences are most pronounced in the tails, where \(p\) is close to 0 or 1.
| Name | Link \(g(p)\) | Inverse \(g^{-1}(\eta)\) |
|---|---|---|
| Logit | \(\log\!\left(\frac{p}{1-p}\right)\) | \(\frac{1}{1+e^{-\eta}}\) |
| Probit | \(\Phi^{-1}(p)\) | \(\Phi(\eta)\) |
| Cloglog | \(\log(-\log(1-p))\) | \(1 - e^{-e^\eta}\) |
| Cauchit | \(\tan(\pi(p - \tfrac{1}{2}))\) | \(\frac{1}{2} + \frac{\arctan(\eta)}{\pi}\) |
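The table translates directly into code. The following is a minimal sketch using only the Python standard library; the `LINKS` dictionary and the `round_trip_error` helper are illustrative names, not part of any package. It checks that each inverse really undoes its link.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

# Each entry maps a link name to the pair (g, g_inverse) from the table.
LINKS = {
    "logit": (
        lambda p: math.log(p / (1.0 - p)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "probit": (
        lambda p: _STD_NORMAL.inv_cdf(p),
        lambda eta: _STD_NORMAL.cdf(eta),
    ),
    "cloglog": (
        lambda p: math.log(-math.log(1.0 - p)),
        lambda eta: 1.0 - math.exp(-math.exp(eta)),
    ),
    "cauchit": (
        lambda p: math.tan(math.pi * (p - 0.5)),
        lambda eta: 0.5 + math.atan(eta) / math.pi,
    ),
}

def round_trip_error(name, p):
    """Return |g_inverse(g(p)) - p| for the named link."""
    g, g_inv = LINKS[name]
    return abs(g_inv(g(p)) - p)

for name in LINKS:
    worst = max(round_trip_error(name, p) for p in (0.05, 0.25, 0.5, 0.75, 0.95))
    print(f"{name:8s} worst round-trip error: {worst:.2e}")
```

The round-trip errors should be at the level of floating-point noise for all four links.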
The logit is by far the most popular because its coefficients exponentiate to odds ratios, a familiar quantity. But odds ratios are not the right lens for every data type. When is it worth giving up this interpretability?
Comparing the Link Functions
The inverses of all four links map the linear predictor \(\eta\) to a probability. The resulting curves look similar in the middle but diverge substantially near 0 and 1. With \(\eta = \beta_0 + \beta_1 X\), changing the intercept \(\beta_0\) shifts the curves along the X-axis, and changing the slope \(\beta_1\) stretches or compresses them.
Figure: P(Y=1 | X) against the covariate X (range −5 to 5) for each link, with linear predictor η = β₀ + β₁·X.
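The agreement in the middle and the divergence near 0 and 1 can be tabulated without a chart. This sketch evaluates each inverse link on a small grid of X values; the intercept and slope are illustrative, and the curves are deliberately not rescaled to a common spread.

```python
import math
from statistics import NormalDist

def inverse_links(eta):
    """Return P(Y=1) under each inverse link at linear predictor eta."""
    return {
        "logit": 1.0 / (1.0 + math.exp(-eta)),
        "probit": NormalDist().cdf(eta),
        "cloglog": 1.0 - math.exp(-math.exp(eta)),
        "cauchit": 0.5 + math.atan(eta) / math.pi,
    }

beta0, beta1 = 0.0, 1.0  # illustrative intercept and slope
for x in (-4.0, -2.0, 0.0, 2.0, 4.0):
    eta = beta0 + beta1 * x
    probs = inverse_links(eta)
    row = "  ".join(f"{name}={p:.4f}" for name, p in probs.items())
    print(f"x={x:+.0f}  {row}")
```

At the same value of η in the lower tail, the probit probability is the smallest and the cauchit probability the largest, reflecting the ordering of tail weights.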
Tolerance Framework
One of the most intuitive ways to understand link functions is through the individual tolerance model. Imagine each subject has an unknown threshold \(T_i\) — the dose at which they will experience the event. We observe \(Y_i = 1\) when the stimulus \(x\) exceeds that threshold.
The distribution of tolerances across individuals informs the link function:
| Tolerance distribution | Link function |
|---|---|
| Normal | Probit |
| Logistic | Logit |
| Minimum extreme value (Gumbel) | Cloglog |
| Cauchy | Cauchit |
The tolerance model gives you a justification for choosing a link function. Instead of asking "which link fits the data?", ask: "what distribution do individual thresholds follow in this population?" If thresholds are plausibly normally distributed (as in many psychophysical and genetic settings), the probit is the correct model — even if you can't report odds ratios.
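The tolerance argument can be checked by simulation. The sketch below assumes normally distributed thresholds with a hypothetical mean and spread (the values are made up for illustration), counts the subjects whose threshold lies below each stimulus, and compares that fraction with the probit prediction \(\Phi((x-\mu)/\sigma)\).

```python
import random
from statistics import NormalDist

def simulate_response_rate(stimulus, mu, sigma, n_subjects, seed=0):
    """Fraction of simulated subjects whose normal threshold lies below the stimulus."""
    rng = random.Random(seed)
    responders = sum(rng.gauss(mu, sigma) < stimulus for _ in range(n_subjects))
    return responders / n_subjects

mu, sigma = 10.0, 2.0  # hypothetical threshold mean and spread
for stimulus in (7.0, 10.0, 13.0):
    simulated = simulate_response_rate(stimulus, mu, sigma, n_subjects=100_000)
    predicted = NormalDist(mu, sigma).cdf(stimulus)  # probit: Phi((x - mu) / sigma)
    print(f"x={stimulus:5.1f}  simulated={simulated:.3f}  probit={predicted:.3f}")
```

Swapping the threshold generator for a logistic, Gumbel, or Cauchy sampler would reproduce the other rows of the table in the same way.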
Each distribution shape arises naturally from a specific kind of biological or physical process. The four examples below each describe a setting where individual thresholds plausibly follow that distribution.
Normal → Probit: Auditory detection threshold in psychophysics
In a hearing test, each person has a threshold \(T_i\) — the faintest tone they can detect. Because threshold is determined by many small, additive biological factors (cochlear hair-cell density, neural firing rate, attention), the central limit theorem drives the population distribution of \(T_i\) toward normal.
The probit model is therefore the natural choice: \(P(\text{detect} \mid x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right)\), where \(x\) is sound intensity in dB. The same logic applies to LD50 dose-response curves in toxicology, where lethal thresholds across animals are approximately normally distributed.
Logistic → Logit: Blood-pressure threshold for hypertensive crisis
Suppose each patient has a threshold systolic pressure \(T_i\) above which they experience a hypertensive crisis. If tolerance is shaped by a multiplicative cascade of many physiological odds — kidney function, vascular stiffness, renin–angiotensin activity — then the log-odds of exceeding the threshold accumulates additively, producing a logistic distribution for \(T_i\).
The logistic distribution is nearly identical to the normal but has slightly heavier tails, making it more forgiving of patients with unusually high or low baseline tolerance. This is why logistic regression is a sensible default when no strong mechanistic argument favors probit.
Gumbel (min) → Cloglog: Infection onset after pathogen exposure
After exposure to a pathogen, infection occurs when at least one viable organism successfully evades immune defenses. If there are many independent chances for infection (each virion, each exposure site), the event occurs at the minimum of many independent random thresholds — a classic extreme-value (Gumbel min) scenario.
This produces a left-skewed tolerance distribution on the scale of the linear predictor (for example, log dose): a long lower tail of highly susceptible individuals is infected at very low doses, while almost no one resists doses well above the median. The complementary log-log link captures this asymmetry: the response probability rises gradually from 0 and then approaches 1 quickly, matching the biology of infectious disease dose-response.
Cauchy → Cauchit: Tumor drug sensitivity in targeted oncology
In targeted cancer therapy, most tumors respond at moderate drug concentrations, but a small fraction are dramatically hypersensitive (e.g., activating mutations) while others are completely resistant (e.g., bypass pathway activation). This pattern of extreme outliers on both sides of the center produces very heavy tails in the tolerance distribution.
The Cauchy distribution — which has undefined mean and variance — captures exactly this shape. The cauchit link is rarely the first choice in practice because the heavy tails make estimates sensitive to extreme observations, but it is the appropriate model when you have mechanistic reason to expect a non-negligible fraction of extreme responders far from the population center.
Figure: the distribution of individual thresholds in the population, with a vertical line at the current stimulus level. The shaded area to the left of the line equals P(Y=1), the proportion of people whose threshold lies below the stimulus.
Note: the minimum Gumbel distribution is left-skewed (its median is ≈ −0.37, below its mode at 0), which is why the cloglog model is asymmetric: it approaches 1 faster than it approaches 0. The Cauchy distribution has very heavy tails, making cauchit sensitive to extreme observations.
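The median and the asymmetry of the cloglog curve can be verified numerically. This sketch uses the standard minimum extreme value CDF, \(1 - e^{-e^{t}}\), which is the cloglog inverse from the table; the function name is illustrative.

```python
import math

def gumbel_min_cdf(t):
    """CDF of the standard minimum extreme value (Gumbel-min) distribution."""
    return 1.0 - math.exp(-math.exp(t))

median = math.log(math.log(2.0))  # solves gumbel_min_cdf(t) = 0.5
print(f"median = {median:.4f}")

# Distance of each tail quantile from the median, on the linear-predictor scale.
q01 = math.log(-math.log(1.0 - 0.01))  # 1st percentile
q99 = math.log(-math.log(1.0 - 0.99))  # 99th percentile
print(f"median - q01 = {median - q01:.3f}")
print(f"q99 - median = {q99 - median:.3f}")
```

The 1st percentile lies much farther from the median than the 99th percentile does, so the curve leaves 0 gradually and reaches 1 quickly.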
The Interpretability Tradeoff
The logit link's main advantage is concrete coefficient interpretation. When you exponentiate a logit coefficient, you get an odds ratio: a one-unit increase in X multiplies the odds of Y=1 by \(e^{\beta}\). Clinicians and epidemiologists recognize this quantity immediately.
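A short numeric check makes the odds-ratio property concrete. With hypothetical coefficients (chosen only for illustration), the probability change per unit of X depends on where you start, but the ratio of odds is the same everywhere and equals \(e^{\beta}\).

```python
import math

def logistic(eta):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)

beta0, beta1 = -2.0, 0.7  # hypothetical intercept and slope
for x in (0.0, 1.0, 4.0):
    p_before = logistic(beta0 + beta1 * x)
    p_after = logistic(beta0 + beta1 * (x + 1.0))
    ratio = odds(p_after) / odds(p_before)
    print(f"x={x:.0f}: p {p_before:.3f} -> {p_after:.3f}, odds ratio = {ratio:.4f}")

print(f"exp(beta1) = {math.exp(beta1):.4f}")
```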
So why would anyone give that up?
What the Probit Coefficient Means
In the probit model, \(\beta\) has a natural interpretation in the tolerance / latent variable framework: a one-unit increase in X shifts the underlying threshold distribution by \(\beta\) standard deviations. When individual thresholds follow a normal distribution, this is the correct effect size measure.
The Comparison Problem in Multi-Group Studies
Prefer logit when:
- Odds ratios are clinically meaningful in your field
- Probabilities stay well away from 0 and 1
- You need compatibility with existing literature
- Case-control study designs
- Logistic regression is standard in your field

Prefer probit when:
- Latent normal threshold model is scientifically justified
- Comparing coefficients across groups / models
- GWAS and heritability analysis (liability threshold model)
- Psychophysics and signal detection theory
- Toxicology (LD50, bioassay) with normal assumptions
The Scale Factor Approximation
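Because the logistic distribution has variance \(\pi^2/3\) while the standard normal has variance 1, the two models give related (but not identical) coefficient estimates. A rough conversion, obtained by matching the two standard deviations:
\[\beta_{\text{logit}} \approx \frac{\pi}{\sqrt{3}}\,\beta_{\text{probit}} \approx 1.81\,\beta_{\text{probit}}\]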
This approximation holds well in the center of the distribution but breaks down at the tails — which is exactly where the choice of link function matters most.
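To see where the approximation holds and where it fails, the sketch below compares the probit curve with a logistic curve whose slope is multiplied by the variance-matching factor \(\pi/\sqrt{3} \approx 1.81\). The function names are illustrative.

```python
import math
from statistics import NormalDist

SCALE = math.pi / math.sqrt(3.0)  # variance-matching factor, about 1.81

def probit_prob(eta):
    """P(Y=1) under the probit model at linear predictor eta."""
    return NormalDist().cdf(eta)

def rescaled_logit_prob(eta):
    """Logistic curve whose slope is the probit slope times SCALE."""
    return 1.0 / (1.0 + math.exp(-SCALE * eta))

for eta in (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
    p_probit = probit_prob(eta)
    p_logit = rescaled_logit_prob(eta)
    print(f"eta={eta:+.0f}  probit={p_probit:.5f}  logit={p_logit:.5f}  "
          f"ratio={p_logit / p_probit:.2f}")
```

Near η = 0 the two probabilities differ by at most a couple of percentage points, while at η = −3 the rescaled logit probability is roughly three times the probit one.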
Real-World Examples
Probit and logit produce nearly identical fitted dose-response curves for observed data in the mid-range, but they diverge sharply when predicting at extreme doses.
The probit model was developed precisely for bioassay and toxicology. At each dose, a group of animals is exposed and the number responding is recorded. The research question often involves estimating the LD50 (lethal dose for 50% of the population) and the LD1 / LD99 at extreme quantiles. At those extremes, the choice of link function changes the estimate substantially.
Figure setup: the true model is probit with LD50 = 10 mg/kg and slope = 2 on the log₁₀-dose scale. The logit curve uses ≈ 1.81× the probit slope (scale conversion). The two curves are almost indistinguishable in the mid-range and separate only at extreme doses.
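The effect on extreme quantiles can be worked out directly from the stated setup. This sketch assumes the logit curve is obtained purely by the ≈ 1.81× slope conversion (not by refitting to data) and that both curves share the same LD50; the function names are illustrative.

```python
import math
from statistics import NormalDist

LD50 = 10.0          # mg/kg, from the stated true model
PROBIT_SLOPE = 2.0   # per unit of log10(dose)
LOGIT_SLOPE = PROBIT_SLOPE * math.pi / math.sqrt(3.0)  # about 1.81x the probit slope

def lethal_dose_probit(q):
    """Dose at which a fraction q responds under the probit curve."""
    z = NormalDist().inv_cdf(q)
    return 10.0 ** (math.log10(LD50) + z / PROBIT_SLOPE)

def lethal_dose_logit(q):
    """Dose at which a fraction q responds under the rescaled logit curve."""
    log_odds = math.log(q / (1.0 - q))
    return 10.0 ** (math.log10(LD50) + log_odds / LOGIT_SLOPE)

for q in (0.01, 0.50, 0.99):
    print(f"LD{round(q * 100):<2d}  probit={lethal_dose_probit(q):8.3f} mg/kg  "
          f"logit={lethal_dose_logit(q):8.3f} mg/kg")
```

Under these assumptions the LD50 is identical by construction, while the logit LD1 comes out roughly a fifth lower than the probit LD1.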
Psychophysics studies how the senses respond to stimuli. The psychometric function relates stimulus intensity to the probability of detection. Signal detection theory posits that the internal response to a stimulus is normally distributed, making the probit model the theoretically correct choice.
A listener is presented tones at different intensities. At each intensity, the proportion of "detected" responses is recorded. The underlying model assumes the internal evidence (neural activation) is normally distributed with mean proportional to stimulus intensity.
\[P(\text{detect} \mid x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right)\]
Here \(\mu\) is the threshold (just-detectable intensity) and \(\sigma\) is the spread of individual variation. The probit coefficient directly estimates \(1/\sigma\): how quickly sensitivity rises with intensity. This has direct physical meaning. The logit coefficient has no analogous physical interpretation.
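In practice a probit fit reports an intercept and a slope rather than \(\mu\) and \(\sigma\). Writing the model as \(\Phi(b_0 + b_1 x)\) and matching it to \(\Phi((x-\mu)/\sigma)\) gives \(\sigma = 1/b_1\) and \(\mu = -b_0/b_1\). The sketch below applies that conversion to hypothetical coefficients (the numbers are made up for illustration).

```python
from statistics import NormalDist

def threshold_and_spread(intercept, slope):
    """Convert probit coefficients (eta = intercept + slope * x) to (mu, sigma)."""
    sigma = 1.0 / slope
    mu = -intercept / slope
    return mu, sigma

# Hypothetical fitted probit coefficients for intensity measured in dB.
b0, b1 = -7.5, 0.25
mu, sigma = threshold_and_spread(b0, b1)
print(f"threshold mu = {mu:.1f} dB, spread sigma = {sigma:.1f} dB")

# Both parameterizations give the same detection probability.
x = 34.0
p_linear = NormalDist().cdf(b0 + b1 * x)
p_threshold = NormalDist(mu, sigma).cdf(x)
print(f"P(detect | x={x:.0f} dB) = {p_linear:.4f} = {p_threshold:.4f}")
```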
In genome-wide association studies (GWAS) and heritability analysis, the liability threshold model assumes a normally distributed underlying liability \(L\). Disease occurs when \(L\) exceeds a threshold — giving the probit model.
Heritability on the liability scale
The transformation from observed-scale heritability to liability-scale heritability requires the probit (normal CDF). The formula uses \(\phi(\Phi^{-1}(K))\) where \(K\) is disease prevalence — quantities defined only for the normal distribution.
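One commonly used form of this transformation, for an unascertained population sample, is \(h^2_{\text{liab}} = h^2_{\text{obs}}\,K(1-K)/z^2\) with \(z = \phi(\Phi^{-1}(K))\). The sketch below implements that form as an assumption; it does not include the additional correction needed for case-control ascertainment, and the input values are hypothetical.

```python
from statistics import NormalDist

def liability_scale_h2(h2_observed, prevalence):
    """Convert observed-scale heritability to the liability scale.

    Uses h2_liab = h2_obs * K * (1 - K) / z**2, where z = phi(Phi^{-1}(K)).
    Assumes a population sample (no case-control ascertainment correction).
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(prevalence)  # Phi^{-1}(K)
    z = std_normal.pdf(threshold)               # normal density at that point
    return h2_observed * prevalence * (1.0 - prevalence) / z ** 2

# Hypothetical inputs: 1% prevalence, observed-scale heritability of 0.05.
print(f"{liability_scale_h2(0.05, 0.01):.3f}")
```

Because the density \(z\) shrinks rapidly as prevalence falls, the same observed-scale value maps to a much larger liability-scale value for rare diseases.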
Cross-group comparison
Comparing GWAS results between studies with different disease prevalence is straightforward on the liability scale. Logit-scale effect sizes (log-odds ratios) depend on prevalence in a non-linear way, making meta-analysis more complex.
For rare events and survival-linked binary outcomes, neither logit nor probit may be best. The complementary log-log (cloglog) link arises naturally when the outcome is "did at least one event occur in a period?" with events following a Poisson process.
Grouped survival data
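If individuals are followed over time and we record whether an event occurred in each interval, the probability of the event in interval \(t\) given the hazard \(h(t)\) (taken here as the hazard accumulated over the interval) follows:
\[P(\text{event in interval } t) = 1 - \exp\{-h(t)\} = 1 - \exp\{-\exp(\eta_t)\}, \qquad \eta_t = \log h(t)\]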
This is exactly the cloglog inverse. The cloglog model is the discrete-time analogue of the continuous Cox proportional hazards model — and the coefficients approximate Cox partial likelihood coefficients.
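The link to proportional hazards can be checked numerically. The sketch assumes the event probability in an interval is \(1 - e^{-H}\) for cumulative hazard \(H\), and that exposure multiplies the hazard by \(e^{\beta}\); the value of \(\beta\) and the baseline hazards are hypothetical. On the cloglog scale the exposed and unexposed groups differ by exactly \(\beta\) whatever the baseline, while the logit difference changes with the baseline.

```python
import math

def interval_event_prob(cumulative_hazard):
    """P(at least one event in the interval) for a given cumulative hazard."""
    return 1.0 - math.exp(-cumulative_hazard)

def cloglog(p):
    """Complementary log-log link."""
    return math.log(-math.log(1.0 - p))

def logit(p):
    """Logit link."""
    return math.log(p / (1.0 - p))

beta = 0.5  # hypothetical log hazard ratio for an exposed group
for baseline_hazard in (0.05, 0.5, 2.0):
    p_unexposed = interval_event_prob(baseline_hazard)
    p_exposed = interval_event_prob(baseline_hazard * math.exp(beta))
    cloglog_diff = cloglog(p_exposed) - cloglog(p_unexposed)
    logit_diff = logit(p_exposed) - logit(p_unexposed)
    print(f"h0={baseline_hazard:4.2f}  cloglog diff={cloglog_diff:.4f}  "
          f"logit diff={logit_diff:.4f}")
```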
Why not logit here?
Logit applied to grouped survival data doesn't correspond to any standard continuous-time survival model. The hazard ratios you recover won't match Cox model estimates. If you plan to compare your results to survival analysis literature, cloglog is the correct link.
Decision Guide
Use this table to choose the appropriate link function for your analysis.
| Link | Underlying model | Tail behavior | Coefficient interpretation | Best for |
|---|---|---|---|---|
| Logit | Logistic latent variable | Symmetric; heavier than normal | Log-odds ratio: \(e^\beta\) = odds ratio | Case-control studies; when OR is the target; general epidemiology |
| Probit | Normal latent variable (liability) | Symmetric; lighter tails than logit | SD shift on latent scale; converts to d′ | Psychophysics; genetics/GWAS; multi-group comparisons; toxicology (LD50) |
| Cloglog | Poisson / Gumbel extreme value | Asymmetric; reaches 1 fast, 0 slowly | Log of negative log of survival; ≈ Cox log-HR | Grouped survival data; rare events; time-to-event binary outcomes |
| Cauchit | Cauchy latent variable | Extremely heavy tails; slowest to reach 0/1 | Limited; median shift on Cauchy scale | Robust alternatives when extreme observations dominate; rare in practice |