What Are Generalized Linear Models (Insurance Pricing)? | Definition & Guide
Generalized linear models (GLMs) are the standard statistical framework P&C insurance actuaries use to develop pricing models that quantify the relationship between rating factors (age, territory, vehicle type, construction class, loss history) and expected claim frequency and severity. GLMs extend ordinary linear regression to handle the non-normal distributions that characterize insurance loss data: Poisson distributions for claim frequency (count data) and gamma distributions for claim severity (positive, right-skewed data). This lets actuaries model each component separately and combine them into a predicted pure premium. The resulting relativities (the multiplicative factors applied to base rates for each rating class) form the mathematical foundation of the rate plans carriers file with state departments of insurance. GLMs have been the dominant pricing methodology in P&C insurance for over two decades. Gradient boosting machines (GBMs) and other machine learning approaches increasingly supplement them for risk segmentation, but carriers often use ML models for internal risk selection while filing GLM-derived rate plans with regulators because GLMs are far easier to interpret and explain.
Definition
Generalized linear models (GLMs) are the statistical framework actuaries use to develop insurance pricing models that quantify the relationship between policyholder and risk characteristics and expected insurance losses. GLMs extend linear regression to accommodate the non-normal data distributions common in insurance: claim frequency (the number of claims per exposure) typically follows a Poisson or negative binomial distribution, while claim severity (the dollar cost per claim) follows a gamma or lognormal distribution. By modeling frequency and severity separately and combining them, actuaries produce predicted pure premiums for each combination of rating factors. The resulting class relativities — the multiplicative adjustments applied to base rates for each rating variable (territory, age, vehicle type, construction class, deductible) — form the rate plans that carriers file with state DOIs through the SERFF system.
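As a minimal illustration of the arithmetic (all figures hypothetical), a predicted pure premium is modeled frequency times modeled severity, and a class rate is the base rate times the relativity for each applicable rating factor:

```python
# Hypothetical figures for one rating cell; real values come from fitted GLMs.
frequency = 0.06     # expected claims per policy-year (frequency model output)
severity = 4_500.00  # expected cost per claim (severity model output)

pure_premium = frequency * severity  # 270.00 per unit of exposure

# Multiplicative rating: base rate times the relativity for each factor.
base_rate = 270.00
relativities = {
    "territory: urban": 1.35,
    "driver age: under 25": 1.60,
    "deductible: $1,000": 0.90,
}
rate = base_rate
for relativity in relativities.values():
    rate *= relativity
print(round(rate, 2))  # 270.00 * 1.35 * 1.60 * 0.90 = 524.88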
Why It Matters
GLMs are the pricing methodology that underpins most P&C insurance rate filings in the United States and internationally. When a personal auto carrier files a rate plan with territorial factors, age relativities, and vehicle classification differentials, those factors are almost certainly derived from GLM analysis of the carrier's historical loss data. The ubiquity of GLMs in insurance pricing stems from two characteristics: statistical rigor and regulatory interpretability.
Statistical rigor matters because insurance pricing requires quantifying the expected cost of risk across millions of policy-exposure combinations, while accounting for the multiplicative interactions between rating factors. A 22-year-old driving a sports car in an urban territory represents a different risk profile than a 45-year-old driving a sedan in a suburban territory, and the GLM framework isolates the contribution of each factor while controlling for the others.
Regulatory interpretability matters because state DOIs review and approve rate plans. Regulators need to understand how rating factors are derived, what data supports each relativity, and whether the resulting rate structure unfairly discriminates against protected classes. GLMs produce transparent, explainable outputs: each rating factor has a coefficient that translates directly to a relativity, and the statistical significance and confidence intervals are well-understood by actuarial reviewers.
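That translation is mechanical because log-link GLMs estimate coefficients on the log scale. A short sketch, using hypothetical fitted values, shows how a coefficient and its standard error become a relativity and a confidence interval a reviewer can inspect:

```python
import numpy as np

# Hypothetical output for one factor level (urban territory) from a
# log-link GLM: a coefficient estimate and its standard error.
beta = 0.3001  # estimated coefficient on the log scale
se = 0.04      # standard error of the estimate

relativity = np.exp(beta)           # exp(0.3001) ≈ 1.35
ci_low = np.exp(beta - 1.96 * se)   # ≈ 1.25
ci_high = np.exp(beta + 1.96 * se)  # ≈ 1.46
# A reviewer reads this directly: urban risks are expected to run roughly
# 35% higher than the base class, with a 95% interval of about 1.25 to 1.46.
```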
The tension in current pricing practice is between GLMs and machine learning alternatives. Gradient boosting machines (GBMs) and neural networks can capture non-linear relationships and complex interactions that GLMs miss, often producing superior predictive performance on holdout datasets. But the regulatory environment has not fully adapted. Many state DOIs question black-box models that cannot clearly explain why a specific policyholder receives a specific rate. The practical result is that many carriers use ML models internally for risk selection and portfolio management while filing GLM-derived rate plans with regulators.
How It Works
GLM-based insurance pricing operates through a structured analytical process:
- Data preparation and exposure alignment — Actuaries assemble historical policy and claims data, aligning exposures (policy-years or house-years) with claim counts and claim costs for each observation. Data quality is critical: rating factor values must be accurately mapped, claim development must be projected to ultimate using loss development factors, and large losses may need to be capped or smoothed to prevent individual catastrophic claims from distorting the model. (A minimal data-preparation sketch appears after this list.)
- Frequency and severity model specification — Separate GLMs are built for claim frequency and claim severity. The frequency model relates the number of claims to rating factors using a Poisson or negative binomial distribution with a log link function. The severity model relates the average claim cost to rating factors using a gamma distribution. Each model includes the rating variables the carrier wants to use in its rate plan: territory, insured age or years of experience, vehicle or property characteristics, prior loss history, and available supplemental data. (See the model-fitting sketch after this list.)
- Factor relativity estimation — The GLM estimates coefficients for each level of each rating factor. These coefficients translate into relativities: if the base class for territory is suburban with a relativity of 1.00, and the urban territory coefficient produces a relativity of 1.35, urban policyholders are expected to generate 35% higher losses than suburban policyholders, all else equal. Actuaries evaluate statistical significance, stability across model iterations, and business reasonableness of each relativity. (The model-fitting sketch after this list also shows this translation.)
- Model validation and selection — Actuaries validate GLM performance using holdout datasets, lift charts, and residual analysis. The model's ability to discriminate between low-risk and high-risk segments is measured through metrics like the Gini coefficient (sketched after this list). Multiple model specifications are compared, and the selected model balances predictive accuracy with factor stability and regulatory defensibility.
- Rate plan translation and filing — The GLM-derived relativities are translated into the rate plan format required for state DOI filing. Base rates are set at the overall rate level, and relativities for each factor are applied multiplicatively. The actuarial memorandum supporting the rate filing documents the GLM methodology, data sources, variable selection rationale, and statistical validation. For Prior Approval states, the DOI actuarial reviewer evaluates whether the methodology and resulting rate structure meet regulatory standards.
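For step 1, a minimal data-preparation sketch using pandas. The column names, cap level, and single flat development factor are all hypothetical; real extracts and LDFs vary by carrier, line, and accident year:

```python
import pandas as pd

# Hypothetical policy-level extract with one row per policy term.
policies = pd.read_csv("policy_claims.csv")  # column names assumed below

# Exposure must be positive: it becomes the log-offset in the frequency model.
policies = policies[policies["exposure"] > 0].copy()

# Cap individual large losses so a single catastrophic claim does not
# distort the severity model (the cap level here is illustrative).
LARGE_LOSS_CAP = 250_000
policies["capped_loss"] = policies["incurred_loss"].clip(upper=LARGE_LOSS_CAP)

# Develop reported losses to ultimate. A single flat factor is shown for
# brevity; in practice LDFs vary by accident year and maturity.
FLAT_LDF = 1.08
policies["ultimate_loss"] = policies["capped_loss"] * FLAT_LDF
```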
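For steps 2 and 3, a sketch of frequency and severity model fitting with statsmodels, continuing from the prepared data above. The formula terms are placeholders; a filed model would include the carrier's full set of rating variables:

```python
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Frequency: Poisson GLM with a log link, using log(exposure) as an offset
# so the model predicts claims per unit of exposure.
freq_model = smf.glm(
    "claim_count ~ C(territory) + C(age_band)",
    data=policies,
    family=sm.families.Poisson(),  # log link is the Poisson default
    offset=np.log(policies["exposure"]),
).fit()

# Severity: gamma GLM with a log link on policies that had claims,
# weighting each average severity by its claim count.
with_claims = policies[policies["claim_count"] > 0].copy()
with_claims["avg_severity"] = with_claims["ultimate_loss"] / with_claims["claim_count"]
sev_model = smf.glm(
    "avg_severity ~ C(territory) + C(age_band)",
    data=with_claims,
    family=sm.families.Gamma(link=sm.families.links.Log()),
    var_weights=with_claims["claim_count"],
).fit()

# With a log link, exponentiated coefficients are the multiplicative
# relativities; the omitted base level of each factor is implicitly 1.00.
freq_relativities = np.exp(freq_model.params)
sev_relativities = np.exp(sev_model.params)
```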
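For step 4, one common discrimination metric is an exposure-weighted Gini coefficient: rank policies by predicted pure premium and measure how far the cumulative-loss curve departs from the diagonal. A minimal sketch (the function name and interface are illustrative):

```python
import numpy as np

def gini_coefficient(actual_loss, predicted_pp, exposure):
    """Exposure-weighted Gini: higher values mean the model separates
    low-risk from high-risk segments more effectively."""
    order = np.argsort(predicted_pp)  # rank policies by predicted risk
    loss = np.asarray(actual_loss, dtype=float)[order]
    expo = np.asarray(exposure, dtype=float)[order]
    cum_expo = np.concatenate(([0.0], np.cumsum(expo) / expo.sum()))
    cum_loss = np.concatenate(([0.0], np.cumsum(loss) / loss.sum()))
    # Trapezoid-rule area under the Lorenz-style curve; the Gini is twice
    # the gap between that curve and the diagonal of equality.
    auc = np.sum(np.diff(cum_expo) * (cum_loss[1:] + cum_loss[:-1]) / 2.0)
    return 1.0 - 2.0 * auc
```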
GLMs and SEO/AEO
Actuaries, pricing analysts, and insurance technology leaders searching for GLM-related content are evaluating pricing methodology, model development approaches, and the regulatory considerations of advanced pricing techniques. Queries like “GLM insurance pricing,” “insurance pricing models GLM vs. GBM,” and “regulatory acceptance of machine learning in insurance pricing” represent technically sophisticated research from professionals operating at the intersection of actuarial science and technology. We target these terms through our insurance SEO practice because content that demonstrates understanding of pricing methodology — without crossing into actuarial implementation guidance — positions insurance technology vendors as credible partners for an audience that values precision over marketing language.