Problems due to small samples and sparse data in conditional logistic regression analysis

S Greenland; J A Schwartzbaum; W D Finkle

doi:10.1093/oxfordjournals.aje.a010240

Am J Epidemiol. 2000 Mar 1;151(5):531-9. doi: 10.1093/oxfordjournals.aje.a010240.

Authors

S Greenland¹, J A Schwartzbaum, W D Finkle

Affiliation

¹ Department of Epidemiology, School of Public Health, University of California at Los Angeles, USA.

PMID: 10707923

Abstract

Conditional logistic regression was developed to avoid "sparse-data" biases that can arise in ordinary logistic regression analysis. Nonetheless, it is a large-sample method that can exhibit considerable bias when certain types of matched sets are infrequent or when the model contains too many parameters. Sparse-data bias can cause misleading inferences about confounding, effect modification, dose response, and induction periods, and can interact with other biases. In this paper, the authors describe these problems in the context of matched case-control analysis and provide examples from a study of electrical wiring and childhood leukemia and a study of diet and glioma. The same problems can arise in any likelihood-based analysis, including ordinary logistic regression. The problems can be detected by careful inspection of data and by examining the sensitivity of estimates to category boundaries, variables in the model, and transformations of those variables. One can also apply various bias corrections or turn to methods less sensitive to sparse data than conditional likelihood, such as Bayesian and empirical-Bayes (hierarchical regression) methods.
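
The large-sample caveat can be illustrated with the simplest matched design. This is a minimal sketch and is not taken from the paper: it assumes 1:1 matched pairs with a single binary exposure, where the conditional maximum-likelihood estimate of the odds ratio reduces to b/c (b = discordant pairs with the case exposed, c = discordant pairs with the control exposed). Given the number of discordant pairs, b is binomial with success probability OR/(1 + OR), so the behavior of the estimate can be computed exactly. The function name `discordant_pair_summary` and its parameters are hypothetical, chosen for illustration only.

```python
import math


def discordant_pair_summary(n_discordant, true_or):
    """Exact small-sample behavior of the conditional ML estimate b/c.

    Assumes 1:1 matched pairs and one binary exposure (an illustrative
    simplification, not the models analyzed in the paper).
    """
    if n_discordant < 2:
        raise ValueError("need at least 2 discordant pairs")
    # Conditional on a pair being discordant, the case is the exposed
    # member with probability OR / (1 + OR).
    p = true_or / (1.0 + true_or)
    prob_finite = 0.0
    weighted_log_or = 0.0
    # b = 0 or b = n_discordant gives an estimate of 0 or infinity,
    # so only 0 < b < n_discordant contributes a finite log odds ratio.
    for b in range(1, n_discordant):
        c = n_discordant - b
        prob = math.comb(n_discordant, b) * p**b * (1.0 - p) ** c
        prob_finite += prob
        weighted_log_or += prob * math.log(b / c)
    mean_log_or = weighted_log_or / prob_finite
    return {
        "prob_undefined": 1.0 - prob_finite,  # estimate is 0 or infinite
        "mean_log_or": mean_log_or,           # mean over finite estimates
        "bias": mean_log_or - math.log(true_or),
    }


if __name__ == "__main__":
    # Compare a sparse and a well-populated set of discordant pairs.
    for m in (10, 50, 200):
        s = discordant_pair_summary(m, true_or=2.0)
        print(m, round(s["prob_undefined"], 4), round(s["bias"], 4))
```

Under these assumptions, comparing the `bias` and `prob_undefined` values across different numbers of discordant pairs shows how the estimate behaves when informative matched sets are infrequent. The sketch addresses only the one-parameter case; it says nothing about the multi-parameter models or the bias corrections discussed in the paper.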

Publication types

Research Support, Non-U.S. Gov't
Review

MeSH terms

Bias*
Case-Control Studies
Central Nervous System Neoplasms / epidemiology
Central Nervous System Neoplasms / etiology
Child
Diet
Electromagnetic Fields / adverse effects
Epidemiologic Methods*
Glioma / epidemiology
Glioma / etiology
Humans
Leukemia / epidemiology
Leukemia / etiology
Likelihood Functions
Logistic Models*
Matched-Pair Analysis
Odds Ratio
Regression Analysis*
Risk Assessment