[14] - Quantifying Complexity and Regularity of Neurobiological Systems

https://doi.org/10.1016/S1043-9471(06)80040-6

Introduction

Time series are encountered frequently in analysis of biological signals. Within endocrinology, hormone concentration time series that are based on frequent, fixed-increment samples have been the subject of intensive study (1); heart rate and the EEG (electroencephalogram) are two further examples of physiological time series. The biologist is often interested in time series for either of two important purposes: (i) to distinguish (discriminate) systems, on the basis of statistical characteristics; (ii) to model systems mathematically. In both cases, effective statistics and models need to account for the sequential interrelationships among the data: the study of autocorrelation and of power spectra are motivated by this recognition. Below, we focus on the first of these purposes, statistical discrimination, via a quantification of regularity of a time series. This approach also calibrates the extent of sequential interrelationships, from a relatively new perspective, based on quantifying a notion of orderliness (as opposed to randomness) of the data.

Before presenting a detailed discussion of regularity, we consider three sets of time series (Fig. 1, Fig. 2, Fig. 3) to illustrate what we are trying to measure. In Fig. 1, the data represent growth hormone (GH) levels from three subjects, taken at 5-min intervals during a fed state (2): Fig. 1A,B shows data from normal subjects, and Fig. 1C data are from an acromegalic (giant). The mean levels associated with the three subjects are similar, yet the series in Fig. 1A,B appear to be regular, less random than that in Fig. 1C. In Fig. 2, the data represent the beat-to-beat heart rate, in beats per minute, at equally spaced time intervals. Figure 2A is from an infant who had an aborted SIDS (sudden infant death syndrome) episode 1 week prior to the recording, and Fig. 2B is from a healthy infant (3). The standard deviations (SD) of these two tracings are approximately equal, and although the SIDS infant has a somewhat higher mean heart rate, the data from both subjects are well within the normal range. Yet the tracing in Fig. 2A appears to be more regular than the tracing in Fig. 2B. Figure 3, taken from the mathematical MIX(p) process discussed below, shows a sequence of four time series, each of which has mean 0 and SD 1; yet it appears that the time series become increasingly irregular as we proceed from Fig. 3A to Fig. 3D. In each of these instances, we ask the following questions: (i) How do we quantify this apparent difference in regularity? (ii) Do the regularity values significantly distinguish the data sets? (iii) How do inherent limitations posed by moderate length time series, with noise and measurement inaccuracy present as in Fig. 1, Fig. 2, affect statistical analyses? (iv) Is there some general mechanistic hypothesis, applicable to diverse contexts, that might explain such regularity differences?

A new mathematical approach and formula, Approximate Entropy (ApEn), has been introduced as a quantification of regularity in data, motivated by the questions above (4). Mathematically, ApEn is part of a general theoretical development, as the rate of entropy for an approximating Markov chain to a process (5). In applications to a range of medical settings, findings have discriminated groups of subjects via ApEn, applied to heart rate time series, in instances where classic statistics (mean, SD) did not show clear group distinctions (3, 6–9).

The development of ApEn evolved as follows. To quantify time-series regularity (and randomness), we initially applied the Kolmogorov–Sinai (K-S) entropy (10) to clinically derived data sets. The application of a formula for K-S entropy (11, 12) yielded intuitively incorrect results. Closer inspection of the formula showed that the low-magnitude noise present in the data greatly affected the calculation results. It also became apparent that achieving convergence of this entropy measure would require extremely long time series (often 100,000 or more points), which, even if available, would place extraordinary demands on computational resources. The challenge was to determine a suitable formula to quantify the concept of regularity in moderate length, somewhat noisy data sets, in a manner thematically similar to the approach given by the K-S entropy.

Historical context further frames this effort. The K-S entropy was developed for and is properly employed on truly chaotic processes (time series). Chaos refers to output from deterministic dynamical systems, where the output is bounded and aperiodic, thus appearing partially random. There have been myriad claims of chaos based on analysis of experimental time-series data, in which correlations between successive measurements have been observed. As chaotic systems represent only one of many paradigms that can produce serial correlation, it is generally inappropriate to infer chaos from the correlation alone. The mislabeling of correlated data as chaotic is a relatively benign offense. Of greater significance, complexity statistics that were developed for application to chaotic systems and are relatively limited in scope have been commonly misapplied to finite, noisy, and/or stochastically derived time series, frequently with confounding and nonreplicable results. This caveat is particularly germane to biological signals, especially those taken in vivo, as such signals usually represent the output of a complicated network with both stochastic and deterministic components. We elaborate on these points below, in the Statistics Related to Chaos section. With the development of ApEn, we can now successfully handle the noise, data length, and stochastic/composite model constraints in statistical applications. We describe this below, and with analysis of the MIX(p) family of processes, explicitly consider a composite stochastic/deterministic process, and the performance of both ApEn and chaos statistics in attempting to distinguish members of this family from one another. We thus emphasize that in applying ApEn, we are not testing for chaos.

We observe a fundamental difference between regularity statistics, such as ApEn, and variability measures: Most short- and long-term variability measures take raw data, preprocess the data, and then apply a calculation of SD (or a similar, nonparametric variation) to the processed data (13). The means of preprocessing the raw data varies substantially with the different variability algorithms, giving rise to many distinct versions. However, once preprocessing the raw data is completed, the processed data are input for an algorithm for which the order of the data is immaterial. For ApEn, the order of the data is the essential factor; discerning changes in order from apparently random to very regular is the primary focus of this statistic.

Finally, an absolutely primary concern in any practical time-series analysis is the presence of either artifacts or nonstationarities, particularly clear trending. If a time series is nonstationary or is riddled with artifacts, little can be inferred from moment, ApEn, or power spectral calculations, as these effects tend to dominate all other features. In practice, data with trends suggest a collection of heterogeneous epochs, as opposed to a single homogeneous state. From the statistical perspective, it is imperative that artifacts and trends first be removed before meaningful interpretation can be made from further statistical calculations.

Section snippets

Definition of Approximate Entropy

Approximate entropy measures the logarithmic likelihood that runs of patterns that are close for m observations remain close on the next incremental comparisons. A greater likelihood of remaining close (greater regularity) produces smaller ApEn values, and conversely. From the perspective of a statistician, ApEn can often be regarded as an ensemble parameter of process autocorrelation: smaller ApEn values correspond to greater positive autocorrelation, larger ApEn values indicate greater independence.
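The template-matching computation described above can be sketched in Python/NumPy. This is a minimal illustration, not the authors' code: it forms all length-m template vectors, counts matches within tolerance r under the maximum (Chebyshev) distance with self-matches included, and returns the difference of the average log match frequencies for lengths m and m + 1. The function name `apen` and its defaults are our own choices.

```python
import numpy as np

def apen(u, m=2, r=None):
    """Approximate entropy ApEn(m, r, N) of a 1-D series u (illustrative sketch)."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    if r is None:
        r = 0.2 * u.std()  # a common choice: r as a fraction of the series SD

    def phi(m):
        # All overlapping template vectors of length m: shape (N - m + 1, m)
        x = np.array([u[i:i + m] for i in range(N - m + 1)])
        # C_i^m(r): fraction of templates within Chebyshev distance r of template i
        # (self-matches are counted, so C_i > 0 and the log is always defined)
        C = np.array([np.mean(np.max(np.abs(x - xi), axis=1) <= r) for xi in x])
        return np.mean(np.log(C))

    return phi(m) - phi(m + 1)
```

A regular signal (e.g., a sampled sine wave) yields a markedly smaller `apen` value than an equal-length run of independent uniform noise, matching the qualitative description above.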

Choice of m, r, and N

The value of N, the number of input data points for ApEn computations, is typically between 100 and 5000. This constraint is usually imposed by experimental considerations, not algorithmic limitations, to ensure a single homogeneous epoch. Based on calculations that included both theoretical analysis (4, 15, 16) and clinical applications (3, 6–9), we have concluded that for m = 2 and N = 1000, values of r between 0.1 and 0.25 SD of the u(i) data produce good statistical validity of ApEn.

Applicability to Endocrine Hormone Secretion Data

In Pincus and Keefe (16), the potential applicability of ApEn to endocrinology was examined, to discern abnormal pulsatility in hormone secretion time-series data. We concluded that ApEn is able to discern subtle system changes and to provide insights separate from those given by a number of widely employed pulse detection algorithms (discussed further below), thus providing a complementary statistic to such algorithms. In particular, it was shown that ApEn can potentially distinguish systems.

Moment Statistics

ApEn is a regularity, not a magnitude statistic. Although it affords a new approach to data analysis, it does not replace moment statistics, such as the mean and SD. Epistemologically, ApEn addresses the change from orderly to random, not changes in average (mean) level, or the degree of spread about a central value (SD). As such, we recommend use of ApEn in conjunction with other statistics, not as a sole indicator of system characteristics.

Feature Recognition Algorithms

The orientation of ApEn is to quantify the amount of regularity in a time series.

MIX(p)—A Family of Stochastic Processes with Increasing Irregularity

Above, we indicated that statistics developed for truly chaotic settings are often inappropriate for general time-series application. Analysis of the MIX(p) processes (4) vividly indicates some of the difficulties realized in applying such statistics out of context, and emphasizes the need to calibrate statistical analysis to intuitive sensibility. MIX(p) is a family of stochastic processes that samples a sine wave for p = 0 and consists of independent (IID) samples selected uniformly (completely randomly) for p = 1.
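Assuming the construction given in the cited reference (4) — a unit-variance sine, X_j = √2 sin(2πj/12), with each sample independently replaced, with probability p, by a uniform draw Y_j on (−√3, √3) — a generator can be sketched as follows. The function name `mix` and the seeding interface are our own; the intent is only to show how p interpolates between a purely periodic series (p = 0) and IID noise (p = 1) while mean 0 and SD 1 are preserved.

```python
import numpy as np

def mix(p, n, seed=None):
    """Sketch of the MIX(p) process: a sine wave whose samples are
    independently replaced, with probability p, by uniform noise.
    Both components have mean 0 and SD 1, so moment statistics
    cannot distinguish members of the family."""
    rng = np.random.default_rng(seed)
    j = np.arange(1, n + 1)
    x = np.sqrt(2) * np.sin(2 * np.pi * j / 12)    # deterministic sine, mean 0, SD 1
    y = rng.uniform(-np.sqrt(3), np.sqrt(3), n)    # IID uniform, mean 0, SD 1
    z = rng.random(n) < p                          # replacement indicator, Bernoulli(p)
    return np.where(z, y, x)
```

As p increases from 0 to 1 the output becomes increasingly irregular, as in Fig. 3, while its mean and SD remain essentially unchanged.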

Mechanistic Hypothesis for Altered Regularity

It seems important to determine a unifying theme suggesting greater signal regularity in a diverse range of complicated neuroendocrine systems. We would hardly expect a single mathematical model, or even a single family of models, to govern a wide range of systems; furthermore, we would expect that in vivo, each physiologic signal would usually represent the output of a complex, multinodal network with both stochastic and deterministic components. Our mechanistic hypothesis is that in a variety

Summary and Conclusion

The principal focus of this chapter has been the description of a recently introduced regularity statistic, ApEn, that quantifies the continuum from perfectly orderly to completely random in time-series data. Several properties of ApEn facilitate its utility for biological time-series analysis: (i) ApEn is nearly unaffected by noise of magnitude below a de facto specified filter level; (ii) ApEn is robust to outliers; (iii) ApEn can be applied to time series of 100 or more points, with good


References (30)

  • D.T. Kaplan et al., Biophys. J. (1991)
  • W.J. Parer et al., Am. J. Obstet. Gynecol. (1985)
  • A. Wolf et al., Physica D (1985)
  • P. Grassberger et al., Physica D (1983)
  • S.M. Pincus, Math. Biosci. (1994)
  • D.S. Broomhead et al., Physica D (1986)
  • G. Mayer-Kress et al., Math. Biosci. (1988)
  • M. Casdagli, Physica D (1989)
  • R.J. Urban et al., Endocr. Rev. (1988)
  • M.L. Hartman et al., J. Clin. Invest. (1994)
  • S.M. Pincus et al., Am. J. Physiol. (1993)
  • S.M. Pincus, Proc. Natl. Acad. Sci. U.S.A. (1991)
  • S.M. Pincus, Proc. Natl. Acad. Sci. U.S.A. (1992)
  • L.A. Fleisher et al., Anesthesiology (1993)
  • S.M. Pincus et al., J. Clin. Monit. (1991)