ANALYSING CYCLES in Biology and Medicine: a practical introduction to Circular Variables & Periodic Regression

2nd edition, 2008: ISBN 978-0-9736209-2-4
Author: K.N.I. Bell,   B.Sc., M.Sc., Ph.D.    

About 180 pages, 70,000 words, copiously illustrated. See sample excerpt (below) for an idea of what is in it; and see (download) selected papers (1995, 1997, 2001a, 2001b) for examples of its application.

See review by F. J. Rohlf (of "Biometry", Sokal & Rohlf) in Quarterly Review of Biology 85 (1):123 (March 2010)

 

 

Motivation for this book: to improve how biologists typically handle cyclic or periodic data.

Cycles are pervasive in biology, yet they are often improperly analysed (e.g. by dividing a year into seasons and analysing via factorial ANOVA, which by definition cannot use their sequentiality), or more generally ignored and treated as no more than a nuisance. The cause is unfamiliarity with the proper analytical methods. C.I. Bliss stated the problem:

   “Periodic phenomena in biology and climatology occur so widely that we tend either to adapt to them as unavoidable nuisances or are overimpressed by their day to day deviations. We can’t see the forest for the trees” -- C.I. Bliss, 1958.

To overcome that problem, the book explains the techniques and the background vividly and graphically, making periodic regression accessible to all readers.

 

Back Cover text:

Cycles surround us. Indeed, they are the essence of life, and critically important in biology and medicine. We need to know how to analyse and understand them.

But, too often, researchers are given ugly advice: to keep cycles out of data—by restricting sampling to the same time of day, tide, etc. That’s a bad plan; it’s costly because you have to wait for your chosen special times (e.g. 1100h), and even more costly with multiple cycles because you have to wait for rare conjunctions of special times (e.g. 1100h + high tide) in each. The final consequence is: no matter how carefully you worked, the opportunity to describe key cycles is lost, so your findings are virtually meaningless because they can’t be generalised outside the special times you chose.

In fact, it is easier, more useful, and much more beautiful, to put cycles into the analysis than to keep them out of the data. This book makes it easy.

Written in a relaxed style, the book anticipates readers ranging from apprehensive to advanced. It is copiously illustrated with conceptual diagrams and worked-out examples. It contains all that’s needed to get started: even the basic trigonometry and a crisp stats refresher. All you need is the book, your data, and Excel or a stats package.

Periodic regression is so sparsely known to biologists (writ large) that to many it seems almost witchcraft. This situation will no doubt continue for some years, and during that time it will be necessary to reference it carefully or even explain it directly so that readers (and reviewers) can understand that it is a direct extension of regression, and understand its application to removing common sets of periodic trends from parallel sets of data to enable comparisons that would be impossible or unreliable otherwise.

No comparable book exists. Its key precedents (Bliss 1958, 1970 and Batschelet 1981) are out of print. Bliss (1958) is a beautiful paper that comes the closest to making periodic regression readily accessible but, like the others and understandably for its time, does not anticipate the availability of software for analysis and data handling. "Analysing Cycles ..." graphically presents the conceptual framework and simplifies notation for expression and implementation of periodic regression.

  Excerpt:

Figure 6-3. Conceptualising periodic regression: Y vs. circular X. A flat graph of periodic data in a wave-like pattern is “printed” onto a cylindrical graph rolled across it, or vice versa. The x-axis of the flat graph is the cyclic format of circular X (this example has a period of 24 units), while the cylinder base has the (sin`x,cos`x) coordinates (trigonometric format) of X, the transforms that are essential for analysis. The cylinder has radius 1.0 (unit circle). Mesor, Phase angle (∂) of the peak, and Amplitude (A) are marked.

In more detail: a sinusoidal function, or data, is visualised as a sinusoidal curve ‘peeled’ or ‘printed’ from a flat plot onto a cylinder (or vice versa).  A sine curve results from the intersection of a flat plane and a cylinder; that's why it is the simplest possible repeating curve and the reasonable (most parsimonious) periodic model form to assume (examination of residuals will show whether other models need to be explored). Periodic regression is easiest to conceptualise by visualising how the data actually exist on the cycle.
   For now, imagine the data are temperatures (Y) over (X) the 24h daily cycle. X (0≤x≤24) is a circular variable; we can tell it's circular because on the cycle 24=0. Midnight is 24h or 0h*. Same thing. In fact, in the cycle, any angle plus one complete revolution is the same angle, e.g. time xh = xh+24, 10 a.m. plus one day is still 10 a.m. If the number of days, or number of turns of a screw, or the number of years, etc., is important then that number is usually treated as a linear variable separate from, or even instead of, the angle.
      * For simplicity we used only whole hours there. Why? Because the so-called 'military' or '24-h' time as commonly used is not a proper number. It's two different kinds of number stuck together: 1025h does not mean 10.25 hours, it means 10 hours 25 minutes, or 10+25/60 hours, = 10.41666... hours. But the regularity still prevails: on the daily cycle, 10.4166h = 34.4166h; if plotted, the angle is the same, and the sines and cosines are the same. That added 24h is one whole day. If the data contain time indices spanning several days and various times of day, and if the day number is important to include in analysis, it must be treated as a linear variable while the times of day are (unavoidably) circular. For more on time measurement conventions, see the ISO 8601 standard. Our normal time language such as "twenty to five" or "four-thirty" is actually a host of conventions, only in special cases (whole hours) being proper numbers.
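The footnote's arithmetic can be sketched in a few lines of Python. This is only an illustration, not code from the book: the function names `hhmm_to_hours` and `proper_sin_cos` are hypothetical, and the cycle length k = 24 h is taken from the example above.

```python
import math

def hhmm_to_hours(hhmm: int) -> float:
    """Convert a 24-h clock reading such as 1025 into decimal hours."""
    hours, minutes = divmod(hhmm, 100)
    if not (0 <= hours <= 24 and 0 <= minutes < 60):
        raise ValueError(f"not a valid 24-h clock reading: {hhmm}")
    return hours + minutes / 60.0

def proper_sin_cos(x: float, k: float = 24.0) -> tuple[float, float]:
    """Return (sin, cos) of x after converting x from cycle units to radians."""
    angle = 2.0 * math.pi * x / k
    return math.sin(angle), math.cos(angle)

# 1025h is 10 h 25 min, i.e. 10 + 25/60 decimal hours, not 10.25 h.
t = hhmm_to_hours(1025)

# Adding one whole day (24 h) leaves the position on the cycle unchanged.
sin_today, cos_today = proper_sin_cos(t)
sin_next_day, cos_next_day = proper_sin_cos(t + 24.0)
```

Within floating-point tolerance the two (sine, cosine) pairs agree, which is the sense in which 10.4166h and 34.4166h are the same time on the daily cycle.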
   Circular variables require special handling so that, in effect, the analysis does not treat x=24 differently from x=0. We can't simply regress Y vs. X (0≤x≤24); that would give us nonsense, partly because on that cycle 24=0, and partly because that would fit a straight line rather than a repeating function. Instead, we have to first decompose the circular X into its linear components: then we can analyse Y vs. the proper sine and cosine of X. We indicate the proper sine etc. by a following grave [pronounced grahv] accent, as when we write Y = B0 + B1*sin` X + B2*cos` X. The grave accent ` indicates "proper" sine and cosine as obtained after converting X to degrees or radians (units that allow taking sine or cosine). Using the grave accent ` lets us focus on the issue of interest rather than the machinery and units. This notation is quicker and more general than writing Y = B0 + B1*sin(X*2*π/k) + B2*cos(X*2*π/k), where k is the number of units in the cycle (here 24 h). That long form assumes radians and does not acknowledge that we could equally well use other units, e.g. writing X*360/k instead of X*2*π/k if working in degrees.
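As a rough illustration of the model Y = B0 + B1*sin` X + B2*cos` X, the sketch below fits it by ordinary least squares using only the Python standard library. It is not the book's worked procedure: the helper names (`design_row`, `solve_linear`, `fit_periodic`) are hypothetical, and the temperatures are made-up, noise-free values generated from known coefficients so that the fit can be checked.

```python
import math

def design_row(x, k):
    """Row [1, sin`x, cos`x] for one observation on a cycle of k units."""
    angle = 2.0 * math.pi * x / k
    return [1.0, math.sin(angle), math.cos(angle)]

def solve_linear(a, b):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (m[i][n] - tail) / m[i][i]
    return solution

def fit_periodic(xs, ys, k):
    """Least-squares estimates (B0, B1, B2) via the normal equations."""
    rows = [design_row(x, k) for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear(xtx, xty)

# Hypothetical data: B0 = 15, B1 = 3, B2 = -4 on a 24-h cycle, no noise.
hours = [0, 3, 6, 9, 12, 15, 18, 21]
temps = [15.0 + 3.0 * math.sin(2 * math.pi * h / 24)
         - 4.0 * math.cos(2 * math.pi * h / 24) for h in hours]

b0, b1, b2 = fit_periodic(hours, temps, 24.0)
```

Because sin` X and cos` X are ordinary linear predictors once computed, any regression routine (a spreadsheet or a stats package) could be used in place of the hand-rolled solver here, and non-circular predictors could be added as extra columns.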
   A further notational economy, useful when discussing coordinates of vectors implied by circular variables, is x" and y" for sin` X and cos` X, i.e. the paired transforms that 'translate' X, which is a circular index variable (time, direction, etc.), into linear components that can be analysed in a regression. There may be other Xs that are not circular; these are not transformed but can exist in the same regression with circular Xs. Do not confuse y" with the dependent variable Y.
   We first map the cycle (X) as a circle at the base of the figure. To save us trouble, we define the circle as having radius=1. Now, the circle representing X has two dimensions (sine and cosine, which we can call X" and Y"), and the position of each observation is readily seen to correspond to coordinates equal to the proper sine and cosine of X. Next, we map Y as height above the circle (alternative representations are possible, e.g. the polar plot); using height lets us "print" the data from its cylindrical arrangement onto a flat page, or vice versa, as the diagram shows. In a publication the data might often be plotted in the flat configuration, i.e. Y vs. a linear format of X, or as a polar plot, but rarely in this kind of perspective cylindrical form that links them all.
   The proxy variables x" and y" thus are transforms of the independent variable X, whose axis is pseudolinear (e.g. degrees of a circle, hours of a day, days of a year). These proxy variables form the axes of a coordinate space (indicated by the double-prime mark ") on which the angles of the linear-format X variable are expressed; this ‘trigonometric’ format is required for analysis of a cycle (e.g. by regression).
   Mesor, Phase, and Amplitude (A) are the three key parameters of a cycle. These relate to the graph, with the mesor being at the center of the intersection of plane and cylinder or halfway between the lowest and highest expected values, the phase being the angle or direction corresponding to the highest expected value, and amplitude being the difference between the mesor and the highest or lowest expected value.
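One way to obtain these three parameters from fitted coefficients of Y = B0 + B1*sin` X + B2*cos` X is the standard trigonometric identity B1*sin θ + B2*cos θ = A*cos(θ − φ), with A = √(B1² + B2²) and φ = atan2(B1, B2). The sketch below applies it; it is an illustration under that assumption rather than the book's own procedure, the function names are hypothetical, and the coefficients (15, 3, −4) are made-up values.

```python
import math

def cycle_parameters(b0, b1, b2, k):
    """Mesor, amplitude and peak phase (in cycle units) from B0, B1, B2."""
    mesor = b0                        # midline of the fitted curve
    amplitude = math.hypot(b1, b2)    # distance from mesor to peak or trough
    phase_angle = math.atan2(b1, b2)  # angle (radians) of the highest value
    phase = (phase_angle * k / (2.0 * math.pi)) % k  # back to cycle units
    return mesor, amplitude, phase

def predicted(x, b0, b1, b2, k):
    """Expected Y at position x on a cycle of k units."""
    angle = 2.0 * math.pi * x / k
    return b0 + b1 * math.sin(angle) + b2 * math.cos(angle)

# Hypothetical coefficients on a 24-h cycle.
mesor, amplitude, phase = cycle_parameters(15.0, 3.0, -4.0, 24.0)

# At the phase, the fitted curve should reach mesor + amplitude.
peak_value = predicted(phase, 15.0, 3.0, -4.0, 24.0)
```

With these coefficients the amplitude is 5, so the expected values range from 10 to 20 around a mesor of 15, and the peak falls about nine and a half hours into the cycle.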
   Although we usually obtain sine and cosine from tables or functions on our calculators, it's nice to know that we could do it graphically. If we declare the radius of the rolled-up graph to be 1.0 and thus create a unit circle on which any angle can be expressed, and if we then draw coordinate axes X" and Y" through the centre of the unit circle, a point (x",y") represents the sine (X"=sin`x=sin(X*2π/k), where k is the number of angular units into which we've divided the cycle of interest, e.g. 360 degrees, or 24h, or 365 days) and cosine (Y"=cos`x=cos(X*2π/k)) transform of the original time scale. Expressions like sin`x thus use the grave mark to mean "proper", i.e. the proper sine, taken after x has been properly transformed to a standard angular system like radians. Thus (x",y") are paired transforms of the periodic X variable, in this example a 24h cycle, and the transforms are proxy X-variables on which a regression can be conducted.
    The phasing of the data peak relative to the nominal zero of the cycle is ∂ = P-t(0). ∂ can be identified iteratively, or graphically as below, or explicitly by regression using X" and Y" to model the variation in the dependent variable Y. Statistical significance is a separate issue from fitting the function, and cautions against some tempting errors are given in the book.

Previous editions

The First Edition (2004) was an e-book titled: Introduction to Circular Variables & Periodic Regression in Biology (e-book in PDF format): ISBN 0-9736209-0-0. It is on deposit at the National Library of Canada, but that version is no longer distributed and at present no electronic version of the 2nd Edition is planned.

Literature Cited

Batschelet, E. 1981. Circular statistics in biology. Mathematics in Biology, R. Sibson & J. Cohen (Ser. Ed.). London: Academic Press. xvi+371 pp.
Bliss, C. I. 1958. Periodic regression in biology and climatology. Bull. Conn. Agric. Exp. Station, New Haven 615: 1-55.
Bliss, C. I. 1970. Statistics in Biology, Vol. 2. New York: McGraw-Hill. 639 pp.