Overview

Welcome to the non-linear prediction home page. Simplex projection, originally proposed by George Sugihara and Bob May in 1990, is a powerful tool for [1] detecting patterns (chaos) in what otherwise appears to be randomness (noise). It will also tell you [2] how complicated this pattern is, and then [3] predict the future, often with far greater accuracy than other techniques. Its twin technique, the S-Map (Sugihara 1994), can do the same things and works especially well when the time series has a partly nonlinear and partly linear signature. It also provides an index of the degree of nonlinearity of the system. These techniques are great diagnostic tools for illuminating the black box that produced a time series, and they have been used in fields as diverse as epidemiology, cardiology, neuroscience, atmospheric science, finance, engineering (acoustic noise reduction), ecology and fisheries.

Despite their power and potential generality, nonlinear forecasting methods are not nearly as widely used as, for example, linear regression, even though a great many things in the world are better described by nonlinear or chaotic dynamics. Some of this inertia comes from the veil of arcane mathematics and software: the overhead required to understand and apply these tools has certainly slowed their adoption. This site is an attempt to change that by providing intuitive explanations and wider access to the techniques in a format that does not require any initial time investment. Admittedly, the methods are complicated, and they are not appropriate for all data. We realize it takes some time to learn these techniques, and the motivation may not be there, especially if you do not know whether your data are amenable to the methods in the first place.
Therefore, to eliminate barriers to entry (and to make sure the analysis is done correctly), we will provide an initial analysis of your data to determine whether the methods apply, and email you the results. If you are happy with the black box for now, skip to the input form.

Theory

Part 1: Detecting Chaos

By definition, chaotic systems appear random even though they can be governed by extremely simple equations. The trick is telling the difference between deterministic chaos and noise (also known as stochasticity, high-dimensional noise, true randomness, etc.). Compare the two series below.
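A minimal Python sketch of the two kinds of series, assuming the logistic map x(t+1) = 4 x(t) (1 - x(t)) as the "extremely simple equation" and independent uniform random numbers as the noise; neither is the series plotted on this page, and the function names are hypothetical. It also shows why chaos looks random: a nudge of 1e-10 to the starting value soon yields a completely different series, even though each run is fully deterministic.

```python
import random


def logistic_series(n, x0=0.2, r=4.0):
    """Deterministic chaos: each value is r * x * (1 - x) of the previous one."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def noise_series(n, seed=1):
    """Stochastic comparison: independent uniform draws on [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


chaos = logistic_series(100)
noise = noise_series(100)

# Plotted against time, both wander over [0, 1] with no obvious pattern.
print("chaos:", [round(x, 2) for x in chaos[:8]])
print("noise:", [round(x, 2) for x in noise[:8]])

# Sensitive dependence: at r = 4 a tiny difference in x0 roughly doubles per
# step on average, so runs started 1e-10 apart are unrelated within ~35 steps.
nudged = logistic_series(100, x0=0.2 + 1e-10)
gap = max(abs(a - b) for a, b in zip(chaos[40:60], nudged[40:60]))
print(f"largest gap between the two runs over steps 40-59: {gap:.2f}")
```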
Part 2: Predicting the Future

"The way to predict the future is to look to the past. The key to predicting the future well is knowing the 'dimensionality' of the past, present and future." --G.S. 1990

This transformation, plotting a time series against past values of itself, is called a "lagged-coordinate embedding". The resulting graph, a strange time series with no axis for time, is called a "phase portrait". The embeddings in figure 2 plot X(t+1) against X(t), and are therefore sometimes referred to as "two-dimensional" embeddings or "return maps". The theory behind this trick comes from a remarkable result used first by Jim Crutchfield (1978, in an undergraduate thesis) and later proven by Floris Takens (1981). Simply put, past values can be used to reconstruct a shadow picture of the system that generated them. In other words, even though we don't know the equations that govern the time series, we can still draw its dynamics accurately. The shape that emerges from this procedure is called either a "manifold" or an "attractor", because when the system is disturbed away from this pattern, it is "attracted" back to it.

The simplest kind of attractor is a "point equilibrium". Imagine a marble sitting at the bottom of a bowl. If you push the marble, it will roll around for a while before returning to its point equilibrium at the bottom of the bowl. As a slightly more biologically useful example, imagine a population of elk living on a deserted island with enough grass to feed a herd of 100. If something kills off 20 of them, the population will rebuild to 100. If elk mothers have unusually large numbers of offspring, some individuals will starve and the population will return to 100. Such situations are extremely easy to represent mathematically, but they are usually too simple to represent nature accurately.
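The lagged-coordinate embedding itself is only a few lines of code. The sketch below uses a hypothetical helper, `lagged_embedding`, that stacks each value with its E - 1 predecessors; the lag parameter `tau` is an added assumption (the text uses one-step lags). It then builds the two-dimensional return map, X(t+1) against X(t), for an assumed logistic-map series: every point falls on one curve, the deterministic rule that a plot against time hides.

```python
def lagged_embedding(series, E, tau=1):
    """Return (t, vector) pairs with vector = (x[t], x[t - tau], ..., x[t - (E - 1) * tau])."""
    start = (E - 1) * tau  # first index with a complete history
    return [(t, tuple(series[t - k * tau] for k in range(E)))
            for t in range(start, len(series))]


# Assumed example series: the logistic map x(t+1) = 4 x(t) (1 - x(t)).
xs = [0.2]
for _ in range(199):
    xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))

# A 2D embedding vector is (x[t], x[t-1]); swap it to get (today, tomorrow) pairs,
# i.e. the return map of X(t+1) against X(t).
return_map = [(v[1], v[0]) for _, v in lagged_embedding(xs, E=2)]

# Every point of the return map lies on the parabola y = 4x(1 - x).
worst = max(abs(y - 4.0 * x * (1.0 - x)) for x, y in return_map)
print(len(return_map), "points; largest distance from the curve:", worst)
```

A pure-noise series embedded the same way would scatter over the whole square instead of tracing a curve.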
There are also "stable limit cycles", which are more like a toy airplane tied to a string that flies forever in the same circle. If pushed inwards, centrifugal force carries the plane back out; if moved outwards, gravity pulls the plane down to its previous orbit. Another scenario for our elk is that a herd of 80 allows grass to grow abundantly, supporting a herd of 120, which overgrazes until the herd has dwindled to 80 again, and so forth. Finally, there are "strange attractors". The butterfly-shaped Lorenz attractor at the top of this web page is an example: the path of the marble or airplane is constrained to certain limits and has a characteristic trajectory, but it never follows exactly the same path twice. Following our analogy, a great many factors are responsible for changing the elk population: though it never goes extinct or increases to infinity, the simple cycle from 120 to 80 and back again is disrupted, and the peaks and valleys are a bit different each time. This is what is generally seen in nature, and the reason why chaos is so appealing: it at least has a hope of replicating observed phenomena.

Finding patterns in noise always seems amazing. Martin Casdagli, a former postdoc in this lab, coined the term "embeddology" to highlight the significance of this result, and Edward Lorenz described it as the most remarkable breakthrough in nonlinear science. Returning to the manifold in figure 2A, notice two important things. First, a straight line through these points would miss a lot of the detail; if we tried to predict y from x with a straight line, our predictions would be weak. This "straight line" approach is essentially what a linear autoregressive model does. More sophisticated versions (such as ARMA or ARIMA) are not qualitatively different. A better approach is to let the data determine the shape of the line (the attractor). That is essentially what the two techniques described here do.
Second, some regions can be more predictable than others. Referring to figure 2, think of the y-axis as corresponding to the predicted value ("tomorrow") and the x-axis as corresponding to the current value ("today"). A scientist is often faced with the task: "given today's value, forecast tomorrow's." If today's value is -1.0, you can be fairly sure that the next value ("tomorrow", y-axis) will be about 0.8. However, if today's value is 0.0, then tomorrow's might be -1.0, or it might be 0.0 again. Regions of the graph where several predicted values are likely are called "singularities".

To quantify how well the program makes predictions, we use a technique called "cross-validation". Half of the points in the time series are given to the program. These points are "embedded" to create an "attractor", which is then used to predict the other half of the time series. In the example above, our forecasts have a correlation of 0.56 with the actual observed values; that is, a 1D model has 56% forecast skill.

However, there is no reason to limit ourselves to today's value alone when predicting tomorrow. Why not make a 2D prediction by using "yesterday" as well as "today" to predict "tomorrow"? The embedding in that case is represented by the 3D figure below. If you were to look straight down on this figure from above, you would see figure 2A projected on the xy-plane (the "yesterday"-"today" plane); the third dimension, the z-axis X(t+1), is the value one tries to predict for "tomorrow". Where before there was a singularity at 0.0 (we were unable to say whether tomorrow's value would be 0.0 or -1.0), this singularity is now resolved, because yesterday's value determines which of the two candidate forecasts is correct. As a result, our predictions are much stronger. With just 25 data points, we can forecast 975 out-of-sample points with 90% skill. With 500 points, we can predict 500 more with 99.8% skill.
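Cross-validation, and the gain from adding "yesterday", can be sketched in plain Python. Everything below is an illustrative assumption rather than this site's program: the data come from the Henon map, x(t+1) = 1 - 1.4 x(t)^2 + 0.3 x(t-1), a textbook system in which tomorrow depends on both today and yesterday; the predictor is a simplified simplex-style forecast (E + 1 nearest neighbours with exponential weights); and forecast skill is Pearson's correlation between predicted and observed values. The first half of the series is the library and the second half is predicted blind.

```python
import math


def henon_series(n, a=1.4, b=0.3, burn_in=100):
    """x values of the Henon map; the first `burn_in` steps are discarded."""
    x, y, out = 0.0, 0.0, []
    for i in range(n + burn_in):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= burn_in:
            out.append(x)
    return out


def pairs(xs, E):
    """(history vector at t, value at t + 1) for every t with a full history."""
    return [(tuple(xs[t - k] for k in range(E)), xs[t + 1])
            for t in range(E - 1, len(xs) - 1)]


def forecast(library, target, E):
    """Weighted average of the successors of the E + 1 nearest library vectors."""
    nearest = sorted((math.dist(v, target), nxt) for v, nxt in library)[:E + 1]
    d1 = nearest[0][0]
    if d1 == 0.0:  # exact match: use only the zero-distance neighbours
        weights = [1.0 if d == 0.0 else 0.0 for d, _ in nearest]
    else:
        weights = [math.exp(-d / d1) for d, _ in nearest]
    return sum(w * nxt for w, (_, nxt) in zip(weights, nearest)) / sum(weights)


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def cross_validated_skill(xs, E):
    """Embed the first half as the library, then predict the second half blind."""
    half = len(xs) // 2
    library, test = pairs(xs[:half], E), pairs(xs[half:], E)
    predicted = [forecast(library, v, E) for v, _ in test]
    observed = [nxt for _, nxt in test]
    return pearson(predicted, observed)


xs = henon_series(1000)
skill_1d = cross_validated_skill(xs, E=1)  # "today" only
skill_2d = cross_validated_skill(xs, E=2)  # "today" and "yesterday"
print(f"1D skill: {skill_1d:.3f}   2D skill: {skill_2d:.3f}")
```

With E = 1, library points whose "today" matches but whose "yesterday" differs get averaged together, which is exactly the singularity described in the text; E = 2 separates them.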
Remember, this is not a measure of how well the model "fits" the data; it is blind prediction of points the model has never seen.
Scaling is a serious problem with nonlinear phenomena: unless the data are collected at the correct scale, the misaggregated output from nonlinear sources may falsely appear to be linear and stochastic (Sugihara 1997). Imagine figure 2A with X(t+35) or X(t+0.001) on the y-axis instead of X(t+1). Nonlinearity is a fragile thing. Indeed, this sensitivity could be used to determine the optimal scale.

Part 3: Complexity

For the final useful part of this theory, imagine that we use more and more data to try to predict the future. That is, we try to predict the value "tomorrow" using "today", "yesterday", "the day before yesterday", and so on. As the time series is embedded in a higher and higher dimensional manifold (which is hard to draw, but mathematically simple), we quickly lose predictive power, as shown below.

At first, it seems counterintuitive that predictions would degrade when given more data. However, the reason soon becomes clear: something that occurred in the distant past is unlikely to contribute materially to the future. While it is very useful to know what happened "today" and "yesterday", it may not be useful to know what happened "fifteen days ago". In fact, when one attempts to make predictions based on information 15 days old, this irrelevant data overwhelms the useful data contained in the first 2-3 days, and predictability tails off quickly. The best embedding dimension is of great interest because it correlates with the number of variables causing a time series to behave as it does. According to Whitney's embedding theorem, the embedding dimension E will always lie between the number of variables in the system (n) and 2n+1.
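Choosing E can be sketched as a scan: compute cross-validated forecast skill for E = 1, 2, 3, ... and keep the best. The sketch assumes a simplified simplex-style predictor (E + 1 nearest neighbours, exponential weights, first half of the series as library) and an example system, the Henon map, which has n = 2 state variables, so the range quoted above becomes 2 <= E <= 5. The helper names are hypothetical, and the skill values will differ for real data.

```python
import math


def henon_series(n, a=1.4, b=0.3, burn_in=100):
    """x values of the two-variable Henon map, transient discarded."""
    x, y, out = 0.0, 0.0, []
    for i in range(n + burn_in):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= burn_in:
            out.append(x)
    return out


def skill_for_E(xs, E):
    """Cross-validated Pearson correlation of simplex-style forecasts at dimension E."""
    def pairs(seg):
        return [(tuple(seg[t - k] for k in range(E)), seg[t + 1])
                for t in range(E - 1, len(seg) - 1)]

    half = len(xs) // 2
    library, test = pairs(xs[:half]), pairs(xs[half:])
    predicted, observed = [], []
    for v, nxt in test:
        nearest = sorted((math.dist(u, v), y) for u, y in library)[:E + 1]
        d1 = nearest[0][0] or 1e-12  # guard against a zero nearest distance
        weights = [math.exp(-d / d1) for d, _ in nearest]
        predicted.append(sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights))
        observed.append(nxt)
    mp, mo = sum(predicted) / len(predicted), sum(observed) / len(observed)
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


xs = henon_series(600)
skills = {E: skill_for_E(xs, E) for E in range(1, 9)}
best_E = max(skills, key=skills.get)
for E, rho in skills.items():
    print(f"E={E}: skill {rho:.3f}")

n = 2  # the Henon map has two state variables
print("best E:", best_E, "| Whitney range:", n, "to", 2 * n + 1)
```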
Why does this theorem work? That's much trickier. Let's go all the way back to the time series in figure 1A. Before we do any analysis, we don't know how many variables are responsible for its behavior. But we do know that every point in the time series is influenced by the same number of variables, so we need at least that many lagged values (coordinates) to capture all of them; this sets the lower limit on E. The upper limit is set by the number of orthogonal axes you can have in n-space. Once one uses more than 2n+1 coordinates to describe an n-dimensional object, one is necessarily using redundant information.

Techniques

Now we describe the two methods that put this theory into practice. The first is called Simplex projection. Here, one selects the b points on the attractor whose histories are MOST SIMILAR to the one being predicted (Y(t)). These points form a "simplex" around the predictee. One then tracks each of those b points one step into the future, deforming and moving the simplex. The forecast value (Y(t+1)) is the point in the middle of this "projected simplex", or rather the point that is proportionally as close to each vertex of the projected simplex as Y(t) was to the corresponding vertex of the original simplex. Thus, if Y(t) was very close to one of the vertices of the original simplex (three vertices, in a two-dimensional embedding), Y(t+1) remains closest to that vertex in the projection. This is conveniently done by taking a weighted average of the projected vertices, where the weight of each vertex (w_j) increases with the proximity of that vertex to the predictee.
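The weighted average at the end of simplex projection can be written out directly. The sketch assumes b = E + 1 neighbours and exponential weights w_j = exp(-d_j / d_1), where d_j is the distance from the predictee to vertex j and d_1 the distance to the nearest vertex. This weighting is a common choice in simplex implementations but is an assumption here, since the text only says the weights reflect proximity. The function name and the library values are made up for illustration.

```python
import math


def simplex_project(library, target):
    """One simplex-projection forecast.

    library: list of (history_vector, next_value) pairs taken from the attractor.
    target:  history vector of the point being predicted, Y(t).
    Returns (forecast of Y(t+1), [(distance, weight, next_value) per vertex]).
    """
    E = len(target)
    # The E + 1 nearest library points form the simplex around the target.
    simplex = sorted((math.dist(v, target), nxt) for v, nxt in library)[:E + 1]
    d1 = simplex[0][0]
    if d1 == 0.0:
        # Target sits exactly on a library point: use only zero-distance vertices.
        weights = [1.0 if d == 0.0 else 0.0 for d, _ in simplex]
    else:
        weights = [math.exp(-d / d1) for d, _ in simplex]
    # Each vertex's next_value is its projection one step ahead; average them.
    forecast = sum(w * nxt for w, (_, nxt) in zip(weights, simplex)) / sum(weights)
    return forecast, [(d, w, nxt) for w, (d, nxt) in zip(weights, simplex)]


# Hypothetical 1D library: history value -> value one step later.
library = [((0.0,), 10.0), ((1.0,), 20.0), ((3.0,), 40.0)]

print(simplex_project(library, (0.5,))[0])  # about 15.0: equidistant vertices weigh equally
print(simplex_project(library, (0.1,))[0])  # about 10.003: the nearest vertex dominates
```

The second call illustrates the rule in the text: a predictee that starts close to one vertex ends up close to that vertex's projection.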
The second technique is called an S-Map, and it is similar to simplex projection. Again one selects points with histories similar to the one being predicted, that is, points that lie near the predictee on the attractor. But now, rather than projecting just a few of them forward one time step and taking the average, all the points are projected and a line is fit to all of them, again exponentially weighting the points so that those near the predictee have more influence on the shape and direction of the line. When one does this over a large range of values, the lines together begin to form a curve (a multi-dimensional spline) that follows the curve of the attractor. Again, a new regression is computed for every prediction, so that vectors most similar to the predictee are weighted more heavily.
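A locally weighted linear regression of this kind can be sketched in plain Python. Assumptions not taken from the text: the weights are w_i = exp(-theta * d_i / d_mean), where d_i is the distance from library point i to the predictee and d_mean the average of those distances; the test series is the logistic map; and all names are hypothetical. With theta = 0 every point gets the same weight, so the fit collapses to one global linear (autoregressive) model, the "straight line" discussed earlier; larger theta makes each fit more local. Comparing skill across theta is one way to read off the index of nonlinearity mentioned in the overview.

```python
import math


def solve(M, v):
    """Solve the square linear system M c = v (Gaussian elimination, partial pivoting)."""
    n = len(v)
    A = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= f * A[col][k]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        c[i] = (A[i][n] - sum(A[i][k] * c[k] for k in range(i + 1, n))) / A[i][i]
    return c


def smap_forecast(library, target, theta):
    """Fit next = c0 + c . vector with weights centred on `target`, then apply it."""
    dists = [math.dist(v, target) for v, _ in library]
    d_mean = sum(dists) / len(dists)
    weights = [math.exp(-theta * d / d_mean) for d in dists]
    rows = [(1.0,) + tuple(v) for v, _ in library]  # intercept column + coordinates
    ys = [nxt for _, nxt in library]
    p = len(rows[0])
    # Weighted normal equations (X^T W X) c = X^T W y, re-solved for every target.
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, ys)) for i in range(p)]
    c = solve(xtwx, xtwy)
    return c[0] + sum(ci * ti for ci, ti in zip(c[1:], target))


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


# Assumed test series: the logistic map, nonlinear but fully deterministic.
xs = [0.2]
for _ in range(599):
    xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
library = [((xs[t],), xs[t + 1]) for t in range(299)]
test = [((xs[t],), xs[t + 1]) for t in range(300, 599)]

skills = {}
for theta in (0.0, 8.0):
    predicted = [smap_forecast(library, v, theta) for v, _ in test]
    skills[theta] = pearson(predicted, [nxt for _, nxt in test])
    print(f"theta={theta}: skill {skills[theta]:.3f}")
```

For this series the global linear fit (theta = 0) has almost no skill, while the local fits track the curved attractor; on purely linear data the two settings would perform about equally.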
So, one theory, two techniques, three results: detecting chaos, measuring complexity, and predicting the future.