Data Science for Six World Series: Time Series Analysis and Forecasting

Data Science DC Meetup, Thursday, October 29, 2015


For our October Data Science DC Meetup, we're talking about the World Series. No, not that World Series* -- time-series data about the world! Long-time DSDC attendee Lee De Cola will be showing how to think about, analyze, and forecast important world-wide metrics that have been collected for over 50 years. Expect to learn about rigorous ways to analyze temporal data. And maybe there'll be baseball puns.


This presentation will use statistical and visualization features of R to explore yearly time series that characterize key global changes since 1950, a period sometimes called the Great Acceleration. Linear regression supplemented with autocorrelation diagnostics can provide most of the key descriptive information about these data, while nonlinear estimation and exponential smoothing can be used to provide forecasts. However, forecasting, in the sense of providing point predictions of future values, should be used descriptively and as a source of warnings. Widespread understanding of these data is of surpassing importance to the future welfare of life on Earth.
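As a rough illustration of the "linear regression supplemented with autocorrelation diagnostics" step, here is a minimal sketch in plain Python rather than the R used in the talk. The series is synthetic (made-up numbers, not the presentation's data), and the function names are hypothetical. It fits a straight-line trend, reports R-square, and computes the lag-1 autocorrelation of the residuals.

```python
# Synthetic yearly series: made-up numbers, NOT data from the talk.
years = list(range(1950, 1960))
values = [2.0 + 0.5 * (y - 1950) + (0.3 if y % 2 == 0 else -0.3) for y in years]


def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(observed, fitted):
    """Share of the variance in `observed` explained by `fitted`."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((s - mean) ** 2 for s in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


intercept, slope = linear_fit(years, values)
fitted = [intercept + slope * x for x in years]
residuals = [v - f for v, f in zip(values, fitted)]
r2 = r_squared(values, fitted)
lag1 = autocorrelation(residuals, 1)

print("slope per year:", round(slope, 3))
print("R-square:", round(r2, 3))
print("lag-1 autocorrelation of residuals:", round(lag1, 3))
```

A residual autocorrelation far from zero suggests that the straight line has left temporal structure unexplained, so R-square alone would overstate how well the trend describes the series.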

Participants who would like to follow along are welcome to bring computers loaded with R and to download in advance (WiFi will not be available) the data used for this presentation at . 


Lee De Cola runs DATA to Insight, a data visualization consulting and training enterprise. For 21 years he was a research scientist at the U.S. Geological Survey in Reston, Virginia, where he used GIS and applied statistics to understand landscape dynamics and the health of regions and their inhabitants. Lee has published on land cover analysis, spatial epidemiology, urban systems complexity, fractals in geography, and urbanization in Africa. He has taught at a number of local institutions of higher education as well as in Nigeria, Vermont, West Virginia, and California. Lee volunteers at local public schools and enjoys playing the clarinet, kayaking, and sailing. Follow him on Twitter: @ldecola


Slides PDF

Slide 1 Six World Series


Slide 2 What To Measure?


Slide 3 NASA (1972 Dec 7) Apollo 17


Slide 5 Metadata


Slide 6 Are the Data True?


Slide 8 Years


Slide 9 V-Axis: range (var)


Slide 10 V-Axis: [0,2 x mean]


Slide 11 Linear Fit and R-Square


Slide 12 Residuals


Slide 13 Autocorrelation


Slide 14 Autocorrelation of Differences


Slide 15 Exponential Smoothing Forecasts


Slide 16 Nonlinear Fit of Solar


Slide 17 What Causes What?


Slide 18 Precip Predicted by Temp


Slide 19 Per Capita


Slide 20 Units


Slide 21 Causal Paths


Slide 22 What Is To Be Done?


Slide 23 Carbon Emissions


Slide 24 Key Questions


Slide 25 References


Slide 26 Contact Information


Data Science for Six World Series: Time Series Analysis and Forecasting

Data Science was possible because:

1. The 6 data sets were readily downloadable as a CSV and correctly formatted for time series analysis (time as rows and parameters as columns);

2. The 6 data sets were readily imported into Spotfire with 7 tabs, as shown in the screen captures below; and

3. The Holt-Winters Forecast uses TIBCO Spotfire Enterprise Runtime for R to compute the Holt-Winters filtering of a time series or anything that can be coerced to a time series. This is an exponentially weighted moving average filter of the level, trend, and seasonal components of a time series. The smoothing parameters are chosen to minimize the sum of the squared one-step-ahead prediction errors.
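The filtering idea in item 3 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Spotfire or R implementation: it covers the level and trend components only (yearly data have no seasonal component), the series is synthetic, the function names are hypothetical, and a coarse grid search stands in for the numerical optimizer a real package would use to minimize the squared one-step-ahead errors.

```python
def holt_filter(series, alpha, beta):
    """Level + trend exponential smoothing.

    Returns (sse, level, trend): the sum of squared one-step-ahead
    prediction errors and the final smoothed level and trend.
    """
    level = series[1]
    trend = series[1] - series[0]
    sse = 0.0
    for y in series[2:]:
        prediction = level + trend              # one-step-ahead forecast
        sse += (y - prediction) ** 2
        new_level = alpha * y + (1 - alpha) * prediction
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return sse, level, trend


def fit_holt(series, steps=20):
    """Grid search (an assumption of this sketch) for the alpha and beta
    that minimize the one-step-ahead sum of squared errors."""
    grid = [i / steps for i in range(1, steps + 1)]
    best = None
    for alpha in grid:
        for beta in grid:
            sse, _, _ = holt_filter(series, alpha, beta)
            if best is None or sse < best[2]:
                best = (alpha, beta, sse)
    return best


def forecast(level, trend, horizon):
    """Point forecasts 1..horizon steps past the end of the series."""
    return [level + h * trend for h in range(1, horizon + 1)]


# Synthetic yearly values: made-up numbers, NOT one of the six data sets.
series = [10.0, 10.8, 11.9, 12.7, 14.1, 15.0, 15.8, 17.2, 18.1, 18.9]
alpha, beta, sse = fit_holt(series)
_, level, trend = holt_filter(series, alpha, beta)
print("alpha:", alpha, "beta:", beta, "SSE:", round(sse, 4))
print("next 3 forecasts:", [round(f, 2) for f in forecast(level, trend, 3)])
```

The grid search keeps the sketch dependency-free. A finer grid or a proper optimizer would usually find slightly better smoothing parameters.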

The output of a Holt-Winters Forecast is three different curves: a fitted curve showing the general variation of the measure of interest, a forecast curve predicting the future trend, and a confidence interval showing how the uncertainty increases the further the prediction reaches from the known values.
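To see why the interval widens with the forecast horizon, consider a simplified case: level-and-trend smoothing with independent, constant-variance additive errors. That is an assumption of this sketch, not the multiplicative formula used by TIBCO Enterprise Runtime for R. An error made j steps before the target date shifts the forecast by alpha * (1 + j * beta), so the forecast variance accumulates those squared terms. The function name, the parameter values, and the 1.96 multiplier for an approximate 95% normal interval are illustrative choices.

```python
import math


def interval_half_widths(alpha, beta, sigma, horizon, z=1.96):
    """Half-width of an approximate prediction interval at each horizon.

    Assumes additive, independent errors with standard deviation `sigma`
    and a level + trend smoother with parameters `alpha` and `beta`.
    """
    half_widths = []
    accumulated = 0.0
    for h in range(1, horizon + 1):
        if h > 1:
            j = h - 1
            # Effect of an error made j steps before the forecast target.
            accumulated += (alpha * (1 + j * beta)) ** 2
        half_widths.append(z * sigma * math.sqrt(1.0 + accumulated))
    return half_widths


# Illustrative parameter values only.
widths = interval_half_widths(alpha=0.5, beta=0.2, sigma=1.0, horizon=5)
for h, w in enumerate(widths, start=1):
    print("horizon", h, "half-width", round(w, 3))
```

The one-step-ahead interval reflects only the error standard deviation. Every later horizon adds a positive term, which is why the band fans out away from the last observed value.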

TIBCO Enterprise Runtime for R and open-source R return different prediction intervals for multiplicative seasonal models. TIBCO Enterprise Runtime for R assumes that the seasonal and error components are multiplicative in effect, and it uses the formula for prediction variance found in section 6.4.2 of Hyndman et al. (2008). See the references listed below.

The parameters used can be shown in labels or tooltips.

References for Holt-Winters Forecast

Rob J. Hyndman and George Athanasopoulos (2013), Forecasting: Principles and Practice.

Rob J. Hyndman, Anne B. Koehler, J. Keith Ord, and Ralph D. Snyder (2008), Forecasting with Exponential Smoothing: The State Space Approach, Springer.

Please see our November 2, 2015 Data Science for Random Forests: TIBCO Enterprise Runtime for R

Thank you, Lee, for an excellent data set! I have enjoyed our association in the past and wish you well with your Data to Insight work and meetup. Best regards, Brand


Screen capture: Six world series data

Screen capture: total solar irradiance

Screen capture: temperature anomaly

Screen capture: world population

Screen capture: carbon emissions

Screen capture: atmospheric CO2

Spotfire Dashboard

Please note: The Holt-Winters Forecast lines and curves shown in the Spotfire screen captures above do not appear in the Spotfire Web Player.
