Data Science for Six World Series: Time Series Analysis and Forecasting

Story

Data Science DC Meetup, Thursday, October 29, 2015

Introduction

For our October Data Science DC Meetup, we're talking about the World Series. No, not that World Series* -- time-series data about the world! Long-time DSDC attendee Lee De Cola will be showing how to think about, analyze, and forecast important world-wide metrics that have been collected for over 50 years. Expect to learn about rigorous ways to analyze temporal data. And maybe there'll be baseball puns.

Abstract

This presentation will use statistical and visualization features of R to explore yearly time series that characterize key global changes since 1950, a period sometimes called the Great Acceleration. Linear regression supplemented with autocorrelation diagnostics can provide most of the key descriptive information about these data, while nonlinear estimation and exponential smoothing can be used to provide forecasts. However, forecasting, in the sense of providing point predictions of future values, should be used descriptively and as a source of warnings rather than as a set of precise predictions. Widespread understanding of these data is of surpassing importance to the future welfare of life on Earth.

Participants who would like to follow along are welcome to bring computers loaded with R and to download the data used for this presentation in advance (WiFi will not be available) at ldecola.net/projects/global/.
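The descriptive steps named in the abstract (a linear fit, its R-square, and the autocorrelation of the residuals) can be sketched in a few lines. This is a minimal illustration in plain Python rather than the R code used in the presentation. The yearly values are made up for the example, and the function names are hypothetical.

```python
# Sketch: straight-line fit, R-squared, and residual autocorrelation
# for a yearly series. The data below are synthetic, not the talk's data.

def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(y, fitted):
    """Share of the variance in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = sum((values[t] - mean) * (values[t + lag] - mean)
              for t in range(n - lag))
    return num / denom

# Synthetic yearly series: a steady trend plus a slow five-year swing.
years = list(range(1950, 1970))
values = [10 + 0.5 * i + (1.0 if (i // 5) % 2 == 0 else -1.0)
          for i in range(20)]

intercept, slope = linear_fit(years, values)
fitted = [intercept + slope * yr for yr in years]
residuals = [v - f for v, f in zip(values, fitted)]

print("slope per year:", round(slope, 4))
print("R-squared:", round(r_squared(values, fitted), 4))
print("lag-1 residual autocorrelation:",
      round(autocorrelation(residuals, 1), 4))
```

A clearly positive lag-1 autocorrelation in the residuals suggests that the straight line leaves temporal structure unexplained, so the usual regression standard errors should be read with caution.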

Bio

Lee De Cola runs DATA to Insight, a data visualization consulting and training enterprise. For 21 years he was a research scientist at the U.S. Geological Survey in Reston, Virginia, where he used GIS and applied statistics to understand landscape dynamics and the health of regions and their inhabitants. Lee has published on land cover analysis, spatial epidemiology, urban systems complexity, fractals in geography, and urbanization in Africa. He has taught at a number of local institutions of higher education as well as in Nigeria, Vermont, West Virginia, and California. Lee volunteers at local public schools and enjoys playing the clarinet, kayaking, and sailing. Follow him on Twitter: @ldecola

Slides

Slides PDF

Slide 1 Six World Series

ldecola@comcast.net

LeeDeCola10292015Slide1.PNG

Slide 2 What To Measure?

LeeDeCola10292015Slide2.PNG

Slide 3 NASA (1972 Dec 7) Apollo 17

LeeDeCola10292015Slide3.PNG

Slide 5 Metadata

LeeDeCola10292015Slide5.PNG

Slide 6 Are the Data True?

LeeDeCola10292015Slide6.PNG

Slide 8 Years

LeeDeCola10292015Slide8.PNG

Slide 9 V-Axis: range (var)

LeeDeCola10292015Slide9.PNG

Slide 10 V-Axis: [0,2 x mean]

LeeDeCola10292015Slide10.PNG

Slide 11 Linear Fit and R-Square

LeeDeCola10292015Slide11.PNG

Slide 12 Residuals

LeeDeCola10292015Slide12.PNG

Slide 13 Autocorrelation

LeeDeCola10292015Slide13.PNG

Slide 14 Autocorrelation of Differences

LeeDeCola10292015Slide14.PNG

Slide 15 Exponential Smoothing Forecasts

LeeDeCola10292015Slide15.PNG

Slide 16 Nonlinear Fit of Solar

LeeDeCola10292015Slide16.PNG

Slide 17 What Causes What?

LeeDeCola10292015Slide17.PNG

Slide 18 Precip Predicted by Temp

LeeDeCola10292015Slide18.PNG

Slide 19 Per Capita

LeeDeCola10292015Slide19.PNG

Slide 20 Units

LeeDeCola10292015Slide20.PNG

Slide 21 Causal Paths

LeeDeCola10292015Slide21.PNG

Slide 22 What Is To Be Done?

LeeDeCola10292015Slide22.PNG

Slide 23 Carbon Emissions

LeeDeCola10292015Slide23.PNG

Slide 24 Key Questions

LeeDeCola10292015Slide24.PNG

Slide 25 References

LeeDeCola10292015Slide25.PNG

Slide 26 Contact Information

ldecola@comcast.net

LeeDeCola10292015Slide26.PNG

Data Science for Six World Series: Time Series Analysis and Forecasting

Data Science was possible because:

1. The 6 data sets were readily downloadable as a CSV and correctly formatted for time series analysis (time as rows and parameters as columns);

2. The 6 data sets were readily imported into Spotfire with 7 tabs, as shown in the screen-capture slides below; and

3. The Holt-Winters Forecast uses TIBCO Spotfire Enterprise Runtime for R to compute the Holt-Winters filtering of a time series or anything that can be coerced to a time series. This is an exponentially weighted moving average filter of the level, trend, and seasonal components of a time series. The smoothing parameters are chosen to minimize the sum of the squared one-step ahead prediction errors.
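The idea in item 3 can be illustrated with a small sketch. It makes several assumptions that are not in the original. It drops the seasonal component, which yearly data do not have, leaving Holt's level-plus-trend smoothing. It initializes the level and trend from the first two observations. It replaces the numerical optimizer with a coarse grid search over the two smoothing parameters. The series and function names are made up for illustration, and this is not the TIBCO Enterprise Runtime for R implementation.

```python
# Sketch: Holt's linear exponential smoothing (level + trend, no season),
# with smoothing parameters picked by a coarse grid search that minimizes
# the sum of squared one-step-ahead prediction errors.

def holt_sse(series, alpha, beta):
    """Return (SSE of one-step-ahead errors, final level, final trend)."""
    level = series[1]                 # assumed start: second observation
    trend = series[1] - series[0]     # assumed start: first difference
    sse = 0.0
    for y in series[2:]:
        prediction = level + trend    # one-step-ahead forecast
        error = y - prediction
        sse += error ** 2
        new_level = alpha * y + (1 - alpha) * prediction
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return sse, level, trend

def fit_holt(series, steps=20):
    """Grid search over alpha and beta in (0, 1]; returns the best fit."""
    grid = [i / steps for i in range(1, steps + 1)]
    best = None
    for alpha in grid:
        for beta in grid:
            sse, level, trend = holt_sse(series, alpha, beta)
            if best is None or sse < best["sse"]:
                best = {"alpha": alpha, "beta": beta, "sse": sse,
                        "level": level, "trend": trend}
    return best

def forecast(fit, horizon):
    """Extend the final level along the final trend for `horizon` steps."""
    return [fit["level"] + h * fit["trend"] for h in range(1, horizon + 1)]

# Made-up yearly values, for illustration only.
demo = [3.1, 3.4, 3.9, 4.1, 4.8, 5.0, 5.7, 6.1, 6.4, 7.2]
demo_fit = fit_holt(demo)
print("alpha:", demo_fit["alpha"], "beta:", demo_fit["beta"])
print("next three values:", [round(v, 2) for v in forecast(demo_fit, 3)])
```

A grid search keeps the sketch dependency-free and easy to follow. A real fit would normally hand the same SSE function to a numerical optimizer.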

The output of a Holt-Winters Forecast is three different curves: a fitted curve showing the general variation of the measure of interest, a forecast curve predicting the future trend, and a confidence interval showing how the uncertainty grows the further the prediction extends beyond the known values.
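The widening of the interval can be made concrete for the simplest case. The sketch below assumes additive level-plus-trend smoothing with no seasonal component, uncorrelated errors of constant standard deviation `sigma`, and a normal 95% multiplier of 1.96. Under those assumptions the variance of the h-step-ahead forecast error is sigma^2 * (1 + sum over j = 1..h-1 of (alpha * (1 + j * beta))^2). This is not the multiplicative-seasonal formula that TIBCO Enterprise Runtime for R uses, and the function name is hypothetical.

```python
import math

# Sketch: how the prediction interval widens with the forecast horizon
# for additive level + trend exponential smoothing (no seasonality).

def interval_halfwidths(alpha, beta, sigma, horizon, z=1.96):
    """Half-width of the prediction interval for steps 1..horizon.

    alpha and beta are the level and trend smoothing parameters, and
    sigma is the standard deviation of the one-step-ahead errors.
    """
    widths = []
    variance_factor = 1.0             # h = 1: only the new error term
    for h in range(1, horizon + 1):
        widths.append(z * sigma * math.sqrt(variance_factor))
        # The error made now shifts the forecast h steps later by
        # alpha * (1 + h * beta); add its squared weight for the next step.
        variance_factor += (alpha * (1 + h * beta)) ** 2
    return widths

for step, width in enumerate(interval_halfwidths(0.5, 0.2, 1.0, 5), start=1):
    print(f"h={step}: +/- {width:.3f}")
```

Because every additional step adds a positive term to the variance, the band can only widen as the horizon grows, which is the behavior described above.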

TIBCO Enterprise Runtime for R and open-source R return different prediction intervals for multiplicative seasonal models. TIBCO Enterprise Runtime for R assumes that the seasonal and error components are multiplicative in effect, and it uses the formula for prediction variance found in section 6.4.2 of Hyndman et al. (2008). See the References for Holt-Winters Forecast section.

The parameters used can be shown in labels or tooltips.

References for Holt-Winters Forecast

Rob J. Hyndman and George Athanasopoulos (2013), Forecasting: Principles and Practice. http://otests.com/fpp/7/1.

Rob J. Hyndman, Anne B. Koehler, J. Keith Ord, and Ralph D. Snyder (2008), Forecasting with Exponential Smoothing: The State Space Approach, Springer.

Please see our November 2, 2015 Data Science for Random Forests: TIBCO Enterprise Runtime for R

Thank you, Lee, for an excellent data set! I have enjoyed our association in the past and wish you well with your Data to Insight work and meetup. Best regards, Brand

Slides

Six world series data

Six world series data.png

total solar irradiance

total solar irradiance.png

temperature anomaly

temperature anomaly.png

precipitation

precipitation.png

world population

world population.png

carbon emissions

carbon emissions.png

atmospheric CO2

atmospheric CO2.png

Spotfire Dashboard

Please Note: The Holt-Winters Forecast lines/curves shown in the Spotfire screen captures above do not appear in the Spotfire Web Player.

Research Notes
