Tableau Public Data Sets

Story

Tableau Public Data Sets for DC Data Science

The most recent DC Data Science meetup was held on July 30 from 6:30 PM to 8:30 PM at GWU, Funger Hall, Room 103, 2201 G St. NW, Washington, DC. The July meetup featured lightning talks by not one, not two, but nine scheduled members of the data science community (only eight actually presented). The talks covered highlights of work presented at recent or upcoming academic conferences, demos of tools and techniques that people are excited about, and more, all in bite-sized packages of less than 10 minutes.

The Agenda was:

6:30pm -- Networking, Empanadas, and Refreshments
7:00pm -- Introduction
7:15pm -- Presentations
8:30pm -- Post presentation conversations
8:45pm -- Adjourn for Data Drinks (Tonic, 22nd & G St.)

Before and during the Data Science DC Meetup, several people suggested that the presenters provide their slides and data sets (unless classified or confidential) so that others could review and audit their results (see my Research Notes). The status for each talk is summarized below.

 

Title | Presenter | Slides | Data | Comments
Next-Gen Sequencing Data Analysis | Harsha Rajasimha, Jeeva Informatics | NA | NA | Did not present
Fundamental Graphic Types | Jon Schwabish, CBO | Used | None | Mostly entertainment
Growth Hacker | Robert Dempsey, Intridea | Used | LinkedIn company data | See Footnotes Below
Hadoop Summit / GraphLab Review | Kevin Coogan, Amalgamood | Used | None | Excellent summary of two conferences
GitHub API and Graph Databases | Patrick Wagstrom, IBM Research | Used | Julia download (360 MB executable file) | Interesting network analysis
Incentivized Sharing in Social Networks | Elena Zheleva, LivingSocial | Used | Confidential | Interesting statistical work
Random Forests and Credit Scoring | Dhruv Sharma, FDIC | Used | ? | Interesting statistical work
Analyzing Data Exhaust | Natalie Robb, WaveLength Analytics | Used | Confidential | Interesting data science work
Consensus Clustering | Tommy Jones, STPI | Used | ? | Interesting statistical work

The one that stands out is highlighted in bold above and will be analyzed in a future blog.

As a Tableau Public user, I also received an email on August 2nd, with a link to a web page, announcing:

"We’re excited to announce today that we’ve extended the capabilities of Tableau Public tenfold. You can now connect to files and spreadsheets that contain up to 1 million rows of data.

Additionally, to accommodate these larger data files, your account’s storage space has been increased from 50 megabytes to 1 gigabyte. And there’s no action required on your part – these changes have already gone into effect. We like you that much.

We’ve taken this step because we noticed that many of the publicly available data sources go beyond the previous limit of 100,000 rows, including:

I decided to work with these publicly available data sources to make them analytic-ready for the DC Data Science Community as follows:

 

Data Sets | Downloads (MB) | Rows | Columns
2011 NYPD “Stop-and-Frisk” data (685K incidents) | 197 MB Excel CSV | 685,724 | 99
2012 UK Road Safety data (145K accidents) | 20 MB Excel CSV | 145,571 | 32
2012 UK Road Safety data (145K accidents) | 8 MB Excel CSV | 195,723 | 14
2012 UK Road Safety data (145K accidents) | 15 MB Excel CSV | 265,877 | 21
2011 U.S. Medicare Inpatient Charge data (170K rows) | 0.3 MB Excel Worksheet | 100 and 5,025 | 4 and 5
2011 U.S. Medicare Inpatient Charge data (170K rows) | 13 MB Excel Worksheet | 8 and 13 and 163,065 | 1 and 2 and 11
Totals | 253.3 | 1,456,086 |
Spotfire File Size | 50.1 | |

 

Last month, I worked with a big data set that was suggested on Data Science DC and presented the results in my wiki. I plan to do this as a regular service to the data science community.

The slides below show the results of loading all three of the publicly available data sources cited by Tableau, nearly 1.5 million rows in total, into Spotfire. The example below shows several small, adjacent visualizations on the same page that are dynamically linked.

TableauPublicDataSetsforDCDataScience-SpotfireSlide9.png

 

This shows that the highest number of deaths (42) (scatter plot at the top) involved two vehicles near the center of England (map chart in the middle) at around 6 am (line chart at the bottom).

IN PROCESS

Footnotes

Robert Dempsey

Good morning everyone,

Thanks for all of your fantastic questions after the lightning talks. Here are links to everything I mentioned in my talk:

Slides - http://www.slideshare.net/intri...

Surfiki - http://surfiki.com/

Refine (Open Source) - https://github.com/intridea/surf...

Python script for scraping LinkedIn company data (Open Source) - https://github.com/intridea/link...

If you have any questions, give me a call: 888-968-4332 x517 robert.dempsey@intridea.com

Slides

Slide 1 NYC Stop-and-Frisk 2011

TableauPublicDataSetsforDCDataScience-SpotfireSlide1.png

Slide 2 UK Data.gov Road Safety 1

TableauPublicDataSetsforDCDataScience-SpotfireSlide2.png

Slide 3 UK Data.gov Road Safety 2

TableauPublicDataSetsforDCDataScience-SpotfireSlide3.png

Slide 4 Medicare Charge Inpatient

TableauPublicDataSetsforDCDataScience-SpotfireSlide4.png

Slide 5 Medicare Provider

TableauPublicDataSetsforDCDataScience-SpotfireSlide5.png

Slide 6 State Boundaries

TableauPublicDataSetsforDCDataScience-SpotfireSlide6.png

Slide 7 HRR Boundaries

TableauPublicDataSetsforDCDataScience-SpotfireSlide7.png

Slide 8 European Countries and Departments Boundaries

TableauPublicDataSetsforDCDataScience-SpotfireSlide8.png

Spotfire Dashboard

For Internet Explorer Users and Those Wanting Full Screen Display Use: Web Player Get Spotfire for iPad App

Research Notes

Upload UK (43 MB) and Medicare (13.3 MB)

Upload Stop-and-Frisk (197 MB)?

Organize Knowledge Base
Download Files and Examine
Data Dictionaries? Codebook Source: http://www.nyclu.org/files/SQF_Codebook.pdf (PDF)
Need Boundary Files - 
Import All Files With Maximum Memory Available (Mashup)

Google Search for Definitions for UK Road Safety Data Since UK Data.gov Did Not Provide That

Found: https://www.gov.uk/government/organi...out/statistics

Scroll Down: Statistical series
Found: https://www.gov.uk/government/organi...ety-statistics

Found: See key releases, notes and guidance, the pre-release access list and technical information

https://www.gov.uk/transport-statist...ent-and-safety

Found: STATS19 road accident injury statistics – report form [PDF, 628KB, 4 pages]

https://www.gov.uk/government/upload...23/stats19.pdf

Found: Reported Road Casualties in Great Britain: notes, definitions, symbols and conventions [PDF, 23.4KB]

https://www.gov.uk/government/upload...efinitions.pdf

Edward Tufte

Source: http://www.ft.com/intl/cms/s/2/dda1cb5c-f4c0-11e2-a62e-00144feabdc0.html#axzz2aBV8tVGi

By Clive Cookson, FT science editor
The graphics guru has shown us how to visualise data with simplicity, clarity and elegance

http://worrydream.com/

http://d3js.org/

http://www.edwardtufte.com/tufte/

https://twitter.com/EdwardTufte

The 5 Most Influential Data Visualizations of All Time

Source: http://www.tableausoftware.com/asset/5-most-influential-visualizations (PDF)

Discover 5 powerful, beautiful visualizations that changed how people thought about the world.

Data visualization allows us all to see and understand our data more deeply. That understanding breeds good decisions.

“The greatest value of a picture is when it forces us to notice what we never expected to see.” – John Tukey, 1977

Without data visualization and data analysis, we are all more prone to misunderstandings and missed opportunities.

This visual presentation will show you 5 powerful, beautiful visualizations that changed how people thought about the world.

Slide 1 The 5 Most Influential Data Visualizations of All Time

Tableau5MostInfluentialVisualizationsSlide1.png

Slide 2 About These Visualizations

Tableau5MostInfluentialVisualizationsSlide2.png

Slide 3 5 London Cholera Map - John Snow 1

http://en.wikipedia.org/wiki/John_Snow_(physician)

Tableau5MostInfluentialVisualizationsSlide3.png

Slide 4 5 London Cholera Map - John Snow 2

Tableau5MostInfluentialVisualizationsSlide4.png

Slide 6 3 March on Moscow - Charles Minard 1

Tableau5MostInfluentialVisualizationsSlide6.png

Slide 7 3 March on Moscow - Charles Minard 2

http://upload.wikimedia.org/wikipedia/commons/2/29/Minard.png

Tableau5MostInfluentialVisualizationsSlide7.png

Slide 8 2 War Mortality - Florence Nightingale 1

Tableau5MostInfluentialVisualizationsSlide8.png

Slide 9 2 War Mortality - Florence Nightingale 2

http://en.wikipedia.org/wiki/Florence_Nightengale

Tableau5MostInfluentialVisualizationsSlide9.png

Slide 10 1 Chart of Biography - Joseph Priestley 1

Tableau5MostInfluentialVisualizationsSlide10.png

Slide 11 1 Chart of Biography - Joseph Priestley 2

http://en.wikipedia.org/wiki/Joseph_priestley

Tableau5MostInfluentialVisualizationsSlide11.png

Slide 12 1 Chart of Biography - Joseph Priestley 3

Tableau5MostInfluentialVisualizationsSlide12.png

Slide 13 1 Chart of Biography - Joseph Priestley 4

Tableau5MostInfluentialVisualizationsSlide13.png

Slide 14 1 Chart of Biography - Joseph Priestley 5

Tableau5MostInfluentialVisualizationsSlide14.png

Slide 15 John Tukey 1977

Tableau5MostInfluentialVisualizationsSlide15.png

Slide 17 Learn More

http://tableausoftware.com

Tableau5MostInfluentialVisualizationsSlide17.png

 

Expanding to 1 Million Rows!

Sources: Email and Web: http://www.tableausoftware.com/public/blog/2013/08/one-million-rows-2072

From: Tableau Public [mailto:tableau-public@tableausoftware.com]
Sent: Friday, August 02, 2013 9:32 AM
To: bniemann@cox.net
Subject: Expanding to 1 Million Rows!

Tableau Public

Dear Tableau Public Author,

We’re excited to announce today that we’ve extended the capabilities of Tableau Public tenfold. You can now connect to files and spreadsheets that contain up to 1 million rows of data.

Additionally, to accommodate these larger data files, your account’s storage space has been increased from 50 megabytes to 1 gigabyte. And there’s no action required on your part – these changes have already gone into effect. We like you that much.

Excited? Tweet all about it using the hashtag #AMillionRows and we’ll choose three of you to receive a special prize as a way of saying thanks for your exuberance.

We’ve taken this step because we noticed that many of the publicly available data sources go beyond the previous limit of 100,000 rows, including:

Read more about this exciting change on our blog, and keep a close eye on our Viz of the Day page to see examples of “vizzes” from authors like you making good use of all the extra horsepower. MY NOTE: See Below

Thanks,
Ben Jones
Tableau Public Product Marketing Manager

 

   

What else do we have in store for “Data Month”?
In celebration of “Data Month”, the Tableau Public team has a great line-up of useful blog posts for you around the subject of data. Keep an eye on this blog as we go in-depth on the following subjects:

  1. Online data sources
  2. Data scraping
  3. Data shaping
  4. Best practices for 1M rows

We'll publish a few "vizzes" in the coming weeks that put the new data limit to the test, and you can bet our Viz of the Day page will include some great examples from the community of authors, too.

As always, thanks for stopping by, Ben

Viz of the Day

Sharing one beautiful visual story a day.

How It Works

Source: http://www.tableausoftware.com/public/how-it-works

Tableau Public is a free tool that brings data to life. Easy to use. Spectacularly powerful. Data In. Brilliance Out.

Gallery

Source: http://www.tableausoftware.com/public/gallery

Stunning data visualization examples on the web, created with Tableau Public.

Community

Source: http://www.tableausoftware.com/public/community

Join the Conversation

These files provide detailed road safety data about the circumstances of personal injury road accidents in GB from 2005, the types (including Make and Model) of vehicles involved and the consequential casualties. The statistics relate only to personal injury accidents on public roads that are reported to the police, and subsequently recorded, using the STATS19 accident reporting form.

Also includes: Results of breath-test screening data from recently introduced digital breath testing devices, as provided by Police Authorities in England and Wales

Results of blood alcohol levels (milligrams / 100 millilitres of blood) provided by matching coroners’ data (provided by Coroners in England and Wales and by Procurators Fiscal in Scotland) with fatality data from the STATS19 police data of road accidents in Great Britain. Cases where the blood alcohol level for a fatality is "unknown" are a consequence of an unsuccessful match between the two data sets.

Selected

Road Safety - Accidents 2012 My Note: Excel
Road Safety - Casualties 2012 My Note: Excel
Road Safety - Vehicles 2012 My Note: Excel
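Because the accidents, casualties, and vehicles files describe the same events, a natural first step is to link them on a shared key. The following is only a minimal sketch using the Python standard library. The key column name `Accident_Index`, the other column names, and the tiny inline sample rows are assumptions for illustration and should be checked against the downloaded files.

```python
import csv
import io
from collections import defaultdict

KEY = "Accident_Index"  # assumed name of the key column shared by all three files


def read_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def group_by_key(rows, key=KEY):
    """Group rows by their accident key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return grouped


def link_files(accidents, casualties, vehicles, key=KEY):
    """Attach the matching casualty and vehicle rows to each accident row."""
    casualties_by_key = group_by_key(casualties, key)
    vehicles_by_key = group_by_key(vehicles, key)
    return [
        {
            "accident": acc,
            "casualties": casualties_by_key.get(acc[key], []),
            "vehicles": vehicles_by_key.get(acc[key], []),
        }
        for acc in accidents
    ]


# Hypothetical sample data standing in for the three downloaded files.
accidents = read_rows("Accident_Index,Hour\nA1,6\nA2,17\n")
casualties = read_rows("Accident_Index,Severity\nA1,Fatal\nA1,Slight\nA2,Slight\n")
vehicles = read_rows("Accident_Index,Vehicle_Type\nA1,Car\nA1,Van\nA2,Car\n")

linked = link_files(accidents, casualties, vehicles)
for item in linked:
    print(item["accident"][KEY], len(item["casualties"]), len(item["vehicles"]))
```

Grouping the two larger files once keeps the linking step to a single pass over the accidents file, rather than rescanning the casualties and vehicles for each accident.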

Publisher

Department for Transport

Enquiries: FOI Contact:

About this dataset

Additional Information

Developer Tools

The information on this page (the dataset metadata) is also available in JSON format: /api/2/rest/package/road-accidents-safety-data

Read more about this site's CKAN API: About the API

This dataset has a permanent URI: http://data.gov.uk//dataset/road-accidents-safety-data
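As a rough sketch of how the JSON metadata could be used, the Python below builds the metadata URL from the path given on the page and lists the downloadable resources found in a response. The host `http://data.gov.uk`, the `resources`, `description`, `format`, and `url` field names, and the sample payload are assumptions for illustration. The endpoint may have changed since this page was written, so the network call is defined but not executed here.

```python
import json
import urllib.request
from urllib.parse import urljoin

BASE_URL = "http://data.gov.uk"  # assumed host, taken from the permanent URI
METADATA_PATH = "/api/2/rest/package/road-accidents-safety-data"  # path quoted on the page


def metadata_url(base=BASE_URL, path=METADATA_PATH):
    """Combine the host and the metadata path into one URL."""
    return urljoin(base, path)


def fetch_metadata(url, timeout=30):
    """Download and decode the JSON metadata (not called in this sketch)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def list_resources(metadata):
    """Return (description, format, url) for each resource, assuming those field names."""
    return [
        (res.get("description", ""), res.get("format", ""), res.get("url", ""))
        for res in metadata.get("resources", [])
    ]


# Hypothetical payload with the assumed structure; not real API output.
sample = json.loads(
    '{"name": "road-accidents-safety-data", "resources": ['
    '{"description": "Road Safety - Accidents 2012", "format": "CSV",'
    ' "url": "http://example.org/accidents-2012.zip"}]}'
)

print(metadata_url())
for description, fmt, url in list_resources(sample):
    print(description, fmt, url)
```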

Medicare Provider Charge Data: Inpatient

The data provided here include hospital-specific charges for the more than 3,000 U.S. hospitals that receive Medicare Inpatient Prospective Payment System (IPPS) payments for the top 100 most frequently billed discharges, paid under Medicare based on a rate per discharge using the Medicare Severity Diagnosis Related Group (MS-DRG) for Fiscal Year (FY) 2011. These DRGs represent almost 7 million discharges or 60 percent of total Medicare IPPS discharges.

Hospitals determine what they will charge for items and services provided to patients and these charges are the amount the hospital bills for an item or service. The Total Payment amount includes the MS-DRG amount, bill total per diem, beneficiary primary payer claim payment amount, beneficiary Part A coinsurance amount, beneficiary deductible amount, beneficiary blood deductible amount and DRG outlier amount.

For these DRGs, average charges and average Medicare payments are calculated at the individual hospital level. Users will be able to make comparisons between the amount charged by individual hospitals within local markets, and nationwide, for services that might be furnished in connection with a particular inpatient stay.

Data are being made available in Microsoft Excel (.xlsx) format and comma separated values (.csv) format.

Inpatient Charge Data, FY2011, Microsoft Excel version My Note: Excel
Inpatient Charge Data, FY2011, Comma Separated Values (CSV) version

National and State level summaries are also available here:

National and State Summaries of Inpatient Charge Data, FY2011, Microsoft Excel version My Note: Excel
National and State Summaries of Inpatient Charge Data, FY2011, Comma Separated Values (CSV) version
 

Inquiries regarding this data can be sent to MedicareProviderChargeData@cms.hhs.gov.

Stop-and-Frisk Data

Source: http://www.nyclu.org/content/stop-and-frisk-data

Summary

An analysis by the NYCLU revealed that innocent New Yorkers have been subjected to police stops and street interrogations more than 4 million times since 2002, and that black and Latino communities continue to be the overwhelming target of these tactics. Nearly nine out of 10 stopped-and-frisked New Yorkers have been completely innocent, according to the NYPD’s own reports:

  • In 2002, New Yorkers were stopped by the police 97,296 times.
    80,176 were totally innocent (82 percent).
  • In 2003, New Yorkers were stopped by the police 160,851 times.
    140,442 were totally innocent (87 percent).
    77,704 were black (54 percent).
    44,581 were Latino (31 percent).
    17,623 were white (12 percent).
    83,499 were aged 14-24 (55 percent).
  • In 2004, New Yorkers were stopped by the police 313,523 times.
    278,933 were totally innocent (89 percent).
    155,033 were black (55 percent).
    89,937 were Latino (32 percent).
    28,913 were white (10 percent).
    152,196 were aged 14-24 (52 percent).
  • In 2005, New Yorkers were stopped by the police 398,191 times.
    352,348 were totally innocent (89 percent).
    196,570 were black (54 percent).
    115,088 were Latino (32 percent).
    40,713 were white (11 percent).
    189,854 were aged 14-24 (51 percent).
  • In 2006, New Yorkers were stopped by the police 506,491 times.
    457,163 were totally innocent (90 percent).
    267,468 were black (53 percent).
    147,862 were Latino (29 percent).
    53,500 were white (11 percent).
    247,691 were aged 14-24 (50 percent).
  • In 2007, New Yorkers were stopped by the police 472,096 times.
    410,936 were totally innocent (87 percent).
    243,766 were black (54 percent).
    141,868 were Latino (31 percent).
    52,887 were white (12 percent).
    223,783 were aged 14-24 (48 percent).
  • In 2008, New Yorkers were stopped by the police 540,302 times.
    474,387 were totally innocent (88 percent).
    275,588 were black (53 percent).
    168,475 were Latino (32 percent).
    57,650 were white (11 percent).
    263,408 were aged 14-24 (49 percent).
  • In 2009, New Yorkers were stopped by the police 581,168 times.
    510,742 were totally innocent (88 percent).
    310,611 were black (55 percent).
    180,055 were Latino (32 percent).
    53,601 were white (10 percent).
    289,602 were aged 14-24 (50 percent).
  • In 2010, New Yorkers were stopped by the police 601,285 times.
    518,849 were totally innocent (86 percent).
    315,083 were black (54 percent).
    189,326 were Latino (33 percent).
    54,810 were white (9 percent).
    295,902 were aged 14-24 (49 percent).
  • In 2011, New Yorkers were stopped by the police 685,724 times.
    605,328 were totally innocent (88 percent).
    350,743 were black (53 percent).
    223,740 were Latino (34 percent).
    61,805 were white (9 percent).
    341,581 were aged 14-24 (51 percent).
  • In 2012, New Yorkers were stopped by the police 532,911 times.
    473,644 were totally innocent (89 percent).
    284,229 were black (55 percent).
    165,140 were Latino (32 percent).
    50,366 were white (10 percent).
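The "totally innocent" percentages in the list above can be spot-checked from the counts. The short Python sketch below does this for three of the years, using figures copied from the list. Rounding to the nearest whole percent is an assumption about how the source computed its figures.

```python
# (year, total stops, stops recorded as "totally innocent"), copied from the list above.
STOPS = [
    (2002, 97296, 80176),
    (2011, 685724, 605328),
    (2012, 532911, 473644),
]


def innocent_share(total, innocent):
    """Return the innocent share as a whole-number percentage (nearest percent)."""
    return round(100 * innocent / total)


for year, total, innocent in STOPS:
    print(year, innocent_share(total, innocent), "percent")
```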

    About the Data

    Every time a police officer stops a person in NYC, the officer is supposed to fill out a form to record the details of the stop. Officers fill out the forms by hand, and then the forms are entered manually into a database. There are 2 ways the NYPD reports this stop-and-frisk data: a paper report released quarterly and an electronic database released annually.

    The paper reports – which the N.Y.C.L.U. releases every three months – include data on stops, arrests, and summonses. The data are broken down by precinct of the stop and race and gender of the person stopped. The paper reports provide a basic snapshot on stop-and-frisk activity by precinct and are available here.

    The electronic database includes nearly all of the data recorded by the police officer after a stop. The data include the age of person stopped, if a person was frisked, if there was a weapon or firearm recovered, if physical force was used, and the exact location of the stop within the precinct. Having the electronic database allows researchers to look in greater detail at what happens during a stop. Below are CSV files containing data from the 2011 electronic database.

    Downloadable Files

    Click here to download a compressed (.zip) CSV file of the 2012 database. This file is easily imported into most statistical packages, including the freeware R. It contains 101 variables and 532,911 observations, each of which represents a stop conducted by an NYPD officer. Variables include race, gender and age of the person stopped as well as the location, time and date of the stop.

    Click here to download a PDF file of documents and notes that may clarify the dataset. The PDF includes a list and description of variables, a blank stop-and-frisk reporting form (UF-250) and other notes provided by the NYPD.
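A quick way to confirm the number of variables and observations in the compressed CSV file is to count them directly. This is a minimal sketch using only the Python standard library. The UTF-8 encoding, the sample column names, and the in-memory sample archive are assumptions for illustration, and the real download may need a different encoding.

```python
import csv
import io
import zipfile


def summarize_zipped_csv(zip_source, encoding="utf-8"):
    """Return (variables, observations) for the first CSV inside a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV file found in archive")
        with archive.open(csv_names[0]) as raw:
            reader = csv.reader(io.TextIOWrapper(raw, encoding=encoding, newline=""))
            header = next(reader)                  # first row holds the variable names
            observations = sum(1 for _ in reader)  # remaining rows are observations
    return len(header), observations


# Tiny in-memory archive with hypothetical columns, standing in for the download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.csv", "year,pct,race\n2012,1,B\n2012,2,W\n2012,3,Q\n")
buffer.seek(0)

variables, observations = summarize_zipped_csv(buffer)
print(variables, "variables,", observations, "observations")
```

Passing the path of the downloaded .zip file instead of `buffer` should report the counts for the real data, which can then be compared with the figures quoted above.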

SQF Codebook

Source: http://www.nyclu.org/files/SQF_Codebook.pdf (PDF)

List of Variables

See Spreadsheet

Crime Codes

See Spreadsheet

Form Page 1

 

SQFCodebookFormPage1.png

Form Page 2

SQFCodebookFormPage2.png

Further Notes

Stop-and-Frisk 2003-2012
 
1. Location Coordinates
 
The x- and y-coordinates contained in the 2006 through 2012 data files are coded using the State Plane Coordinate System of 1983. All five boroughs are located in the Long Island Zone (3104). Conversion calculators from these to latitude and longitude coordinates are available online. See example link below. http://www.earthpoint.us/StatePlane.aspx
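As an alternative to an online calculator, the conversion can be sketched in code. The Python below implements the Lambert conformal conic projection (two standard parallels) and its inverse with the standard library only. The zone constants (standard parallels 40°40' and 41°02', origin 40°10' N, 74°00' W, false easting 300,000 m), the GRS80 ellipsoid, and the assumption that the data files store coordinates in US survey feet are not stated in the notes above. They are the values commonly published for this zone and should be verified before use.

```python
import math

# GRS80 ellipsoid, used by NAD83 (assumed).
A = 6378137.0
F = 1 / 298.257222101
E = math.sqrt(2 * F - F * F)

# Assumed constants for the New York Long Island zone (3104).
LAT0 = math.radians(40 + 10 / 60)  # latitude of origin
LON0 = math.radians(-74.0)         # central meridian
LAT1 = math.radians(40 + 40 / 60)  # first standard parallel
LAT2 = math.radians(41 + 2 / 60)   # second standard parallel
US_FT = 1200 / 3937                # metres per US survey foot
X0 = 300000.0 / US_FT              # false easting in feet
Y0 = 0.0                           # false northing in feet


def _m(lat):
    return math.cos(lat) / math.sqrt(1 - (E * math.sin(lat)) ** 2)


def _t(lat):
    es = E * math.sin(lat)
    return math.tan(math.pi / 4 - lat / 2) / ((1 - es) / (1 + es)) ** (E / 2)


N = (math.log(_m(LAT1)) - math.log(_m(LAT2))) / (math.log(_t(LAT1)) - math.log(_t(LAT2)))
BIG_F = _m(LAT1) / (N * _t(LAT1) ** N)
RHO0 = A * BIG_F * _t(LAT0) ** N  # radius at the origin latitude, in metres


def latlon_to_state_plane(lat_deg, lon_deg):
    """Project latitude/longitude (degrees) to x, y in US survey feet."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    rho = A * BIG_F * _t(lat) ** N
    theta = N * (lon - LON0)
    return (X0 + rho * math.sin(theta) / US_FT,
            Y0 + (RHO0 - rho * math.cos(theta)) / US_FT)


def state_plane_to_latlon(x_ft, y_ft):
    """Convert x, y in US survey feet back to latitude/longitude in degrees."""
    x_m = (x_ft - X0) * US_FT
    y_m = RHO0 - (y_ft - Y0) * US_FT
    rho = math.hypot(x_m, y_m)
    theta = math.atan2(x_m, y_m)
    t = (rho / (A * BIG_F)) ** (1 / N)
    lat = math.pi / 2 - 2 * math.atan(t)
    for _ in range(10):  # fixed-point iteration for the ellipsoidal latitude
        es = E * math.sin(lat)
        lat = math.pi / 2 - 2 * math.atan(t * ((1 - es) / (1 + es)) ** (E / 2))
    return math.degrees(lat), math.degrees(theta / N + LON0)


# Round trip for an illustrative point in lower Manhattan.
x, y = latlon_to_state_plane(40.7128, -74.0060)
print(x, y, state_plane_to_latlon(x, y))
```

For production work, a maintained projection library is a safer choice than hand-written formulas. The sketch is meant only to show what the online calculators do.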
 
2. Automation of Reports
 
Stop-and-Frisk reports from 2003-2012 were prepared by officers using the attached UF-250 form. The automation and processing of these forms has varied and produced differences in the content of the data files. See below for further details.
 
The NYPD deployed a city-wide, entry-based records management system for the collection of Stop-and-Frisk data in January of 2006. Records from that point on have system-generated location coordinates, a suspected crime field, and a system-generated code for the officer preparing the report.
 
Prior to the city-wide implementation of the entry-based system, selected precincts participated in testing the system during 2004 and 2005. Records from these precincts will have the suspected crime field as well as the system-generated code for the officer preparing the report.
 
Records from part of the last quarter of 2003 and from all precincts not participating in the testing of the records management system were automated by a vendor. Entry fields such as the address of the stop, crime suspected, and stopping officer’s command for these periods were entered as found on the form by vendor coders.
 
Records from the first three quarters of 2003 and part of the fourth quarter of 2003 were entered by a central processing unit and do not contain location coordinates or a suspected crime field.
 
3. Duplicate Records
 
Content duplicate records have been removed from all of the files.
 
4. Matching Statistical Snapshot
 
File record counts may not match statistical snapshots from the NYPD website due to differences in the date of file creation and the removal of duplicate records from the downloadable files.
 
*This text adapted from the NYPD Stop-and-Frisk Notes.