Table of contents
  1. Story
    1. National Priorities for Big Data Science
    2. Wilson Center Crowdsourcing and Citizen Science Projects Map
    3. Alaska Volcano Observatory Citizen Network Ash Collection and Observation Program
      1. Alaska Volcano Observatory Geochemical Database
        1. What is in the database?
        2. What's not (yet) included
        3. Data entry protocols
        4. How to search
        5. Search tips
        6. How to read the .html or .csv report
        7. Sample metadata columns
        8. Reading the rest of the columns
        9. Citing results from this database
        10. Recommended citation for data extracted from the database
        11. Recommended citation for the database as a whole
        12. Where to get more information
    4. Answering Your Questions with My Preferences and Choices
    5. United Nations Millennium Development Goals
      1. UNITE Ideas GitHub
    6. DARPA PAHO Indicator Series Table
    7. MORE TO FOLLOW
  2. NSF Big Data Regional Innovation Hubs National Meeting
    1. Overview
    2. Public Webinar and Q&A Session
    3. Definition of Regional Hubs
  3. National Priority Challenge Matrix for Big Data Regional Innovation Hubs
  4. Slides
    1. Slide 1 National Priorities for Big Data
    2. Slide 2 Summary
    3. Slide 3 National Data Science Organizers Workshop
    4. Slide 4 Agenda
    5. Slide 5 Speakers
    6. Slide 6 Purpose and Organizers
    7. Slide 7 Key Points
    8. Slide 8 Because I am a Data Scientist and Data Journalist
    9. Slide 9 Who we are?: Definitions
    10. Slide 10 What we do?: October 19th Meetup
    11. Slide 11 Where we do it?: Locations
    12. Slide 12 When we do it?: Meetup Calendar Schedule
    13. Slide 13 Why we do it?: Use Federal Big Data Examples and Technology
    14. Slide 14 How we do it?: Like the NIH Data Commons
    15. Slide 15 How we do it?: OSTP/NSF National Data Science Organizers Workshop
    16. Slide 16 How we do it?: We Already Do This!
    17. Slide 17 How we do it?: Data Mining - Science - Questions - Publication Process
    18. Slide 18 How we do it?: Collaboration for Data Science Win-Wins
    19. Slide 19 Specific Example: Data Science for the Map of Federal Crowdsourcing and Citizen Science Projects for the NDSO Challenge
    20. Slide 20 Federal Crowdsourcing and Citizen Science Toolkit
    21. Slide 21 Map of Federal Crowdsourcing and Citizen Science Projects
    22. Slide 22 Database of Federal Crowdsourcing and Citizen Science Projects: All
    23. Slide 23 Database of Federal Crowdsourcing and Citizen Science Projects: AVO
    24. Slide 24 Submit a New Project
    25. Slide 25 CCS Inventory in Excel Spreadsheet
    26. Slide 26 Spotfire Imports Boundary Files and Spotfire Geocodes Data
    27. Slide 27 CCS Inventory in Spotfire
    28. Slide 28 Anne Bowser
    29. Slide 29 Commons Lab Database
    30. Slide 30 Goal: International network of citizen science data
  5. Spotfire Dashboard
  6. Research Notes for National Data Science Organizers Workshop
    1. Day 1: November 5, 2015:
      1. 12:00 pm – 1:00 pm  Lunch with the Big Data Regional Innovation Hubs Leaders (limited seating)
      2. 2:00 pm – 2:30 pm Opening Keynote: What are the National Priorities? by Thomas Kalil
      3. 2:30 pm – 3:30 pm Session 1: Leadership Panel on Data Science Innovation and Collaboration
      4. 3:30 pm – 5:00 pm Grassroots Data Science Across the Nation with Lightning Talks
      5. 5:00 pm – 5:15 pm Support of grassroots data science, crowd sourcing, and challenges
      6. 6:00 pm – 8:00 pm PUBLIC: Data Drinks: National Data Community Happy Hour!
    2. Day 2: November 6, 2015:
      1. 8:00 am – 10:00 am Session 2: Exposing Data
      2. 10:30 am – 11:30 am Session 3: Coordination and Support of Data Science Meetups
      3. 12:30 pm – 1:30 pm  Lunch Keynote: Data Science in the Government by D.J. Patil
      4. 1:30 pm – 5:30 pm Session 4: The National Priority Challenge
      5. 5:45 pm – 6:00 pm Closing Remarks
  7. 3rd Annual Big Data for Intelligence Symposium, Nov. 17-18, 2015
  8. Big Data and Data Science at UN
  9. Exclusive Interview: Big Data and Data Science at UN
  10. NEXT

National Priorities for Big Data Science

Story

National Priorities for Big Data Science

The National Data Science Organizers Workshop Registration says: Each attending organization must submit a white paper or memo (~500 words) describing their organization and some aspect of their organization's process that would be of value to other organizers. Please link us to it here (identical submissions by members of the same organization allowed), provide the contents here, or submit directly to webmaster@ndso.io and fill this field with the email address you sent it from. My Note: I submitted the link to this story.

The National Data Science Organizers Workshop description of the National Priority Challenge session says: Our responsibility, as grassroots organizers for data science across the nation, is to foster civic engagement, provide valuable solutions, engender stronger community, and jumpstart data science education and recruitment in the United States. During this working session the NDSO Steering Committee will lead everyone in a working session to formalize the National Priority Challenge for 2016.

My National Data Science Organizers Workshop White Paper contains:

I would like to see three things come out of the NDSO Workshop:

  1. Government and Meetups identify and curate government and non-government data sets for the Wilson Center Citizen Science Project Map.
  2. Population of progress matrices at the four Data Hubs over the next year through their outreach efforts to the Meetups, like our example for the South Region Data Hub.
  3. A monthly or bi-monthly joint Federal and Data Science/Big Data Meetup led by you and Kristen, as we have tried to do in the Federal Big Data Working Group Meetup for the past two years (identify a data set, the data science the agency needs, and the data science the meetups can do on it).

These three things are described in my two presentations to the NDSO Workshop.

Wilson Center Crowdsourcing and Citizen Science Projects Map

The Federal Big Data Working Group Meetup and the Wilson Center are working together to create a column or two that provides links to actual data sets for the 104 projects for 20 agencies.

Anne Bowser: Thanks very much for inviting me to your meetup last night. I found the talks and attendees very interesting, especially Peter Morosoff, who already followed up with some advice on developing our citizen science ontology. I also enjoyed learning more about Spotfire; it looks like an interesting tool (I wonder how many hours it would take to learn how to use it...)

The Wilson Center welcomes any improvements you and your colleagues would like to make to our database of crowdsourcing and citizen science, for example by adding links to data sets or by creating visualizations such as the one you presented.

Please let me know how we can facilitate this work. I know that while you have numerous tools to scrape data, there are also things that we can do to be helpful. For example:

  • Would it be helpful for us to upload the source code onto GitHub?
  • Would it be helpful for us to create a parallel website for others to modify directly (e.g., to add in different fields to the database)?
  • Would it be helpful to share login credentials to CartoDB, which hosts our data (and allows the direct addition of different fields)?
  • Are the APIs helpful, or are your tools better?
  • Because we are hoping to share projects with other databases, we have an evolving list of project metadata. We hope that any new DB fields would build off (and potentially improve) this work in progress. What is the best way to share these efforts?

Brand Niemann: Anne, Thank you for your participation and followup email.

I have found Spotfire to be very easy to learn because I learned and have used spreadsheets for many years. Once the data is in a proper spreadsheet format (e.g., tall and thin, with time data by row and a single row of column headers), you import it into Spotfire and pick from about a dozen icons for statistics and display.
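As a rough illustration of the "tall and thin" layout mentioned above, the following Python sketch reshapes a wide table (one column per year) into one row per observation. The agency names, years, and counts are invented for the example, and nothing here is specific to Spotfire.

```python
import csv
import io

# Hypothetical "wide" table: one row per agency, one column per year.
WIDE_CSV = """agency,2013,2014,2015
Agency A,3,5,8
Agency B,1,2,4
"""


def wide_to_tall(wide_text, id_column, variable_name, value_name):
    """Turn one-column-per-period data into one row per (id, period) pair."""
    reader = csv.DictReader(io.StringIO(wide_text))
    tall_rows = []
    for row in reader:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            tall_rows.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tall_rows


tall = wide_to_tall(WIDE_CSV, "agency", "year", "projects")
for record in tall:
    print(record)
```

Each output row carries a single header-named field per column, which is the shape most spreadsheet-style tools expect for time series.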

You actually get an initial visualization upon import to start with and then you can use a tool to explore other possible visualizations of the same data.

You can get a 30-day free download and a free version for non-profits, universities, etc. like I have.

See http://spotfire.tibco.com/

So adding links for actual data sets is, like I said last night, going to your new first item: Advancing Energy Efficiency in Buildings at http://web.ornl.gov/sci/buildings/jump/

and looking for actual data sets. I do not see an obvious one here, but know there is one for the next:

Alaska Volcano Observatory Citizen Network Ash Collection and Observation Program

http://www.avo.alaska.edu/ashfall/ashreport.php

I looked under Library: http://www.avo.alaska.edu/downloads/index.php

And found: AVO's Geochemical Database - contains published whole-rock data for Quaternary volcanic rocks in Alaska, linked to geologist, publication, source volcano (where possible), and other sample and analysis metadata.

Which goes to: https://www.avo.alaska.edu/geochem/

Which then has a Spreadsheet (CSV): https://www.avo.alaska.edu/geochem/data.php?t=1445359879&view=csv

Which I have looked at in Spotfire.

So I suggest you download and install the 30-day free trial of Spotfire and import this CSV file (choose File, Open, go to the directory with that CSV file, and click Open), then see what you get and try the various visualizations behind the icons in the toolbar at the top.

Alaska Volcano Observatory's geochemical database contains published whole-rock data for Quaternary volcanic rocks in Alaska, linked to geologist, publication, source volcano (where possible), and other sample and analysis metadata. This database also contains water cation and anion data for recent AVO studies. This website allows users to query the database and return datasets as fully-documented .html or .csv tables. It is our intention to update this dataset as new volcano-related geochemical data is published.

Although we have made an extensive effort to provide the best data possible for each sample and analysis—often locating little-used references and untangling changing sample nomenclatures through time—we expect that routine use of the database will still uncover some errors. We ask users to please let us know of any they might find (cheryl.cameron@alaska.gov or seth.snedigar@alaska.gov), and to always check the original reference as the final authority.

Alaska Volcano Observatory Geochemical Database

Source: https://www.avo.alaska.edu/geochem/help.php

Cheryl Cameron, Seth Snedigar, and Chris Nye


This document provides information necessary to use the Alaska Volcano Observatory's (AVO) geochemical database including search functions, limitations, and an explanation of data reporting formats and data entry practices, as well as how to reference the data used.

Although we have made an extensive effort to provide the best data possible for each sample and analysis—often locating little-used references and untangling changing sample nomenclatures through time—we expect that routine use of the database will still uncover some errors. We ask users to please let us know of any they might find (cheryl.cameron@alaska.gov), and to always check the original reference as the final authority.

What is in the database?
Alaska Volcano Observatory's geochemical database contains published whole-rock data for Quaternary volcanic rocks in Alaska, linked to geologist, publication, source volcano (where possible), and other sample and analysis metadata. This database also contains water cation and anion data for recent AVO studies. This website allows users to query the database and return datasets as fully-documented .html or .csv tables. It is our intention to update this dataset as new volcano-related geochemical data is published.
What's not (yet) included
  • Isotope data
  • Individual grain microprobe data
  • Mineral analyses
  • Robust collection of glass analyses. Some glass analyses have been entered into the database as a test of how well this current database structure stores glass analyses and their metadata. Further glass data analyses and metadata will be added as part of AVO's tephra database effort (starting in 2014).
Data entry protocols
  • If an analysis is republished, in whole or in part (e.g., someone publishes Robert Coats' wet chemistry Fe analyses from 1961 and re-analyzes the sample for other major oxides), the republished analysis is not re-entered in the database. This may result in major oxide analyses that appear to be missing elements (they are!) and other questions that are explained in the metadata for that particular analysis.
  • We store analyses in the database as they were published, but do apply a standard normalization routine (normalize all elements to 100% without volatiles, and convert all Fe to FeOT) for consistent output. If you need non-normalized results, please contact us. A notable exception to this practice is data published by Wes Hildreth and Judy Fierstein – they publish analyses as normalized to 99.6%, leaving 0.4% for halogens and other unanalyzed elements. In this database, their data are presented as published.
  • Handling iron values: When an analysis reports Fe2O3, we convert Fe2O3 to FeO with a conversion factor of 0.8998.
  • We have adjusted the results for nearly all samples analyzed by inductively coupled plasma mass spectrometry (ICPMS) at Washington State University prior to 2007, correcting calibration errors that were present in the original data reports. This often explains why trace element data in a publication may differ from trace element data stored in the database. The recalibrated values are the most accurate values.
  • Obvious typographical errors in original publications (Ti2O instead of TiO2, major-oxide analysis missing one element but having duplicates of another, misplaced decimal points, etc.) have been corrected at the time of database entry. Less decipherable errors (such as a value greater than 700 weight percent for TiO2) have been entered as is.
  • When entering sample metadata, we parse the published text into searchable database fields; this works better on some descriptions than others.
  • Significant figures: Because of the nature of spreadsheets and database field formats, trailing zeros are truncated. Other than trailing zeros, we store significant figures as the values were published. There are two exceptions to this practice: (1) in the case of recalibrated ICP-MS data from Washington State University GeoAnalytical Laboratory (WSU), we store the values (often up to 13 decimal places) as they were returned from the lab with recalculation; and (2) for publications containing WSU XRF and ICP data obtained by AVO, where the geochemical data were entered into the database ahead of the publication, data are entered into the database as they arrive from the lab and may appear here with more significant figures than authors chose to publish. For information on the precision of WSU analyses, please contact their laboratory (http://soe.wsu.edu/facilities/geolab/technotes/ ).
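The normalization described in the protocols above (all Fe expressed as FeOT, volatiles dropped, majors scaled to 100%) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the input layout, and the sample values are invented, and AVO's actual routine may differ in details such as how it treats an analysis that already reports FeOT. The 0.8998 factor is the one quoted above.

```python
FE2O3_TO_FEO = 0.8998  # conversion factor quoted in the data entry protocols

# Major oxides other than iron; iron is handled separately as FeOT.
MAJOR_OXIDES = ("SiO2", "TiO2", "Al2O3", "MnO", "MgO", "CaO", "Na2O", "K2O", "P2O5")


def normalize_majors(analysis):
    """Return volatile-free major oxides scaled to 100%, with all Fe as FeOT.

    `analysis` maps oxide names to weight percent. Anything that is not a
    major oxide (LOI, H2O, CO2, ...) is ignored, which makes the result
    volatile-free. Oxides missing from the analysis stay missing.
    """
    oxides = {name: analysis[name] for name in MAJOR_OXIDES if name in analysis}
    if "FeOT" in analysis:
        # Assumption: a reported total-iron value is used as is.
        feo_total = analysis["FeOT"]
    else:
        feo_total = analysis.get("FeO", 0.0) + FE2O3_TO_FEO * analysis.get("Fe2O3", 0.0)
    if feo_total > 0:
        oxides["FeOT"] = feo_total
    total = sum(oxides.values())
    if total <= 0:
        raise ValueError("analysis contains no major oxides to normalize")
    return {name: 100.0 * value / total for name, value in oxides.items()}


# Invented example analysis (weight percent), not taken from the database.
published = {
    "SiO2": 57.2, "TiO2": 0.7, "Al2O3": 17.5, "Fe2O3": 3.0, "FeO": 4.1,
    "MnO": 0.15, "MgO": 3.9, "CaO": 7.6, "Na2O": 3.6, "K2O": 1.2,
    "P2O5": 0.2, "LOI": 0.6,
}
normalized_analysis = normalize_majors(published)
for name, value in normalized_analysis.items():
    print(f"{name}: {value:.2f}")
```

Because only ratios between oxides are preserved, a normalized value generally differs from the published one, which is why the original reference remains the final authority.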
How to search
To create a search query: Click the "Set" links to the right of each search parameter to create search constraints.

Searching within a sub-category uses an "OR" search; thus, searching for two volcanoes (for instance, Augustine OR Redoubt) will return results from both volcanoes. Searching two or more categories at the same time uses an "AND" search. Searching "[Augustine OR Redoubt] AND Year = 2009" will return samples from Augustine collected in 2009 and samples from Redoubt collected in 2009. Searches may return an empty result set, meaning no samples match the parameters entered. If this happens to you, try broadening your search terms.

If you want to view the results directly in your browser, click the "HTML" link under the Data Available heading [View data as HTML or Spreadsheet (CSV)]. If you view the data as an .html webpage, the results contain links to further information. For example, clicking on the Sample ID will bring up the full sample description, clicking on an eruption date will bring up a description of the eruption, and clicking on a reference number will link to a page containing the full citation information.

If your search parameters return a large number of samples (more than 100), the size of the resulting table may significantly slow your browser or cause it to crash completely. We recommend using the Spreadsheet (CSV) option for larger result sets.
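The OR-within-a-category, AND-across-categories behavior described above can be mimicked with a few lines of Python. This is a sketch of the logic only, not of how the AVO website implements it; the sample records and the `matches` helper are invented for illustration.

```python
def matches(sample, constraints):
    """True if the sample satisfies every category (AND) via any allowed value (OR)."""
    return all(
        sample.get(category) in allowed
        for category, allowed in constraints.items()
    )


# Invented sample records; the Sample IDs and years are placeholders.
samples = [
    {"SampleID": "A1", "Volcano": "Augustine", "Year": 2009},
    {"SampleID": "A2", "Volcano": "Augustine", "Year": 2006},
    {"SampleID": "R1", "Volcano": "Redoubt", "Year": 2009},
    {"SampleID": "S1", "Volcano": "Spurr", "Year": 2009},
]

# "[Augustine OR Redoubt] AND Year = 2009"
constraints = {"Volcano": {"Augustine", "Redoubt"}, "Year": {2009}}

results = [s["SampleID"] for s in samples if matches(s, constraints)]
print(results)  # ['A1', 'R1']
```

Adding a value to a category can only widen the result set, while adding a new category can only narrow it, which is why broadening search terms is the suggested remedy for an empty result.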

Search tips
  • Sample ID
    • Clicking the "Set" prompt to the right of "Sample ID" opens a new window to enter a complete or partial Sample ID. This query automatically looks for Sample IDs that match a partial entry. For example, to search all samples beginning with 09RDKLW enter 09RDKLW. Small differences in Sample ID characters may foil this search as it is character-sensitive. For example, entering 09-RD-KLW would return no results.
  • Author
    • To search by author, click "Set" next to the "Author" search parameter, and then select one or more authors. For example, selecting "Bean Kirby Wendell" would return all analyses contained in publications where Kirby Bean was an author. Some authors have multiple name variants stored in the database; you may need to select more than one name per person.
  • Reference
    • To search by more detailed reference information, click "Set" next to the "Reference" search option, and then locate your reference of interest by typing a partial title in the "TITLE" field, an author name in the "AUTHOR'S LAST NAME" field, or a publication year or range of years in "PUBLICATION YEAR", "EXACT YEAR", "MIN" and "MAX".
  • Eruption
    • To search samples from a specific eruption, click "Set" by the "Eruption" search parameter. Then choose one or more historical eruptions from the list. AVO's database does not yet catalog prehistoric eruptions. This search will retrieve samples erupted during a specific historical eruption. Keep in mind that many eruptions do not have a known end date. Please also note that this list of eruptions is restricted to eruptions with samples, and is not a comprehensive list of all historical eruptions in Alaska.
  • Material
    • To search by the material used in the analysis, click "Set" next to the "Material" search parameter. This parameter is the only mandatory parameter and has a default search of "Whole Rock". At this time, the database contains a robust set of whole rock analyses, but only very incomplete analyses of other materials.
  • Chemistry
    • To constrain search results to only those samples that meet particular chemical constraints, click "Set" next to the "Chemistry" option. In the next window, choose a major element from the drop-down list and enter a minimum or maximum percentage for that element. Although you can set constraints for multiple major elements in this search, at this point it functions like other sub-category searches and will join multiple elements using "OR". A last caveat about chemistry searches: this search uses values as entered in the database, normalized or not normalized, although all search results are returned as normalized values. Some search results may include normalized values that were not in your search parameter because their non-normalized value is within search parameters.
  • Volcano(es)
    • To search by volcano, click "Set" by the "Volcano(es)" search option. Then select one or more volcanoes. This database contains some samples that are not linked to a specific source volcano.
  • Location
    • To search samples based on geographic location, click "Set" next to the "Location" search parameter. After the map loads, zoom to your area of interest, and click "Start Drawing Polygon" to begin drawing a polygon on the map using mouse clicks to create polygon vertices. Double-click to close the polygon. Click "Submit" when you are satisfied with your polygon. Samples that do not have a specific latitude and longitude have been assigned the default latitude and longitude of their source volcano.
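The Chemistry caveat above can be made concrete with a small Python sketch: the filter is applied to the stored (as-entered) value, while the report shows the normalized value. The records, the `total` field, and the simple divide-by-total normalization are assumptions made for illustration; they are not the database's actual schema or routine.

```python
def normalized(value, analytical_total):
    """Scale a stored weight-percent value so the analysis sums to 100%."""
    return value * 100.0 / analytical_total


# Invented records: stored SiO2 and the volatile-free analytical total.
records = [
    {"SampleID": "X1", "SiO2": 59.0, "total": 97.0},
    {"SampleID": "X2", "SiO2": 59.5, "total": 100.0},
    {"SampleID": "X3", "SiO2": 61.0, "total": 99.0},
]


def search_sio2(records, minimum=None, maximum=None):
    """Filter on the stored value, but report the normalized value."""
    hits = []
    for rec in records:
        stored = rec["SiO2"]
        if minimum is not None and stored < minimum:
            continue
        if maximum is not None and stored > maximum:
            continue
        hits.append((rec["SampleID"], round(normalized(stored, rec["total"]), 2)))
    return hits


hits = search_sio2(records, maximum=60.0)
for sample_id, sio2 in hits:
    print(sample_id, sio2)
```

Here sample X1 matches a "maximum 60" search because its stored value is 59.0, yet it is reported with a normalized SiO2 above 60, which is the situation the caveat describes.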
At the top of the HTML search results are three links:
  • Download this Data: will give you a spreadsheet (csv) of the current result set
  • Edit Search: will take you back to the search form, saving your current selected parameters
  • New Search: will reset all the search parameters, letting you start a new search from scratch
How to read the .html or .csv report
Although the database contains whole-rock data from a variety of sources, with different methods, analytes, standards, and precisions, we recognize that many users will want to obtain a large dataset assembled from many analyses and attempt to directly compare samples within a grouping of the user's own making. Because many samples have several analyses by different methods, it is not a straightforward task of simply creating an output and ordering all the rows. Having many rows per sample (say, one per analysis) is not helpful, and having one row per sample but with many columns for the same analyte is also not preferred.

In this database we evaluate the analyte and the method, and preferentially display those analyte-method pairs that we believe are likely to present the "best" analysis for a given sample. Multiple rows per sample may be returned when a sample has been analyzed multiple times for element-method combinations that our output file places in the same grouping. For example, publication A may report major oxides by XRF, which would display in the first section of major oxide data. Publication B reports major oxide data for the same sample, analyzed by EMP, also displayed in the first section of major oxide data. This sample would then have two output rows.

We welcome user thoughts and suggestions regarding our data output process; please let us know what would be helpful to you. For all cases, we currently store much more metadata for the samples and analyses than are displayed, but are trying to determine the best ways of presenting them to users.

References for each analysis are noted in results columns beginning with "Ref" (such as REF majors and Ref trace 1). Numbers found within these columns are keyed to a list of references at the bottom of the .html or .csv output table, along with information about the methods used to obtain the data.

Sample metadata columns
Column Label | Description | Comments
StationID | The alphanumeric descriptor of the sample's station. | For samples published without a station identifier, we use the sample's id as a station id. Because this database contains station and sample data from geologists working for many different organizations over more than 100 years, station and sample identifiers are not unique. For samples with published analyses with different names, we have attempted to consolidate all analyses under the first-used identifier or, in the case of very non-unique identifiers, the most descriptive.
Latitude | Latitude of sample location, if known (WGS84 datum). | Some samples are given a location determined either from sample notes or, if no sample-specific location is available, the default location of the volcano itself.
Longitude | Longitude of sample location, if known (WGS84 datum). |
geologist | Name of the geologist who collected the sample, if known. |
DateVisited | Date the sample was collected. | In cases where only the year the sample was collected is known, samples may be assigned a "default" collection date of January 1 for that year.
Volcano | Volcano from which the sample was collected, if known. |
Eruption | Start date of the eruption that produced this sample, if known. |
LocationDesc | Text description of the sample locality, if published. |
TextDesc | Text description of the sample, if published. |
SampleID | Alphanumeric descriptor of the sample, if published. | For samples published with a very non-unique sample id (e.g., several "1"s in the same publication) we have begun adding more information to the sample id, such as "Kosco_Katmai_1".
Material | Material from the sample that was used for the analysis. |
Reading the rest of the columns
Column Labels Description Methods Comments
Major Oxides (first grouping) SiO2, TiO2, Al2O3, FeOT, MnO, MgO, CaO, Na2O, K2O, P2O5 XRF, DCP, NN (unknown), RAPIDA, WET, ICP, EMP, ICPAE, AAS Major oxides analyzed by methods not listed here are reported farther to the right (see "other majors" row below) in the results spreadsheet. Major oxide results are normalized to 100%, volatile free, with all Fe represented as FeOT.
Major Oxides supplemental data Total-majors, METH majors, Fe2O3/Fe2O3T orig, FeO/FeOT orig, Volatiles csv, METH volatiles   Total-majors: the non-normalized analytical total; METH majors: analysis method(s) for that row; Fe2O3/Fe2O3T orig: the original Fe2O3 value reported; FeO/FeOT orig: the original FeO value reported; Volatiles csv: H2O, H2OM, H2OP, LOI, and CO2 reported within a text string in this column; METH volatiles: analysis method(s) for volatiles.
Major Oxides references REF majors   Citation identification number for each analysis. Numbers contained in this column are keyed to full citation information found at the bottom of the results table.
First grouping of trace elements (trace1) Cs, Rb, Ba, Sr, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, Y, Zr, Nb, Hf, Ta, Pb, Th, U, Sc, V, Cr, Fe, Co, Ni, Cu, Zn, Ga, Mo, As, Na, K ICPMS, INAA, SS-ID  
Trace1 references and methods REF trace1, METH trace1    
Second grouping of trace elements (trace2) Rb, Ba, Sr, La, Ce, Nd, Sm, Eu, Gd, Dy, Er, Yb, Lu, Y, Zr, Nb, Pb, Th, U, Sc, Ti, V, Cr, Ni, Cu, Zn, Ga XRF, DCP  
Trace2 references and methods REF trace2, METH trace2    
Light elements Light csv Light elements analyzed by a method not listed in trace1 or trace 2 Li, Li2O, Be, B, reported as a text string with value, method, and reference for each.
Halogen elements Halogen csv Halogen elements analyzed by a method not listed in trace1 or trace 2 F, S, SO3, Cl, Cl2, and Br, reported as a text string with value, method, and reference for each.
Other majors Other majors csv Major oxides analyzed by a method not listed in the first major oxides grouping SiO2, TiO2, Al2O3, FeOT, Fe2O3 orig, MnO, MgO, CaO, Na2O, K2O, P2O5, and Total-majors reported as text string with value, method, and references.
Other lile Other lile csv LILE elements analyzed by a method not listed in trace1 or trace 2 K, Rb, Sr, SrO, Cs, and Ba, reported as a text string with value, method, and reference for each.
Other REE Other ree csv REE elements analyzed by a method not listed above in trace1 or trace 2 La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, reported as a text string with value, method, and reference for each.
Other HFSE Other hsfe csv HFSE elements analyzed by a method not listed in trace1 or trace 2 Y, Zr, ZrO, Nb, Hf, and Ta reported as a text string, with value, method, and reference for each.
Other HPE Other hpe csv HPE elements analyzed by a method not listed in trace1 or trace 2 Pb, Th, and U, reported as a text string, with value, method, and reference for each.
Other TM Other tm csv TM elements analyzed by a method not listed in trace1 or trace 2 Mg, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Mo, and Bi, reported as a text string with value, method, and reference for each.
Other misc Other misc csv Miscellaneous elements analyzed by a method not listed in trace1 or trace 2 Ge, As, Pd, Ag, Cd, In, Sn, Sb, W, Pt, Au, Hg, Tl, reported as a text string with value, method, and reference for each.
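
The table above states that major oxide results are normalized to 100%, volatile free, with all Fe represented as FeOT. A minimal sketch of that kind of recalculation follows. It assumes the common convention FeOT = FeO + 0.8998 × Fe2O3 (the molar-mass ratio) and a hypothetical input dictionary of weight percents. The database's exact procedure is not documented here, so treat this as an illustration rather than the authors' method.

```python
# Fe2O3 -> FeO conversion by molar mass: 2 * 71.844 / 159.688 ≈ 0.8998
FE2O3_TO_FEO = 0.8998

MAJOR_OXIDES = ["SiO2", "TiO2", "Al2O3", "FeOT", "MnO", "MgO", "CaO", "Na2O", "K2O", "P2O5"]


def normalize_majors(raw):
    """Return major oxides normalized to 100 wt%, volatile free, with Fe as FeOT."""
    oxides = dict(raw)
    # Combine any reported FeO and Fe2O3 into total iron expressed as FeO.
    feo = oxides.pop("FeO", 0.0)
    fe2o3 = oxides.pop("Fe2O3", 0.0)
    if "FeOT" not in oxides:
        oxides["FeOT"] = feo + FE2O3_TO_FEO * fe2o3
    # Keep only the ten major oxides; volatiles (H2O, LOI, CO2) drop out here.
    kept = {name: oxides.get(name, 0.0) for name in MAJOR_OXIDES}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no major-oxide data to normalize")
    return {name: 100.0 * value / total for name, value in kept.items()}


# Made-up values for illustration only (not taken from the database).
raw = {"SiO2": 50.0, "Al2O3": 15.0, "FeO": 5.0, "Fe2O3": 5.0, "MgO": 8.0, "LOI": 2.0}
normalized = normalize_majors(raw)
print(round(normalized["SiO2"], 2), round(normalized["FeOT"], 2))
```

The original FeO and Fe2O3 values and the non-normalized total remain available in the supplemental columns, so a user can reverse or redo this step with a different convention.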
Citing results from this database
For each sample analysis, a reference id is given in columns that begin with "ref" (see the tables above in the "How to read the .html or .csv report" section). These reference ids are keyed to full citation information in lists at the bottom of the .html page or .csv spreadsheet. For .html reports, the Citation IDs are also clickable links to publication information. When citing data from this database, all original references and this database should be cited.
Recommended citation for data extracted from the database

<full source citation from reference list for analysis(ses) used>in Cameron, C.E., Snedigar, S.F., and Nye, C.J., 2014–, Alaska Volcano Observatory Geochemical Database: Alaska Division of Geological & Geophysical Surveys Digital Data Series 8, https://www.avo.alaska.edu/geochem/index.php, doi:10.14509/29120

Recommended citation for the database as a whole

Cameron, C.E., Snedigar, S.F., and Nye, C.J., 2014, Alaska Volcano Observatory Geochemical Database: Alaska Division of Geological & Geophysical Surveys Digital Data Series 8, https://www.avo.alaska.edu/geochem/index.php, doi:10.14509/29120

Where to get more information

Please contact Cheryl Cameron (cheryl.cameron@alaska.gov) or Seth Snedigar (seth.snedigar@alaska.gov) if you have further questions or comments about this dataset or the web interface.

We thank Barbel Sarbas (GEOROC), Kerstin Lehnert (earthchem.org), and the DGGS GERILA team for sharing their database knowledge and assisting with our data table construction.

Answering Your Questions with My Preferences and Choices

  • Would it be helpful for us to upload the source code onto GitHub?
    • I do not use GitHub because my MindTouch and Spotfire setup provides GitHub-like functionality and more.
  • Would it be helpful for us to create a parallel website for others to modify directly (e.g., to add in different fields to the database)?
    • I use a state-of-the-art Wiki (MindTouch) where all of my files are attached at the bottom of the page and versioned, including all the CSV/XLSX files so this can be done by anybody that wants to. MindTouch also provides 4 user options that I can set for collaboration or not:
      • Public: everybody can view and edit
      • Semi-Public: everybody can view, but only selected users can edit
      • Semi-Private: everybody can access content, but only selected users can find and edit
      • Private: only selected users can view and edit this page
  • Would it be helpful to share login credentials to CartoDB, which hosts our data (and allows the direct addition of different fields)?
    • MindTouch (AWS Cloud Library) and Spotfire (AWS Cloud Library) do this for me, because Spotfire contains OpenStreetMap for the entire world and many other boundary files, and it automatically picks up changes to the CSV/XLSX. Just add or change the data in the CSV/XLSX and refresh Spotfire.
  • Are the APIs helpful- or are your tools better?
    • MindTouch and Spotfire have APIs "out of the box", so you do not have to create them, so to speak. It is even better because they provide what the new NIH Digital Commons is trying to accomplish, namely data, tools, and results all together in the cloud, in support of the new FAIR Principles (Findable, Accessible, Interoperable, and Reusable).
    • See our August 17th Meetup: A NIH – Semantic Medline Data Science Data Publication Commons http://www.meetup.com/Federal-Big-Data-Working-Group/events/223222934/
  • Because we are hoping to share projects with other databases, we have an evolving list of project metadata. We hope that any new DB fields would build off (and potentially improve) this work in progress. What is the best way to share these efforts?
    • For me and our 1000+ member meetup, it has been MindTouch and Spotfire, as we use them at: http://semanticommunity.info/

DrivenData.org: I registered, created a profile, looked for an interesting challenge, and found one with actual data:

United Nations Millennium Development Goals

Source: http://www.drivendata.org/competitions/1/

See Big Data and Data Science at UN (in process)

  • Background

In the year 2000, the member states of the United Nations agreed to a set of goals to measure the progress of global development. The aim of these goals was to increase standards of living around the world by emphasizing human capital, infrastructure, and human rights.

The eight goals are:

  1. To eradicate extreme poverty and hunger
  2. To achieve universal primary education
  3. To promote gender equality and empower women
  4. To reduce child mortality
  5. To improve maternal health
  6. To combat HIV/AIDS, malaria, and other diseases
  7. To ensure environmental sustainability
  8. To develop a global partnership for development
  • Competition End Date: Feb. 12, 2016, noon

This competition is for learning and exploring, so the deadline may be extended in the future.

  • Task

The UN measures progress towards these goals using indicators such as percent of the population making over one dollar per day. Your task is to predict the change in these indicators one year and five years into the future. Predicting future progress will help us to understand how we achieve these goals by uncovering complex relations between these goals and other economic indicators. The UN set 2015 as the target for measurable progress. Given the data from 1972 - 2007, you need to predict a specific indicator for each of these goals in 2008 and 2012.

Remember, this competition is just for fun. While it would be trivial to get the actual results for the years you are predicting, our goal is to build the best predictive models. We ask that you limit yourself to the data for download here when tackling this challenge.
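
As a starting point for the task described above, a naive baseline could fit a straight line to one indicator's history and extrapolate it to the target years. The sketch below assumes a single indicator given as parallel lists of years and values. The numbers are made up for illustration and are not competition data, and the function name `linear_forecast` is hypothetical.

```python
def linear_forecast(years, values, target_years):
    """Fit y = a + b * year by ordinary least squares and extrapolate."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {t: intercept + slope * t for t in target_years}


# Made-up indicator values, for illustration only.
years = [2003, 2004, 2005, 2006, 2007]
values = [0.50, 0.52, 0.54, 0.56, 0.58]
forecast = linear_forecast(years, values, [2008, 2012])
print(forecast)
```

A per-indicator trend line ignores the relations between goals and other economic indicators that the competition asks about, so it is only useful as a reference point for judging better models.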

  • Next Steps

It's time to get down to business. You can start out by researching the issues using the resources below. You could dive headfirst into the data after signing up to compete. You could explore the lay of the land by seeing who has built the best model so far. Or, you can team up with your friends and colleagues to tackle the challenge together.

  • Resources

For more information on the Millennium Development Goals, visit one of the following resources:

http://www.un.org/millenniumgoals/
http://en.wikipedia.org/wiki/Millenn...elopment_Goals
http://unstats.un.org/unsd/mdg/Host....ficialList.htm

For more information on the World Bank data, please visit their data portal: http://data.worldbank.org/

UNITE Ideas GitHub

Source: https://github.com/UniteIdeas?tab=repositories

Please Note: I added 6 of the 19 data sets to the Spreadsheet as the best examples.

Achieving-which-target-helps-a-country-not-being-LDC

2 data sets
Ms. Yilin Wei - Solution for #WSD2015
Updated 6 days ago
R  0   0
 

Global-Trends-in-HIVAIDS

3 data sets
Ms. Annette Brook - Solution for #WSD2015
Updated 6 days ago
 0   0


Exploring-changes-in-primary-education

0 data sets
Mr. Kidus Asfaw - Solution for #WSD2015
Updated 6 days ago
CSS  0   0


We-will-be

0 data sets
Mr. Piero Savastano - Solution for #WSD2015
Updated 6 days ago
JavaScript  0   0


Is-the-world-a-better-place-today

3 data sets
Dr. Jeremy Boy - Solution for #WSD2015
Updated 6 days ago
JavaScript  0   1


Correlation-of-Child-Mortality-with-World-Bank-Development-Indicators

0 data sets
Mr. Sherif Mostafa - Solution for #WSD2015
Updated 6 days ago
JavaScript  0   0


Parallel-coordinates-visualization

1 data set
Mr. Dominik Cygalski - Solution for #WSD2015
Updated 6 days ago
JavaScript  0   0


UN_challenge_HIV-master

0 data sets
Ms. Emily Schuch - Solution for #WSD2015
Updated 6 days ago
JavaScript  0   1


Behind-the-scenes-of-the-UN-Millennium-Goal-Development-Report

0 data sets
Dr. Katharina Rasch - Solution for #WSD2015
Updated 6 days ago
R  0   1


ExploreChange

5 data sets
Author: Mr. Stephan Schlögl - Solution for #VisualizeChange
Updated 15 days ago
HTML  0   0


WHS-documents-navigator

0 data sets
Author: Mr. Thomas Fournaise - Solution for #VisualizeChange
Updated 15 days ago
JavaScript  0   0


investigation_of_humanitarian_topics

1 data set
Author: Ms. Yilin Wei - Solution for #VisualizeChange
Updated 15 days ago
JavaScript  0   1


WHSConsultationsBrowser-gh-pages

0 data sets
Author: Mr. Dominik Cygalski - Solution for #VisualizeChange
Updated 15 days ago
JavaScript  0   0


Partner-Concentration-of-Trade

3 data sets
A visualization showing international trade between countries.
Updated on Jul 26
HTML  0   0


Staff-of-the-Secretariat-and-Related-United-Nations-Entities

0 data sets
A graphical approach to display staff headcount at the United Nations Secretariat
Updated on Jul 26
JavaScript  0   0


Security-Council-Resolution-References-Arcs-

0 data sets
This data visualization shows how security council resolutions reference each other. The full text data was parsed from PDFs available through the UN Official Document System.
Updated on Jul 9
JavaScript  0   0


UN_Security_Council_Resolutions_Relationships_Explorer

1 data set

Updated on Jul 9

DARPA PAHO Indicator Series Table

http://www.darpa.mil/news-events/2015-05-27

http://www.cdc.gov/chikungunya/

http://www.cdc.gov/chikungunya/geo/index.html

http://www.paho.org/hq/index.php?option=com_content&view=article&id=2470&Itemid=2003&lang=en

http://ais.paho.org/phip/viz/mfr_indicatorserietable.asp

To download data, go to the Export tool (toolbar at the bottom of the table) and select the Crosstab option. Select All Indicators, Country Name, and Years.

The data exports as a PDF, which must then be converted to Excel.

The resulting Excel file is essentially unusable.

MORE TO FOLLOW

NSF Big Data Regional Innovation Hubs National Meeting

Source: https://www.usenix.org/conference/bdhubs15

Overview

The NSF Big Data Regional Innovation Hubs National Meeting is taking place November 3-5, 2015, in Arlington, Virginia.

In mid-2014, the National Science Foundation (NSF) issued a Request for Input entitled, Accelerating the Big Data Innovation Ecosystem, to explore the establishment of a national network of “Big Data Regional Innovation Hubs”. These hubs would help scale up the activities and partnerships established over the past 3 years by the National Big Data R&D Initiative, as well as stimulate new regional and grassroots partnerships in this field. They could help accelerate Big Data solutions to global and societal challenges by convening stakeholders across sectors to partner in results-driven programs and projects; act as a matchmaker among the various academic, industry, and community stakeholders to help drive successful pilot programs; help share best practices; help accelerate technology transfer between universities, public and private research centers and laboratories, large enterprises, and small- and medium-sized businesses; facilitate engagement with opinion and thought leaders on the societal impact of Big Data technologies; and support education and training in the new interdisciplinary field of Data Science.

A series of four intensive planning activities (charrettes) were held earlier this year with the objective of bringing together various stakeholders in order to map solutions paths. USENIX conducted these charrettes to bring together academic, non-profit, governmental, and business communities throughout the country to form grassroots regional partnerships to foster and propel Big Data approaches across all sectors. These communities represented stakeholders in the Big Data ecosystem, including corporations, universities, philanthropies, non-profits, and state and local governments.

Each charrette was an intensive, one-day design and planning workshop with the objective of convening stakeholders in that region around a common set of Big Data challenges, particularly those that were especially relevant to that region. Each charrette helped establish a regional consortium to build upon existing efforts within the region.

The charrettes took place within the defined regions of the West, Midwest, South, and Northeast (see below).

Public Webinar and Q&A Session

Learn more about the Big Data Regional Innovation Hubs on Thursday, November 5, 2015, at 11am (ET) on a public webinar and Q&A session.

Leaders from all four hubs will present their next steps and future plans.

For those not yet involved with the Big Data Hubs, this is a great opportunity to join in the conversation!

Audio will be via an operator assisted conference call at:

  • Telephone Number: 1-517-308-9353 or 888-282-0568 (Toll Free)
  • Telephone Participant Passcode: CISE

Questions? Contact bdhubs15questions@usenix.org.

Definition of Regional Hubs

The regional breakdown of US states and the District of Columbia is as follows (adapted from https://www.census.gov/econ/census/help/geography/regions_and_divisions.html)

  • NORTHEAST: Connecticut, Maine, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, and Vermont;
  • MIDWEST: Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin;
  • SOUTH: Alabama, Arkansas, Delaware, District of Columbia, Florida, Georgia, Kentucky, Louisiana, Maryland, Mississippi, North Carolina, Oklahoma, South Carolina, Tennessee, Texas, Virginia, and West Virginia; and
  • WEST: Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, and Wyoming.
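
The regional breakdown above can be expressed as a simple lookup, for example to assign a meetup or data set to its hub by state. This is a minimal sketch. The names `REGIONS`, `STATE_TO_REGION`, and `hub_for` are hypothetical, and the state lists are copied from the breakdown above.

```python
REGIONS = {
    "NORTHEAST": [
        "Connecticut", "Maine", "Massachusetts", "New Hampshire", "New Jersey",
        "New York", "Pennsylvania", "Rhode Island", "Vermont",
    ],
    "MIDWEST": [
        "Illinois", "Indiana", "Iowa", "Kansas", "Michigan", "Minnesota",
        "Missouri", "Nebraska", "North Dakota", "Ohio", "South Dakota", "Wisconsin",
    ],
    "SOUTH": [
        "Alabama", "Arkansas", "Delaware", "District of Columbia", "Florida",
        "Georgia", "Kentucky", "Louisiana", "Maryland", "Mississippi",
        "North Carolina", "Oklahoma", "South Carolina", "Tennessee", "Texas",
        "Virginia", "West Virginia",
    ],
    "WEST": [
        "Alaska", "Arizona", "California", "Colorado", "Hawaii", "Idaho",
        "Montana", "Nevada", "New Mexico", "Oregon", "Utah", "Washington", "Wyoming",
    ],
}

# Invert the mapping so each state (or DC) points at its hub region.
STATE_TO_REGION = {
    state: region for region, states in REGIONS.items() for state in states
}


def hub_for(state):
    """Return the hub region for a state name, raising KeyError if unknown."""
    return STATE_TO_REGION[state]


print(hub_for("Alaska"), hub_for("Virginia"))
```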

National Priority Challenge Matrix for Big Data Regional Innovation Hubs

The Wilson Center Map of Citizen Science Projects could become part of the BDHubs2015 activity and individuals and Meetups would get credit for three things: finding a data set, analyzing/visualizing that data set, and doing a data science story for their work.

The National Priority Challenge would be to complete this matrix (104+ projects by three tasks) within the next year. The Hub could host the National Priority Challenge Matrix and coordinate the work across meetups.

 

Project Name | Finding a Data Set | Analyzing/Visualizing Data Set | Data Science Story For Data Set
Advancing Energy Efficiency in Buildings
Alaska Volcano Observatory Citizen Network Ash Collection and Observation Program | Spreadsheet | Spotfire Dashboard | Slide 21 EarthCube2015 Data Science Publication: Spotfire Volcanoes AKO & Augustine
Aurorasaurus      
Citizen Archivist       
Coastal Observation And Seabird Survey Team (COASST)      
Common Loon Citizen Science Project at Glacier National Park      
CrowdHydrology      
CrowdMag      
Cyclone Center      
DC/Baltimore Cricket Crawl      
DISCOVER-AQ      
Did You Feel It?      
Disk Detective      
Dolphin & Whale 911 Smartphone App      
EDDMapS      
EcoCast: Improving Ecological and Economic Sustainability of Marine Fisheries Using Remotely-sensed Oceanographic Data      
Elkhorn Slough Volunteer Water Quality Monitoring      
Engaging Communities; Using Citizen Science to Assess and Address Children’s Environmental Health from Transit and Air Pollution      
Evaluating the Ecological and Social Outcomes of Neighborhood and Nonprofit Urban Forestry      
Evaluating the Ecological and Social Outcomes of Neighborhood and Nonprofit Urban Forestry      
Florida Keys Water Watch      
Florida Microplastic Awareness Project      
Forecasting Harmful Algal Blooms in California      
Georgia Adopt-a-Stream      
Great Lakes Worm Watch      
Greater Atlanta Pollinator Partnership: A model for urban pollinator conservation      
Gros Ventre Project      
High Country Citizen Science Project at Glacier National Park      
Hummingbird Survey      
IDAH2O Master Water Stewards      
IHO Crowd Source Bathymetry Pilot Project      
Image Detective      
Indigenous Observation Network      
Indigenous Observation Network (ION)      
Invasive Plant Citizen Science Project at Glacier National Park      
Ironbound Community Corporation (ICC) Environmental Monitoring      
Lantern Live      
Local Environmental Observer Network      
Long-term Monitoring Program and Experiential Training for Students      
Long-term Monitoring Program and Experiential Training for Students (LiMPETS)      
Lunar Mapping and Modeling Project      
MapGive      
Mapping Application for Penguin Populations and Projected Dynamics (MAPPPD)      
Mapping Application for Penguin Populations and Projected Dynamics (MAPPPD)      
Mapping for Resilience      
Marine Debris Monitoring and Assessment Project (MDMAP)      
Marine Debris Tracker      
Marine Mammal Sightings Data - Channel Islands National Marine Sanctuary       
Meteorological Phenomena Identification Near the Ground (mPING)
Monitoring Deer Impacts to Vegetation      
Mussel Watch Contaminant Monitoring      
NAIP 2015 Imagery Feedback      
NOAA NWS SKYWARN® Weather Spotter Program      
NYC Cricket Crawl      
National Broadband Map      
National Weather Service Cooperative Observer Program      
Nature's Notebook      
Neighborhood Nestwatch      
Neighborhood Nestwatch      
Nonindigenous Aquatic Species (NAS) Online Sighting      
North American Bird Phenology Program      
OpenTreeMap      
Phytoplankton Monitoring Network      
Plains to the Park      
Quake Catcher Network      
Report a Landslide      
Right Whale Sighting Advisory System      
Rocky Mountain Butterfly Project      
Rocky Mountain National Park Christmas Bird Count      
Rocky Mountain National Park Summer Bird Count      
San Gabriel River Sea Turtle Monitoring      
San Gabriel River Sea Turtle Monitoring      
SciCast      
Sea Ice for Walrus Outlook (SIWO)      
Shoreline Debris Monitoring      
Sky Science S’COOL (Student Cloud Observations On-Line)      
Smithsonian Transcription Center      
Social Science Study: What do you value on Georgia’s Coast? Measuring and Mapping the Social Values of Ecosystem Services off of Georgia’s Coast      
Social.Water      
Southern Maine Volunteer Beach Profile Monitoring Program      
Stellwagen Sanctuary Seabird Stewards      
Student Watershed Research Project      
Students' Cloud Observations On-Line      
Summer Science in New England      
Supporting Communities using EPA Science Tools      
System for Mapping and Predicting Species of Concern      
Team Ocean Science Diver Program      
The Advanced Rapid Imaging and Analysis Project      
The Community Collaborative Rain, Hail, and Snow Network (CoCoRaHS)      
The Costa Rica Science Exchange      
The Delaware Bay Horseshoe Crab Spawning Survey      
The GLOBE Program      
The Hudson River Eel Project      
The National Map Corps      
The National Map Corps      
The North American Breeding Bird Survey      
The Open PV Project      
TreesCount! 2015      
TreesCount! 2015      
Use of early-successional forests by birds during the post-fledging period      
Volunteer Cattail Monitoring Project      
Whale Alert West Coast      
Whale Alert app      
iCoast- Did the Coast Change?      
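
Progress on the matrix above (projects by three tasks) could be tracked with a small structure like the following. This is only a sketch under assumptions: the dictionary layout and the `completion` function are hypothetical, and only the one filled row from the table is reproduced.

```python
TASKS = (
    "Finding a Data Set",
    "Analyzing/Visualizing Data Set",
    "Data Science Story For Data Set",
)


def completion(matrix):
    """Return (filled cells, total cells) for a project-by-task matrix."""
    filled = sum(
        1 for cells in matrix.values() for task in TASKS if cells.get(task)
    )
    return filled, len(matrix) * len(TASKS)


# Two rows from the matrix: one complete, one still empty.
matrix = {
    "Alaska Volcano Observatory Citizen Network Ash Collection and Observation Program": {
        "Finding a Data Set": "Spreadsheet",
        "Analyzing/Visualizing Data Set": "Spotfire Dashboard",
        "Data Science Story For Data Set": "Slide 21",
    },
    "Aurorasaurus": {},
}
filled, total = completion(matrix)
print(f"{filled} of {total} cells complete")
```

With all 104 rows loaded, the same function would report progress against 312 cells.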

Slides

Slide 2 Summary

BrandNiemann11172015Slide2.PNG

 

Slide 3 National Data Science Organizers Workshop

http://www.nationalprioritychallenge.org

BrandNiemann11172015Slide3.PNG

Slide 7 Key Points

BrandNiemann11172015Slide7.PNG

Slide 8 Because I am a Data Scientist and Data Journalist

6 questions that can help journalists find a focus, tell better stories

BrandNiemann11172015Slide8.PNG

Slide 11 Where we do it?: Locations

BrandNiemann11172015Slide11.PNG

Slide 13 Why we do it?: Use Federal Big Data Examples and Technology

BrandNiemann11172015Slide13.PNG

Slide 14 How we do it?: Like the NIH Data Commons

A NIH – Semantic Medline Data Science Data Publication Commons

BrandNiemann11172015Slide14.PNG

Slide 15 How we do it?: OSTP/NSF National Data Science Organizers Workshop

BrandNiemann11172015Slide15.PNG

Slide 16 How we do it?: We Already Do This!

BrandNiemann11172015Slide16.PNG

Slide 17 How we do it?: Data Mining - Science - Questions - Publication Process

BrandNiemann11172015Slide17.PNG

Slide 19 Specific Example: Data Science for the Map of Federal Crowdsourcing and Citizen Science Projects for the NDSO Challenge

BrandNiemann11172015Slide19.PNG

Slide 20 Federal Crowdsourcing and Citizen Science Toolkit

https://crowdsourcing-toolkit.sites.usa.gov/

BrandNiemann11172015Slide20.PNG

Slide 21 Map of Federal Crowdsourcing and Citizen Science Projects

https://crowdsourcing-toolkit.sites.usa.gov/

BrandNiemann11172015Slide21.PNG

Slide 22 Database of Federal Crowdsourcing and Citizen Science Projects: All

https://ccsinventory.wilsoncenter.org/

BrandNiemann11172015Slide22.PNG

Slide 23 Database of Federal Crowdsourcing and Citizen Science Projects: AVO

https://ccsinventory.wilsoncenter.org/#projectId/101

BrandNiemann11172015Slide23.PNG

Slide 24 Submit a New Project

https://ccsinventory.wilsoncenter.org/add.html

BrandNiemann11172015Slide24.PNG

Slide 25 CCS Inventory in Excel Spreadsheet

CCSInventory.xlsx

BrandNiemann11172015Slide25.PNG

Slide 26 Spotfire Imports Boundary Files and Spotfire Geocodes Data

BrandNiemann11172015Slide26.PNG

Slide 27 CCS Inventory in Spotfire

Web Player

BrandNiemann11172015Slide27.PNG

Slide 29 Commons Lab Database

http://wilsoncommonslab.org/inventory/

BrandNiemann11172015Slide29.PNG

Slide 30 Goal: International network of citizen science data

http://www.wilsoncenter.org/article/ppsr-core-metadata-standards

BrandNiemann11172015Slide30.PNG

Spotfire Dashboard

For Internet Explorer Users and Those Wanting Full Screen Display Use: Web Player Get Spotfire for iPad App

Research Notes for National Data Science Organizers Workshop

Source: http://www.nationalprioritychallenge.org/schedule/ See Photos

Day 1: November 5, 2015:

12:00 pm – 1:00 pm  Lunch with the Big Data Regional Innovation Hubs Leaders (limited seating)

 

2:00 pm – 2:30 pm Opening Keynote: What are the National Priorities? by Thomas Kalil

 

2:30 pm – 3:30 pm Session 1: Leadership Panel on Data Science Innovation and Collaboration

 

3:30 pm – 5:00 pm Grassroots Data Science Across the Nation with Lightning Talks

 

5:00 pm – 5:15 pm Support of grassroots data science, crowd sourcing, and challenges

 

6:00 pm – 8:00 pm PUBLIC: Data Drinks: National Data Community Happy Hour!

 

Day 2: November 6, 2015:

8:00 am – 10:00 am Session 2: Exposing Data

 

10:30 am – 11:30 am Session 3: Coordination and Support of Data Science Meetups

 

12:30 pm – 1:30 pm  Lunch Keynote: Data Science in the Government by D.J. Patil

IMG_0654IMG_0646DJPatil5.JPG

1:30 pm – 5:30 pm Session 4: The National Priority Challenge

 

5:45 pm – 6:00 pm Closing Remarks

3rd Annual Big Data for Intelligence Symposium, Nov. 17-18, 2015

Harnessing the Power of Big Data for The Intelligence Community
Alexandria, VA
http://bigdatasymposium.dsigroup.org/

Presentation: National Priorities for Big Data

Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community
http://semanticommunity.info
Founder and Co-Organizer
Federal Big Data Working Group Meetup
http://www.meetup.com/Federal-Big-Data-Working-Group/

The White House Office of Science and Technology Policy (OSTP) and the National Science Foundation (NSF) convened a National Data Science Organizers Workshop, November 5-6, 2015, to discuss: 1. Data Science for the Nation: National Priorities, Impacts of Big Data Science on National Priorities, and Using Meetups to Explore National Challenges; 2. Exposing Data; 3. Coordination and Support of Data Science Meetups; and 4. The National Priority Challenge.

The results of this workshop will be summarized along with highlights from the Federal Big Data Working Group Meetup, for which the presenter is the Founder and Co-Organizer.

Examples will be given of what the Federal Big Data Working Group Meetup has done from 2014 to the present to provide big data science tutorials and Massive Open Online Courses (MOOCs), curated government datasets, and citizen science and crowdsourcing in support of the White House forum Open Science and Innovation: Of the People, By the People, For the People, as part of the President's 2013 Second Open Government National Action Plan.

Note Part of the Abstract: In the 2013 Second Open Government National Action Plan, President Obama called on agencies to harness the ingenuity of the public by accelerating and scaling the use of open innovation methods such as citizen science and crowdsourcing. This forum brings together citizen science professionals, researchers, and stakeholders from local, state, Federal, and Tribal governments; academia; non-profits; and the private sector to celebrate the contributions of crowdsourcing and citizen science to enhancing agencies’ missions, and scientific and societal outcomes. #WHCitSci

https://www.whitehouse.gov/live/open...-people-people

DRAFT AGENDA FOR REVIEW ONLY. INVITED SESSIONS ARE SUBJECT TO CHANGE

Tuesday November 17, 2015

8:00 -  8:45

Registration and Light Breakfast Reception Open

8:45- 9:00

Chairman’s Opening Remarks

9:00- 9:45

Opening Keynote Remarks:

Big Data and the Need for Information Environments

-Update on the IC’s migration to IC-ITE

-Taking advantage of cloud computing and the necessary security enhancements

-Utilizing predictive analytics in support of information security and enhanced intelligence

-Monitoring where sensitive data is and who has access to it on a real-time basis

Dr. Raymond Cook (Invited) Assistant DNI and IC CIO ODNI

9:45 – 10:30

US Army Priorities for Utilizing Big Data in Support of an Enhanced Intelligence Enterprise

-Integrating critical multi-discipline intel capabilities in all layers to support regionally aligned Army

-Maturing The US Army’s ability to leverage the national to tactical enterprise ISO expeditionary/distributed ops

-Setting conditions to ensure Army’s alignment with evolving IC ITE, DoD JIE, Army MC COE requirements

Annette Redmond, SES (Confirmed)

Director, Intelligence Community Information Management Office of the Deputy Chief of Staff, HQDA G-2

10:30 – 11:15

Networking Break

11:15-12:00

Development of Technology and Tools to Maximize Insight from Large Unstructured Data Sets

-State-of-the-art core technologies needed to collect, store, preserve, manage, analyze, and share BIG DATA that could benefit from standardization

-Potential measurements to ensure the accuracy and robustness of methods that harness these technologies

Dr. Ashit Talukder (Confirmed)

Division Chief, Information Access Division NIST

12:00- 1:15

Networking Lunch

1:15 – 2:00

Utilizing Big Data to Enhance the USMC Intelligence Enterprise

-MCISRE Enterprise capabilities

-Connecting users to product development

BGen Michael Groen, USMC (Invited)

Director

HQMC Intelligence Department

2:00 – 2:45

Utilizing Big Data and Predictive Analytics to Enhance Enterprise Effectiveness

COL Bobby Saxon, USA (Invited)

Division Chief and Program Director HQDA G-3/5/7

 

 

2:45– 3:15

Networking Break

3:15 - 4:00

Integrating Future Operations: ISR in the Combat Cloud

  • Integrating the Intelligence Community with the Warfighter
  • Empowering analytics and innovation
  • Automating sensor networks and  battlespace networking

Jeffrey Eggers, SES (Confirmed)

Chief Technology Officer

Deputy Chief of Staff for ISR, HQAF

4:00 – 4:45

National Priorities for Big Data

-Data science for the nation: Impacts of big data science on National priorities

-Data science for tackling the challenges of big data

-Developing people, processes, and products for the Federal Government

Dr. Brand Niemann (Confirmed) Founder and Co-Organizer Federal Big Data Working Group Slides

4:45

End of Day 1

 

Wednesday, November 18, 2015

8:15 - 8:45

Registration and Light Breakfast Reception Open

8:45 - 9:00

Chairman’s Opening Remarks

9:00 - 9:45

Utilizing Big Data to Enhance Intelligence Value

Mike Bender (Confirmed)

Director, Laboratory of Analytic Sciences North Carolina State University

9:45 - 10:30

Keynote Remarks:

Utilizing the Influx of Big Data to Enhance Geospatial Intelligence

-Empowering Geospatial Intelligence production with Big Data Analytics

-Crowdsourced mapping for Geospatial Big Data Analytics

-Speeding up the acquisition process to transition critical technology into the NGA enterprise

Susan Gordon (Invited)

Deputy Director NGA

10:30 - 11:00

Networking Break

11:00-11:45

Initiatives at NRO to Maximize Timely Intelligence Production through Advances in Big Data Analytics

-Reducing time for detection, collection, processing & decision-making

-Utilizing advanced analytics to attack hard problems

-Big Data as part of the “Multi-INT” solution

Terry Duncan (Invited)

Director, Communications Systems Directorate NRO

 

 

11:45 - 12:30

Innovative Analytics Efforts to Gain Actionable Intelligence from Big Data

-Equipping the analytic enterprise with the tools, skills and personnel needed to harvest transformative insights from Big Data

-Creating new technology to enhance analytic efficiency and apply data science methods

Catherine Johnston (Confirmed)

Director for IC ITE and Digital Transformation Defense Intelligence Agency

12:30 – 1:30

Networking Lunch

1:30 – 2:15

Utilizing Big Data to Inform DoD Acquisition

  • Data-driven decisions from the program office, to Department decision makers, to Capitol Hill
  • Data stewardship, access, and analysis
  • The Defense Acquisition Visibility Environment

Mark Krzysko (Confirmed)

Deputy Director, Enterprise Information OUSD (AT&L)

2:15 – 3:00

Empowering the US Navy Intelligence Core Analytic Enterprise with Big Data

-Harnessing data for the warfighter

-All source analytics for big data

-US Navy requirements for industry

B. Lynn Wright (To Be Invited)

Deputy Director of Naval Intelligence DCNO N2/N6

3:00

End of Symposium

Big Data and Data Science at UN

I am a data scientist/data journalist and read the following blog post: "I have recently attended a Big Data Innovation Summit in Boston (Sep 9-10), and had a chance to listen to an inspiring talk by Atefeh Riazi, United Nations Assistant Secretary-General, Chief Information Technology Officer at the United Nations, essentially a CTO of the UN. She talked about what Big Data and Data Science can do to help solve the many problems UN and the world faces and kindly agreed to do a short interview to bring these issues to the attention of the wider Data Science community."

http://www.kdnuggets.com/2015/09/exclusive-interview-big-data-science-united-nations.html

I would like to do a more in-depth story about her and your conference and any assistance you could provide would be appreciated.

http://theinnovationenterprise.com/summits/big-data-innovation-boston-2016/speakers

I just discovered that the UN has a challenge page:

Unite Ideas (unite.un.org/ideas), a crowdsourcing platform where we post data analytics challenges for individuals who would like to contribute. I encourage all data scientists to take a look and consider signing up as a volunteer.

From reading a recent KDNuggets article at: http://www.kdnuggets.com/2015/09/exclusive-interview-big-data-science-united-nations.html

And lots of databases to work with: http://www.un.org/en/databases/

I am organizing a March 7th, 2016 Meetup on Big Data and Data Science at UN.

Exclusive Interview: Big Data and Data Science at UN

Source: http://www.kdnuggets.com/2015/09/exc...d-nations.html

We interview the UN Chief Information Technology Officer about how Big Data and Data Science can help solve the world's problems. Check the Unite Ideas crowdsourcing platform for data analytics challenges where you can help.

By Gregory Piatetsky, @kdnuggets

I have recently attended a Big Data Innovation Summit in Boston (Sep 9-10), and had a chance to listen to an inspiring talk by Atefeh Riazi, United Nations Assistant Secretary-General, Chief Information Technology Officer at the United Nations, essentially a CTO of the UN. She talked about what Big Data and Data Science can do to help solve the many problems UN and the world faces and kindly agreed to do a short interview to bring these issues to the attention of the wider Data Science community.

Gregory Piatetsky, Q1: What are 2-3 important parts of your job?

Atefeh Riazi, @UN_CITO: My role as the CITO is to help modernize the UN so we can deliver on our mandates, transform the organization through the use of innovation and technology and develop a cyber security strategy that responds to the changing world.

GP, Q2: What are some examples of crises that could have been predicted?

AR: There are many past crises that have been analyzed after the fact using Big Data. One of the earliest examples that really drew people's attention was the global Avian Flu crisis, where it was demonstrated that statistical analysis of search queries typed in by the public on popular search engines could be a good indicator of the spreading of the disease around the world. 

In the Ebola crisis we could have benefited a lot more from analytics. I do believe that learning from these will bring us closer to preventing crises or at least being better prepared. This is true for political or economic crises, which depend heavily on human behavior (analysis of social media is an obvious source of data here), for environmental crises (e.g. global warming, which is already being analyzed using a wide range of data sources) and for natural disasters (where satellite photos and seismological data can give us crucial time gains in early warning that translate to lives being saved).

GP, Q3: Even when crises are very predictable, like already noticeable climate change, it is very difficult to get some politicians and countries to act if it is against their short-term interest. Can (data) science play a role in making a more convincing argument? 

AR: Clearly a story or argument that is backed up by hard facts, by data that is clearly sourced and validated, is more convincing. There is a trend in news media to do exactly that: publications such as the New York Times, the Guardian and others increasingly integrate data analytics, even interactive visualizations, into their reporting. The public will soon demand this type of transparency, clarity and proof, not only from news media but from governments and politicians as well. I believe we are heading towards a time where data will be used in politics, both to make arguments and commitments and to hold leaders accountable. 

GP: Q4. What were some successes and achievements at the UN made possible by Data Science and Big Data? What about risks and downsides of Big Data? 

AR: We have experimented with a number of different techniques. For example we have just created a tool to monitor and analyze the official statements of countries on particular topics over time. This is not only for our own use but particularly as a tool to facilitate diplomacy for the Delegates of our Member States. Official statements from countries can these days be found in the form of tweets, blog posts or other forms of online discussion, and keeping track of hundreds or thousands of these is not an easy task. With Data Analytics techniques we can now provide effective search and even trend analysis. 

We make heavy use of text analysis to make the large repositories of UN documents more easily accessible and provide new insight, for instance in relationships between documents and topics. We are now starting to do the same with multimedia repositories: audio and video recordings of UN meetings and photo archives. 

In terms of predictive analytics, social media is of course an important data source for the political and socio-economic areas of our work, and we are actively exploring this. 

We are still at the early stages of systematically incorporating Data Science in the daily work of the organization, but I believe these techniques will ultimately be leveraged in every field. 

With regard to the risks of Big Data, I see those stemming from the fact that the norms and rules in our society have trouble keeping up with the speed at which technology develops and the availability of data increases. If we can hardly imagine all the possible applications of data analytics today, let alone 5 years from now, how can we judge the consequences of how we share, store and treat our data? This is true at the personal level but at the policy level as well: most countries are implementing legislation related to privacy and appropriate use of data, but these laws will need constant updating to keep pace with reality. 

GP: Q5. There are many global crises at present, from climate change to Syrian refugees. What can the Data Science community do to help? Are there UN projects or activities where interested and able Data Scientists can volunteer?

AR: We have just launched a platform for this purpose: Unite Ideas (unite.un.org/ideas), a crowdsourcing platform where we post data analytics challenges for individuals who would like to contribute. I encourage all data scientists to take a look and consider signing up as a volunteer.

We also welcome collaboration with organizations from the private sector, academia and the non-profit sector. We already have a number of very fruitful partnerships with organizations that have generously shared their research, donated licenses for data analytics software or made their time and expertise available. We welcome other partnerships that help us tap into the potential of Data Science in the work of the UN, to support our Member States and generally to help improve lives around the world.

Finally, I am happy that there is an active Open Source community in the area of Data Science and I welcome collaboration with them. The concept of Open Source aligns well with the ideals of the UN. Since I started this job I have hosted two large Open Source conferences at the United Nations (NYC Drupal Camp and OpenStreetMap's "State of the Map") and I would be happy to do the same in the area of Data Science.

GP: Check, for example, the dynamic visualization UN Security Council Resolutions Relationships Explorer

and other solutions at https://unite.un.org/ideas/solutions 
