Table of contents
  1. Story
  2. Slides
    1. Slide 1 Data Science for Global Ebola Response Data
    2. Slide 2 Data Science for International Data Week 2016: Concept Plan for Ebola Data
    3. Slide 3 UN Global Ebola Response
    4. Slide 4 WHO Ebola Summary Table
    5. Slide 5 Ebola Situation Reports
    6. Slide 6 Ebola Situation Report - 15 July 2015
    7. Slide 7 Ebola Situation Report - Map Interface
    8. Slide 8 WHO Ebola Data and Statistics
    9. Slide 9 WHO Ebola Data and Statistics-Situation Summary
    10. Slide 10 WHO Ebola Data and Statistics-Download
    11. Slide 11 WHO Ebola Data and Statistics-Download CSV
    12. Slide 12 Data Science Global Ebola Response Data-Spotfire-Latest Summary Situation
    13. Slide 13 HDX: West Africa Ebola Outbreak Map
    14. Slide 14 HDX: West Africa Ebola Outbreak Data Sets
    15. Slide 15 MindTouch Knowledge Base: HDX 53 Data Sets
    16. Slide 16 HDX 53 Data Sets
    17. Slide 17 HDX 53 Data Sets Metadata
    18. Slide 18 HDX Data Sets Total Cases and Deaths
    19. Slide 19 WHO Ebola Response GIS Data Sharing
    20. Slide 20 WHO Ebola Response GIS Data Sharing Data Sets
    21. Slide 21 WHO Ebola Response GIS Data Sharing Metadata
    22. Slide 22 Data Science Global Ebola Response Data-Spotfire-WHO Helipad Locations
    23. Slide 23 Data Science Questions and RDA Outcomes
    24. Slide 24 Conclusions and Recommendations
    25. Slide 25 Ebola Situation Report - 15 July 2015 Mapping
    26. Slide 26 HDX: Nepal Earthquake
  3. Slides
    1. Slide 1 Data Science for International Data Week 2016: Concept
    2. Slide 2 Purpose
    3. Slide 3 White House Interest in Data Science Meetups
    4. Slide 4 Research Data Alliance (RDA)
    5. Slide 5 Concept Note: International Data Week 2016
    6. Slide 6 Meetup.com
    7. Slide 7 UN Global Ebola Response
    8. Slide 8 Federal Big Data Working Group Meetup
    9. Slide 9 Upcoming Meetups
    10. Slide 10 Data Science for International Data Week 2016: Concept Plan for Ebola Data
  4. Spotfire Dashboard
  5. Research Notes
    1. Google Search: United Nations Ebola Data
    2. Concept Note: International Data Week
      1. Comprising SciDataCon 2016 and Research Data Alliance Plenary Eight
      2. Background of Research Data Alliance Plenary Meetings
      3. Background of SciDataCon
      4. Advantages of an International Data Week
      5. Approach
      6. Practical Issues
      7. Focus and Scope of International Data Week
    3. Notes for July 15th Meeting at NSF by Brand Niemann
      1. Dates That I Know About
      2. My questions are
      3. Climate Data Sets I Prepared
      4. Message to 65 Largest Data Science Meetups and Groups
      5. Responses
      6. Data Science on Data Science Meetups and Climate.Data.gov Data Sets
        1. Number of Meetups Versus Number of Members for 60 Largest Data Science Meetups and 15 NSF Meetups and Other Groups for Initial Survey
        2. Climate.Data.gov (07092015) 548 Data Sets Showing Only About 50% Are Machine Readable and in Standard Formats
  6. Global Ebola Response Data
    1. Data on the Ebola outbreak
    2. WHO Ebola Situation Report - 8 July 2015
      1. Confirmed, probable, and suspected cases in Guinea, Liberia, and Sierra Leone
  7. WHO Ebola data and statistics
    1. Countries with intense transmission
  8. Humanitarian Data Exchange Data Sets
    1. Number of health-care workers deaths by EVD
    2. Number of health-care workers infected with EVD
    3. Number of Ebola Cases and Deaths in Affected Countries
    4. 3W OCHA Guinea as of 16 June 2015
    5. Data for Ebola Recovery
    6. Funding Coverage of the Ebola Virus Outbreak Emergency
    7. Topline Ebola Outbreak Figures
    8. Mali Health Districts
    9. Sierra Leone NERC Ebola Care Facilities Master List
    10. 3W Ebola Sierra Leone
    11. Guinea Ebola Community Care Centre
    12. Ebola outbreaks before 2014
    13. Sub-national time series data on Ebola cases and deaths in Guinea, Liberia, Sierra Leone
    14. Guinea 3W Ebola Response
    15. Ebola Treatment Centers or Units (ETCs or ETUs)
    16. Safe and Dignified Burial Teams
    17. Logistics Bases and Facilities
    18. Sub-national Data of Confirmed Cumulative Ebola by Gender
    19. Internet and radio services in Liberia, Sierra Leone, Guinea
    20. Ebola Treatment Units - 3 Word Addresses
    21. West Africa Movement Restrictions
    22. Guinea - Ebola: WFP VAM Food Security Reduced Coping Strategies Index and Price Data
    23. Health facilities in Guinea, Liberia, Mali and Sierra Leone
    24. Health Facilities Liberia oct 2014
    25. Ebola Testing Laboratories
    26. EVD Cases by district
    27. Liberia ETU Constructions
    28. West African Health Centres - 3 word addresses
    29. Weekly EVD cases by country
    30. InterAction Member Activities related to Ebola Response
    31. Ebola GeoNode
    32. Ebola Community Care Centers
    33. Travel Distance and Time Chart
    34. Direct Relief Ebola Materials Shipped
    35. Sub-national Indicators Ebola Countries
    36. Sierra Leone OSM Roads data attributed with road surface classification
    37. OpenStreetMap Small Devices Offline Map & Navigation data, West Africa
    38. OpenStreetMap ShapeFiles for GIS softwares (Daily updates)
    39. Sierra Leone: Education Establishments
    40. Sierra Leone - Ebola: WFP VAM Food Security Reduced Coping Strategies Index and Price Data
    41. Shape Files of the ETUs
    42. OpenStreetMap Settlement Place Names, West Africa
    43. OpenStreetMap GIS data on Guinea, Liberia, and Sierra Leone
    44. Community Care Centers in Guinea
    45. Mobility patterns and population densities for West Africa
    46. Matrix 4W - WASH Cluster - Guinea
    47. Mali Health Facilities
    48. 3W Dataset on the Organizations Involved in the Response to the Ebola Crisis
    49. Lat/Long/Names of ETUs in Liberia updated as of 21 Oct
    50. Sierra Leone update 1501 Health Facilities Nov 2014
    51. Number of existing beds in EVD treatment units
    52. Commonly Used Abbreviations in United Nations Logistics
    53. NetHope Open Humanitarian Data Repository for the West Africa Ebola outbreak
  9. Research Data Alliance Outputs - Download the booklet!
    1. Research Data Alliance Outputs
      1. Cover Page
      2. Inside Cover Page
      3. Foreword
      4. Connecting the Data Dots: Building Impact
    2. Why should these be adopted?
        1. Who benefits?
          1. Data Citation
          2. Data Description Registry Interoperability (DDRI)
          3. Data Foundation & Terminology (DFT)
          4. Data Type Registries (DTR)
          5. Metadata
          6. PID Information Types (PIT)
          7. Practical Policy (PP)
          8. Wheat Data Interoperability
        2. How do all these dots connect?
        3. Delivering on Promises
      1. Data Foundation and Terminology Working Group
        1. What is the problem?
        2. What are the goals?
        3. What is the solution?
        4. What is the impact?
        5. When can this be used?
        6. References
      2. Data Type Registries Working Group
        1. What is the problem?
        2. What are the goals?
        3. What is the solution?
        4. What is the impact?
        5. When can this be used?
      3. PID Information Types Working Group
        1. What is the problem?
        2. What are the goals?
        3. What is the solution?
        4. What is the impact?
        5. When can this be used?
      4. Practical Policy Working Group
        1. What is the problem?
        2. What are the goals?
        3. What is the solution?
        4. What is the impact?
        5. When can this be used?
      5. Scalable Dynamic Data Citation Working Group
        1. What is the problem?
        2. What are the goals?
        3. What is the solution?
        4. What is the impact?
        5. When can this be used?
      6. Data Description Registry Interoperability Working Group
        1. What is the problem?
        2. What are the goals?
        3. Who is involved in this working group?
        4. What is RD-Switchboard?
        5. When can this be used?
      7. Metadata Standards Directory Working Group
        1. What is the problem?
        2. What are the goals?
        3. What is the solution?
        4. What is the impact?
        5. When can this be used?
        6. References
      8. Wheat Data Interoperability Working Group
        1. What is the problem?
        2. What are the goals?
        3. What is the solution?
        4. What is the impact?
          1. Working group charter: Wheat data interoperability
        5. When can this be used?
      9. Get involved
      10. Back Inside Cover Page
      11. Back Outside Cover Page
  10. NEXT

Data Science for Global Ebola Response Data

Story

Data Science for Global Ebola Response Data

Purpose:

  • White House interest in Data Science Meetups helping the US National Big Data Initiative.
  • Research Data Alliance (RDA) interest in a pipeline for reproducible data science.
  • CODATA (Committee on Data for Science and Technology) and WDS (World Data System) interest in aligning SciDataCon 2016 and the RDA 8th Plenary (P8) as part of an International Data Week 2016.
  • Data Science Meetups, like the DC Data Community and Federal Big Data Working Group Meetup, interest in Data Science for an International Data Week 2016.
  • United Nations interest in Crowdsourcing Data Science for Global Ebola Response Data.

Previous:

Objectives:

  • New data standards or harmonization of existing standards.
  • Greater data sharing, exchange, interoperability, usability, and re-usability.
  • Greater discoverability of research data sets.
  • Better management, stewardship, and preservation of research data.
  • Data Science for International Data Week 2016: Concept Plan for Ebola Data:
  • Date and Data Science Team to be announced: Data Science for International Data Week 2016: Ebola Data

Data Science for International Data Week 2016: Concept Plan for Ebola Data:

  • Data Mine
  • Integrate
  • Answer the Four Data Science Questions
  • Document (curate) the results in Data Science Data Publications
  • Provide an example of implementing the eight RDA Working Group outputs in support of the four 2015 RDA outcomes
  • UN Global Ebola response:

The Results are:

Data Science Questions:

  • How were the data collected?
    • UN Global Ebola Response Data Team
  • Where are they stored?
    • Originally at the Humanitarian Data Exchange and Other Web Sites
  • What are the results?
    • Curated in a MindTouch Knowledge Base, Excel Spreadsheets, and Spotfire Analytics and Visualizations
  • Why should we believe the results?
    • The UN Global Ebola Response Data Team Knows Their Business and I Am a Good Data Scientist!

RDA Outcomes:

  • Data Citation - Uses persistent URLs
  • Data Description Registry Interoperability (DDRI) - Provides Global View of Research Data
  • Data Type Registries (DTR) - Provides Data Types
  • Data Foundation & Terminology (DFT) – Provides Cross Disciplinary Data Exchange and Interoperability
  • Metadata - Uses Metadata Standards that Provide Interoperability
  • PID Information Types (PIT) - Data Science Data Publication Unifies the Interface
  • Practical Policy (PP) - Provides Reproducible Data Science
  • Wheat Data Interoperability – UN Global Ebola Response Data Interoperability

Conclusions and Recommendations:

  • Global Ebola Response Data Experts guided this data scientist to the best data sources.
  • Three Global Ebola Response Data Sets have been Mined, Integrated, used to Answer the Four Data Science Questions, and Documented (curated) in a Data Science Data Publication.
  • This provides an example of implementing the eight RDA Working Group outputs in support of the four 2015 RDA outcomes.
  • The Full Ebola Situation Report - 8 July 2015 could become a Data Science Data Publication to help the UN, Cluster Peer Review, and the RDA.
  • Data Science for International Data Week 2016 could also use the Humanitarian Data Exchange (HDX) Nepal Earthquake data sets.
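
The "Integrated" step mentioned above can be pictured as joining country-level tables from different sources on a shared country key. What follows is only a minimal sketch under stated assumptions: the function name `integrate`, the field names, and every value are hypothetical placeholders, not the actual Spotfire or Excel workflow used for this page and not real outbreak figures.

```python
def integrate(cases_by_country, facilities_by_country):
    """Outer-join two country-keyed tables into one record per country.

    Both arguments are hypothetical dicts mapping a country name to a count.
    A country missing from one source gets None for that field, so gaps in
    coverage stay visible instead of being silently dropped.
    """
    merged = {}
    for country in sorted(set(cases_by_country) | set(facilities_by_country)):
        merged[country] = {
            "cases": cases_by_country.get(country),
            "facilities": facilities_by_country.get(country),
        }
    return merged


# Placeholder inputs only: invented names and numbers, not real data.
cases = {"Country A": 100, "Country B": 50}
facilities = {"Country B": 3, "Country C": 1}
merged = integrate(cases, facilities)

if __name__ == "__main__":
    for country, record in merged.items():
        print(country, record)
```

An outer join is used here so that a country present in only one data set is still listed, which makes coverage differences between sources easy to spot.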

MORE TO FOLLOW

Slides

Slide 1 Data Science for Global Ebola Response Data

Semantic Community

Data Science

Data Science for Global Ebola Response Data

BrandNiemann07172015Slide1.PNG

Slide 2 Data Science for International Data Week 2016: Concept Plan for Ebola Data

https://ebolaresponse.un.org/data

BrandNiemann07172015Slide2.PNG

Slide 4 WHO Ebola Summary Table

RDAEbola.xlsx

BrandNiemann07172015Slide4.PNG

Slide 5 Ebola Situation Reports

http://apps.who.int/ebola/ebola-situation-reports

BrandNiemann07172015Slide5.PNG

Slide 8 WHO Ebola Data and Statistics

http://apps.who.int/gho/data/node.ebola-sitrep

BrandNiemann07172015Slide8.PNG

Slide 9 WHO Ebola Data and Statistics-Situation Summary

http://apps.who.int/gho/data/node.ebola-sitrep.ebola-summary?lang=en

BrandNiemann07172015Slide9.PNG

Slide 10 WHO Ebola Data and Statistics-Download

http://apps.who.int/gho/data/node.ebola-sitrep.quick-downloads?lang=en

BrandNiemann07172015Slide10.PNG

Slide 11 WHO Ebola Data and Statistics-Download CSV

SummaryDataasof07152015.csv

BrandNiemann07172015Slide11.PNG
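
A downloaded summary CSV of this kind can be read and totaled with a few lines of Python. This is a hedged sketch, not the method used for the slides: the column names `Country`, `Cases`, and `Deaths` are assumptions about the layout, and the sample rows are invented placeholders rather than values from SummaryDataasof07152015.csv.

```python
import csv
import io

# Placeholder rows only: the column names and numbers are invented for
# illustration and are NOT taken from the WHO download.
SAMPLE_CSV = """Country,Cases,Deaths
Country A,100,40
Country B,50,10
"""


def summarize(csv_text):
    """Return per-country rows plus overall totals from a cases/deaths CSV."""
    rows = []
    total_cases = 0
    total_deaths = 0
    for record in csv.DictReader(io.StringIO(csv_text)):
        cases = int(record["Cases"])
        deaths = int(record["Deaths"])
        total_cases += cases
        total_deaths += deaths
        rows.append({
            "country": record["Country"],
            "cases": cases,
            "deaths": deaths,
            # Share of reported cases that were fatal; None when no cases.
            "fatality_ratio": deaths / cases if cases else None,
        })
    return rows, {"cases": total_cases, "deaths": total_deaths}


if __name__ == "__main__":
    rows, totals = summarize(SAMPLE_CSV)
    for row in rows:
        print(row)
    print(totals)
```

To run this against a real download, the file contents would be passed in place of `SAMPLE_CSV`, and the column names adjusted to match the actual header row.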

Slide 12 Data Science Global Ebola Response Data-Spotfire-Latest Summary Situation

Web Player

BrandNiemann07172015Slide12.PNG

Slide 13 HDX: West Africa Ebola Outbreak Map

https://data.hdx.rwlabs.org/ebola

BrandNiemann07172015Slide13.PNG

Slide 14 HDX: West Africa Ebola Outbreak Data Sets

https://data.hdx.rwlabs.org/ebola

BrandNiemann07172015Slide14.PNG

Slide 15 MindTouch Knowledge Base: HDX 53 Data Sets

Data Science for Global Ebola Response Data

BrandNiemann07172015Slide15.PNG

Slide 16 HDX 53 Data Sets

RDAEbola.xlsx

BrandNiemann07172015Slide16.PNG

Slide 18 HDX Data Sets Total Cases and Deaths

Web Player

BrandNiemann07172015Slide18.PNG

Slide 19 WHO Ebola Response GIS Data Sharing

http://home.ebolaresponse.opendata.arcgis.com/

BrandNiemann07172015Slide19.PNG

Slide 20 WHO Ebola Response GIS Data Sharing Data Sets

RDAEbola.xlsx

BrandNiemann07172015Slide20.PNG

Slide 22 Data Science Global Ebola Response Data-Spotfire-WHO Helipad Locations

Web Player

BrandNiemann07172015Slide22.PNG

Slide 23 Data Science Questions and RDA Outcomes

BrandNiemann07172015Slide23.PNG

Slide 24 Conclusions and Recommendations

BrandNiemann07172015Slide24.PNG

Slide 25 Ebola Situation Report - 15 July 2015 Mapping

http://apps.who.int/ebola/current-situation/ebola-situation-report-8-july-2015

BrandNiemann07172015Slide25.PNG

Slides

Slide 1 Data Science for International Data Week 2016: Concept

Semantic Community

Data Science

Data Science for Global Ebola Response Data

BrandNiemann07162015Slide1.png

Slide 2 Purpose

BrandNiemann07162015Slide2.PNG

Slide 3 White House Interest in Data Science Meetups

http://www.nsf.gov/od/oia/activities/interns/aaas_fellows/renata-rawlings-goss.jsp

BrandNiemann07162015Slide3.PNG

Slide 7 UN Global Ebola Response

https://ebolaresponse.un.org/data

BrandNiemann07162015Slide7.PNG

Slide 9 Upcoming Meetups

BrandNiemann07162015Slide9.PNG

Spotfire Dashboard

For Internet Explorer Users and Those Wanting Full Screen Display Use: Web Player Get Spotfire for iPad App

Research Notes

Google Search: United Nations Ebola Data

Global Ebola Response: https://ebolaresponse.un.org/data

Collecting data and mapping the outbreak is crucial for ensuring the right response.

The epidemiological data is updated by the World Health Organisation (WHO). Below we publish excerpts from the weekly WHO: Ebola Situation Report.

More data on the Ebola Response can be found on the UN-supported Humanitarian Data Exchange.

53 Data Sets: https://data.hdx.rwlabs.org/search?s...ator=0&q=ebola

Ebola data and statistics: http://apps.who.int/gho/data/node.ebola-sitrep

MORE TO FOLLOW

Concept Note: International Data Week

Source: https://www.rd-alliance.org/concept-...data-weel.html (Word) March 15, 2015

Comprising SciDataCon 2016 and Research Data Alliance Plenary Eight

The vision is for an International Data Week hosted in North America, either Canada or US, where it can attract the greatest level of attention.  Focusing on promoting the best exploitation of research data assets, the week comprises two major events: the RDA 8th Plenary (P8) and SciDataCon 2016.  The note below provides information on SciDataCon and makes the case for a data week that brings these two events together.

Background of Research Data Alliance Plenary Meetings

The RDA plenaries are multi-day meetings held twice a year in various locations worldwide to provide the RDA member community a unique opportunity to network and collaborate with colleagues and peers in various disciplines, and make concrete progress in technical and social areas on topics related to research data sharing and exchange. The main features of the RDA plenary programme are the individual and joint working and interest group meetings, as well as Birds of a Feather groups exploring new potential group topics.

Background of SciDataCon

SciDataCon is an international scientific conference co-organized by CODATA (the Committee on Data for Science and Technology) and the World Data System, both bodies of the International Council for Science with a concern for data.  While building on the precedent of CODATA's longstanding biannual conferences, SciDataCon is a new departure, bringing together a broad community concerned to discuss the manifold issues around data and research.

CODATA and WDS recently completed a Memorandum of Understanding with RDA and it makes good sense to align SciDataCon and the RDA Plenary in a major event in 2016.

Advantages of an International Data Week

Collocation of the two events will achieve the greatest impact.  Framed as an International Data Week, the organizers and host can be assured of attracting high-level attention and advancing a policy case to address research data issues by demonstrating the benefits for science and society.  Additionally, it will demonstrate that the three organizations are collaborating closely and on an international scale.

There will be benefits for the significant number of participants who would be interested in attending both events, reducing travel costs.  Economies can also be achieved for the organizations and communities by focusing both events on a single week.

The precise format is open to discussion.  One possible approach would be to have two days of SciDataCon, a shared day with emphasis on keynotes and high level discussions and then two days of RDA Plenary.

The two events are complementary.  SciDataCon is a research conference featuring scientific papers, practice papers concerned with data issues and high-level policy discussions.  Although benefitting from excellent keynotes, RDA Plenaries have more of the flavour of working meetings for RDA Members and participants in Interest Groups and Working Groups.  A significant number of those groups are joint CODATA-RDA or WDS-RDA activities.  There is overlap in the communities represented by these organizations.  SciDataCon also offers a venue for RDA Groups to present research papers about their activities, something that would be highly valued.

Approach

The unique selling point for SciDataCon is as *the* interdisciplinary scientific conference addressing data issues across the research space.  The conference should also provide a platform for agenda setting data and science policy discussions as well as practitioner papers.  We plan to designate three types of session for:

  1. research papers (quality peer-reviewed research papers);
  2. practice papers and reports (significant reports from projects, institutions or groups working on significant data issues);
  3. agenda-setting policy and strategy discussions.

Practical Issues

SciDataCon would hope to attract 400-500 participants.  The venue requirements are a large plenary and 4-6 large rooms for parallel sessions.  A large space for posters and time for poster presentations is also required.

RDA anticipates a global attendance of about 500 members with venue requirements being a large plenary auditorium and 10-12 breakout rooms of different capacities, ideally in U shaped setting. A large space for posters/demos, preferably in an area that can also accommodate a reception, is also required.

Other data sharing and stewardship organisations are also likely to want to collocate their meetings in the days before and after International Data Week.

Focus and Scope of International Data Week

SciDataCon is motivated by the conviction that the most significant research challenges, and in particular the pressing and multidisciplinary issues relating to global sustainability in the face of ongoing natural and human-induced changes to the planetary system, cannot be properly addressed without paying attention to issues relating to data, including policy frameworks, quality and interoperability, long-term stewardship, and the research skills, technologies, and infrastructures required by increasingly data-intensive science.

The unprecedented explosion in the capacity to acquire, store and manipulate data and information, and to communicate them globally, is a world historical event involving a revolution in knowledge creation far more profound and pervasive than that associated with Gutenberg's invention of the printing press.  It poses challenges to the fundamental process of open scrutiny of scientific evidence (the data) on which concepts are based.  It offers new opportunities to identify patterns and processes in phenomena that have hitherto been below our capacity to resolve.  It challenges us to develop new modes of collaboration and coordinated action that are needed to maximize scientific and social benefit.  These developments also challenge deep-seated scientific norms and many of the habits of researchers and their institutions. SciDataCon addresses these urgent issues for science, both nationally and internationally.

SciDataCon aims to provide the forum for scientific and evidence based discussion of these issues across disciplines.  Its objectives and characteristics are as follows:

  • to advance an international agenda for Open Data and Open Science and to engage the spectrum of research disciplines, including the social sciences and arts and humanities;
  • to provide a forum that brings together researchers that create and use data with the other professional communities involved in data science, data stewardship and data and science policy;
  • to fill the gap for a quality research conference that addresses this range of issues;
  • to be genuinely global and relevant: this means supporting attendance from the global south and addressing research grand challenges and major science policy issues.

  • to develop and implement specific technologies, specifications, policies, and practices that improve data sharing.

Notes for July 15th Meeting at NSF by Brand Niemann

Source: Word

Dates That I Know About

September 23-25: 6th Plenary Climate Change Data Challenge, Paris

Download the Climate Change Dataset Catalogue and choose your datasets

Register for the Climate Change Data Challenge and complete the preliminary information

Start working on your winning solution

August 31: Update your application & submit final solution

September 24: 6th Plenary Experimentation Day, Paris

July 15: Deadline for Application for Experimentation Day

July 15: Data Science for RDA Climate Change Data Challenge at NITRD FASTER CoP, NSF-Ballston

September 28: Climate Change Data - Data Science Meetup of Meetups at the Federal Big Data Working Group Meetup, Tysons Corner Virginia and Remote

August 5-7: NSF Graduate Data Science Workshop & Community Building, Seattle

November 2-7: NSF Big PI/Data Hubs and Data Science Meetups (6-7), Washington DC

Monday-Tuesday, November 2-3, 2015: Big Data PI Meeting

Wednesday-Thursday, November 4-5, 2015: Big Data Hubs

Friday, November 6, 2015: Big Data Hubs/Meetups

Saturday, November 7, 2015: Meetups Meeting

This suggests that the participants and content from the Big Data Hubs will join the Data Science Meetups selected to attend on Friday and then the selected Data Science Meetups will meet on Saturday.

My questions are

Will some or all of these days (November 2-7) be both in-person and virtual?

Are the Big Data Hubs and Meetups overlapping because some Meetups are also "data hubs" because of their content?

Is the Meetup Meeting to motivate Meetups to support the three objectives of NSF's Strategic Plan: More Big Data Scientists, Better Big Data Infrastructure, and More Big Data Science Data Publications (my paraphrasing here)?

Do we want to make this a sustainable community activity by, say, joining it to the NSF EarthCube community and networking these Data Science Meetups in an organized way? See: https://www.nsf.gov/geo/earthcube/ and http://earthcube.org/

Geoscience domain experts need data scientists and data scientists need geoscience domain experts and their data.

Is NSF providing the space and refreshments for the entire week or are the Meetups on Saturday to be held somewhere else?

Hopefully the answers will help get us started on developing a plan for this.

Climate Data Sets I Prepared

RDA Climate Data Challenge: Only 17 of 64 could be used so far.

NTAD: 36 shapefiles (problem reading the largest file).

Climate.Data.gov: 16 of 38 used so far.

U.S. Climate Resilience Toolkit: 63 data sets used in 80 Case Studies. Using Climate Data, Satellite Imagery, and Local Knowledge to Prevent Famine uses 6 data sets (the maximum for any case study), so this would be the best one for integrating multiple data sets.

Message to 65 Largest Data Science Meetups and Groups

Includes 15 Data Science Meetups and 5 other Groups

Climate Change Data - Data Science Meetup of Meetups, September 28th, In Person and Virtual

http://www.meetup.com/Federal-Big-Data-Working-Group/events/223786335/

In support of the NSF Data Science / Big Data Community and the Research Data Alliance (RDA), Semantic Community has prepared four collections of data sets, drawn from the RDA Climate Change Data Challenge, the U.S. National Transportation Atlas Database (NTAD), Climate.Data.gov, and the U.S. Climate Resilience Toolkit. They are meant to jump-start participants in the Federal Big Data Working Group Meetup and other data science meetups for our September 28th Meetup of Data Science Meetups, which in turn prepares for the NSF Meetup of Data Science Meetups on November 6-7, 2015.

All of the information is available as a Data FAIRPort (Findable, Accessible, Interoperable, and Reusable) in a Data Science Commons or Hub as a community service. Suggestions and feedback are welcomed.

http://semanticommunity.info/%40api/deki/files/34452/BrandNiemann09282015.pptx?origin=mt-web 

http://semanticommunity.info/Data_Science/Data_Science_for_RDA_Climate_Change_Data_Challenge 

Thank you, Brand

Dr. Brand Niemann

Director and Senior Data Scientist/Data Journalist

Semantic Community

http://semanticommunity.info 

http://www.meetup.com/Federal-Big-Data-Working-Group/

Responses

DataKind New York, NY Nonprofit - Miriam Young (publicity)

Big Data Science - Shyam Sarkar (do something)

Data Science Spain - Carla Martinez (do something)

Big Data Utah Salt Lake City, Utah Collaboration - Nick Baguley (do something)

Data Community DC Washington, DC Meetup - Harlan Harris and Tony Ojeda (help host)

#BetaNYC, NYC's open data, open gov, & civic tech community - Noel Hidalgo (initially publicity)

Sophia Liu - USGS Mendenhall PostDoc (did a joint Meetup Monday with USGS managers in attendance)

Philip Journeau: Setting-up the REVUER organization in the US and internationally (Data Science Meetup Cluster?)

Federal Big Data Working Group Meetup (29 RSVPs for today and 22 so far for September 28th)

Example Response: Dr. Sarkar, thank you for the response. The answer is yes: organizing a meetup of your meetup to use the climate change data sets and report the results before our September 28th meetup will help us plan the November 6-7 meetup of meetups for NSF in Washington DC.

You can then post those results to your meetup page or wherever you would like for all to see and benefit from them.

You have a very large meetup and I am very interested in your series on: IoT Applications, Cancer Genomics and Big Data Sciences, since we are having some meetups on those topics as well.

Data Science on Data Science Meetups and Climate.Data.gov Data Sets

Number of Meetups Versus Number of Members for 60 Largest Data Science Meetups and 15 NSF Meetups and Other Groups for Initial Survey

DataScienceMeetups-Spotfire.png

Climate.Data.gov (07092015) 548 Data Sets Showing Only About 50% Are Machine Readable and in Standard Formats

Climate.data.gov07092015All.png

Global Ebola Response Data

Source: https://ebolaresponse.un.org/data

My Note: I was able to cleanly capture this page with its data table.

Data on the Ebola outbreak

 

The epidemiological data is updated by the World Health Organisation (WHO). Below we publish excerpts from the weekly WHO: Ebola Situation Report. My Note: This link goes to WHO Ebola data and statistics http://apps.who.int/gho/data/node.ebola-sitrep See Below

More data on the Ebola Response can be found on the UN-supported Humanitarian Data Exchange. My Note: This page goes to HDX Beta https://data.hdx.rwlabs.org/ebola with link to 53 data sets https://data.hdx.rwlabs.org/search?s...ator=0&q=ebola See Below

WHO Ebola Situation Report - 8 July 2015

There were 30 confirmed cases of Ebola virus disease (EVD) reported in the week to 5 July: 18 in Guinea, 3 in Liberia, and 9 in Sierra Leone. Although this is the highest weekly total since mid-May, improvements to case investigation and contact tracing, together with enhanced incentives to encourage case reporting and compliance with quarantine measures have led to a better understanding of chains of transmission than was the case a month ago. This, in turn, has resulted in a decreasing proportion of cases arising from as-yet unknown sources of infection (5 of 30 cases in the week to 5 July), particularly in previously problematic areas such as Boke and Forecariah in Guinea, and Kambia and Port Loko in Sierra Leone. However, significant challenges remain. A residual lack of trust in the response among some affected communities means that some cases still evade detection for too long, increasing the risk of further hidden transmission. The exportation of cases to densely populated urban areas such as Freetown and Conakry remains a risk, whilst the origin of the new cluster of cases in Liberia is not yet well understood.

In Guinea, cases were reported from the same 3 prefectures (Boke, Conakry, and Forecariah) that reported cases the previous week. The northern prefecture of Boke, which borders Guinea-Bissau, reported 6 cases, compared with 10 the previous week. All but one of these cases was a registered contact, with a single case reported to have arisen from an as-yet unknown source of infection. The single case reported from Conakry came from the Matam commune (municipal district) of the city, and was a known contact of a previous case from Benty sub-prefecture in Forecariah. The remaining 11 cases were reported from the prefecture of Forecariah, 9 of which were reported from the sub-prefecture of Benty. All but 2 of the 11 cases reported from Forecariah were known contacts of a previous case or have an established epidemiological link to one.

Liberia was declared free of Ebola transmission on 9 May 2015, after reporting no new cases for 42 consecutive days. The country subsequently entered a 3-month period of heightened surveillance, during which approximately 30 blood samples and oral swabs are collected each day from potential cases and tested for EVD. On 29 June, this routine surveillance detected a confirmed case of EVD in Margibi County, Liberia, the first new confirmed case reported from the country since 20 March. The case was a 17-year-old male who first became ill on 21 June, died on 28 June, and subsequently tested positive for EVD. Two contacts of the first-detected case have since been confirmed as EVD-positive. These additional cases are from the same small community as the first-detected case, and are now being treated in an Ebola Treatment Centre (ETC) in the capital, Monrovia. In addition, a probable case is in isolation at an ETC. The case has a strong epidemiological link to the first-detected case and is showing some symptoms of EVD, but has indeterminate test results for EVD. The origin of infection of the cluster of cases is currently under investigation. At present, these cases are considered to constitute a separate outbreak from that which was declared over on 9 May.
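My Note: The dates quoted in the Liberia paragraph can be cross-checked with simple date arithmetic. The sketch below is illustrative only: the constant names are made up for this example, and treating the "3-month" surveillance period as roughly 90 days is an assumption, not something stated in the WHO report.

```python
from datetime import date, timedelta

# Dates quoted in the situation report text above.
DECLARED_FREE = date(2015, 5, 9)    # Liberia declared free of Ebola transmission
PREVIOUS_CASE = date(2015, 3, 20)   # last confirmed case reported before the new cluster
ONSET = date(2015, 6, 21)           # first-detected case became ill
DEATH = date(2015, 6, 28)           # first-detected case died
DETECTED = date(2015, 6, 29)        # routine surveillance detected the case

NO_CASE_WINDOW = timedelta(days=42)  # "no new cases for 42 consecutive days"
SURVEILLANCE_DAYS = 90               # assumption: "3-month" taken as about 90 days


def days_between(start, end):
    """Number of whole days from start to end."""
    return (end - start).days


# Start of the 42-day window that ended with the 9 May declaration.
window_start = DECLARED_FREE - NO_CASE_WINDOW

print("42-day window began:", window_start.isoformat())
print("Days from previous confirmed case to detection:",
      days_between(PREVIOUS_CASE, DETECTED))
print("Days from declaration to detection:",
      days_between(DECLARED_FREE, DETECTED))
print("Within assumed surveillance period:",
      days_between(DECLARED_FREE, DETECTED) <= SURVEILLANCE_DAYS)
print("Days from onset to death:", days_between(ONSET, DEATH))
```

Under these assumptions the new case falls inside the heightened surveillance period, which is consistent with the report's statement that routine surveillance detected it.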

In Sierra Leone, 9 cases were reported from the same 3 districts as the previous weeks: Kambia, Port Loko, and the district that includes the capital, Freetown. One-third (3) of all cases reported from Sierra Leone arose in the densely populated Magazine Wharf area of Freetown. All 3 cases were registered contacts of a previous case. Four chiefdoms in Kambia each reported a single confirmed case of EVD, as did two chiefdoms in the neighbouring district of Port Loko. All but one of these cases were known contacts of a previous case or have an established epidemiological link to one.

Confirmed, probable, and suspected cases in Guinea, Liberia, and Sierra Leone

Country      | Case definition | Cumulative cases   | Cumulative deaths
Guinea       | Confirmed       | 3287               | 2049
Guinea       | Probable        | 450                | 450
Guinea       | Suspected       | 11                 | Data not available
Guinea       | Total           | 3748               | 2499
Liberia*     | Confirmed       | 3151               | Data not available
Liberia*     | Probable        | 1879               | Data not available
Liberia*     | Suspected       | 5636               | Data not available
Liberia*     | Total           | 10 666             | 4806
Liberia**    | Confirmed       | 3                  | 1
Liberia**    | Probable        | 1                  | Data not available
Liberia**    | Suspected       | Data not available | Data not available
Liberia**    | Total           | 4                  | 1
Sierra Leone | Confirmed       | 8674               | 3574
Sierra Leone | Probable        | 287                | 208
Sierra Leone | Suspected       | 4194               | 158
Sierra Leone | Total           | 13 155             | 3940
TOTAL        |                 | 27 573             | 11 246

Data are based on official information reported by the ministries of health, through WHO country offices. These numbers are subject to change due to ongoing reclassification, retrospective investigation and availability of laboratory results.

* Until 9 May 2015

** Cases and deaths reported after 9 May 2015 are considered to constitute a SEPARATE outbreak to that which was declared over on 9 May. 
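My Note: Because this table was captured from a web page, it is worth checking that the confirmed, probable, and suspected case counts add up to each reported total. The sketch below is a minimal check using the cumulative case figures transcribed from the table above; the dictionary layout and function names are illustrative only, and "Data not available" is represented as None and skipped.

```python
# Cumulative case counts transcribed from the 8 July 2015 table above.
# None stands for "Data not available".
CASES = {
    "Guinea": {"Confirmed": 3287, "Probable": 450, "Suspected": 11, "Total": 3748},
    "Liberia (until 9 May 2015)": {
        "Confirmed": 3151, "Probable": 1879, "Suspected": 5636, "Total": 10666,
    },
    "Liberia (after 9 May 2015)": {
        "Confirmed": 3, "Probable": 1, "Suspected": None, "Total": 4,
    },
    "Sierra Leone": {
        "Confirmed": 8674, "Probable": 287, "Suspected": 4194, "Total": 13155,
    },
}
REPORTED_GRAND_TOTAL = 27573


def component_sum(row):
    """Sum the case-definition counts in a row, skipping the total and missing values."""
    return sum(v for k, v in row.items() if k != "Total" and v is not None)


def check_case_totals(cases, reported_grand_total):
    """Return the rows whose components do not match their total, and whether
    the per-country totals add up to the reported grand total."""
    mismatched = [c for c, row in cases.items() if component_sum(row) != row["Total"]]
    grand_total_ok = (
        sum(row["Total"] for row in cases.values()) == reported_grand_total
    )
    return mismatched, grand_total_ok


mismatched, grand_total_ok = check_case_totals(CASES, REPORTED_GRAND_TOTAL)
print("Rows whose components do not sum to the total:", mismatched)
print("Grand total consistent:", grand_total_ok)
```

The same check cannot be applied to the Liberia deaths column, since only the total is reported there and the individual case definitions are marked "Data not available".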

For more information, read the full report.

WHO Ebola data and statistics

Source: http://apps.who.int/gho/data/node.ebola-sitrep

Disclaimer

This data set represents the best estimates of WHO using methodologies for specific indicators that aim for comparability across countries and time; they are updated as more recent or revised data become available, or when there are changes to the methodology being used. Therefore, they are not always the same as official national estimates, although WHO whenever possible will provide Member States the opportunity to review and comment on data and estimates as part of country consultations. Note that these numbers are subject to change due to ongoing reclassification, retrospective investigation and availability of laboratory results. Please check the Indicator and Measurement Registry for indicator specific information.

In this section:

Countries with intense transmission

Source: http://apps.who.int/gho/data/node.eb...ntries?lang=en

Humanitarian Data Exchange Data Sets

Source: https://data.hdx.rwlabs.org/search?s...ator=0&q=ebola

Number of health-care workers deaths by EVD

HDX - July 8, 2015
Cumulative number of health-care worker deaths from EVD. Extracted from WHO: Ebola Response Roadmap Situation Reports, the latest of which was on 8 July 2015.

Number of health-care workers infected with EVD

HDX - July 8, 2015
Cumulative number of health-care workers infected with EVD. Extracted from WHO: Ebola Response Roadmap Situation Reports, the latest of which was on 8 July 2015.

Number of Ebola Cases and Deaths in Affected Countries

HDX - July 8, 2015
Total number of probable, confirmed and suspected Ebola cases and deaths in Guinea, Liberia, Sierra Leone, Nigeria, Senegal, Mali, Spain, USA, UK and Italy according to Ebola Data and Statistics.

3W OCHA Guinea as of 16 June 2015

OCHA Guinea - June 17, 2015
illustration of who does what and where in Guinea

Data for Ebola Recovery

MIT Governance Lab - June 13, 2015
UPDATED JUNE 2015: SEE ALSO: https://data.hdx.rwlabs.org/dataset/data-for-ebola-recovery-march for data from our first and second survey waves, conducted in December 2014 and March 2015 respectively. Data on health, economic livelihoods, food security, and Ebola vigilance from a representative survey of Monrovia conducted in December 2014. Full details, results and ... More

Funding Coverage of the Ebola Virus Outbreak Emergency

HDX - April 30, 2015
Time series representing coverage -- the amount of funds covered -- from the Ebola response appeal. For a more detailed information, please consult the dataset Financial Tracking Data for the Ebola Virus Outbreak in West Africa. This dataset is updated every other day.

Topline Ebola Outbreak Figures

HDX - April 27, 2015
Collection of topline figures about the Ebola outbreak and response.

Mali Health Districts

OCHA Mali - April 23, 2015
The data was updated in April 2014 and is a polygon feature of health districts in Mali

Sierra Leone NERC Ebola Care Facilities Master List

UNMEER - April 20, 2015
This spreadsheet is the best understanding at this point in time of the numbers of Ebola care facilities in Sierra Leone, their details and types. This data has been compiled from a number of sources, namely the UNMEER Information Management Officers, the District Ebola Response Centres, donors and partners. This list provides an update on facility status as ... More

3W Ebola Sierra Leone

UNMEER - April 8, 2015
This data best describes 3W activities in Sierra Leone.

Guinea Ebola Community Care Centre

UNMEER - April 7, 2015
Ebola Community Care Centre

Ebola outbreaks before 2014

HDX - April 7, 2015
This dataset contains a spreadsheet that lists the ebola outbreaks that occurred from 1976 to 2013.

Sub-national time series data on Ebola cases and deaths in Guinea, Liberia, Sierra Leone

OCHA ROWCA - April 2, 2015
Ebola outbreak time series data at national and sub national levels since March 2014. Data compiled manually from a number of published reports. Updated by OCHA ROWCA every working day.

Guinea 3W Ebola Response

Dataset with 3W data from Guinea. This dataset is regularly updated.

Ebola Treatment Centers or Units (ETCs or ETUs)

UNMEER - March 18, 2015
This dataset represents the best-known collection of status and location of the facilities known as Ebola Treatment Centers or Ebola Treatment Units in Guinea, Liberia and Sierra Leone, with relevant attributes and information. Please forward any mistakes or requested changes to unmeer.im@gmail.com. Updated frequently.

Safe and Dignified Burial Teams

This data shows the location and number of teams assigned to complete safe and dignified burials, a key pillar of the Ebola Response Roadmap. Note that in some cases there may be teams reported available at a national level (i.e. "Liberia has 73 teams available") but as they are not known to be tied to a specific district they may not be reflected in this dataset. ... More

Logistics Bases and Facilities

UNMEER - March 18, 2015
This dataset references a shapefile that depicts the location of UNMEER Forward Logistics Bases related to the Ebola Response.

Sub-national Data of Confirmed Cumulative Ebola by Gender

OCHA ROWCA - March 10, 2015
This dataset contains sub-national data for Liberia, Guinea, and Sierra Leone broken-down by gender. The dataset only contains data on the confirmed Ebola cases. This dataset is updated occasionally. Check this page later for updates. Last updated on: 15 October 2014

Internet and radio services in Liberia, Sierra Leone, Guinea

Where you can find internet, radio, and satellite phone services provided by the ET Cluster to the humanitarian community fighting Ebola

Ebola Treatment Units - 3 Word Addresses

what3words - February 26, 2015
The Ebola Treatment Units collected by UNMEER now with 3 word addresses so that partners can communicate the precise location of each unit quickly and easily.

West Africa Movement Restrictions

BRC Maps Team - February 23, 2015
Movement restrictions in the Ebola affected Countries of Sierra Leone, Guinea and Liberia

Guinea - Ebola: WFP VAM Food Security Reduced Coping Strategies Index and Price Data

WFP - World Food Programme - February 13, 2015
Reduced Coping Strategies Index and Food Prices data. Data collected using interactive voice response (IVR).

Health facilities in Guinea, Liberia, Mali and Sierra Leone

Standby Task Force - February 13, 2015
This dataset contains a link to a Google Spreadsheet containing lists of the health facilities (including the facility name, status, facility type, location etc) in Guinea, Liberia, Mali and Sierra Leone. The data was compiled by the Standby Task Force from various sources.

Health Facilities Liberia oct 2014

Standby Task Force - February 13, 2015
Data collected, collated and cleaned by Standby Task Force volunteers during September and October 2014. Collated from UNMIL, LISGIS, BRC, iLab Liberia and open, online websites.

Ebola Testing Laboratories

World Health Organization - February 13, 2015
This data depicts the location and status of laboratories that are actively testing samples of Ebola Virus Disease (EVD) for verification. Laboratory status, capacity, sponsoring organization and other attributes are displayed. Contact ebolamaps@who.int with any questions or requested amendments. The data is updated remotely as updates are provided and this location ... More

EVD Cases by district

World Health Organization - February 13, 2015
This data is made available from the WHO - Global Health Observatory. Data on new probable and confirmed cases by epi week and district. There is a file for each country: Liberia, Sierra Leone, and Guinea. All the data from the countries can be found at: http://apps.who.int/gho/data/node.ebola-sitrep.ebola-country?lang=en There are 2 values for each district. One is ... More

Liberia ETU Constructions

U.S. DoD - February 13, 2015
This XLS file provides the name, lat/long, and construction status of the ETUs that the U.S. DoD is building

West African Health Centres - 3 word addresses

what3words - February 13, 2015
We have provided the 3 word addresses of each health centre within the West African Region. what3words is a simple, real-time, location referencing system which solves many of the key logistical issues facing aid and humanitarian organisations, for whom street addresses, GPS co-ordinates, and other systems don't exist or are problematic. Using words means non- ... More

Weekly EVD cases by country

World Health Organization - February 13, 2015
EVD Cases by country Nov. 7, 2014. Very important to note that the dataset now includes 2 data sources. The data can be used for trends BUT the case totals will not match WHO SitRep because those are derived using a combination of the 2.

InterAction Member Activities related to Ebola Response

InterAction - February 13, 2015
Ebola-related activities conducted by InterAction members. This dataset is compiled from InterAction's NGO Aid Map. The full NGO Aid Map dataset is available here.

Ebola GeoNode

HDX - February 13, 2015
This is a partnership platform for sharing geospatial data, analysis and maps related to the Ebola emergency response. The platform is intended to minimize the time that GIS analysts spend locating up-to-date data. Users are able to make maps on the fly, view metadata, and access the reports behind GIS layers. Curators are working to ensure that the layers are recent, ... More

Ebola Community Care Centers

UNMEER - February 13, 2015
This data depicts the location and various attributes of Community Care Centers (CCCs) - dedicated care facilities for Ebola patients that are smaller than a full Ebola Treatment Unit. CCC locations for Liberia geolocated with assistance from Standby Task Force (SBTF)

Travel Distance and Time Chart

World Health Organization - February 13, 2015
Chart of estimated travel time and distances by district.

Direct Relief Ebola Materials Shipped

Direct Relief - February 13, 2015
This dataset consists of medical materials shipped to partners involved in the West Africa Ebola response from July 22, 2014 until December 31, 2014. Please note that in some cases the recipient address is in the United States given that the recipient hand carried materials into West Africa.

Sub-national Indicators Ebola Countries

Global Data Lab (GDL) - February 13, 2015
Sub-national indicators for the Ebola countries Guinea, Liberia and Sierra Leone and their neighboring countries. Indicators are household wealth (International Wealth Index), education (years of education), work (farm, lower nonfarm and upper nonfarm), assets (TV, phone, floor material, toilet facility, sleeping rooms), utilities (electricity, water) and age distribution of population.

Sierra Leone OSM Roads data attributed with road surface classification

MapAction - February 13, 2015
Nov 2014 OSM data with attribute classifying roads by surface material/quality.

OpenStreetMap Small Devices Offline Map & Navigation data, West Africa

Offline Map and road navigation data (Daily updates) Guinea, Liberia and Sierra Leone. OpenStreetMap EbolaResponse Android Systems : Install OSMAnd Android Application for Road Navigation with Voice guidance. GPS Garmin Map More information

OpenStreetMap ShapeFiles for GIS softwares (Daily updates)

This data can be imported to GIS software, such as Quantum GIS or ESRI. Guinea, Liberia, Mali and Sierra Leone. OpenStreetMap Ebola Response

Sierra Leone: Education Establishments

HDX - February 13, 2015
Data showing the locations of educational establishments, including primary schools, secondary schools, and colleges

Sierra Leone - Ebola: WFP VAM Food Security Reduced Coping Strategies Index and Price Data

WFP - World Food Programme - February 13, 2015
Reduced Coping Strategies Index and Food Prices data. Data collected using text message (SMS).

Shape Files of the ETUs

U.S. DoD - February 13, 2015
Shape files of the various ETUs

OpenStreetMap Settlement Place Names, West Africa

Overpass service query that extracts settlement place names for Guinea, Liberia, Mali and Sierra Leone from the live OpenStreetMap database (OpenStreetMap Ebola Response). The links launch the Overpass Turbo web application to extract live data, and the data is downloaded automatically. Rename the downloaded file, which is called "interpreter", to better document the query content.
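As a rough sketch of what such a query might look like, the snippet below builds an Overpass QL string for named settlements in one country. The function name, the chosen set of `place` values and the use of ISO 3166-1 country codes are illustrative assumptions; they are not the queries behind the HDX links, and the code does not contact any server.

```python
# Hypothetical helper: builds an Overpass QL query string only (no network access).
SETTLEMENT_PLACES = ("city", "town", "village", "hamlet")  # assumed set of OSM place values


def build_settlement_query(iso_code: str, timeout_s: int = 180) -> str:
    """Return an Overpass QL query for named settlement nodes in one country."""
    place_regex = "|".join(SETTLEMENT_PLACES)
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        f'area["ISO3166-1"="{iso_code}"][admin_level=2]->.country;\n'
        f'node["place"~"^({place_regex})$"]["name"](area.country);\n'
        "out body;"
    )


if __name__ == "__main__":
    # GN = Guinea, LR = Liberia, ML = Mali, SL = Sierra Leone
    for code in ("GN", "LR", "ML", "SL"):
        print(build_settlement_query(code))
        print()
```

The resulting text could be pasted into Overpass Turbo; saving the result under a descriptive name instead of "interpreter" follows the advice above.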

OpenStreetMap GIS data on Guinea, Liberia, and Sierra Leone

HDX - February 13, 2015
This dataset contains shapefiles for Guinea, Liberia, and Sierra Leone from the OpenStreetMap (OSM) project. Each country has its own file. The dataset draws on contributions from hundreds of users and is updated daily. The original dataset can be downloaded from the OSM West Africa Ebola response wiki.

Community Care Centers in Guinea

OCHA Mali - February 13, 2015
Community Care Centers

Mobility patterns and population densities for West Africa

Flowminder - February 13, 2015
Here we provide version 1 Flowminder (www.flowminder.org) human mobility models for West Africa, together with WorldPop population density data for the region, to support ongoing efforts to control the Ebola outbreak. Before downloading any data, please read the documentation carefully as it provides details on the datasets and models provided through the links below. ...

Matrix 4W - WASH Cluster - Guinea

WASH Cluster Guinea - February 13, 2015
Matrix 4W (Who What Where When) - WASH Cluster - Guinea

Mali Health Facilities

OCHA ROWCA - February 13, 2015
List of health facilities in Mali. Categories: Hospital, CSCOM, CSREF including administrative levels and PCODES

3W Dataset on the Organizations Involved in the Response to the Ebola Crisis

OCHA ROWCA - February 13, 2015
Who, What, Where (3W) dataset on the Ebola response effort. Some entries are disaggregated down to administrative level 3. The dataset contains data from Guinea, Liberia, Sierra Leone, and Nigeria. This dataset is updated weekly. Last update: 17 Nov. 2014. Note: If your humanitarian organization would like to make a correction or update the dataset, ...

Lat/Long/Names of ETUs in Liberia updated as of 21 Oct

U.S. DoD - February 13, 2015
Lat/Long/Names of ETUs in Liberia as of 21 Oct

Sierra Leone update 1501 Health Facilities Nov 2014

Standby Task Force - February 13, 2015
Updated January 2015 with some cleaning of dataset. Health Facilities, with OCHA P-codes for level 1 and 2, Chiefdoms names, category of HF. The 2014 update info refers only to the pre-Ebola outbreak status. Please reflect this in the table to avoid misrepresentation as it is not actual (Oct 2014) status but from early in the year.

Number of existing beds in EVD treatment units

HDX - December 31, 2014
Total number of existing beds in EVD treatment units. Extracted from WHO: Ebola Response Roadmap Situation Reports, the latest of which was on 31 December 2014.

Commonly Used Abbreviations in United Nations Logistics

Open Crisis - November 3, 2014
Commonly Used Abbreviations in United Nations Logistics
Scraped from http://www.un.org/en/peacekeeping/sites/coe/referencedocuments/COE%20Acronyms.pdf on 20th October 2014. Listed by: Contingent Owned Equipment Unit / COE & PMSS / UNHQ. Dated October 2008. COE website: http://www.un.org/Depts/dpko/COE/home

Research Data Alliance Outputs - Download the booklet!

Source: https://rd-alliance.org/rda-outputs.html

Submitted by TimeaBiro on Wed, 20/05/2015 - 11:17

 

The Research Data Alliance (RDA) rises to the challenge of changing global data practices by providing concrete solutions to address some of today's many, many data challenges.
 
Two years since its launch RDA has already published tangible outputs aiming to achieve seamless interoperability, trust, and ultimately to provide growth & employment opportunities by making data re-use less expensive.
So far 8 RDA Working Groups have provided Outputs. Working groups are envisioned as accelerants to data sharing practice and infrastructure in the short-term with the overarching goal of advancing global data-driven discovery and innovation in the long-term.
 
The booklet provides an overview of the Outputs pushing forward for: 
  • New data standards or harmonization of existing standards.
  • Greater data sharing, exchange, interoperability, usability and re-usability
  • Greater discoverability of research data sets
  • Better management, stewardship, and preservation of research data
Attachment Size
 RDA_Outputs_May2015_web.pdf 1.11 MB

My Note: PDF converted to Word

Research Data Alliance Outputs

Cover Page

CoverPage.png

Inside Cover Page

InsideCoverPage.gif

Foreword

The Research Data Alliance is building the social and technical bridges that enable open sharing of data

The so-called data revolution isn't just about the volume of scientific data; rather, it reflects a fundamental change in the way science is conducted, who does it, who pays for it and who benefits from it. And most importantly, the rising capacity to share all this data electronically, efficiently, across borders and disciplines magnifies the impact.

The Data Harvest Report,
John Wood
Chair, Research Data Alliance-Europe, Co-Chair, RDA Foundation (Global)

My Note: This version omits Dr. Francine Berman's Foreword

Connecting the Data Dots: Building Impact

The Research Data Alliance (RDA) rises to the challenge of changing global data practices by providing concrete solutions to address some of today's many, many data challenges.

Participation in the RDA is open to anyone who agrees to the RDA principles. Data practitioners, community representatives, scientists and technologists come together through focused global Working Groups and exploratory Interest Groups to exchange knowledge, share discoveries, discuss barriers and potential solutions, explore and define policies, test and harmonise standards, and recommend pre-existing standards to enhance and facilitate global data sharing. Coupled with this, RDA boasts a broad, committed membership of individuals and organizations dedicated to improving data exchange.

Two years since its launch, RDA has already published tangible outputs aiming to achieve seamless interoperability and trust, and ultimately to provide growth and employment opportunities by making data re-use less expensive.

So far, 8 RDA Working Groups have provided Outputs. Working Groups are envisioned as accelerants to data sharing practice and infrastructure in the short term, with the overarching goal of advancing global data-driven discovery and innovation in the long term.

In the widest sense the group outcomes are pushing forward for:

  • New data standards or harmonization of existing standards.
  • Greater data sharing, exchange, interoperability, usability and re-usability.
  • Greater discoverability of research data sets.
  • Better management, stewardship, and preservation of research data.

The 4th RDA Plenary Meeting in Amsterdam (22-24 September 2014), themed "Reaping the fruits", showcased the first concrete outputs from the RDA Working Groups:

  • Data Foundation & Terminology: a model for data in the registered domain.
  • PID Information Types: a common protocol for providers and users of persistent ID services worldwide.
  • Data Type Registries: allowing humans and machines to act on unknown, but registered, data types.
  • Practical Policy: defining best practices of how to deal with data automatically and in a documented way with computer actionable policy.

The 5th RDA Plenary in San Diego (8-11 March 2015) took important steps forward in facilitating the uptake of the first set of outputs under the "adopt a deliverable" theme, and marked the launch of the second group of outputs:

  • Metadata Standards Directory: a community-curated standards catalogue for metadata interoperability.
  • Data Citation: defining mechanisms to reliably cite dynamic data.
  • Data Description Registry Interoperability: solutions enabling cross-platform discovery based on existing open protocols and standards.
  • Wheat Data Interoperability: impacting the discoverability, reusability and interoperability of wheat data by building a common framework for describing, representing, linking and publishing wheat data.

In addition, the Data Fabric group is working with these and other planned outputs to develop a framework for more efficient data management and processing in a loosely coupled manner. This will ultimately aid reproducible data science. All RDA groups are working together to come up with components that will fundamentally change data practices, with wide agreement on turning data into digitally actionable objects, each with a persistent identifier and adequate metadata.

Why should these be adopted?

Current data practice challenges are many. Managing, re-using and combining data in science, industry and society is very inefficient: it takes up too much time and ties up creative minds. The results produced by data-driven work are barely reproducible, with an associated lack of trust. A global change of practices is accepted as an urgent need, yet there is a severe lack of direction, guidance and trained data experts. Excellent island solutions testing out various options have been developed by different labs and companies, all claiming to have the optimal solution. As with the early Internet, this diversity highlights an urgent need for convergence and collaboration.

Adoption of RDA results will lead to:

  • Efficient use and re-use of data and reduced related costs.
  • Increased trust in data science results based on transparent reproducibility.
  • Better scientific contribution to society's grand challenges.
  • Take-up by small companies and entrepreneurs to develop smart data applications for society at large.
  • Economic growth and increased employment for data, and other, professionals.

Who benefits?

Data Citation

Researchers can cite data that is subject to change. When data is modified, all changes are reflected in the citation information, which includes a time-stamp and version history.
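One way to picture a citation that carries a time-stamp and version history is the toy model below. It is a sketch under stated assumptions, not the Data Citation Working Group's recommendation: the class name, the citation string layout and the `hdl:00000/example` identifier are all made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CitableDataset:
    """Toy dataset in which every change is kept as a time-stamped version."""
    pid: str    # placeholder identifier, not a real PID
    title: str
    versions: list = field(default_factory=list)  # (version number, time-stamp, rows)

    def update(self, rows, when=None):
        """Store a new immutable snapshot and return its version number."""
        when = when or datetime.now(timezone.utc)
        self.versions.append((len(self.versions) + 1, when, tuple(rows)))
        return len(self.versions)

    def cite(self, version=None):
        """Build a citation naming the exact version and its time-stamp (latest by default)."""
        number, when, _ = self.versions[(version or len(self.versions)) - 1]
        return f"{self.title}, version {number}, {when.isoformat()}, {self.pid}"

    def rows_at(self, version):
        """Return the data exactly as it was when that version was created."""
        return self.versions[version - 1][2]


# Made-up rows, used only to show that older citations stay resolvable.
ds = CitableDataset(pid="hdl:00000/example", title="Example table")
ds.update([("row-a", 1)], when=datetime(2015, 1, 1, tzinfo=timezone.utc))
ds.update([("row-a", 2)], when=datetime(2015, 2, 1, tzinfo=timezone.utc))
print(ds.cite(1))
print(ds.cite())
```

Keeping every snapshot is the simplest possible design; it trades storage for the guarantee that a citation made before a change still resolves to the data as cited.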

Data Description Registry Interoperability (DDRI)

Infrastructure providers & data librarians to find connections across research data registries and create global views of research data.

Data Foundation & Terminology (DFT)

Scientific Communities through increased cross disciplinary data exchange and interoperability.

Developers by creation of interoperable data management & processing systems.

Data Type Registries (DTR)

Researchers by easily processing or visualising content of an unknown data type.

Machines by automatically extracting relevant information from any registered data type.

Metadata

Researchers & service providers to re-use existing standards, to match and map metadata standards leading to interoperability.

PID Information Types (PIT)

Providers by offering a unified access method to all PID service users worldwide.

Developers by supporting just one interface and thus drastically decreasing programming effort.

Practical Policy (PP)

Data managers & scientists by executing documented workflow chains to improve trust.

Researchers by creating reproducible science with the help of documenting procedures.

Wheat Data Interoperability

Data managers & scientists will benefit from the creation of a framework to support the establishment of a global wheat information system.

How do all these dots connect?

Based on principles similar to those of the Internet community, the Research Data Alliance was started and is run by practitioners for practitioners to build social and technical bridges that enable open sharing of data. Through over 60 focused Working Groups (https://www.rd-alliance.org/groups/working-groups) and exploratory Interest Groups (https://www.rd-alliance.org/groups/interest-groups), RDA is working towards making data publishing - the end result of data science - more efficient and developing a complete framework for more efficient data management and processing and, ultimately, reproducible data science.

Delivering on Promises

RDA's intent is to create deliverables that are developed and used by the community to facilitate data sharing and re-use. Already at this early stage, outputs are being adopted by relevant scientific initiatives and organisations in the US and Europe. Through pilot studies they are identifying the potential, limitations and the effort implied in making use of these results for their scientific and infrastructure interests.

Data Foundation and Terminology Working Group

Co-Chairs:

Gary Berg-Cross - Research Data Alliance Advisory Council, Washington D.C.

Raphael Ritz - Max Planck Institute for Plasma Physics

Peter Wittenburg - Max Planck Institute for Psycholinguistics

What is the problem?

Unlike the domain of computer networks, where the TCP/IP and ISO/OSI models serve as a common reference point for everyone, there is no common model for data organisation, which leads to the fragmentation we currently see everywhere in the data domain. Not having a common language between data communities means that working with data is very inefficient and costly, especially when integrating cross-disciplinary data. As Bob Kahn, one of the Fathers of the Internet, has said, "Before you can harmonise things, you first need to understand what you are talking about."

Data Foundation and Terminology Working Group Figure 1.png

This diagram describes the essentials of the basic data model that the DFT group worked out in a simplified way. Agreeing on some basic principles and terms would make a lot of difference in data practices.

For the physical layer of data organisations, there is a clear trend towards convergence to simpler interfaces (from file systems to SWIFT-like interfaces; see reference 1). For the virtual layer information, which includes persistent identifiers, metadata of different types including provenance information, rights information, relations between digital objects, etc., there are endless solutions that create enormous hurdles when federating. To give an idea of the scale of the problem, almost every new data project designs yet more new data organisations and management solutions.

We are witnessing increasing awareness of the fact that, at a certain level of abstraction, the organisation and management of data is independent of its content. Thus we need to change the way we create and deal with data to increase efficiency and cost-effectiveness.

What are the goals?

  • Pushing the discussion in the data community towards an agreed basic core model and some basic principles that will harmonize the data organization solutions.
  • Fostering an RDA community culture by agreeing on basic terminology arising from agreed-upon reference models.

When talking about data or designing data systems, we speak different languages and follow different organization principles, which in the end, result in enormous inefficiencies and costs. We urgently need to overcome these barriers to reduce costs when federating data.

What is the solution?

Based on 21 data models presented by experts from different disciplines and about 120 interviews and interactions with different scientists and scientific departments, the DFT WG has defined a number of simple definitions for digital data in a registered domain (see reference 2), based on an agreed conceptualisation.

These definitions include:

  • A Digital Object is a sequence of bits that is identified by a persistent identifier and described by metadata.
  • A Persistent Identifier is a long-lasting string that uniquely identifies a Digital Object and that can be persistently resolved to meaningful state information about the identified digital object (such as checksum, multiple access paths, references to contextual information etc.).
  • A Metadata description contains contextual and provenance information about a Digital Object that is important to find, access and interpret it.
  • A Digital Collection is an aggregation of digital objects that is identified by a persistent identifier and described by metadata. A Digital Collection is a (complex) Digital Object.

A number of such basic terms have been defined and put into relation with each other in a way that can be seen as spanning a reference model of the core of the data organisations.
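The relations between these terms can be sketched as a small data model. This is an illustrative reading of the definitions above, not the DFT group's formal model: the class and field names are assumptions, SHA-256 is an assumed choice of checksum, and the example PID strings are placeholders.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass(frozen=True)
class PersistentIdentifier:
    value: str  # long-lasting string; the values used below are placeholders


@dataclass
class Metadata:
    context: dict = field(default_factory=dict)     # information needed to find and interpret
    provenance: dict = field(default_factory=dict)  # how the object came to be


@dataclass
class DigitalObject:
    """A sequence of bits identified by a PID and described by metadata."""
    pid: PersistentIdentifier
    metadata: Metadata
    bits: bytes = b""

    def state_information(self) -> dict:
        """State information a PID could resolve to (here only a checksum)."""
        return {"checksum_sha256": hashlib.sha256(self.bits).hexdigest()}


@dataclass
class DigitalCollection(DigitalObject):
    """A collection is itself a (complex) Digital Object that aggregates others."""
    members: list = field(default_factory=list)


obj = DigitalObject(PersistentIdentifier("pid:example/1"), Metadata({"title": "sample"}), b"abc")
coll = DigitalCollection(PersistentIdentifier("pid:example/c"), Metadata(), members=[obj])
print(isinstance(coll, DigitalObject), obj.state_information())
```

Modelling the collection as a subclass mirrors the statement that a Digital Collection is a Digital Object, so anything that works on objects (resolving a PID, reading metadata) also works on collections.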

What is the impact?

The following benefits will come from wide adoption of a harmonized terminology:

  • Members of the data community from different disciplines will be able to interact more easily with each other and come to a common understanding more rapidly.
  • Developers can design data management and processing software systems that enable much easier exchange and integration of data from their colleagues, in particular in a cross-disciplinary setting (full data replication, for example, could be done efficiently if there is agreement on basic organization principles for data).
  • It will be easier to specify simple and standard APIs to request useful and relevant information related to a specific Digital Object. Software developers would be motivated to integrate APIs from the beginning and thus facilitate data re-use, which currently is almost impossible without using information that is exchanged between people.
  • It will bring us a step closer to automating data processing, where all can rely on self-documenting data manipulation processes and thus on reproducible data science.

When can this be used?

The definitions were discussed at the RDA 4th Plenary meeting (September 2014) and have been available as a document and on a semantic wiki, open for comments and usage, since January 2015. RDA and the group members will take care of proper maintenance of the definitions. For more information see:

https://rd-alliance.org/group/data-f...nology-wg.html

http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page

In the next phase of the work, more terms will be defined and interested individuals will have the opportunity to comment via the semantic wiki.

References

1 https://wiki.openstack.org/wiki/Swift

2 There will always exist data in private, temporary stores, which will not be made accessible in a standard way. My Note: I do not find this reference above.

Data Type Registries Working Group

Co-Chairs:

Larry Lannom - Corporation for National Research Initiatives

Daan Broeder - Max Planck Institute for Psycholinguistics

What is the problem?

Often researchers receive files from colleagues, follow links, or otherwise encounter data created elsewhere that they would like to make use of in their own work. However, they may not know how to work with it, interpret it or visualise its content, if they are unfamiliar with the specifics of the structure and/or meaning of the data. Frequently, researchers end up not using such data, since it requires extra work to look for explanations and tools, (and install these tools where necessary) so that they can access the data.

What are the goals?

The aim of the Data Type Registries Working Group (DTR WG) was to allow data producers to record the implicit details of their data in the form of Data Types and to associate those Types, each uniquely identified, with different instances of datasets.

Linking data type identifiers to datasets will provide data consumers with an indication of the type of datasets they encounter. This means being able to determine which services (and other useful information) to use to understand and process the data, without additional support from the respective data producers. DTRs are meant to provide machine-readable information in addition to presenting human-readable information.

What is the solution?

DTRs offer developers or researchers the ability to add their type definitions in an open registry and, where useful, add references to tools that can operate on them. For example, a user who received an unknown file could query a DTR and receive back a pointer to a visualisation service able to display the data in a useful form.

A fully automated system could use a DTR, much like the MIME type system enables the automatic start of a video player in the browser once a video file has been identified. We envision humans taking advantage of Data Types in DTRs through the type definitions that clarify the nuanced and contextual aspects of structured datasets.
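A minimal in-memory sketch of this lookup is shown below. It is not the DTR prototype's API: the class names, the `pick_service` helper, the type identifiers and the `example.org` service URLs are hypothetical, and a real registry would be a network service rather than a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class DataType:
    type_id: str       # unique identifier of the registered type (placeholder values below)
    description: str   # human-readable explanation of structure and meaning
    services: list = field(default_factory=list)  # pointers to tools that can process the type


class DataTypeRegistry:
    """In-memory stand-in for a registry of type definitions."""

    def __init__(self):
        self._types = {}

    def register(self, data_type: DataType) -> None:
        self._types[data_type.type_id] = data_type

    def resolve(self, type_id: str):
        """Return the registered definition, or None if the type is unknown."""
        return self._types.get(type_id)


def pick_service(registry, type_id):
    """Mimic MIME-style dispatch: look up the type and choose the first listed service."""
    data_type = registry.resolve(type_id)
    if data_type is None or not data_type.services:
        return None
    return data_type.services[0]


registry = DataTypeRegistry()
registry.register(
    DataType("type:example/grid", "Gridded measurements", ["https://viewer.example.org/grid"])
)
print(pick_service(registry, "type:example/grid"))
print(pick_service(registry, "type:example/unknown"))
```

The human-facing side of the registry is the `description` field; the machine-facing side is the list of service pointers that lets software continue without asking the user.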

Precise typing of data sets and collections, combined with one or more registries that define those types in a standard fashion, would benefit every sector of data management, especially interoperability and reuse.

Data Types in DTRs can be used to extend or expand existing types, e.g., MIME types, which provide only container-level parsing information. They can additionally describe experimental context, relationships between different portions of data, and so on. Data Types are deliberately intended to be quite open in terms of registration policies.

The DTR solution is particularly useful for:

  • Researchers dealing with data in a cross-disciplinary, cross-border context, who encounter unknown data types. Using the DTR service allows them to immediately process and/or visualize the content of such data types.
  • Machines that want to extract the checksum information of a data object from a PID record to check whether the content is still the same. Without knowing the details of the PID service provider, the machine could ask for the checksum, for example, since this is an information type which all PID service providers agreed upon and registered in the DTR.

What is the impact?

The potential impact on scientific practices is substantial. Unknown data types as described above can be exploited without any prior knowledge, and thus an enormous gain in time and/or in interoperability can be achieved. In a similar way to the MIME types that allow browsers to automatically select visualization software plug-ins when confronted with a certain file type extension, scientific software can make use of the definitions and pointers stored in the DTR to continue processing without the user acquiring knowledge beforehand.

DTRs pave the way to automatic processing in the data domain, which is becoming increasingly complex, without putting an additional load on the researchers.

However, the individuals who categorize data types are required to enter the associated, relevant information into a DTR.

It is assumed that a federation of such DTRs will be set up to satisfy different needs.

Data Type Registries Working Group Figure 1.png

This diagram illustrates how the Data Type Registry (DTR) works. A user or machine receives an unknown type (1), which can be a file or a term, for example. The DTR is contacted and returns information about an available service (2). This allows the user or machine to continue processing the content (3, 4), such as visualizing an image, without requiring prior knowledge from the user. This makes cross-disciplinary and cross-border work much more efficient and opens data-driven science even to those who are not data experts.

When can this be used?

The first groups are building software to implement such a DTR concept and make the software available. The RDA PID Information Type (PIT) Working Group is already using the first DTR prototype version in its API. The latest version of a DTR prototype is available here: http://typeregistry.org/. Please check the information on the DTR WG's web page at:

https://www.rd-alliance.org/group/da...stries-wg.html for updates.

This simple model will be the start for designing DTRs, with the intention to extend the specifications according to priorities and usage.

PID Information Types Working Group

Co-Chairs:

Tobias Weigel, DKRZ

Timothy DiLauro, Johns Hopkins University

What is the problem?

Numerous systems and providers to register and resolve Persistent Identifiers (PIDs) for Digital Objects and other entities have been designed in the past and are used today. However, almost all of them differ in the way they allow researchers to associate additional information with the PID, such as information for proving identity and integrity. For application developers this is an unacceptable situation, since a different Application Programming Interface (API) needs to be developed and maintained for each provider. If a researcher finds a useful file and wishes to check that it is still the same stream of bits as when it was first created, the researcher should be able to request the checksum independent of the provider holding the PID. How should the researcher do this without knowing whether the provider offers this information and, if so, how to request it? We can overcome such extreme inefficiencies only if all providers agree on a common API, register their information types in a common data type registry and agree on some core types, such as the checksum.
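The integrity check itself is simple once the checksum is available. The sketch below assumes the registered checksum is a SHA-256 hex digest, which the source does not specify; the function names are illustrative.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, registered_checksum):
    """True if the local file still matches the checksum stored with its PID."""
    return sha256_of(path) == registered_checksum.lower()
```

The hard part described above is not this comparison but obtaining `registered_checksum` in the same way from every provider.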

What are the goals?

The aim of the PID Information Types Working Group (PIT) was to:

  • Come to a core set of information types and register (and define) them in a commonly accessible Data Type Registry.
  • Provide a common API and prototypical implementation to access PID records that employ registered types.

What is the solution?

The PIT Working Group accomplished the following:

  • Defined and registered a number of core PID information types (such as checksum)
  • Developed a model to structure these information types
  • Provided an API, including a prototypical server implementation that offers services to request certain types associated with PID records by making use of registered types.

Due to high demand, a variety of trusted PID service providers have been set up already, yet the many different attributes associated with the registered PIDs make the life of a software developer a nightmare. It is essential to harmonize the major information types and suggest a common API, so that if the checksum is requested, only one piece of software has to be programmed, independent of the provider.

The set of core information types currently provided can help to illustrate cross-discipline usage scenarios. It can also act as an example for a community-driven governance process creating and governing more user-driven types. PID service providers and community experts need to come together regularly and add types to the data type registry to make full use of the possibilities of the results of the PIT group.

It is now essential to convince PID service providers such as those using the Handle System (DOI, EPIC, etc.) to adopt the API to unify access. The diagram gives an example of the usage and potential of the suggested solution.

What is the impact?

It is important to envisage the situation in a few years, when the amount and complexity of data has been increased in all sciences and there is a greater need to rely on automatic processes, as human intervention means loss of efficiency. In such scenarios, particularly in the area of big data analytics, communities can exploit the wealth of the data domain by relying on semantic interoperability between all relevant actors. The above example is just one small usage scenario that would be enabled if the relevant PID service providers accept the results of the PIT WG and harmonize their approach. Application software writing would be reduced dramatically since only one API would be supported and one module would be sufficient for retrieving the checksum, for example, and checking identity and integrity.

The strengthening of PID information types could also move the existing identifier systems and the overall idea of identification into a more central and fundamental position, as suggested by DFT's core model of a Digital Object, leading to an enormous increase in efficiency when dealing with data.

PID Information Types Working Group Figure 1.png

Assume that you have a list of PIDs referring to data that you would like to use in a computation. Despite the fact that the PIDs might be registered at various providers, you would simply use a single module that reads (or 'selects') the relevant PID from the list of PIDs, and then submits a request to the appropriate resolver to send the checksum.

If all actors refer to the same entry in the DTR, interoperability is a given. That is, one module would be sufficient to retrieve the checksums, independent of the internal terminologies used by the various providers.
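The scenario can be sketched with toy providers that store the checksum under different internal field names but answer to the same registered type. Everything here is hypothetical: the `ToyProvider` class, the `get_property` call, the `type:example/checksum` identifier and the PID prefixes are stand-ins, not the PIT API.

```python
CHECKSUM_TYPE = "type:example/checksum"  # placeholder for a type ID registered in a DTR


class ToyProvider:
    """Stand-in for a PID service; real providers would be reached over a network."""

    def __init__(self, records, local_names):
        self._records = records          # PID -> {provider-specific field name: value}
        self._local_names = local_names  # registered type ID -> provider-specific field name

    def get_property(self, pid, type_id):
        """Common API call: request a registered type, not a provider-specific field."""
        return self._records[pid][self._local_names[type_id]]


def fetch_checksums(pids, providers):
    """Single client module: route each PID to its provider by prefix, ask for the checksum."""
    checksums = {}
    for pid in pids:
        prefix = pid.split("/", 1)[0]
        checksums[pid] = providers[prefix].get_property(pid, CHECKSUM_TYPE)
    return checksums


# Two providers with different internal terminology for the same registered type.
providers = {
    "prov-a": ToyProvider({"prov-a/1": {"md": "aaa111"}}, {CHECKSUM_TYPE: "md"}),
    "prov-b": ToyProvider({"prov-b/7": {"fixity": "bbb222"}}, {CHECKSUM_TYPE: "fixity"}),
}
print(fetch_checksums(["prov-a/1", "prov-b/7"], providers))
```

The client never sees `md` or `fixity`; it only knows the registered type identifier, which is the point of agreeing on one entry in the DTR.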

When can this be used?

Initial work has already been done on building software to implement a first prototype based on the defined PIT API. This first prototype works together with the DTR prototype and both are publicly available, but not designed for production use.
 
Please check the information and updates on the PIT group's web page at:

https://www.rd-alliance.org/group/pi...-types-wg.html.

It is now time to convince the PID service providers to adopt the solution.

Practical Policy Working Group

Co-Chairs:

Reagan Moore, RENCI

Rainer Stotzka, Karlsruhe Institute of Technology

What is the problem?

Repositories' responsibilities for data stewardship and processing require a highly automated, safe and documented management strategy. Management policies need to be enforced, administrative policies need to be automated, and assessment validation policies need to be evaluated periodically.

With the increasing amount and complexity of data, repositories need to publish their policies and  procedures to build trust in their operation. By sharing policies, repositories can build upon discipline expertise, and implement improved procedures for ensuring trustworthiness.

Operations or chains of operations that are computer actionable and enforced on collections of data objects can be based on the outcomes from the 'Practical Policy' (PP) working group. The outcomes are stated in natural languages and can be turned into robust and tested executable procedures. The ability to re-execute procedures is at the basis of reproducible science, an important element in the chain of building trust and one of the core elements in repository certification processes.

What are the goals?

The goals of the PP Working group were to:

  • Define computer actionable PPs that enforce proper management and stewardship, automate administrative tasks, validate assessment criteria, and automate types of scientific data processing
  • Identify typical application scenarios for practical policies such as replication, preservation, metadata extraction, etc.
  • Collect, register and compare existing practical policies
  • Enable sharing, revising, adapting and re-use of such practical policies and thus harmonize practices, learning from good examples and increasing trust

Since these goals were broad in scope, the PP WG focused its efforts on a few application scenarios for the collection and registration process.

Current practices in managing and processing data collections are determined by manual operations and ad hoc scripts, making verification of the results an almost impossible task. Establishing trust and reproducible data science requires automatic procedures which are guided by practical policies. Collecting typical policies, evaluating them and providing best practice solutions will help all repositories and researchers.

What is the solution?

In order to identify the most relevant areas of practice, the PP WG conducted a survey as a first step. The analysis of the survey resulted in 11 highly important policy areas which were tackled first by the WG: 1) contextual metadata extraction, 2) data access control, 3) data backup, 4) data format control, 5) data retention, 6) disposition, 7) integrity (incl. replication), 8) notification, 9) restricted searching, 10) storage cost reports, and 11) use agreements.

Participants and interested experts were asked to describe their policy suggestions in simple semi-formal descriptions. With this information, the WG developed a 50-page document covering the simple descriptions, the beginning of a conceptual analysis and a list of typical cases such as extract metadata from DICOM, FITS, netCDF or HDF files.

The WG functioned through the RDA 5th Plenary (March 2015), and focused on further analysing, categorising and describing the offered policies. Volunteers reviewed the policies and different groups implemented some of these policies in environments such as iRODS and GPFS. The goal was to register prototypical policies with suitable metadata so that people can easily find what they are looking for and re-use what they found at abstract, declarative or even at code level. At this point, there is still much work to be done to reach a stage where the policies can be easily re-used. An initial template has been developed that describes the constraints that control the policy, the state information needed to evaluate the constraints, the operations that are performed by the policy, and the state information needed to execute the operations.
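The four parts of the template can be pictured as a small data structure. The sketch below is illustrative only: the class name `PracticalPolicy`, its field names, and the replica-count example are assumptions chosen to mirror the template parts named above. They are not taken from the WG document, and they are not iRODS or GPFS code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass
class PracticalPolicy:
    """Hypothetical container mirroring the four parts of the template."""
    name: str
    constraint_state: List[str]          # state needed to evaluate the constraint
    constraint: Callable[[State], bool]  # constraint that controls the policy
    operation_state: List[str]           # state needed to execute the operation
    operation: Callable[[State], State]  # operation performed by the policy

    def apply(self, state: State) -> State:
        """Run the operation only when the constraint holds."""
        required = self.constraint_state + self.operation_state
        missing = [key for key in required if key not in state]
        if missing:
            raise KeyError(f"missing state information: {missing}")
        if self.constraint(state):
            return self.operation(state)
        return state


# Illustrative replication policy: keep at least `required_replicas` copies.
def needs_replica(state: State) -> bool:
    return state["replica_count"] < state["required_replicas"]


def add_replica(state: State) -> State:
    updated = dict(state)  # leave the caller's state untouched
    updated["replica_count"] = state["replica_count"] + 1
    return updated


replication = PracticalPolicy(
    name="minimum-replicas",
    constraint_state=["replica_count", "required_replicas"],
    constraint=needs_replica,
    operation_state=["replica_count"],
    operation=add_replica,
)

before = {"replica_count": 1, "required_replicas": 2}
after = replication.apply(before)
print(after)
```

Keeping the constraint and the operation as separate callables, each with its own declared state, is what would let a registry describe a policy abstractly while leaving the code-level implementation to each environment.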

What is the impact?

The potential impact is huge. In the ideal case, data managers or data scientists can simply plug useful code into their workflow chains to carry out operations at a qualitatively high level. This will improve the quality of all operations on data collections and thus increase trust and simplify quality assessments. Large data federation initiatives such as EUDAT (http://eudat.eu) and the DATANET Federation Consortium (US) (http://datafed.org) are very active in this group, since they also expect to share code development and maintenance, thus saving considerable effort by re-using tested software components. Research Infrastructure experts that need to maintain community repositories can simply re-use best practice suggestions and so avoid common traps. In particular, when these best practice suggestions for practical policies are combined with proper data organisations, as suggested by the Data Foundation and Terminology Working Group, powerful mechanisms will be in place to simplify the data landscape and make federating data much more cost-effective.

When can this be used?

The document mentioned above already provides a valuable resource for inspiration and for making use of suggested policies, helping people improve their own ideas or even profit from developed code.

Once evaluated, properly categorised and described, the next step ahead will be registering practical policies in suitable registries, so that data professionals can easily re-use them, if possible even at code level. The group intends to progress to this step for a number of policy areas, making use of the policy registry developed by EUDAT.

Policies are expected to form an essential component of the Data Fabric Interest Group outcomes. Federation of existing data repositories depends upon the ability to characterize assertions about each participating collection, and enforce the assertions across the participating repositories.

Example assertions include:

  • Presence of required descriptive metadata
  • Presence of required derived data products (typically alternate data formats)
  • Guarantees on integrity
  • Guarantees on data provenance
  • Logical arrangements that span repositories (virtual collections)
  • Guarantees on access controls.

Policies provide a way to quantify the management steps needed to enforce an assertion, share the management step with other repositories, and automate enforcement. The Data Fabric Interest Group can promote the policies needed to manage repository federations.

Within the DataNet Federation Consortium, a 'Policy Workbook' is being created that extends the policy set defined in the PP Working Group. The 'Policy Workbook' will be published through the iRODS Consortium.

For more details on the PP WG, see https://www.rd-alliance.org/group/pr...policy-wg.html

Practical Policy Working Group Figure 1.png


The diagram indicates the final goal of the PP WG. A policy inventory will be made available with best practice examples. Data managers will have the ability to select and implement the procedures most relevant to them.

Scalable Dynamic Data Citation Working Group

Co-Chairs:
Andreas Rauber, Vienna University of Technology

Dieter Van Uytvanck, CLARIN

Ari Asmi, University of Helsinki

Stefan Pröll, SBA Research (Secretary)

What is the problem?

Digitally driven research is dependent on quickly evolving technology. As a result, many existing tools and collections of data were not developed with a focus on long term sustainability. Researchers strive for fast results and promotion of those results, but without a consistent and long term record of the validation of their data, evaluation and verification of research experiments and business processes is not possible.

To verify research results, repeat studies, or perform meta-studies reusing data, the data used needs to be precisely identified. This, however, is complicated by two challenges: (1) Especially in big data settings, researchers rarely use an entire dataset. Instead, they select specific subsets/views of the entire dataset based on their individual requirements, such as a specific time-range, a set of measurements, etc. (2) Data is not static: new data are often added to datasets, and erroneous values are often corrected or deleted from datasets. This makes it difficult to identify precisely which data (or which version of the dataset) was cited, over time. Thus, there is a strong need for data identification and citation mechanisms that identify arbitrary subsets of large data sets with precision in a machine-actionable way. These mechanisms need to be user-friendly, transparent, machine-actionable, scalable and applicable to various static and dynamic data types.

What are the goals?

The aim of the Dynamic Data Citation Working Group was to devise a simple, scalable mechanism that allows the precise, machine-actionable identification of arbitrary sub-selections of data at a given point in time irrespective of any subsequent addition, deletion or modification. The principles must be applicable regardless of the underlying database management system (DBMS), working across technological changes. The mechanism shall enable efficient resolution of the identified data, allowing it to be used in both human-readable citations and machine-processable links to data as part of analysis processes.

What is the solution?

The approach recommended by the Working Group relies on dynamic resolution of a data citation via a time-stamped query also known as dynamic data citation. It is based on time-stamped and versioned source data and time-stamped queries utilized for retrieving the desired dataset at the specific time in the appropriate version.

The solution comprises the following core recommendations:

  • Data Versioning: For retrieving earlier states of datasets the data needs to be versioned. Markers shall indicate inserts, updates and deletes of data in the database.
  • Data Timestamping: Ensure that operations on data are timestamped, i.e. any additions or deletions are marked with a timestamp.
  • Data Identification: The data used shall be identified via a PID pointing to a time-stamped query, resolving to a landing page.
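The three recommendations can be sketched with an in-memory SQLite database. This is a minimal illustration under stated assumptions, not the Working Group's reference implementation: the table layout, the helper names (`insert`, `update`, `delete`, `cite`, `resolve`), the integer logical timestamps, and the identifier `pid:example/1` are all hypothetical, and the landing page that a real PID would resolve to is not modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurements (
        station    TEXT NOT NULL,
        value      REAL NOT NULL,
        valid_from INTEGER NOT NULL,  -- timestamp of the insert or update
        valid_to   INTEGER            -- NULL while the row is current
    )""")


def insert(station, value, ts):
    conn.execute("INSERT INTO measurements VALUES (?, ?, ?, NULL)",
                 (station, value, ts))


def delete(station, ts):
    # Versioning: rows are closed with a timestamp, never physically removed.
    conn.execute("UPDATE measurements SET valid_to = ? "
                 "WHERE station = ? AND valid_to IS NULL", (ts, station))


def update(station, value, ts):
    delete(station, ts)
    insert(station, value, ts)


# Query store: PID -> (subset query, timestamp at which it was executed).
query_store = {}


def cite(pid, where_clause, ts):
    query_store[pid] = (where_clause, ts)


def resolve(pid, as_of=None):
    """Re-execute the stored query, by default as of its original timestamp."""
    where_clause, ts = query_store[pid]
    if as_of is not None:
        ts = as_of
    # The stored clause is trusted here; a real system would validate it.
    sql = ("SELECT station, value FROM measurements "
           f"WHERE ({where_clause}) AND valid_from <= ? "
           "AND (valid_to IS NULL OR valid_to > ?) ORDER BY station")
    return conn.execute(sql, (ts, ts)).fetchall()


insert("A", 1.0, ts=1)
insert("B", 2.0, ts=1)
cite("pid:example/1", "value < 10", ts=2)  # the subset is cited at time 2
update("A", 1.5, ts=3)                     # later correction
delete("B", ts=4)                          # later deletion
insert("C", 3.0, ts=5)                     # later addition

original = resolve("pid:example/1")           # data as originally cited
current = resolve("pid:example/1", as_of=6)   # same query, current data
print(original)
print(current)
```

Resolving the same stored query at two timestamps returns both the subset as it was cited and the subset as it stands now, so the two can be compared.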

Although the exact technical implementation depends on existing local data structures and procedures, evaluations of numerous pilot projects involving various data types (SQL, CSV, XML) indicate the applicability and versatility of this solution.

The WG recently created the RDA recommendations for data citation, which are available as a draft on the RDA website. The document provides 13 recommendations, with guidance that ranges from preparing the data store, through the persistent identification and retrieval of datasets, to the long-term perspective for identifiable datasets.

Scalable Dynamic Data Citation Working Group Figure 1.png

What is the impact?

The main impact of this solution is to provide a mechanism supporting reproducibility of scientific research by allowing a data source to be dynamically updated when information is added, updated or deleted, while still enabling the reproduction of any previous or intermediate version of the data. The approach detailed above has several advantages over current practices, which mainly utilize redundant data deposits or ambiguous natural language textual descriptions.

First, the query/expression identifying the dataset provides valuable provenance information on the way the specific dataset was constructed, as opposed to merely having a data dump.

Secondly, the recommended solution allows users to re-execute the query with the original time stamp and retrieve the original data, or to obtain the current version of the data with all additions and corrections by executing it against the current version of the data repository. This allows them to compare the resulting differences.

Thirdly, it is generally applicable across different types of data representation and data characteristics (big or small data; static or highly dynamic; identifying single values or the entire data set).

As data migrates to new representations, the queries can also be migrated, ensuring stability across changing technologies.

By promoting a consistent approach, decision making and scientific research based on data will become more transparent and reproducible.

When can this be used?

As demonstrated by first successful pilots, this approach can be applied right now. The recommendations are available for comments and can be used as an implementation guideline.

For more information on the solutions detailed above or to learn more about the Dynamic Data Citations Working Group, please visit https://rd-alliance.org/groups/data-citation-wg.html

Data Description Registry Interoperability Working Group

Co-Chairs:
Amir Aryani, Australian National Data Service
Adrian Burton, Australian National Data Service

What is the problem?

In recent years there has been a significant growth of research data repositories and registries; however, these infrastructures are fragmented across institutions, countries and research domains. As such, finding research datasets is not a trivial task for many researchers.

What are the goals?

The Data Description Registry Interoperability WG is working on a series of bilateral information exchange projects and on an open, extensible, and flexible cross-platform research data discovery software solution.

Where research data registries and repositories provide machine-to-machine readable interfaces, the issue of wider discovery is often addressed either by metadata aggregation or federated search. However, the main problem is providing scientists with search results for datasets that are actually relevant to their research. Such relevance depends on research context. As a result, enabling cross-platform discovery includes providing a connected graph of researchers, research activities (projects and grants), research datasets, publications and other research outcomes, and research concepts.

This working group does not aim for a monolithic solution, avoiding a single uber-portal to rule them all. Rather, it compiles simple enabling infrastructures based on existing open protocols and standards, with a flexible and extensible approach that allows registries to opt in and enables any third party to create particular global views of research data.

Who is involved in this working group?

The outcome and the deliverables of this working group will be the result of the direct contribution of the following major institutions in Australia, US and Europe: Australian National Data Service (ANDS), CERN, DANS, DataCite, DataPASS, da-ra, Dryad, Thomson Reuters DCI, VIVO Cornell.

What is RD-Switchboard?

Research Data Switchboard is a collaborative project by the members of the DDRI WG. This project leverages DataCite DOI, ORCID and other persistent identifiers, and uses simple but effective research graph technology to link datasets. This system currently links datasets across the following platforms: Dryad, INSPIREHEP (at CERN), ORCID and Figshare, and links Australian research datasets through Research Data Australia, supported by ANDS.

For example, this platform enables connecting this dataset by Associate Professor Katherine Belov: Wong ESW, Nichol S, Warren WC, Belov K (2013) Data from: Echidna venom gland transcriptome provides insights into the evolution of monotreme venom. Dryad Digital Repository http://dx.doi.org/10.5061/dryad.4qq0v to her other data collections in Research Data Australia:

Data Description Registry Interoperability Working Group Figure 1.png

Data Description Registry Interoperability Working Group Figure 2.png

The figure above shows the functions of the three layers of Research Data Switchboard:

Provider Layer: This layer enables data providers to import metadata records into the platform using OAI-PMH or RESTful services.

Graph Creation Layer: This layer aggregates information, and uses Google API and other services to identify missing connections.

API Consumer Layer: This layer enables e-Infrastructure providers and university librarians to find connections across research data registries.
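The idea of linking datasets through shared persistent identifiers can be sketched as a small graph. The code below is a hypothetical illustration and not RD-Switchboard code: the record layout, the registry names, the placeholder identifiers, and the functions `build_graph` and `related_datasets` are assumptions. It links datasets only through shared creator identifiers and leaves out the OAI-PMH import and the external services used to identify missing connections.

```python
from collections import defaultdict

# Hypothetical records, as a provider layer might deliver them after import.
records = [
    {"source": "registry-a", "dataset": "doi:example/ds-1",
     "creators": ["orcid:example-1"]},
    {"source": "registry-b", "dataset": "doi:example/ds-2",
     "creators": ["orcid:example-1", "orcid:example-2"]},
    {"source": "registry-c", "dataset": "doi:example/ds-3",
     "creators": ["orcid:example-3"]},
]


def build_graph(records):
    """Graph creation: undirected edges between datasets and their creators."""
    graph = defaultdict(set)
    for record in records:
        for creator in record["creators"]:
            graph[record["dataset"]].add(creator)
            graph[creator].add(record["dataset"])
    return graph


def related_datasets(graph, dataset):
    """Consumer view: datasets that share at least one creator identifier."""
    related = set()
    for creator in graph.get(dataset, ()):
        related |= graph[creator]
    related.discard(dataset)
    return sorted(related)


graph = build_graph(records)
print(related_datasets(graph, "doi:example/ds-1"))
```

Because the link runs through the creator's identifier rather than through any one registry, datasets held in different registries become connected without either registry knowing about the other.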

When can this be used?

The work on Research Data Switchboard will continue in the scope of the Data Description Registry Interoperability Working Group. The upcoming RDA Plenaries will provide momentum and opportunity for new partners to join and work toward a sustainable and innovative interoperability platform.

Metadata Standards Directory Working Group

Co-Chairs:

Alex Ball, UKOLN Informatics

Jane Greenberg, Metadata Research Center

Keith Jeffery, Keith G Jeffery Consultants

Rebecca Koskela, DataONE

What is the problem?

When working with research datasets, a common challenge is that the information within them is often difficult to identify, contextualize, interpret and use because of inconsistent approaches to applying related metadata or metadata schemes. To fully understand the content within datasets, researchers need metadata that clearly describes, explains, and associates the dataset with various other entities.

However, metadata needs vary depending on the data type and the application. This results in the use of numerous metadata schemes and lack of interoperability 1. With the continued use of custom metadata schemes, and the development of rival, incompatible standards, there are now even more barriers to interoperation 2.

This challenge can be overcome through the implementation of one set of metadata standards, which would involve the application of the same metadata, and hence data, in multiple contexts and systems.

A collaborative, open directory of metadata standards applicable to scientific data can help address these infrastructural challenges, by allowing researchers to:

  • Learn about the various metadata standards applicable to their research;
  • Learn about controlled vocabularies used by their community;
  • Understand the elements that comprise these standards and vocabularies; and
  • Map between elements when combining data from different sources.
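Mapping between elements, the last item in the list above, amounts to applying a crosswalk. The sketch below is illustrative only: the element names on both sides are made up and do not represent any real metadata standard, and `map_record` is a hypothetical helper.

```python
# Hypothetical crosswalk from the elements of one scheme to those of another.
CROSSWALK = {
    "creator": "author",
    "title": "name",
    "date": "issued",
}


def map_record(record, crosswalk):
    """Rename mapped elements; report the ones that have no counterpart."""
    mapped, unmapped = {}, {}
    for element, value in record.items():
        if element in crosswalk:
            mapped[crosswalk[element]] = value
        else:
            unmapped[element] = value
    return mapped, unmapped


source_record = {
    "creator": "Example Researcher",
    "title": "Example dataset",
    "coverage": "Example region",
}
mapped, unmapped = map_record(source_record, CROSSWALK)
print(mapped)
print(unmapped)
```

Returning the unmapped elements separately makes the gaps between two schemes visible, which is the information a directory of standards helps researchers fill in.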

These standards can only be successful if they are user-friendly, well promoted and widely adopted in target communities.

What are the goals?

The goals of this group are three-fold:

1. Set up a sustainable, community-driven RDA Metadata Standards Directory, designed for users rather than automated tools, which provides brief details for common research data.

2. Compile a set of use cases that analyze and document the various ways in which metadata can be used (e.g. for discovery, exchange, re-use, etc.).

3. Lay the foundation for a future RDA Working Group to develop a machine-understandable catalogue of metadata standards.

What is the solution?

The United Kingdom Digital Curation Centre (DCC) launched a Disciplinary Metadata Standards Catalogue (http://www.dcc.ac.uk/resources/metadata-standards) just before this Working Group started its activity. The DCC's catalogue was adopted, enriched, and expanded by the Working Group.

The Working Group developed a functional prototype directory (http://rd-alliance.github.io/metadata-directory/), based around the GitHub infrastructure, that places the information from the DCC directory into an environment where it can be maintained transparently and with full version control.

Metadata use cases were also collected from Working Group members using a standard template and ultimately included in the set of use cases compiled by the RDA Metadata Interest Group.

What is the impact?

The RDA Metadata Standards Directory has many benefits for the community:

  • By guiding researchers towards the metadata standards and tools relevant to their discipline, the directory drives up adoption of those standards, improving the chances of future researchers finding, accessing, and reusing the associated data.
  • By raising awareness of existing standards, the directory reduces the proliferation of ad hoc metadata formats and helps direct future standards development efforts towards those areas that most need it.
  • If a topical standard is not available, the directory allows researchers to look beyond their subject boundaries for standards that are a close fit for their work.
  • By raising awareness of standards among tool developers, the directory can help improve technical support for those standards.

The human-readable directory is also the first step towards a machine-understandable catalogue, which would have a significant impact on the ability of researchers and service providers to migrate metadata automatically between systems. Through this automation, services would be allowed to bring together specific data based on smart metadata selection, thereby breaking down barriers in research and opening up new possibilities for startup companies and entrepreneurs.

When can this be used?

The DCC directory has been available for use since May 2012. RDA's prototype directory is fully functional, open to the community, and actively monitored so that contributions are fed back to the DCC version and vice versa.

For more information on the usage of this metadata standards directory, please consult the online documentation (http://rd-alliance.github.io/metadata-directory/) on GitHub or a recent article on this work 3.

References

1 Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A. U., Wu, L., Read, E., Frame, M. (2011). Data sharing by scientists: Practices and perceptions. PLoS ONE, 6(6), e21101. doi:10.1371/journal.pone.0021101

2 Willis, C., Greenberg, J., & White, H. (2012). Analysis and synthesis of metadata goals for scientific data. Journal of the American Society for Information Science and Technology, 63(8), 1505–1520. doi:10.1002/asi.22683

3 Ball, A., Chen, S., Greenberg, J., Perez, C., Jeffery, K., & Koskela, R. (2014). Building a Disciplinary Metadata Standards Directory. International Journal of Digital Curation 9(1), 142–151. doi:10.2218/ijdc.v9i1.308

Wheat Data Interoperability Working Group

Co-Chairs:
Esther Dzale Yeumo Kabore, INRA

Richard Fulss, CIMMYT

What is the problem?

The Wheat Data Interoperability Working Group (WDIWG) is working within the global context of a large societal challenge, due in part to the following:

  • Wheat is the most widely grown crop in the world
  • Wheat provides 20% of the world's daily protein and calories
  • Wheat is the second most important crop in the developing world after rice
  • Wheat production has not satisfied demand in recent years
  • It is expected that by 2050 the demand for wheat will increase by 60%

To respond to these facts, and to produce an adequate amount of wheat, the yield increase must go from 1% a year to 1.6% a year.

In order to tackle this issue, many organizations and initiatives are doing research in experimental and farmers' fields, as well as in laboratories, ultimately generating a large quantity of heterogeneous data that are stored in different systems/platforms/repositories. The WDIWG considers data standards harmonization a priority in promoting interoperable wheat data.
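The gap between the two yield growth rates can be checked with compound-growth arithmetic. The sketch below rests on two assumptions that the source does not state: a 35-year horizon (2015 to 2050), and that yield growth alone has to cover the additional demand. The function name `cumulative_growth` is hypothetical.

```python
def cumulative_growth(annual_rate, years):
    """Total fractional increase after compounding an annual rate."""
    return (1 + annual_rate) ** years - 1


YEARS = 35              # assumed horizon, 2015 to 2050 (not from the source)
DEMAND_INCREASE = 0.60  # expected increase in demand by 2050 (from the source)

current_path = cumulative_growth(0.010, YEARS)  # yield growing 1% a year
target_path = cumulative_growth(0.016, YEARS)   # yield growing 1.6% a year

print(f"1.0% a year over {YEARS} years: +{current_path:.0%}")
print(f"1.6% a year over {YEARS} years: +{target_path:.0%}")
print(f"demand to be met: +{DEMAND_INCREASE:.0%}")
```

Under these assumptions, compounding at 1% a year falls short of a 60% increase, while compounding at 1.6% a year exceeds it.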

Wheat Data Interoperability Working Group Figure 1.png

What are the goals?

Interoperability of all wheat-related data

The goals of the WDIWG are to make wheat data interoperable by agreeing on a common set of:

  • Metadata standards
  • Data formats
  • Vocabularies
  • Guidelines for describing, representing, and linking data

Furthermore, the group aims to produce tools that encourage the adoption of the recommendations and guidelines. Note that the group did not start from zero: the community has a large number of assets, which are used as a basis. The requirements for the work are based on the real needs of the wheat community.

Wheat Data Interoperability Working Group Figure 2.png

What is the solution?

The needs of the wheat community are addressed in three ways:

  • By building an interactive cookbook with recommendations and guidelines on data formats and standards to use,
  • By identifying wheat-related vocabularies and ontologies and including them in a single human and machine readable portal,
  • By building a prototype based on real use cases that leverage the recommendations in order to assess the gain of interoperability.

Wheat Data Interoperability Working Group Figure 3.png

What is the impact?

The impact of this work is the immediate and ongoing improvement of discovery, reusability, and interoperability of data within the wheat community.

Going forward, the standardization and harmonization of wheat data will reduce variability and increase the relevance of wheat data related tools.

The outputs of this group have been adopted by the WheatIS (http://www.wheatis.org) which is an effort to build an international Wheat Information System. My Note: I did not find wheat data here to download.

My Audit Trail:

Charter: https://www.rd-alliance.org/groups/w...bility-wg.html (Word)

http://www.wheatis.org/
http://ist.blogs.inra.fr/wdi/
https://rd-alliance.org/groups/agric...roup-igad.html
https://rd-alliance.org/ig-agricultu...session.html-0
https://rd-alliance.org/joint-sessio...erability.html
http://www.slideshare.net/CIARD_/rda...t-developments
 

Slides: See Slide 11 Below

Next steps

  • Metadata (harmonization, minimal metadata sets)
  • Mappings
  • Next workshop (summer 2015)
    • Review and complete the recommendations
    • Refine and complete the guidelines and the best practices
  • Finalize the repository of Wheat related vocabularies
  • Prototyping: a semantic knowledge base My Note: I really want to see this!
    • Integrate data from different data sources
    • Provide smart search capabilities that leverage the vocabularies used against the metadata.

Google: Global Wheat Data

First Hit: http://www.ers.usda.gov/data-products/wheat-data.aspx

My Note: So is this the most authoritative (and harmonized) data set? This data was easy to find, download and use because I have just completed a USDA Data Science MOOC.

Documentation: http://www.ers.usda.gov/data-product...mentation.aspx

My Note: This says its Sources are: Most of the data are from USDA's National Agricultural Statistics Service, World Agricultural Outlook Board, Agricultural Marketing Service, Farm Service Agency, and Foreign Agricultural Service. Other data are from the U.S. Department of Commerce, U.S. Census Bureau. Some data are calculated by ERS.

What am I missing here about what the Wheat Data Interoperability Working Group is trying to accomplish? Is it in the Charter Document below?

Working group charter: Wheat data interoperability

Scope

  • International context:
    • Wheat initiative: www.wheatinitiative.org The Wheat Initiative aims to reinforce synergies between bread and durum wheat national and international research programmes to increase food security, nutritional value and safety while taking into account societal demands for sustainable and resilient agricultural production systems. Main goals :
      • coordinate worldwide research efforts in the fields of wheat genetics, genomics, physiology, breeding and agronomy.
      • provide a forum to facilitate communication between research groups and organisations worldwide.
      • foster communication between the research community, funders and global policy makers at the international level to meet their research and development goals.
      • facilitate and ensure the rapid exchange of information and know-how among researchers, and support knowledge transfer to breeders and farmers
    • G8+5 open data for agriculture: At the 2012 G-8 Summit, G-8 leaders committed to the New Alliance for Food Security and Nutrition, the next phase of a shared commitment to achieving global food security.
      • As part of this commitment, they agreed to 'share relevant agricultural data available from G-8 countries with African partners and convene an international conference on Open Data for Agriculture, to develop options for the establishment of a global platform to make reliable agricultural and related information available to African farmers, researchers and policymakers, taking into account existing agricultural data systems.'
  • Charter: Interoperability is a wide concept that encompasses the ability of organisations to work together towards mutually beneficial and commonly agreed goals. The Working group is using the following definition from the EIF:  'An interoperability framework is an agreed approach to interoperability for organisations that wish to work together towards the joint delivery of public services. Within its scope of applicability, it specifies a set of common elements such as vocabulary, concepts, principles, policies, guidelines, recommendations, standards, specifications and practices.'
    • The working group aims to provide a common framework for describing, representing, linking and publishing Wheat data with respect to open standards. Such a framework will promote and sustain Wheat data sharing, reusability and operability. Specifying the Wheat linked data framework will come with many questions: which (minimal) metadata to describe which type of data? Which vocabularies/ontologies/formats? Which good practices?
    • Mainly based on the needs of the Wheat Initiative Information System (WheatIS) in terms of functionalities and data types, the working group will identify relevant use cases in order to produce a 'cookbook' on how to produce 'wheat data' that are easily shareable, reusable and interoperable. To do so, the working group will:
      • Run a survey of existing standards and recommendations (vocabularies, ontologies, formats): this survey will identify which standards are adopted in the Wheat data managers community, which ones are missing and which ones can stand as references.
      • Coordinate data exchanges : identify the main Wheat data types, end-user categories, case studies and provide standards harmonization, guidelines to describe, document, structure and interlink data taking into account the diversity of data types.
      • Evaluate the interest of linked data technologies to improve usage and access to the information.
      • Identify relevant platforms to support the Wheat linked data framework.
    • Based on a survey report performed in June 2012, the Working group will focus on the following data types, by order of priority: SNP, Genomic annotations, Phenotypes, Genetic Maps, Physical Maps, Germplasm. Implementing the framework will help cultivate a Wheat  ecosystem with people familiar with interoperability, organisations ready to collaborate, and common tools and services.

[1] http://www.wheatinitiative.org/sites/default/files/docs/wheat-info-system-report.pdf

Value proposition

  • Individuals, communities, and initiatives that will benefit from the Wheat Data Interoperability Guidelines
    • The WheatIS will be provided with a linked data framework based on community-accepted standards, which ensure data analysis and data integration facilities. Such a framework is a great asset for the WheatIS to provide the analysis functions and other services expected by the researchers.
    • The Wheat data managers and data scientists will have a common and global framework to describe, document, and structure their data.
    • Researchers, growers, breeders, and other data users will have seamless access, use, and reuse to a wide range of Wheat data. Data linking will also ease emergence of new data analyses and knowledge discovery methodologies.
    • Other plants data managers and scientists will have the benefit of a reusable data framework. 
    • Researchers working on other plants will be able to more easily access, reuse and link up Wheat data with their own data.

The 'cookbook' might be adapted for other crops, such as rice and maize, which are also very important for food security.

  • Key impacts of the RDA Wheat Data Interoperability Guidelines
    • Promote adoption of common standards, vocabularies and best practices for Wheat data management
    • Facilitate access, discovery and reuse of Wheat data
    • Facilitate Wheat data integration

Engagement with existing work in the area

The Wheat data interoperability WG is a working group of the RDA Agricultural data interest group.
The working group will take advantage of the outputs of other RDA working groups. In particular, it will follow the working groups concerned with metadata, data harmonization and data publishing.

The working group will also interact with the WheatIS experts and other plant projects such as TransPLANT (http://urgi.versailles.inra.fr/Projects/TransPLANT), agINFRA (http://www.aginfra.eu) which are built on standard technologies for data exchange and representation.

The Wheat data interoperability group will exploit existing collaboration mechanisms like CIARD (http://www.ciard.net) to get as much stakeholder involvement in the work as possible.

Work plan

  • Form and description of final deliverables
    • A report on the survey of existing standards
    • A Wheat linked data framework specification (cookbook)
    • Library of vocabularies/ontologies
    • Decision tree for describing/representing data based on
      • data and metadata description recommendations
      • file formats recommendations
  • Months/Deliverables/Milestones
    • Month 1 to 6: Survey of existing standards and recommendations (vocabularies, ontologies, formats). Plus identification of end-user categories and relevant platforms.
    • Month 6 to 10: First version of the Wheat linked data framework specification (cookbook)
    • Month 7 to 9: Identification of end-user categories and relevant platforms
    • Month 10 to 15: Evaluation of the Wheat linked data framework (WheatIS)
    • Month 16 to 18: Final version of the Wheat linked data framework specification (cookbook v1)
    • Month 10 to 18: Promotion.
    • Milestone 1: First version of the Wheat linked data framework specification
    • Milestone 2: Final version of the Wheat linked data framework specification

Adoption plan

The working group can rely on its initial members to promote a large adoption of the data framework. Indeed:

  • INRA is a leading partner of the Wheat initiative and an active member of the working group. One of the Wheat initiative objectives is to build an international and integrated Wheat information system (WheatIS) intended for an international Wheat community (researchers, growers, breeders, etc.). The WheatIS could operate as a hub and integrate wheat data produced by the community. The working group will base a large part of its specification requirements on the Wheat initiative's data exchange needs, and the WheatIS experts will be part of the working group stakeholders. The Wheat linked data framework will be tested first through the WheatIS. The working group can rely both on the Wheat initiative members (public research organizations and private companies) and its community to ensure a large circulation of the Wheat linked data framework and facilitate its adoption.
  • Achieving food security for all is at the heart of FAO's efforts to make sure people have regular access to enough high-quality food to lead active, healthy lives. FAO's mandate is to improve nutrition, increase agricultural productivity, raise the standard of living in rural populations and contribute to global economic growth. One of FAO's principal means to achieve this goal is collecting, disseminating and brokering knowledge. Therefore FAO has a prime interest in excellent information systems. The 'AIMS team' (Agricultural Information Management Standards) in FAO is engaged in helping to set standards and methodologies for easier sharing and exchange of agricultural information.
  • CIMMYT

Initial membership 

  • Johannes KEIZER, FAO
  • Devika MADALLI, ISI
  • Odile HOLOGNE, INRA
  • Esther DZALE YEUMO KABORE, INRA
  • Nikos MANOUSELIS, Agro-Know Technologies
  • Michael Alaux, INRA
  • Cyril Pommier, INRA
  • Sophie Aubin, INRA
  • Richard Fulss (CIMMYT)
  • Helmuth Knuepffer  (Genbankdokumentation Gatersleben)
  • Wheat initiative partners (http://www.wheatinitiative.org/about/members)
  • Achieving consensus, addressing conflicts, and staying on task and within scope
    • Consensus will be reached via open discussion, voting, and majority considerations informed by evidence where possible.
    • Conflict will first be addressed by WG leaders. An escalation procedure will be drafted; for example, the RDA Council will be consulted and an independent person from outside the WG will be brought in to mediate the conflict.
    • Staying on task and within scope: we have considerable experience in projects and standards development. The key mechanism for reaching consensus will be examining evidence and identifying the limits of applicability of competing ideas. In addition, we will agree on a detailed schedule and track action items.
  • Operation parameters
    • The work is voluntary, and not every WG member will be able to contribute equally. We will therefore aim to match the work to members' specific interests while ensuring that all members can contribute to internal reviews. The WG will hold internal assessments every 6 months to ensure we are on track.
  • WG Assessment
    • The six-monthly assessments will involve working group members and also external reviewers with expertise in this area, including those who declined membership of the working group because of pressure of other work.
  • Broader community engagement and participation

References

Wheat initiative Information System:
http://www.wheatinitiative.org/research/wis
http://www.wheatinitiative.org/sites...tem-report.pdf

GARNet report - Making data accessible to all:
http://www.garnetcommunity.org.uk/si...ort%202012.pdf

Various relevant refs:

http://www.wheatbp.net
http://wheat.pw.usda.gov

When can this be used?

The guidelines produced by the group, as well as the bioportal of wheat-related linked vocabularies, are directly usable now.

Following the guidelines and linking into existing vocabularies will give wheat-related data greater relevance and impact going forward.

For more information on WDIWG visit: https://www.rd-alliance.org/groups/w...bility-wg.html. My Note: I did not find anything in the vocabularies and ontologies folder.

See also http://ist.blogs.inra.fr/wdi/recomme...or-phenotypes/ for direct links to clear recommendations.

Get involved

RDA Vision: Researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society.

RDA Mission: The Research Data Alliance (RDA) builds the social and technical bridges that enable open sharing of data.

RDA Guiding Principles

  • Openness - Membership is open to all interested individuals who subscribe to the RDA's Guiding Principles. RDA community meetings and processes are open, and the deliverables of RDA Working Groups will be publicly disseminated.
  • Consensus - The RDA moves forward by achieving consensus among its membership. RDA processes and procedures include appropriate mechanisms to resolve conflicts.
  • Balance - The RDA seeks to promote balanced representation of its membership and stakeholder communities.
  • Harmonization - The RDA works to achieve harmonization across data standards, policies, technologies, infrastructure, and communities.
  • Community-driven - The RDA is a public, community-driven body constituted of volunteer members and organizations, supported by the RDA Secretariat.
  • Non-profit - RDA does not promote, endorse, or sell commercial products, technologies, or services.

How to play a part in the RDA Process

There are several ways in which you can play a part in RDA:

RDASixthPlenary.png

Back Inside Cover Page

BackInsideCoverPage.gif

Back Outside Cover Page

BackOutsideCoverPage.gif

https://www.rd-alliance.org

Contact: enquiries@rd-alliance.org

Photography by Inge Angevaare, Johnny Babmbury. Designed and produced by RDA Europe (May 2015).
