Government Challenges With Big Data

Table of contents
  1. Big Data Senior Steering Group Meeting
  2. Government Challenges With Big Data: A Semantic Web Strategy for Big Data
    1. Title Slide
    2. Outline
    3. Data Science Team
    4. Semantic Community: Mission Statement for 2013
    5. Why We Are Here
    6. NIST Cloud Computing AND Big Data Forum and Workshop
    7. Spotfire for Big Data Analytics: Microscope
    8. Data Science Analytics Library: Telescope & Library
    9. From the Year of Big Data to the Year of the Data Scientist Working With Big Data
    10. Cross-Walk Table (in progress)
    11. The Practice of Data Science
    12. Current US Government Semantic Web Strategy
    13. Comment From Owen Ambur
    14. International Linked Open Data Strategy: Linked Open Data Cloud Data
    15. International Linked Open Data: Comments to David Wood
    16. International Linked Open Data: My EPA Green App Data App Example
    17. Our Semantic Web Strategy for Big Data: Previous Presentations
    18. Our Semantic Web Strategy for Data: Simple Explanation
    19. Our Semantic Web Strategy for Data: NASA Big Data Example
    20. Our Semantic Web Strategy for Data: Spotfire Network Analytics
    21. My 5-Step Method
    22. Get to 5-Stars With Open Data
    23. System of Systems Architecture
    24. Data Federation in Spotfire: In-Memory and In-Database Data
    25. Data Federation in Spotfire: Database Connections, Information Links, & Analytics Library
    26. Data Federation in Spotfire: Data Panel
    27. Data Federation in Spotfire: Information Designer
    28. 15th SOA, Shared Services, and Big Data Analytics Conference (DRAFT)
    29. Comments: Semantic Medline, Noblis, Cray, and ORBIS Technologies
    30. Q & A
  3. Story
  4. Spotfire Dashboard
  5. Slides
    1. Slides 1
    2. Slides 2
    3. Slides 3
    4. Slides 4
    5. Slides 5
    6. Slide 6
  6. Upcoming
  7. Previous
  8. Research Notes
  9. Summary
    1. Cross-Walk Table
    2. ELC Track Three: Big Data Bold Horizons
    3. Big Data At the Hill
    4. Big Data Case Studies High Level Summary
    5. Demystifying Big Data — A Practical Guide to Transforming the Business of Government
    6. NIST Cloud Computing AND Big Data Forum & Workshop, January 15-17, 2013
    7. Big Data Exchange Meeting, February 26, 2013
  10. The Big Data Challenge
    1. Big Data Gap
    2. Government Agencies Adding A Petabyte of New Data in Next Two Years; Making Little Progress Yet In Big Data
    3. The Big Data Gap: The 2012 NetApp Study – Media Results
      1. Coverage to Date
      2. Coverage to Date – Press Release Pick Ups
    4. The Big Data Gap: Report
      1. Title Slide
      2. Introduction
      3. Executive Summary
      4. Big Data = Better Government
      5. Not There Yet
      6. Data Disconnect
      7. Data Deluge
      8. Data on the Loose
      9. Management Hurdles
      10. Vision –Reality: The Big Data Gap
      11. Unmanageable Data
      12. Data On Demand
      13. Driving Data Management Forward
      14. Recommendations
      15. Methodology and Demographics
      16. Thank You
  11. Welcome to the NITRD Big Data Challenge Series!
    1. NITRD Review Board
    2. About the Contests
    3. Big Data Challenge - Conceptualization - Idea Generation
      1. Contest Overview
      2. Technologies
      3. Final Submission Guidelines
      4. Eligibility
      5. Results
    4. The NITRD Big Data Challenge Review Board
  12. Big Data Buzzwords From A to Z
    1. Introduction
    2. Data Warehousing
    3. ETL
    4. Flume
    5. Geospatial Analysis
    6. Hadoop
    7. In-Memory Database
    8. Java
    9. Kafka
    10. Latency
    11. Map/reduce
    12. NoSQL Databases
    13. Oozie
    14. Pig
    15. Quantitative Data Analysis
    16. Relational Database
    17. Sharding
    18. Text Analytics
    19. Unstructured Data
    20. Visualization
    21. Whirr
    22. XML
    23. Yottabyte
    24. ZooKeeper
  13. Chronology For Federal Government
    1. Obama Administration Unveils “Big Data” Initiative: Announces $200 Million In New R&D Investments
    2. Big Data is a Big Deal
    3. Big Data Across the Federal Government Fact Sheet
      1. DEPARTMENT OF DEFENSE (DOD)
        1. Defense Advanced Research Projects Agency (DARPA)
      2. DEPARTMENT OF HOMELAND SECURITY (DHS)
      3. DEPARTMENT OF ENERGY (DOE)
        1. The Office of Science
        2. The Office of Basic Energy Sciences (BES)
        3. The Office of Fusion Energy Sciences (FES)
        4. The Office of High Energy Physics (HEP)
        5. The Office of Nuclear Physics (NP)
        6. The Office of Scientific and Technical Information (OSTI)
      4. DEPARTMENT OF VETERANS ADMINISTRATION (VA)
      5. DEPARTMENT OF HEALTH AND HUMAN SERVICES (HHS)
        1. Center for Disease Control & Prevention (CDC)
        2. Center for Medicare & Medicaid Services (CMS)
        3. Food & Drug Administration (FDA)
      6. NATIONAL ARCHIVES & RECORDS ADMINISTRATION (NARA)
      7. NATIONAL AERONAUTIC & SPACE ADMINISTRATION (NASA)
      8. NATIONAL ENDOWMENT FOR THE HUMANITIES (NEH)
      9. NATIONAL INSTITUTES OF HEALTH (NIH)
        1. National Cancer Institute (NCI)
        2. National Heart Lung and Blood Institute (NHLBI)
        3. National Institute of Biomedical Imaging and Bioengineering (NIBIB)
        4. NIH Blueprint
        5. NIH Common Fund
        6. National Institute of General Medical Sciences:
        7. National Library of Medicine
        8. Office of Behavioral and Social Sciences (OBSSR)
        9. Joint NIH - NSF Programs
      10. NATIONAL SCIENCE FOUNDATION (NSF)
      11. NATIONAL SECURITY AGENCY (NSA)
      12. UNITED STATES GEOLOGICAL SURVEY (USGS)
    4. NSF Leads Federal Efforts In Big Data
    5. Core Techniques and Technologies for Advancing Big Data Science & Engineering  (BIGDATA)
    6. Frequently Asked Questions
      1. Is my proposal a good fit for the Big Data solicitation?
      2. Should every proposal submitted in response to the BIG DATA solicitation address an application of interest to NIH?
      3. Should I submit a "mid scale" or a "small" proposal?
      4. How do I submit a proposal to this program?
      5. Do I need to use Grants.gov or Fastlane to apply?
      6. Is my project likely to get funded?
      7. Can I obtain a postdoctoral fellowship through the BIGDATA program?
      8. Can employees of Federal Agencies or Federally Funded Research and Development Centers submit proposals in response to this solicitation?
      9. Can for-profit entities apply for funding through this solicitation?
      10. What are the "intellectual property" implications for a for-profit entity that submits a proposal in response to this solicitation?
      11. Can a foreign organization submit a proposal?
      12. How do I know if my request for funding is relevant to NIH?
      13. Are duplicate submissions allowed?
      14. Will there be future BIGDATA solicitations?
    7. Event 2nd BIGDATA Webinar
    8. What Has Been Funded (Recent Awards Made Through This Program, with Abstracts)
    9. Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) NSF 12-499
      1. Title Slide
      2. Big Data Research and Development Initiative
      3. The Big Data Team
      4. Outline
      5. Data Deluge 1
      6. Dealing with Data
      7. Data Deluge 2
      8. Opportunities
      9. Dealing with Data
      10. Examples of Research Challenges
      11. BIG DATA Initiative in Context: NSF Cyber-infrastructure for 21st Century (CIF21) Vision
      12. BIGDATA Solicitation in Context
      13. BIGDATA Solicitation
      14. Data management, collection and storage (DCM)
      15. Data Analytics (DA)
      16. E-science collaboration environments (ESCE)
      17. NIH BIGDATA Priorities
      18. National Priorities
      19. What proposals are not good fits for the BIGDATA Solicitation? 1
      20. What proposals are not good fits for the BIGDATA Solicitation? 2
      21. Proposal Submission and Review
      22. Review Criterion: Intellectual Merit 1
      23. Review Criterion: Intellectual Merit 2
      24. Review Criterion – Capacity Building (CB)
      25. Evaluation Plan
      26. Data Sharing Plan
      27. Software Sharing Plan
      28. Mid-scale proposals: Coordination plan (CP)
      29. Proposal Types and Deadlines
      30. How many awards are anticipated?
      31. How does one apply?
      32. Questions and Answers
      33. Credits
  14. NEXT


Big Data Senior Steering Group Meeting

January 24, 2013
Qinetiq-NA, 4100 North Fairfax Ave, Arlington, VA Suite 800
10AM – 12PM ET
Call in: 1-866-773-0704 Code: 5288814#
WebEx

Agenda

Handouts:
Agenda
January 10 Meeting Notes are on the Wiki
Presentation Slides as made available
OMB/OSTP NITRD Priorities for 2012

1.    Introduction of Brand Niemann, “Government Challenges With Big Data: A Semantic Web Strategy for Big Data” Slides Slides
a.    Presentation
b.    Q&A

2.    Introduction of Celia Merzbacher, SRC (Semiconductor Research Corporation) Slides

a.    Presentation
b.    Q&A

3.    Other Business:
a.    OMB/OSTP NITRD Priorities for 2013 – input request
b.    Intra-NITRD Collaboration ideas/needs

4.    Next meeting February 14, 2013, 10AM ET: guest speakers Peter Lyster and Allen Dearry will present on the “NIH Big Data Initiative”

Government Challenges With Big Data: A Semantic Web Strategy for Big Data

Source: Slides

Title Slide

BrandNiemann01242013Slide1.GIF

Outline

BrandNiemann01242013Slide2.GIF

Why We Are Here

BrandNiemann01242013Slide5.GIF

Data Science Analytics Library: Telescope & Library

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public

BrandNiemann01242013Slide8.GIF

From the Year of Big Data to the Year of the Data Scientist Working With Big Data

http://semanticommunity.info/Emerging_Technology_SIG_Big_Data_Committee/Government_Challenges_With_Big_Data#Story

BrandNiemann01242013Slide9.GIF

The Practice of Data Science

BrandNiemann01242013Slide11.GIF

Current US Government Semantic Web Strategy

BrandNiemann01242013Slide12.GIF

Comment From Owen Ambur

BrandNiemann01242013Slide13.GIF

International Linked Open Data Strategy: Linked Open Data Cloud Data

http://semanticommunity.info/@api/deki/files/8824/=VIVO.xlsx

BrandNiemann01242013Slide14.GIF

International Linked Open Data: My EPA Green App Data App Example

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?EPAGreenAppsDataApp-Spotfire

BrandNiemann01242013Slide16.GIF

Our Semantic Web Strategy for Data: Simple Explanation

http://semanticommunity.info/A_NITRD_Dashboard/Semantic_Medline

BrandNiemann01242013Slide18.GIF

Our Semantic Web Strategy for Data: NASA Big Data Example

http://semanticommunity.info/@api/deki/files/20313/NASABigData.xls

BrandNiemann01242013Slide19.GIF

My 5-Step Method

BrandNiemann01242013Slide21.GIF

Get to 5-Stars With Open Data

http://www.w3.org/DesignIssues/LinkedData.html

BrandNiemann01242013Slide22.GIF

System of Systems Architecture

BrandNiemann01242013Slide23.GIF

Data Federation in Spotfire: In-Memory and In-Database Data

http://semanticommunity.info/A_Spotfire_Gallery/Users_Guide#Working_With_Large_Data_Volumes

BrandNiemann01242013Slide24.GIF

Data Federation in Spotfire: Database Connections, Information Links, & Analytics Library

BrandNiemann01242013Slide25.GIF

Data Federation in Spotfire: Data Panel

Web Player

BrandNiemann01242013Slide26.GIF

15th SOA, Shared Services, and Big Data Analytics Conference (DRAFT)

http://semanticommunity.info/Federal_SOA

BrandNiemann01242013Slide28.GIF

Comments: Semantic Medline, Noblis, Cray, and ORBIS Technologies

Q & A

BrandNiemann01242013Slide30.GIF

Story

Slides Slides

From the Year of Big Data to the Year of the Data Scientist Working With Big Data

As the Year of Big Data comes to a close and the Year of the Data Scientist Working With Big Data begins, it is useful to assess what has been accomplished in the Federal Government and what should be accomplished in the coming year.

Of course the highlights this past year were the announcements:

To aid readers who are new to big data, I created a more detailed Chronology of Big Data in the Federal Government and included a glossary of Big Data Buzzwords From A to Z.

The White House Office of Science and Technology Policy (OSTP) launched the Big Data Initiative through its Networking Information Technology Research and Development (NITRD) Office, led by Dr. George Strawn, and its Big Data Senior Steering Group, Co-Chaired by Suzi Iacono, NSF, and Karin Remington, NIH.

At the Big Data and the Government Enterprise Conference, Dr. Strawn said that if we do not soon produce big data results of value for government business and science, we will be on to something deemed more important next year. In support of that, the NITRD Big Data Challenge Series was launched. MeriTalk launched a discussion forum in support of The Big Data Challenge and will hold a Big Data Exchange Meeting in February. MeriTalk also surveyed the state of big data (The Big Data Gap) and concluded: Government Agencies Adding A Petabyte of New Data in Next Two Years; Making Little Progress Yet In Big Data. Government cloud computing leaders have realized they need more in the cloud than just email and collaboration tools, big data among them, and will discuss that at an upcoming conference (NIST, January 15-17, 2013).

In my role as Knowledge Capture Chair for the ACT-IAC Big Data Committee, I have used the following resources:

to produce a Cross-Walk Table of big data pilots by government agency. I found that NASA was conspicuously absent, especially since last fall they kicked off The Big Data Challenge series designed to find innovative solutions to the government’s big data problems. The first contest was all about making disparate, incompatible data sets usable and actually valuable across agencies, as follows:

“How can we make heterogeneous (dissimilar and incompatible) data sets homogeneous (uniformly accessible, compatible, able to be grouped and/or matched) so usable information can be extracted? How can information then be converted into real knowledge that can inform critical decisions and solve societal challenges?”

They also posed the questions: Is creating a contest the right way to get the government to start thinking about how to control big data?  What would be your submission?

So I decided to take that challenge and provide a specific example shown elsewhere using NSF and NASA data sources as follows:

The first objective was: “How can we make heterogeneous (dissimilar and incompatible) data sets homogeneous (uniformly accessible, compatible, able to be grouped and/or matched) so usable information can be extracted?”

This was met by using the NSF Spreadsheet of Big Data Awards and the Data.gov Catalog Spreadsheet (about 6000 dissimilar and incompatible data sets) to query for the NASA data sets (only 3 actual data sets and 22 tools) and then getting those 3 data sets (actually only two could be retrieved because the third was a broken link) into a common tool where the knowledge could be extracted for the second objective. Interestingly, the data.nasa.gov site says it has over 500 data sets in a directory, and the NASA Data Resources web site mentions the Global Change Master Directory with thousands of data sets with high-quality metadata. So I put one of the Directory data sets (the Venus Craters Spreadsheet) into the common tool, and I requested the Global Change Master Directory in an open format so I could put it into the common tool as well in the future. So I have no problem putting diverse data sets into a common tool once I get and prepare them (see below: "it is the problem").
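
To make the catalog query step concrete, here is a minimal sketch in Python (pandas), assuming a hypothetical catalog export with "agency", "title", and "download_url" columns; the real Data.gov Catalog Spreadsheet and NSF awards spreadsheet have their own layouts, so the file and column names below are illustrative only.

  # Hypothetical sketch: filter a Data.gov-style catalog export for NASA entries
  # and check which download links still resolve (broken links are common).
  import pandas as pd
  import requests

  catalog = pd.read_excel("datagov_catalog.xlsx")   # assumed file name and layout
  nasa = catalog[catalog["agency"].str.contains("NASA", case=False, na=False)]
  print(f"NASA entries found in the catalog: {len(nasa)}")

  def link_ok(url: str) -> bool:
      try:
          return requests.head(url, allow_redirects=True, timeout=10).status_code < 400
      except requests.RequestException:
          return False

  nasa = nasa.assign(retrievable=nasa["download_url"].apply(link_ok))

  # Export the data sets that can actually be retrieved for loading into a
  # common tool (for example, a merged spreadsheet or a Spotfire analysis).
  nasa[nasa["retrievable"]].to_csv("nasa_datasets_for_common_tool.csv", index=False)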

The second objective was: “How can information then be converted into real knowledge that can inform critical decisions and solve societal challenges?”

This has been met only partially so far, because so much work is required to get and prepare the NASA data sets for a common tool.
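
Because the preparation is where the effort goes, here is a minimal, hypothetical sketch of the kind of cleanup needed before two differently structured NASA data sets (for example, the Venus Craters spreadsheet and a Data.gov catalog extract) can sit side by side in one tool; the file names, column handling, and merge strategy are assumptions for illustration, not the actual NASA schemas.

  # Hypothetical cleanup sketch: harmonize two data sets so a common tool can read them.
  import pandas as pd

  craters = pd.read_excel("venus_craters.xls")      # assumed source file
  missions = pd.read_csv("nasa_mission_data.csv")   # assumed source file

  def tidy(df: pd.DataFrame) -> pd.DataFrame:
      # Normalize column names, trim stray whitespace, and drop duplicate rows:
      # the unglamorous work that usually dominates these projects.
      df = df.rename(columns=lambda c: str(c).strip().lower().replace(" ", "_"))
      for col in df.select_dtypes(include="object"):
          df[col] = df[col].str.strip()
      return df.drop_duplicates()

  craters, missions = tidy(craters), tidy(missions)

  # Tag each row with its source so provenance survives, then stack the tables
  # into one long file that the dashboard tool can load.
  craters["source"] = "Venus Craters (data.nasa.gov directory)"
  missions["source"] = "Data.gov catalog entry"
  pd.concat([craters, missions], ignore_index=True, sort=False).to_csv(
      "nasa_combined_for_dashboard.csv", index=False)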

This reminded me of the experience of Josh Wills, Data Scientist @Cloudera, writing on The Practice of Data Science, who said:

  • Key trait of all data scientists. Understanding “that the heavy lifting of [data] cleanup and preparation isn’t something that gets in the way of solving the problem: it is the problem.” (DJ Patil)
  • Inverse problems. Not every data scientist is a statistician, but all data scientists are interested in extracting information about complex systems from observed data, and so we can say that data science is related to the study of inverse problems. Real-world inverse problems are often ill-posed or ill-conditioned, which means that scientists need substantive expertise in the field in order to apply reasonable regularization conditions and solve the problem.
  • Data sets that have a rich set of relationships between observations. We might think of this as a kind of Metcalfe’s Law for data sets, where the value of a data set increases nonlinearly with each additional observation. For example, a single web page doesn’t have very much value, but 128 billion web pages can be used to build a search engine.
  • Open-source software tools with an emphasis on data visualization. One indicator that a research area is full of data scientists is an active community of open source developers.

So NASA's work with big data seems to be more about tools (software coding) than about data science applied to actual data sets, and it reinforces Dr. George Strawn's observation that if we do not get some real results with big data of value to the business and science of government, we will be on to something else that will.

Spotfire Dashboard

For Internet Explorer Users and Those Wanting Full Screen Display Use: Web Player Get Spotfire for iPad App


Slides

Slides 1

NASABigData-Spotfire1.png

Slides 2

NASABigData-Spotfire2.png

Slides 3

NASABigData-Spotfire3.png

Slides 4

NASABigData-Spotfire4.png

Slides 5

NASABigData-Spotfire5.png

Slide 6

NASABigData-Spotfire6.png

Upcoming

Data Transparency Coalition, January 4, Capitol Hill. Wiki Slides.

Big Data & Activity Based Intelligence, IAC General Membership Meeting, January 16, Falls Church, VA. Wiki Slides

NIST Cloud Computing AND Big Data Forum & Workshop, January 15-17, Gaithersburg, MD

W3C eGov Special Interest Group, Open Government Data for Japan (and the US and Europe): January 21, Slides

Federal Big Data Senior Steering Group, Wiki. January 24, Ballston, VA

Big Data Exchange Meeting, February 26, 2013, 8-10 a.m., The City Club of Washington at Columbia Square, Washington, D.C.

ACT-IAC Collaboration & Transformation SIG, Government Challenges With Big Data, February 23, Fairfax, VA. Wiki Slides

Previous

Big Data: Big Problem, Big Answer for the CIA, November 17, 2011

The Value Potential of Big Data to Government: Parsing and serving it up “small", February 26, 2012 USE GRAPHIC

Intelligence Community Loves Big Data, March 6, Wiki

Challenges and Opportunities in Big Data, March 29. AAAS, Washington, DC

Big Data Conference, May 8-9, Arlington, VA. Wiki

How To Become a Data Scientist With Spotfire 5, May 30, Slides

Big Data, June 13-14, NIST, Gaithersburg, MD. Wiki

Big Data and the Government Enterprise, June 21, Wiki

Big Data Innovation and Social Media & Web Analytics Innovation, September 13-14, Boston, MA. Wiki Blogs Slides

BIG DATA at the Hill, September 25, Rayburn House Office Building, Washington, DC

Federal Big Data Senior Steering Group, September 27. Wiki. Moved to January 24th

14th SOA for e-Government Conference, MITRE, McLean, VA, October 2. Federation of SOA Pilot. Slides Slides

Emerging Technology SIG Meeting: Big Data Committee Overview, October 18, Noblis. Wiki

Recorded Future User Conference, October 16, 2012, Newseum, Washington, DC. Wiki

Executive Leadership Conference "Charting a Course", October 28-30, Colonial Williamsburg, Virginia. Wiki

Using Data Science Evidence in Public Policy for Big Data and Elections, George Mason University, University Hall, November 1-2. Slides

ACT-IAC ET SIG Semantic Web (with Big Data), November 29. GSA Fairfax, VA. Slides

Government Information and Analytics Summit, November 28-29, Washington, DC. Proposal Wiki Wiki

NGA Collaboration Forum Outreach Event, December 11, NGA Campus East (NCE), Springfield, VA

Big Data Part II, December 12, Washington, DC. Wiki

Research Notes

My 5-Step Method
So what I like to do to illustrate (data science) and explain (data journalism) is the following (like a recipe):
  • Put the Best Content into a Knowledge Base (e.g. MindTouch*)
    • The Japan Statistical Yearbook 2012
  • Put the Knowledge Base into a Spreadsheet (Excel*)
    • Linked Data to Subparts of the Knowledge Base
  • Put the Spreadsheet into a Dashboard (Spotfire*)
    • Data Integration and Interoperability Interface
  • Put the Dashboard into a Semantic Model (Excel*)
    • Data Dictionaries and Models
  • Put the Semantic Model into Dynamic Case Management (Be Informed*)
    • Structured Process for Updating Data in the Dashboard
 
* Examples of tools used.
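
To illustrate steps 2 and 3 of the recipe (knowledge base to spreadsheet to dashboard) in code, here is a small hypothetical Python sketch that pulls tables off knowledge-base pages into one workbook; the page URLs, table positions, and output file name are assumptions, and in practice the resulting spreadsheet is simply loaded into Spotfire.

  # Hypothetical sketch of steps 2-3: extract tables from knowledge-base pages
  # into a single spreadsheet that the dashboard tool can consume.
  import pandas as pd

  pages = {
      "Population": "http://semanticommunity.info/Japan_Statistical_Yearbook/Population",
      "Economy": "http://semanticommunity.info/Japan_Statistical_Yearbook/Economy",
  }  # assumed page URLs for subparts of the knowledge base

  sheets = {}
  for name, url in pages.items():
      tables = pd.read_html(url)   # read the HTML tables on the wiki page
      df = tables[0]               # assume the first table holds the data of interest
      df["source_page"] = url      # keep the link back to the knowledge base
      sheets[name] = df

  # One workbook, one sheet per page, ready to load into the dashboard.
  with pd.ExcelWriter("knowledge_base_extract.xlsx") as writer:
      for name, df in sheets.items():
          df.to_excel(writer, sheet_name=name, index=False)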
 
To Get to 5-Stars With Open Data
Star | Definition | Example / Tool*
★ | Make your stuff available on the Web (whatever format) under an open license | This Story / MindTouch
★★ | Make it available as structured data (e.g., Excel instead of image scan of a table) | Spreadsheet / Excel
★★★ | Use non-proprietary formats (e.g., CSV instead of Excel) | Table / MindTouch and Spotfire
★★★★ | Use URIs to identify things, so that people can point at your stuff | Table of Contents / MindTouch and Spotfire
★★★★★ | Link your data to other data to provide context | Table / MindTouch and Spotfire
 

* Examples of tools used.

Source of Star and Definition: http://www.w3.org/DesignIssues/LinkedData.html
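
As a minimal sketch of climbing this ladder, assume a small two-column CSV of agencies and pilots: step three is already satisfied by the CSV format, step four mints a URI for each row, and step five links each row to an external resource. The file name, columns, and URI scheme below are made up for illustration.

  # Hypothetical sketch: move a 2-star spreadsheet toward 4 and 5 stars.
  import pandas as pd

  df = pd.read_csv("agencies.csv")   # assumed columns: agency, pilot (3 stars: open CSV)

  # 4 stars: give every row a stable URI so other people can point at it.
  base = "http://example.org/bigdata/pilot/"   # made-up URI scheme
  df["uri"] = base + df["agency"].str.lower().str.replace(" ", "-")

  # 5 stars: link each row to related data elsewhere (here, a DBpedia resource).
  df["sameas"] = "http://dbpedia.org/resource/" + df["agency"].str.replace(" ", "_")

  df.to_csv("agencies_5star.csv", index=False)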
 
MY NOTE: I am trying to optimize MindTouch content for mobile devices now as part of my Big Data on Mobile Devices work for the above presentations. Spotfire already works on mobile devices (iPads).

"MindTouch is focused on becoming optimized for mobile devices.  Currently mobile optimization is performed through our client services department on a case by case basis.  These enhancements do require additional costs for implementation and development. Mobile enhancement will become a productized part of our product in the future. I do not have a firm date." Source Cory Ganser, MindTouch

Summary

Cross-Walk Table

Agency | Contact | BDSSG | Conferences | ELC (4) | Pilots (3)
OSTP (1) | John Holdren, Assistant to the President and Director, White House Office of Science and Technology Policy | Through NITRD (2) George Strawn and Wendy Wigen | My DoS Story (6, etc.) | George Strawn | Health, Safety, and Energy Data and Semantic Medline
NSF (1) | Subra Suresh, Director, National Science Foundation | Independent Agency (2) | | | Dashboards
NIH (1) | Francis Collins, Director, National Institutes of Health | Through DHHS (2) with AHRQ and ONC | | Andrea T. Norris and Frank Baitman | Medicare for IOM and SEER
USGS (1) | Marcia McNutt, Director, United States Geological Survey | | EarthCube Charrette | | EarthCube
DoE (1) | William Brinkman, Director, Department of Energy Office of Science | NNSA and SC (2) | Internal Summit | Peter Tseronis, Michael Franklin, and E. Wes Bethel | DISRE Solar (Government Information and Analytics Summit)
DoD (1) | Zach Lemnios, Assistant Secretary of Defense for Research & Engineering, Department of Defense | DARPA, NSA, OSD, & Service Research Organizations (2) | | (David Wennergran and Teri Takai) | Dashboards
DARPA (1) | Kaigham “Ken” Gabriel, Deputy Director, Defense Advanced Research Projects Agency | | (Martin Hyatt, January 27-29) | |
DoC | | NIST and NOAA (2) | App Contest Story | | Dashboards
DHS | | Independent Agency (2) | Story | |
EPA | Malcolm Jackson | Independent Agency (2) | Mine | (Malcolm Jackson) | EnviroFacts and Indicators
NASA | Tomas Soderstrom | Independent Agency (2) | Story | Tomas Soderstrom | IN PROCESS
NARA | Jason Baron | Independent Agency (2) | Story | (Jason Baron) | IN PROCESS
IC | Gus Hunt, CTO, CIA, and Robert Ames, In-Q-Tel | | Story and Story | Michael Howell | CIA World Fact Book and Quint
GSA | Dave McClure and Kathleen Turco | | Story | (Marie Davie and Johan Bos-Beijer) | Governmentwide Acquisition Contract (GWAC) Dashboard
Treasury | | | Story | Adam Goldberg and Thomas Vannoy | Bureau of Public Debt

(1) http://semanticommunity.info/AOL_Government/Challenges_and_Opportunities_in_Big_Data

(2) http://semanticommunity.info/Emerging_Technology_SIG_Big_Data_Committee#Slide_2_Member_Agencies

(3) http://semanticommunity.info/Emerging_Technology_SIG_Big_Data_Committee#Compile_initial_list_of_Big_Data_initiatives_and_disseminate_via_site

(4) http://semanticommunity.info/AOL_Government/ACT-IAC_2012_Executive_Leadership_Conference#TRACK_THREE:_BIG_DATA.2C_BOLD_HORIZONS

See details below

ELC Track Three: Big Data Bold Horizons

MY NOTE: This conference was cancelled due to Hurricane Sandy and its content has been included in the Cross-Walk Table above

Government agencies are awash in ever-expanding volumes of data, and providing timely and efficient management and analysis of data assets represents one of the great management challenges of our time. The rapid increase in data generated from mobile devices, sensors, audio/visual tools, web traffic, and electronic customer transactions illustrates the enormity of this challenge – and the new opportunities it offers. The White House has unveiled a new “Big Data Research and Development Initiative” that commits $200 million to new research efforts in the management of “big data”. Most agencies have efforts underway to harness these new sources of information to provide customer and citizen benefit derived from nuggets of valuable information that are buried in diverse and massive data repositories.

 
This track will examine current big data efforts in select industries and across the Federal Government, and explore technical and policy issues affecting the storage, management, and analysis of big data. Panels will also investigate tools and technologies that enable users to extract value from big data, and identify realistic outcomes and risk mitigation strategies that agencies can use to maximize their return on investments in solving big data challenges.

Defining and Maximizing Big Data

Big data has burst onto the forefront of contemporary IT approaches due to the convergence of multiple factors. This session will explore the government and market forces driving its emergence, including the abundance of data being generated, budgetary and financial influences, and new technologies. The discussion will include the characteristics of big data and the potential impact it has for improving operational capabilities and government services to citizens.

Invited Participants

Timothy Paydos, Director Worldwide Government Information Agenda Team, International Business Machines (IBM)
Michael Franklin, Thomas M. Siebel Professor of Computer Science and Director, AMPLab, University of California, Berkeley
Dave McClure, Associate Administrator, Citizen Services and Innovative Technologies, General Services Administration
Brendan M. Peter, Director of Global Government Relations, CA Technologies
George Strawn, Director, National Coordination Office, Networking and Information Technology Research and Development Program
 
Making Big Data Real
Early adopter case studies provide a window into the impressive value that big data can provide and the pitfalls to avoid.  This session will explore case studies and peel back the onion on investment decisions, lessons learned and the value achieved for citizens and customers.  It will include a discussion of the business, operational and technical approaches that teams used in their projects, and the decisions and trade-offs that pointed these path finder organizations to a particular architecture.

Invited Participants

Tomas Soderstrom, IT Chief Technology Officer, Jet Propulsion Laboratory, National Aeronautics and Space Administration
E. Wes Bethel, Senior Computer Scientist, Lawrence Berkeley National Laboratory
Bernadette Hyland, Chief Executive Officer, 3 Round Stones
Andrea T. Norris, Director of Center for Information Technology and Chief Information Officer, National Institutes of Health
Jason Stowe, Chief Executive Officer, Cycle Computing
 
Volume/Variety/Velocity… Navigating Barriers to Big Data Deployment
Big data is being propelled by new scale-out IT capabilities, and existing technologies are being used in innovative ways. This session will discuss the characteristics of the tools employed to accommodate the volume, variety and velocity of data and the trade-offs for their use. It will explore the limitations and benefits of technologies, and integration opportunities with the existing IT ecosystem.

Invited Participants

Flavio Villanustre, Vice President, Infrastructure and Products, LexisNexis Risk Solutions and HPCC Systems
Frank Baitman, Chief Information Officer, U.S. Department of Health and Human Services
Rich Byrne, Senior Vice President and General Manager Command and Control Center, MITRE
Larry Pizette, Solutions Architect Manager, Cloud and Big Data, Amazon Web Services
Amr Awadallah, Chief Technology Officer, Cloudera
 
Leadership Roles in Big Data
The transitioning of big data projects from vision to reality requires innovative leadership, coupled with the ability to obtain funding and move through the acquisition processes.  This session will discuss the challenges and opportunities for kicking off big data efforts and navigating the current budgetary environment.  It will include acquisition and funding approaches that can help leaders to jump start their efforts.

Invited Participants

Adelaide O'Brien, Research Director, IDC Government Insights
Adam Goldberg, Executive Architect, Department of the Treasury
Michael Howell, Deputy Program Manager, Office of the Program Manager for the Information Sharing Environment, Office of the Director of National Intelligence
Thomas Vannoy, Treasury Bureau of Public Debt
Kathleen Turco, Associate Administrator, Office of Governmentwide Policy, General Services Administration
Peter Tseronis, Chief Technology Officer, Department of Energy

Big Data At the Hill

Source: http://semanticommunity.info/AOL_Government/BIG_DATA_at_the_Hill#Story

Topics | Trends | Issues | Comments
Myth vs. Realities | Big Data Solves Everything | Hype Without Demonstrated Business and Scientific Value | See Data Evolution in the Government Enterprise: Will It Still Be Big Data Next Year?
Privacy: Who knows what? | The Intelligence Community Knows Everything | Who Knows Everything the Intelligence Community Is Doing? | See Intelligence Community Loves Big Data
Cloud: Where Big Data belongs? | Terabytes to Zettabytes | Bandwidth Limitations | Amazon: Fedex Your Storage Devices To Us to Upload Your Big Data
Mobility – of you and your data | Bring Your Own Device (BYOD) | Conventional Web Sites and Databases Are Not Mobile-Enabled | Your Mobile Device Has Access To a Supercomputer
Storage and technology | Scalable single level storage | Collapses the Server, Network, and storage by removing software and replacing them with memory system primitives | Panève’s ZettaLeaf & ZettaTree Products
Data Analytics – hidden gems and spurious conclusions | Data Science | Too Few Data Scientists - Need a Government Data Science Community | See My Data Journalism Articles
Opportunities and risks in data aggregation | Aggregate Before Analysis To Reduce Size | Needles Could Be Lost | See Data Evolution in the Government Enterprise: Will It Still Be Big Data Next Year?
Security concerns for large data sets | Integrate Classified and Unclassified Data Sources | Different Security Levels | Need To Specify/Protect Security at the Row and Element Level
Financial Implications | Hadoop for Everything with Big Data | Costs 50 Times Higher Than Expected | Big Data In Memory Could Be More Cost Effective

Big Data Case Studies High Level Summary

Source: http://semanticommunity.info/AOL_Government/BIG_DATA_at_the_Hill#Table_2:_Case_Studies_High_Level_Summary

The Commission has compiled a set of 10 case studies detailing the business or mission challenge faced, the initial Big Data use case, early steps the agency took to address the challenge and support the use case, and the business results. Although the full text of these case studies will be posted at the TechAmerica Foundation Website, some are summarized below.

Agency/Organization/Company | Big Data Project Name | Underpinning Technologies | Big Data Metrics | Initial Big Data Entry Point | Public/User Benefits

Case Studies and Use Cases:
National Archives and Records Administration (NARA) | Electronic Records Archive | Metadata, Submission, Access, Repository, Search and Taxonomy applications for storage and archival systems | Petabytes, Terabytes/sec, Semi-structured | Warehouse Optimization, Distributed Info Mgt | Provides Electronic Records Archive and Online Public Access systems for US records and documentary heritage
TerraEchos | Perimeter Intrusion Detection | Streams analytic software, predictive analytics | Terabytes/sec | Streaming and Data Analytics | Helps organizations protect and monitor critical infrastructure and secure borders
Royal Institute of Technology of Sweden (KTH) | Traffic Pattern Analysis | Streams analytic software, predictive analytics | Gigabits/sec | Streaming and Data Analytics | Improve traffic in metropolitan areas by decreasing congestion and reducing traffic accident injury rates
Vestas Wind Energy | Wind Turbine Placement & Maintenance | Apache Hadoop | Petabytes | Streaming and Data Analytics | Pinpointing the optimal location for wind turbines to maximize power generation and reduce energy cost
University of Ontario (UOIT) | Medical Monitoring | Streams analytic software, predictive analytics, supporting Relational Database | Petabytes | Streaming and Data Analytics | Detecting infections in premature infants up to 24 hours before they exhibit symptoms
National Aeronautics and Space Administration (NASA) | Human Space Flight Imagery | Metadata, Archival, Search and Taxonomy applications for tape library systems, GOTS | Petabytes, Terabytes/sec, Semi-structured | Warehouse Optimization | Provide industry and the public with some of the most iconic and historic human spaceflight imagery for scientific discovery, education and entertainment
AM Biotechnologies (AM Biotech) | DNA Sequence Analysis for Creating Aptamers | Cloud-based HPC genomic applications and transportable data files | Gigabytes, 10^7 DNA sequences compared | Streaming Data & Analytics, Warehouse Optimization, Distributed Info Mgt | Creation of unique aptamer compounds to develop improved therapeutics for many medical conditions and diseases
National Oceanic and Atmospheric Administration (NOAA) | National Weather Service | HPC modeling, data from satellites, ships, aircraft and deployed sensors | Petabytes, Terabytes/sec, Semi-structured, ExaFLOPS, PetaFLOPS | Streaming Data & Analytics, Warehouse Optimization, Distributed Info Mgt | Provide weather, water, and climate data, forecasts and warnings for the protection of life and property and enhancement of the national economy
Internal Revenue Service (IRS) | Compliance Data Warehouse | Columnar database architecture, multiple analytics applications, descriptive, exploratory, and predictive analysis | Petabytes | Streaming Data & Analytics, Warehouse Optimization, Distributed Info Mgt | Provide America's taxpayers top quality service by helping them to understand and meet their tax responsibilities and enforce the law with integrity and fairness to all
Centers for Medicare & Medicaid Services (CMS) | Medical Records Analytics | Columnar and NoSQL databases, Hadoop being looked at, EHR on the front end, with legacy structured database systems (including DB2 and COBOL) | Petabytes, Terabytes/day | Streaming Data & Analytics, Warehouse Optimization, Distributed Info Mgt | Protect the health of all Americans and ensure compliant processing of insurance claims
 

Demystifying Big Data — A Practical Guide to Transforming the Business of Government

Source: http://federalbriefings.1105cms01.co.../overview.aspx

See: http://semanticommunity.info/AOL_Government/Big_Data_Part_II

“Big Data” is top-of-mind for technologists today—driven by the recognition that we are only just beginning to see the accumulation of data generated by our digitally-dependent and networked lives. Given the increasingly mobile and real-time nature of information collection and sharing, in formats from audio to video to instant messaging and more, there is no doubt the sheer volume of data generated by users and organizations will quickly surpass most near-term forecasts.

How should Federal Government organizations prepare for and plan to face the big data wave of 2013 and beyond? First, by becoming informed about the nature and scope of the topic, and second, by hearing more about how agencies already are taking steps to apply data science to optimizing their operations and results.

This is your opportunity to join the dialogue--plan to join the Tech America Foundation and the 1105 Government Information Group in mid-December to review how big data is affecting government organizations today and will continue to do so into the future. The program will feature the findings of the Tech America Foundation 2012 Big Data Commission Report as well as discussion of:

  • How big data trends will impact the way agencies collect, store, manage, and protect their information assets
  • How government enterprises can get started to leverage the data they have and how to “scale up”
  • Why user demands for more information--anytime and anywhere—will continue to transform how agencies deliver services and data
  • The role of data analytics for using information on-hand to improve decision support
  • How to design trusted data sharing methods for collaborative environments and valuable results
  • Techniques for monitoring and protecting data stores to prevent fraud, breach, and compromise
  • Strategies for ensuring privacy protection using available technology and sound compliance policy
  • Which technologies enable big data applications, and the impact of wireless devices and sensors on big data accumulation

NIST Cloud Computing AND Big Data Forum & Workshop, January 15-17, 2013

Source: http://www.nist.gov/itl/cloud/cloudbdworkshop.cfm and http://www.nist.gov/itl/cloud/upload/Cloud-Computing-and-Big-Data-Forum-and-Workshop_agenda.pdf (PDF)

See: http://semanticommunity.info/Emerging_Technology_SIG_Big_Data_Committee/Cloud_Computing_AND_Big_Data_Forum_and_Workshop_January_15-17_2013

The NIST Cloud and Big Data Workshop will bring together leaders and innovators from industry, academia and government in an interactive format that combines keynote presentations, panel discussions, interactive breakout sessions and open discussion. The conference will be led off by Pat Gallagher, Under Secretary of Commerce for Standards and Technology and Director, NIST, and Steven VanRoekel, the Chief Information Officer of the United States. 

The second and third days of the workshop focus on the intersection of Cloud and Big Data. Fully realizing the power of Big Data depends on meeting the unprecedented demands on storage, integration, and analysis presented by massive data sets--demands that Cloud innovators are working to meet today. The workshop will explore possibilities for harmonizing Cloud and Big Data measurement, benchmarking, and standards in ways that bring the power of these two approaches to bear in driving progress and prosperity.

Whether you’re interested in Big Data as a service, analytics and visualization, operational infrastructure, or other areas, the workshop will give you the opportunity to share your ideas and explore those of others on key questions such as:  

  • What benchmarks are needed to evaluate different Cloud architectures and implementations for Big Data applications? 
  • How do the ways in which Big Data are structured and measured today either facilitate or impede Cloud solutions? What changes are needed?
  • What standards and metrics could be harmonized to create synergy between Cloud and Big Data?  What standards are missing?
  • How are Big Data analytics changing/influencing Cloud and vice-versa?
  • What are the needs for interoperability with regard to Big Data in the Cloud?
  • What is your most pressing question regarding Big Data in the cloud?

Big Data Exchange Meeting, February 26, 2013

8-10 a.m.
The City Club of Washington at Columbia Square
Washington, D.C.
Stay tuned for more information. This meeting will be complimentary for government attendees. Questions? Contact Emily Smalling at esmalling@meritalk.com.

The Big Data Challenge

Source: http://meritalk.com/blog.php?user=BigDataExchange&blogentry_id=3340

Posted: 10/17/2012

On October 3, 2012, NASA and a couple other government agencies kicked off The Big Data Challenge series designed to find innovative solutions to the government’s big data problems. The first contest is all about making disparate, incompatible data sets usable and actually valuable across agencies.

Here is the first contest:

“How can we make heterogeneous (dissimilar and incompatible) data sets homogeneous (uniformly accessible, compatible, able to be grouped and/or matched) so usable information can be extracted? How can information then be converted into real knowledge that can inform critical decisions and solve societal challenges?”

Is creating a contest the right way to get the government to start thinking about how to control big data?  What would be your submission?  Check out the contest page here and share your thoughts below. MY NOTE: See Below.

Big Data Gap

Source: http://meritalk.com/big-data-report-register.php

Despite the buzz, big data is a new concept that most mid-level decision makers are unfamiliar with. Though Federal thought leaders are extolling the impact of big data, the truth is that to capture the full potential of big data, agencies need the following:

  • The ability to easily store and access data
  • Robust computational power and software to manipulate the data
  • Trained personnel to analyze the data

The Big Data Gap report captures insights from big data leaders. Download the Big Data Gap to find out:

  • How leaders define big data and the benefits it can bring to Federal government
  • How agencies are using their data today and how they would like to use it in the future
  • What steps agencies need to take to get big data programs on track to realize savings and improve service to citizens

Click here to view the press release. MY NOTE: See Below.

Click here to view the media coverage. MY NOTE: See Below.

Government Agencies Adding A Petabyte of New Data in Next Two Years; Making Little Progress Yet In Big Data

Source: http://meritalk.com/pdfs/big-data/MeriTalk_Big_Data_Gap_Press_Release.pdf (PDF)

 
IT professionals estimate that they have less than half of the storage, computing, and personnel resources necessary to leverage big data for efficiency gains, better decision making
 
Alexandria, Va., May 7, 2012 – Government data is growing and agencies are looking to leverage big data to support government mission outcomes. However, most agencies lack the data storage/access, computational power, and personnel they need to take advantage of the big data opportunity, according to a new study by MeriTalk sponsored by NetApp. The new report, “The Big Data Gap,” reveals that Federal IT professionals believe big data can improve government but that the promise of big data is locked away in unused or inaccessible data.
 
President Obama’s recently announced Big Data Research and Development Initiative highlights the big data promise – that improving our ability to extract knowledge and insights from large and complex collections of data will help government solve problems. Federal IT professionals agree. According to the Big Data Gap report, Federal IT professionals say improving overall agency efficiency is the top advantage of big data (59 percent) followed by improving speed/accuracy of decisions (51 percent) and the ability to forecast (30 percent).
 
While Federal IT professionals agree there are many benefits to big data, the technology and applications needed to successfully leverage big data are still emerging. Sixty percent of civilian agencies and 42 percent of Department of Defense/intelligence agencies say they are just now learning about big data and how it can work for their agency. While the promise of big data is strong, most agencies are still years away from using it. Just 60 percent of IT professionals say their agency is analyzing the data it collects and less than half (40 percent) are using data to make strategic decisions.
 
On average, Federal IT professionals report that it will take their agencies three years to take full advantage of big data.
 
Federal IT professionals report that the amount of government data will continue to grow. Eighty-seven percent of Federal IT professionals say their agency’s stored data has grown in the last two years. The majority of Federal IT professionals – 96 percent – expect their agency’s stored data to grow in the next two years by an average of 64 percent.
 
“Government has a gold mine of data at its fingertips,” said Mark Weber, president of U.S. Public Sector for NetApp. “The key is turning that data into high-quality information that can increase efficiencies and inform decisions. Agencies need to look at big data solutions that can help them efficiently process, analyze, manage, and access data, enabling them to more effectively execute their missions.”
 
While agencies have a huge amount of data – that continues to grow – in many agencies the data is locked away. Nearly a third of agency data is unstructured and therefore substantially less useful. The amount of unstructured data is growing – 64 percent of Federal IT professionals report that the amount of unstructured data they store has increased in the past two years. Data ownership further complicates agencies’ ability to use big data. Agencies are unclear on who owns the data, with 42 percent reporting IT departments own the data, 28 percent reporting that the data belongs to the department that generates it, and 12 percent reporting the data belongs to the C-level.
 
Federal IT professionals also identify a gap between the big data possibility and reality, with nine out of 10 reporting challenges on the path to harnessing big data. Agencies estimate that they have just 49 percent of the data content storage/access, 46 percent of the bandwidth/computational power, and 44 percent of the personnel they need to leverage big data and drive mission results. In addition, 57 percent say they have at least one dataset that has grown too big to work with using their current management tools and/or infrastructure.
 
Despite the challenges, agencies are working to harness big data. Sixty-four percent of IT professionals say their agency’s data management system can be easily expanded/upgraded on demand. However, they estimate that it would take an average of 10 months to double their short- to medium-term capacity. In addition, some agencies are taking steps to improve their ability to manage and make decisions with big data. Top tactics include investing in IT infrastructure to optimize data storage (39 percent), training IT professionals to manage/analyze big data (33 percent), and improving the security of stored data (31 percent).
 
“The Big Data Gap” is based on a survey of 151 Federal government CIOs and IT managers in March 2012. The report has a margin of error of +/- 7.95 percent at a 95 percent confidence level. To download the full study, please visit http://www.meritalk.com/bigdatagap.
 
About MeriTalk
The voice of tomorrow’s government today, MeriTalk is an online community and go-to resource for government IT. Focusing on government’s hot-button issues, MeriTalk hosts Data Center Exchange, Cyber Security Exchange, and Cloud Exchange – platforms dedicated to supporting public-private dialogue and collaboration. MeriTalk connects with an audience of 85,000 government community contacts. For more information, visit http://www.meritalk.com or follow us on Twitter, @meritalk.

The Big Data Gap: The 2012 NetApp Study – Media Results

Source: http://meritalk.com/pdfs/big-data/2012_The_Big_Data_Gap_Media_Coverage_062212.pdf (PDF)

 
As of July 6, 2012

Coverage to Date

Civ Source
By Bailey McCann
May 29, 2012
 
On the Frontlines
June/July 2012
 
Government Health IT
By Katie Spies
June 11, 2012
 
Cloud Times
By Saroj Kar
June 11, 2012
 
PC Advisor
By Thor Olavsrud
June 11, 2012
 
CIO
By Thor Olavsrud
June 11, 2012
 
Fierce CIO
By Caron Carlson
June 13, 2012
 
Executive Gov
By Katie Noland
May 23, 2012
 
eWeek
By Nathan Eddy
May 10, 2012
 
Federal News Radio
By Michael O’Connell
May 16, 2012
 
Baseline
By Jennifer Lawinski
May 16, 2012
 
Experian QAS
By Experian QAS Staff
May 10, 2012
 
Read Write Hack
By Scott M. Fulton
May 10, 2012
 
O’Reilly Radar
By Audrey Watters
May 10, 2012
 
Government Health
By Tom Sullivan
May 10, 2012
 
TechZone 360
By Peter Bernstein
May 10, 2012
 
AOL Government
By Kathleen Hickey
May 10, 2012
 
Techno Capital
By Techno Capital Staff
May 8, 2012
 
Tech Investor News
By CIO Insight Staff
May 8, 2012
 
DataVersity
By Angela Guess
May 8, 2012
 
Baseline
By Baseline Staff
May 8, 2012
 
eWeek
By Nathan Eddy
May 8, 2012
 
CIO Insight
By CIO Insight Staff
May 8, 2012
 
Silicon Angle
By Maria Deutscher
May 8, 2012
 
WTN News
By Nathan Eddy
May 7, 2012
 
Channel Biz
By Tamlin Magee
May 7, 2012
 
Federal Computer Week
By Camille Tuutti
May 7, 2012
 
Information Management
By Jim Ericson
May 7, 2012
 
Potomac Tech Wire
By Potomac Tech Wire Staff
May 7, 2012

Coverage to Date – Press Release Pick Ups

The Data Center Journal
Enhanced Online News
Virtual Strategy Magazine
IT Briefing
Smart Grid, TMC Net
Yahoo Finance
Reuters
TD Ameritrade
Sympatico Finance
Canada Health
Benzinga
Yahoo Finance Canada
Morning Star
Financial Content
Newsblaze
Infrastructure
Canada
iStock Analyst
TMC Net Green
Sun Herald
TMC Net Government

The Big Data Gap: Report

Source: http://meritalk.com/big-data-report-register.php (PDF)

Title Slide

TheBigDataGapSlide1.png

Introduction

TheBigDataGapSlide2.png

Executive Summary

TheBigDataGapSlide3.png

Big Data = Better Government

TheBigDataGapSlide4.png

Not There Yet

TheBigDataGapSlide5.png

Data Disconnect

TheBigDataGapSlide6.png

Data Deluge

TheBigDataGapSlide7.png

Data on the Loose

TheBigDataGapSlide8.png

Management Hurdles

TheBigDataGapSlide9.png

Vision –Reality: The Big Data Gap

TheBigDataGapSlide10.png

Unmanageable Data

TheBigDataGapSlide11.png

Data On Demand

TheBigDataGapSlide12.png

Driving Data Management Forward

TheBigDataGapSlide13.png

Recommendations

TheBigDataGapSlide14.png

Methodology and Demographics

TheBigDataGapSlide15.png

Thank You

TheBigDataGapSlide16.png

Welcome to the NITRD Big Data Challenge Series!

Source: http://community.topcoder.com/coeci/nitrd/

The Big Data Challenge is an effort by the U.S. government to conceptualize new and novel approaches to extracting value from “Big Data” information sets residing in various agency silos and delivering impactful value while remaining consistent with individual agency missions. This data comes from the fields of health, energy and Earth science. Competitors will be tasked with imagining analytical techniques, and describing how they may be shared as universal, cross-agency solutions that transcend the limitations of individual agencies.

“Big Data is characterized not only by the enormous volume or the velocity of its generation but also by the heterogeneity, diversity and complexity of the data,” said Suzi Iacono, co-chair of the interagency Big Data Senior Steering Group, a part of the Networking and Information Technology Research and Development program. “There are enormous opportunities to extract knowledge from these large-scale diverse data sets, and to provide powerful new approaches to drive discovery and decision-making, and to make increasingly accurate predictions. We’re excited to see what this competition will yield and how it will guide us in funding the next round of big data science and engineering.”

In this contest series, we will ask competitors to consider big data ideas from several different perspectives.  We begin with a contest that will award the best ideas for tools and techniques for homogenizing disparate data sources and topics.  In later contests, we’ll ask for ideas that are more focused in the domains of Health, Energy and Earth Science.  Check out the contest specifications, and good luck!

NITRD Review Board

The NITRD Big Data Contest Series will be supported by a review board assembled from experts in industry and academia.  Once submissions have been screened for completeness, they will be forwarded to the board for final review.  You can read about the Board by clicking here.

About the Contests

There are two ways to compete on the first contest.  If you prefer to work with TopCoder Studio, you can register for the contest by clicking the Studio link below.  If your preference is for the /tc Community, you can register by clicking the “Software” link below.  Both choices are embodiments of the same challenge.  If you are uncertain which choice is best for you, or you are new to TopCoder, we suggest the Studio link.

Click here to compete on TopCoder Studio. MY NOTE: Requires Log-In.

Click here to compete on /tc. MY NOTE: See Below.

Big Data Challenge - Conceptualization - Idea Generation

  • 1st Place $750
  • 2nd Place $375
  • Reliability Bonus $150
  • DR Points N/A

Contest Overview

Detailed Requirements

The Big Data Challenge is an effort by the U.S. government to find new and inventive ways to use the huge and diverse sets of data maintained by numerous government agencies. There is a lot of data out there, collected for many different purposes and in many different formats that make interoperation very challenging. How can we make heterogeneous (dissimilar and incompatible) data sets homogeneous (uniformly accessible, compatible, able to be grouped and/or matched) so usable information can be extracted?  How can information then be converted into real knowledge that can inform critical decisions and solve societal challenges? Those are the questions we'd like you to help us answer. We're looking for your ideas about how to coordinate data sets drawn from multiple domains, and about what end uses we should be working toward.
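
As a purely illustrative sketch of what "homogenizing" could mean in practice, the short Python example below maps two hypothetical agency record formats onto a single common schema so the records can be grouped and matched; every field name, agency, and mapping rule here is invented and is not part of the contest.

    # Minimal sketch: normalize two hypothetical, incompatible agency record
    # formats into one common schema so records can be grouped and matched.
    # All field names and records here are invented for illustration.
    from datetime import datetime

    def from_agency_a(rec):
        # Agency A (hypothetical) reports dates as MM/DD/YYYY and sites by name.
        return {
            "site": rec["site_name"].strip().lower(),
            "date": datetime.strptime(rec["obs_date"], "%m/%d/%Y").date().isoformat(),
            "value": float(rec["reading"]),
        }

    def from_agency_b(rec):
        # Agency B (hypothetical) reports ISO dates and values as strings with units.
        return {
            "site": rec["location"].strip().lower(),
            "date": rec["date"],
            "value": float(rec["measurement"].split()[0]),
        }

    records = [from_agency_a({"site_name": "Denver", "obs_date": "10/03/2012", "reading": "41.2"}),
               from_agency_b({"location": "denver", "date": "2012-10-03", "measurement": "39.8 ppm"})]

    # Once homogenized, records from both sources can be grouped by site and date.
    by_key = {}
    for r in records:
        by_key.setdefault((r["site"], r["date"]), []).append(r["value"])
    print(by_key)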

Please note that regardless of what is displayed elsewhere on this page, we will be paying out $750 for each of the top three ideas.

1st place: $750
2nd place: $750
3rd place: $750

We will also pay a 30% bonus to prize-winning ideas that make use of streaming data, as noted on the contest wiki page.

Technologies

N/A

Final Submission Guidelines

Please see the contest wiki page for complete details.

Eligibility

You must be a TopCoder member, at least 18 years of age, meeting all of the membership requirements. In addition, you must fit into one of the following categories.

If you reside in the United States, you must be either:
  • A US Citizen
  • A Lawful Permanent Resident of the US
  • A temporary resident, asylee, or refugee of the U.S., or the holder of a lawfully issued work authorization card permitting unrestricted employment in the U.S.
If you do not reside in the United States:
  • You must be authorized to perform services as an independent contractor.
    (Note: In most cases you will not need to do anything to become authorized)

 

Results

  • Registrants: 56 
  • Submissions: 5
  • Submission %: 8.93%
  • Passed Screening: 5
  • Passed %: 100.00%
  • Average Initial Score: 28.00
  • Average Final Score: 28.00
 
Handle        Date Registered   Date Submitted   Screening Score   Initial/Final Score   Points
d_jash        10.05.2012        10.15.2012       100.00            100.00 / 100.00       N/A
poundinc      10.05.2012        10.15.2012       100.00            10.00 / 10.00         N/A
mostafaizz    10.06.2012        10.14.2012       100.00            10.00 / 10.00         N/A
manish_mca7   10.13.2012        10.14.2012       100.00            10.00 / 10.00         N/A
milton93      10.08.2012        10.15.2012       100.00            10.00 / 10.00         N/A

The NITRD Big Data Challenge Review Board

Source: http://community.topcoder.com/coeci/nitrd/judges/

To support this contest series, NITRD has assembled a review board of industry and academic professionals.  Once submissions have been screened, they will be forwarded to this board for a final review.  Read about the board members below.

 

Robert W. Bectel serves as Chief Technology Officer and Senior Policy Advisor for the Office of Energy Efficiency and Renewable Energy (EERE), where he helps accelerate the commercialization of new energy solutions and seeks to establish a uniform, efficient, agile, and user-friendly IT infrastructure which facilitates the performance of every employee’s work and the achievement of EERE’s mission.  To do this, he brings a unique passion, focus and expertise on using cutting-edge technologies to enable the development of low cost and easily accessible distributable content, mobile applications, robust software and interactive solutions – all with the goal of helping transform and expedite market acceptance, transparency and efficiency.  Prior to his transition to government, Rob gained more than 16 years of experience in both the non-profit and commercial sectors. He built and managed the first vertical business network for pharmacists, http://ww.pharmacistelink.com, for the National Community Pharmacists Association, and led its Committee on Innovation and Technology.

Prior to NCPA, Rob directed the development of the USDA consumer website and was the marketing director for ChainDrugStore.net. As the portal manager for the TruSecure Corporation, he performed interactive outreach activities to customers and the public on matters of information and network security. Rob has co-authored several publications and textbooks on long-term care systems in the United States and Europe.

 

Austin L. Brown, Ph.D. is a senior analyst in the Washington, DC office of the National Renewable Energy Laboratory (NREL). His work focuses on clean transportation, including efficient and electrified vehicles, renewable fuels, and transportation system interactions with the built environment. He also moonlights as Deputy Chief Technology Officer for the Office of Energy Efficiency and Renewable Energy in the U.S. Department of Energy, specializing in energy analysis, tools, and opening up data sets for innovation.

Austin was a scientist in his previous life. He has a B.S. in Physics from Harvey Mudd College and received his Ph.D. in Biophysics from Stanford University. With this scientific training, he transitioned to Washington to connect science to policy decisions, especially federal clean energy research funding.

Austin’s primary career interest is to see the United States begin on a pathway that leads towards a future where energy is clean, sustainable, affordable, and reliable domestically and worldwide. This overall goal is at least partially selfish as it will ensure he can continue doing the things he loves – appreciating the outdoors, SCUBA diving, skiing – and help save these invaluable resources for future generations.

 

Will Barkis is the Project Lead for the Mozilla Ignite Challenge, a partnership with the National Science Foundation to create apps to change the world on next-gen networks. He is a co-founder of Bill-Doctor.com and served as technology policy fellow at NSF. Will has a PhD in Neuroscience from the University of California, San Diego.

 

Ian J. Kalin is passionate about energy and empowering people through data. Ian started his professional career as a Counter-Terrorism Officer for the US Navy, later serving as a Nuclear Engineer onboard the USS Ronald Reagan. After leaving the Navy, Ian joined a rising company called PowerAdvocate, which delivers market intelligence solutions to the electric and gas sectors. His entrepreneurial work led to significant cost savings for utility companies and their customers. Ian has a BS in International Politics from Georgetown and an MA in Engineering Management from Old Dominion. He lives in San Francisco with his wife, Amanda, and is a musician in his spare time.

 

Ryan McKeel is a technologist with a passion for web-based entrepreneurship and data visualization.  He started his first business at the age of 14 creating database-driven websites, and has since worked with the Air Force Research Laboratory, the DARPA COORDINATORs program and the National Renewable Energy Laboratory on advanced data sharing and visualization tools.  Ryan is one of the lead developers on the Open Energy Information platform (OpenEI.org), a collaborative website using linked open data to provide simple access to international energy information and data.  He holds a Bachelor of Science degree in IT and Entrepreneurship from Rensselaer Polytechnic Institute.  Ryan lives in Denver, Colorado with his wife and three young children; he is also a classical and jazz pianist who has had the honor of performing with Colorado symphony orchestras.

Big Data Buzzwords From A to Z

By Rick Whiting, CRN 4:00 PM EST Wed. Nov. 28, 2012

Introduction

Big data is one of the, well, biggest trends in IT today, and it has spawned a whole new generation of technology to handle it. And, with new technologies come new buzzwords: acronyms, technical terms, product names, etc.
 
Even the phrase "big data" itself can be confusing. Many think of "lots of data" when they hear it, but big data is much more than just data volume.
 
Here, in alphabetical order, are some of the buzzwords we think you need to be familiar with.
 
An acronym for Atomicity, Consistency, Isolation and Durability, ACID is a set of requirements or properties that, when adhered to, ensure the data integrity of database transactions during processing. While ACID has been around for a while, the explosion in transaction data volumes has focused more attention on the need for meeting ACID provisions when working with big data.
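
For a concrete, minimal illustration of the atomicity piece of ACID, the Python sketch below uses the standard-library sqlite3 module: the two account updates either both commit or, on any error, both roll back. The table and balances are invented for the example.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100.0), ("bob", 25.0)])
    conn.commit()

    try:
        # Both updates succeed or fail together (atomicity).
        conn.execute("UPDATE accounts SET balance = balance - 40 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 40 WHERE name = 'bob'")
        conn.commit()
    except sqlite3.Error:
        conn.rollback()   # leave the data exactly as it was before the transaction

    print(conn.execute("SELECT name, balance FROM accounts").fetchall())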
 
IT systems today pump out data that's "big" on volume, velocity and variety.
 
Volume: IDC estimates that the volume of world information will reach 2.7 zettabytes this year (that's 2.7 billion terabytes) and that's doubling every two years.
 
Velocity: It's not just the amount of data that's causing headaches for IT managers, but the increasingly rapid speed at which data is flowing from financial systems, retail systems, websites, sensors, RFID chips and social networks like Facebook, Twitter, etc.
 
Variety: Going back five, maybe 10 years, IT mostly dealt with alphanumeric data that was easy to store in neat rows and columns in relational databases. No longer. Today, unstructured data, such as Tweets and Facebook posts, documents, Web content and so on, is all part of the big data mix.
 
Some new-generation databases (such as the open-source Cassandra and HP's Vertica) are designed to store data by column rather than by row as traditional SQL databases do. Their design provides faster disk access, improving their performance when handling big data. Columnar databases are especially popular for data-intensive business analytics applications.
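
A toy Python sketch of the difference between the two layouts, with invented records: the same table stored row-wise and column-wise. Real columnar engines add compression and careful disk layout, but the analytic advantage, scanning only the one column a query needs, shows up even at this scale.

    # Row-oriented: each record kept together (good for transactional lookups).
    rows = [
        {"id": 1, "agency": "EPA",  "spend": 12.5},
        {"id": 2, "agency": "NASA", "spend": 40.0},
        {"id": 3, "agency": "EPA",  "spend": 7.5},
    ]

    # Column-oriented: each column kept together (good for analytic scans).
    columns = {
        "id":     [1, 2, 3],
        "agency": ["EPA", "NASA", "EPA"],
        "spend":  [12.5, 40.0, 7.5],
    }

    # An aggregate query only has to touch the single column it needs.
    print(sum(columns["spend"]))   # 60.0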

Data Warehousing

The concept of data warehousing, copying data from multiple operational IT systems into a secondary, off-line database for business analytics applications, has been around for about 25 years.
 
But as data volumes explode, data warehouse systems are rapidly changing. They need to store more data -- and more kinds of data -- making their management a challenge. And where 10 or 20 years ago data might have been copied into a data warehouse system on a weekly or monthly basis, data warehouses today are refreshed far more frequently with some even updated in real time.

ETL

Extract, transform and load (ETL) software is used when moving data from one database, such as one supporting a banking application transaction processing system, to another, such as a data warehouse system used for business analytics. Data often needs to be reformatted and cleaned up when being transferred from one database to another.
 
The performance demands on ETL tools have increased as data volumes have grown exponentially and data processing speeds have accelerated.
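
A minimal extract-transform-load sketch in Python, assuming a hypothetical transactions.csv export: it extracts the raw rows, reformats and cleans them, and loads them into a SQLite table standing in for the data warehouse.

    import csv
    import sqlite3

    # Extract: read raw rows from an operational export (hypothetical file name).
    with open("transactions.csv", newline="") as f:
        raw = list(csv.DictReader(f))

    # Transform: clean and reformat fields on the way to the warehouse schema.
    clean = [
        (row["txn_id"], row["date"][:10], float(row["amount"].replace("$", "")))
        for row in raw
        if row.get("amount")            # drop records with no amount at all
    ]

    # Load: write the cleaned rows into the analytic database.
    warehouse = sqlite3.connect("warehouse.db")
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS transactions (txn_id TEXT, txn_date TEXT, amount REAL)")
    warehouse.executemany("INSERT INTO transactions VALUES (?, ?, ?)", clean)
    warehouse.commit()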

Flume

Flume, a technology in the Apache Hadoop family (others include HBase, Hive, Oozie, Pig and Whirr), is a framework for populating Hadoop with data. The technology uses agents scattered across application servers, Web servers, mobile devices and other systems to collect data and transfer it to a Hadoop system.
 
A business, for example, could use Apache Flume running on a Web server to collect data from Twitter posts for analysis.

Geospatial Analysis

One trend fueling big data is the increasing volume of geospatial data being generated and collected by IT systems today. A picture may be worth 1,000 words, so it's no surprise the growing number of maps, charts, photographs and other geographic-based content is a major driver of today's big data explosion.
 
Geospatial analysis is a specific form of data visualization (see "V" for visualization) that overlays data on geographical maps to help users better understand the results of big data analysis.

Hadoop

Hadoop is an open-source platform for developing distributed, data-intensive applications. It's controlled by the Apache Software Foundation.
 
Hadoop was created by Yahoo developer Doug Cutting, who based it on Google Labs' MapReduce concept and named it after his infant son's toy elephant.
 
Bonus "H" entries, or HBase, is a non-relational database developed as part of the Hadoop project. The Hadoop Distributed Filesystem (HDFS) is a key component of Hadoop. And, Hive is a data warehouse system built on Hadoop.

In-Memory Database

Computers generally retrieve data from disk drives as they process transactions or perform queries. But, that can be too slow when IT systems are working with big data.
 
In-memory database systems utilize a computer's main memory to store frequently used data, greatly reducing processing times. In-memory database products include SAP HANA and the Oracle TimesTen In-Memory Database.
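
To get a feel for the idea (this is not a sketch of HANA or TimesTen themselves), Python's standard-library sqlite3 module can keep an entire database in RAM; everything below lives in memory and disappears when the process exits.

    import sqlite3

    # ":memory:" keeps the whole database in RAM rather than on disk.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE metrics (name TEXT, value REAL)")
    db.executemany("INSERT INTO metrics VALUES (?, ?)",
                   [("latency_ms", 12.0), ("latency_ms", 9.5), ("errors", 3.0)])

    print(db.execute(
        "SELECT name, AVG(value) FROM metrics GROUP BY name").fetchall())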

Java

Java is a programming language developed at Sun Microsystems and released in 1995. Hadoop and a number of other big data technologies were built using Java, and it remains a dominant development technology in the big data world.

Kafka

Kafka is a high-throughput, distributed messaging system originally developed at LinkedIn to manage the service's activity stream (data about a Website's usage) and operational data processing pipeline (about the performance of server components).
 
Kafka is effective for processing large volumes of streaming data -- a key issue in many big data computing environments. Storm, developed by Twitter, is another stream-processing technology that's catching on.
 
The Apache Software Foundation has taken Kafka on as an open-source project. No jokes about buggy software, please ...
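
A hedged sketch using the third-party kafka-python client; it assumes a broker is running at localhost:9092, and the topic name is invented. The producer publishes a few messages and the consumer reads them back from the beginning of the stream.

    # Requires: pip install kafka-python, and a Kafka broker at localhost:9092.
    from kafka import KafkaProducer, KafkaConsumer

    # Publish a few messages to a (hypothetical) topic.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    for i in range(3):
        producer.send("agency-events", f"event {i}".encode("utf-8"))
    producer.flush()

    # Read them back; in practice consumers run continuously on streaming data.
    consumer = KafkaConsumer("agency-events",
                             bootstrap_servers="localhost:9092",
                             auto_offset_reset="earliest",
                             consumer_timeout_ms=5000)
    for message in consumer:
        print(message.value.decode("utf-8"))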

Latency

Latency is the delay when data is being delivered from one point to another, or the time it takes for a system, such as an application, to respond to another.
 
While the term isn't new, you're hearing it more often today as data volumes grow and IT systems struggle to keep up. "Low latency" is good; "high latency" is bad.

Map/reduce

Map/reduce is a way of breaking up a complex problem into smaller chunks, distributing them across many computers and then reassembling them into a single answer.
 
Google's search system utilizes map/reduce concepts and the company has a framework with the brand name MapReduce.
 
In 2004, Google released a white paper describing its use of map/reduce. Doug Cutting recognized its potential and developed the first release of Hadoop that also incorporates map/reduce concepts.
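
The classic word-count example, sketched in plain Python to make the two phases visible: a map step that emits (word, 1) pairs for each piece of input and a reduce step that sums the counts per word. Hadoop runs the same pattern, but spreads the map and reduce work across many machines.

    from collections import defaultdict

    documents = ["big data is big", "data drives decisions"]

    # Map: each chunk of input is turned into (key, value) pairs independently,
    # so this step can run in parallel across many machines.
    mapped = []
    for doc in documents:
        for word in doc.split():
            mapped.append((word, 1))

    # Shuffle: group all values by key.
    grouped = defaultdict(list)
    for word, count in mapped:
        grouped[word].append(count)

    # Reduce: combine the values for each key into a single answer.
    counts = {word: sum(values) for word, values in grouped.items()}
    print(counts)   # {'big': 2, 'data': 2, 'is': 1, 'drives': 1, 'decisions': 1}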

NoSQL Databases

Most mainstream databases (such as the Oracle Database and Microsoft SQL Server) are based on a relational architecture and use structured query language (SQL) for development and data management.
 
But a new generation of database systems dubbed "NoSQL" (which some now say stands for "Not only SQL") is based on architectures that proponents argue are better for handling big data.
 
Some NoSQL databases are designed for scalability and flexibility whereas others are more efficient at handling documents and other unstructured data. Examples include Hadoop/HBase, Cassandra, MongoDB and CouchDB, while some big vendors like Oracle have launched their own NoSQL products.
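
A hedged sketch against MongoDB using the pymongo driver; it assumes a local server on the default port, and the database, collection, and document fields are invented. Note that, unlike a relational table, no schema has to be declared before inserting documents of different shapes.

    # Requires: pip install pymongo, and a MongoDB server on localhost:27017.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    events = client["demo_db"]["events"]          # hypothetical database/collection

    # Documents need no predeclared schema and can vary in structure.
    events.insert_one({"type": "tweet", "text": "big data!", "tags": ["gov", "data"]})
    events.insert_one({"type": "sensor", "reading": 42.1, "unit": "ppm"})

    for doc in events.find({"type": "tweet"}):
        print(doc["text"])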

Oozie

Apache Oozie is an open-source workflow engine that's used to help manage processing jobs for Hadoop. Using Oozie, a series of jobs can be defined in multiple languages, such as Pig and MapReduce, and then linked to each other. That allows a programmer to launch a data analysis query once a job to collect data from an operational application has finished, for example.

Pig

Pig, another Apache Software Foundation project, is a platform for analyzing huge data sets. At its core, it's a programming language for developing parallel computation queries that run on Hadoop.

Quantitative Data Analysis

Quantitative data analysis is the use of complex mathematical or statistical modeling to explain financial and business behavior or even predict future behavior.
 
With the exploding volumes of data being collected today, quantitative data analysis has become more complex. But more data also holds the promise of more data analysis opportunities for companies that know how to use it to gain better visibility and insights into their businesses and spot market trends.
 
One problem: There's a serious shortage of people with these kinds of analytical skills. Consulting firm McKinsey says there is a need for 1.5 million additional analysts and managers with big data analysis skills in the U.S.
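
As a tiny, hedged example of the kind of modeling involved, the sketch below fits a least-squares trend line to a short invented series with NumPy and uses it to project the next value; real quantitative analysis layers far more sophisticated models on the same idea.

    # Requires: pip install numpy. The quarterly figures below are invented.
    import numpy as np

    quarters = np.arange(8)                                            # 0..7
    volume   = np.array([2.1, 2.4, 2.8, 3.1, 3.7, 4.0, 4.6, 5.0])      # e.g., TB stored

    # Fit a straight line (degree-1 polynomial) by least squares.
    slope, intercept = np.polyfit(quarters, volume, 1)

    # Use the model to project the next quarter (about 5.36 with these numbers).
    print(round(slope * 8 + intercept, 2))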

Relational Database

Relational database management systems, including IBM's DB2, Microsoft's SQL Server and the Oracle Database, are the most widely used type of database today. Most corporate transaction processing systems run on RDBMSs, from banking applications to retail point-of-sale systems to inventory management applications.
 
But, some argue that relational databases may be unable to keep up with today's exploding volume and variety of data. RDBMSs, for example, were designed with alphanumeric data in mind and aren't as effective when working with unstructured data.

Sharding

As databases become ever larger, they become more difficult to work with. Sharding is a form of database partitioning that breaks a database up into smaller, more easily managed parts. Specifically, a database is partitioned horizontally to separately manage rows in a database table.
 
Sharding allows segments of a huge database to be distributed across multiple servers, improving the overall speed and performance of the database.
 
Bonus "S" entry: Sqoop is an open-source tool for moving data from non-Hadoop sources, such as relational databases, into Hadoop.

Text Analytics

One of the contributors to the big data problem is the increasing amount of text being collected from social media sites like Twitter and Facebook, external news feeds and even within a company for analysis. Because text is unstructured (unlike structured data typically stored in relational databases), mainstream business analytics tools often falter when faced with text.
 
Text analytics uses a range of techniques -- from key word search to statistical analysis to linguistic approaches -- to derive insight from text-based data.
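
The simplest end of that spectrum, key word counting, fits in a few lines of standard-library Python; the sample posts are invented. Statistical and linguistic approaches build on the same first step of turning free text into countable features.

    import re
    from collections import Counter

    posts = [
        "Agency releases new open data portal",
        "Open data helps citizens and agencies alike",
        "Citizens ask for more open data",
    ]

    stopwords = {"and", "for", "new", "more"}
    words = []
    for post in posts:
        words += [w for w in re.findall(r"[a-z]+", post.lower()) if w not in stopwords]

    # The most frequent terms give a crude picture of what the text is about.
    print(Counter(words).most_common(3))   # e.g., [('open', 3), ('data', 3), ('citizens', 2)]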

Unstructured Data

Until recent years, most data was structured, the kind of alphanumeric information (such as financial data from sales transactions) that could be easily stored in a relational database and analyzed by business intelligence tools.
 
But, a big chunk of the 2.7 zettabytes of stored data today is unstructured, such as text-based documents, tweets, photos posted on Flickr, videos posted on YouTube and so on. (Fun fact: Thirty-five hours of content are uploaded to YouTube every minute.)
 
Processing, storing and analyzing all that messy unstructured stuff are often challenges for today's IT systems.

Visualization

As the volume of data grows, it becomes increasingly difficult for people to understand it using static charts and graphs. That's led to the development of a new generation of data visualization and analysis tools that present data in new ways to help people make sense of huge amounts of information.
 
These tools include color-coded heat maps, three-dimensional graphs, animated visualizations that show changes over time and geospatial representations that overlay data on geographical maps. Today's advanced data visualization tools are also more interactive, such as allowing a user to zoom in on a data subset for closer inspection.

Whirr

Apache Whirr is a set of libraries for running big data cloud services. More specifically, it speeds up the development of Hadoop clusters on virtual infrastructure such as Amazon EC2 and Rackspace.

XML

Extensible Markup Language is used to transport and store data (not to be confused with HTML, which is used to display data). With XML, programmers can create common data formats and share both the information and the format through the Web.
 
Because XML documents can be very large and complex, they are often seen as contributing to IT organizations' big data challenges.
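
A small sketch using Python's standard-library xml.etree.ElementTree to read a shared format; the element names and values are invented.

    import xml.etree.ElementTree as ET

    # A tiny, invented XML payload sharing both the data and its structure.
    payload = """
    <datasets>
      <dataset id="d1"><title>Air Quality</title><records>120000</records></dataset>
      <dataset id="d2"><title>Water Use</title><records>54000</records></dataset>
    </datasets>
    """

    root = ET.fromstring(payload)
    for ds in root.findall("dataset"):
        print(ds.get("id"), ds.findtext("title"), int(ds.findtext("records")))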

Yottabyte

A yottabyte is a data storage benchmark that's equal to 1,000 zettabytes. The total amount of data stored worldwide is expected to reach 2.7 zettabytes this year, up 48 percent from 2011, according to an IDC calculation. So we're a long way from reaching the yottabyte threshold -- although with the rate of big data growth, it might come sooner than we think.
 
Just to review, a zettabyte is one sextillion bytes of data. It's equal to 1,000 exabytes, 1 million petabytes and 1 billion terabytes.
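
The unit arithmetic, restated in a few lines of Python to make the jumps concrete:

    terabyte  = 10 ** 12          # bytes
    petabyte  = 10 ** 15
    exabyte   = 10 ** 18
    zettabyte = 10 ** 21
    yottabyte = 10 ** 24

    # One zettabyte restated in smaller units.
    print(zettabyte // exabyte)    # 1,000 exabytes
    print(zettabyte // petabyte)   # 1,000,000 petabytes
    print(zettabyte // terabyte)   # 1,000,000,000 terabytes

    # The estimated 2.7 ZB stored worldwide, expressed in terabytes.
    print(2.7 * zettabyte / terabyte)   # 2.7 billion terabytes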

ZooKeeper

ZooKeeper was created by the Apache Software Foundation to help Hadoop users manage and coordinate Hadoop nodes across a distributed network.
 
Closely integrated with HBase, the database associated with Hadoop, ZooKeeper is a centralized service for maintaining configuration information, naming services, distributed synchronization and other group services. IT managers use it to implement reliable messaging, synchronize process execution and implement redundant services.
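
A hedged sketch using the third-party kazoo client for ZooKeeper; it assumes a server at localhost:2181, and the znode paths and values are invented. The Lock recipe at the end shows the kind of distributed synchronization described above.

    # Requires: pip install kazoo, and a ZooKeeper server at localhost:2181.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="localhost:2181")
    zk.start()

    # Store a small piece of shared configuration at an (invented) znode path.
    zk.ensure_path("/demo/config")
    zk.create("/demo/config/batch_size", b"500", makepath=True)

    value, stat = zk.get("/demo/config/batch_size")
    print(value.decode("utf-8"))

    # Distributed synchronization: only one worker at a time holds this lock.
    lock = zk.Lock("/demo/locks/nightly-job", "worker-1")
    with lock:
        print("running the job while holding the lock")

    zk.stop()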

Chronology For Federal Government

Obama Administration Unveils “Big Data” Initiative: Announces $200 Million In New R&D Investments

Source: http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release.pdf (PDF)

Office of Science and Technology Policy

Executive Office of the President
New Executive Office Building
Washington, DC 20502
 
FOR IMMEDIATE RELEASE
March 29, 2012
Contact: Rick Weiss 202 456-6037 rweiss@ostp.eop.gov
Lisa-Joy Zgorski 703 292-8311 lisajoy@nsf.gov

http://whitehouse.gov/ostp

 
Aiming to make the most of the fast-growing volume of digital data, the Obama Administration today announced a “Big Data Research and Development Initiative.” By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help solve some of the Nation’s most pressing challenges.
 
To launch the initiative, six Federal departments and agencies today announced more than $200 million in new commitments that, together, promise to greatly improve the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data.
 
“In the same way that past Federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security,” said Dr. John P. Holdren, Assistant to the President and Director of the White House Office of Science and Technology Policy.
 
To make the most of this opportunity, the White House Office of Science and Technology Policy (OSTP)—in concert with several Federal departments and agencies—created the Big Data Research and Development Initiative to:
  • Advance state-of-the-art core technologies needed to collect, store, preserve, manage, analyze, and share huge quantities of data;
  • Harness these technologies to accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning; and
  • Expand the workforce needed to develop and use Big Data technologies.
 
Today’s initiative responds to recommendations by the President’s Council of Advisors on Science and Technology, which last year concluded that the Federal Government is under-investing in technologies related to Big Data. In response, OSTP launched a Senior Steering Group on Big Data to coordinate and expand the Government’s investments in this critical area. Today’s announcement describes the first wave of agency commitments to support this initiative, including:
 
National Science Foundation and the National Institutes of Health - Core Techniques and Technologies for Advancing Big Data Science & Engineering: “Big Data” is a new joint solicitation supported by the National Science Foundation (NSF) and the National Institutes of Health (NIH) that will advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large and diverse data sets. This will accelerate scientific discovery and lead to new fields of inquiry that would otherwise not be possible. NIH is particularly interested in imaging, molecular, cellular, electrophysiological, chemical, behavioral, epidemiological, clinical, and other data sets related to health and disease.
 
In addition to its funding of the Big Data solicitation, NSF is also:
  • Encouraging research universities to develop interdisciplinary graduate programs to prepare the next generation of data scientists and engineers;
  • Funding a $10 million project based at the University of California, Berkeley, that will integrate three powerful approaches for turning data into information - machine learning, cloud computing, and crowd sourcing;
  • Providing the first round of grants to support “EarthCube” – a system that will allow geoscientists to access, analyze and share information about our planet;
  • Issuing a $2 million award for a research training group to support training for undergraduates to use graphical and visualization techniques for complex data;
  • Providing $1.4 million in support for a focused research group of statisticians and biologists to tell us about protein structures and biological pathways; and
  • Convening researchers across disciplines to determine how Big Data can transform teaching and learning.
 
Department of Defense – Data to Decisions: The Department of Defense (DoD) is “placing a big bet on big data,” investing $250 million annually (with $60 million available for new research projects) across the Military Departments in a series of programs that will:
  • Harness and utilize massive data in new ways and bring together sensing, perception and decision support to make truly autonomous systems that can maneuver and make decisions on their own.
  • Improve situational awareness to help warfighters and analysts and provide increased support to operations. The Department is seeking a 100-fold increase in the ability of analysts to extract information from texts in any language, and a similar increase in the number of objects, activities, and events that an analyst can observe.
 
To accelerate innovation in Big Data that meets these and other requirements, DoD will  announce a series of open prize competitions over the next several months. In addition, the Defense Advanced Research Projects Agency (DARPA) is beginning the XDATA program, which intends to invest approximately $25 million annually to develop computational techniques and software tools for analyzing large volumes of data, both semi-structured (e.g., tabular, relational, categorical, meta-data) and unstructured (e.g., text documents, message traffic). Central challenges to be addressed include:
  • Developing scalable algorithms for processing imperfect data in distributed data stores; and
  • Creating effective human-computer interaction tools for facilitating rapidly customizable visual reasoning for diverse missions.
 
The XDATA program will support open source software toolkits to enable flexible software development for users to process large volumes of data in timelines commensurate with mission workflows of targeted defense applications.
 
National Institutes of Health – 1,000 Genomes Project Data Available on Cloud: The National Institutes of Health is announcing that the world’s largest set of data on human genetic variation – produced by the international 1000 Genomes Project – is now freely available on the Amazon Web Services (AWS) computing cloud. At 200 terabytes – the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs – the current 1000 Genomes Project data set is a prime example of big data, where data sets become so massive that few researchers have the computing power to make best use of them. AWS is hosting the 1000 Genomes Project as a publicly available data set for free, and researchers will pay only for the computing services that they use.
 
Department of Energy – Scientific Discovery Through Advanced Computing: As part of its Scientific Discovery through Advanced Computing program, the Department of Energy will provide $25 million in funding for the Scalable Data Management, Analysis and Visualization Institute. Led by Lawrence Berkeley National Laboratory, the Institute will bring together the expertise of six National Laboratories and seven universities, with the goal of developing new and improved tools to help scientists manage and visualize data. The need for these new tools has grown as the simulations running on the Department of Energy’s supercomputers have increased in size and complexity.
 
US Geological Survey – Big Data for Earth System Science: USGS is announcing the latest awardees for grants it issues through its John Wesley Powell Center for Analysis and Synthesis. The Center catalyzes innovative thinking in Earth system science by providing scientists a place and time for in-depth analysis, state-of-the-art computing capabilities, and collaborative tools invaluable for making sense of huge data sets. These Big Data projects will improve our understanding of issues such as species response to climate change, earthquake recurrence rates, and the next generation of ecological indicators.
 
Further details about each department’s or agency’s commitments can be found at the following websites by 2 pm today:
 
DOD: www.DefenseInnovationMarketplace.mil
 
###
OSTP was created by Congress in 1976 to serve as a source of scientific and technological analysis and judgment for the President with respect to major policies, plans, and programs of the Federal Government.

For more information about OSTP, visit http://WhiteHouse.gov/OSTP

Big Data is a Big Deal

Source: http://www.whitehouse.gov/blog/2012/...-data-big-deal

Posted by Tom Kalil on March 29, 2012 at 09:23 AM EST

 
[Editor's Note:  Watch the live webcast today at 2pm ET of the Big Data Research and Development event at http://live.science360.gov/bigdata/]

Today, the Obama Administration is announcing the “Big Data Research and Development Initiative.”  By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning.

To launch the initiative, six Federal departments and agencies will announce more than $200 million in new commitments that, together, promise to greatly improve the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data. Learn more about ongoing Federal government programs that address the challenges of, and tap the opportunities afforded by, the big data revolution in our Big Data Fact Sheet. MY NOTE: See below

We also want to challenge industry, research universities, and non-profits to join with the Administration to make the most of the opportunities created by Big Data.  Clearly, the government can’t do this on its own.  We need what the President calls an “all hands on deck” effort. 

Some companies are already sponsoring Big Data-related competitions, and providing funding for university research.  Universities are beginning to create new courses—and entire courses of study—to prepare the next generation of “data scientists.”  Organizations like Data Without Borders are helping non-profits by providing pro bono data collection, analysis, and visualization.  OSTP would be very interested in supporting the creation of a forum to highlight new public-private partnerships related to Big Data.

Tom Kalil is Deputy Director for Policy at OSTP

Big Data Across the Federal Government Fact Sheet

Source: http://www.whitehouse.gov/sites/defa...et_final_1.pdf (PDF)

 
March 29, 2012
Here are highlights of ongoing Federal programs that address the challenges of, and tap the opportunities afforded by, the big data revolution to advance agency missions and further scientific discovery and innovation.

DEPARTMENT OF DEFENSE (DOD)

Defense Advanced Research Projects Agency (DARPA)
The Anomaly Detection at Multiple Scales (ADAMS) program addresses the problem of anomaly detection and characterization in massive data sets. In this context, anomalies in data are intended to cue collection of additional, actionable information in a wide variety of real-world contexts. The initial ADAMS application domain is insider threat detection, in which anomalous actions by an individual are detected against a background of routine network activity.
 
The Cyber-Insider Threat (CINDER) program seeks to develop novel approaches to detect activities consistent with cyber espionage in military computer networks. As a means to expose hidden operations, CINDER will apply various models of adversary missions to "normal" activity on internal networks. CINDER also aims to increase the accuracy, rate and speed with which cyber threats are detected.
 
The Insight program addresses key shortfalls in current intelligence, surveillance and reconnaissance systems. Automation and integrated human-machine reasoning enable operators to analyze greater numbers of potential threats ahead of time-sensitive situations. The Insight program aims to develop a resource management system to automatically identify threat networks and irregular warfare operations through the analysis of information from imaging and non-imaging sensors and other sources.
 
The Machine Reading program seeks to realize artificial intelligence applications by developing learning systems that process natural text and insert the resulting semantic representation into a knowledge base, rather than relying on expensive and time-consuming current processes for knowledge representation that require experts and associated knowledge engineers to hand craft information.
 
The Mind's Eye program seeks to develop a capability for “visual intelligence” in machines. Whereas traditional study of machine vision has made progress in recognizing a wide range of objects and their properties—the nouns in the description of a scene—Mind's Eye seeks to add the perceptual and cognitive underpinnings needed for recognizing and reasoning about the verbs in those scenes. Together, these technologies could enable a more complete visual narrative.
 
The Mission-oriented Resilient Clouds program aims to address security challenges inherent in cloud computing by developing technologies to detect, diagnose and respond to attacks, effectively building a “community health system” for the cloud. The program also aims to develop technologies to enable cloud applications and infrastructure to continue functioning while under attack. The loss of individual hosts and tasks within the cloud ensemble would be allowable as long as overall mission effectiveness was preserved.
 
The Programming Computation on Encrypted Data (PROCEED) research effort seeks to overcome a major challenge for information security in cloud-computing environments by developing practical methods and associated modern programming languages for computation on data that remains encrypted the entire time it is in use. Giving users the ability to manipulate encrypted data without first decrypting it would make interception by an adversary more difficult.
 
The Video and Image Retrieval and Analysis Tool (VIRAT) (MY NOTE: Page Not Found) program aims to develop a system to provide military imagery analysts with the capability to exploit the vast amount of overhead video content being collected. If successful, VIRAT will enable analysts to establish alerts for activities and events of interest as they occur. VIRAT also seeks to develop tools that would enable analysts to rapidly retrieve, with high precision and recall, video content from extremely large video libraries.
 
The XDATA program seeks to develop computational techniques and software tools for analyzing large volumes of semi-structured and unstructured data. Central challenges to be addressed include scalable algorithms for processing imperfect data in distributed data stores and effective human-computer interaction tools that are rapidly customizable to facilitate visual reasoning for diverse missions. The program envisions open source software toolkits for flexible software development that enable processing of large volumes of data for use in targeted defense applications.

DEPARTMENT OF HOMELAND SECURITY (DHS)

The Center of Excellence on Visualization and Data Analytics (CVADA), a collaboration among researchers at Rutgers University and Purdue University (with three additional partner universities each) leads research efforts on large, heterogeneous data that First Responders could use to address issues ranging from manmade or natural disasters to terrorist incidents; law enforcement to border security concerns; and explosives to cyber threats.

DEPARTMENT OF ENERGY (DOE)

The Office of Science
The Office of Advanced Scientific Computing Research (ASCR) provides leadership to the data management, visualization and data analytics communities, including digital preservation and community access. Programs within the suite include widely used data management technologies such as the Kepler scientific workflow system; Storage Resource Management standard; a variety of data storage management technologies, such as BeSTman, the Bulk Data Mover and the Adaptable IO System (ADIOS); FastBit data indexing technology (used by Yahoo!); and two major scientific visualization tools, ParaView and VisIt. MY NOTE: 4 of these are the same URL!
 
The High Performance Storage System (HPSS) is software that manages petabytes of data on disks and robotic tape systems. Developed by DoE and IBM with input from universities and labs around the world, HPSS is used in digital libraries, defense applications and a range of scientific disciplines including nanotechnology, genomics, chemistry, magnetic resonance imaging, nuclear physics, computational fluid dynamics, and climate science, as well as by Northrop Grumman, NASA and the Library of Congress.
 
Mathematics for Analysis of Petascale Data addresses the mathematical challenges of extracting insight from huge scientific datasets, finding key features and understanding the relationships between those features. Research areas include machine learning, real-time analysis of streaming data, stochastic nonlinear data-reduction techniques and scalable statistical analysis techniques applicable to a broad range of DOE applications including sensor data from the electric grid, cosmology and climate data.
 
The Next Generation Networking program supports tools that enable research collaborations to find, move and use large data: from the Globus Middleware Project in 2001, to the GridFTP data transfer protocol in 2003, to the Earth Systems Grid (ESG) in 2007. Today, GridFTP servers move over 1 petabyte of science data per month for the Open Science Grid, ESG, and Biology communities. Globus middleware has also been leveraged by a collaboration of Texas universities, software companies, and oil companies to train students in state-of-the-art petroleum engineering methods and integrated workflows.

The Office of Basic Energy Sciences (BES)
BES Scientific User Facilities have supported a number of efforts aimed at assisting users with data management and analysis of big data, which can be as big as terabytes (10^12 bytes) of data per day from a single experiment. For example, the Accelerating Data Acquisition, Reduction and Analysis (ADARA) project addresses the data workflow needs of the Spallation Neutron Source (SNS) data system to provide real-time analysis for experimental control, and the Coherent X-ray Imaging Data Bank has been created to maximize data availability and more efficient use of synchrotron light sources.
 
In October 2011, the Data and Communications in Basic Energy Sciences workshop sponsored by BES and ASCR identified needs in experimental data that could impact the progress of scientific discovery.
 
The Biological and Environmental Research Program (BER) Atmospheric Radiation Measurement (ARM) Climate Research Facility is a multi-platform scientific user facility that provides the international research community infrastructure for obtaining precise observations of key atmospheric phenomena needed for the advancement of atmospheric process understanding and climate models. ARM data are available and used as a resource for over 100 journal articles per year. Challenges associated with collecting and presenting the high temporal resolution and spectral information from hundreds of instruments are being addressed to meet user needs.
 
The Systems Biology Knowledgebase (Kbase) is a community-driven software framework enabling data-driven predictions of microbial, plant and biological community function in an environmental context. Kbase was developed with an open design to improve algorithmic development and deployment efficiency, and to increase access to and integration of experimental data from heterogeneous sources. Kbase is not a typical database, but rather a means to interpret missing information to become a predictive tool for experimental design.

The Office of Fusion Energy Sciences (FES)
The Scientific Discovery through Advanced Computing (SciDAC) partnership between FES and the Office of Advanced Scientific Computing Research (ASCR) addresses big data challenges associated with computational and experimental research in fusion energy science. The data management technologies developed by the ASCR-FES partnerships include high performance input/output systems, advanced scientific workflow and provenance frameworks, and visualization techniques addressing unique fusion needs, which have attracted the attention of European integrated modeling efforts and ITER, an international nuclear fusion research and engineering project.

The Office of High Energy Physics (HEP)
The Computational High Energy Physics Program supports research for the analysis of large, complex experimental data sets as well as large volumes of simulated data—an undertaking that typically requires a global effort by hundreds of scientists. Collaborative big data management ventures include PanDA (Production and Distributed Analysis) Workload Management System and XRootD, a high performance, fault-tolerant software for fast, scalable access to data repositories of many kinds.

The Office of Nuclear Physics (NP)
The US Nuclear Data Program (USNDP) is a multisite effort involving seven national labs and two universities that maintains and provides access to extensive, dedicated databases spanning several areas of nuclear physics, which compile and cross-check all relevant experimental results on important properties of nuclei.

The Office of Scientific and Technical Information (OSTI)
OSTI, the only U.S. federal agency member of DataCite (a global consortium of leading scientific and technical information organizations) plays a key role in shaping the policies and technical implementations of the practice of data citation, which enables efficient reuse and verification of data so that the impact of data may be tracked and a scholarly structure that recognizes and rewards data producers may be established.

DEPARTMENT OF VETERANS AFFAIRS (VA)

Consortium for Healthcare Informatics Research (CHIR) develops Natural Language Processing (NLP) tools in order to unlock vast amounts of information that are currently stored in VA as text data.
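As a rough illustration of the kind of extraction such NLP tools perform, the sketch below (not VA code; the note text, pattern, and function name are hypothetical) pulls blood-pressure readings out of free-text clinical notes using only Python's standard library.

```python
import re

# Hypothetical free-text clinical note; real VA notes are far more varied.
note = "Pt seen for follow-up. BP 142/91 today, down from BP 150/95 last visit."

# Simple pattern for blood-pressure mentions such as "BP 142/91".
BP_PATTERN = re.compile(r"\bBP\s*(\d{2,3})\s*/\s*(\d{2,3})\b")

def extract_bp(text):
    """Return (systolic, diastolic) integer pairs found in a note."""
    return [(int(s), int(d)) for s, d in BP_PATTERN.findall(text)]

if __name__ == "__main__":
    print(extract_bp(note))  # -> [(142, 91), (150, 95)]
```

Production NLP pipelines go well beyond pattern matching (negation handling, section detection, terminology mapping), but the goal is the same: turning narrative text into structured data that can be queried at scale.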
 
Efforts in the VA are underway to produce transparent, reproducible and reusable software for surveillance of various safety related events through Protecting Warfighters using Algorithms for Text Processing to Capture Health Events (ProWatch), a research-based surveillance program that relies on newly developed informatics resources to detect, track, and measure health conditions associated with military deployment.
 
AViVA is the VA’s next generation employment human resources system that will separate the database from business applications and the browser-based user interface. Analytical tools are already being built upon this foundation for research and, ultimately, support of decisions at the patient encounter.
 
Observational Medical Outcomes Project is designed to compare the validity, feasibility and performance of various safety surveillance analytic methods.
 
Corporate Data Warehouse (CDW) is the VA program to organize and manage data from various sources with delivery to the point of care for a complete view of disease and treatment for individuals and populations.
 
Health Data Repository is standardizing terminology and data format among health care providers, notably between the VA and DOD, allowing the CDW to integrate data.
 
Genomic Information System for Integrated Science (GenISIS) is a program to enhance health care for Veterans through personalized medicine. The GenISIS consortium serves as the contact for clinical studies with access to electronic health records and genetic data in order that clinical trials, genomic trials and outcome studies can be conducted across the VA.
 
The Million Veteran Program is recruiting voluntary contributions of blood samples from veterans for genotyping and genetic sequencing. These genetic samples support the GenISIS consortium and will be linked to the “phenotype” in the individual veteran’s health record to aid understanding of genetic disease states.
 
VA Informatics and Computing Infrastructure provides analytical workspace and tools for the analysis of large datasets now available in the VA, promoting collaborative research from anywhere on the VA network.

DEPARTMENT OF HEALTH AND HUMAN SERVICES (HHS)

Centers for Disease Control & Prevention (CDC)
BioSense 2.0 is the first system to take into account the feasibility of regional and national coordination for public health situation awareness through an interoperable network of systems, built on existing state and local capabilities. BioSense 2.0 removes many of the costs associated with monolithic physical architecture, while still making the distributed aspects of the system transparent to end users, as well as making data accessible for appropriate analyses and reporting.
 
CDC’s Special Bacteriology Reference Laboratory (SBRL) identifies and classifies unknown bacterial pathogens for effective, rapid outbreak detection using networked phylogenomics for bacteria and outbreak ID. Phylogenomics, the comparative phylogenetic analysis of the entire genome DNA sequence, will bring the concept of sequence-based identification to an entirely new level in the very near future with profound implications on public health. The development of an SBRL genomic pipeline for new species identification will allow for multiple analyses on a new or rapidly emerging pathogen to be performed in hours, rather than days or weeks.
Centers for Medicare & Medicaid Services (CMS)
A data warehouse based on Hadoop is being developed to support analytic and reporting requirements from Medicare and Medicaid programs. A major goal is to develop a supportable, sustainable, and scalable design that accommodates accumulated data at the Warehouse level and complements existing technologies.
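To give a sense of the claims-level aggregations such a warehouse supports, here is a minimal, hypothetical sketch of the Hadoop map/shuffle/reduce pattern in Python, run locally on made-up records; the field layout is invented and does not reflect actual CMS data.

```python
from itertools import groupby

def mapper(lines):
    """Emit (provider_id, paid_amount) from tab-separated claim records."""
    for line in lines:
        provider_id, paid = line.rstrip("\n").split("\t")[:2]
        yield provider_id, float(paid)

def reducer(pairs):
    """Sum payments per provider; input must be sorted by provider_id."""
    for provider_id, group in groupby(pairs, key=lambda kv: kv[0]):
        yield provider_id, sum(amount for _, amount in group)

if __name__ == "__main__":
    # Local stand-in for the map -> shuffle/sort -> reduce pipeline
    # that a Hadoop cluster would run over billions of claim lines.
    claims = ["P001\t120.50", "P002\t75.00", "P001\t60.25"]
    for pid, total in reducer(sorted(mapper(claims))):
        print(f"{pid}\t{total:.2f}")
```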
 
The use of XML database technologies is being evaluated to support the transactional-intensive environment of the Insurance Exchanges, specifically to support the eligibility and enrollment processes. XML databases potentially can accommodate Big Tables scale data, optimized for transactional performance.
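For context only, the following sketch parses a hypothetical enrollment-style XML record with Python's standard library; the element names are invented for illustration and are not taken from any CMS or exchange schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical enrollment transaction; real exchange schemas differ.
record = """
<enrollment>
  <applicant id="A-1001">
    <state>VA</state>
    <householdSize>3</householdSize>
    <annualIncome>41250</annualIncome>
  </applicant>
</enrollment>
"""

root = ET.fromstring(record)
applicant = root.find("applicant")
print(applicant.get("id"),
      applicant.findtext("state"),
      int(applicant.findtext("householdSize")),
      float(applicant.findtext("annualIncome")))
```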
 
CMS has a current set of pilot projects with Oak Ridge National Laboratory that involve the evaluation of data visualization tools, platform technologies, user interface options and high performance computing technologies, aimed at using administrative claims data (Medicare) to create useful information products to guide and support improved decision-making in various CMS high priority programs.
Food & Drug Administration (FDA)
A Virtual Laboratory Environment (VLE) will combine existing resources and capabilities to enable a virtual laboratory data network, advanced analytical and statistical tools and capabilities, crowd sourcing of analytics to predict and promote public health, document management support, tele-presence capability to enable worldwide collaboration, and make any location a virtual laboratory with advanced capabilities in a matter of hours.

NATIONAL ARCHIVES & RECORDS ADMINISTRATION (NARA)

The Cyberinfrastructure for a Billion Electronic Records (CI-BER) is a jointly sponsored testbed, now active at the Renaissance Computing Institute, notable for its application of multi-agency cyberinfrastructure to the National Archives' diverse collection of more than 87 million files of digital records and information. This testbed will evaluate technologies and approaches to support sustainable access to ultra-large data collections.

NATIONAL AERONAUTICS AND SPACE ADMINISTRATION (NASA)

NASA’s Advanced Information Systems Technology (AIST) Awards seek to reduce the risk and cost of evolving NASA information systems to support future Earth observation missions and to transform observations into Earth information as envisioned by NASA’s Climate Centric Architecture. Some AIST programs seek to mature Big Data capabilities to reduce the risk, cost, size and development time of Earth Science Division space-based and ground-based information systems and increase the accessibility and utility of science data.
 
NASA's Earth Science Data and Information System (ESDIS) project, active for over 15 years, has worked to process, archive, and distribute Earth science satellite data and data from airborne and field campaigns. With attention to user satisfaction, it strives to ensure that scientists and the public have access to data to enable the study of Earth from space to advance Earth system science to meet the challenges of climate and environmental change.
 
The Global Earth Observation System of Systems (GEOSS) is a collaborative, international effort to share and integrate Earth observation data. NASA has joined forces with the U.S. Environmental Protection Agency (EPA), National Oceanic and Atmospheric Administration (NOAA), other agencies and nations to integrate satellite- and ground-based monitoring and modeling systems to evaluate environmental conditions and predict outcomes of events such as forest fires, population growth and other developments that are natural and man-made. In the near-term, researchers will integrate a complex variety of air quality information to better understand and address the impact of air quality on the environment and human health.
 
A Space Act Agreement, entered into by NASA and Cray, Inc., allows for collaboration on one or more projects centered on the development and application of low-latency “big data” systems. In particular, the project is testing the utility of hybrid computer systems using a highly integrated non-SQL database as a means for data delivery to accelerate the execution of modeling and analysis software.
 
NASA’s Planetary Data System (PDS) is an archive of data products from NASA planetary missions, which has become a basic resource for scientists around the world. All PDS-produced products are peer-reviewed, well-documented, and easily accessible via a system of online catalogs that are organized by planetary disciplines.
 
The Multimission Archive at the Space Telescope Science Institute (MAST), a component of NASA’s distributed Space Science Data Services, supports and provides to the astronomical community a variety of astronomical data archives, with the primary focus on scientifically related data sets in the optical, ultraviolet, and near-infrared parts of the spectrum. MAST archives and supports several tools to provide access to a variety of spectral and image data.
 
The Earth System Grid Federation is a public archive expected to support the research underlying the Intergovernmental Panel on Climate Change’s Fifth Assessment Report, to be completed in 2014 (as the archive did for the Fourth Assessment Report). NASA is contributing both observational data and model output to the Federation through collaboration with the DOE.

NATIONAL ENDOWMENT FOR THE HUMANITIES (NEH)

The Digging into Data Challenge addresses how big data changes the research landscape for the humanities and social sciences, in which new, computationally-based research methods are needed to search, analyze, and understand massive databases of materials such as digitized books and newspapers, and transactional data from web searches, sensors and cell phone records. Under the leadership of NEH, this Challenge is funded by eight U.S. and international organizations in four countries.

NATIONAL INSTITUTES OF HEALTH (NIH)

National Cancer Institute (NCI)
The Cancer Imaging Archive (TCIA) is an image data-sharing service that facilitates open science in the field of medical imaging. TCIA aims to improve the use of imaging in today's cancer research and practice by increasing the efficiency and reproducibility of imaging cancer detection and diagnosis, leveraging imaging to provide an objective assessment of therapeutic response, and ultimately enabling the development of imaging resources that will lead to improved clinical decision support.
 
The Cancer Genome Atlas (TCGA) project is a comprehensive and coordinated effort to accelerate understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. With the rapid development of large-scale genomic technology, the TCGA project will accumulate several petabytes of raw data by 2014.
National Heart, Lung, and Blood Institute (NHLBI)
The Cardiovascular Research Grid (CVRG) and the Integrating Data for Analysis, Anonymization and Sharing (iDASH) are two informatics resources supported by NHLBI that provide secure data storage, integration, and analysis resources that enable collaboration while minimizing the burden on users. The CVRG provides resources for the cardiovascular research community to share data and analysis tools. iDASH leads development in privacy-preserving technology and is fostering an integrated data sharing and analysis environment.
National Institute of Biomedical Imaging and Bioengineering (NIBIB)
The Development and Launch of an Interoperable and Curated Nanomaterial Registry, led by NIBIB, seeks to establish a nanomaterial registry, whose primary function is to provide consistent and curated information on the biological and environmental interactions of well-characterized nanomaterials, as well as links to associated publications, modeling tools, computational results and manufacturing guidance. The registry facilitates building standards and consistent information on manufacturing and characterizing nanomaterials, as well as their biological interactions.
 
The Internet-Based Network for Patient-Controlled Medical Image Sharing contract addresses the feasibility of an image-sharing model to test how hospitals, imaging centers and physician practices can implement cross-enterprise document sharing to transmit images and image reports.
 
As a Research Resource for Complex Physiologic Signals, PhysioNet offers free web access to large collections of recorded physiologic signals (PhysioBank) and related open-source software (PhysioToolkit). Each month, about 45,000 visitors worldwide use PhysioNet, retrieving about 4 terabytes of data.
 
The Neuroimaging Informatics Tools and Resource Clearinghouse (NITRC) is an NIH blueprint project to promote the dissemination, sharing, adoption and evolution of neuroimaging informatics tools and neuroimaging data by providing access, information and forums for interaction for the research community. Over 450 software tools and data sets are registered on NITRC; the site has had over 30.1 million hits since its launch in 2007.
 
The Extensible Neuroimaging Archive Toolkit (XNAT) is an open source imaging informatics platform, developed by the Neuroinformatics Research Group at Washington University, and widely used by research institutions around the world. XNAT facilitates common management, productivity and quality assurance tasks for imaging and associated data.
 
The Computational Anatomy and Multidimensional Modeling Resource has several components. The Los Angeles Laboratory of Neuro Imaging (LONI; MY NOTE: file not found) houses databases that contain imaging data from several modalities, mostly various forms of MR and PET, genetics, behavior, demographics and other data. The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a good example of a project that collects data from acquisition sites around the U.S., makes data anonymous, quarantines it pending quality control (often done immediately) and makes it available for download to users around the world in a variety of formats.
 
The Computer-Assisted Functional Neurosurgery Database develops methods and techniques to assist in the placement and programming of Deep Brain Stimulators (DBSs) used for the treatment of Parkinson’s disease and other movement disorders. A central database has been developed at Vanderbilt University (VU), which is collaborating with Ohio State and Wake Forest universities to acquire data from multiple sites. Since the clinical workflow and the stereotactic frames at different hospitals can vary, the surgical planning software has been updated and successfully tested.
 
For over a decade the NIH Biomedical Information Science and Technology Initiative (BISTI) Consortium has joined the institutes and centers at NIH to promote the nation’s research in Biomedical Informatics and Computational Biology (BICB), has issued a number of program announcements, and has funded more than a billion dollars in research. In addition, the collaboration has promoted activities within NIH such as the adoption of modern data and software sharing practices so that the fruits of research are properly disseminated to the research community.
NIH Blueprint
The Neuroscience Information Framework (NIF) is a dynamic inventory of Web-based neuroscience resources: data, materials, and tools accessible via any computer connected to the Internet. An initiative of the NIH Blueprint for Neuroscience Research, NIF advances neuroscience research by enabling discovery and access to public research data and tools worldwide through an open source, networked environment.
 
The NIH Human Connectome Project is an ambitious effort to map the neural pathways that underlie human brain function and to share data about the structural and functional connectivity of the human brain. The project will lead to major advances in our understanding of what makes us uniquely human and will set the stage for future studies of abnormal brain circuits in many neurological and psychiatric disorders.
NIH Common Fund
The National Centers for Biomedical Computing (NCBC) are intended to be part of the national infrastructure in Biomedical Informatics and Computational Biology (MY NOTE: same as previous). The eight centers create innovative software programs and other tools that enable the biomedical community to integrate, analyze, model, simulate, and share data on human health and disease.
 
Patient Reported Outcomes Measurement Information System (PROMIS) is a system of highly reliable, valid, flexible, precise, and responsive assessment tools that measure patient-reported health status. A core resource is the Assessment Center, which provides tools and a database to help researchers collect, store, and analyze data related to patient health status.
National Institute of General Medical Sciences
The Models of Infectious Disease Agent Study (MIDAS) is an effort to develop computational and analytical approaches for integrating infectious disease information rapidly and providing modeling results to policy makers at the local, state, national, and global levels. While data need to be collected and integrated globally, information must also be fine-grained because public health policies are implemented locally, with needs for data access, management, analysis and archiving.
 
The Structural Genomics Initiative advances the discovery, analysis and dissemination of three-dimensional structures of protein, RNA and other biological macromolecules representing the entire range of structural diversity found in nature to facilitate fundamental understanding and applications in biology, agriculture and medicine. Worldwide efforts include the NIH-funded Protein Structure Initiative, Structural Genomics Centers for Infectious Diseases, Structural Genomics Consortium in Stockholm and the RIKEN Systems and Structural Biology Center in Japan. These efforts coordinate their sequence target selection through a central database, TargetDB, hosted at the Structural Biology Knowledgebase.
 
The WorldWide Protein Data Bank (wwPDB), a repository for the collection, archiving and free distribution of high quality macromolecular structural data to the scientific community on a timely basis, represents the preeminent source of experimentally determined macromolecular structure information for research and teaching in biology, biological chemistry, and medicine. The U.S. component of the project (RCSB PDB) is jointly funded by five Institutes of NIH, DOE/BER and NSF, as well as participants in the UK and Japan. The single databank now contains experimentally determined structures and related annotation for 80,000 macromolecules. The Web site receives 211,000 unique visitors per month from 140 different countries. Around 1 terabyte of data are transferred each month from the website.
 
The Biomedical Informatics Research Network (BIRN) (MY NOTE: Not Found), a national initiative to advance biomedical research through data sharing and collaboration, provides a user-driven, software-based framework for research teams to share significant quantities of data rapidly, securely and privately across geographic distance and/or incompatible computing systems, serving diverse research communities.
National Library of Medicine
Informatics for Integrating Biology and the Bedside (i2b2) seeks to create tools and approaches that facilitate integration and exchange of the informational by-products of healthcare and biomedical research. Software tools for integrating, mining and representing data that were developed by i2b2 are used at more than 50 organizations worldwide through open source sharing and under open source governance.
Office of Behavioral and Social Sciences Research (OBSSR)
The National Archive of Computerized Data on Aging (NACDA) program advances research on aging by helping researchers to profit from the under-exploited potential of a broad range of datasets. NACDA preserves and makes available the largest library of electronic data on aging in the United States.
 
Data Sharing for Demographic Research (DSDR) provides data archiving, preservation, dissemination and other data infrastructure services. DSDR works toward a unified legal, technical and substantive framework in which to share research data in the population sciences.
Joint NIH - NSF Programs
The Collaborative Research in Computational Neuroscience (CRCNS) is a joint NIH-NSF program to support collaborative research projects between computational scientists and neuroscientists that will advance the understanding of nervous system structure and function, mechanisms underlying nervous system disorders and computational strategies used by the nervous system. In recent years, the German Federal Ministry of Education and Research has also joined the program and supported research in Germany.

NATIONAL SCIENCE FOUNDATION (NSF)

Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) is a new joint solicitation between NSF and NIH that aims to advance the core scientific and technological means of managing, analyzing, visualizing and extracting useful information from large, diverse, distributed and heterogeneous data sets. Specifically, it will support the development and evaluation of technologies and tools for data collection and management, data analytics, and/or e-science collaborations, which will enable breakthrough discoveries and innovation in science, engineering, and medicine, laying the foundations for U.S. competitiveness for many decades to come.
 
Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21) develops, consolidates, coordinates, and leverages a set of advanced cyberinfrastructure programs and efforts across NSF to create meaningful cyberinfrastructure, as well as a level of integration and interoperability of data and tools to support science and education.
 
NSF has shared with its community plans to establish a new CIF21 track as part of its Integrative Graduate Education and Research Traineeship (IGERT) program. This track aims to educate and support a new generation of researchers able to address fundamental Big Data challenges concerning core techniques and technologies, problems, and cyberinfrastructure across disciplines.
 
Data Citation, which provides transparency and increased opportunities for the use and analysis of data sets, was encouraged in a dear colleague letter initiated by NSF’s Geosciences directorate, demonstrating NSF’s commitment to responsible stewardship and sustainability of data resulting from federally funded research.
 
Data and Software Preservation for Open Science (DASPOS) is a first attempt to establish a formal collaboration of physicists from experiments at the LHC and Fermilab/Tevatron with experts in digital curation, heterogeneous high-throughput storage systems, large-scale computing systems, and grid access and infrastructure. The intent is to define and execute a compact set of well-defined, entrant-scale activities on which to base a large-scale, long-term program, as well as an index of commonality among various scientific disciplines. 
EarthCube supports the development of community-guided cyberinfrastructure to integrate data into a framework that will expedite the delivery of geoscience knowledge. NSF’s just-announced first round of EarthCube awards, made within the CIF21 framework via the EArly Concept Grants for Exploratory Research (EAGER) mechanism, is the first step in laying the foundation to transform the conduct of research in geosciences.
 
Expeditions in Computing has funded a team of researchers at the University of California, Berkeley to deeply integrate algorithms, machines, and people to address big data research challenges. The combination of fundamental innovations in analytics, new systems infrastructure  that facilitates scalable resources from cloud and cluster computing and crowd sourcing, and human activity and intelligence will provide solutions to problems not solvable by today’s automated data analysis technologies alone.
 
Researchers in a Focused Research Group on stochastic network models are developing a unified theoretical framework for principled statistical approaches to network models with scalable algorithms in order to differentiate knowledge in a network from randomness. Collaborators in biology and mathematics will study relationships between words and phrases in a very large newspaper database in order to provide media analysts with automatic and scalable tools.
 
NSF released a dear colleague letter announcing an Ideas Lab, for which cross disciplinary participation will be solicited, to generate transformative ideas for using large datasets to enhance the effectiveness of teaching and learning environments.
 
Information Integration and Informatics addresses the challenges and scalability problems involved in moving from traditional scientific research data to very large, heterogeneous data, such as the integration of new data types, models and representations, as well as issues related to data path, information life cycle management, and new platforms.
 
The Computational and Data-enabled Science and Engineering (CDS&E) in Mathematical and Statistical Sciences (CDS&E-MSS), created by NSF’s Division of Mathematical Sciences (DMS) and the Office of Cyberinfrastructure (OCI), is becoming a distinct discipline encompassing mathematical and statistical foundations and computational algorithms. Proposals in this program are currently being reviewed and new awards will be made in July 2012.
 
Some Research Training Groups (RTG) and Mentoring through Critical Transition Points (MCTP) relate to big data. The RTG project at UC Davis addresses the challenges associated with the analysis of object-data—data that take on many forms including images, functions, graphs, and trees—in a number of fields such as astronomy, computer science, and neuroscience. Undergraduates will be trained in graphical and visualization techniques for complex data, software packages, and computer simulations to assess the validity of models.
 
The development of student sites with big data applications to climate, image reconstruction, networks, cybersecurity and cancer is also underway.
 
The Laser Interferometer Gravitational Wave Observatory (LIGO) detects gravitational waves, a previously unobserved form of radiation, which will open a new window on the universe. Processing the deluge of data collected by LIGO is only possible through the use of large computational facilities across the world and the collective work of more than 870 researchers in 77 institutions, as well as the Einstein@Home project.
 
The Open Science Grid (OSG) enables over 8,000 scientists worldwide to collaborate on discoveries, including the search for the Higgs boson. High-speed networks distribute over 15 petabytes of data each year in real-time from the Large Hadron Collider (LHC) at CERN in Switzerland to more than 100 computing facilities. Partnerships of computer and domain scientists and computing facilities in the U.S. provide the advanced fabric of services for data transfer and analysis, job specification and execution, security and administration, shared across disciplines including physics, biology, nanotechnology, and astrophysics.
 
The Theoretical and Computational Astrophysics Networks (TCAN) program seeks to maximize the discovery potential of massive astronomical data sets by advancing the fundamental theoretical and computational approaches needed to interpret those data, uniting researchers in collaborative networks that cross institutional and geographical divides and training the future theoretical and computational scientists.

NATIONAL SECURITY AGENCY (NSA)

Vigilant Net: A Competition to Foster and Test Cyber Defense Situational Awareness at Scale will explore the feasibility of conducting an online contest for developing data visualizations in the defense of massive computer networks, beginning with the identification of best practices in the design and execution of such an event.
 
The Intelligence Community (IC) has identified a set of coordination, outreach and program activities to collaborate with a wide variety of partners throughout the U.S. government, academia and industry, combining Cybersecurity and Big Data and making its perspective accessible to the unclassified science community.
 
The NSA/CSS Commercial Solutions Center (NCSC) hosts vendor capabilities presentations that showcase new commercial technology developments that meet the strategic needs of NSA/CSS and the national security community.

UNITED STATES GEOLOGICAL SURVEY (USGS)

The USGS John Wesley Powell Center for Analysis and Synthesis just announced eight new research projects for transforming big data sets and big ideas about earth science theories into scientific discoveries. At the Center, scientists collaborate to perform state-of-the-art synthesis to leverage comprehensive, long-term data.

NSF Leads Federal Efforts In Big Data

 

Press Release 12-060 
NSF Leads Federal Efforts In Big Data

At White House event, NSF Director announces new Big Data solicitation, $10 million Expeditions in Computing award, and awards in cyberinfrastructure, geosciences, training

[Image: Hurricane Ike visualization created by Texas Advanced Computing Center (TACC) supercomputer Ranger.]

March 29, 2012

View the March 29, 2012 webcast of the Federal Government Big Data Rollout.

View video interviews with Farnam Jahanian, assistant director for NSF's Computer and Information Science and Engineering Directorate, Jose Marie Griffiths, vice president for academic affairs at Bryant University, Alan Blatecky, director of NSF's Office of Cyberinfrastructure, and Michael Franklin, professor of computer science at UC Berkeley.

National Science Foundation (NSF) Director Subra Suresh today outlined efforts to build on NSF's legacy in supporting the fundamental science and underlying infrastructure enabling the big data revolution. At an event led by the White House Office of Science and Technology Policy in Washington, D.C., Suresh joined other federal science agency leaders to discuss cross-agency big data plans and announce new areas of research funding across disciplines in this field.

NSF announced new awards under its Cyberinfrastructure for the 21st Century framework and Expeditions in Computing programs, as well as awards that expand statistical approaches to address big data. The agency is also seeking proposals under a Big Data solicitation, in collaboration with the National Institutes of Health (NIH), and anticipates opportunities for cross-disciplinary efforts under its Integrative Graduate Education and Research Traineeship program and an Ideas Lab for researchers in using large datasets to enhance the effectiveness of teaching and learning.

NSF-funded research in these key areas will develop new methods to derive knowledge from data, and to construct new infrastructure to manage, curate and serve data to communities. As part of these efforts, NSF will forge new approaches for associated education and training.

"Data are motivating a profound transformation in the culture and conduct of scientific research in every field of science and engineering," Suresh said. "American scientists must rise to the challenges and seize the opportunities afforded by this new, data-driven revolution. The work we do today will lay the groundwork for new enterprises and fortify the foundations for U.S. competitiveness for decades to come."

NSF released a solicitation, "Core Techniques and Technologies for Advancing Big Data Science & Engineering," or "Big Data," jointly with NIH.  This program aims to extract and use knowledge from collections of large data sets in order to accelerate progress in science and engineering research. Specifically, it will fund research to develop and evaluate new algorithms, statistical methods, technologies, and tools for improved data collection and management, data analytics and e-science collaboration environments.

"The Big Data solicitation creates enormous opportunities for extracting knowledge from large-scale data across all disciplines," said Farnam Jahanian, assistant director for NSF's directorate for computer and information science and engineering. "Foundational research advances in data management, analysis and collaboration will change paradigms of research and education, and promise new approaches to addressing national priorities."

One of NSF's awards announced today includes a $10 million award under the Expeditions in Computing program to researchers at the University of California, Berkeley. The team will integrate algorithms, machines, and people to turn data into knowledge and insight. The objective is to develop new scalable machine-learning algorithms and data management tools that can handle large-scale and heterogeneous datasets, novel datacenter-friendly programming models, and an improved computational infrastructure.

NSF's Cyberinfrastructure Framework for 21st Century Science and Engineering, or "CIF21," is core to strategic efforts. CIF21 will foster the development and implementation of the national cyberinfrastructure for researchers in science and engineering to achieve a democratization of data. In the near term, NSF will provide opportunities and platforms for science research projects to develop the appropriate mechanisms, policies and governance structures to make data available within different research communities. In the longer term, what will result is the integration of ground-up efforts, within a larger-scale national framework, for the sharing of data among disciplines and institutions.

The first round of awards made through an NSF geosciences program called EarthCube, under the CIF21 framework, was also announced today. These awards will support the development of community-guided cyberinfrastructure to integrate big data across geosciences and ultimately change how geosciences research is conducted. Integrating data from disparate locations and sources, with eclectic structures and formats, that have been stored as well as captured in real time will expedite the delivery of geoscience knowledge.

"EarthCube is a groundbreaking NSF program," said Tim Killeen, assistant director for NSF's geosciences directorate. "It represents a dynamic new way to access, share and use data of all types to accelerate and transform research for understanding our planet. We are asking experts from all sectors--industry, academia, government and non-U.S. institutions--to form collaborations and tell us what research topics they think are most important. Their enthusiastic and energetic response has resulted in a synergy of exhilarating and novel ideas."

NSF also announced a $1.4 million award for a focused research group that brings together statisticians and biologists to develop network models and automatic, scalable algorithms and tools to determine protein structures and biological pathways.

And, a $2 million award for a research training group in big data will support training for undergraduates, graduates and postdoctoral fellows to use statistical, graphical and visualization techniques for complex data.

"NSF is developing a bold and comprehensive approach for this new data-centric world, from fundamental mathematical, statistical and computational approaches needed to understand the data, to infrastructure at a national and international level needed to support and serve our communities, to policy enabling rapid dissemination and sharing of knowledge," said Ed Seidel, assistant director for NSF's mathematical and physical sciences directorate. "Together, this will accelerate scientific progress, create new possibilities for education, enhance innovation in society and be a driver for job creation. Everyone will benefit from these activities."

In addition, anticipated cross-disciplinary efforts at NSF include encouraging data citation to increase opportunities for the use and analysis of data sets; participation in an Ideas Lab to explore ways to use big data to enhance teaching and learning effectiveness; and the use of NSF's Integrative Graduate Education and Research Traineeship, or IGERT, mechanism to educate and train researchers in data enabled science and engineering.

A full list of NSF data-enabled science and engineering projects follows.

A webcast of the Big Data Rollout may be viewed on the Science360 Web site on Thursday, March 29 at 2 pm ET.

-NSF-

The following is a list of NSF programs in the Big Data space. Hotlinks to programs and contacts are also noted below.

NATIONAL SCIENCE FOUNDATION (NSF): Lisa-Joy Zgorski

Core Techniques and Technologies for Advancing Big Data Science & Engineering (Big Data) is a new joint solicitation between NSF and NIH that aims to advance the core scientific and technological means of managing, analyzing, visualizing and extracting useful information from large, diverse, distributed and heterogeneous data sets. Specifically, it will support the development and evaluation of technologies and tools for data collection and management, data analytics, and/or e-science collaborations, which will enable breakthrough discoveries and innovation in science, engineering, and medicine--laying the foundations for U.S. competitiveness for many decades to come. Suzanne Iacono

Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21) develops, consolidates, coordinates, and leverages a set of advanced cyberinfrastructure programs and efforts across NSF to create meaningful cyberinfrastructure, as well as develop a level of integration and interoperability of data and tools to support science and education. Alan Blatecky and Mark Suskin

CIF21 Track for IGERT. NSF has shared with its community plans to establish a new CIF21 track as part of its Integrative Graduate Education and Research Traineeship (IGERT) program. This track aims to educate and support a new generation of researchers able to address fundamental Big Data challenges concerning core techniques and technologies, problems, and cyberinfrastructure across disciplines. Mark Suskin and Tom Russell

Data Citation, which provides transparency and increased opportunities for the use and analysis of data sets, was encouraged in a dear colleague letter initiated by NSF's Geosciences directorate, demonstrating NSF's commitment to responsible stewardship and sustainability of data resulting from federally funded research.

Data and Software Preservation for Open Science (DASPOS) is a first attempt to establish a formal collaboration of physicists from experiments at the LHC and Fermilab/Tevatron with experts in digital curation, heterogeneous high-throughput storage systems, large-scale computing systems, and grid access and infrastructure. The intent is to define and execute a compact set of well-defined, entrant-scale activities on which to base a large-scale, long-term program, as well as an index of commonality among various scientific disciplines. Randal Ruchti, Marv Goldberg and Saul Gonzalez

Digging into Data Challenge addresses how big data changes the research landscape for the humanities and social sciences, in which new, computationally-based research methods are needed to search, analyze, and understand massive databases of materials such as digitized books and newspapers, and transactional data from web searches, sensors and cell phone records.  Administered by the National Endowment for the Humanities, this Challenge is funded by multiple U.S. and international organizations.  Brett Bobley

EarthCube supports the development of community-guided cyberinfrastructure to integrate data into a framework that will expedite the delivery of geoscience knowledge. NSF's just-announced first round of EarthCube awards, made within the CIF21 framework via the EArly Concept Grants for Exploratory Research (EAGER) mechanism, is the first step in laying the foundation to transform the conduct of research in geosciences. Clifford Jacobs

Expeditions in Computing has funded a team of researchers at the University of California (UC), Berkeley to deeply integrate algorithms, machines, and people to address big data research challenges. The combination of fundamental innovations in analytics, new systems infrastructure that facilitate scalable resources from cloud and cluster computing and crowd sourcing, and human activity and intelligence will provide solutions to problems not solvable by today's automated data analysis technologies alone. Mitra Basu

Focused Research Group, stochastic network models. Researchers are developing a unified theoretical framework for principled statistical approaches to network models with scalable algorithms in order to differentiate knowledge in a network from randomness. Collaborators in biology and mathematics will study relationships between words and phrases in a very large newspaper database in order to provide media analysts with automatic and scalable tools. Peter Bickel and Haiyan Cai

Ideas Lab. NSF released a dear colleague letter announcing an Ideas Lab, for which cross disciplinary participation will be solicited, to generate transformative ideas for using large datasets to enhance the effectiveness of teaching and learning environments. Doris Carver

Information Integration and Informatics addresses the challenges and scalability problems involved in moving from traditional scientific research data to very large, heterogeneous data, such as the integration of new data types, models and representations, as well as issues related to data path, information life cycle management, and new platforms. Sylvia Spengler

The Computational and Data-enabled Science and Engineering (CDS&E) in Mathematical and Statistical Sciences (CDS&E-MSS), created by NSF's Division of Mathematical Sciences (DMS) and the Office of Cyberinfrastructure (OCI), is becoming a distinct discipline encompassing mathematical and statistical foundations and computational algorithms. Proposals in this program are currently being reviewed and new awards will be made in July 2012.  Jia Li

Some Research Training Groups (RTG) and Mentoring through Critical Transition Points (MCTP) relate to big data. The RTG project at the UC Davis addresses the challenges associated with the analysis of object-data--data that take on many forms including images, functions, graphs and trees--in a number of fields such as astronomy, computer science, and neuroscience. Undergraduates will be trained in graphical and visualization techniques for complex data, software packages, and computer simulations to assess the validity of models. The development of student sites with big data applications to climate, image reconstruction, networks, cybersecurity and cancer are also underway. Nandini Kannan

The Laser Interferometer Gravitational Wave Observatory (LIGO) detects gravitational waves, a previously unobserved form of radiation, which will open a new window on the universe. Processing the deluge of data collected by LIGO is only possible through the use of large computational facilities across the world and the collective work of more than 870 researchers in 77 institutions, as well as the Einstein@Home project. Pedro Marronetti and Tom Carruthers

The Open Science Grid (OSG) enables over 8,000 scientists worldwide to collaborate on discoveries, including the search for the Higgs boson. High-speed networks distribute over 15 petabytes of data each year in real-time from the Large Hadron Collider (LHC) at CERN in Switzerland to more than 100 computing facilities. Partnerships of computer and domain scientists and computing facilities in the U.S. provide the advanced fabric of services for data transfer and analysis, job specification and execution, security and administration, shared across disciplines including physics, biology, nanotechnology, and astrophysics. Marv Goldberg and Saul Gonzalez

The Theoretical and Computational Astrophysics Networks (TCAN) program seeks to maximize the discovery potential of massive astronomical data sets by advancing the fundamental theoretical and computational approaches needed to interpret those data, uniting researchers in collaborative networks that cross institutional and geographical divides and training the future theoretical and computational scientists. Tom Statler and Linda Sparke

Media Contacts
Lisa-Joy Zgorski, NSF (703) 292-8311 lisajoy@nsf.gov

Program Contacts
C. Suzanne Iacono, NSF (703) 292-8900 siacono@nsf.gov

Related Websites
Obama Administration Unveils “Big Data” Initiative: http://www.whitehouse.gov/administration/eop/ostp
Big Data Solicitation: http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504767
Data Citation in the Geosciences Dear Colleague Letter: http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12058
Dear Colleague Letter about IGERT CIF21 Track: http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12059
Dear Colleague Letter announcing Ideas Lab: http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12060

The National Science Foundation (NSF) is an independent federal agency that supports fundamental research and education across all fields of science and engineering. In fiscal year (FY) 2012, its budget is $7.0 billion. NSF funds reach all 50 states through grants to nearly 2,000 colleges, universities and other institutions. Each year, NSF receives over 50,000 competitive requests for funding, and makes about 11,000 new funding awards. NSF also awards nearly $420 million in professional and service contracts yearly.


Useful NSF Web Sites:
NSF Home Page: http://www.nsf.gov
NSF News: http://www.nsf.gov/news/
For the News Media: http://www.nsf.gov/news/newsroom.jsp
Science and Engineering Statistics: http://www.nsf.gov/statistics/
Awards Searches: http://www.nsf.gov/awardsearch/

[Image and video: NSF Director Dr. Subra Suresh speaking at the Big Data event; broadcast of the OSTP-led Federal Government Big Data Rollout, March 29, 2012, Washington, DC.]
[Video: Farnam Jahanian, assistant director for NSF.]
[Video: Jose Marie Griffiths of Bryant University, a member of the U.S. National Science Board since 2006.]
[Video: Alan Blatecky, director of NSF's Office of Cyberinfrastructure since 2010.]
[Video: Michael Franklin of the University of California, Berkeley.]
[Image: UC Irvine's HIPerWall system, measuring 23 x 9 feet with 50 flat-panel tiles, advances earth science modeling and visualization for research.]

 

Core Techniques and Technologies for Advancing Big Data Science & Engineering  (BIGDATA)

Source:  http://www.nsf.gov/funding/pgm_summ....pims_id=504767

CONTACTS

Name, Email, Phone
Vasant G. Honavar, vhonavar@nsf.gov, (703) 292-7129
Jia Li, jli@nsf.gov, (703) 292-4870
Dane Skow, dskow@nsf.gov, (703) 292-4551
Peter H. McCartney, pmccartn@nsf.gov, (703) 292-8470
Doris L. Carver, dcarver@nsf.gov, (703) 292-5038
Eduardo A. Misawa, emisawa@nsf.gov, (703) 292-5353
Eva Zanzerkia, ezanzerk@nsf.gov, (703) 292-8556
Peter Muhlberger, pmuhlber@nsf.gov, (703) 292-7848
Vladimir Papitashvili, vpapita@nsf.gov, (703) 292-7425
Tandy Warnow, twarnow@nsf.gov, (703) 292-8491
General Correspondence email

For general correspondence, please reply to bigdata@nsf.gov.

PROGRAM GUIDELINES

Solicitation  12-499

Important Notice to Proposers

A revised version of the NSF Proposal & Award Policies & Procedures Guide (PAPPG), NSF 13-1, was issued on October 4, 2012 and is effective for proposals submitted, or due, on or after January 14, 2013. Please be advised that, depending on the specified due date, the guidelines contained in NSF 13-1 may apply to proposals submitted in response to this funding opportunity.

Please be aware that significant changes have been made to the PAPPG to implement revised merit review criteria based on the National Science Board (NSB) report, National Science Foundation's Merit Review Criteria: Review and Revisions. While the two merit review criteria remain unchanged (Intellectual Merit and Broader Impacts), guidance has been provided to clarify and improve the function of the criteria. Changes will affect the project summary and project description sections of proposals. Annual and final reports also will be affected.

A by-chapter summary of this and other significant changes is provided at the beginning of both the Grant Proposal Guide and the Award & Administration Guide.

SYNOPSIS

The Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) solicitation aims to advance the core scientific and technological means of managing,  analyzing, visualizing, and extracting useful information from large, diverse, distributed and heterogeneous data sets so as to: accelerate the progress of scientific discovery and innovation; lead to new fields of inquiry that would not otherwise be possible; encourage the development of new data analytic tools and algorithms; facilitate scalable, accessible, and sustainable data infrastructure; increase understanding of human and social processes and interactions; and promote economic growth and improved health and quality of life. The new knowledge, tools, practices, and infrastructures produced will enable breakthrough discoveries and innovation in science, engineering, medicine, commerce, education, and national security -- laying the foundations for US competitiveness for many decades to come. 

The phrase "big data" in this solicitation refers to large, diverse, complex, longitudinal, and/or distributed data sets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources available today and in the future.

This solicitation is one component in a long-term strategy to address national big data challenges, which include advances in core techniques and technologies; big data infrastructure projects in various science, biomedical research, health and engineering communities; education and workforce development; and a comprehensive integrative program to support collaborations of multi-disciplinary teams and communities to make advances in the complex grand challenge science, biomedical research, and engineering problems of a computational- and data-intensive world.

Today, US government agencies recognize that the scientific, biomedical and engineering research communities are undergoing a profound transformation with the use of large-scale, diverse, and high-resolution data sets that allow for data-intensive decision-making, including clinical decision making, at a level never before imagined.  New statistical and mathematical algorithms, prediction techniques, and modeling methods, as well as multidisciplinary approaches to data collection, data analysis and new technologies for sharing data and information are enabling a paradigm shift in scientific and biomedical investigation. Advances in machine learning, data mining, and visualization are enabling new ways of extracting useful information in a timely fashion from massive data sets, which complement and extend existing methods of hypothesis testing and statistical inference. As a result, a number of agencies are developing big data strategies to align with their missions. This solicitation focuses on common interests in big data research across the National Institutes of Health (NIH) and the National Science Foundation (NSF). 

This initiative will build new capabilities to create actionable information that leads to timely and more informed decisions.  It will both help to accelerate discovery and innovation, as well as support their transition into practice to benefit society.  As the recent President's Council of Advisors on Science and Technology (PCAST) 2010 review of the Networking Information Technology Research and Development (NITRD) [http://www.nitrd.gov/pcast-2010/report/nitrd-program/pcast-nitrd-report-2010.pdf] program notes, the pipeline of data to knowledge to action has tremendous potential in transforming all areas of national priority. This initiative will also lay the foundations for complementary big data activities -- big data infrastructure projects, workforce development, and progress in addressing complex, multi-disciplinary grand challenge problems in science and engineering.

RELATED URLS

Frequently Asked Questions (FAQs) 

1st BIGDATA Webinar (Presentation, Audio File and Transcript) 

2nd BIGDATA Webinar - May 21, 2012 (Presentation, Audio File and Transcript) 

THIS PROGRAM IS PART OF

Additional Funding Opportunities for the CCF Community

Additional Funding Opportunities for the CNS Community

Additional Funding Opportunities for the IIS Community

Special Research Programs

What Has Been Funded (Recent Awards Made Through This Program, with Abstracts)

Map of Recent Awards Made Through This Program

 

Frequently Asked Questions

Source: http://www.nsf.gov/funding/pgm_summ....pims_id=504767

NSF 12-070

 Solicitation NSF 12-499, Core Techniques and Technologies for Advancing Big Data Science and Engineering (BIGDATA)


  1. Is my proposal a good fit for the Big Data solicitation?

The BIGDATA solicitation aims to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large, diverse, distributed and heterogeneous data sets needed to: accelerate the progress of scientific discovery and innovation; lead to new fields of inquiry that would not otherwise be possible; encourage the development of new data analytic tools and algorithms; facilitate scalable, accessible, and sustainable data infrastructure; increase understanding of human and social processes and interactions; and promote economic growth and improved health and quality of life. The focus is on core scientific and technological advances (e.g., in computing and information sciences, mathematics and statistics). Proposals that focus primarily on application of existing methods (e.g., machine learning algorithms, statistical analysis) to data sets in a specific science domain or on implementation of tools based on existing techniques are not appropriate for this solicitation.

  2. Should every proposal submitted in response to the BIG DATA solicitation address an application of interest to NIH?

No.

  3. Should I submit a "mid scale" or a "small" proposal?

A project with one or two investigators and up to three years of effort is likely to be appropriate as a "small proposal" whereas a proposal with three or more investigators and up to five years of effort is likely to be appropriate as a "mid scale" proposal. However, the type of proposal should be chosen based on the scope and the size of the effort needed.

  4. How do I submit a proposal to this program?

Please carefully read and follow the instructions provided in (i) the solicitation itself (http://www.nsf.gov/pubs/2012/nsf12499/nsf12499.htm) and (ii) the NSF Proposal and Award Policies and Procedures Guide, Part I: Grant Proposal Guide (GPG), available at (http://www.nsf.gov/publications/pub_summ.jsp?ods_key=gpg). If you need additional help preparing and submitting your proposal, we recommend that you contact your institution's Sponsored Projects Office.

  5. Do I need to use Grants.gov or Fastlane to apply?

You may use either Grants.gov or Fastlane.

  6. Is my project likely to get funded?

If your proposal fulfills the criteria in FAQ # 1 above, then you are encouraged to apply to the program for funding. The proposal will be reviewed using the NSF merit review criteria by panelists or reviewers with expertise in the topics covered in your proposal. Program officers cannot provide proposers with further advice regarding the likelihood that a specific proposal would receive funding.

  7. Can I obtain a postdoctoral fellowship through the BIGDATA program?

A BIGDATA research proposal may request funding for a postdoctoral fellow as part of the project. However, the program does not accept applications for individual postdoctoral traineeships.

  8. Can employees of Federal Agencies or Federally Funded Research and Development Centers submit proposals in response to this solicitation?

NSF does not normally support research or education activities by scientists, engineers or educators employed by Federal agencies or Federally Funded Research and Development Centers (FFRDCs). A scientist, engineer or educator who has a joint appointment with a university and a Federal agency (such as a Veterans Administration Hospital, or with a university and a FFRDC) may submit proposals through the university and may receive support if he/she is a bona fide faculty member of the university, although part of his/her salary may be provided by the Federal agency (See http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_1.jsp). Furthermore, scientists, engineers, or educators employed by FFRDCs can be sub-awardees on a project led by an entity, e.g., a university, that is eligible to apply for grants from NSF. Such a sub-award typically does not provide funds for salary, but can provide funds for travel to work with their collaborators on the project or for students to work on the project as interns in FFRDC labs.

  9. Can for-profit entities apply for funding through this solicitation?

US commercial organizations, especially small businesses with strong capabilities in scientific or engineering research or education, can submit proposals in response to this solicitation. NSF is interested in supporting projects that couple industrial research resources and perspectives with those of universities; therefore, it especially welcomes proposals for cooperative projects involving both universities and the private commercial sector (http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_1.jsp).

  10. What are the "intellectual property" implications for a for-profit entity that submits a proposal in response to this solicitation?

A data management plan describing how the project will conform to NSF policy on the dissemination and sharing of research results is a required element of the proposal (see http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_2.jsp#dmp). Proposers should note that the NSF data sharing policy requires investigators to share data gathered under an NSF grant with other researchers "within a reasonable time" after the data are generated. The policy also recognizes that investigators and their employers have a legitimate interest in protecting rights to inventions that are developed under an NSF grant. Details about NSF's intellectual property policy can be found at http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/aag_6.jsp#VID. The degree to which the proposed data management plan demonstrates intellectual merit and broader impacts will be considered by the review panel as part of the standard NSF review criteria (http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_3.jsp).

  11. Can a foreign organization submit a proposal?

NSF rarely provides support to foreign organizations. NSF will consider proposals for cooperative projects involving US and foreign organizations, provided support is requested only for the US portion of the collaborative effort. NIH does not accept proposals from foreign organizations but does allow subcontracts to foreign organizations, so long as there is a demonstrated need.

  12. How do I know if my request for funding is relevant to NIH?

Applicants are encouraged to view the NIH-specific announcement (http://grants.nih.gov/grants/guide/notice-files/NOT-GM-12-109.html), which indicates the Institutes and Centers (ICs) that have signed on to this Initiative. Only applications for funding that fall within the regular missions of those ICs will be considered. Applicants are encouraged to visit the IC web sites and contact program staff with specific questions about their portfolios. All reviews will be conducted at NSF, and applications for NIH funding will then be considered in NIH's September and January Council cycles.

  13. Are duplicate submissions allowed?

No. Proposals submitted in response to this solicitation may not duplicate or be substantially similar to other proposals concurrently under consideration by NSF, NIH, or other agencies' programs or study sections.

  14. Will there be future BIGDATA solicitations?

This solicitation is one component in a long-term strategy to address national big data challenges, which include advances in core techniques and technologies; big data infrastructure projects in various science, biomedical research, health, and engineering communities; education and workforce development; and a comprehensive integrative program to support collaborations of multi-disciplinary teams and communities working on the complex grand-challenge science, biomedical research, and engineering problems posed by an increasingly computation- and data-intensive world.

 

Event: 2nd BIGDATA Webinar

Source: http://www.nsf.gov/events/event_summ...24212&org=CISE

May 21, 2012 NSF

The National Science Foundation and the National Institutes of Health invite you to attend a webinar to learn more about their joint Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) solicitation -- NSF 12-499: http://www.nsf.gov/pubs/2012/nsf12499/nsf12499.htm

The BIGDATA solicitation aims to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large, diverse, distributed and heterogeneous data sets so as to: accelerate the progress of scientific discovery and innovation; lead to new fields of inquiry that would not otherwise be possible; encourage the development of new data analytic tools and algorithms; facilitate scalable, accessible, and sustainable data infrastructure; increase understanding of human and social processes and interactions; and promote economic growth and improved health and quality of life.

The phrase "big data" in this solicitation does not refer just to the volume of data, but also to its variety and velocity.  Big data includes large, diverse, complex, longitudinal, and/or distributed data sets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources. 

The focus is on core scientific and technological advances (e.g., in computer science, mathematics, computational science and statistics). Proposals that focus primarily on the application of existing methods (e.g., machine learning algorithms, statistical analysis) to data sets in a specific science domain or on implementation of software tools or databases based on existing techniques are not appropriate for this solicitation. 

NIH-specific information can be found at: http://grants.nih.gov/grants/guide/n...GM-12-109.html.

An FAQ about the solicitation is available at: http://www.nsf.gov/pubs/2012/nsf12070/nsf12070.jsp

This webinar is designed to describe the goals and focus of the BIGDATA solicitation, help investigators understand its scope, and answer any questions potential Principal Investigators (PIs) may have.

MY NOTE: See Presentation Below

What Has Been Funded (Recent Awards Made Through This Program, with Abstracts)

Source: http://www.nsf.gov/awardsearch/progS...Search#results

Downloaded Spreadsheet (XLS)

Requested Export All Results in XML Format

Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) NSF 12-499

Source: http://www.nsf.gov/attachments/12405...ay8with508.pdf (PDF)

Title Slide

NSF12-499Slide1.png

Big Data Research and Development Initiative

NSF12-499Slide2.png

The Big Data Team

NSF12-499Slide3.png

Data Deluge 1

NSF12-499Slide5.png

Dealing with Data

NSF12-499Slide6.png

Data Deluge 2

NSF12-499Slide7.png

Opportunities

NSF12-499Slide8.png

Examples of Research Challenges

NSF12-499Slide10.png

BIG DATA Initiative in Context: NSF Cyber-infrastructure for 21st Century (CIF21) Vision

NSF12-499Slide11.png

BIGDATA Solicitation

NSF12-499Slide13.png

Data management, collection and storage (DCM)

NSF12-499Slide14.png

Data Analytics (DA)

NSF12-499Slide15.png

E-science collaboration environments (ESCE)

NSF12-499Slide16.png

NIH BIGDATA Priorities

NSF12-499Slide17.png

National Priorities

NSF12-499Slide18.png

What proposals are not good fits for the BIGDATA Solicitation? 1

NSF12-499Slide19.png

What proposals are not good fits for the BIGDATA Solicitation? 2

NSF12-499Slide20.png

Proposal Submission and Review

NSF12-499Slide21.png

Review Criterion: Intellectual Merit 1

NSF12-499Slide22.png

Review Criterion: Intellectual Merit 2

NSF12-499Slide23.png

Review Criterion – Capacity Building (CB)

NSF12-499Slide24.png

Evaluation Plan

NSF12-499Slide25.png

Data Sharing Plan

NSF12-499Slide26.png

Software Sharing Plan

NSF12-499Slide27.png

Mid-scale proposals: Coordination plan (CP)

NSF12-499Slide28.png
