Table of contents
- Big Data Senior Steering Group Meeting
- Government Challenges With Big Data: A Semantic Web Strategy for Big Data
- Title Slide
- Outline
- Data Science Team
- Semantic Community: Mission Statement for 2013
- Why We Are Here
- NIST Cloud Computing AND Big Data Forum and Workshop
- Spotfire for Big Data Analytics: Microscope
- Data Science Analytics Library: Telescope & Library
- From the Year of Big Data to the Year of the Data Scientist Working With Big Data
- Cross-Walk Table (in progress)
- The Practice of Data Science
- Current US Government Semantic Web Strategy
- Comment From Owen Ambur
- International Linked Open Data Strategy: Linked Open Data Cloud Data
- International Linked Open Data: Comments to David Wood
- International Linked Open Data: My EPA Green App Data App Example
- Our Semantic Web Strategy for Big Data: Previous Presentations
- Our Semantic Web Strategy for Data: Simple Explanation
- Our Semantic Web Strategy for Data: NASA Big Data Example
- Our Semantic Web Strategy for Data: Spotfire Network Analytics
- My 5-Step Method
- Get to 5-Stars With Open Data
- System of Systems Architecture
- Data Federation in Spotfire: In-Memory and In-Database Data
- Data Federation in Spotfire: Database Connections, Information Links, & Analytics Library
- Data Federation in Spotfire: Data Panel
- Data Federation in Spotfire: Information Designer
- 15th SOA, Shared Services, and Big Data Analytics Conference (DRAFT)
- Comments: Semantic Medline, Noblis, Cray, and ORBIS Technologies
- Q & A
- Story
- Spotfire Dashboard
- Slides
- Upcoming
- Previous
- Research Notes
- Summary
- Cross-Walk Table
- ELC Track Three: Big Data Bold Horizons
- Big Data At the Hill
- Big Data Case Studies High Level Summary
- Demystifying Big Data — A Practical Guide to Transforming the Business of Government
- NIST Cloud Computing AND Big Data Forum & Workshop, January 15-17, 2013
- Big Data Exchange Meeting, February 26, 2013
- The Big Data Challenge
- Welcome to the NITRD Big Data Challenge Series!
- Big Data Buzzwords From A to Z
- Chronology For Federal Government
- Obama Administration Unveils “Big Data” Initiative: Announces $200 Million In New R&D Investments
- Big Data is a Big Deal
- Big Data Across the Federal Government Fact Sheet
- DEPARTMENT OF DEFENSE (DOD)
- DEPARTMENT OF HOMELAND SECURITY (DHS)
- DEPARTMENT OF ENERGY (DOE)
- DEPARTMENT OF VETERANS ADMINISTRATION (VA)
- DEPARTMENT OF HEALTH AND HUMAN SERVICES (HHS)
- NATIONAL ARCHIVES & RECORDS ADMINISTRATION (NARA)
- NATIONAL AERONAUTIC & SPACE ADMINISTRATION (NASA)
- NATIONAL ENDOWMENT FOR THE HUMANITIES (NEH)
- NATIONAL INSTITUTES OF HEALTH (NIH)
- National Cancer Institute (NCI)
- National Heart Lung and Blood Institute (NHLBI)
- National Institute of Biomedical Imaging and Bioengineering (NIBIB)
- NIH Blueprint
- NIH Common Fund
- National Institute of General Medical Sciences:
- National Library of Medicine
- Office of Behavioral and Social Sciences (OBSSR)
- Joint NIH - NSF Programs
- NATIONAL SCIENCE FOUNDATION (NSF)
- NATIONAL SECURITY AGENCY (NSA)
- UNITED STATES GEOLOGICAL SURVEY (USGS)
- NSF Leads Federal Efforts In Big Data
- Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA)
- Frequently Asked Questions
- Is my proposal a good fit for the Big Data solicitation?
- Should every proposal submitted in response to the BIG DATA solicitation address an application of interest to NIH?
- Should I submit a "mid scale" or a "small" proposal?
- How do I submit a proposal to this program?
- Do I need to use Grants.gov or Fastlane to apply?
- Is my project likely to get funded?
- Can I obtain a postdoctoral fellowship through the BIGDATA program?
- Can employees of Federal Agencies or Federally Funded Research and Development Centers submit proposals in response to this solicitation?
- Can for-profit entities apply for funding through this solicitation?
- What are the "intellectual property" implications for a for-profit entity that submits a proposal in response to this solicitation?
- Can a foreign organization submit a proposal?
- How do I know if my request for funding is relevant to NIH?
- Are duplicate submissions allowed?
- Will there be future BIGDATA solicitations?
- Event 2nd BIGDATA Webinar
- What Has Been Funded (Recent Awards Made Through This Program, with Abstracts)
- Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) NSF 12-499
- Title Slide
- Big Data Research and Development Initiative
- The Big Data Team
- Outline
- Data Deluge 1
- Dealing with Data
- Data Deluge 2
- Opportunities
- Dealing with Data
- Examples of Research Challenges
- BIG DATA Initiative in Context: NSF Cyber-infrastructure for 21st Century (CIF21) Vision
- BIGDATA Solicitation in Context
- BIGDATA Solicitation
- Data management, collection and storage (DCM)
- Data Analytics (DA)
- E-science collaboration environments (ESCE)
- NIH BIGDATA Priorities
- National Priorities
- What proposals are not good fits for the BIGDATA Solicitation? 1
- What proposals are not good fits for the BIGDATA Solicitation? 2
- Proposal Submission and Review
- Review Criterion: Intellectual Merit 1
- Review Criterion: Intellectual Merit 2
- Review Criterion – Capacity Building (CB)
- Evaluation Plan
- Data Sharing Plan
- Software Sharing Plan
- Mid-scale proposals: Coordination plan (CP)
- Proposal Types and Deadlines
- How many awards are anticipated?
- How does one apply?
- Questions and Answers
- Credits
- NEXT
Big Data Senior Steering Group Meeting
January 24, 2013
Qinetiq-NA, 4100 North Fairfax Ave, Arlington, VA Suite 800
10AM – 12PM ET
Call in: 1-866-773-0704 Code: 5288814#
WebEx
Agenda
Handouts:
Agenda
January 10 Meeting Notes are on the Wiki
Presentation Slides as made available
OMB/OST NITRD Priorities for 2012
1. Introduction of Brand Niemann, “Government Challenges With Big Data: A Semantic Web Strategy for Big Data” Slides Slides
a. Presentation
b. Q&A
2. Introduction of Celia Merzbacher, SRC (Semiconductor Research Corporation) Slides
a. Presentation
b. Q&A
3. Other Business:
a. OMB/OSTP NITRD Priorities for 2013 – input request
b. Intra-NITRD Collaboration ideas/needs
4. Next meeting February 14, 2013, 10AM ET: guest speakers: Peter Lyster and Allen Dearry will present on the “NIH Big Data Initiative”
Government Challenges With Big Data: A Semantic Web Strategy for Big Data
Source:Slides
Semantic Community: Mission Statement for 2013
NIST Cloud Computing AND Big Data Forum and Workshop
Spotfire for Big Data Analytics: Microscope
Data Science Analytics Library: Telescope & Library
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public
From the Year of Big Data to the Year of the Data Scientist Working With Big Data
Cross-Walk Table (in progress)
International Linked Open Data Strategy: Linked Open Data Cloud Data
http://semanticommunity.info/@api/deki/files/8824/=VIVO.xlsx
International Linked Open Data: Comments to David Wood
http://manning.com/dwood/LinkedData_MEAP_ch1.pdf
http://semanticommunity.info/AOL_Government/Exploiting_Linked_Data_with_BI_Tools
International Linked Open Data: My EPA Green App Data App Example
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?EPAGreenAppsDataApp-Spotfire
Our Semantic Web Strategy for Big Data: Previous Presentations
Our Semantic Web Strategy for Data: Simple Explanation
http://semanticommunity.info/A_NITRD_Dashboard/Semantic_Medline
Our Semantic Web Strategy for Data: NASA Big Data Example
http://semanticommunity.info/@api/deki/files/20313/NASABigData.xls
Our Semantic Web Strategy for Data: Spotfire Network Analytics
Data Federation in Spotfire: In-Memory and In-Database Data
http://semanticommunity.info/A_Spotfire_Gallery/Users_Guide#Working_With_Large_Data_Volumes
Data Federation in Spotfire: Information Designer
15th SOA, Shared Services, and Big Data Analytics Conference (DRAFT)
Comments: Semantic Medline, Noblis, Cray, and ORBIS Technologies
Story
From the Year of Big Data to the Year of the Data Scientist Working With Big Data
As the year of Big Data comes to a close and the Year of the Data Scientist Working With Big Data begins, it is useful to assess what has been accomplished in the Federal Government and what should be accomplished this next year.
Of course the highlights this past year were the announcements:
- Obama Administration Unveils “Big Data” Initiative: Announces $200 Million In New R&D Investments
- NSF Leads Federal Efforts In Big Data
To aid readers who are new to big data, I created a more detailed Chronology of Big Data in the Federal Government and included a glossary of Big Data Buzzwords From A to Z.
The White House Office of Science and Technology Policy (OSTP) launched the Big Data Initiative through its Networking Information Technology Research and Development (NITRD) Office, led by Dr. George Strawn, and its Big Data Senior Steering Group, Co-Chaired by Suzi Iacono, NSF, and Karin Remington, NIH.
At the Big Data and the Government Enterprise Conference, Dr. Strawn said that if we do not produce big data results of value for government business and science soon, then we will be on to something deemed more important next year. In support of that, the NITRD Big Data Challenge Series was launched. The MeriTalk organization launched a discussion forum in support of The Big Data Challenge and will hold a Big Data Exchange Meeting in February. MeriTalk also surveyed the state of big data (The Big Data Gap) and concluded: Government Agencies Adding A Petabyte of New Data in Next Two Years; Making Little Progress Yet In Big Data. Government cloud computing leaders have realized there is a need for something more than just email and collaboration tools in the cloud, like big data, and will discuss that at an upcoming conference (NIST, January 15-17, 2013).
In my role as Knowledge Capture Chair for the ACT-IAC Big Data Committee, I have used the following resources:
- Big Data Case Studies High Level Summary from the TechAmerica Report "Demystifying Big Data — A Practical Guide to Transforming the Business of Government";
- Big Data At the Hill from my report on a meeting with Congressional Staff; and
- ACT-IAC Executive Leadership Conference Track Three: Big Data Bold Horizons (Note: This Conference was cancelled due to Hurricane Sandy)
to produce a Cross-Walk Table of big data pilots by government agency. I found that NASA was conspicuously absent, especially since last fall they kicked off The Big Data Challenge series designed to find innovative solutions to the government’s big data problems. The first contest was all about making disparate, incompatible data sets usable and actually valuable across agencies as follows:
“How can we make heterogeneous (dissimilar and incompatible) data sets homogeneous (uniformly accessible, compatible, able to be grouped and/or matched) so usable information can be extracted? How can information then be converted into real knowledge that can inform critical decisions and solve societal challenges?”
They also posed the questions: Is creating a contest the right way to get the government to start thinking about how to control big data? What would be your submission?
So I decided to take that challenge and provide a specific example shown elsewhere using NSF and NASA data sources as follows:
- What Has Been Funded (Recent Big Data Awards Made Through This Program, with Abstracts)
- NASA Open Data
The first objective, “How can we make heterogeneous (dissimilar and incompatible) data sets homogeneous (uniformly accessible, compatible, able to be grouped and/or matched) so usable information can be extracted?”, was met by using the NSF Spreadsheet of Big Data Awards and the Data.gov Catalog Spreadsheet (about 6000 dissimilar and incompatible data sets) to query for the NASA data sets (only 3 actual data sets and 22 tools), and then getting those 3 data sets (actually only two, because the third was a broken link) into a common tool where the knowledge could be extracted for the second objective. Interestingly, the data.nasa.gov site says it has over 500 data sets in a directory, and the NASA Data Resources web site mentions the Global Change Master Directory, with 1000s of data sets with high-quality metadata. So I put one of the Directory data sets (the Venus Craters Spreadsheet) into the common tool, and I requested the Global Change Master Directory in an open format so I could put it into the common tool as well in the future. In short, I have no problem putting diverse data sets into a common tool once I get and prepare them (see below: "it is the problem").
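The query step just described (filtering a large mixed catalog down to one agency's entries before loading them into a common tool) can be sketched in a few lines of Python. This is a minimal illustration only: the column names, agency values, and URLs below are hypothetical stand-ins, not the actual Data.gov catalog layout.

```python
import csv
import io

# Hypothetical stand-in for a catalog export saved as CSV.
# Real catalogs have many more columns and ~6000 rows.
catalog_csv = """agency,title,format,url
NASA,Venus Craters,XLS,http://example.gov/venus-craters.xls
NASA,Global Change Master Directory,HTML,http://example.gov/gcmd
EPA,Green Apps Data,CSV,http://example.gov/green-apps.csv
"""

def filter_catalog(text, agency):
    """Return the catalog rows belonging to a single agency."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if row["agency"] == agency]

nasa_rows = filter_catalog(catalog_csv, "NASA")
for row in nasa_rows:
    print(row["title"], row["format"])
```

In practice the same filter would run against the full catalog export, and the surviving rows (minus any broken links) are what get prepared for the common tool.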
The second objective, “How can information then be converted into real knowledge that can inform critical decisions and solve societal challenges?”, was met only partially so far, because so much work is required to get and prepare the NASA data sets for a common tool.
This reminded me of the experience of Josh Wills, Data Scientist @Cloudera, writing on The Practice of Data Science, who said:
- Key trait of all data scientists. Understanding “that the heavy lifting of [data] cleanup and preparation isn’t something that gets in the way of solving the problem: it is the problem.” (DJ Patil)
- Inverse problems. Not every data scientist is a statistician, but all data scientists are interested in extracting information about complex systems from observed data, and so we can say that data science is related to the study of inverse problems. Real-world inverse problems are often ill-posed or ill-conditioned, which means that scientists need substantive expertise in the field in order to apply reasonable regularization conditions in order to solve the problem.
- Data sets that have a rich set of relationships between observations. We might think of this as a kind of Metcalfe’s Law for data sets, where the value of a data set increases nonlinearly with each additional observation. For example, a single web page doesn’t have very much value, but 128 billion web pages can be used to build a search engine.
- Open-source software tools with an emphasis on data visualization. One indicator that a research area is full of data scientists is an active community of open source developers.
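The "Metcalfe's Law for data sets" analogy in the third bullet can be made concrete: if value tracks the number of potential pairwise relationships between observations, it grows roughly quadratically with the number of observations. A toy sketch (my illustration, not a formula from Wills):

```python
def pairwise_links(n):
    """Number of potential pairwise relationships among n observations."""
    return n * (n - 1) // 2

# Doubling the observations roughly quadruples the potential links.
for n in (10, 20, 40):
    print(n, pairwise_links(n))  # 45, 190, 780
```

This is why a single web page has little value while billions of interlinked pages can power a search engine: the links between observations, not the observations alone, carry the value.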
So NASA's work with big data seems to be more about tools (software coding) than actual data science on data sets, which reinforces Dr. George Strawn's observation that if we do not get real results with big data of value to the business and science of government, we will be on to something else that will.
Spotfire Dashboard
For Internet Explorer users and those wanting full-screen display, use the Web Player. Get the Spotfire for iPad App.
Upcoming
Data Transparency Coalition, January 4, Capitol Hill. Wiki, Slides.
Big Data & Activity Based Intelligence, IAC General Membership Meeting, January 16, Falls Church, VA. Wiki. Slides
NIST Cloud Computing AND Big Data Forum & Workshop, January 15-17, Gaithersburg, MD
W3C eGov Special Interest Group, Open Government Data for Japan (and the US and Europe): January 21, Slides
Federal Big Data Senior Steering Group, January 24, Ballston, VA. Wiki
Big Data Exchange Meeting, February 26, 2013, 8-10 a.m., The City Club of Washington at Columbia Square, Washington, D.C.
ACT-IAC Collaboration & Transformation SIG, Government Challenges With Big Data, February 23, Fairfax, VA. Wiki. Slides
Previous
Big Data: Big Problem, Big Answer for the CIA, November 17, 2011
The Value Potential of Big Data to Government: Parsing and serving it up “small”, February 26, 2012
Intelligence Community Loves Big Data, March 6, Wiki
Challenges and Opportunities in Big Data, March 29. AAAS, Washington, DC
Big Data Conference, May 8-9, Arlington, VA. Wiki
How To Become a Data Scientist With Spotfire 5, May 30, Slides
Big Data, June 13-14, NIST, Gaithersburg, MD. Wiki
Big Data and the Government Enterprise, June 21, Wiki
Big Data Innovation and Social Media & Web Analytics Innovation, September 13-14, Boston, MA. Wiki Blogs Slides
BIG DATA at the Hill, September 25, Rayburn House Office Building, Washington, DC
Federal Big Data Senior Steering Group, September 27. Wiki. Moved to January 24th
14th SOA for e-Government Conference, MITRE, McLean, VA, October 2. Federation of SOA Pilot. Slides Slides
Emerging Technology SIG Meeting: Big Data Committee Overview, October 18, Noblis. Wiki
Recorded Future User Conference, October 16, 2012, Newseum, Washington, DC. Wiki
Executive Leadership Conference "Charting a Course", October 28-30, Colonial Williamsburg, Virginia. Wiki
Using Data Science Evidence in Public Policy for Big Data and Elections, George Mason University, University Hall, November 1-2. Slides
ACT-IAC ET SIG Semantic Web (with Big Data), November 29. GSA Fairfax, VA. Slides
Government Information and Analytics Summit, November 28-29, Washington, DC. Proposal Wiki Wiki
NGA Collaboration Forum Outreach Event, December 11, NGA Campus East (NCE), Springfield, VA
Big Data Part II, December 12, Washington, DC. Wiki
Research Notes
- Put the Best Content into a Knowledge Base (e.g. MindTouch*)
- The Japan Statistical Yearbook 2012
- Put the Knowledge Base into a Spreadsheet (Excel*)
- Linked Data to Subparts of the Knowledge Base
- Put the Spreadsheet into a Dashboard (Spotfire*)
- Data Integration and Interoperability Interface
- Put the Dashboard into a Semantic Model (Excel*)
- Data Dictionaries and Models
- Put the Semantic Model into Dynamic Case Management (Be Informed*)
- Structured Process for Updating Data in the Dashboard
| Star | Definition | Example / Tool* |
| ★ | Make your stuff available on the Web (whatever format) under an open license | This Story /MindTouch |
| ★★ | Make it available as structured data (e.g., Excel instead of image scan of a table) | Spreadsheet / Excel |
| ★★★ | Use non-proprietary formats (e.g., CSV instead of Excel) | Table / MindTouch and Spotfire |
| ★★★★ | Use URIs to identify things, so that people can point at your stuff | Table of Contents / MindTouch and Spotfire |
| ★★★★★ | Link your data to other data to provide context | Table / MindTouch and Spotfire |
* Examples of tools used.
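Stars two through four above (structured data, a non-proprietary format, URIs that identify things) can be sketched as a tiny pipeline. The base URI, field names, and slug rule below are hypothetical, chosen only to illustrate the idea, not the actual scheme used on this wiki:

```python
import csv
import io

# Assumed URI prefix, for illustration only.
BASE = "http://semanticommunity.info/id/"

rows = [
    {"name": "NASA Big Data", "year": 2012},
    {"name": "NSF BIGDATA Awards", "year": 2012},
]

def to_csv(records):
    """Star 3: serialize structured records to non-proprietary CSV."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def mint_uri(record):
    """Star 4: give each thing a stable URI people can point at."""
    slug = record["name"].lower().replace(" ", "-")
    return BASE + slug

print(to_csv(rows))
for r in rows:
    print(mint_uri(r))
```

The fifth star, linking to other data for context, is then a matter of using those URIs in other data sets, as the Linked Open Data Cloud does.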
"MindTouch is focused on becoming optimized for mobile devices. Currently mobile optimization is performed through our client services department on a case by case basis. These enhancements do require additional costs for implementation and development. Mobile enhancement will become a productized part of our product in the future. I do not have a firm date." Source Cory Ganser, MindTouch
Summary
Cross-Walk Table
| Agency | Contact | BDSSG | Conferences | ELC (4) | Pilots (3) |
| OSTP (1) | John Holdren, Assistant to the President and Director, White House Office of Science and Technology Policy | Through NITRD (2) | George Strawn and Wendy Wigen My DoS Story (6, etc.) | George Strawn | Health, Safety, and Energy Data and Semantic Medline |
| NSF (1) | Subra Suresh, Director, National Science Foundation | Independent Agency (2) | Dashboards | ||
| NIH (1) | Francis Collins, Director, National Institutes of Health | Through DDHS (2) with AHRQ and ONC | Andrea T. Norris and Frank Baitman | Medicare for IOM and SEER | |
| USGS (1) | Marcia McNutt, Director, United States Geological Survey | EarthCube Charrette | EarthCube | |
| DoE (1) | William Brinkman, Director, Department of Energy Office of Science | NNSA and SC (2) | Internal Summit | Peter Tseronis, Michael Franklin, and E. Wes Bethel | DISRE Solar (Government Information and Analytics Summit) |
| DoD (1) | Zach Lemnios, Assistant Secretary of Defense for Research & Engineering, Department of Defense | DARPA, NSA, OSD, & Service Research Organizations (2) | (David Wennergran and Teri Takai) | Dashboards | |
| DARPA (1) | Kaigham “Ken” Gabriel, Deputy Director, Defense Advanced Research Projects Agency | (Martin Hyatt, January 27-29) | |||
| DoC | NIST and NOAA (2) | App Contest Story | Dashboards | ||
| DHS | Independent Agency (2) | Story | |||
| EPA | Malcolm Jackson | Independent Agency (2) | Mine | (Malcolm Jackson) | EnviroFacts and Indicators |
| NASA | Tomas Soderstrom | Independent Agency (2) | Story | Tomas Soderstrom | IN PROCESS |
| NARA | Jason Baron | Independent Agency (2) | Story | (Jason Baron) | IN PROCESS |
| IC | Gus Hunt, CTO, CIA, and Robert Ames, In-Q-Tel | Story and Story | Michael Howell | CIA World Fact Book and Quint | |
| GSA | Dave McClure and Kathleen Turco | Story | (Marie Davie and Johan Bos-Beijer) | Governmentwide Acquisition Contract (GWAC) Dashboard | |
| Treasury | Story | Adam Goldberg and Thomas Vannoy | Bureau of Public Debt |
(1) http://semanticommunity.info/AOL_Government/Challenges_and_Opportunities_in_Big_Data
(2) http://semanticommunity.info/Emerging_Technology_SIG_Big_Data_Committee#Slide_2_Member_Agencies
See details below
ELC Track Three: Big Data Bold Horizons
MY NOTE: This conference was cancelled due to Hurricane Sandy and its content has been included in the Cross-Walk Table above
Government agencies are awash in ever-expanding volumes of data, and providing timely and efficient management and analysis of data assets represents one of the great management challenges of our time. The rapid increase in data generated from mobile devices, sensors, audio/visual tools, web traffic, and electronic customer transactions illustrates the enormity of this challenge – and the new opportunities it offers. The White House has unveiled a new “Big Data Research and Development Initiative” that commits $200 million to new research efforts in the management of “big data”. Most agencies have efforts underway to harness these new sources of information to provide customer and citizen benefit derived from nuggets of valuable information that are buried in diverse and massive data repositories.
Defining and Maximizing Big Data
Invited Participants
Big Data At the Hill
Source: http://semanticommunity.info/AOL_Government/BIG_DATA_at_the_Hill#Story
| Topics | Trends | Issues | Comments |
| Myth vs. Realities | Big Data Solves Everything | Hype Without Demonstrated Business and Scientific Value | See Data Evolution in the Government Enterprise: Will It Still Be Big Data Next Year? |
| Privacy: Who knows what? | The Intelligence Community Knows Everything | Who Knows Everything the Intelligence Community Is Doing? | See Intelligence Community Loves Big Data |
| Cloud: Where Big Data belongs? | Terabytes to Zettabytes | Bandwidth Limitations | Amazon: Fedex Your Storage Devices To Us to Upload Your Big Data |
| Mobility – of you and your data | Bring Your Own Device (BYOD) | Conventional Web Sites and Databases Are Not Mobile-Enabled | Your Mobile Device Has Access To a Supercomputer |
| Storage and technology | Scalable single level storage | Collapses the Server, Network, and storage by removing software and replacing them with memory system primitives | Panève’s ZettaLeaf & ZettaTree Products |
| Data Analytics – hidden gems and spurious conclusions | Data Science | Too Few Data Scientists - Need a Government Data Science Community | See My Data Journalism Articles |
| Opportunities and risks in data aggregation | Aggregate Before Analysis To Reduce Size | Needles Could Be Lost | See Data Evolution in the Government Enterprise: Will It Still Be Big Data Next Year? |
| Security concerns for large data sets | Integrate Classified and Unclassified Data Sources | Different Security Levels | Need To Specify/Protect Security at the Row and Element Level |
| Financial Implications | Hadoop for Everything with Big Data | Costs 50 Times Higher Than Expected | Big Data In Memory Could Be More Cost-Effective |
Big Data Case Studies High Level Summary
The Commission has compiled a set of 10 case studies detailing the business or mission challenge faced, the initial Big Data use case, early steps the agency took to address the challenge and support the use case, and the business results. Although the full text of these case studies will be posted at the TechAmerica Foundation Website, some are summarized below.
| Agency/Organization/ Company Big Data Project Name | Underpinning Technologies | Big Data Metrics | Initial Big Data Entry Point | Public/User Benefits |
| National Archives and Records Administration (NARA) Electronic Records Archive | Metadata, Submission, Access, Repository, Search and Taxonomy applications for storage and archival systems | Petabytes, Terabytes/sec, Semi-structured | Warehouse Optimization, Distributed Info Mgt | Provides Electronic Records Archive and Online Public Access systems for US records and documentary heritage |
| TerraEchos Perimeter Intrusion Detection | Streams analytic software, predictive analytics | Terabytes/sec | Streaming and Data Analytics | Helps organizations protect and monitor critical infrastructure and secure borders |
| Royal Institute of Technology of Sweden (KTH) Traffic Pattern Analysis | Streams analytic software, predictive analytics | Gigabits/sec | Streaming and Data Analytics | Improve traffic in metropolitan areas by decreasing congestion and reducing traffic accident injury rates |
| Vestas Wind Energy Wind Turbine Placement & Maintenance | Apache Hadoop | Petabytes | Streaming and Data Analytics | Pinpointing the optimal location for wind turbines to maximize power generation and reduce energy cost |
| University of Ontario (UOIT) Medical Monitoring | Streams analytic software, predictive analytics, supporting Relational Database | Petabytes | Streaming and Data Analytics | Detecting infections in premature infants up to 24 hours before they exhibit symptoms |
| National Aeronautics and Space Administration (NASA) Human Space Flight Imagery | Metadata, Archival, Search and Taxonomy applications for tape library systems, GOTS | Petabytes, Terabytes/ sec, Semi-structured | Warehouse Optimization | Provide industry and the public with some of the most iconic and historic human spaceflight imagery for scientific discovery, education and entertainment |
| AM Biotechnologies (AM Biotech) DNA Sequence Analysis for Creating Aptamers | Cloud-based HPC genomic applications and transportable data files | Gigabytes, 10^7 DNA sequences compared | Streaming Data & Analytics, Warehouse Optimization, Distributed Info Mgt | Creation of unique aptamer compounds to develop improved therapeutics for many medical conditions and diseases |
| National Oceanic and Atmospheric Administration (NOAA) National Weather Service | HPC modeling, data from satellites, ships, aircraft and deployed sensors | Petabytes, Terabytes/sec, Semi-structured, ExaFLOPS, PetaFLOPS | Streaming Data & Analytics, Warehouse Optimization, Distributed Info Mgt | Provide weather, water, and climate data, forecasts and warnings for the protection of life and property and enhancement of the national economy |
| Internal Revenue Service (IRS) Compliance Data Warehouse | Columnar database architecture, multiple analytics applications, descriptive, exploratory, and predictive analysis | Petabytes | Streaming Data & Analytics, Warehouse Optimization, Distributed Info Mgt | Provide America's taxpayers top quality service by helping them to understand and meet their tax responsibilities and enforce the law with integrity and fairness to all |
| Centers for Medicare & Medicaid Services (CMS) Medical Records Analytics | Columnar and NoSQL databases, Hadoop under evaluation, EHR on the front end, with legacy structured database systems (including DB2 and COBOL) | Petabytes, Terabytes/day | Streaming Data & Analytics, Warehouse Optimization, Distributed Info Mgt | Protect the health of all Americans and ensure compliant processing of insurance claims |
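Several rows in the table above (TerraEchos, KTH, UOIT) rest on streaming analytics: computing statistics incrementally as data arrives rather than landing it in a warehouse and querying later. None of those systems' internals are shown here, but the core pattern can be sketched with a running mean/deviation monitor in Python using Welford's online algorithm; the heart-rate scenario, class name, and thresholds below are all illustrative, not drawn from the case studies:

```python
import math

class StreamingMonitor:
    """Tracks the running mean/variance of a sensor stream in O(1) memory
    (Welford's algorithm) and flags readings more than `k` standard
    deviations from the running mean."""

    def __init__(self, k=3.0):
        self.k = k
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def observe(self, x):
        # One-pass update: no history is stored, only three numbers.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        if self.n < 10:          # warm-up: too little data to judge
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        return std > 0 and abs(x - self.mean) > self.k * std

# Simulated heart-rate stream: steady readings, then one anomalous spike.
monitor = StreamingMonitor(k=3.0)
readings = [120, 121, 119, 120, 122, 118, 121, 120, 119, 121, 120, 160]
alerts = [i for i, r in enumerate(readings) if monitor.observe(r)]
print(alerts)  # [11] -- only the spike is flagged
```

The constant-memory update is what makes the "Terabytes/sec" workloads in the table feasible: nothing has to be stored before a decision can be made.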
Demystifying Big Data — A Practical Guide to Transforming the Business of Government
Source: http://federalbriefings.1105cms01.co.../overview.aspx
See: http://semanticommunity.info/AOL_Government/Big_Data_Part_II
“Big Data” is top-of-mind for technologists today—driven by the recognition that we are only just beginning to see the accumulation of data generated by our digitally-dependent and networked lives. Given the increasingly mobile and real-time nature of information collection and sharing, in formats from audio to video to instant messaging and more, there is no doubt the sheer volume of data generated by users and organizations will quickly surpass most near-term forecasts.
How should Federal Government organizations prepare for and plan to face the big data wave of 2013 and beyond? First, by becoming informed about the nature and scope of the topic, and second, by hearing more about how agencies already are taking steps to apply data science to optimizing their operations and results.
This is your opportunity to join the dialogue: plan to join the TechAmerica Foundation and the 1105 Government Information Group in mid-December to review how big data is affecting government organizations today and will continue to do so into the future. The program will feature the findings of the TechAmerica Foundation 2012 Big Data Commission Report, as well as discussion of:
- How big data trends will impact the way agencies collect, store, manage, and protect their information assets
- How government enterprises can get started to leverage the data they have and how to “scale up”
- Why user demands for more information, anytime and anywhere, will continue to transform how agencies deliver services and data
- The role of data analytics for using information on-hand to improve decision support
- How to design trusted data sharing methods for collaborative environments and valuable results
- Techniques for monitoring and protecting data stores to prevent fraud, breach, and compromise
- Strategies for ensuring privacy protection using available technology and sound compliance policy
- Which technologies enable big data applications, and the impact of wireless devices and sensors on big data accumulation
NIST Cloud Computing AND Big Data Forum & Workshop, January 15-17, 2013
Source: http://www.nist.gov/itl/cloud/cloudbdworkshop.cfm and http://www.nist.gov/itl/cloud/upload/Cloud-Computing-and-Big-Data-Forum-and-Workshop_agenda.pdf (PDF)
The NIST Cloud and Big Data Workshop will bring together leaders and innovators from industry, academia and government in an interactive format that combines keynote presentations, panel discussions, interactive breakout sessions and open discussion. The conference will be led off by Pat Gallagher, Under Secretary of Commerce for Standards and Technology and Director, NIST, and Steven VanRoekel, the Chief Information Officer of the United States.
The second and third days of the workshop focus on the intersection of Cloud and Big Data. Fully realizing the power of Big Data depends on meeting the unprecedented demands on storage, integration, and analysis presented by massive data sets, demands that Cloud innovators are working to meet today. The workshop will explore possibilities for harmonizing Cloud and Big Data measurement, benchmarking, and standards in ways that bring the power of these two approaches to bear in driving progress and prosperity.
Whether you’re interested in Big Data as a service, analytics and visualization, operational infrastructure, or other areas, the workshop will give you the opportunity to share your ideas and explore those of others on key questions such as:
- What benchmarks are needed to evaluate different Cloud architectures and implementations for Big Data applications?
- How do the ways in which Big Data are structured and measured today either facilitate or impede Cloud solutions? What changes are needed?
- What standards and metrics could be harmonized to create synergy between Cloud and Big Data? What standards are missing?
- How are Big Data analytics changing/influencing Cloud and vice-versa?
- What are the needs for interoperability with regard to Big Data in the Cloud?
- What is your most pressing question regarding Big Data in the cloud?
Big Data Exchange Meeting, February 26, 2013
The City Club of Washington at Columbia Square
Washington, D.C.
The Big Data Challenge
Source: http://meritalk.com/blog.php?user=BigDataExchange&blogentry_id=3340
Posted: 10/17/2012
On October 3, 2012, NASA and a couple other government agencies kicked off The Big Data Challenge series designed to find innovative solutions to the government’s big data problems. The first contest is all about making disparate, incompatible data sets usable and actually valuable across agencies. Here is the first contest: “How can we make heterogeneous (dissimilar and incompatible) data sets homogeneous (uniformly accessible, compatible, able to be grouped and/or matched) so usable information can be extracted? How can information then be converted into real knowledge that can inform critical decisions and solve societal challenges?”
Is creating a contest the right way to get the government to start thinking about how to control big data? What would be your submission? Check out the contest page here and share your thoughts below. MY NOTE: See Below.
Big Data Gap
Source: http://meritalk.com/big-data-report-register.php
Despite the buzz, big data is a new concept that most mid-level decision makers are unfamiliar with. Though Federal thought leaders are extolling the impact of big data, the truth is that to capture the full potential of big data, agencies need the following:
The Big Data Gap report captures insights from big data leaders. Download the Big Data Gap to find out:
Click here to view the press release. MY NOTE: See Below.
Click here to view the media coverage. MY NOTE: See Below.
Government Agencies Adding A Petabyte of New Data in Next Two Years; Making Little Progress Yet In Big Data
Source: http://meritalk.com/pdfs/big-data/MeriTalk_Big_Data_Gap_Press_Release.pdf (PDF)
IT professionals estimate that they have less than half of the storage, computing, and personnel resources necessary to leverage big data for efficiency gains and better decision making.
Alexandria, Va., May 7, 2012 – Government data is growing and agencies are looking to leverage big data to support government mission outcomes. However, most agencies lack the data storage/access, computational power, and personnel they need to take advantage of the big data opportunity, according to a new study by MeriTalk sponsored by NetApp. The new report, “The Big Data Gap,” reveals that Federal IT professionals believe big data can improve government but that the promise of big data is locked away in unused or inaccessible data.
President Obama’s recently announced Big Data Research and Development Initiative highlights the big data promise – that improving our ability to extract knowledge and insights from large and complex collections of data will help government solve problems. Federal IT professionals agree. According to the Big Data Gap report, Federal IT professionals say improving overall agency efficiency is the top advantage of big data (59 percent), followed by improving speed/accuracy of decisions (51 percent) and the ability to forecast (30 percent).
While Federal IT professionals agree there are many benefits to big data, the technology and applications needed to successfully leverage big data are still emerging. Sixty percent of civilian agencies and 42 percent of Department of Defense/intelligence agencies say they are just now learning about big data and how it can work for their agency.
While the promise of big data is strong, most agencies are still years away from using it. Just 60 percent of IT professionals say their agency is analyzing the data it collects and less than half (40 percent) are using data to make strategic decisions. On average, Federal IT professionals report that it will take their agencies three years to take full advantage of big data.
Federal IT professionals report that the amount of government data will continue to grow. Eighty-seven percent of Federal IT professionals say their agency’s stored data has grown in the last two years. The majority of Federal IT professionals – 96 percent – expect their agency’s stored data to grow in the next two years by an average of 64 percent.
“Government has a gold mine of data at its fingertips,” said Mark Weber, president of U.S. Public Sector for NetApp. “The key is turning that data into high-quality information that can increase efficiencies and inform decisions. Agencies need to look at big data solutions that can help them efficiently process, analyze, manage, and access data, enabling them to more effectively execute their missions.”
While agencies have a huge amount of data – that continues to grow – in many agencies the data is locked away. Nearly a third of agency data is unstructured and therefore substantially less useful. The amount of unstructured data is growing – 64 percent of Federal IT professionals report that the amount of unstructured data they store has increased in the past two years. Data ownership further complicates agencies’ ability to use big data. Agencies are unclear on who owns the data, with 42 percent reporting IT departments own the data, 28 percent reporting that the data belongs to the department that generates it, and 12 percent reporting the data belongs to the C-level.
Federal IT professionals also identify a gap between the big data possibility and reality with nine out of 10 reporting challenges on the path to harnessing big data.
Agencies estimate that they have just 49 percent of the data content storage/access, 46 percent of the bandwidth/computational power, and 44 percent of the personnel they need to leverage big data and drive mission results. In addition, 57 percent say they have at least one dataset that has grown too big to work with using their current management tools and/or infrastructure.
Despite the challenges, agencies are working to harness big data. Sixty-four percent of IT professionals say their agency’s data management system can be easily expanded/upgraded on demand. However, they estimate that it would take an average of 10 months to double their short- to medium-term capacity. In addition, some agencies are taking steps to improve their ability to manage and make decisions with big data. Top tactics include investing in IT infrastructure to optimize data storage (39 percent), training IT professionals to manage/analyze big data (33 percent), and improving the security of stored data (31 percent).
“The Big Data Gap” is based on a survey of 151 Federal government CIOs and IT managers in March 2012. The report has a margin of error of +/- 7.95 percent at a 95 percent confidence level. To download the full study, please visit http://www.meritalk.com/bigdatagap.
About MeriTalk
The voice of tomorrow’s government today, MeriTalk is an online community and go-to resource for government IT. Focusing on government’s hot-button issues, MeriTalk hosts Data Center Exchange, Cyber Security Exchange, and Cloud Exchange – platforms dedicated to supporting public-private dialogue and collaboration. MeriTalk connects with an audience of 85,000 government community contacts. For more information, visit http://www.meritalk.com or follow us on Twitter, @meritalk.
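The methodology note above (151 respondents, a margin of error of +/- 7.95 percent at a 95 percent confidence level) can be sanity-checked with the textbook worst-case margin-of-error formula for a proportion, z·sqrt(p(1−p)/n) with p = 0.5. The small difference from the quoted figure likely reflects rounding or a finite-population correction in the original report:

```python
import math

n = 151          # survey respondents
z = 1.96         # z-score for a 95% confidence level
p = 0.5          # worst-case proportion, maximizes the margin

margin = z * math.sqrt(p * (1 - p) / n)
print(f"{margin:.2%}")  # 7.98%, close to the reported +/- 7.95 percent
```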
The Big Data Gap: The 2012 NetApp Study – Media Results
Source: http://meritalk.com/pdfs/big-data/2012_The_Big_Data_Gap_Media_Coverage_062212.pdf (PDF)
As of July 6, 2012
Coverage to Date
- Civ Source, By Bailey McCann, May 29, 2012
- On the Frontlines, June/July 2012
- Government Health IT, By Katie Spies, June 11, 2012
- Cloud Times, By Saroj Kar, June 11, 2012
- PC Advisor, By Thor Olavsrud, June 11, 2012
- CIO, By Thor Olavsrud, June 11, 2012
- Fierce CIO, By Caron Carlson, June 13, 2012
- Executive Gov, By Katie Noland, May 23, 2012
- eWeek, By Nathan Eddy, May 10, 2012
- Federal News Radio, By Michael O’Connell, May 16, 2012
- Baseline, By Jennifer Lawinski, May 16, 2012
- Experian QAS, By Experian QAS Staff, May 10, 2012
- Read Write Hack, By Scott M. Fulton, May 10, 2012
- O’Reilly Radar, By Audrey Watters, May 10, 2012
- Government Health, By Tom Sullivan, May 10, 2012
- TechZone 360, By Peter Bernstein, May 10, 2012
- AOL Government, By Kathleen Hickey, May 10, 2012
- Techno Capital, By Techno Capital Staff, May 8, 2012
- Tech Investor News, By CIO Insight Staff, May 8, 2012
- DataVersity, By Angela Guess, May 8, 2012
- Baseline, By Baseline Staff, May 8, 2012
- eWeek, By Nathan Eddy, May 8, 2012
- CIO Insight, By CIO Insight Staff, May 8, 2012
- Silicon Angle, By Maria Deutscher, May 8, 2012
- WTN News, By Nathan Eddy, May 7, 2012
- Channel Biz, By Tamlin Magee, May 7, 2012
- Federal Computer Week, By Camille Tuutti, May 7, 2012
- Information Management, By Jim Ericson, May 7, 2012
- Potomac Tech Wire, By Potomac Tech Wire Staff, May 7, 2012
Coverage to Date – Press Release Pick Ups
The Data Center Journal Enhanced Online News Virtual Strategy Magazine IT Briefing Smart Grid, TMC Net Yahoo Finance Reuters TD Ameritrade Sympatico Finance Canada Health Benzinga Yahoo Finance Canada Morning Star Financial Content Newsblaze Infrastructure Canada iStock Analyst TMC Net Green Sun Herald TMC Net Government
The Big Data Gap: Report
Source: http://meritalk.com/big-data-report-register.php (PDF)
Welcome to the NITRD Big Data Challenge Series!
Source:
http://community.topcoder.com/coeci/nitrd/
The Big Data Challenge is an effort by the U.S. government to conceptualize new and novel approaches to extracting value from “Big Data” information sets residing in various agency silos and delivering impactful value while remaining consistent with individual agency missions. This data comes from the fields of health, energy and Earth science. Competitors will be tasked with imagining analytical techniques, and describing how they may be shared as universal, cross-agency solutions that transcend the limitations of individual agencies.
“Big Data is characterized not only by the enormous volume or the velocity of its generation but also by the heterogeneity, diversity and complexity of the data,” said Suzi Iacono, co-chair of the interagency Big Data Senior Steering Group, a part of the Networking and Information Technology Research and Development program. “There are enormous opportunities to extract knowledge from these large-scale diverse data sets, and to provide powerful new approaches to drive discovery and decision-making, and to make increasingly accurate predictions. We’re excited to see what this competition will yield and how it will guide us in funding the next round of big data science and engineering.”
In this contest series, we will ask competitors to consider big data ideas from several different perspectives. We begin with a contest that will award the best ideas for tools and techniques for homogenizing disparate data sources and topics. In later contests, we’ll ask for ideas that are more focused in the domains of Health, Energy and Earth Science. Check out the contest specifications, and good luck!
NITRD Review Board
The NITRD Big Data Contest Series will be supported by a review board assembled from experts in industry and academia. Once submissions have been screened for completeness, they will be forwarded to the board for final review. You can read about the Board by clicking here.
About the Contests
There are two ways to compete on the first contest. If you prefer to work with TopCoder Studio, you can register for the contest by clicking the Studio link below. If your preference is for the /tc Community, you can register by clicking the “Software” link below. Both choices are embodiments of the same challenge. If you are uncertain which choice is best for you, or you are new to TopCoder, we suggest the Studio link.
Click here to compete on TopCoder Studio. MY NOTE: Requires Log-In.
Click here to compete on /tc. MY NOTE: See Below.
Big Data Challenge - Conceptualization - Idea Generation
Contest Overview
Detailed Requirements
The Big Data Challenge is an effort by the U.S. government to find new and inventive ways to use the huge and diverse sets of data maintained by numerous government agencies. There is a lot of data out there, collected for many different purposes and in many different formats that make interoperation very challenging. How can we make heterogeneous (dissimilar and incompatible) data sets homogeneous (uniformly accessible, compatible, able to be grouped and/or matched) so usable information can be extracted? How can information then be converted into real knowledge that can inform critical decisions and solve societal challenges? Those are the questions we'd like you to help us answer. We're looking for your ideas about how to coordinate data sets drawn from multiple domains, and about what end uses we should be working toward.
Please note that regardless of what is displayed elsewhere on this page, we will be paying out $750 for each of the top three ideas. 1st place: $750. We will also pay a 30% bonus to prize-winning ideas that make use of streaming data, as noted on the contest wiki page.
Technologies
N/A
Final Submission Guidelines
Please see the contest wiki page for complete details.
Eligibility
You must be a TopCoder member, at least 18 years of age, meeting all of the membership requirements. In addition, you must fit into one of the following categories.
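The contest's central question, turning dissimilar and incompatible data sets into uniformly accessible ones, usually begins with schema mapping: declare one common schema and write an adapter from each source's field names, units, and formats into it. A minimal sketch in Python follows; the agencies, field names, and unit conversions are all hypothetical, chosen only to illustrate the pattern:

```python
from datetime import datetime

# Two hypothetical agency extracts describing the same real-world readings,
# with different field names, units, and date formats.
epa_rows = [{"site": "A-12", "obs_date": "2012-10-03", "temp_f": 68.0}]
noaa_rows = [{"station_id": "A-12", "ts": "10/03/2012", "temp_c": 20.0}]

def from_epa(row):
    # Map EPA-style fields to a common schema; convert Fahrenheit to Celsius.
    return {"station": row["site"],
            "date": row["obs_date"],
            "temp_c": round((row["temp_f"] - 32) * 5 / 9, 1)}

def from_noaa(row):
    # Map NOAA-style fields; normalize MM/DD/YYYY dates to ISO 8601.
    return {"station": row["station_id"],
            "date": datetime.strptime(row["ts"], "%m/%d/%Y").date().isoformat(),
            "temp_c": row["temp_c"]}

homogeneous = [from_epa(r) for r in epa_rows] + [from_noaa(r) for r in noaa_rows]
print(homogeneous)
# Both records now share one schema and can be grouped, matched, and queried together.
```

Once every source passes through such an adapter, cross-agency grouping and matching become ordinary operations on one uniform record shape, which is exactly the homogeneity the contest asks for.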
Results
The NITRD Big Data Challenge Review Board
Source: http://community.topcoder.com/coeci/nitrd/judges/
To support this contest series, NITRD has assembled a review board of industry and academic professionals. Once submissions have been screened, they will be forwarded to this board for a final review. Read about the board members below.
Robert W. Bectel serves as Chief Technology Officer and Senior Policy Advisor for EERE, where he helps accelerate the commercialization of new energy solutions and seeks to establish a uniform, efficient, agile, and user-friendly IT infrastructure which facilitates the performance of every employee’s work and the achievement of EERE’s mission. To do this, he brings a unique passion, focus and expertise on using cutting-edge technologies to enable the development of low cost and easily accessible distributable content, mobile applications, robust software and interactive solutions – all with the goal of helping transform and expedite market acceptance, transparency and efficiency. Prior to his transition to government, Rob gained more than 16 years of experience in both the non-profit and commercial sectors. He built and managed the first vertical business network for pharmacists, http://ww.pharmacistelink.com, for the National Community Pharmacists Association, and led its Committee on Innovation and Technology. Prior to NCPA, Rob directed the development of the USDA consumer website and was the marketing director for ChainDrugStore.net. As the portal manager for the TruSecure Corporation, he performed interactive outreach activities to customers and the public on matters of information and network security. Rob has co-authored several publications and textbooks on long-term care systems in the United States and Europe.
Austin L. Brown, Ph.D. is a senior analyst in the Washington, DC office of the National Renewable Energy Laboratory (NREL).
His work focuses on clean transportation, including efficient and electrified vehicles, renewable fuels, and transportation system interactions with the built environment. He also moonlights as Deputy Chief Technology Officer for the Office of Energy Efficiency and Renewable Energy in the U.S. Department of Energy, specializing in energy analysis, tools, and opening up data sets for innovation. Austin was a scientist in his previous life. He has a B.S. in Physics from Harvey Mudd College and received his Ph.D. in Biophysics from Stanford University. With this scientific training, he transitioned to Washington to connect science to policy decisions, especially federal clean energy research funding. Austin’s primary career interest is to see the United States begin on a pathway that leads towards a future where energy is clean, sustainable, affordable, and reliable domestically and worldwide. This overall goal is at least partially selfish as it will ensure he can continue doing the things he loves – appreciating the outdoors, SCUBA diving, skiing – and help save these invaluable resources for future generations.
Will Barkis is the Project Lead for the Mozilla Ignite Challenge, a partnership with the National Science Foundation to create apps to change the world on next-gen networks. He is a co-founder of Bill-Doctor.com and served as technology policy fellow at NSF. Will has a PhD in Neuroscience from the University of California, San Diego.
Ian J. Kalin is passionate about energy and empowering people through data. Ian started his professional career as a Counter-Terrorism Officer for the US Navy, later serving as a Nuclear Engineer onboard the USS Ronald Reagan. After leaving the Navy, Ian joined a rising company called PowerAdvocate, which delivers market intelligence solutions to the electric and gas sectors. His entrepreneurial work led to significant cost savings for utility companies and their customers.
Ian has a BS in International Politics from Georgetown and an MA in Engineering Management from Old Dominion. He lives in San Francisco with his wife, Amanda, and is a musician in his spare time.
Ryan McKeel is a technologist with a passion for web-based entrepreneurship and data visualization. He started his first business at the age of 14 creating database-driven websites, and has since worked with the Air Force Research Laboratory, the DARPA COORDINATORs program and the National Renewable Energy Laboratory on advanced data sharing and visualization tools. Ryan is one of the lead developers on the Open Energy Information platform (OpenEI.org), a collaborative website using linked open data to provide simple access to international energy information and data. He holds a Bachelor of Science degree in IT and Entrepreneurship from Rensselaer Polytechnic Institute. Ryan lives in Denver, Colorado with his wife and three young children; he is also a classical and jazz pianist who has had the honor of performing with Colorado symphony orchestras.
Big Data Buzzwords From A to Z
Introduction
Data Warehousing
ETL
Flume
Geospatial Analysis
Hadoop
In-Memory Database
Java
Kafka
Latency
Map/reduce
NoSQL Databases
Oozie
Pig
Quantitative Data Analysis
Relational Database
Sharding
Text Analytics
Unstructured Data
Visualization
Whirr
XML
Yottabyte
ZooKeeper
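Of the terms in this A-to-Z list, Map/reduce is the one most often treated as magic, but the pattern is simply a map step that emits key/value pairs, a shuffle that groups them by key, and a reduce step that folds each group to a result. A toy word count in plain Python (no Hadoop involved; the documents are made up) shows the shape:

```python
from collections import defaultdict

documents = ["big data big deal", "big data gap"]

# Map: emit a (word, 1) pair for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group pairs by key. Hadoop does this across machines; here, a dict.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: fold each group to a single value.
counts = {word: sum(values) for word, values in groups.items()}
print(counts)  # {'big': 3, 'data': 2, 'deal': 1, 'gap': 1}
```

What Hadoop adds is distribution and fault tolerance: the map and reduce steps run in parallel across many machines, and the shuffle moves pairs over the network, but the programming model is exactly this small.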
Chronology For Federal Government
Obama Administration Unveils “Big Data” Initiative: Announces $200 Million In New R&D Investments
Source: http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release.pdf (PDF)
Office of Science and Technology Policy
- Advance state-of-the-art core technologies needed to collect, store, preserve, manage, analyze, and share huge quantities of data.
- Harness these technologies to accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning; and
- Expand the workforce needed to develop and use Big Data technologies
- Encouraging research universities to develop interdisciplinary graduate programs to prepare the next generation of data scientists and engineers;
- Funding a $10 million project based at the University of California, Berkeley, that will integrate three powerful approaches for turning data into information - machine learning, cloud computing, and crowd sourcing;
- Providing the first round of grants to support “EarthCube” – a system that will allow geoscientists to access, analyze and share information about our planet;
- Issuing a $2 million award for a research training group to support training for undergraduates to use graphical and visualization techniques for complex data.
- Providing $1.4 million in support for a focused research group of statisticians and biologists to tell us about protein structures and biological pathways.
- Convening researchers across disciplines to determine how Big Data can transform teaching and learning.
- Harness and utilize massive data in new ways and bring together sensing, perception and decision support to make truly autonomous systems that can maneuver and make decisions on their own.
- Improve situational awareness to help warfighters and analysts and provide increased support to operations. The Department is seeking a 100-fold increase in the ability of analysts to extract information from texts in any language, and a similar increase in the number of objects, activities, and events that an analyst can observe.
- Developing scalable algorithms for processing imperfect data in distributed data stores; and
- Creating effective human-computer interaction tools for facilitating rapidly customizable visual reasoning for diverse missions.
For more information about OSTP, visit http://WhiteHouse.gov/OSTP
Big Data is a Big Deal
Source: http://www.whitehouse.gov/blog/2012/...-data-big-deal
Posted by Tom Kalil on March 29, 2012 at 09:23 AM EST
Today, the Obama Administration is announcing the “Big Data Research and Development Initiative.” By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning.
To launch the initiative, six Federal departments and agencies will announce more than $200 million in new commitments that, together, promise to greatly improve the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data. Learn more about ongoing Federal government programs that address the challenges of, and tap the opportunities afforded by, the big data revolution in our Big Data Fact Sheet. MY NOTE: See below
We also want to challenge industry, research universities, and non-profits to join with the Administration to make the most of the opportunities created by Big Data. Clearly, the government can’t do this on its own. We need what the President calls an “all hands on deck” effort.
Some companies are already sponsoring Big Data-related competitions, and providing funding for university research. Universities are beginning to create new courses—and entire courses of study—to prepare the next generation of “data scientists.” Organizations like Data Without Borders are helping non-profits by providing pro bono data collection, analysis, and visualization. OSTP would be very interested in supporting the creation of a forum to highlight new public-private partnerships related to Big Data.
Tom Kalil is Deputy Director for Policy at OSTP
Big Data Across the Federal Government Fact Sheet
Source: http://www.whitehouse.gov/sites/defa...et_final_1.pdf (PDF)
DEPARTMENT OF DEFENSE (DOD)
Defense Advanced Research Projects Agency (DARPA)
DEPARTMENT OF HOMELAND SECURITY (DHS)
DEPARTMENT OF ENERGY (DOE)
The Office of Science
The Office of Basic Energy Sciences (BES)
The Office of Fusion Energy Sciences (FES)
The Office of High Energy Physics (HEP)
The Office of Nuclear Physics (NP)
The Office of Scientific and Technical Information (OSTI)
DEPARTMENT OF VETERANS AFFAIRS (VA)
DEPARTMENT OF HEALTH AND HUMAN SERVICES (HHS)
Centers for Disease Control & Prevention (CDC)
Centers for Medicare & Medicaid Services (CMS)
Food & Drug Administration (FDA)
NATIONAL ARCHIVES & RECORDS ADMINISTRATION (NARA)
NATIONAL AERONAUTICS & SPACE ADMINISTRATION (NASA)
NATIONAL ENDOWMENT FOR THE HUMANITIES (NEH)
NATIONAL INSTITUTES OF HEALTH (NIH)
National Cancer Institute (NCI)
National Heart Lung and Blood Institute (NHLBI)
National Institute of Biomedical Imaging and Bioengineering (NIBIB)
NIH Blueprint
NIH Common Fund
National Institute of General Medical Sciences:
National Library of Medicine
Office of Behavioral and Social Sciences (OBSSR)
Joint NIH - NSF Programs
NATIONAL SCIENCE FOUNDATION (NSF)
NATIONAL SECURITY AGENCY (NSA)
UNITED STATES GEOLOGICAL SURVEY (USGS)
NSF Leads Federal Efforts In Big Data
Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA)
Source: http://www.nsf.gov/funding/pgm_summ....pims_id=504767
CONTACTS
| Name | Email | Phone | Room |
| Vasant G. Honavar | vhonavar@nsf.gov | (703) 292-7129 | |
| Jia Li | jli@nsf.gov | (703) 292-4870 | |
| Dane Skow | dskow@nsf.gov | (703) 292-4551 | |
| Peter H. McCartney | pmccartn@nsf.gov | (703) 292-8470 | |
| Doris L. Carver | dcarver@nsf.gov | (703) 292-5038 | |
| Eduardo A. Misawa | emisawa@nsf.gov | (703) 292-5353 | |
| Eva Zanzerkia | ezanzerk@nsf.gov | (703) 292-8556 | |
| Peter Muhlberger | pmuhlber@nsf.gov | (703) 292-7848 | |
| Vladimir Papitashvili | vpapita@nsf.gov | (703) 292-7425 | |
| Tandy Warnow | twarnow@nsf.gov | (703) 292-8491 | |
| General Correspondence | bigdata@nsf.gov | | |
PROGRAM GUIDELINES
Solicitation 12-499
Important Notice to Proposers
A revised version of the NSF Proposal & Award Policies & Procedures Guide (PAPPG), NSF 13-1, was issued on October 4, 2012 and is effective for proposals submitted, or due, on or after January 14, 2013. Please be advised that, depending on the specified due date, the guidelines contained in NSF 13-1 may apply to proposals submitted in response to this funding opportunity.
Please be aware that significant changes have been made to the PAPPG to implement revised merit review criteria based on the National Science Board (NSB) report, National Science Foundation's Merit Review Criteria: Review and Revisions. While the two merit review criteria remain unchanged (Intellectual Merit and Broader Impacts), guidance has been provided to clarify and improve the function of the criteria. Changes will affect the project summary and project description sections of proposals. Annual and final reports also will be affected.
A by-chapter summary of this and other significant changes is provided at the beginning of both the Grant Proposal Guide and the Award & Administration Guide.
SYNOPSIS
The Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) solicitation aims to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large, diverse, distributed and heterogeneous data sets so as to: accelerate the progress of scientific discovery and innovation; lead to new fields of inquiry that would not otherwise be possible; encourage the development of new data analytic tools and algorithms; facilitate scalable, accessible, and sustainable data infrastructure; increase understanding of human and social processes and interactions; and promote economic growth and improved health and quality of life. The new knowledge, tools, practices, and infrastructures produced will enable breakthrough discoveries and innovation in science, engineering, medicine, commerce, education, and national security -- laying the foundations for US competitiveness for many decades to come.
The phrase "big data" in this solicitation refers to large, diverse, complex, longitudinal, and/or distributed data sets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources available today and in the future.
This solicitation is one component in a long-term strategy to address national big data challenges, which include advances in core techniques and technologies; big data infrastructure projects in various science, biomedical research, health and engineering communities; education and workforce development; and a comprehensive integrative program to support collaborations of multi-disciplinary teams and communities to make advances in the complex grand challenge science, biomedical research, and engineering problems of a computational- and data-intensive world.
Today, US government agencies recognize that the scientific, biomedical and engineering research communities are undergoing a profound transformation with the use of large-scale, diverse, and high-resolution data sets that allow for data-intensive decision-making, including clinical decision making, at a level never before imagined. New statistical and mathematical algorithms, prediction techniques, and modeling methods, as well as multidisciplinary approaches to data collection, data analysis and new technologies for sharing data and information are enabling a paradigm shift in scientific and biomedical investigation. Advances in machine learning, data mining, and visualization are enabling new ways of extracting useful information in a timely fashion from massive data sets, which complement and extend existing methods of hypothesis testing and statistical inference. As a result, a number of agencies are developing big data strategies to align with their missions. This solicitation focuses on common interests in big data research across the National Institutes of Health (NIH) and the National Science Foundation (NSF).
This initiative will build new capabilities to create actionable information that leads to timely and more informed decisions. It will both help accelerate discovery and innovation and support their transition into practice to benefit society. As the President's Council of Advisors on Science and Technology (PCAST) 2010 review of the Networking and Information Technology Research and Development (NITRD) [http://www.nitrd.gov/pcast-2010/report/nitrd-program/pcast-nitrd-report-2010.pdf] program notes, the pipeline of data to knowledge to action has tremendous potential for transforming all areas of national priority. This initiative will also lay the foundations for complementary big data activities -- big data infrastructure projects, workforce development, and progress in addressing complex, multi-disciplinary grand challenge problems in science and engineering.
RELATED URLS
Frequently Asked Questions (FAQs)
1st BIGDATA Webinar (Presentation, Audio File and Transcript)
2nd BIGDATA Webinar - May 21, 2012 (Presentation, Audio File and Transcript)
THIS PROGRAM IS PART OF
Additional Funding Opportunities for the CCF Community
Additional Funding Opportunities for the CNS Community
Additional Funding Opportunities for the IIS Community
Special Research Programs
What Has Been Funded (Recent Awards Made Through This Program, with Abstracts)
Map of Recent Awards Made Through This Program
Event 2nd BIGDATA Webinar
Source: http://www.nsf.gov/events/event_summ...24212&org=CISE
May 21, 2012 NSF
The National Science Foundation and the National Institutes of Health invite you to attend a webinar to learn more about their joint Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) solicitation, NSF 12-499: http://www.nsf.gov/pubs/2012/nsf12499/nsf12499.htm.
The BIGDATA solicitation aims to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large, diverse, distributed and heterogeneous data sets so as to: accelerate the progress of scientific discovery and innovation; lead to new fields of inquiry that would not otherwise be possible; encourage the development of new data analytic tools and algorithms; facilitate scalable, accessible, and sustainable data infrastructure; increase understanding of human and social processes and interactions; and promote economic growth and improved health and quality of life.
The phrase "big data" in this solicitation does not refer just to the volume of data, but also to its variety and velocity. Big data includes large, diverse, complex, longitudinal, and/or distributed data sets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources.
The focus is on core scientific and technological advances (e.g., in computer science, mathematics, computational science and statistics). Proposals that focus primarily on the application of existing methods (e.g., machine learning algorithms, statistical analysis) to data sets in a specific science domain or on implementation of software tools or databases based on existing techniques are not appropriate for this solicitation.
NIH-specific information can be found at: http://grants.nih.gov/grants/guide/n...GM-12-109.html.
An FAQ about the solicitation is available at: http://www.nsf.gov/pubs/2012/nsf12070/nsf12070.jsp
This webinar is designed to describe the goals and focus of the BIGDATA solicitation, help investigators understand its scope, and answer any questions potential Principal Investigators (PIs) may have.
MY NOTE: See Presentation Below
What Has Been Funded (Recent Awards Made Through This Program, with Abstracts)
Source: http://www.nsf.gov/awardsearch/progS...Search#results
Downloaded Spreadsheet (XLS)
Requested Export All Results in XML Format
Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) NSF 12-499
Source: http://www.nsf.gov/attachments/12405...ay8with508.pdf (PDF)