Table of contents
  1. A Semantic Web Strategy for Big Data
  2. Turning Big Data into Big Benefits
    1. LOOKING PAST THE HYPE
    2. MAPREDUCE IS NOT OUR SILVER BULLET
    3. REPLACING FEAR WITH DISCIPLINE
    4. IN THIS ISSUE
    5. ENDNOTES
      1. 1
      2. 2
      3. 3
      4. 4
      5. 5
      6. 6
      7. 7
      8. 8
  3. The Semantic Web
    1. A Learner’s Guide to the Semantic Web
    2. Semantic MEDLINE: A Proof of Concept
  4. Big Data is a Big Deal
    1. Slide 1 Title
    2. Slide 2 Member Agencies
    3. Slide 3 Quotes
    4. Slide 4 Definition
    5. Slide 5 Big Data Senior Steering Group
    6. Slide 6 Vision
    7. Slide 7 Goals
    8. Slide 8 Core Technologies
    9. Slide 9 March 29 White House Event 
    10. Slide 10 Big Data Fact Sheet
    11. Slide 11 Challenges and Competitions
    12. Slide 12 Recap of Challenge
    13. Slide 13 Domain Research Projects
    14. Slide 14 Workforce Development
    15. Slide 15 Contacts
  5. Action Items
    1. Compile initial list of Big Data initiatives and disseminate via site
    2. Highlight active and planned initiatives/issues raised from ELC Big Data Track
    3. Develop BDC committee communication and engagement process for January 2013 Hill briefing and outcome of the Big Data Advisory Committee
  6. Big Data (BD SSG)
    1. Overview
    2. Scope
    3. Functions
  7. January 8, 2013 Meeting Notes
  8. December 5, 2012 Meeting Notes
    1. Introduction
    2. Meeting Notes
  9. November 7, 2012 Meeting Notes
    1. Introduction
    2. Meeting Notes
    3. Agenda Item Discussions
  10. My Meeting Notes
  11. October 18, 2012 Meeting Notes
    1. Introduction
    2. Meeting Agenda
    3. Meeting Notes
  12. October 18, 2012 Meeting Invitation
    1. Introduction
    2. Big Data Committee Objectives
    3. Agenda
  13. Big Data Committee Charter
    1. Authority
    2. Mission
    3. Objectives
    4. Membership
    5. Areas within Scope
    6. Expected Outcomes
    7. Roles & Responsibilities
    8. Meetings and Communications
    9. Key Relationships
    10. Timeframe and Schedule
    11. Operating Principles
    12. Governance Structure
    13. Leadership
    14. Timeline for Review and Approval
  14. EMERGING TECHNOLOGY SIG NEWSLETTER
    1. ET SIG Mission
    2. News
    3. Announcements

Emerging Technology SIG Big Data Committee


A Semantic Web Strategy for Big Data

   

The ACT-IAC ET SIG Presentations/Discussion, November 29, 2012, GSA Willow Wood Office, 10304 Eaton Place, Fairfax, VA 22030. Attendance is free.

Agenda
Welcome:
–John Geraghty: MITRE and ET SIG Chair
–Scott Larkin: Advanced Systems COO and Host
–Johan Bos-Beijer: GSA, Government Chair of the Big Data Committee, and SIG GAP
 
Agenda:
–8:30 – Networking
–9:00 - ET SIG Business Meeting – John Geraghty
–9:30 – Semantic Technology Presentations / Discussion - Brand Niemann (Semantic Community) (Slides),  Eric Little (Orbis Technologies) (Slides) and Victor Pollara (Noblis) (Slides)
Remarks by Dr. Tom Rindflesch (NLM) and Dr. George Strawn (OSTP/NITRD)
–10:40 - Close
 
For more technical details see the previous discussion and the ET Big Data Committee Knowledge Base

Purpose
  • Current projects that are implementing Semantic Web Standards and Technologies. 
  • Reusable solutions for datasets communicated between machines (Big Data).
  • Coordination with the Big Data Committee, Collaboration & Transformation SIG and the Advanced Mobility Working Group.
  • Thoughts on next steps and resources for interested organizations.
 
The Semantic Web
 
The Semantic Web, a collection of technologies designed to add meaning and enable intelligent search across the Web, is now taking shape as a strategy to help navigate the challenges of Big Data, including how to find insights and leverage the ever-increasing volume of data available to us from sensors, social media posts, videos, pictures, and purchase transactions, just to name a few.
Source: Turning Big Data Into Big Benefits, MAKE IT A TRIPLE -A Semantic Web Strategy for Big Data by Frank P. Coyle
Cutter IT Journal Vol. 25, No. 10 October 2012.

Turning Big Data into Big Benefits

Source: http://www.cutter.com/content-and-an...roduction.html

 
In This Issue

by Ralph Hughes

Interest in Big Data analytics (BDA) has certainly skyrocketed in the past few years to reach a fever pitch, with the market for this technology projected to reach a 58% compounded annual growth rate over the next five years.1 Indeed, when I walked the vendor exhibit halls at several TDWI World Conferences during the past year, it seemed that nearly all the application vendors had introduced a new package offering a "Big Data" solution. At every booth, plenty of curious attendees lined up to hear about these new features. The vendors were certainly happy for the attention, but they also confided to me that they had grown tired of answering the same question day after day, namely "What is Big Data?"

I believe this lament is actually more emblematic of the state of BDA today than any particular solution being offered. When vendors rush to cater to needs that many customers do not yet understand, are we at risk of solving the wrong problem or cementing in place a basic strategy we will later regret? Perhaps at this early juncture we should carefully dodge the hype about Big Data and offer a sober appraisal of this new technology before acting.

LOOKING PAST THE HYPE

Industry pundits, in the area of data warehousing at least, take a jaundiced view of the buzz surrounding Big Data. "When haven't business intelligence applications had to deal with 'Big Data'?" they ask. Any type of data requires deliberate engineering to acquire, store, summarize, and present it in a way that generates business insights. The cynics among us discount the fever over Big Data as a vendor-stoked overreaction to a few white papers by computer science wonks at Google and Yahoo! who found a couple of processing shortcuts while taming their own flood of Web stream data. These cynics see Big Data as a craze that will quickly fade.

Such skepticism might be too extreme, however. New technologies do frequently follow quick lifecycles, but several considerations suggest that Big Data represents a sea change for enterprise information. With the cost of processing and data storage falling so rapidly each year, our society no longer seems constrained as to the amount of information it can create and retain. Today's burgeoning numbers of online users now leave a trail of "digital exhaust" as they cruise social networking sites; e-commerce continues to grow at 35% per year; and RFID tags are steadily appearing on wholesalers' pallets and manufacturers' products. We are entering the "Internet of Things," in which phones, cars, trains, and planes -- plus process controllers, appliances, and medical devices -- all transmit a steady stream of data for interested parties to mine. Even dairy cows now sport portable monitors announcing when they come into heat.2 The data our society generates in a single year recently surpassed a zettabyte (a trillion gigabytes), which is a hundred million times more information than is contained in the print collection of the US Library of Congress -- and this onslaught is doubling every two years.3

Naturally, people worry about how much of this data they should capture, manage, and analyze. We frequently read about creative entrepreneurs discovering riches hidden in this information. For example, companies can now measure customer sentiment toward their products by mining the comments, ratings, and even images shared on the Web. They can correlate these sentiment statistics with purchase records provided by loyalty programs at grocers and retail stores, empowering marketers to customize advertising campaigns for individual consumers. As we move between websites today, we encounter a sequence of offers that are so subtle they go unnoticed but are so aligned with our individual preferences and behavioral triggers that we are almost certain to buy. With world Internet usage quintupling per decade,4 there is no upper limit on the number and value of new business opportunities for those who can bend the swelling flood of data to their purposes. In this context, the frenzied interest in Big Data makes sense because the power of such analytics has been proven, and rational companies should be actively seeking to profit from it.

MAPREDUCE IS NOT OUR SILVER BULLET

Unfortunately, the best method of channeling this informational deluge is far from clear, because the term "Big Data" has not yet been well defined. Big Data analytics is frequently described as the management of information volumes much larger than our ordinary data management tools can handle. Pundits usually refer to Doug Laney's "3Vs" 5 -- volume, velocity, and variety -- which will be explored in the articles in this issue. Yet the 3Vs are only a description of the problem, one that leaves most of us searching for an industry-standard approach proven to overcome the challenge. Such a search does not uncover a single direction, however, but instead a myriad of competing strategies. Despite the fact that experts have been discussing Big Data for over 10 years now, the field is still very new, and for all the urgency we feel, no silver bullet yet exists.

The most commonly cited solution for BDA involves a technology pioneered by the large Internet search engines, called "MapReduce" (MR). So frequently do Big Data conversations gravitate to MR that Hadoop, the open source implementation of MapReduce, is now a standard component of most mainstream databases.6, 7

Yet MapReduce is not a universal solution to all Big Data problems, for several reasons. First, it solves only problems that can be formulated in terms of key-value pairs. This approach is capable of some powerful insights, but it has a distinct sweet spot that generally requires the input data to be already assembled into a flat file. Second, as anyone who has tried to join multiple tables using MR (or even wrestle it into printing "Hello, world!") can tell you, MR is not a general solution to many common data management challenges. Third, the interface to MR data stores is fairly primitive in comparison to the standard DBMSs -- a team must know Java well. Fourth, attempts to provide an SQL-like querying tool for MR still lack many ANSI SQL-92 commands and other common SQL extensions. Fifth, solid MR programmers are difficult to find, so the added cost and risk of building MR applications can far exceed the investment required by the many alternatives.
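To make the first point concrete, the following is a minimal single-process sketch of the key-value formulation that MR requires, using the customary word-count example. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are assumptions introduced here purely for illustration, and a real framework would distribute each phase across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit one (word, 1) key-value pair per word in a line."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, standing in for the framework's shuffle."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all values for one key into a single total."""
    return (key, sum(values))


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over flat lines of text."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical flat-file input: one record per line.
    print(word_count(["big data is big", "data is data"]))
```

The input here is already a flat sequence of lines, which matches the sweet spot described above. A join across several tables has no equally direct expression as a single map and reduce pass, which is the difficulty the second point refers to.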

Because MapReduce is not the only solution available for high data volume, velocity, and variety, a solid Big Data strategy should look at the other technologies. There are many more columnar databases available today than there are MR implementations. Many of these columnar DBMSs are embedded in data warehouse appliances that allow our existing business intelligence (BI) applications to handle very large volumes of data using a standard SQL interface. Furthermore, many columnar databases are more mature than MR, allowing Big Data applications to be designed and developed by developers with more typical skills. For organizations willing to consider newer offerings by smaller vendors, there are also the numerous types of Big Data solutions found in the NoSQL ("Not Only SQL") universe, such as key-value pair databases that do not require MR programming; graph databases that use "triples" rather than key-value pairs; and in-memory relational databases that settle for "eventual consistency" in the interest of very fast read-write operations. These products, too, often look more like our traditional tools, making them easier to work with, and several of them can tackle analytical questions that MR cannot begin to address.

REPLACING FEAR WITH DISCIPLINE

Given the limits of MapReduce and the presence of many alternative solutions, it is odd that so many conversations about Big Data turn instantly to Hadoop. This knee-jerk reaction is driven mostly by fear. Both business and IT executives feel threatened by the accelerating flood of data coming from a proliferating number of sources. They worry that they should be doing something creative and profitable with it today, before competitors blindside them with new capabilities. They naturally want to start storing everything now, even if they cannot articulate the value for this information, and they hope against all odds that grabbing hold of this information is going to be quick and easy. Indeed, Forbes 8 notes that Big Data today is ill-defined, intimidating, and immediate (i.e., demanding action now) -- all of which adds up to a set of "3 Is" that may be more important to consider than the 3 Vs.

A more sober view of the situation might suggest that data streams in the exabytes are only another chapter in data management, just as terabytes and petabytes challenged us in previous decades. We must remind ourselves that new technologies frequently get overhyped by the media and vendors, and that our search for a silver bullet often leads to profound disappointment. We will need time and discipline to see what Big Data can realistically offer. A disciplined approach should begin with compelling use cases that express clearly attainable business impacts. Only by articulating realistic objectives can we rationally choose a technical solution from the several competing Big Data technologies. Moreover, any Big Data solutions must integrate into our existing strategies for "not-so-Big Data," so that the information flood from the coming "Internet of everything" calmly fills our carefully architected BI ecosystems with usable data rather than washing them away.

IN THIS ISSUE

The articles selected for this issue of Cutter IT Journal provide a handy opportunity to conduct that sober evaluation of Big Data technology. The discussion first provides a solid introduction to the world of BDA and then explores a set of important extensions of the technology. Richard Walsh, Richard O'Callaghan, and Sabine Yoffou start off our collection by systematically defining Big Data so that we can begin successfully planning a serious implementation effort. Next, IBM's Matthew Ganis and Avinash Kohirkar examine one of the most common uses of BDA, namely mining social media discussions. Rich Johnson and Ron Zahavi of Microsoft then address the essential topic of incorporating this new style of analytics into our traditional data warehousing programs, so that we end up with well-integrated BI platforms.

The theme of extending Big Data technology begins with Frank Coyle, who discusses one of the primary competitors to MapReduce -- the RDF triple, which will someday soon enable the Semantic Web. Holly Korda, Ann Magee, and Lori Damiano then explore Big Data's potential in a specific industry, showing how it can be leveraged to bring transparency and accountability to the world of healthcare. Finally, Saeed Lajami, Anson Mok, Mario Wahyu Prabowo, and Cutter Senior Consultant Sara Cullen provide an interesting alternative for our solutions toolkit by advocating the use of crowdsourcing to solve Big Data challenges.

Together these articles introduce insights of breadth and depth into the new and quickly evolving world of BDA. We hope they will help you begin to explore and understand how this technology can solve what will be some of IT's most pressing challenges for the foreseeable future.

ENDNOTES

2

Tagliabue, John. "Swiss Cows Send Texts to Announce They're in Heat." The New York Times, 1 October 2012.

4

Internet World Stats (www.internetworldstats.com/stats.htm).

5

Laney, Doug. "3D Data Management: Controlling Data Volume, Velocity, and Variety." Meta Group, 2001.

6

Groenfeldt, Tom. "Microsoft Does Big Data -- Hadoop on Windows." Forbes, 5 June 2012.

8

Feinleib, Dave. "The 3 I's of Big Data." Forbes, 9 July 2012.

The Semantic Web

Source: http://www.cendi.gov/minutes/pa_1111.html

A Learner’s Guide to the Semantic Web

Dr. George Strawn, NITRD (PDF)

The web comprises web pages linked together. Links are crucial to what the web is. The pages have information for humans to read. While HTML has hidden metadata, it is basically designed for people to read. By contrast, the semantic web is data for computers to read with semantic searches yielding answers, not just pages that may have answers. In this sense, it is more like a relational database system.

 
Semantics refers to the meaning as opposed to syntax which refers to the form. A key element of the semantic web is the ability to use inference across the meaning. Semantic web might have been appropriately called the inferred web, the computed web or the atomic web. It stores all your data in atomic format and is a remarkable new way to federate data (combine datasets, merge data, do mash-ups, etc.).
 
Dr. Strawn went on to explain not only what the semantic web can do but how it does it. He said that the semantic web is an attempt to graft traditional studies of knowledge systems and language understanding onto the web platform to enable “meaningful data”. Traditional computer science built the intelligence into the programming code rather than into the data, resulting in the need to constantly develop and make changes. The semantic web is a step toward putting the intelligence into the data, resulting in simpler code and increased data re-usability.
 
While the traditional web links pages to pages, the semantic web links data elements to data elements (nouns linked to nouns by links labeled by verbs).  Currently, there are unnamed links on the web that require human interpretation. The named links allow the computer to make some decisions.
 
The semantic web uses the URL (or URI) naming system to create globally unique names for identified nouns and verbs in a text or table. This subject-predicate-object structure is referred to as an RDF triple. A semantic web database is a set of RDF triples in a triple store. Converters and scrapers can be used to create them.
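As a rough illustration of the subject-predicate-object structure, the sketch below holds a tiny triple store in memory and queries it by pattern. The `http://example.org/` namespace, the predicate names, and the `match` helper are hypothetical, chosen only for this example; the two statements themselves restate relationships described elsewhere on this page. A production system would use an actual triple store and a query language such as SPARQL.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: a noun, a labeled link (verb), and another noun."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace used to build globally unique names (URIs).
EX = "http://example.org/"

# A toy triple store: simply a set of triples.
triple_store = {
    Triple(EX + "SemanticMEDLINE", EX + "sitsOnTopOf", EX + "PubMed"),
    Triple(EX + "PubMed", EX + "retrievesFrom", EX + "MEDLINE"),
}


def match(store: set[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> list[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    )


if __name__ == "__main__":
    # "What does Semantic MEDLINE sit on top of?"
    for t in match(triple_store, subject=EX + "SemanticMEDLINE"):
        print(t.subject, t.predicate, t.obj)
```

Because every link carries a name, a program can ask a question about the link itself, which is what an unnamed hyperlink between two pages does not allow.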
 
RDF triple stores have asserted triples, instruction triples and inferred triples. There is an inference and query engine over the triple store which is accessed by an application. Inference engines are then used to create more triples that can be inferred from the explicitly stated relationships. In text, the RDF triples are extracted from key sentences by natural language processing. With tables, the key name and key value together become the subject, the column names become the predicates, and the table value for each column becomes the object. It should be noted that the RDF for a text is smaller than the text and the RDF for a table is larger than the table. Because data from both text and tables are transformed into triples, it is a way to bring structured and unstructured data together.
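The table case can be sketched in a few lines. The example below assumes that each row is a dictionary, that the subject name is built by joining the key column name and key value with an underscore, and that a hypothetical `http://example.org/` base supplies the URIs; none of these details come from the presentation. The sample rows and column names are likewise only illustrative.

```python
def row_to_triples(row: dict[str, str], key_column: str,
                   base: str = "http://example.org/") -> list[tuple[str, str, str]]:
    """Turn one table row into triples.

    Subject:   key column name plus key value (assumed joined with "_").
    Predicate: each non-key column name.
    Object:    the cell value in that column.
    """
    subject = f"{base}{key_column}_{row[key_column]}"
    return [
        (subject, base + column, value)
        for column, value in row.items()
        if column != key_column
    ]


def table_to_triples(rows: list[dict[str, str]], key_column: str) -> list[tuple[str, str, str]]:
    """Convert every row of a table and concatenate the results."""
    triples: list[tuple[str, str, str]] = []
    for row in rows:
        triples.extend(row_to_triples(row, key_column))
    return triples


if __name__ == "__main__":
    # Illustrative two-row table with hypothetical column names.
    table = [
        {"agency": "EPA", "contact": "Malcolm Jackson", "topic": "Environmental Data"},
        {"agency": "NASA", "contact": "Deborah Diaz", "topic": "Many to select from"},
    ]
    for triple in table_to_triples(table, "agency"):
        print(triple)
```

Each non-key cell becomes a full triple that repeats the subject and predicate names, which is one way to see why the RDF for a table comes out larger than the table itself.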
 
Storage is no longer the issue it has been. However, the question is whether you can get the information back out of the system. Part of the process of developing triples is to discard triples that don’t have much meaning. It is also possible to graphically represent triples.
 
Inferencing is as much about classes and properties as it is about RDF triples. Classes are sets of RDF subjects/objects, and properties are sets of RDF predicates. An ontology is made up of classes and predicates.  It can be thought of as a graph where the nodes are classes and the arcs are labeled by properties. This ontology is referred to as the semantics of the domain that is being described.
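A minimal sketch of how an inference engine might derive new triples from asserted ones is shown below. It applies two RDFS-style rules (class membership propagates up `rdfs:subClassOf`, and `rdfs:subClassOf` is transitive) until nothing new appears. The `ex:` names and the tiny class hierarchy are hypothetical, and real engines support many more rules than these two.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

Triple = tuple[str, str, str]


def infer(asserted: set[Triple]) -> set[Triple]:
    """Return the triples that follow from `asserted` but are not in it.

    Rule 1: (x rdf:type A) and (A rdfs:subClassOf B) => (x rdf:type B)
    Rule 2: (A rdfs:subClassOf B) and (B rdfs:subClassOf C)
            => (A rdfs:subClassOf C)
    """
    known = set(asserted)
    while True:
        new: set[Triple] = set()
        for s, p, o in known:
            if p not in (RDF_TYPE, SUBCLASS_OF):
                continue
            for s2, p2, o2 in known:
                # Follow one subclass link upward from the current object.
                if p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, p, o2))
        added = new - known
        if not added:
            break  # fixed point reached: no further triples can be derived
        known |= added
    return known - asserted


if __name__ == "__main__":
    # Hypothetical asserted triples.
    asserted = {
        ("ex:Aspirin", RDF_TYPE, "ex:Drug"),
        ("ex:Drug", SUBCLASS_OF, "ex:Chemical"),
        ("ex:Chemical", SUBCLASS_OF, "ex:Substance"),
    }
    for triple in sorted(infer(asserted)):
        print(triple)
```

With the three asserted triples above, the loop derives three more: Aspirin is a Chemical, Aspirin is a Substance, and Drug is a subclass of Substance. None of these was stated explicitly; they follow only from the class structure.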
 
Linked data is an approach that is based on linked URIs rather than full-blown ontologies. In Germany, professors and students are doing DBpedia, a semantic version of the tables that are already in Wikipedia. The whole field seems to be tending back from the probabilistic aspects of text mining to more traditional linguistic methods or a hybrid approach. The semantic web may be ready to move from the experimental phase to the early adopter phase, but the question is “what is the motivator”?
 
The vocabulary for describing classes as well as the relationships, or properties appropriate for the domain is always a bottleneck in the ontology development. Dr. Strawn would like to see UMLS-like work done in other disciplines. The CENDI work in the terminology area could serve as a starting point.
 
The business world is already doing some of the vocabulary work needed to make linked data and the semantic web a reality. Dr. Strawn would like to see STI catch up in order to make the scientific record more useful and adequately include both the data and the document. 

Semantic MEDLINE: A Proof of Concept

Dr. Thomas Rindflesch, National Library of Medicine (See Semantic Medline)
 
Semantic MEDLINE is a proof of concept to improve access to the wealth of textual resources available through PubMed by adding semantic technologies. The current document retrieval systems, such as Google and PubMed, manipulate textual tokens, including frequency of occurrence or distribution patterns, but the system doesn’t actually “know the meaning” of what it accesses. Queries, as well as the text they operate against, are seen as strings of numbers.
 
There are emerging applications in the academic world where text mining is being done to extract facts and observe trends, connect text and structured data, perform question answering, and assist in literature-based discovery. These applications require more effective language processing and automatic semantic interpretation.
 
Automatic semantic interpretation requires mapping of something that is expressed in some kind of representation to something more abstract such as an ontology. Automatic semantic interpretation can augment but not supplant traditional document retrieval systems, manipulating information and not just documents. The goal is to bridge the gap between language in the text and meaning. This is like having a research assistant working for you. In the final analysis, you still have to look at the text. These same principles can be applied to other domains if you have the terminology.
 
Semantic MEDLINE sits on top of PubMed which is the retrieval engine for MEDLINE. Traditional PubMed searching is used to retrieve citations, including abstracts, which are sent to a natural language processing system. Abstracts are being used now. Without additional knowledge about the information expressed in full text, you would get much more of an information tsunami. Processing full text would require a different level of processing and an understanding of the discourse structure of full text.
 
This system creates semantic relationships in RDF triples. Automatic summarization techniques are also used to eliminate useless statements. A graphical summary is created which presents a lot of information in a more accessible human format.
 
The Semantic MEDLINE process is based on the NLM’s Unified Medical Language System (UMLS). It has three key components: a purely linguistic lexicon of more than 430,000 medical and general English terms; a Metathesaurus of more than 2 million biomedical concepts and synonyms (nouns put into semantic types or sets); and a Semantic Network of approximately 135 semantic types and 50 verb relationships, which provide classes of relationships between concepts. The Semantic MEDLINE process uses these UMLS components. Natural language processing using linguistic techniques based on language structure is performed sentence by sentence through the abstracts retrieved by the traditional PubMed searching. The nouns are created using terms in the UMLS Metathesaurus Concepts. The resulting nouns are controlled based on the way they are stated in the UMLS. The UMLS Semantic Network controls the predicates based on the relationships between the concepts in the Semantic Network. 
 
Ontological relationships are the core aspects of a domain ontology. How do we conceptually cut up this area of the world, and what can we say about it? Humans working in a particular domain know them, but we must express them explicitly in order for the computer to use them. Finding new ontologies and terminologies is a new role for libraries and librarians.
 
Semantic MEDLINE was initially developed for clinical medicine. It has been extended to pharmacogenomics, influenza epidemic preparedness, and the genetic etiology of disease. Dr. Rindflesch showed an example using a clock gene which would allow a researcher to identify, through a search of the literature using PubMed and Semantic MEDLINE, that there is a connection between cancer and obesity.
 
NLM is currently working on extension of Semantic MEDLINE in the areas of public health and climate and health. (Dr. Donald Lindberg, the Director of the NLM, is on an interagency committee related to climate and health). There are prospects of extending beyond biomedicine. 
 
The system has been run on the last 10 years' worth of the MEDLINE database, or about 1/3 of the MEDLINE database. The results can be made available as an RDF triple store or as a traditional SQL database. They have performed mid- to large-scale evaluations. What should the system extract from these documents and how should they be represented in the UMLS language? Precision is around 75 percent (lower for molecular biology), while recall is about 60 percent.
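For readers less familiar with these two measures, the sketch below shows how they are computed from counts of correct and incorrect extractions. The counts are hypothetical, chosen only so that they reproduce the reported 75 percent and 60 percent; they are not figures from the NLM evaluation. The F1 score is included as a common single-number summary and was not reported in the presentation.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of extracted relations that are correct."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of relations actually in the text that the system found."""
    return true_positives / (true_positives + false_negatives)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    # Hypothetical counts: 80 relations extracted, 60 of them correct,
    # out of 100 relations a human annotator found in the same abstracts.
    tp, fp, fn = 60, 20, 40
    p = precision(tp, fp)
    r = recall(tp, fn)
    print(f"precision={p:.2f} recall={r:.2f} F1={f1_score(p, r):.2f}")
```

Read this way, a precision of 75 percent means roughly one extracted relation in four is wrong, and a recall of 60 percent means roughly two relations in five are missed, which is consistent with the remark that the user still has to look at the text.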
 
The interface allows the user to use the graphical visualization and then view the linked text sentence from the PubMed display. The system does not provide the answers for you but makes it easier to identify and sort through the content. It facilitates literature-based discovery, the observation of trends, and decision making, especially portfolio analysis. This is particularly important for researchers who are interested in related or new fields with which they are somewhat unfamiliar.

Big Data is a Big Deal

Wendy Wigen, Technical Coordinator, National Coordination Office for NITRD

Slide 1 Title

WigenBigDataMay2012Slide1.png

Slide 2 Member Agencies

WigenBigDataMay2012Slide2.png

Slide 3 Quotes

WigenBigDataMay2012Slide3.png

Slide 4 Definition

WigenBigDataMay2012Slide4.png

Slide 5 Big Data Senior Steering Group

WigenBigDataMay2012Slide5.png

Slide 6 Vision

WigenBigDataMay2012Slide6.png

Slide 7 Goals

WigenBigDataMay2012Slide7.png

Slide 9 March 29 White House Event 

WigenBigDataMay2012Slide9.png

Slide 10 Big Data Fact Sheet

http://www.nsf.gov/pubs/2012/nsf12499/nsf12499.pdf (MY NOTE: Same as Slide 8 above.)

WigenBigDataMay2012Slide10.png

Slide 11 Challenges and Competitions

WigenBigDataMay2012Slide11.png

Slide 12 Recap of Challenge

WigenBigDataMay2012Slide12.png

Slide 13 Domain Research Projects

WigenBigDataMay2012Slide13.png

Slide 14 Workforce Development

WigenBigDataMay2012Slide14.png

Slide 15 Contacts

WigenBigDataMay2012Slide15.png

Action Items

Compile initial list of Big Data initiatives and disseminate via site

As Knowledge Capture Chair you would coordinate the creation and publication of any program content for the committee along with taking minutes of meetings and distributing the content to membership for corrections and additions.

I'm going to send a note out shortly to the membership to capture ideas for a pilot and panel/symposium ideas.  As the Knowledge Capture Chair, you would coordinate with the membership on the development of this content and would have the opportunity to take part in the closed-door session with government to capture their insights and report back to the committee.

Big Data Committee Work Area: http://www.actgov.org/sigcom/SIGs/SIGs/ETSIG/BigData/pages/default.aspx

I already have the following Big Data sets:

Agency | Contact | Subject | Topic
CIA | Gus Hunt | CIA World Fact Book | Intelligence Community (Unstructured and Structured Data Integration)
NGIA | Leticia Long | Quint | Intelligence Community (New Mission)
HHS/CMS | Niall Brennan and Scott Depuy | Medicare for IOM and SEER | Topic 1 - Large Scale Records Management in Health Care
OSTP | Todd Park | Health, Safety, and Energy Data | Audit of New Data.gov Communities
NLM and OSTP | Tom Rindflesch and George Strawn | Semantic Medline | Topic 4 - Conducting Extreme Scale Semantic Data Analysis
GMU | Professor Kirk Borne | Using Data Science Evidence in Public Policy for Big Data and Elections, Spotfire Learning Network for Class Projects, and County Health Rankings | Federal Big Data Senior Steering Group Work Force Training
EPA | Malcolm Jackson | EnviroFacts and Indicators | Environmental Data
GSA | Marie Davie and Johan Bos-Beijer | Governmentwide Acquisition Contract (GWAC) Dashboard | Topic 3 - Addressing Large Scale Fraud, Waste & Abuse in Federal Procurement and Entitlements Programs
Energy | Peter Tseronis | DISRE Solar (Government Information and Analytics Summit) | Topic 2 - Cybersecurity (Victor Pollara, Noblis)
Treasury Bureau of Public Debt | Thomas Vannoy? | Bureau of Public Debt | Financial Data
NASA | Deborah Diaz | IN PROCESS | Many to select from
State | Alex Ross | Recorded Future Protests | Topic 5 - Social Media Data Management
DC Government | Office of the CTO | Data Catalog and 311 Message Services | Open Data Handbook
 
It would be interesting to have an agenda item for our next meeting for us to discuss opportunities and limitations in using academic labs/forums for the evaluation of methods etc.  While PII and other privacy provisions restrict the type of detailed information, it would be good to have a discussion around formulating methods and approaches that would enhance public sector objectives.

Highlight active and planned initiatives/issues raised from ELC Big Data Track

I will do the knowledge capture for the ELC Big Data Track:

http://semanticommunity.info/AOL_Government/ACT-IAC_2012_Executive_Leadership_Conference

MY NOTE: This was cancelled and I was told to wait before pursuing this.

Since the ELC was cancelled and I did not get to do the knowledge capture for that, I have the following suggestion:

Use the list of ELC Big Data Panel individuals in the attached spreadsheet to ask them the following:

What were they going to say?

What big data sets do they have?

What big data sets could be used for a pilot?

If you agree, then we would need all of their email addresses (John Shaw?) and a formal email from the ACT-IAC ET SIG (Johan and Mile?) requesting the answers.

Then I would follow up to get the information in a form that we could use for several meetings with pilots.

Develop BDC committee communication and engagement process for January 2013 Hill briefing and outcome of the Big Data Advisory Committee

I have done the following AOL Gov stories recently on Big Data:

http://gov.aol.com/2012/10/10/big-data-reaches-the-hill-a-guide-to-making-it-more-actionable/

http://gov.aol.com/2012/10/11/what-the-white-house-learned-from-linkedin-and-the-use-of-big-da/

http://gov.aol.com/2012/10/15/open-government-data-and-statistical-data-havent-we-been-here/

http://gov.aol.com/2012/10/30/tempor...-intelligence/

I am working on a White Paper for the ET SIG Agile Committee that includes Big Data Analytics:

http://semanticommunity.info/AOL_Government/ACT-IAC_Agile_Development#Story

I am very familiar with the IRS efforts, and they would be one of a few good examples to use in our upcoming symposium.  One of the reasons we included analytics in the charter was to be sure we could capture practitioners' real-use examples, which are often in their infancy or still developing a succession-of-wins strategy.

This is an automated email to let you know that a new entry has been posted to the blog for Revenue & Finance Update.
"IRS Project Shows How Analytics Can Improve Performance" by Susan Gogos
http://blogs.mitre.org/blogs/permalink.cfm?username=RevenueFinanceUpdate&id=37008

MY NOTE: Link does not work and article requested

Big Data (BD SSG)

Source: https://connect.nitrd.gov/nitrdgroups/index.php?title=Big_Data_(BD_SSG)

Overview

Big Data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set. – Wikipedia, May 2011

The Big Data Senior Steering Group (BD SSG) has been formed to identify current big data research and development activities across the Federal government, offer opportunities for coordination, and begin to identify what the goal of a national initiative in this area would look like. As data volumes grow exponentially, so does the concern over data preservation, access, dissemination, and usability. Research into areas such as automated analysis techniques, data mining, machine learning, privacy, and database interoperability is underway at many agencies and will help identify how big data can enable science in new ways and at new levels. The science of data includes the processes of turning data into knowledge, data mining and visualization, interoperability, search and discovery, and semantics.

Scope

BD SSG was formed to identify programs across the Federal government and bring together experts to help define a potential national initiative in this area. BD SSG has been asked to identify current technology projects as well as educational offerings, competitions, and funding mechanisms that take advantage of innovation in the private sector.

Functions

Current functions and activities include:

  • Collecting information on current activities across the Federal Government.
  • Creating a high-level vision of the goals of a potential national initiative.
  • Developing the appropriate documents and descriptions to aid discussion within the government, and where appropriate, the private sector.
  • Developing implementation strategies that leverage current investments and resources.

 

January 8, 2013 Meeting Notes

Preliminary

Below is the invitation to the upcoming “Data Innovation Day” event later this month in D.C. that I mentioned on the call. It will focus on issues and topics related to data innovation and government.

Shannon L. Kellogg

Amazon.com

Director of U.S. Public Policy, Amazon Web Services

e-mail: shannonk@amazon.com

phone: 703-309-9636

Data Innovation in Government

THURSDAY, JANUARY 24, 2013 
9:00 AM - 10:30 AM
The Information Technology and Innovation Foundation
1101 K Street NW
(Suite 610A)
Washington, DC 20005

As part of Data Innovation Day, ITIF will host a panel discussion on how government agencies are using data to make government work more effectively and efficiently. This panel will discuss recent examples of data innovation in the federal government, including efforts to open government data sets and collaborate with the private sector. Join us for a conversation with experts from the public and private sectors who are leading these changes.

Follow the Data Innovation Day conversation with #datainnovation.

PARTICIPANTS:

 

This is what Isaiah Goodall just mentioned on the phone call in case it got buried in your holiday emails:

Knowledge Capture in preparation for our next meeting:

http://semanticommunity.info/Emerging_Technology_SIG_Big_Data_Committee/Government_Challenges_With_Big_Data

With Upcoming: http://semanticommunity.info/Emerging_Technology_SIG_Big_Data_Committee/Government_Challenges_With_Big_Data#Upcoming

Cross-Walk Table of Big Data Pilot Projects and Activities:

http://semanticommunity.info/Emerging_Technology_SIG_Big_Data_Committee/Government_Challenges_With_Big_Data#Cross-Walk_Table

For use with the Data Transparency Coalition on the new Data Act for the 113th Congress:

http://semanticommunity.info/DataTransparencyCoalition.org

Dr. Brand Niemann

Director and Senior Data Scientist

Semantic Community

http://semanticommunity.info

http://gov.aol.com/bloggers/brand-niemann/

703-268-9314

December 5, 2012 Meeting Notes

Introduction

All, See below for tomorrow’s agenda.   Also attached are the meeting notes from our December meeting.   
For those who missed it, last week a new article was posted for Big Data consumers:
 
Agenda:
 
Attendee Roll Call  & Introductions of New Members
Co-chair Opening Remarks 
BDC Government Symposium Planning
Cross SIG Activities & Upcoming ACT-IAC Membership Meeting – 1/16
Research & Discovery Updates
Review Action Items from Prior Meeting

Mile Corrigan, PMP, Six Sigma Blackbelt

Center of Digital Excellence Leader, Noblis

Meeting Notes

PDF

 
December 5, 2012 10:00 am – 11 am EST Call-in: 866.962.6634 Passcode: 1936132
 
Meeting called by: Johan Bos-Beijer, Government Chair
Mile Corrigan, Industry Chair
 
Attendees:
Mile Corrigan, Johan Bos-Beijer, Matt Salter, Ben Pecheux, Victor Pollara, Sterling Thomas, Aric Labarr,
John Beachboard, Mike Thorp, Karen Hogan, Brand Niemann, Tom McCullough, Renee Maisel, John Shaw,
Susan Stolting, Lucca Decchesi, Bill Brantley
 
Agenda Item Discussions
 
1. Attendee Roll Call & Introductions of New Members - Johan Bos-Beijer & Mile Corrigan
The BDC welcomes new members:
  • Frederick Walker, NSA
  • Dimitris Geragas, Gartner
  • Industry Interest received from Kforce and Booz Allen
  • Government and Academic Interest received from Treasury, MIT, and Stanford
 
2. Co-chair Opening Remarks - Johan Bos-Beijer & Mile Corrigan
John Shaw provided an overview of ACT-IAC and operating procedures
  • ACT-IAC is a non-profit organization to help educate and inform; ACT side covers full-time government members and IAC side covers industry
  • SIGs produce content to address issues to guide/advise government
  • Four main operating principles that must be adhered to at all times: (1) everything we do must be ethical, (2) everything we do must be transparent, (3) government drives the agenda, and (4) everything we do must be vendor-neutral
  • These principles are key to creating a safe haven for government and industry to collaborate
 
3. ELC Updates - John Shaw
  • John Shaw provided updates regarding reprogramming for the Executive Leadership Conference. Dates have been distributed to panel track leads between January 15 and February 28 to hold up to 16 total program committee events (2-3 sessions) over 8 days. John will distribute further details regarding ELC reprogramming to Johan and Mile.
 
4. BDC Government Symposium Planning – Isaiah Goodall, Johan Bos-Beijer & Mile Corrigan
Isaiah Goodall, Symposium Planning Chair covered draft construct for the closed-door government-only half-day event:
  • The government-only Big Data Symposium will address the Federal Government’s current big data challenges through dialogue-driven roundtables with participation from real big data practitioners and academia. The symposium will be focused on the following four (4) key challenges:
  • Making the business case for big data analytics / how to justify a big data strategy
    • Government lead/moderator: Dave Nelson, CMS (alternates: Ted Doolitte or Tony Trenkle) - NOT CONFIRMED
  • Training – how to get the right skill sets with lack of big data / analytics talent
    • Government lead/moderator: Susan Stolting, ICE, DHS – CONFIRMED
    • Academic roundtable member: Dr. Aric Labarr, NC State Institute for Advanced Analytics
  • Acquisition – how to form the right strategy to acquire big data analytics
    • Government lead: Kevin Youel Page – NOT CONFIRMED
  • Data Management / Security – how to manage and secure large and complex volumes of data
    • Government lead/moderator: Steven Hernandez, HHS OIG - CONFIRMED

Johan encouraged the committee to focus on commonalities across the 4 tracks and report out on similarity of process and purpose. Anyone interested in assisting Isaiah Goodall in the planning of the event should contact Mile Corrigan.

 
5. Review Action Items from Prior Meeting – Mile Corrigan
Mile Corrigan reviewed action items from prior meeting.
 
6. ACT-IAC and concurrent Big Data Initiatives – Open Discussion
  • December 12 - FCW & Tech America Agency Perspectives on Big Data
  • Mile Corrigan briefed the BDC Committee on the NVTC Task Force on Big Data, focused on economic development for the region to help distinguish NoVA as a big data leader
  • The NVTC task force is seeking to:
    • Organize a commission around the Big Data Initiative unique to NoVA region
    • Similar to Massachusetts model – matching grant program to encourage R&D in big data area, internship program to build talent, big data consortium includes academia, industry & government, test bed for big data projects, HPC center in Holyoke (public/private partnership), collaborative hack/reduce space in MIT to work on big data projects
    • 1 - Conference with NVTC membership to address big data topics
    • 2 - Would ideally like to connect the two initiatives to present to the state
  • Mile Corrigan inquired about potential collaboration opportunities with NVTC
    • Is there a potential opportunity for ACT-IAC to partner with NVTC given the mutual interest in federal government? Has ACT-IAC partnered with other associations in the past?
    • Renee Maisel suggested offline coordination with John Shaw.
 
Action Items | Person(s) Responsible | Status/Deadline
Compile initial list of Big Data initiatives and disseminate via site | Mile Corrigan, Brand Niemann | Ongoing
Highlight active and planned initiatives/issues raised from ELC Big Data Track | Mile Corrigan, Brand Niemann | Deferred
Identify list of government workshop invitees and BDC members for workshop planning group at next BDC meeting | BDC Members | Complete
Outreach to Collaboration & Transformation SIG to discuss optimized collaboration | Mile Corrigan, Johan Bos-Beijer | Complete
Socialize Government Workshop/Symposiums with ELC Big Data chairs and panelists as well as interested parties | Mile Corrigan, Johan Bos-Beijer | Complete
Distribute compiled list of initiatives to government workshop invitees | Isaiah Goodall, Matt Salter | In Progress
BDC member outreach communication for those who were unable to attend the working session on the 18th of October | Matt Salter, Johan Bos-Beijer, Mile Corrigan | Complete
Coordinate Government Workshop/Symposium | Isaiah Goodall, BDC Workshop Planning Group | In Progress
Prioritize initiative list (down to 4) & assign government chair/sponsor and industry chair for each initiative | Isaiah Goodall, BDC Workshop Planning Group | In Progress
Develop BDC committee communication and engagement process for January 2013 Hill briefing and outcome of the Big Data Advisory Committee | Brand Niemann, Johan Bos-Beijer, Mile Corrigan | Late-November/early-December 2012
 

November 7, 2012 Meeting Notes

Introduction

Big Data Committee, Thank you to those who participated in our brainstorming session on 11/7. For those of you unable to make the last meeting, we generated a list of big data challenges facing the federal government to be discussed at an upcoming government-led symposium. Isaiah Goodall has stepped up to serve as the Big Data Government Symposium Chair and will lead subcommittee planning activities. Please reach out directly to Isaiah if you are interested in this subcommittee, and stay tuned for more information on the symposium coming soon!

Attached are the meeting notes – please let me know if there is anything I missed. MY NOTE: See below

On behalf of our government chair, Johan Bos-Beijer and the ET SIG, the ACT-IAC Big Data Committee welcomes several new members:

  • Aric LaBarr, Assistant Professor from NC State University – Institute for Advanced Analytics
  • Raman Marway, Business Intelligence, Analytics and Enterprise Performance Management Leader from Mythics Inc.
  • Shannon L. Kellogg, Director of US Public Policy from Amazon Web Services
  • Paul Norcini, Civilian Accounts from MarkLogic Corporation
  • Ramon C. Barquin, President of Barquin International
  • Frederick Walker, Technical Director Knowledge Management – Office of Counterintelligence, US Department of Defense
  • Dimitris Geragas, Senior Director from Gartner

Mile Corrigan, PMP, Six Sigma Blackbelt

Center of Digital Excellence Leader, Noblis

Meeting Notes

PDF

10:00 am – 11 am EST Call-in: 866.962.6634 Passcode: 1936132
Meeting called by: Johan Bos-Beijer, Government Chair and Mile Corrigan, Industry Chair
Attendees: Matt Salter, Steve Hernandez, Paul Norcini, Sterling Thomas, Susan Stolting, Isaiah Goodall, John Beachboard, Luca Ducchesci, and Brand Niemann

Agenda Item Discussions

1. Attendee Roll Call & Introductions of New Members
  • Shannon Kellogg, Director of US Public Policy from Amazon Web Services
  • Paul Norcini from MarkLogic
  • Dr. Ramon Barquin, President of Barquin International
  • Aric LaBarr, NC State University
  • Raman Marway, Business Intelligence, Analytics and Enterprise Performance Management Practice Leader
  • Sterling Thomas from Noblis
  • Johan Bos-Beijer has also received additional interest from Gartner, Forrester, IBM, SAP, and US Department of Treasury
 
2. Co-chair opening remarks
  • Brand Niemann will serve as the BDC Knowledge Capture Chair. Brand has created a new Big Data page on Semantic Community – posted to the ACT-IAC website. The ACT-IAC website will still remain the primary portal for information dissemination and collaboration for the Big Data Committee.
 
3. Confirmation of new meeting date/time
  • BDC members confirmed that monthly meetings worked best with everyone’s schedules and that as needed, subcommittees and active working groups will meet more regularly on an ad-hoc basis.
 
4. Review Action Items from Prior Meeting
  • Mile Corrigan reviewed action items from prior meeting. Due to the cancellation of ELC, several actions have been deferred until ACT-IAC determines how to repurpose the big data track.
 
5. Research & Discovery Updates
  • John Geraghty provided several new articles that have been added to the “Resources” folder on the collaboration site:
 
6. ACT-IAC and concurrent Big Data Initiatives
  • November 15 - MarkLogic – Big Data Conference
  • November 29 – Semantic Web ET Meeting
  • November 28-29 – Government Information and Analytics Summit
  • December 6 – Meritalk Big Data Breakfast
 
7. BDC Government Workshop/Symposium Planning & Coordination
  • Mile Corrigan provided a brief summary of the October 18th meeting where Victor Koo proposed a closed-door event where government feels safe discussing problems and key issues with their peers. The BDC will narrow the list of ideas generated down to four key topics.
  • Isaiah Goodall volunteered to serve as the lead for the government symposium sub-committee and will lead planning activities for the event. Mile Corrigan and Johan Bos-Beijer will work with Isaiah to identify subcommittee members and government panel leads for down-selected topics.
  • The BDC team identified the following potential topics from the government perspective:
    • PII Aggregation – Steve Hernandez discussed how to leverage the high impact of billing records, desk records, and procedural history of medical data. How do you leverage big data for trust analysis and creditworthiness?
    • Classification of Data – How to predict the sensitivity of data as it moves from unclassified to classified, etc. This challenge was discussed by DoD in the Big Data track at the Smart Technology & Sustainability event in October.
    • Big Structured/Unstructured Data – what are the key government challenges?
    • Workforce training for data scientists – Lucca identified challenges with Hadoop and MapReduce and affordability of data scientists for federal agencies and cited DISA’s hiring needs for 80 data scientists. Susan Stolting referenced the need for workforce training in government per the TechAmerica report recommendations. How do we get the right skills/training in place? Can we possibly leverage work done by IE group?
    • Acquisition – Susan Stolting described acquisition challenges for government - how do we procure these products/services? How do you build a multi-year acquisition strategy?
    • Business case for Big Data – Several members discussed government challenges in developing the business case for big data (also related to acquisition). How do you justify your big data strategy? How do you embed analytics into business decisions?
  • The BDC discussed several emerging academic programs – Institute of Analytics at NC State, Teradata University, Information Studies program at Syracuse
  • Next Steps:
    • Narrow down list of topics for the symposium
    • Identify government leads for each topic
    • Identify list of government workshop invitees (DHS S&T, CMS HHS, others?)
 
Action Items | Person(s) Responsible | Status/Deadline
Distribute list of potential meeting dates/times for recurring BDC meetings | Mile Corrigan | Completed
Compile consolidated list of recommendations for next BDC meeting on November 7th | Mile Corrigan | 11/4/2012
Compile initial list of Big Data initiatives and disseminate via site | Mile Corrigan, Brand Niemann | In progress (11/7/2012)
Highlight active and planned initiatives/issues raised from ELC Big Data Track | Mile Corrigan, Brand Niemann | Deferred (11/7/2012)
Identify list of government workshop invitees and BDC members for workshop planning group at next BDC meeting | BDC Members | 11/7/2012
Outreach to Collaboration & Transformation SIG to discuss optimized collaboration | Mile Corrigan, Johan Bos-Beijer | Mid-November (11/7/2012)
Socialize Government Workshop/Symposiums with ELC Big Data chairs and panelists as well as interested parties | Mile Corrigan, Johan Bos-Beijer | Mid-November (11/9/2012)
Distribute compiled list of initiatives to government workshop invitees | Isaiah Goodall, Matt Salter | 11/23/2012 (11/9/2012)
BDC member outreach communication for those who were unable to attend the working session on the 18th of October | Matt Salter, Johan Bos-Beijer, Mile Corrigan | Ongoing in October-November 2012
Coordinate Government Workshop/Symposium | BDC Workshop Planning Group (TBD) | Mid-November 2012
Prioritize initiative list (down to 4) & assign government chair/sponsor and industry chair for each initiative | BDC Workshop Planning Group (TBD) | Mid-November 2012
Develop BDC committee communication and engagement process for January 2013 Hill briefing and outcome of the Big Data Advisory Committee | Brand Niemann, Johan Bos-Beijer, Mile Corrigan | Late-November/early-December 2012

My Meeting Notes

November 29th - ET SIG Semantic Web (with Big Data) - see email I just forwarded

November 28-29 - Government Information and Analytics Summit

http://www.govinfosummit.com/Events/2012/Home.aspx?utm_source=AttendeeMktg&utm_medium=E-Mail&utm_campaign=AX2

January 24, 2013, Federal Big Data Senior Steering Group http://semanticommunity.info/A_NITRD_Dashboard/Semantic_Medline

https://connect.nitrd.gov/nitrdgroups/index.php?title=Big_Data_(BD_SSG)

November 14, Chief Data Scientist Summit http://www.theiegroup.com/IE_Group/Upcoming_events.html

November 14, Analytics Solution Center Seminar "Demystifying Big Data: Decoding the Big Data Commission Report"

https://ibm.biz/Bdx2tP

January 15-17, 2013, NIST announces the Cloud Computing AND Big Data Forum & Workshop http://www.nist.gov/itl/cloud/

December 11, Hot Topics in Big Data:  What You Need to Know Now! http://cendi.gov/activities/12_11_2012_CENDI_NFAIS_FEDLINK.html

Linkedin Big Data Group: http://www.linkedin.com/groups?homeNewMember=&gid=4520336&trk=eml-anet_wlcm-h-visit&ut=0fkucJklQ3Wls1

Linkedin Data Science Training Survey: http://www.linkedin.com/e/ipyoqf-h8xabwxn-18/vaq/180064857/4520336/102064273/view_disc/?hs=false&tok=2n59r-l8fPKRs1

Wendy Wigen at NSF is working on Workforce Training for the Big Data SSG and on a list of universities. Professor Borne at GMU has a list in his slides: http://semanticommunity.info/@api/deki/files/18229/kirkborne-NIST-june2012.ppt

October 18, 2012 Meeting Notes

PDF

Introduction

As co-chairs, we want to thank everyone who attended the October 18th session for a very productive, idea-filled, and interactive time together.  Following the meeting we have received a number of excellent suggestions, expressions of interest in topics to cover, and recommendations for next steps as the ACT-IAC board determines how to repurpose the ELC work since that annual event had to be cancelled due to weather.  You will notice in the attached minutes and action items from our 10/18 meeting that there is new and expanded content.  We will be contacting individual members to reaffirm your interests as well as make personal contact to ensure we are including relevant topics and participants to meet the BD Committee objectives.  In addition, we have new members from the government and academic sectors who have joined our committee eager to contribute insight and practitioner expertise and to delve into the work sessions.  We are currently in communication with members of the legal profession in the government sector as well as contracting subject matter experts to ensure that our work aligns with the compliance, policy, and pressing needs of the agencies as they face data management, budgetary, and accountability objectives over the next several years.  We have also had very productive discussions since our last meeting about how to be prepared in advance to respond to the outcome of the big data report to the Hill in January 2013.

We are available to field your questions, suggestions and observations as we move forward together.  We very much appreciate the value of the members who have been in contact with us and look forward to more interaction with you both in our regular meetings, or as the need arises between those sessions.

Johan Bos-Beijer

Government Committee Chair

Mile K. Corrigan

Industry Committee Chair

Meeting Agenda

1.    Attendee roll call & introductions of New Members

2.    Co-chair opening remarks

3.    Confirmation of new monthly meeting date/time

4.    Review Action Items and Outcomes since Prior Meeting

5.    Research & Discovery Updates from members

6.    ACT-IAC and concurrent Big Data Initiatives

7.    BDC Government Workshop/Symposium Planning & Coordination

8.    Upcoming Events

Meeting Notes

Oct. 18, 2012 9:30 am – 11 am EST
Noblis Innovation & Collaboration Center
Meeting called by: Johan Bos-Beijer, Government Chair and Mile Corrigan, Industry Chair
 
Registered Attendees:
Johan Bos-Beijer, Mile Corrigan, John Geraghty, Kimberly Gianni, Victor Koo, Steve Olshefski, John Shaw, Renee Maisel, Jim Soltys, Scott Larkin, Ben Pecheux, Victor Pollara, Mike Thorp, Kathleen McBride, Brand Niemann, Matt Salter
 
Agenda Item Discussions
1. Overview, Introductions & New Members
  • Johan Bos-Beijer and Mile Corrigan reviewed objectives of the committee including its government practitioner focus. The committee will work on identifying government’s big data challenges. Johan described a few examples - OPM’s data analysis for succession/human resource planning or consolidating grants management and student loan data in the finance sector, health sector benefits and management between VHA, DOD, and HHS, etc. What are the best outreach and engagement methods to identify big data problem areas and use case examples – fraud, waste & abuse, information assurance, security, others?
  • Mile Corrigan encouraged industry members to share research, technology, best practices & tools and to reach out to their government customers faced with big data challenges to join the committee.
  • The Big Data Committee welcomes new members Sydney Smith, Judith Rutkin, and William Brantley from OPM, Susan Stolting from DHS, and Matt Salter from Noblis.
  • Matt Salter will serve as the BDC Communications Chair
 
2. Review Action Items from Prior Meeting
  • Mile Corrigan reviewed action items from the September meeting which included final revisions to the draft charter and the review and acceptance by John Shaw, Director of Shared Interest Groups.
 
3. Committee Charter Overview & Group Discussion
  • Mile and Johan provided an overview of the Big Data Committee Charter and the committee’s advisory role in solving big data challenges, ranging from analytics, data management, data visualization, and business intelligence to other government-driven topics. They will distribute the charter via the ACT-IAC Big Data Work Area located at: http://www.actgov.org/sigcom/SIGs/SI...s/default.aspx
  • Johan described the need for government-driven objectives and a symposium that would allow government to convey both successes and failures. John Geraghty stated that this format worked well for the Mobility working group – 5 specific areas were identified with a government chair for each area.
  • Johan asked that all members focus in particular on the scope and objectives sections of the approved charter and provide the committee chairs with suggestions, recommendations, and opportunities. Johan stressed that the committee has an opportunity to focus on aligned responsibilities with the report to the Hill, which was recommended by Brand Niemann, as well as specific agency priorities/needs in an environment of budgetary creativity and constraints.
  • Victor proposed a closed-door event where government feels safe discussing problems and key issues with their peers. A few key members from industry will participate to record & document topics generated from the discussion.
  • Brand Niemann suggested conducting meaningful pilots (with a scientific basis) and conferences that allow collaboration between industry and government. Brand also suggested leveraging participation from the Big Data Steering group established by TechAmerica and the NLM team working with Semantic Medline data.
  • Mile described additional leadership roles needed for the BDC in the areas of knowledge capture (Knowledge Capture Chair) and coordination across ACT-IAC (BDC Committee Liaison)
  • BDC members Jim Soltys and Scott Larkin suggested adding versatility and value as additional V’s for the charter. BDC attendees concurred with the proposals.
  • Victor recommended adding the Collaboration & Transformation SIG under the “Key Relationships” section. Adelaide O’Brien is a panelist on the Big Data track and is very active in big data initiatives across ACT-IAC.
 
4. Research & Discovery Updates
  • There were no research & discovery items identified.
 
5. Big Data Events
  • ELC Big Data Track – Williamsburg, VA – October 28-30, 2012
    • BDC Committee members attending ELC will capture new initiatives/issues from the Big Data track
  • NIST announces the Cloud Computing AND Big Data Forum & Workshop to be held on January 15-17, 2013.
    • Event will cover how the Cloud Computing Information Technology model can be used to improve public services, provide an update on NIST Cloud Computing working group progress, and showcase examples of academic, industry, standards organizations and government partner efforts which are making progress related to USG Cloud Computing Technology Roadmap priorities.
    • The event has been expanded to focus on the emerging trend of Big Data in the context of its convergence with and complementary relationship to Cloud Computing.
 
Action Items | Person(s) Responsible | Status/Deadline
Distribute list of potential meeting dates/times for recurring BDC meetings | Mile Corrigan | Completed
Compile consolidated list of recommendations for next BDC meeting on November 7th | Mile Corrigan | 11/4/2012
Compile initial list of Big Data initiatives and disseminate via site | Mile Corrigan, Brand Niemann | 11/7/2012
Highlight active and planned initiatives/issues raised from ELC Big Data Track | Mile Corrigan, Brand Niemann | 11/7/2012
Identify list of government workshop invitees and BDC members for workshop planning group at next BDC meeting | BDC Members | 11/7/2012
Outreach to Collaboration & Transformation SIG to discuss optimized collaboration | Mile Corrigan, Johan Bos-Beijer | 11/7/2012
Socialize Government Workshop/Symposiums with ELC Big Data chairs and panelists as well as interested parties | Mile Corrigan, Johan Bos-Beijer | 11/9/2012
Distribute compiled list of initiatives to government workshop invitees | Matt Salter | 11/9/2012
BDC member outreach communication for those who were unable to attend the working session on the 18th of October | Matt Salter, Johan Bos-Beijer, Mile Corrigan | Ongoing in October-November 2012
Coordinate Government Workshop/Symposium | BDC Workshop Planning Group (TBD) | mid-November 2012
Prioritize initiative list (down to 4) & assign government chair/sponsor and industry chair for each initiative | BDC Workshop Planning Group (TBD) | mid-November 2012
Develop BDC committee communication and engagement process for January 2013 Hill briefing and outcome of the Big Data Advisory Committee | Brand Niemann, Johan Bos-Beijer, Mile Corrigan | late-November/early-December 2012

October 18, 2012 Meeting Invitation

Source: https://members.actgov.org/eweb/DynamicPage.aspx?expires=yes&Site=ACT&WebKey=6c63d18e-c3c8-4694-a230-51e8f646f04f
Date: October 18, 2012
Event start time: 9:30 AM
Location: Noblis

Introduction

Emerging Technology SIG October 2012 Meeting 
Topic: ET SIG Big Data Committee Overview
Date: Thursday, October 18, 2012
Time: 9:30 – 11:30 AM 
Location: Noblis Innovation and Conference Center, 3150 Fairview Park Drive South, Falls Church, VA 22042 ** Please allow for 15 minutes for visitor check-in upon arrival **
Teleconference: 1-303-218-2664 or 1-800-371-9219; ID 759 153 6404# (ACT-IAC is a non-profit organization. In order to best support our mission, we respectfully request that you use the first conference # listed if possible.) 
Fee: None
  
Dear ET SIG Member,
We would like to invite you to our next meeting on Thursday, October 18, 2012 from 9:30-11:30 AM, featuring an overview of our Big Data Committee. To further the ET SIG's charter mission, the Big Data Committee seeks to enable government agencies to make actionable data-driven decisions through the analysis, management, integration, and representation of large and complex data stores. The committee’s focus areas will be big data management, enterprise data integration and storage, business intelligence, data analytics, governance services, information assurance, and financial and fiscal management.

Big Data Committee Objectives

1. Provide a forum for information sharing and collaboration between federal, state, and local government agencies seeking to leverage their data for better informed decision-making.
2. Advise or recommend approaches to developing Big Data technical frameworks and capability maturity model assessments.
3. Promote Big Data best practices through increasing awareness of Big Data research, technologies, use cases, and high performance computing within the Federal Government.
4. Collaborate across ACT-IAC on Big Data initiatives and events.

Agenda

9:30 – Networking
10:00 - Introductions & New Members
10:15 - Action Items from Prior Meeting
10:30 - Committee Charter Overview
11:00 - Research & Discovery Updates
11:15 - Big Data Events

We look forward to seeing you there!

Sincerely,
John Geraghty, Chair
Victor Koo, Vice Chair
ET SIG

 

Big Data Committee Charter

Authority

The Big Data Committee (BDC) is a formally chartered, voluntary organization of the American Council for Technology-Industry Advisory Council (ACT-IAC) Emerging Technology (ET) Shared Interest Group (SIG). The ET SIG serves federal CxOs (CIOs/CTOs) and other government executives (e.g., Science & Technology (S&T) Directors) responsible for identifying, assessing, and deploying emerging technology and maturing it to fulfill mission objectives.

Mission

To further the ET SIG’s charter mission, the BDC seeks to enable government agencies to make better data-driven decisions through the analysis, management, integration, and representation of large and complex data stores.

Objectives

In the context of the ET SIG’s charter mission, the BDC seeks to:
 
1. Provide a forum for information sharing and collaboration between federal, state, and local government agencies seeking to leverage their data for better informed decision-making.
2. Advise or recommend approaches to developing Big Data technical frameworks and capability maturity model assessments.
3. Promote Big Data best practices through increasing awareness of Big Data research, technologies, use cases, and high performance computing within the Federal Government.
4. Collaborate across ACT-IAC on Big Data initiatives and events.

Membership

The BDC will incorporate SIG designated liaisons, program liaisons, project leaders, and members at large among its membership. The BDC will be of unlimited size, varying by ongoing activities. Membership shall be open to all ACT-IAC members in good standing, and may include a government-only component. All BDC activities, including committee meetings and teleconferences, shall be posted on the ACT-IAC calendar and shall be open to participation by all ACT-IAC members.
 
The BDC will use a government/industry co-chair model and is chaired by Johan Bos-Beijer (government) and Mile Corrigan (industry).

Areas within Scope

This committee is composed of government and industry professionals who desire to serve their nation through voluntary service; time constraints dictate a clear definition of priorities and scope. For the initial 12 months, this committee’s areas within scope will include:
  • Big Data Management – includes the collection, transformation, storage, analysis, and presentation of Big Data sets as described by the “3 V’s,” which challenge traditional tactics, techniques, methods, and procedures for storing and retrieving data to support the analysis, production, and distribution of actionable intelligence. The 3 V’s include:
    • Volume: vastly large data sets.
    • Velocity: dynamic and streaming as they are ingested from dispersed locations.
    • Variety: multiple data types and structures (unstructured text, full motion video, image, voice, machine based, transactional, biometric, sensor-based, etc.).
  • Enterprise Data Integration and Storage – includes data marts, data warehouses, integrated data management, data virtualization, data federation, and other data management structures in use in the Federal Government. It also covers automating the consolidated, selective extraction and collection of disparate, authoritative external and internal source data of all types (structured, semi-structured, and unstructured; transactional and master), which is stored, cleansed, transformed, standardized, and integrated for access by analytical stakeholders to improve decision making.
  • Business Intelligence – service access to enterprise information for operational and strategic reporting. This includes the development of operational reports, ad-hoc reports, dashboards, balanced scorecards, geospatial analyses, and “what-if” type of analytical reporting requiring multidimensional online analytical processing (MOLAP) capabilities. Business intelligence tools leverage the power of Big Data by making actionable information and analytic results available to every employee in the organization.
  • Data Analytics – finding patterns, correlations, and actionable information from data using a variety of advanced mathematical algorithms, data mining, text mining, and predictive modeling. Data analytics services can be applied across enterprises to enable data-driven decision making, improve product development and service delivery, optimize workload and workforce allocation, improve budget execution, identify cost savings, and to combat fraud, waste, abuse, and other inefficiencies in government programs.
  • Governance Services – standards definitions and best practices such as the development of a data management specific maturity model that helps agencies assess their readiness, business case, and data quality fitness as they prepare to embark upon or evaluate an ongoing data management initiative.
  • Information Assurance – standards definitions and best practices to ensure the confidentiality, integrity, and availability of data and compliance with privacy regulations particularly when Big Data related nuances prevail or must be specifically addressed.
  • Financial and Fiscal Management – incorporate financial management, reporting, investment, planning, and evaluation (i.e., TCO, ROI) to enhance budget, appropriations and fiscal accountability requirements consistent with statute, acts, regulation, and applicable policies.

Expected Outcomes

  • Federal Government Focus Group sessions comprised of Big Data practitioners in areas such as, but not limited to, finance, compliance, security, analytics, information assurance, privacy, legal, policy, and human capital.
  • Town Hall panels of SMEs carefully planned, located, scheduled, selected, and moderated to provide insight of value to Big Data Practitioners.
  • Development of an agile Big Data framework and maturity model from a technology perspective which can be matured as technology progresses.
  • Support of a public collaboration site for ACT-IAC membership.
  • Deliverables executed in accordance with the ACT-IAC Request for Assistance format.

Roles & Responsibilities

The roles and responsibilities of the committee include:
  • Ensure the committee’s activities are executed consistent with the ACT-IAC defined mission, vision, and principles. See: http://www.actgov.org/about/missionv...s/default.aspx
  • Receive, record, vet, prioritize, and respond to official Requests for Assistance.
  • Exercise governance to ensure the committee’s work is performed subject to governing statutes (i.e., Federal Advisory Committee Act).
  • Identify and assign willing and qualified volunteers to execute responsibilities and oversee results.

Meetings and Communications

  • Co-Chairs – facilitate meetings, establish workgroups to achieve committee objectives, and regularly communicate with ET SIG leadership and government executives.
  • Communications Chair – distributes communications to committee members with meeting logistics, updates the committee webpage, and creates text for announcements and other communications.
  • Knowledge Capture Chair – takes minutes of all meetings and distributes to membership for corrections and additions; coordinates the creation and publication of any program content.
  • Committee Liaison – attends meetings and events across ACT-IAC program areas, to include other SIGs, to ensure successful coordination of Big Data initiatives for the committee.

Key Relationships

  • Emerging Technology SIG
  • Enterprise Architecture SIG
  • ACT-IAC Program Committee
  • ACT-IAC conference planning committees

Timeframe and Schedule

The timeframe for achieving the above deliverables shall be developed by the committee and submitted for review by the ACT-IAC within 30 days of approval of the charter. The schedule of deliverables will be determined based on Requests for Assistance and committee recommendations.

Operating Principles

The BDC will follow SIG operating principles:
  • Activities should advance Government
  • Activities must be objective, ethical, and vendor neutral
  • No business development or promotion
  • Transparent and open to all interested ACT-IAC members

Governance Structure

The Big Data Committee will operate under Shared Interest Group governance rules and procedures. It will report to the IAC ET SIG proponent.

Leadership

The Committee shall have an Industry Chair and a Government Chair. The chairs shall be appointed by the ACT-IAC ET SIG.

Timeline for Review and Approval

Name of persons drafting charter: Mile Corrigan, Johan Bos-Beijer, Steven Hernandez, Ron Berry, Thomas McCullough, Isaiah Goodall.
 
Date submitted for review and approval: Originally submitted to John Shaw on 10/3/12, approved on 10/4/12.
 
Reviewer and date of review: John Shaw, ACT-IAC Director of Shared Interest Groups & Program Events, 10/4/12

EMERGING TECHNOLOGY SIG NEWSLETTER

Source: http://www.actgov.org/sigcom/SIGs/SIGs/ETSIG/Documents/Pulse/September%202012%20ET%20SIG%20Pulse%20newsletter.pdf

 
September 2012 Pulse
VOLUME 2, ISSUE 5 SEPTEMBER 2012

ET SIG Mission

  • Constituency Served: Federal CXOs (CIOs/CTOs) and other Government executives (e.g., S&T Directors) responsible for identifying, assessing and deploying emerging technology and maturing it to become a major component of the IT & business strategy.
  • Who is on this SIG: Industry, Government, academia and others within ACT-IAC that are involved with emerging technology and provide products, services, processes, and business models enabling innovative approaches to solving Government issues and challenges.
  • What We Do: Come together to evaluate and react to game-changing events; share lessons learned from the Emerging Technology Process Model in the Federal space; form committees around and provide insight on high-potential technologies to accelerate awareness and adoption; actively monitor the emerging technology landscape.

News

  • Jul 26 — ET SIG GAP meeting was held. The focus included discussing areas of interest, finalizing topics through the remainder of the year, and extending GAP membership commitments through June 2013.
  • Aug — ET SIG submitted input and metrics for ACT-IAC Quadrennial Review.
  • Sep — ET SIG invited 12 senior managers and executives to serve as Government advisory panelists for the 2012-2013 term.
  • Sep — The Agile Committee collaborated with the Programs Committee to help plan the ACT-IAC Agile Development Executive Panel scheduled for September 21.
  • Sep — The Big Data Committee is finalizing their charter for submission to the ET SIG for approval.

Planned and Highlighted Events

  • Sep 21 — SIG Agile Committee Overview — Please contact Sandi Van Valkenburg; sandra.vanvalkenburg@cgifederal.com
  • Oct 18 — SIG Big Data Committee Overview — Please contact Victor Koo; victor.koo@k3-solutions.com
  • Oct 28-30 — ACT-IAC Executive Leadership Conference – Theme: Charting a Course — Williamsburg, VA — Visit the ELC Homepage for more information.
  • Nov — SIG Program on Semantic Web — Seeking Lead volunteer! — Please contact John Geraghty; jgeraghty@mitre.org

Announcements

  • The ET SIG Big Data Committee is actively recruiting federal big data practitioners in the areas of finance, compliance, security, analytics, information assurance, privacy, legal, policy, and human capital. Please contact Mile Corrigan; mile.corrigan@noblis.org or Johan Bos-Beijer; johan.bos-beijer@gsa.gov
  • Seeking volunteers to serve as Leads to research and coordinate program events for the following ET SIG initiatives: Agile, Semantic Web, and ET Research. Please contact John Geraghty; jgeraghty@mitre.org
  • Let us know your ideas! Visit us at our ET SIG Homepage or join our discussion on GovLoop. In addition, please contact John Shaw for general information; jshaw@actgov.org
 
Meetings are generally scheduled for the third Thursday of each month. See you then!