Table of contents
- A Semantic Web Strategy for Big Data
- Turning Big Data into Big Benefits
- The Semantic Web
- Big Data is a Big Deal
- Slide 1 Title
- Slide 2 Member Agencies
- Slide 3 Quotes
- Slide 4 Definition
- Slide 5 Big Data Senior Steering Group
- Slide 6 Vision
- Slide 7 Goals
- Slide 8 Core Technologies
- Slide 9 March 29 White House Event
- Slide 10 Big Data Fact Sheet
- Slide 11 Challenges and Competitions
- Slide 12 Recap of Challenge
- Slide 13 Domain Research Projects
- Slide 14 Workforce Development
- Slide 15 Contacts
- Action Items
- Big Data (BD SSG)
- January 8, 2013 Meeting Notes
- December 5, 2012 Meeting Notes
- November 7, 2012 Meeting Notes
- My Meeting Notes
- October 18, 2012 Meeting Notes
- October 18, 2012 Meeting Invitation
- Big Data Committee Charter
- EMERGING TECHNOLOGY SIG NEWSLETTER
A Semantic Web Strategy for Big Data
Agenda
Remarks by Dr. Tom Rindflesch (NLM) and Dr. George Strawn (OSTP/NITRD)
- Current projects that are implementing Semantic Web Standards and Technologies.
- Reusable solutions for datasets communicated between machines (Big Data).
- Coordination with the Big Data Committee, Collaboration & Transformation SIG and the Advanced Mobility Working Group.
- Thoughts on next steps and resources for interested organizations.
Turning Big Data into Big Benefits
Source: http://www.cutter.com/content-and-an...roduction.html
Big Data Is a Solution -- So Where's the Problem?
Ensuring the Accuracy of Your Social Media Analysis
Traditional Data Warehousing Meets Big Data: What Does It Mean for the Enterprise?
A Semantic Web Strategy for Big Data
Big Data Tools for Population Health Management
Big Data Analytics: Outsourcing vs. Crowdsourcing
by Ralph Hughes
Interest in Big Data analytics (BDA) has certainly skyrocketed in the past few years to reach a fever pitch, with the market for this technology projected to reach a 58% compound annual growth rate over the next five years.1 Indeed, when I walked the vendor exhibit halls at several TDWI World Conferences during the past year, it seemed that nearly all the application vendors had introduced a new package offering a "Big Data" solution. At every booth, plenty of curious attendees lined up to hear about these new features. The vendors were certainly happy for the attention, but they also confided to me that they had grown tired of answering the same question day after day, namely "What is Big Data?"
I believe this lament is actually more emblematic of the state of BDA today than any particular solution being offered. When vendors rush to cater to needs that many customers do not yet understand, are we at risk of solving the wrong problem or cementing in place a basic strategy we will later regret? Perhaps at this early juncture we should carefully dodge the hype about Big Data and offer a sober appraisal of this new technology before acting.
LOOKING PAST THE HYPE
Industry pundits, in the area of data warehousing at least, take a jaundiced view of the buzz surrounding Big Data. "When haven't business intelligence applications had to deal with 'Big Data'?" they ask. Any type of data requires deliberate engineering to acquire, store, summarize, and present it in a way that generates business insights. The cynics among us discount the fever over Big Data as a vendor-stoked overreaction to a few white papers by computer science wonks at Google and Yahoo! who found a couple of processing shortcuts while taming their own flood of Web stream data. These cynics see Big Data as a craze that will quickly fade.
Such skepticism might be too extreme, however. New technologies do frequently follow quick lifecycles, but several considerations suggest that Big Data represents a sea change for enterprise information. With the cost of processing and data storage falling so rapidly each year, our society no longer seems constrained as to the amount of information it can create and retain. Today's burgeoning numbers of online users now leave a trail of "digital exhaust" as they cruise social networking sites; e-commerce continues to grow at 35% per year; and RFID tags are steadily appearing on wholesalers' pallets and manufacturers' products. We are entering the "Internet of Things," in which phones, cars, trains, and planes -- plus process controllers, appliances, and medical devices -- all transmit a steady stream of data for interested parties to mine. Even dairy cows now sport portable monitors announcing when they come into heat.2 The data our society generates in a single year recently surpassed a zettabyte (a trillion gigabytes), which is a hundred million times more information than is contained in the print collection of the US Library of Congress -- and this onslaught is doubling every two years.3
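The figures in the paragraph above can be checked with a few lines of arithmetic. The Python sketch below assumes decimal units (1 GB = 10^9 bytes, 1 ZB = 10^21 bytes) and a steady doubling every two years; the function names are illustrative only and do not come from the article.

```python
GIGABYTE = 10**9    # bytes, decimal convention (an assumption)
ZETTABYTE = 10**21  # bytes, decimal convention (an assumption)


def gigabytes_in(n_bytes: int) -> int:
    """Convert a byte count to whole gigabytes."""
    return n_bytes // GIGABYTE


def implied_reference_bytes(total_bytes: int, ratio: int) -> int:
    """Size of a reference collection if total_bytes is `ratio` times larger."""
    return total_bytes // ratio


def projected_bytes(start_bytes: int, years: float,
                    doubling_period_years: float = 2.0) -> float:
    """Project a volume forward, assuming it doubles every doubling period."""
    return start_bytes * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    print(gigabytes_in(ZETTABYTE))                    # gigabytes per zettabyte
    print(implied_reference_bytes(ZETTABYTE, 10**8))  # implied print collection size
    print(projected_bytes(ZETTABYTE, 4))              # volume after four more years
```

Under these assumptions, one zettabyte is 10^12 (a trillion) gigabytes, the "hundred million times" ratio implies a print collection of roughly 10^13 bytes (10 TB), and four years of doubling turns one zettabyte into four. The 10 TB figure is only what the article's ratio implies, not an independent measurement.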
Naturally, people worry about how much of this data they should capture, manage, and analyze. We frequently read about creative entrepreneurs discovering riches hidden in this information. For example, companies can now measure customer sentiment toward their products by mining the comments, ratings, and even images shared on the Web. They can correlate these sentiment statistics with purchase records provided by loyalty programs at grocers and retail stores, empowering marketers to customize advertising campaigns for individual consumers. As we move between websites today, we encounter a sequence of offers that are so subtle they go unnoticed but are so aligned with our individual preferences and behavioral triggers that we are almost certain to buy. With world Internet usage quintupling per decade,4 there is no upper limit on the number and value of new business opportunities for those who can bend the swelling flood of data to their purposes. In this context, the frenzied interest in Big Data makes sense because the power of such analytics has been proven, and rational companies should be actively seeking to profit from it.
MAPREDUCE IS NOT OUR SILVER BULLET
Unfortunately, the best method of channeling this informational deluge is far from clear, because the term "Big Data" has not yet been well defined. Big data analytics is frequently described as the management of information volumes much larger than our ordinary data management tools can handle. Pundits usually refer to Doug Laney's "3Vs" 5 -- volume, velocity, and variety -- which will be explored in the articles in this issue. Yet the 3Vs are only a description of the problem, one that leaves most of us searching for an industry standard approach proven to overcome the challenge. Such a search does not uncover a single direction, however, but instead a myriad of competing strategies. Despite the fact that experts have been discussing Big Data for over 10 years now, the field is still very new, and for all the urgency we feel, no silver bullet yet exists.
The most commonly cited solution for BDA involves a technology pioneered by the large Internet search engines, called "MapReduce" (MR). So frequently do Big Data conversations gravitate to MR that Hadoop, the open source implementation of MapReduce, is now a standard component of most mainstream databases. 6, 7
Yet MapReduce is not a universal solution to all Big Data problems, for several reasons. First, it solves only problems that can be formulated in terms of key-value pairs. This approach is capable of some powerful insights, but it has a distinct sweet spot that generally requires the input data to be already assembled into a flat file. Second, as anyone who has tried to join multiple tables using MR (or even wrestle it into printing "Hello, world!") can tell you, MR is not a general solution to many common data management challenges. Third, the interface to MR data stores is fairly primitive in comparison to the standard DBMSs -- a team must know Java well. Fourth, attempts to provide an SQL-like querying tool for MR still lack many ANSI SQL-92 commands and other common SQL extensions. Fifth, solid MR programmers are difficult to find, so the added cost and risk of building MR applications can far exceed the investment required by the many alternatives.
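As a rough illustration of the key-value formulation described above, the following Python sketch counts words using separate map, shuffle, and reduce steps. It runs in a single process and uses no Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are invented for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) key-value pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values under their key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three steps in order over flat text input."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "Big deal"]))
```

The input here is already a flat sequence of lines, which is the sweet spot the paragraph describes. A problem that does not reduce naturally to grouping values by a single key, such as joining several tables, needs extra bookkeeping layered on top of this pattern.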
Because MapReduce is not the only solution available for high data volume, velocity, and variety, a solid Big Data strategy should look at the other technologies. There are many more columnar databases available today than there are MR implementations. Many of these columnar DBMSs are embedded in data warehouse appliances that allow our existing business intelligence (BI) applications to handle very large volumes of data using a standard SQL interface. Furthermore, many columnar databases are more mature than MR, allowing Big Data applications to be designed and built by developers with more typical skills. For organizations willing to consider newer offerings by smaller vendors, there are also the numerous types of Big Data solutions found in the NoSQL ("Not Only SQL") universe, such as key-value pair databases that do not require MR programming; graph databases that use "triples" rather than key-value pairs; and in-memory relational databases that settle for "eventual consistency" in the interest of very fast read-write operations. These products, too, often look more like our traditional tools, making them easier to work with, and several of them can tackle analytical questions that MR cannot begin to address.
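To make the contrast between triples and key-value pairs more concrete, here is a minimal Python sketch of a triple store held in memory. It is not the API of any particular product; the `FACTS` list and the `match` helper are hypothetical, and the sample statements are simply restatements of points made in this article.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Illustrative statements only, loosely based on the text above.
FACTS: List[Triple] = [
    ("Hadoop", "implements", "MapReduce"),
    ("MapReduce", "processes", "key-value pairs"),
    ("graph database", "stores", "triples"),
]


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return every triple that agrees with the fields that were given.

    A field left as None acts as a wildcard.
    """
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


if __name__ == "__main__":
    # "What implements MapReduce?"
    print(match(FACTS, predicate="implements", obj="MapReduce"))
```

A key-value store can only be asked for the value stored under one key. In this sketch any of the three positions can be fixed or left open, which is what lets a triple-based store answer questions about relationships between things.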
REPLACING FEAR WITH DISCIPLINE
Given the limits of MapReduce and the presence of many alternative solutions, it is odd that so many conversations about Big Data turn instantly to Hadoop. This knee-jerk reaction is driven mostly by fear. Both business and IT executives feel threatened by the accelerating flood of data coming from a proliferating number of sources. They worry that they should be doing something creative and profitable with it today, before competitors blindside them with new capabilities. They naturally want to start storing everything now, even if they cannot articulate the value for this information, and they hope against all odds that grabbing hold of this information is going to be quick and easy. Indeed, Forbes 8 notes that Big Data today is ill-defined, intimidating, and immediate (i.e., demanding action now) -- all of which adds up to a set of "3 Is" that may be more important to consider than the 3 Vs.
A more sober view of the situation might suggest that data streams in the exabytes are only another chapter in data management, just as terabytes and petabytes challenged us in previous decades. We must remind ourselves that new technologies frequently get overhyped by the media and vendors, and that our search for a silver bullet often leads to profound disappointment. We will need time and discipline to see what Big Data can realistically offer. A disciplined approach should begin with compelling use cases that express clearly attainable business impacts. Only by articulating realistic objectives can we rationally choose a technical solution from the several competing Big Data technologies. Moreover, any Big Data solutions must integrate into our existing strategies for "not-so-Big Data," so that the information flood from the coming "Internet of everything" calmly fills our carefully architected BI ecosystems with usable data rather than washing them away.
IN THIS ISSUE
The articles selected for this issue of Cutter IT Journal provide a handy opportunity to conduct that sober evaluation of Big Data technology. The discussion first provides a solid introduction to the world of BDA and then explores a set of important extensions of the technology. Richard Walsh, Richard O'Callaghan, and Sabine Yoffou start off our collection by systematically defining Big Data so that we can begin successfully planning a serious implementation effort. Next, IBM's Matthew Ganis and Avinash Kohirkar examine one of the most common uses of BDA, namely mining social media discussions. Rich Johnson and Ron Zahavi of Microsoft then address the essential topic of incorporating this new style of analytics into our traditional data warehousing programs, so that we end up with well-integrated BI platforms.
The theme of extending Big Data technology begins with Frank Coyle, who discusses one of the primary competitors to MapReduce -- the RDF triple, which will someday soon enable the Semantic Web. Holly Korda, Ann Magee, and Lori Damiano then explore Big Data's potential in a specific industry, showing how it can be leveraged to bring transparency and accountability to the world of healthcare. Finally, Saeed Lajami, Anson Mok, Mario Wahyu Prabowo, and Cutter Senior Consultant Sara Cullen provide an interesting alternative for our solutions toolkit by advocating the use of crowdsourcing to solve Big Data challenges.
Together these articles introduce insights of breadth and depth into the new and quickly evolving world of BDA. We hope they will help you begin to explore and understand how this technology can solve what will be some of IT's most pressing challenges for the foreseeable future.
ENDNOTES
1. Kelly, Jeff. "Big Data Market Size and Vendor Revenues." Wikibon, 16 October 2012 (http://wikibon.org/wiki/v/Big_Data_M...endor_Revenues).
2. Tagliabue, John. "Swiss Cows Send Texts to Announce They're in Heat." The New York Times, 1 October 2012.
3. Gantz, John F., and David Reinsel. "The 2011 Digital Universe Study: Extracting Value from Chaos." IDC, June 2011 (http://www.emc.com/digital_universe).
4. Internet World Stats (www.internetworldstats.com/stats.htm).
5. Laney, Doug. "3D Data Management: Controlling Data Volume, Velocity, and Variety." Meta Group, 2001.
6. Groenfeldt, Tom. "Microsoft Does Big Data -- Hadoop on Windows." Forbes, 5 June 2012.
7. Dijcks, Jean-Pierre. "Oracle: Big Data for the Enterprise" (PDF). Oracle, October 2011 (http://www.oracle.com/technetwork/da...cle-521209.pdf).
8. Feinleib, Dave. "The 3 I's of Big Data." Forbes, 9 July 2012.
The Semantic Web
Source: http://www.cendi.gov/minutes/pa_1111.html
A Learner’s Guide to the Semantic Web
Dr. George Strawn, NITRD (PDF)
The web comprises web pages linked together. Links are crucial to what the web is. The pages have information for humans to read. While HTML has hidden metadata, it is basically designed for people to read. By contrast, the semantic web is data for computers to read with semantic searches yielding answers, not just pages that may have answers. In this sense, it is more like a relational database system.
Semantic MEDLINE: A Proof of Concept
Big Data is a Big Deal
Slide 10 Big Data Fact Sheet
http://www.nsf.gov/pubs/2012/nsf12499/nsf12499.pdf (MY NOTE: Same as Slide 8 above.)
Action Items
Compile initial list of Big Data initiatives and disseminate via site
As Knowledge Capture Chair you would coordinate the creation and publication of any program content for the committee along with taking minutes of meetings and distributing the content to membership for corrections and additions.
I'm going to send a note out shortly to the membership to capture ideas for a pilot and panel/symposium ideas. As the Knowledge Capture Chair, you would coordinate with the membership on the development of this content and would have the opportunity to take part in the closed-door session with government to capture their insights and report back to the committee.
Big Data Committee Work Area: http://www.actgov.org/sigcom/SIGs/SIGs/ETSIG/BigData/pages/default.aspx
I already have the following big data sets:
| Agency | Contact | Subject | Topic |
| CIA | Gus Hunt | CIA World Fact Book | Intelligence Community (Unstructured and Structured Data Integration) |
| NGIA | Leticia Long | Quint | Intelligence Community (New Mission) |
| HHS/CMS | Niall Brennan and Scott Depuy | Medicare for IOM and SEER | Topic 1 - Large Scale Records Management in Health Care |
| OSTP | Todd Park | Health, Safety, and Energy Data | Audit of New Data.gov Communities |
| NLM and OSTP | Tom Rindflesch and George Strawn | Semantic Medline | Topic 4 - Conducting Extreme Scale Semantic Data Analysis |
| GMU | Professor Kirk Borne | Using Data Science Evidence in Public Policy for Big Data and Elections, Spotfire Learning Network for Class Projects, and County Health Rankings | Federal Big Data Senior Steering Group Work Force Training |
| EPA | Malcolm Jackson | EnviroFacts and Indicators | Environmental Data |
| GSA | Marie Davie and Johan Bos-Beijer | Governmentwide Acquisition Contract (GWAC) Dashboard | Topic 3 - Addressing Large Scale Fraud, Waste & Abuse in Federal Procurement and Entitlements Programs |
| Energy | Peter Tseronis | DISRE Solar (Government Information and Analytics Summit) | Topic 2 - Cybersecurity (Victor Pollara, Noblis) |
| Treasury Bureau of Public Debt | Thomas Vannoy? | Bureau of Public Debt | Financial Data |
| NASA | Deborah Diaz | IN PROCESS | Many to select from |
| State | Alex Ross | Recorded Future Protests | Topic 5 - Social Media Data Management |
| DC Government | Office of the CTO | Data Catalog and 311 Message Services | Open Data Handbook |
Highlight active and planned initiatives/issues raised from ELC Big Data Track
I will do the knowledge capture for the ELC Big Data Track:
http://semanticommunity.info/AOL_Government/ACT-IAC_2012_Executive_Leadership_Conference
MY NOTE: This was cancelled and I was told to wait before pursuing this.
Since the ELC was cancelled and I did not get to do the knowledge capture for that, I have the following suggestion:
Use the list of ELC Big Data Panel individuals in the attached spreadsheet to ask them the following:
What were they going to say?
What big data sets do they have?
What big data sets could be used for a pilot?
If you agree, then we would need all of their email addresses (John Shaw?) and a formal email from the ACT-IAC ET SIG (Johan and Mile?) requesting the answers.
Then I would follow-up to get the information in a form that we could use for several meetings with pilots.
Develop BDC committee communication and engagement process for January 2013 Hill briefing and outcome of the Big Data Advisory Committee
I have done the following AOL Gov stories recently on Big Data:
http://gov.aol.com/2012/10/10/big-data-reaches-the-hill-a-guide-to-making-it-more-actionable/
http://gov.aol.com/2012/10/11/what-the-white-house-learned-from-linkedin-and-the-use-of-big-da/
http://gov.aol.com/2012/10/15/open-government-data-and-statistical-data-havent-we-been-here/
http://gov.aol.com/2012/10/30/tempor...-intelligence/
I am working on a White Paper for the ET SIG Agile Committee that includes Big Data Analytics:
http://semanticommunity.info/AOL_Government/ACT-IAC_Agile_Development#Story
I am very familiar with the IRS efforts and it would be one of a few good examples to use in our upcoming symposium. One of the reasons we had included analytics in the charter was to be sure to be able to capture practitioner real use examples, often in their infancy or developing a succession of wins strategy.
This is an automated email to let you know that a new entry has been posted to the blog for Revenue & Finance Update.
"IRS Project Shows How Analytics Can Improve Performance" by Susan Gogos
http://blogs.mitre.org/blogs/permalink.cfm?username=RevenueFinanceUpdate&id=37008
MY NOTE: Link does not work and article requested
Big Data (BD SSG)
Source: https://connect.nitrd.gov/nitrdgroups/index.php?title=Big_Data_(BD_SSG)
Overview
Big Data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set. – Wikipedia, May 2011
The Big Data Senior Steering Group (BD SSG) has been formed to identify current big data research and development activities across the Federal government, offer opportunities for coordination, and begin to identify what the goal of a national initiative in this area would look like. As data volumes grow exponentially, so does the concern over data preservation, access, dissemination, and usability. Research into areas such as automated analysis techniques, data mining, machine learning, privacy, and database interoperability is underway at many agencies and will help identify how big data can enable science in new ways and at new levels. The science of data includes the processes of turning data into knowledge, data mining and visualization, interoperability, search and discovery, and semantics.
Scope
BD SSG was formed to identify programs across the Federal government and bring together experts to help define a potential national initiative in this area. BD SSG has been asked to identify current technology projects as well as educational offerings, competitions, and funding mechanisms that take advantage of innovation in the private sector.
Functions
Current functions and activities include:
- Collecting information on current activities across the Federal Government.
- Creating a high-level vision of the goals of a potential national initiative.
- Developing the appropriate documents and descriptions to aid discussion within the government, and where appropriate, the private sector.
- Developing implementation strategies that leverage current investments and resources.
January 8, 2013 Meeting Notes
Preliminary
Below is the invitation to the upcoming “Data Innovation Day” event later this month in D.C., which I mentioned on the call. It will focus on issues and topics related to data innovation and government.
Shannon L. Kellogg
Amazon.com
Director of U.S. Public Policy, Amazon Web Services
e-mail: shannonk@amazon.com
phone: 703-309-9636
Data Innovation in Government
9:00 AM - 10:30 AM
As part of Data Innovation Day, ITIF will host a panel discussion on how government agencies are using data to make government work more effectively and efficiently. This panel will discuss recent examples of data innovation in the federal government, including efforts to open government data sets and collaborate with the private sector. Join us for a conversation with experts from the public and private sectors who are leading these changes.
Follow the Data Innovation Day conversation with #datainnovation.
PARTICIPANTS:
This is what Isaiah Goodall just mentioned on the phone call in case it got buried in your holiday emails:
Knowledge Capture in preparation for our next meeting:
With Upcoming: http://semanticommunity.info/Emerging_Technology_SIG_Big_Data_Committee/Government_Challenges_With_Big_Data#Upcoming
Crosswalk Table of Big Data Pilot Projects and Activities:
For use with the Data Transparency Coalition on the new Data Act for the 113th Congress:
http://semanticommunity.info/DataTransparencyCoalition.org
Dr. Brand Niemann
Director and Senior Data Scientist
Semantic Community
http://gov.aol.com/bloggers/brand-niemann/
703-268-9314
December 5, 2012 Meeting Notes
Introduction
Mile Corrigan, PMP, Six Sigma Blackbelt
Center of Digital Excellence Leader, Noblis
Meeting Notes
- Frederick Walker, NSA
- Dimitris Geragas, Gartner
- Industry Interest received from Kforce and Booz Allen
- Government and Academic Interest received from Treasury, MIT, and Stanford
- ACT-IAC is a non-profit organization to help educate and inform; ACT side covers full-time government members and IAC side covers industry
- SIGs produce content to address issues to guide/advise government
- Four main operating principles that must be adhered to at all times: (1) everything we do must be ethical, (2) everything we do must be transparent, (3) government drives the agenda, and (4) everything we do must be vendor-neutral
- These principles are key to creating a safe haven for government and industry to collaborate
- John Shaw provided updates regarding reprogramming for the Executive Leadership Conference. Dates between January 15 and February 28 have been distributed to panel track leads to hold up to 16 total program committee events (2-3 sessions) over 8 days. John will distribute further details regarding ELC reprogramming to Johan and Mile.
- The government-only Big Data Symposium will address the Federal Government’s current big data challenges through dialogue-driven roundtables with participation from real big data practitioners and academia. The symposium will focus on the following four (4) key challenges:
- Making the business case for big data analytics / how to justify a big data strategy
- Government lead/moderator: Dave Nelson, CMS (alternates: Ted Doolitte or Tony Trenkle) - NOT CONFIRMED
- Training – how to get the right skill sets with lack of big data / analytics talent
- Government lead/moderator: Susan Stolting, ICE, DHS – CONFIRMED
- Academic roundtable member: Dr. Aric LaBarr, NC State Institute for Advanced Analytics
- Acquisition – how to form the right strategy to acquire big data analytics
- Government lead: Kevin Youel Page – NOT CONFIRMED
- Data Management / Security – how to manage and secure large and complex volumes of data
- Government lead/moderator: Steven Hernandez, HHS OIG - CONFIRMED
Johan encouraged the committee to focus on commonalities across the 4 tracks and report out on similarity of process and purpose. Anyone interested in assisting Isaiah Goodall in the planning of the event should contact Mile Corrigan.
- December 12 - FCW & Tech America Agency Perspectives on Big Data
- Mile Corrigan briefed the BDC Committee on the NVTC Task Force on Big Data, focused on economic development for the region to help distinguish NoVA as a big data leader
- The NVTC task force is seeking to:
- Organize a commission around the Big Data Initiative unique to NoVA region
- Similar to the Massachusetts model: a matching grant program to encourage R&D in the big data area, an internship program to build talent, a big data consortium that includes academia, industry, and government, a test bed for big data projects, an HPC center in Holyoke (public/private partnership), and a collaborative hack/reduce space at MIT to work on big data projects
- 1 - Conference with NVTC membership to address big data topics
- 2 - Would ideally like to connect the two initiatives to present to the state
- Mile Corrigan inquired about potential collaboration opportunities with NVTC
- Is there a potential opportunity for ACT-IAC to partner with NVTC given the mutual interest in federal government? Has ACT-IAC partnered with other associations in the past?
- Renee Maisel suggested offline coordination with John Shaw.
| Action Items | Person(s) Responsible | Status/Deadline |
| Compile initial list of Big Data initiatives and disseminate via site | Mile Corrigan, Brand Niemann | Ongoing |
| Highlight active and planned initiatives/issues raised from ELC Big Data Track | Mile Corrigan, Brand Niemann | Deferred |
| Identify list of government workshop invitees and BDC members for workshop planning group at next BDC meeting | BDC Members | Complete |
| Outreach to Collaboration & Transformation SIG to discuss optimized collaboration | Mile Corrigan, Johan Bos-Beijer | Complete |
| Socialize Government Workshop/Symposiums with ELC Big Data chairs and panelists as well as interested parties | Mile Corrigan, Johan Bos-Beijer | Complete |
| Distribute compiled list of initiatives to government workshop invitees | Isaiah Goodall, Matt Salter | In Progress |
| BDC member outreach communication for those who were unable to attend the working session on the 18th of October | Matt Salter, Johan Bos-Beijer, Mile Corrigan | Complete |
| Coordinate Government Workshop/Symposium | Isaiah Goodall, BDC Workshop Planning Group | In Progress |
| Prioritize initiative list (down to 4) & assign government chair/sponsor and industry chair for each initiative | Isaiah Goodall, BDC Workshop Planning Group | In Progress |
| Develop BDC committee communication and engagement process for January 2013 Hill briefing and outcome of the Big Data Advisory Committee | Brand Niemann, Johan Bos-Beijer, Mile Corrigan | Late-November/early-December 2012 |
November 7, 2012 Meeting Notes
Introduction
Big Data Committee, thank you to those who participated in our brainstorming session on 11/7. For those of you unable to make the last meeting, we generated a list of big data challenges facing the federal government to be discussed at an upcoming government-led symposium. Isaiah Goodall has stepped up to serve as the Big Data Government Symposium Chair and will lead subcommittee planning activities. Please reach out directly to Isaiah if you are interested in this subcommittee, and stay tuned for more information on the symposium coming soon!
Attached are the meeting notes – please let me know if there is anything I missed. MY NOTE: See below
On behalf of our government chair, Johan Bos-Beijer and the ET SIG, the ACT-IAC Big Data Committee welcomes several new members:
- Aric LaBarr, Assistant Professor from NC State University – Institute for Advanced Analytics
- Raman Marway, Business Intelligence, Analytics and Enterprise Performance Management Leader from Mythics Inc.
- Shannon L. Kellogg, Director of US Public Policy from Amazon Web Services
- Paul Norcini, Civilian Accounts from MarkLogic Corporation
- Ramon C. Barquin, President of Barquin International
- Frederick Walker, Technical Director Knowledge Management – Office of Counterintelligence, US Department of Defense
- Dimitris Geragas, Senior Director from Gartner
Mile Corrigan, PMP, Six Sigma Blackbelt
Center of Digital Excellence Leader, Noblis
Meeting Notes
Agenda Item Discussions
- Shannon Kellogg, Director of US Public Policy from Amazon Web Services
- Paul Norcini from MarkLogic
- Dr. Ramon Barquin, President of Barquin International
- Aric LaBarr, NC State University
- Raman Marway, Business Intelligence, Analytics and Enterprise Performance Management Practice Leader
- Sterling Thomas from Noblis
- Johan Bos-Beijer has also received additional interest from Gartner, Forrester, IBM, SAP, and the US Department of the Treasury
- Brand Niemann will serve as the BDC Knowledge Capture Chair. Brand has created a new Big Data page on Semantic Community – posted to the ACT-IAC website. The ACT-IAC website will still remain the primary portal for information dissemination and collaboration for the Big Data Committee.
- BDC members confirmed that monthly meetings worked best with everyone’s schedules and that as needed, subcommittees and active working groups will meet more regularly on an ad-hoc basis.
- Mile Corrigan reviewed action items from prior meeting. Due to the cancellation of ELC, several actions have been deferred until ACT-IAC determines how to repurpose the big data track.
- John Geraghty provided several new articles that have been added to the “Resources” folder on the collaboration site:
- Turning Big Data Into Big Benefits – Cutter
- Demystifying Big Data – Tech America
- November 15 - MarkLogic – Big Data Conference
- November 29 – Semantic Web ET Meeting
- November 28-29 – Government Information and Analytics Summit
- December 6 – MeriTalk Big Data Breakfast
- Mile Corrigan provided a brief summary of the October 18th meeting where Victor Koo proposed a closed-door event where government feels safe discussing problems and key issues with their peers. The BDC will narrow the list of ideas generated down to four key topics.
- Isaiah Goodall volunteered to serve as the lead for the government symposium sub-committee and will lead planning activities for the event. Mile Corrigan and Johan Bos-Beijer will work with Isaiah to identify subcommittee members and government panel leads for down-selected topics.
- The BDC team identified the following potential topics from the government perspective:
- PII Aggregation – Steve Hernandez discussed how to leverage the high impact of billing records, desk records, and procedural history of medical data. How do you leverage big data for trust analysis and creditworthiness?
- Classification of Data – How to predict the sensitivity of data as it moves from unclassified to classified, etc. This challenge was discussed by DoD in the Big Data track at the Smart Technology & Sustainability event in October.
- Big Structured/Unstructured Data – what are the key government challenges?
- Workforce training for data scientists – Lucca identified challenges with Hadoop and MapReduce and affordability of data scientists for federal agencies and cited DISA’s hiring needs for 80 data scientists. Susan Stolting referenced the need for workforce training in government per the TechAmerica report recommendations. How do we get the right skills/training in place? Can we possibly leverage work done by IE group?
- Acquisition – Susan Stolting described acquisition challenges for government - how do we procure these products/services? How do you build a multi-year acquisition strategy?
- Business case for Big Data – Several members discussed government challenges in developing the business case for big data (also related to acquisition). How do you justify your big data strategy? How do you embed analytics into business decisions?
- The BDC discussed several emerging academic programs – Institute of Analytics at NC State, Teradata University, Information Studies program at Syracuse
- Next Steps:
- Narrow down list of topics for the symposium
- Identify government leads for each topic
- Identify list of government workshop invitees (DHS S&T, CMS HHS, others?)
| Action Items | Person(s) Responsible | Status/Deadline |
| Distribute list of potential meeting dates/times for recurring BDC meetings | Mile Corrigan | Completed |
| Compile consolidated list of recommendations for next BDC meeting on November 7th | Mile Corrigan | 11/4/2012 |
| Compile initial list of Big Data initiatives and disseminate via site | Mile Corrigan, Brand Niemann | In progress (11/7/2012) |
| Highlight active and planned initiatives/issues raised from ELC Big Data Track | Mile Corrigan, Brand Niemann | Deferred (11/7/2012) |
| Identify list of government workshop invitees and BDC members for workshop planning group at next BDC meeting | BDC Members | 11/7/2012 |
| Outreach to Collaboration & Transformation SIG to discuss optimized collaboration | Mile Corrigan, Johan Bos-Beijer | Mid-November (11/7/2012) |
| Socialize Government Workshop/Symposiums with ELC Big Data chairs and panelists as well as interested parties | Mile Corrigan, Johan Bos-Beijer | Mid-November (11/9/2012) |
| Distribute compiled list of initiatives to government workshop invitees | Isaiah Goodall, Matt Salter | 11/23/2012 (11/9/2012) |
| BDC member outreach communication for those who were unable to attend the working session on the 18th of October | Matt Salter, Johan Bos-Beijer, Mile Corrigan | Ongoing in October-November 2012 |
| Coordinate Government Workshop/Symposium | BDC Workshop Planning Group | (TBD) mid-November 2012 |
| Prioritize initiative list (down to 4) & assign government chair/sponsor and industry chair for each initiative | BDC Workshop Planning Group (TBD) | Mid-November 2012 |
| Develop BDC committee communication and engagement process for January 2013 Hill briefing and outcome of the Big Data Advisory Committee | Brand Niemann, Johan Bos-Beijer, Mile Corrigan | Late-November/early-December 2012 |
My Meeting Notes
November 29th - ET SIG Semantic Web (with Big Data) - see email I just forwarded
November 28-29 - Government Information and Analytics Summit
January 24, 2013, Federal Big Data Senior Steering Group http://semanticommunity.info/A_NITRD_Dashboard/Semantic_Medline
https://connect.nitrd.gov/nitrdgroups/index.php?title=Big_Data_(BD_SSG)
November 14, Chief Data Scientist Summit http://www.theiegroup.com/IE_Group/Upcoming_events.html
November 14, Analytics Solution Center Seminar "Demystifying Big Data: Decoding the Big Data Commission Report"
January 15-17, 2013, NIST announces the Cloud Computing AND Big Data Forum & Workshop http://www.nist.gov/itl/cloud/
December 11, Hot Topics in Big Data: What You Need to Know Now! http://cendi.gov/activities/12_11_2012_CENDI_NFAIS_FEDLINK.html
LinkedIn Big Data Group: http://www.linkedin.com/groups?homeNewMember=&gid=4520336&trk=eml-anet_wlcm-h-visit&ut=0fkucJklQ3Wls1
LinkedIn Data Science Training Survey: http://www.linkedin.com/e/ipyoqf-h8xabwxn-18/vaq/180064857/4520336/102064273/view_disc/?hs=false&tok=2n59r-l8fPKRs1
Wendy Wigen at NSF is working on Workforce Training for the Big Data SSG and on a list of universities. Professor Borne at GMU also has a list in his slides: http://semanticommunity.info/@api/deki/files/18229/kirkborne-NIST-june2012.ppt
October 18, 2012 Meeting Notes
Introduction
As co-chairs, we want to thank everyone who attended the October 18th session for a very productive, idea-filled, and interactive time together. Following the meeting, we have received a number of excellent suggestions, expressions of interest in topics to cover, and recommendations for next steps as the ACT-IAC board determines how to repurpose the ELC work, since that annual event had to be cancelled due to weather. You will notice that the attached minutes and action items from our 10/18 meeting include new and expanded content. We will be contacting individual members to reaffirm your interests, as well as making personal contact to ensure we are including relevant topics and participants to meet the BD Committee objectives. In addition, new members from the government and academic sectors have joined our committee, eager to contribute insight and practitioner expertise and to delve into the work sessions. We are currently in communication with members of the legal profession in the government sector, as well as contracting subject matter experts, to ensure that our work aligns with the compliance, policy, and pressing needs of the agencies as they face data management, budgetary, and accountability objectives over the next several years. We have also had very productive discussions since our last meeting on how to prepare in advance to respond to the outcome of the big data report to the Hill in January 2013.
We are available to field your questions, suggestions, and observations as we move forward together. We very much appreciate the members who have been in contact with us and look forward to more interaction with you, both in our regular meetings and as the need arises between those sessions.
Johan Bos-Beijer
Government Committee Chair
Mile K. Corrigan
Industry Committee Chair
Meeting Agenda
1. Attendee roll call & introductions of New Members
2. Co-chair opening remarks
3. Confirmation of new monthly meeting date/time
4. Review Action Items and Outcomes since Prior Meeting
5. Research & Discovery Updates from members
6. ACT-IAC and concurrent Big Data Initiatives
7. BDC Government Workshop/Symposium Planning & Coordination
8. Upcoming Events
Meeting Notes
- Johan Bos-Beijer and Mile Corrigan reviewed objectives of the committee including its government practitioner focus. The committee will work on identifying government’s big data challenges. Johan described a few examples - OPM’s data analysis for succession/human resource planning or consolidating grants management and student loan data in the finance sector, health sector benefits and management between VHA, DOD, and HHS, etc. What are the best outreach and engagement methods to identify big data problem areas and use case examples – fraud, waste & abuse, information assurance, security, others?
- Mile Corrigan encouraged industry members to share research, technology, best practices & tools and to reach out to their government customers faced with big data challenges to join the committee.
- The Big Data Committee welcomes new members Sydney Smith, Judith Rutkin, and William Brantley from OPM, Susan Stolting from DHS, and Matt Salter from Noblis.
- Matt Salter will serve as the BDC Communications Chair
- Mile Corrigan reviewed action items from the September meeting which included final revisions to the draft charter and the review and acceptance by John Shaw, Director of Shared Interest Groups.
- Mile and Johan provided an overview of the Big Data Committee Charter and the committee’s advisory role in solving big data challenges ranging from analytics, data management, data visualization, and business intelligence to other government-driven topics. They will distribute the charter via the ACT-IAC Big Data Work Area located at: http://www.actgov.org/sigcom/SIGs/SI...s/default.aspx
- Johan described the need for government-driven objectives and a symposium that would allow government to convey both successes and failures. John Geraghty stated that this format worked well for the Mobility working group – 5 specific areas were identified with a government chair for each area.
- Johan asked that all members focus in particular on the scope and objectives sections of the approved charter and provide the committee chairs with suggestions, recommendations, and opportunities. Johan stressed that the committee has an opportunity to focus on aligned responsibilities with the report to the Hill, which was recommended by Brand Niemann, as well as specific agency priorities/needs in an environment of budgetary creativity and constraints.
- Victor proposed a closed-door event where government feels safe discussing problems and key issues with their peers. A few key members from industry will participate to record & document topics generated from the discussion.
- Brand Niemann suggested conducting meaningful pilots (with scientific basis) and conferences that allow collaboration between industry and government. Brand also suggested leveraging participation from the Big Data Steering group established by TechAmerica and the NLM team working with Semantic Medline data.
- Mile described additional leadership roles needed for the BDC in the areas of knowledge capture (Knowledge Capture Chair) and coordination across ACT-IAC (BDC Committee Liaison)
- BDC members Jim Soltys and Scott Larkin suggested adding versatility and value as additional V’s for the charter. BDC attendees concurred with the proposals.
- Victor recommended adding the Collaboration & Transformation SIG under the “Key Relationships” section. Adelaide O’Brien is a panelist on the Big Data track and is very active in big data initiatives across ACT-IAC.
- There were no research & discovery items identified.
- ELC Big Data Track – Williamsburg, VA – October 28-30, 2012
- BDC Committee members attending ELC will capture new initiatives/issues from the Big Data track
- NIST announces the Cloud Computing AND Big Data Forum & Workshop to be held on January 15-17, 2013.
- Event will cover how the Cloud Computing Information Technology model can be used to improve public services, provide an update on NIST Cloud Computing working group progress, and showcase examples of academic, industry, standards organization, and government partner efforts that are making progress on USG Cloud Computing Technology Roadmap priorities.
- The event has been expanded to focus on the emerging trend of Big Data in the context of its convergence with and complementary relationship to Cloud Computing.
| Action Items | Person(s) Responsible | Status/Deadline |
| Distribute list of potential meeting dates/times for recurring BDC meetings | Mile Corrigan | Completed |
| Compile consolidated list of recommendations for next BDC meeting on November 7th | Mile Corrigan | 11/4/2012 |
| Compile initial list of Big Data initiatives and disseminate via site | Mile Corrigan, Brand Niemann | 11/7/2012 |
| Highlight active and planned initiatives/issues raised from ELC Big Data Track | Mile Corrigan, Brand Niemann | 11/7/2012 |
| Identify list of government workshop invitees and BDC members for workshop planning group at next BDC meeting | BDC Members | 11/7/2012 |
| Outreach to Collaboration & Transformation SIG to discuss optimized collaboration | Mile Corrigan, Johan Bos-Beijer | 11/7/2012 |
| Socialize Government Workshop/Symposiums with ELC Big Data chairs and panelists as well as interested parties | Mile Corrigan, Johan Bos-Beijer | 11/9/2012 |
| Distribute compiled list of initiatives to government workshop invitees | Matt Salter | 11/9/2012 |
| BDC member outreach communication for those who were unable to attend the working session on the 18th of October | Matt Salter, Johan Bos-Beijer, Mile Corrigan | Ongoing in October-November 2012 |
| Coordinate Government Workshop/Symposium | BDC Workshop Planning Group | (TBD) mid-November 2012 |
| Prioritize initiative list (down to 4) & assign government chair/sponsor and industry chair for each initiative | BDC Workshop Planning Group (TBD) | mid-November 2012 |
| Develop BDC committee communication and engagement process for January 2013 Hill briefing and outcome of the Big Data Advisory Committee | Brand Niemann, Johan Bos-Beijer, Mile Corrigan | late-November/early-December 2012 |
October 18, 2012 Meeting Invitation
Source: https://members.actgov.org/eweb/DynamicPage.aspx?expires=yes&Site=ACT&WebKey=6c63d18e-c3c8-4694-a230-51e8f646f04f
Date: October 18, 2012
Event start time: 9:30 AM
Location: Noblis
Introduction
Topic: ET SIG Big Data Committee Overview
Date: Thursday, October 18, 2012
Time: 9:30 – 11:30 AM
Location: Noblis Innovation and Conference Center, 3150 Fairview Park Drive South, Falls Church, VA 22042 ** Please allow for 15 minutes for visitor check-in upon arrival **
Teleconference: 1-303-218-2664 or 1-800-371-9219; ID 759 153 6404# (ACT-IAC is a non-profit organization. In order to best support our mission, we respectfully request that you use the first conference # listed if possible.)
Fee: None
Dear ET SIG Member,
We would like to invite you to our next meeting on Thursday, October 18, 2012 from 9:30-11:30 AM, featuring an overview of our Big Data Committee. To further the ET SIG's charter mission, the Big Data Committee seeks to enable government agencies to make actionable data-driven decisions through the analysis, management, integration, and representation of large and complex data stores. The committee’s focus areas will be big data management, enterprise data integration and storage, business intelligence, data analytics, governance services, information assurance, and financial and fiscal management.
Big Data Committee Objectives
2. Advise or recommend approaches to developing Big Data technical frameworks and capability maturity model assessments.
3. Promote Big Data best practices through increasing awareness of Big Data research, technologies, use cases, and high performance computing within the Federal Government.
4. Collaborate across ACT-IAC on Big Data initiatives and events.
Agenda
10:00 - Introductions & New Members
10:15 - Action Items from Prior Meeting
10:30 - Committee Charter Overview
11:00 - Research & Discovery Updates
11:15 - Big Data Events
We look forward to seeing you there!
Sincerely,
John Geraghty, Chair
Victor Koo, Vice Chair
ET SIG
Big Data Committee Charter
Authority
Mission
Objectives
Membership
Areas within Scope
- Big Data Management – includes the collection, transformation, storage, analysis, and presentation of Big Data sets, as described by the “3 V’s,” that challenge traditional tactics, techniques, methods, and procedures for storing and retrieving data to support the analysis, production, and distribution of actionable intelligence as the intended outcome. The 3 V’s include:
- Volume: vastly large data sets.
- Velocity: dynamic and streaming as they are ingested from dispersed locations.
- Variety: multiple data types and structures (unstructured text, full motion video, image, voice, machine based, transactional, biometric, sensor-based, etc.).
- Enterprise Data Integration and Storage – includes data marts, data warehouses, integrated data management, data virtualization, data federation, and other data management structures in use in the Federal Government. It also covers automating the consolidated, selective extraction and collection of disparate authoritative external and internal source data of all types (structured, semi-structured, unstructured, transactional, and master data), where the data is stored, cleansed, transformed, standardized, and integrated for access by analytical stakeholders for their actionable use in improved decision making.
- Business Intelligence – service access to enterprise information for operational and strategic reporting. This includes the development of operational reports, ad-hoc reports, dashboards, balanced scorecards, geospatial analyses, and “what-if” type of analytical reporting requiring multidimensional online analytical processing (MOLAP) capabilities. Business intelligence tools leverage the power of Big Data by making actionable information and analytic results available to every employee in the organization.
- Data Analytics – finding patterns, correlations, and actionable information from data using a variety of advanced mathematical algorithms, data mining, text mining, and predictive modeling. Data analytics services can be applied across enterprises to enable data-driven decision making, improve product development and service delivery, optimize workload and workforce allocation, improve budget execution, identify cost savings, and to combat fraud, waste, abuse, and other inefficiencies in government programs.
- Governance Services – standards definitions and best practices such as the development of a data management specific maturity model that helps agencies assess their readiness, business case, and data quality fitness as they prepare to embark upon or evaluate an ongoing data management initiative.
- Information Assurance – standards definitions and best practices to ensure the confidentiality, integrity, and availability of data and compliance with privacy regulations particularly when Big Data related nuances prevail or must be specifically addressed.
- Financial and Fiscal Management – incorporates financial management, reporting, investment, planning, and evaluation (e.g., TCO, ROI) to enhance budget, appropriations, and fiscal accountability requirements consistent with statutes, acts, regulations, and applicable policies.
Expected Outcomes
- Federal Government Focus Group sessions composed of Big Data practitioners in areas such as, but not limited to, finance, compliance, security, analytics, information assurance, privacy, legal, policy, and human capital.
- Town Hall panels of SMEs carefully planned, located, scheduled, selected, and moderated to provide insight of value to Big Data Practitioners.
- Development of an agile Big Data framework and maturity model from a technology perspective which can be matured as technology progresses.
- Support of a public collaboration site for ACT-IAC membership.
- Deliverables executed in accordance with the ACT-IAC Request for Assistance format.
Roles & Responsibilities
- Ensure the committee’s activities are executed consistent with the ACT-IAC defined mission, vision, and principles. See:
- http://www.actgov.org/about/missionv...s/default.aspx
- Receive, record, vet, prioritize, and respond to official Requests for Assistance.
- Exercise governance to ensure the committee’s work is performed subject to governing statutes (i.e., Federal Advisory Committee Act).
- Identify and assign willing and qualified volunteers to execute responsibilities and oversee results.
Meetings and Communications
- Co-Chairs – facilitate meetings, establish workgroups to achieve committee objectives, and regularly communicate with ET SIG leadership and government executives.
- Communications Chair – distributes communications to committee members with meeting logistics, updates the committee webpage, and creates text for announcements and other communications.
- Knowledge Capture Chair – takes minutes of all meetings and distributes to membership for corrections and additions; coordinates the creation and publication of any program content.
- Committee Liaison – attends meetings and events across ACT-IAC program areas, to include other SIGs, to ensure successful coordination of Big Data initiatives for the committee.
Key Relationships
- Emerging Technology SIG
- Enterprise Architecture SIG
- ACT-IAC Program Committee
- ACT-IAC conference planning committees
Timeframe and Schedule
Operating Principles
- Activities should advance Government
- Activities must be objective, ethical, and vendor neutral
- No business development or promotion
- Transparent and open to all interested ACT-IAC members
Governance Structure
Leadership
Timeline for Review and Approval
EMERGING TECHNOLOGY SIG NEWSLETTER
ET SIG Mission
- Constituency Served: Federal CXOs (CIOs/CTOs) and other Government executives (e.g., S&T Directors) responsible for identifying, assessing and deploying emerging technology and maturing it to become a major component of the IT & business strategy.
- Who is on this SIG: Industry, Government, academia and others within ACT-IAC that are involved with emerging technology and provide products, services, processes, and business models enabling innovative approaches to solving Government issues and challenges.
- What We Do: Come together to evaluate and react to game-changing events; share lessons learned from the Emerging Technology Process Model in the Federal space; form committees around and provide insight on high-potential technologies to accelerate awareness and adoption; actively monitor the emerging technology landscape.
News
- Jul 26 — ET SIG GAP meeting was held. The focus included discussing areas of interest, finalizing topics through the remainder of the year, and extending GAP membership commitments through June 2013.
- Aug — ET SIG submitted input and metrics for ACT-IAC Quadrennial Review.
- Sep — ET SIG invited 12 senior managers and executives to serve as Government advisory panelists for the 2012-2013 term.
- Sep — The Agile Committee collaborated with the Programs Committee to help plan the ACT-IAC Agile Development Executive Panel scheduled for September 21.
- Sep — The Big Data Committee is finalizing their charter for submission to the ET SIG for approval.
Planned and Highlighted Events
- Sep 21 — SIG Agile Committee Overview — Please contact Sandi Van Valkenburg; sandra.vanvalkenburg@cgifederal.com
- Oct 18 — SIG Big Data Committee Overview — Please contact Victor Koo; victor.koo@k3-solutions.com
- Oct 28-30 — ACT-IAC Executive Leadership Conference – Theme: Charting a Course — Williamsburg, VA — Visit the ELC Homepage for more information.
- Nov — SIG Program on Semantic Web — Seeking Lead volunteer! — Please contact John Geraghty; jgeraghty@mitre.org
Announcements
- The ET SIG Big Data Committee is actively recruiting federal big data practitioners in the areas of finance, compliance, security, analytics, information assurance, privacy, legal, policy, and human capital. Please contact Mile Corrigan; mile.corrigan@noblis.org or Johan Bos-Beijer; johan.bos-beijer@gsa.gov
- Seeking volunteers to serve as Leads to research and coordinate program events for the following ET SIG initiatives: Agile, Semantic Web, and ET Research. Please contact John Geraghty; jgeraghty@mitre.org
- Let us know your ideas! Visit us at our ET SIG Homepage or join our discussion on GovLoop. In addition, please contact John Shaw for general information; jshaw@actgov.org