Table of contents
  1. Story
    1. DataAct Datathon
    2. OMB Standard Data Act Data Elements and 18F Data Act Pilot
    3. NSF Agency Financial Report and Grants Spreadsheet
    4. Seeking Participation in DATA Act - Open Data Standards Survey from Don Geiger
    5. DATA Act Survey - Delphi Round #1
      1. Section 1 - Tools to be used for the DATA Act Schema
        1. Question #1
        2. Question #2
        3. Question #3
        4. Question #4
      2. Section 2 -- Multi-dimensional Elements
        1. Question #5
        2. Question #6
        3. Question #7
      3. Section 3 - Extensibility
        1. Question #8
        2. Question #9
        3. Question #10
      4. Section 4
        1. Question #11
    6. DATA Act Delphi Study – Round #1 Summary
      1. Question #1
      2. Question #2
      3. Question #3
      4. Question #4
      5. Question #6 & #7
      6. Questions #8, #9 and #10
    7. DATA Act Survey - Delphi Round #2
    8. MORE TO FOLLOW
  2. Slides
    1. Slide 1 Data Science for the DataAct Datathon
    2. Slide 2 Data Mining – Data Science Process
    3. Slide 3 IAC/ACT Datathon Socrata Catalog 1
    4. Slide 4 IAC/ACT Datathon Socrata Catalog 2
    5. Slide 5 IAC/ACT Datathon Socrata Catalog 3
    6. Slide 6 Data Act File Inventory
    7. Slide 7 Data Act 2015 Spreadsheet
    8. Slide 8 Data Act 2015 Datathon-Spotfire1 Cover Page
    9. Slide 9 Data Science for Data Act Datathon Knowledge Base
    10. Slide 10 Data Act 2015 Datathon-Spotfire1 11 Data Sets
    11. Slide 11 Data Act 2015 Datathon-Spotfire2 Awards: All Contracts, 2010-2014
    12. Slide 12 Data Act 2015 Datathon-Spotfire3 Awards: 2010 through 2014 - All Direct Payments Full
    13. Slide 13 Conclusions and Recommendations
    14. Slide 14 Data Science for the Data Act at Treasury
  3. Spotfire Dashboard
  4. Research Notes
  5. Your Data Is Crap, and It Isn't Your Fault
  6. DATA Act Forum Datathon Call for Participants
    1. Call for Participants
    2. Logistics
    3. Summary of Data and Infrastructure
    4. Guidelines for Applicants
    5. Contact Information
    6. Skills
    7. Supporting Documentation
  7. DATA Act Forum-The Art of the Possible
    1. DATA Act Forum Agenda
    2. Data Zoo Technology Showcase
    3. Datathon
    4. Who Should Attend?
    5. Registration Fees
  8. DATA Act Forum Data Zoo Technology Showcase Application
    1. Call for Participation
    2. Logistics
    3. Contact Information
    4. Program Description
  9. Datathon Resources Available on Socrata
    1. Available Data Resources on ACT-IAC.demo.socrata.com
    2. Highlights of the Socrata Platform at ACT-IAC.demo.socrata.com
  10. Teradata Aster Discovery Analytics for the Datathon
  11. Entity Insight For Awardees: D&B DIRECT 2.0
    1. Why Test D&B DIRECT 2.0?
    2. Track Federal Spending More Effectively By
    3. Tackle Fraud, Waste, and Abuse By
    4. Simplify Reporting + Improving Data Quality
    5. Contact
  12. NSF Agency Financial Report
  13. NEXT

Data Science for the DataAct Datathon


Story

Data Science for the DataAct Datathon

DataAct Datathon

Yesterday, just by happenstance, I discovered the DATA Act Forum Datathon Call for Participants, the DATA Act Forum-The Art of the Possible, and the DATA Act Forum Data Zoo Technology Showcase Application: the Datathon runs July 27-28, and the Forum, with the Data Zoo, is on July 29.

I was fortunate to find the participant information: Datathon Resources Available on Socrata, Teradata Aster Discovery Analytics for the Datathon, and Entity Insight For Awardees: D&B DIRECT 2.0. I converted these three PDF files to Word and added them to this MindTouch page.

I used the usual Data Mining – Data Science Process:

  • Data Mining Process:
    • Business Understanding
    • Data Understanding
    • Data Preparation
    • Modeling
    • Evaluation
    • Deployment
  • Data Science Process:
    • Data Preparation
    • Data Ecosystem
    • Data Story
  • Data Science Questions:
    • How was the data collected?
    • Where is the data stored?
    • What are the data results? and
    • Why should we believe the data results?

The Business and Data Understanding steps were specified by the Data Act Datathon organizers. The rest of the Data Mining - Data Science Process was mine to decide, since I was not formally part of the Data Act Datathon (I am not an IAC/ACT member).

My purpose was to mine the Socrata Data Catalog into a spreadsheet inventory, download all the data sets to curate them in Spotfire, and provide them as a data science data publication knowledge base for another Federal Big Data Working Group and Virginia Big Data Meetup and the upcoming OSTP/NSF Data Science Meetup of Meetups. The initial process and results are documented in the Slides below.
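As a reference for anyone repeating this step, here is a minimal sketch of the catalog-mining stage in Python. It assumes the demo domain exposes the standard Socrata data.json catalog feed; the endpoint and output filename are my assumptions, not something published by the Datathon organizers.

    # Sketch: inventory the IAC/ACT Datathon Socrata catalog into a CSV file.
    import csv
    import requests

    CATALOG_URL = "https://act-iac.demo.socrata.com/data.json"  # assumed catalog endpoint

    catalog = requests.get(CATALOG_URL, timeout=60).json()

    with open("DataActFileInventory.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "format", "downloadURL"])
        for dataset in catalog.get("dataset", []):
            for dist in dataset.get("distribution", []):
                writer.writerow([dataset.get("title", ""),
                                 dist.get("mediaType", ""),
                                 dist.get("downloadURL", "")])

The resulting CSV is the kind of file inventory shown in Slide 6 below, and it can be opened directly in Excel or imported into Spotfire.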

So my data ecosystem consisted of:

  • 5 PDF-2.5 MB
  • 14 CSV-49.3 GB
  • 3 Excel-6 MB
  • 1 TXT-2KB
  • 3 Word-62 KB

which I formatted in a linked data spreadsheet for import to a Spotfire Dashboard.
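A sketch of how those counts and sizes can be tallied automatically once the files are downloaded (the folder name is an illustrative assumption):

    # Sketch: tally the downloaded Datathon files by extension and total size.
    import os
    from collections import defaultdict

    totals = defaultdict(lambda: [0, 0])  # extension -> [file count, total bytes]

    for root, _dirs, files in os.walk("DataActDatathon"):  # assumed download folder
        for name in files:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            path = os.path.join(root, name)
            totals[ext][0] += 1
            totals[ext][1] += os.path.getsize(path)

    for ext, (count, size) in sorted(totals.items()):
        print(f"{count} {ext} files, {size / 1e9:.3f} GB")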

My Conclusions and Recommendations are:

  • The Federal Big Data Working Group Meetup Data Mining – Data Science Process was applied to the DataAct Datathon data sets.
  • A data ecosystem was built by downloading 19 files from the IAC/ACT Datathon Socrata Catalog and using Spotfire to inventory their characteristics in an Excel spreadsheet.
  • There are many duplicate files in the IAC/ACT Datathon Socrata Catalog.
  • The 14 unique files were imported into 3 Spotfire files for analytics and visualizations.
  • Screen capture samples are shown to help the Datathon participants and in preparation for another Federal Big Data Working Group/Virginia Big Data Meetup on the Data Act.

The three events (July 27-29) will be summarized for our future meetup (Data Science for the Data Act at Treasury?), and this Data Science for the Data Act Datathon will be extended by our Data Act Data Science team to make recommendations to OMB and other agencies.

OMB Standard Data Act Data Elements and 18F Data Act Pilot

The next step is to render the data dictionaries and the OMB Standard Data Act Data Elements in spreadsheet form so we can begin the semantic harmonization and mediation process in Spotfire.
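As a starting point, here is a minimal sketch of that harmonization step, assuming each data dictionary has been exported to a spreadsheet with an element-name column (the file names and column names below are hypothetical):

    # Sketch: crosswalk an agency data dictionary against the OMB standard elements.
    import pandas as pd

    agency = pd.read_excel("nsf_data_dictionary.xlsx")      # hypothetical export
    standard = pd.read_excel("omb_standard_elements.xlsx")  # hypothetical export

    # Normalize element names so trivial spelling differences do not block matches.
    agency["key"] = agency["element_name"].str.strip().str.lower()
    standard["key"] = standard["element_name"].str.strip().str.lower()

    crosswalk = agency.merge(standard, on="key", how="outer",
                             suffixes=("_agency", "_standard"), indicator=True)
    crosswalk.to_excel("crosswalk.xlsx", index=False)
    print(crosswalk["_merge"].value_counts())  # both vs. left_only vs. right_only

Elements that land in the left_only or right_only buckets are the ones that need manual semantic mediation.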

Recent Slides for Our Data Act Data Science Team

18F Data Act Pilot

Source: https://github.com/18F/data-act-pilot My Note: No datasets!

https://github.com/18F/data-act-pilo...%20process.png

SBAPIlotProcess.png

 

18F launches 'Digital Economy' consulting arm

The consulting practice within the General Services Administration's innovation lab will now help agencies solve digital challenges within specific policy verticals, beginning with finance. The group says the new "Digital Economy Practice" represents the first niche market in this experiment.

Announcing 18F Consulting’s Digital Economy Practice: invited Chris Cairns (Director, Consulting // Washington, D.C.) to present at the Meetup.

Also invited Greg Godbot: Godbout was executive director of 18F, which is the General Services Administration's digital innovation lab that seeks to help agencies better interact with citizens and others. Last month, he became chief technology officer for the Environmental Protection Agency where he plans to launch a digital service team to work on cloud services, infrastructure and agile development. See Former 18F director urges federal government to provide citizens with better digital experience

NSF Agency Financial Report and Grants Spreadsheet

I got the NSF Agency Financial Report.  Source: http://www.nsf.gov/pubs/2015/nsf15002/pdf/nsf15002.pdf (PDF)

I thought this could be a data science data publication, with the PDF tables as real data. It is probably more useful right now for understanding how and why NSF spends our money, and it is what the DataAct is really about, namely making this information fully digital!

  • Step 1: an agency has a report that is fully digital as a data science data publication.
  • Step 2: an agency has a spreadsheet that is fully compatible with the new DataAct Data Standards (we are trying to do the cross-walk).
  • Step 3: every major agency does steps 1 and 2 so all those data science data publications and spreadsheets can be integrated.

So for the meetup on the 16th we are:

  • Curating the NSF financial data set and report;
  • Creating user-centric digital services focused on the interaction between government and the people and businesses it serves (data science data publications and spreadsheets); and
  • Socializing it in a Federal Community of Practice on Crowdsourcing and Citizen Science of Big Data that meets bi-monthly to share lessons learned and develop best practices for designing, implementing, and evaluating crowdsourcing and citizen science initiatives.

I will try to invite an NSF financial person, unless either of you already know one and could ask them.

I guess we could also pick one more agency like Treasury or USDA and do the same thing and see how interoperable the report and spreadsheets are.

I tried to copy and repurpose the NSF Agency Financial Report, but it is password protected and cannot be copied. This is not what should happen with the DataAct!
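For anyone repeating this, here is a short sketch of checking the PDF's protection before attempting extraction (pypdf and pdfplumber are assumed to be installed; note that a file protected only by an owner password may still open but restrict copying):

    # Sketch: check whether the NSF AFR PDF is locked, then try pulling a table.
    from pypdf import PdfReader
    import pdfplumber

    reader = PdfReader("nsf15002.pdf")
    print("encrypted:", reader.is_encrypted)

    if not reader.is_encrypted:
        with pdfplumber.open("nsf15002.pdf") as pdf:
            tables = pdf.pages[0].extract_tables()
            print(f"found {len(tables)} table(s) on page 1")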

Seeking Participation in DATA Act - Open Data Standards Survey from Don Geiger

The purpose of this email is to ask for your participation in a study (online survey) for my dissertation. I am seeking participants for the study who were colleagues of mine during my employment with the federal government at the Departments of Treasury and Interior and while working at the International Federation of Accountants. I am now a doctoral student in the Business Department at Argosy University-Sarasota. This study is a requirement to fulfill a Doctor of Business Administration (DBA) in the Graduate School of Business and Management.

 

My dissertation is about the DATA Act, the 2014 federal legislation that calls for improvements to accountability and transparency in the reporting of federal government spending. The DATA Act mandates the use of Internet based open data standards to improve the reporting of spending on government programs back to the citizens and policy makers. My research seeks to identify critical success factors in adopting open data standards for open government initiatives like the DATA Act.

 

The survey is part of a Delphi Study. The Delphi process will take place entirely via email and using Survey Monkey. There will be between 3 and 5 rounds of the survey, each round taking 15-30 minutes of your time. Each round will take place over a 1-week period, or earlier if all participants complete the survey. Those who have participated in a Delphi research process tend to say it was an intellectually stimulating experience.

 

To participate in the research, please respond to this email indicating your interest within one week, by August 20th. I will email you an Informed Consent Form for you to complete and return to me by email. The study is expected to begin the week of September 14.

 

All information and responses that you provide during the entire study will be confidential. When the results of the study are reported, you will not be identifiable in the findings, since you will be allocated a unique code known only to me.

 

If you have any questions about your participation or any aspect of the study, please feel free to email me or phone me.

 

Thank you for considering participation in this study.

 

Best Regards, Don Geiger

 

DATA Act Survey - Delphi Round #1

https://www.surveymonkey.com/r/GWDZXJ2

Section 1 - Tools to be used for the DATA Act Schema

Question #1

The 4 tools listed below are those identified via public input from the US Treasury DATA Act outreach process. This question asks you to identify any other tools that should be considered for the schema design and hosting. Four OPTIONAL lines are provided below for you to identify tools not specified. These lines are optional: if you feel the 4 tools identified by Treasury are adequate, there is no need to answer; just move on to the next question. My note: a short sketch serializing one record in several of these formats follows the list below.

Data Schema tools identified in Treasury process (in alphabetical order):
1) Comma Separated Value (CSV) Format (my response: FIRST AND ONLY)
2) JavaScript Object Notation (JSON)
3) Extensible Business Reporting Language (XBRL)
4) Extensible Markup Language (XML)
5) Other (specify tool): 
6) Other (specify tool): 
7) Other (specify tool): 
8) Other (specify tool):
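To make the comparison concrete, here is a sketch that renders one hypothetical award record in three of the candidate formats; the field names are illustrative, not actual DATA Act element names. XBRL, the fourth candidate, layers taxonomy-defined semantics on top of XML and is not shown here.

    # Sketch: one hypothetical award record rendered as CSV, JSON, and XML.
    import csv, io, json
    import xml.etree.ElementTree as ET

    award = {"award_id": "NSF-2015-001", "recipient": "Example University",
             "amount": "150000", "fiscal_year": "2015"}

    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(award))
    writer.writeheader()
    writer.writerow(award)
    print(buf.getvalue())               # CSV: a header row plus one data row

    print(json.dumps(award, indent=2))  # JSON: self-describing key/value pairs

    root = ET.Element("award")
    for key, value in award.items():
        ET.SubElement(root, key).text = value
    print(ET.tostring(root, encoding="unicode"))  # XML: tagged elements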

Question #2

Rank and explain your preferences of schema formatting tools. The first four listed are those identified via the US Treasury outreach process; four other lines are provided in case other tools were identified in the previous question.

Note: please note the following in reference to your ranking:

  • Select a number from 1-8 indicating your preference, with 1 being your 1st choice, 2 your 2nd choice, 3 your 3rd choice, etc.
  • You may select more than one tool as your first choice if you feel that they should be used in combination.
  • You are asked to Rule Out (by selecting RO from the dropdown list) choices you feel would be inadequate and should not be used for the DATA Act schema design and housing (for any tools you rule out, you will be asked to provide a Why? explanation later).

Note also that you will be asked to explain your 1st choice ranking later in the process.
Rank (a dropdown Rank menu is provided for each tool):

Comma Separated Value (CSV) Format
JavaScript Object Notation (JSON)
Extensible Business Reporting Language (XBRL)
Extensible Markup Language (XML)
Other 5 (if identified)
Other 6 (if identified)
Other 7 (if identified)
Other 8 (if identified)

Question #3

Why? (please specify) In this area, provide a detailed reason for your 1st choice selection (response can be 10 lines, 80 characters wide). My response: MOST INTEROPERABLE.

Question #4

In this area please provide a detailed explanation for any tools that you Ruled Out (RO) and feel should NOT be used for the DATA Act Schema. Provide explanations and examples of why they should not be used (response can be 10 lines, 80 characters wide).

My response: JSON, XBRL, AND XML ARE TOO SPECIALIZED.

Section 2 -- Multi-dimensional Elements

It is anticipated that the DATA Act Schema will be a representation of a number of sub-schemas, in an attempt to “standardize the way we represent financial assistance, contract and loan award data as well as financial data” (Treasury, 2015). In total, the combined schema represents a specific business concept around financial assistance, contracts, and loans, either singularly or in combination, creating structured data oriented to U.S. Standard General Ledger (USSGL) accounting concepts. The draft DATA Act schema describes 4 diverse areas of government spending (financial assistance, contracts, loans, and financial data).

This section expands on the multi-dimensional needs of the DATA Act Schema.

Question #5

Identify other essential factors in the modeling of complex and multi-dimensional data relationships.

Four elements have been identified for your consideration and are listed below. However, six other spaces are provided for you to identify other essential factors needed in structuring multi-dimensional schemas. My note: a small validation sketch follows the list.

1) Ability to establish relationships among data elements.
2) Ability to establish and maintain a mathematical link among data elements and numerical elements. 
3) Ability to apply rules or business rules to the data elements (example: A grant payment made from the National Science Foundation June 2015 payment file to an academic institution X in Seattle, Washington is a valid grant recipient).
4) Ability to provide data origination and tracking elements for audit purposes (example: this particular loan transaction came from the Department of Education June 2015 payment file).
5) Other Specify Factor: 
6) Other Specify Factor: 
7) Other Specify Factor: 
8) Other Specify Factor: 
9) Other Specify Factor: 
10) Other Specify Factor:
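Factors 3 and 4 above lend themselves to a small executable sketch; the recipient list, field names, and rule logic are all hypothetical:

    # Sketch: a business rule and a provenance rule applied to one transaction.
    VALID_RECIPIENTS = {"Academic Institution X"}  # assumed reference list

    def validate(txn: dict) -> list:
        """Return a list of rule failures for one payment transaction."""
        errors = []
        # Factor 3: the recipient must appear on the valid-recipient list.
        if txn.get("recipient") not in VALID_RECIPIENTS:
            errors.append("unknown recipient")
        # Factor 4: every transaction must carry its originating source file.
        if not txn.get("source_file"):
            errors.append("missing origination data")
        return errors

    txn = {"recipient": "Academic Institution X",
           "amount": 25000,
           "source_file": "NSF June 2015 payment file"}
    print(validate(txn) or "transaction passes both rules")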

Question #6

Rank and explain your preferences of essential factors concerning the DATA Act multi-dimensional elements. The first four listed are those initially identified; six other lines are provided in case other factors were identified in the previous question.

Note: please note the following in reference to your ranking:

  • Select a number from 1-10 indicating your preference, with 1 being your 1st choice, 2 your 2nd choice, 3 your 3rd choice, etc.
  • You may select more than one factor as your first choice if you feel that they should be used in combination.
  • You are asked to Rule Out (by selecting RO) choices you feel would be inadequate and should not be used for the DATA Act schema design and housing (for any factors you rule out, you will be asked to provide a Why? explanation later).

Note also that you will be asked to explain the importance of your 1st choice ranking.
Ranking (a dropdown Ranking menu is provided for each factor):

1) Ability to establish relationships among data elements. (my response: FIRST AND ONLY)
2) Ability to establish and maintain a mathematical link among data elements and numerical elements.
3) Ability to apply rules or business rules to the data elements (example: a grant payment made from the National Science Foundation June 2015 payment file to an academic institution in Seattle, Washington is a valid grant recipient).
4) Ability to provide data origination and tracking elements for audit purposes (example: this particular loan transaction came from the Department of Education June 2015 payment file).
5) through 10) Other - specified earlier

Question #7

Why? (please specify) In this area, provide a very detailed reason for your 1st choice selection, as well as other general comments.

My response: This is the foundation for semantic interoperability, and then everything else is possible.

Section 3 - Extensibility

The Department of the Treasury is taking the lead, in conjunction with the Office of Management and Budget (OMB), in the design and housing of the DATA Act Schema. We have seen the term extensibility, a concept that takes future growth within the model into consideration. This indicates that the DATA Act Schema will need to be extended in the future, and a monitoring system will need to be put into place that considers the ability to extend the model and the level of effort needed to include new areas within the data model.
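One way to picture managed extensibility: if the schema is maintained as a versioned, additive element registry, new reporting areas can be added without breaking existing submissions. The sketch below illustrates that pattern; it is my assumption for discussion, not Treasury's actual design:

    # Sketch: an additive, versioned element registry as one extensibility pattern.
    SCHEMA = {
        "version": "1.0",
        "elements": {
            "award_id": {"type": "string", "required": True},
            "amount": {"type": "number", "required": True},
        },
    }

    def extend(schema, name, spec):
        """Add a new element; existing elements are never changed or removed."""
        if name in schema["elements"]:
            raise ValueError(name + " already defined; extensions must be additive")
        schema["elements"][name] = spec
        major, minor = schema["version"].split(".")
        schema["version"] = f"{major}.{int(minor) + 1}"  # additive change = minor bump
        return schema

    extend(SCHEMA, "loan_interest_rate", {"type": "number", "required": False})
    print(SCHEMA["version"], sorted(SCHEMA["elements"]))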

Question #8

Provide examples of best practices in the areas of managing extensibility that may be useful to those designing and housing the DATA Act Schema. 

Best Practice #1
More is needed than a simple schema in this case. What is required is a mapping of every individual agency's data elements to the DataAct Standards. First there will be a mapping of existing data elements to the DataAct Standards (we are doing this with the NSF 2014 Grants data); then the agencies will start to collect the data for the new DataAct Standards, so mapping should not be necessary.

Question #9

Provide examples of best practices in the areas of managing extensibility that may be useful to those designing and housing the DATA Act Schema. 

Best Practice #2
The Federal Big Data Working Group Meetup is working on examples.

Question #10

Provide examples of best practices in the areas of managing extensibility that may be useful to those designing and housing the DATA Act Schema. 

Best Practice #3

GovPath Solutions

Section 4

General Comments & Feedback from participants

Question #11

General Comment Area

This is a general comment area for the participant to provide additional feedback on matters they deem appropriate for consideration in the analysis and in the next round of the Delphi study.

I got the NSF Agency Financial Report.  Source: http://www.nsf.gov/pubs/2015/nsf15002/pdf/nsf15002.pdf (PDF)

I thought this could be a data science data publication, with the PDF tables as real data. It is probably more useful right now for understanding how and why NSF spends our money, and it is what the DataAct is really about, namely making this information fully digital!

  • Step 1: an agency has a report that is fully digital as a data science data publication.
  • Step 2: an agency has a spreadsheet that is fully compatible with the new DataAct Data Standards (we are trying to do the cross-walk).
  • Step 3: every major agency does steps 1 and 2 so all those data science data publications and spreadsheets can be integrated.

So for the meetup on the 16th we are:

  • Curating the NSF financial data set and report;
  • Creating user-centric digital services focused on the interaction between government and the people and businesses it serves (data science data publications and spreadsheets); and
  • Socializing it in a Federal Community of Practice on Crowdsourcing and Citizen Science of Big Data that meets bi-monthly to share lessons learned and develop best practices for designing, implementing, and evaluating crowdsourcing and citizen science initiatives.

I tried to copy and repurpose the NSF Agency Financial Report, but it is password protected and cannot be copied. This is not what should happen with the DataAct!

DATA Act Delphi Study – Round #1 Summary

Question #1

Round #1 identified 9 potential data schema tools: 4 identified by Treasury and 5 additional tools suggested by Delphi respondents that may be considered for use.

Result Table 1 - Alphabetical Order

# | Data Schema Tool | Identified by
1 | Comma Separated Value (CSV) Format | Treasury
2 | DATALOG | Respondent
3 | Extensible Business Reporting Language (XBRL) | Treasury
4 | Extensible Business Reporting Language – Global Ledger (XBRL-GL) | Respondent
5 | Extensible Markup Language (XML) | Treasury
6 | JavaScript Object Notation (JSON) | Treasury
7 | PROLOG | Respondent
8 | RDF | Respondent
9 | Web Ontology Language (OWL) | Respondent

Question #2

Participants ranked the Treasury-provided schema-formatting tools by preference. The results from 17 expert respondents are summarized below.

Result Table 2 - Ranking Order

Selected as 1st Choice for Data Schema Tool | # of 1st Choice Responses * | %
Extensible Business Reporting Language (XBRL) | 11 | 73%
Extensible Markup Language (XML) | 4 | 29%
Comma Separated Value (CSV) Format | 3 | 20%
JavaScript Object Notation (JSON) | 1 | 7%

* Note: in round 1, respondents were allowed more than one 1st choice.

Question #3

Participants provided comments on why a particular tool should be the 1st choice; see responses below by tool.

Result Table 3 – Comments on why XBRL should be 1st Choice

R 001: The tool selected for the DATA Act schema must self-explain data element definitions and relationships. Only XBRL is fully capable of that. Treasury's current approach, focused on flat-file CSV uploads, may be necessary to start the implementation process and get agencies comfortable with the notion of preparing consolidated files, but CSV cannot be the ultimate answer.

R 002: XBRL is becoming the de facto standard. XML is a pale shadow of XBRL.

R 003: I am somewhat familiar with XBRL, and it is recommended for use for the Data Act by the AICPA. I am not that familiar with the others.

R 004: My understanding is that XBRL is able to provide the most complete data for full analysis.

R 005: 1. XBRL is used by the SEC for reporting; if we ever wish to "marry" any financial data between Treasury and SEC, this could be helpful. 2. A growing number of people and organizations are using XBRL. 3. It is easy to export to Excel.

R 006: I don't think of these as tools as much as formats for implementing a data model. In general, the format should be as simple as possible to support data intake and validation. XBRL can be implemented in a format which is not notably more complex than XML, and best supports data validation of these choices. JSON and CSV are more simplified formats, but don't support validation as easily. However, given a robust data model, the distinctions between these formats are perhaps not as important as they have been. For example, the XBRL Standards Board is exploring a data model to decouple XBRL from XML, and is working on a JSON implementation of the prototype model: https://www.xbrl.org/news/open-information-model-call-for-participation/

R 007: XBRL is the tool that is most in line with the unique requirements of tracking and reporting of spending data in conjunction with core financial systems. XBRL has a proven record with banking financial reporting, corporate/SEC financial reporting in the USA, and many other examples internationally.

R 008: My knowledge of the technical nuances of these tools is extremely limited and my answers will be of no value. Therefore, ranked them all 1.

R 009: Widely used.

R 010: XBRL is the only available data standard to adequately meet the requirements of the DATA Act. XBRL is an open standard, broadly used around the world for reporting various types of business and performance-related applications. Since the standard is platform independent and vendor neutral, it is interoperable among different systems and can be used for many different business reporting purposes. XBRL allows for files to be validated by software to ascertain that the structure of the files and the content of the data comply with certain rules. Business rules can also be written into XBRL validation software to check the content of the data and to test whether required information is reported, calculations are correct, and that the data is accurate.

R 011: XBRL, together with the XBRL-GL taxonomy, is a language designed for reporting business data. It can be used to report details (with XBRL-GL) and generalized and even verbal information. What is required is a tool which can be built without too much difficulty to enable those who do not know XBRL to create the reports directly from corporate books and records.

Result Table 4 – Comments on why XML should be 1st Choice

R 012: XML is widely used and easy to implement. It also paves the way for other tools, in particular XBRL, which in the longer term would be the preferred tool. XML is a good way to pave the way for XBRL.

R 013: CSV, JSON, and XML are good at representing data, but they are not good at representing INFORMATION. What is necessary to effectively exchange information is the ability to exchange meaning. XBRL, PROLOG, DATALOG, RDF, and OWL are all expressive enough to represent meaning. CSV is not expressive enough. JSON and XML are just syntax, not languages. XBRL, PROLOG, DATALOG, RDF, and OWL are languages and more expressive.

R 015: My knowledge of the technical nuances of these tools is extremely limited and my answers will be of no value. Therefore, ranked them all 1.

R 016: Provides the ability for variable-length messages that include the definition of the data included in the message. The message tag identifies the fields included, which makes it possible to visually analyze the data.

Result Table 5 – Comments on why CSV should be 1st Choice

R 017: CSV is tried and true. It is still widely used and does not require changes. Other tools should be available and should be the advocated direction.

R 018: Most interoperable.

R 019: My knowledge of the technical nuances of these tools is extremely limited and my answers will be of no value. Therefore, ranked them all 1.

Result Table 6 – Comments on why JSON should be 1st Choice

R 020: My knowledge of the technical nuances of these tools is extremely limited and my answers will be of no value. Therefore, ranked them all 1.

Question #4

Participants provided comments on why a particular tool should be Ruled Out from consideration; see responses below by tool.

Result Table 7 – Comments on why a tool should be RULED OUT (RO)

R 021 (JSON): JSON is not well understood yet.

R 022 (CSV): CSV carries no semantic info.

R 023 (General): I am not that familiar with the other tools described. XBRL has been around for some time and seems appropriate.

R 024 (CSV): CSV tools are possible but do not provide enough sophistication to establish the business rules needed in the tracking of the multi-dimensional aspects of complex federal environments. XML is ruled out because XBRL includes the necessary aspects of XML. However, XML by itself does not provide the added benefits of XBRL.

R 025 (JSON, XBRL, XML): JSON, XBRL, and XML are too specialized.

R 026 (XML): I have ruled out XML because it is a general language. XBRL is the XML language which should be used, and many of the features that have been built and are being built into XBRL would have to be rebuilt to make XML usable.

R 027 (General): Technical tools which force business professionals articulating data to work at the technical level should be ruled out. What is needed is higher-level tools which inherently understand languages, but hide the technical details in the background so business professionals don't need to bother with the technical details. Business professionals get the technical details correct because the software forces the technical details to be correct, in the background.

Question #6 & #7

These questions concerned multi-dimensional elements.

Questions #8, #9 and #10

These questions addressed best practices in the areas of managing extensibility. The responses to these questions provided extensive insight and will be addressed in a later round.

The objective of Round #2 will be to allow participants to see responses and rankings from other respondents and to re-rank based upon this information as a means of consensus building.

DATA Act Survey - Delphi Round #2

https://www.surveymonkey.com/r/6PJN6TV

This section is designed as an informational collaboration round. Participants are able to consider the comments and rankings from other respondents, perhaps resulting in a modification to one's own opinion.

Therefore Section 1 will review comments from other participants on the 1st choice selection, and allow for consideration and further comment.

Question #1

Comment boxes #1 - #4 are provided for these comments. In this area, respond to any comments made in Round 1 that you feel need further discussion. You may respond in agreement or disagreement with another respondent's reasons for selecting a tool to be used in the design of the DATA Act Schema. Provide explanations and examples of why you agree or disagree with the previous comments (response can be 10 lines, 80 characters wide). This area is for you to convey important thoughts and ideas based upon the comments of others. Boxes 1-4 are optional; once you have captured all your ideas, you may move on to the ranking in Question #2.

The response # refers to the unique # provided in the Round #1 summary document.

Comment #1 response to (begin by typing the unique Response #, i.e. R 005):

My response: I do not think DataAct implementation is really a tool or format problem.

I think the format should be CSV and would rule out all the other formats.

MORE TO FOLLOW

Slides

Slide 1 Data Science for the DataAct Datathon

Semantic Community

Data Science

Data Science for the DataAct Datathon

BrandNiemann07282015Slide1.PNG

Slide 2 Data Mining – Data Science Process

BrandNiemann07282015Slide2.PNG

Slide 3 IAC/ACT Datathon Socrata Catalog 1

https://act-iac.demo.socrata.com/

BrandNiemann07282015Slide3.PNG

Slide 4 IAC/ACT Datathon Socrata Catalog 2

https://act-iac.demo.socrata.com/browse

BrandNiemann07282015Slide4.PNG

Slide 5 IAC/ACT Datathon Socrata Catalog 3

https://act-iac.demo.socrata.com/view/97wk-ux99

BrandNiemann07282015Slide5.PNG

Slide 6 Data Act File Inventory

BrandNiemann07282015Slide6.PNG

Slide 7 Data Act 2015 Spreadsheet

DataAct2015.xlsx

BrandNiemann07282015Slide7.PNG

Slide 8 Data Act 2015 Datathon-Spotfire1 Cover Page

Semantic Community
Data Science
Data Science for the DataAct Datathon

BrandNiemann07282015Slide8.PNG

Slide 9 Data Science for Data Act Datathon Knowledge Base

BrandNiemann07282015Slide9.PNG

Slide 10 Data Act 2015 Datathon-Spotfire1 11 Data Sets

Web Player 857 MB

BrandNiemann07282015Slide10.PNG

Slide 11 Data Act 2015 Datathon-Spotfire2 Awards: All Contracts, 2010-2014

4.5 GB

BrandNiemann07282015Slide11.PNG

Slide 12 Data Act 2015 Datathon-Spotfire3 Awards: 2010 through 2014 - All Direct Payments Full

1.6 GB

BrandNiemann07282015Slide12.PNG

Slide 13 Conclusions and Recommendations

BrandNiemann07282015Slide13.PNG

Slide 14 Data Science for the Data Act at Treasury

http://www.meetup.com/Virginia-Big-Data-Meetup/events/218682974/

BrandNiemann07282015Slide14.PNG

Spotfire Dashboard

For Internet Explorer users and those wanting full-screen display, use: Web Player. Get Spotfire for iPad App.


Research Notes

DATA Act Forum Datathon Call for Participants
https://actiac.org/groups/data-act-forum-datathon

ACT-IAC Datathon Question
email Datathon leaders:
Kienast_Kathryn@bah.com ALSO KIRK BOURNE
herschel.chandler@iui.com; jshaw@actiac.org

FBDWG Meetup Email 7/27/2015
Call for Data Sets - Got Response from Steve Hanmer

DATA Act Forum - 7/29/2015 - The Art of the Possible
https://actiac.org/groups/event/data...-forum-7292015

http://datathon.ideascale.com/
Login Required

DATA Act Forum Data Zoo Technology Showcase Application
https://actiac.org/groups/data-act-forum-data-zoo

ACT-IAC Data Zoo Question
Kienast_Kathryn@bah.com; herschel.chandler@iui.com; jshaw@actiac.org

Registration Fees
Government Members: No charge
IAC Members: $220
Non-Members: $325

AGA’s DATA Act Information Hub
https://www.agacgfm.org/DataActHub

FEDERAL SPENDING TRANSPARENCY
DATA Act and FFATA Collaboration Space
http://fedspendingtransparency.githu.../dataelements/

Improve the quality of data submitted to USASpending.gov by holding agencies accountable
http://usaspending.gov/
https://www.usaspending.gov/Download...s/default.aspx
This GitHub page is where we will engage with non-federal stakeholders and the public at large.
http://fedspendingtransparency.github.io/

https://datathon.hackpad.com/

3 PDF Files - DOWNLOAD, CONVERT, and LINK

https://act-iac.demo.socrata.com/

http://http/developer.teradata.com/ DNW

https://developer.dnb.com/register-v2

Socrata Catalog to Excel Spreadsheet

Your Data Is Crap, and It Isn't Your Fault

Source: https://www.govloop.com/your-data-is...nt-your-fault/

How many agencies are in the federal government?

Seems like a straightforward question, right? Well, it depends on whom you ask. Hudson Hollister, Founder and Executive Director of the Data Transparency Coalition, recalled his days working as a congressional staffer for Rep. Darrell Issa and searching for an answer to that very question.

“One day I called up the Congressional Research Service, the Government Accountability Office and the Office of Management and Budget, and I asked them how many agencies we had, and I got three different answers,” Hollister said.

“If we are not even able to keep track of agencies and identify them consistently, … then of course we’re not doing it for grantees and contractors,” Hollister explained. “Of course, we’re not doing it for programs. We don’t have consistent data fields on any of these things.”

One of the main reasons for these inconsistencies in data is because “our government was built in an era when we didn’t do that,” he added. “It is nobody’s job to set up consistent data fields.”

That is, until now. Thanks to the Digital Accountability and Transparency (DATA) Act, which the president signed into law last May, the Treasury Department and OMB are taking the lead to set a reporting standard for federal spending data. (Read more about the DATA Act here.)

As a staffer, Hollister wrote much of what is now the DATA Act, plus or minus some additions and subtractions. Speaking to a packed house of mostly government attendees at a GovLoop training event Wednesday, Hollister joined panelists from the Recovery Accountability and Transparency Board (RAT Board) and the General Services Administration to talk data.

One question posed to each panelist was this: Where do we stand on data?

“Your data is crap, and it isn’t your fault,” Hollister said to the tune of audience chuckles. He encouraged feds who are in charge of data reporting requirements and figuring out what data fields to use to look outside of their agency when making those decisions. One perk is it becomes easier to match data reported by contractors to your agency with data reported to other organizations like financial regulators.

Using data internally to make decisions and provide transparency is one thing. But there’s a lot of pushback when it comes to sharing that data outside of agency walls, especially when it’s going to be shared publicly.

One of the biggest pushbacks is the concern about privacy and security, said Hemanth Setty, Chief Information Officer at the RAT Board. That’s a valid concern, which is why it takes a team of people, including chief information security officers, data architects, general counsel, and others to ensure data is properly managed. The agency has been hailed as a model for the successful use of a governmentwide analytics platform, open data and transparency. Learn what you can from the RAT Board now because the agency sunsets Sept. 30.

Much like Setty, Katherine Pearlman, Data Analytics Specialist at GSA, knows the power that comes from working collectively. There’s a big shift happening in government now where agencies are approaching her office and actually wanting to share their data with GSA to better understand its value and glean critical business insights in key areas, such as financial and IT management, human capital, contracting and real property. Agencies can also compare how they measure up to other agencies of similar sizes, budgets and missions to understand where they can improve in these key areas and to share best practices with those agencies.

“It’s a carrot and not a stick, and it shows agencies the value of their data and how they can actually save money,” Pearlman said.

GSA is also standing up what it calls a data-to-decisions environment. In a nutshell, this data warehouse capability will be used across the agency, and all data sets will be brought to the environment, Pearlman explained. Doing so will allow GSA to look across data sets and create a master list of agency codes, addresses and other information that can be used as a validation point for different data sets. The new environment will also enable GSA to combine varying data sets and use that information across business and policy lines.

“Hopefully, we will be able to start demonstrating the success of data and how it can be used, and it’s something that can be applied not just in GSA but working with other agencies as well,” she said.

DATA Act Forum Datathon Call for Participants

Source: https://actiac.org/groups/data-act-forum-datathon

As the DATA Act makes federal financial data “accessible, discoverable, and useable” for public consumption, opportunities are emerging to leverage this data within the government and in the private sector. The Datathon will give forum participants the chance to see data at work. Volunteers will gather at a shared workspace two days prior to the event and form teams to explore research questions inspired by the DATA Act. The teams will use open data sources to build knowledge and share that knowledge with other participants through data visualizations, infographics, videos, and narratives. Insights from the Datathon will become part of the Data Zoo and a panel discussion at the forum, offering participants a hands-on practitioner's perspective on the value of open data.

Call for Participants

Bring your problem-solving expertise to the ACT-IAC DATA Act Forum Datathon July 27 – 28. ACT-IAC is seeking experts capable of demonstrating the “art of the possible” with government data. Work side-by-side with other data professionals to conduct inquiries into enabling taxpayers and officials to more effectively track spending; prevent fraud, waste, and abuse; and reduce regulatory burden.  Teams will be supported with data resources, including access to a database with much of the currently available public data described in the DATA Act, as well as access to commercial data services generously donated by IAC members.   

The goal of the Datathon is to build knowledge from open data and inspire forum participants to realize greater value from data, particularly financial open data. Please volunteer today by completing this form.  Volunteers will be selected to serve in the roles of researchers, developers, and managers. Needed skills include:

  • Public sector program expertise related to contracts, grants, loans, and financial assistance as well as measuring program outcomes;
  • Programming, data management, and data visualization (e.g. Python, R, SAS, SQL, Pig, Hive, Scala, JavaScript… Cloud Services, Web Applications, Relational Databases, Distributed File Systems, Cluster Computing… Vector Graphics Applications, RShiny, D3, Tableau, QlikView, etc.);
  • Statistics and mathematics (e.g. sampling, distribution analysis, hypothesis testing, multivariate analysis, time series analysis, classification algorithms, clustering algorithms, natural language processing, etc.).

Submissions will only be accepted from ACT-IAC member organizations, 501(c)(3) non-profits, accredited academic institutions, and DATA Act Forum attendees. Please note that all full-time government employees are considered to be ACT members (there is no fee for ACT membership), and that all employees of an IAC member organization are considered to be members in good standing.

Logistics

The Datathon will run from 9:00 AM to 5:00 PM on both Monday July 27th and Tuesday July 28th at a location in Metro DC. Volunteers will bring their own laptops and receive instructions on how to connect with data resources. Food, wifi, and power strips will be provided.

Summary of Data and Infrastructure

Datathon participants will have access to rich datasets that are related to the DATA Act, as well as robust technology platforms including a Teradata Virtual Machine and Aster Platform, D&B’s Direct 2.0 API, and Socrata’s Community Data Platform.  Participants are not limited to using this infrastructure. 

Below are the base dataset details that will be provided to Datathon participants to show the art of the possible:

1. Budget - 4,400 rows (Agency, Bureau, Account Name, TAS Agency, Account Code, 1976 through 2020)

2. CFDA - 4,000 rows

3. Object Class by Agency - 1,000 rows

4. Outlays - 5,000 rows (Agency, Bureau, Account Name, TAS Agency, Account Code, 1976 through 2020)

5. TAS-BETC - 300K rows (TAFS)

6. SF133 (2010-2014) - 250K rows x 5 = 1 million rows (Agency, Bureau, Account, TAS Agency, Account Code, Line Number, Description, quarterly breakdown of amounts)

7. Awards - approx. 20 million records (250 columns) - contracts, grants, direct payments, insurance, loans, other assistance

Participants may augment this data with data from other open data sources or APIs.

A Teradata VMware VM will be loaded into AWS, acting as a relational database to house the data tables listed above. The Teradata Aster database and Discovery Platform will also be provided: an analytics tool that utilizes patented SQL-MapReduce to parallelize the processing of data and applications and deliver rich analytic insights at scale.

The Dun & Bradstreet D&B Direct 2.0 application programming interface is available. Using D&B Direct, Datathon developers will be able to search, match, cleanse, and enrich transparency data sources with contact data, firmographics, corporate linkage, financials, risk scores, and predictive analytics.

The Socrata Community Data Platform is provided for use. The Open Data APIs for Community Groups include Data Hosting to import datasets into the Socrata platform to be searched, filtered, commented on, and shared on social networks. Additionally, API Hosting is provided to create customized and managed APIs for datasets.
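For participants new to Socrata, here is a minimal sketch of reading rows through the SODA API; the dataset identifier 97wk-ux99 is the catalog view referenced elsewhere on this page, so substitute the identifier of the dataset you actually need:

    # Sketch: page through a Socrata dataset with the SODA API.
    import requests

    URL = "https://act-iac.demo.socrata.com/resource/97wk-ux99.json"  # assumed dataset id
    rows = requests.get(URL, params={"$limit": 1000, "$offset": 0}, timeout=60).json()
    print(len(rows), "rows;", "columns:", sorted(rows[0]) if rows else "none")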

The event will likely be recorded, so please be prepared to sign waivers. Finally, research questions will be developed in advance of the Datathon and collected by the DATA Act project community. 

Please click here to email Datathon leaders with any questions you may have, or to request to join the project community.

Click here to register for the DATA Act – The Art of the Possible Forum.

Guidelines for Applicants

Please submit your application before 5:00 pm, July 17, 2015. Selections will be based on a consideration of skillsets, fair opportunity for member company participation, space available, and the time the application was received, so please tell us a bit about your skills and get your application in as soon as possible. Selections will be made by leaders of the DATA Act Project and Financial Management SIG Chairs. Volunteers will form their own teams at the event (swaps between teams are OK) with an emphasis on organizational diversity (no company teams). The Datathon is a collaborative effort to build as much knowledge as possible in the short time allotted. Outstanding accomplishments will be recognized; however, the Datathon is not a competition, so come prepared to play nice and have fun.

Contact Information

Skills

Please describe your skills in the following areas.
