Table of contents
  1. Story
  2. Slides
    1. Slide 1 Cover Page
    2. Slide 2 Premium GeoNames
    3. Slide 3 FEA
    4. Slide 4 ESRI World Country Shape File
  3. Spotfire Dashboard
  4. Research Notes
  5. Project Open Data
    1. Open Data Policy — Managing Information as an Asset
      1. 1. Background
      2. 2. Definitions
      3. 3. Implementation Guidance
      4. 4. Tools
      5. 5. Resources
      6. 6. Case Studies
  6. Open Data Glossary
    1. API
    2. API Analytics
    3. API Documentation
    4. Application Library
    5. Basic Auth
    6. Catalog
    7. Code Library
    8. Content API
    9. CSV
    10. Data
    11. /Data page
    12. Dataset
    13. /Developer page
    14. Database
    15. Endpoint
    16. Error Response Code
    17. GitHub
    18. Hackathon
    19. Information
    20. Information Life Cycle
    21. Information System
    22. Information System Life Cycle
    23. JSON
    24. JSONP
    25. Machine-Readable File
    26. Metadata
    27. OAuth
    28. Open Source Software
    29. Open Standard
    30. Parameter
    31. RDF
    32. REST
    33. RSS
    34. Schema
    35. SDK
    36. Service-Oriented-Architecture
    37. SOAP
    38. Swagger
    39. Terms of Service
    40. TSV
    41. Unstructured Data
    42. Web Service
    43. WSDL
    44. XML
  7. Frequently Asked Questions
    1. Project Open Data
      1. What problem does this solve?
      2. How does it solve that problem?
      3. Where do I come in?
      4. How can I contribute?
        1. Easy
        2. Advanced
      5. Can I use the project’s content or source code elsewhere?
      6. Who can participate in Project Open Data?
      7. Are my interactions with this project subject to any special privacy considerations?
      8. Who is in charge of Project Open Data?
      9. Can I create a new page?
      10. How long will I have to wait to get a response to my suggested change (i.e., pull request)?
    2. IRM Strategic Plans
      1. What is an IRM Strategic Plan?
      2. How do the IRM plans relate to the open data policy?
    3. Machine Readable and Open Formats
      1. Does PDF meet the “machine readable and open format” requirement?
    4. Metadata
      1. What is the relationship of the metadata standard (specifically) to NIEM, ISE, FGDC, and other existing (especially official) government data standards?
      2. What is a “persistent identifier”?
      3. Who established the common core metadata schema?
      4. How can I recommend changes and improvements to the metadata schema?
      5. Can I extend the metadata schema beyond the terms specified in the common core metadata schema?
    5. Security, Privacy and Data Quality
      1. Who is responsible for ensuring that datasets published in the agency.gov/data page (and subsequently Data.gov) meet each agency’s requirements for security and privacy and quality?
      2. How can I contact the Data.gov staff for assistance in conducting mosaic effect reviews?
    6. Public Data Listing
      1. What is the value to the government in placing metadata at agency.gov/data?
      2. How will agency.gov/open, /developer, and /data pages work together?
      3. What is the relationship of the /data page and public data listing to Data.gov, and how will this impact current Data.gov processes?
      4. Are redirects allowed for /data pages?
      5. What options exist for hosting the /data.json file specifically at agency.gov/data.json?
      6. How do I get started building this /data file?
      7. How should I manage this /data file?
      8. What formats are required/recommended for the agency.gov/data file?
    7. Agency participation with Open Data
      1. What are some of the ways that agencies can become more involved with Open Data?
    8. Scope
      1. How should agencies prioritize making improvements to existing systems and data?
      2. Which agencies are required to implement this policy?
    9. Timeline
      1. How long do agencies have to implement the policy?
    10. National Information Exchange Model (NIEM)
      1. What is the relationship between NIEM and the efforts underway for the Digital Government Strategy, The Open Data Policy, and Data.gov?
      2. What is NIEM?
      3. Has the NIEM community embraced the DGS/ODP direction?
      4. Does NIEM conform to the DGS/ODP requirements?
  8. Supplemental Guidance on the Implementation of M-13-13 “Open Data Policy – Managing Information as an Asset”
    1. I. Introduction
    2. II. Policy Requirements
      1. A. Create and Maintain an Enterprise Data Inventory
        1. Purpose
        2. Framework to Create and Maintain the Enterprise Data Inventory: Expand, Enrich, Open
        3. Minimum Requirements to Create and Maintain an Enterprise Data Inventory
          1. Develop and Submit to OMB an Inventory Schedule (by November 1, 2013)
          2. Create an Enterprise Data Inventory (by November 1, 2013)
          3. Maintain the Enterprise Data Inventory (ongoing after November 1, 2013)
          4. Tools and Resources on Project Open Data
      2. B. Create and Maintain a Public Data Listing
        1. Purpose
        2. Minimum Requirements to Create and Maintain a Public Data Listing
      3. C. Create a Process to Engage With Customers to Help Facilitate and Prioritize Data Release
        1. Purpose
        2. Minimum Requirements to Create a Process to Engage With Customers to Help Facilitate and Prioritize Data Release
      4. D. Document if Data Cannot be Released
        1. Purpose
        2. Minimum Requirements to Document if Data Cannot be Released
      5. E. Clarify Roles and Responsibilities for Promoting Efficient and Effective Data Release
        1. Purpose
        2. Minimum Requirements to Clarify Roles and Responsibilities for Promoting Efficient and Effective Data Release
        3. Tools and Resources on Project Open Data
      6. III. Summary of Agency Actions and Reporting Requirements
    3. Appendix
      1. Enterprise Data Inventory Enrichment Examples
    4. Footnotes
  9. GeoNames
    1. Info
    2. Free Gazetteer Data
    3. Free Postal Code Data
    4. Premium Data
  10. Federal Enterprise Architecture (FEA)
    1. Guidance
    2. FEA Reference Models
    3. Historical Information
    4. Management Tools
    5. Communities
    6. Success Stories
  11. Performance Reference Model
    1. Introduction
    2. Taxonomy Structure Description
    3. Description of columns
  12. NEXT

Project Open Data


Story

Slides

Open Data Policy Implementation: It Takes a Data Science Team

A colleague asked for my take on the new OMB Open Data Policy Implementation Guidance and I provided it (see my Research Notes).

Essentially, OMB has established a new website, the Project Open Data site on GitHub, to do this instead of the OMB MAX, which is very similar to the way I use MindTouch and Spotfire.

GitHub is a social coding platform that lets developers publicly or privately build code repositories and interact with other developers around those repositories, providing the ability to download or fork a repository, as well as contribute back, resulting in a collaborative environment for software development.

Several months ago, when the Open Data Policy was announced, I did a story, An Open Data Policy, showing how it could be implemented using their new content with an Open Government Data Data Science Team, as follows:

  • Steven VanRoekel - Federal CIO - Directs the Digital Government Strategy
  • Jeanne Holm - Data.gov Evangelist - Evangelizes the Availability of the Data
  • Gannon Dick - Data Preparation - Prepares the Data for Analysis
  • Brand Niemann - Data Scientist - Provides the Data (Catalog and Results) in a Data Platform

Now I am going to do the same thing with their new content and some actual data that I found in it (GeoNames and the Federal Enterprise Architecture (FEA)). I am pleased to note that the implementation guidance at least includes sample language for a Chief Data Officer position description. I reformatted the new content to be compliant with their Digital Government Strategy requirements and the functionality provided by the new GitHub site, so all of this is done in one place with content analytics.

GeoNames consists of the following:

The Federal Enterprise Architecture consists of the following:

I was able to read a total of 11 files all at once into a 472 MB Spotfire file and constructed visualizations.

The data sets' dimensions are as follows:

Data Set Name | Rows | Columns | Size
fea_brmv3_wdefinitions_20120622_final.xls | 206 | 3 | 39 KB
prm_2012_agencystrategicobjectivesv3.txt | 412 | 10 | 290 KB
Free Postal Code Data | 900,456 | 12 | 68 MB
Free Gazetteer Data | 8,514,155 | 19 | 1 GB
Premium Data: Airports | 100 | 14 | 10 KB
Premium Data: boundingbox | 100 | 5 | 8 KB
Premium Data: spot_city | 100 | 5 | 4 KB
Premium Data: Unlocode-geonameid | 100 | 4 | 3 KB
Premium Data: countrysubdivision | 100 | 8 | 4 KB
Premium Data: dependencies | 8 | 8 | 1 KB
Premium Data: readme-premiumfiles | 44 | 4 | 3 KB
ESRI World Country Shape File | 263 | 8 | 2 MB
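To illustrate what reading one of these tab-delimited GeoNames extracts involves, here is a minimal Python sketch. The column subset and sample rows are hypothetical; the real gazetteer file has 19 tab-separated columns, of which only a few are shown:

```python
import csv
import io

# Hypothetical two-row sample in the GeoNames tab-delimited layout
# (only a few of the 19 gazetteer columns are represented here).
sample = (
    "2988507\tParis\t48.85341\t2.3488\tFR\n"
    "2643743\tLondon\t51.50853\t-0.12574\tGB\n"
)

columns = ["geonameid", "name", "latitude", "longitude", "country_code"]

def load_gazetteer(text):
    """Parse tab-delimited gazetteer rows into dicts with typed coordinates."""
    rows = []
    for rec in csv.reader(io.StringIO(text), delimiter="\t"):
        row = dict(zip(columns, rec))
        row["latitude"] = float(row["latitude"])
        row["longitude"] = float(row["longitude"])
        rows.append(row)
    return rows

places = load_gazetteer(sample)
print(places[0]["name"], places[0]["latitude"])
```

In practice the same loop runs over the full file on disk; a tool like Spotfire does this import (and the type inference) for you.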

An ESRI World Country Shape File was imported into Spotfire to provide a base map for visualizing the Postal Code and Gazetteer data sets.

The visualizations are shown below for:

  • Cover Page: Free Postal and Gazetteer data sets (Find a postal code by country code and location)
  • Premium GeoNames (See samples of the 6 data sets you subscribe to)
  • Federal Enterprise Architecture (Look up PRM goal codes by Agency)
  • ESRI World Country Shape File (See locations by country)

This Open Data Policy Implementation took only a few hours to build and provides more functionality than the Project Open Data GitHub site.

Slides

Slide 1 Cover Page

ProjectOpenData-SpotfireSlide1.png

Slide 2 Premium GeoNames

ProjectOpenData-SpotfireSlide2.png

Slide 3 FEA

ProjectOpenData-SpotfireSlide3.png

Slide 4 ESRI World Country Shape File

ProjectOpenData-SpotfireSlide4.png

 

Spotfire Dashboard

For Internet Explorer users, and those wanting a full-screen display, use the Web Player. Get the Spotfire for iPad app.

Research Notes

Information Week Government asked: Could I get your take on the new Next.Data.gov site?

"But in many respects, the new site is also likely to disappoint die-hard data users as being not much more than a shiny new showroom attached to the same old government data warehouse, a warehouse still in need of operating improvements and accessible data."
http://www.informationweek.com/gover...look/240158634

My Comment: It does disappoint; take the following click trail as an example:

Start at: http://next.data.gov/
pick a Community like Safety: http://next.data.gov/safety
pick Resources: http://next.data.gov/safety/safety-resources/
pick the National Map: http://nationalmap.gov/viewer.html
then Click here to go to The National Map Viewer
and Download Platform!: http://viewer.nationalmap.gov/viewer/
and you finally get to the data and its display

Bottom line: This is yet another new interface to the old Data.gov interface that eventually takes you (if you are lucky enough to find it) to where the actual data has been for years!
http://www.informationweek.com/gover...ook/240158634#

My current efforts for OMB are just starting:
http://semanticommunity.info/Data_Science/Free_Data_Visualization_and_Analysis_Tools

Information Week Government asked: Noticed OMB issued some new open data policies.

Would you give me your take on this? Also, I need to round out my sources on government open data practices. Besides Jeanne Holm, who else do you respect in this space?

My take is in a story I did several months ago when this was announced: http://semanticommunity.info/An_Open_Data_Policy#Story

They need Data Science Teams in each agency to implement this!

Joshua Tauberer (tauberer@govtrack.us) has written a respected book: http://opengovdata.io/

They also had lots of good people submit statements: http://semanticommunity.info/An_Open_Data_Policy#Written_Statements

Best regards, Brand

P.S. OMB offered me a job as a Data Scientist recently and I did the following to audition:

http://semanticommunity.info/Data_Science/Free_Data_Visualization_and_Analysis_Tools

We are doing a conference on this in September: http://www.afei.org/events/3A03/Pages/default.aspx

Thanks very much.  So are you starting work w/ OMB soon?

Waiting to hear (Smilie Face)

Good luck!!!

Project Open Data

Source: http://project-open-data.github.io/

Open Data Policy — Managing Information as an Asset

1. Background

Data is a valuable national resource and a strategic asset to the U.S. Government, its partners, and the public. Managing this data as an asset and making it available, discoverable, and usable (in a word, open) not only strengthens our democracy and promotes efficiency and effectiveness in government, but also has the potential to create economic opportunity and improve citizens’ quality of life.

For example, when the U.S. Government released weather and GPS data to the public, it fueled an industry that today is valued at tens of billions of dollars per year. Now, weather and mapping tools are ubiquitous and help everyday Americans navigate their lives.

The ultimate value of data often cannot be predicted. That’s why the U.S. Government released a policy that instructs agencies to manage their data, and information more generally, as an asset from the start and, wherever possible, release it to the public in a way that makes it open, discoverable, and usable.

The White House developed Project Open Data – this collection of code, tools, and case studies – to help agencies adopt the Open Data Policy and unlock the potential of government data. Project Open Data will evolve over time as a community resource to facilitate broader adoption of open data practices in government. Anyone – government employees, contractors, developers, the general public – can view and contribute. Learn more about Project Open Data Governance and dive right in and help to build a better world through the power of open data.


2. Definitions

This section is a list of definitions and principles used to guide the project.

2-1 Open Data Principles - The set of open data principles.

2-2 Standards, Specifications, and Formats - Standards, specifications, and formats supporting open data objectives.

2-3 Open Data Glossary - The glossary of open data terms. My Note: See Below

2-4 Open Licenses - The definition for open licenses.

2-5 Common Core Metadata - The schema used to describe datasets, APIs, and published data at agency.gov/data.
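For illustration, a dataset entry under a common core metadata schema of this kind is a small JSON object. The sketch below builds one in Python; only a handful of common fields are shown (not the full schema), and the values are hypothetical:

```python
import json

# A minimal, hypothetical dataset record using a few common core-style
# fields: title, description, keyword, modified, publisher, identifier,
# and accessLevel. Values are invented for illustration.
record = {
    "title": "Example Postal Code Data",
    "description": "Postal codes with place names and coordinates.",
    "keyword": ["postal", "geography"],
    "modified": "2013-07-01",
    "publisher": "Example Agency",
    "identifier": "example-agency-postal-001",
    "accessLevel": "public",
}

# A /data catalog file is, at its simplest, a JSON array of such records.
catalog = json.dumps([record], indent=2)
print(json.loads(catalog)[0]["title"])
```

A catalog published at agency.gov/data would list one such object per dataset, which is what makes the listing machine-readable and harvestable.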


3. Implementation Guidance

Implementation guidance for open data practices.

3-1 U.S. Government Policy on Open Data - Full text of the memorandum.

3-2 Implementation Guide - Official OMB implementation guidance for each step of implementing the policy. My Note: See Below

3-3 Public Data Listing - The specific guidance for publishing the Open Data Catalog at the agency.gov/data page.

3-4 Frequently Asked Questions - A growing list of common questions and answers to facilitate adoption of open data projects. My Note: See Below

3-5 Open Data Cross Priority (CAP) Goal - Information on the development of the Open Data CAP goal as required in the Open Data Executive Order.


4. Tools

This section is a list of ready-to-use solutions or tools that will help agencies jump-start their open efforts. These are real, implementable, coded solutions that were developed to significantly reduce the barrier to implementing open data at your agency. Many of these tools are hosted at Labs.Data.gov and developers are encouraged to contribute improvements to them and contribute other tools which help us implement the spirit of Project Open Data.

4-1 Database to API - Dynamically generate RESTful APIs from the contents of a database table. Provides JSON, XML, and HTML. Supports most popular databases. - Hosted
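As a rough sketch of the pattern behind a database-to-API tool (not the tool's actual code), Python's standard library is enough to serialize a table as JSON, which is the core of a minimal read-only endpoint. The table name and rows here are invented:

```python
import json
import sqlite3

# Build a throwaway in-memory table standing in for an agency database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO datasets (id, name) VALUES (?, ?)",
    [(1, "postal_codes"), (2, "gazetteer")],
)

def table_as_json(connection, table):
    """Serialize every row of a table as a JSON array of objects."""
    # Table name must come from a trusted list, never from user input.
    cur = connection.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    return json.dumps([dict(zip(cols, row)) for row in cur.fetchall()])

payload = table_as_json(conn, "datasets")
print(payload)
```

A real tool adds routing, pagination, and the XML/HTML renderings on top of this same query-and-serialize step.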

4-2 CSV to API - Dynamically generate RESTful APIs from static CSVs. Provides JSON, XML, and HTML. - Hosted

4-3 Spatial Search - A RESTful API that allows the user to query geographic entities by latitude and longitude, and extract data.
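A "query by latitude and longitude" service of this kind usually reduces to a great-circle distance filter. Here is a minimal haversine sketch; the point list and radius are hypothetical, not part of the tool:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical entities; keep those within 350 km of the query point.
points = {"Paris": (48.85, 2.35), "London": (51.51, -0.13), "Berlin": (52.52, 13.41)}
query = (48.85, 2.35)
nearby = [n for n, (lat, lon) in points.items()
          if haversine_km(query[0], query[1], lat, lon) <= 350]
print(nearby)
```

A production service would replace the linear scan with a spatial index, but the distance test is the same.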

4-4 Kickstart - A WordPress plugin to help agencies kickstart their open data efforts by allowing citizens to browse existing datasets and vote for suggested priorities.

4-5 PDF Filler - PDF Filler is a RESTful service (API) to aid in the completion of existing PDF-based forms and empower web developers to use browser-based forms and modern web standards to facilitate the collection of information. - Hosted

4-6 Catalog Generator - Multi-format tool to generate and maintain agency.gov/data catalog files. - Hosted

4-7 JSON Validator - Validation tool to confirm the formatting of agency.gov/data catalog files. - Hosted

4-8 API Sandbox - Interactive API documentation systems.

4-9 CFPB Project Qu - The CFPB’s in-progress data publishing platform, created to serve public data sets.

4-10 HMDA Tools - Lightweight tools to make importing and analyzing Home Mortgage Disclosure Act data easier.

4-11 ESRI2Open - A tool that converts spatial and non-spatial data from ESRI-only formats to the open data formats CSV, JSON, or GeoJSON, making them more a part of the WWW ecology.

4-12 ckanext-datajson - A CKAN extension to generate agency.gov/data.json catalog files.

4-13 DKAN - An open data portal modeled on CKAN. DKAN is a stand-alone Drupal distribution that allows anyone to spin up an open data portal in minutes, plus two modules, DKAN Dataset and DKAN Datastore, that can add data portal functionality to an existing Drupal site.

4-14 DataVizWiz - A Drupal module that provides a fast way to get data visualizations online.


5. Resources

This section contains programmatic tools, resources, and/or checklists to help programs determine open data requirements.

5-1 Metadata Resources - Resources to provide guidance and assistance for each aspect of creating and maintaining agency.gov/data catalog files.

5-2 Business Case for Open Data - Overview of the benefits associated with open data.

5-3 General Workflows for Open Data Projects - A comprehensive overview of the steps involved in open data projects and their associated benefits.

5-4 Open License Examples - Potential licenses for data and content.

5-5 Chief Data Officer Material - Sample language for a Chief Data Officer position description.

5-6 API Basics - Introductory resources for understanding application programming interfaces (APIs).

5-7 Data Release Safeguard Checklist - Checklist to enable the safe and secure release of data.

5-8 Digital PII Checklist - Tool to assist agencies in identifying personally identifiable information in data.

5-9 Applying the Open Data Policy to Federal Awards: FAQ - Frequently asked questions for contracting officers, grant professionals and the federal acquisitions community on applying the Open Data Policy to federal awards.


6. Case Studies

Case studies of novel or best practices from agencies who are leading in open data help others understand the challenges and opportunities for success.

6-1 Department of Labor API Program - A department perspective on developing APIs for general use and, in particular, building the case for an ecosystem of users by developing SDKs.

6-2 Department of Transportation Enterprise Data Inventory - A review of DOT’s strategy and policy when creating a robust data inventory program.

6-3 Disaster Assistance Program Coordination - The coordinated campaign led by FEMA has integrated a successful data exchange among 16 agencies to coordinate an important public service.

6-4 Environmental Protection Agency Central Data Exchange - The agency’s data exchange provides a model for programs that seek to coordinate the flow of data among industry, state, local, and tribal entities.

6-5 FederalRegister.gov API - A core government program update that has grown into an important public service.

6-6 National Broadband Map - The National Broadband Map, a case study on open innovation for national policy. Produced by the Wilson Center.

6-7 National Renewable Energy Laboratory API program - An agency perspective on developing APIs for general use and in particular building the case for the internal re-use of the resources.

6-8 USAID Crowdsourcing to Open Data - A case study that shows how USAID invited the “crowd” to clean and geocode a USAID dataset in order to open and map the data.


Open Data Glossary

Source: http://project-open-data.github.io/glossary/

This section contains explanations of common terms referenced in Project Open Data and the Open Data Policy.

API

An application programming interface, which is a set of definitions of the ways one piece of computer software communicates with another. It is a method of achieving abstraction, usually (but not necessarily) between higher-level and lower-level software. —source

API Analytics

Rate limiting will be part of any API platform. Without some sort of usage log and analytics showing developers where they stand, rate limits will cause nothing but frustration. Clearly show developers their daily, weekly, or monthly API usage, and provide proper relief valves that allow them to scale their usage appropriately. —source

API Documentation

Quality API documentation is the gateway to a successful API. API documentation needs to be complete yet simple, a very difficult balance to achieve. Striking this balance takes work, and will require more than one individual on an API development team.

API documentation can be written by the developers of the API, but additional edits should be made by developers who were not responsible for deploying it. The original authors find it easy to overlook parameters and other details about which they have made assumptions. —source

Application Library

Complete, functioning applications built on an API are the end goal of any API owner. Be sure to feature all applications built on an API in an application showcase or directory. App showcases are a great way to highlight not only applications built by the API owner, but also the successful integrations of ecosystem partners and individual developers. —source

Basic Auth

Basic Auth is a way for a web browser or application to provide credentials in the form of a username and password. Because Basic Auth is integrated into the HTTP protocol, it is the easiest way for users to authenticate with a RESTful API.

Basic Auth is easy to integrate; however, if SSL is not used, the username and password are passed in plain text and can be easily intercepted on the open Internet. —source
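
The mechanics are simple enough to sketch in a few lines of Python: the credentials are just base64-encoded and placed in an Authorization header. The example below uses the well-known test credentials from RFC 7617; note that base64 is an encoding, not encryption, which is why HTTPS is still required.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    # Basic Auth is just "Basic " plus base64("username:password");
    # anyone who can read the traffic can decode it, so always use HTTPS.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

# The classic example credentials from RFC 7617:
print(basic_auth_header("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

The resulting string is sent as the value of the HTTP `Authorization` request header.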

Catalog

A catalog is a collection of datasets or web services. —source

Code Library

Working code samples in all the top programming languages are commonplace in the most successful APIs. Documentation will describe, in a general way, how to use an API, but code samples speak in the specific language of developers. —source

Content API

A web service that provides dynamic access to the page content of a website, including the title, body, and body elements of individual pages. Such an API often, but not always, functions atop a Content Management System. —source

CSV

A comma separated values (CSV) file is a computer data file used for implementing the tried and true organizational tool, the comma separated list. The CSV file is used for the digital storage of data structured in tabular form. Each line in the CSV file corresponds to a row in the table. Within a line, fields are separated by commas, and each field belongs to one table column. CSV files are often used for moving tabular data between two different computer programs, such as a database program and a spreadsheet program. —source
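
As a quick illustration of the row-and-column structure described above, here is a minimal sketch using Python's standard csv module; the sample table and its values are invented for the example.

```python
import csv
import io

# A small CSV table: a header row followed by two records.
raw = "name,agency,year\nData.gov,GSA,2009\nFederalRegister.gov,NARA,2010\n"

# DictReader maps each line to a dict keyed by the header's column names.
rows = list(csv.DictReader(io.StringIO(raw)))

print(rows[0]["agency"])  # GSA
```

Reading from a file instead of a string only requires replacing the io.StringIO wrapper with open("file.csv", newline="").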

Data

A value or set of values representing a specific concept or concepts. Data become “information” when analyzed and possibly combined with other data in order to extract meaning and provide context. The meaning of data can vary depending on its context. Data takes many forms, including, but not limited to, geospatial data, unstructured data, and structured data. —source

/Data page

A hub for data discovery which provides a common location that lists and links to an organization’s datasets. Such a hub is often located at www.example.com/data. —source

Dataset

A dataset is an organized collection of data. The most basic representation of a dataset is data elements presented in tabular form. Each column represents a particular variable. Each row corresponds to a given value of that column’s variable. A dataset may also present information in a variety of non-tabular formats, such as an extensible mark-up language (XML) file, a geospatial data file, or an image file, etc. —source

/Developer page

A hub for API discovery which provides a common location where an organization’s APIs and their associated documentation are listed. Such a hub is often located at www.example.com/developer. —source

Database

A collection of data stored according to a schema and manipulated according to the rules set out in one Data Modelling Facility. —source

Endpoint

An association between a binding and a network address, specified by a URI, that may be used to communicate with an instance of a service. An endpoint indicates a specific location for accessing a service using a specific protocol and data format. —source

Error Response Code

Errors are an inevitable part of API integration. It is essential to provide not only a robust set of clear and meaningful API error response codes, but also a clear listing of these codes for developers to follow and learn from.

API errors are directly related to frustration during developer integration; the friendlier and more meaningful they are, the greater the chance a developer will move forward after encountering an error. Put a lot of consideration into your error responses and the documentation that educates developers. —source
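
As an illustration of a clear, meaningful error response, here is a hypothetical JSON error body sketched in Python; the field names, error code, and documentation URL are all invented for the example.

```python
import json

def error_response(status: int, code: str, message: str, docs_url: str) -> str:
    # Pair a machine-readable error code with a human-readable message
    # and a pointer to the documentation page that explains it.
    return json.dumps({
        "status": status,
        "error": {"code": code, "message": message, "documentation": docs_url},
    })

body = error_response(
    429,
    "rate_limit_exceeded",
    "Daily request quota reached; try again after midnight UTC.",
    "https://example.gov/developer/errors#rate_limit_exceeded",
)
print(body)
```

A response like this lets a client branch on the stable `code` field while showing the `message` to a human.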

GitHub

GitHub is a social coding platform allowing developers to publicly or privately build code repositories and interact with other developers around these repositories–providing the ability to download or fork a repository, as well as contribute back, resulting in a collaborative environment for software development. —source

Hackathon

An event in which computer programmers and others in the field of software development, like graphic designers, interface designers, project managers and computational philologists, collaborate intensively on software projects. Occasionally, there is a hardware component as well. Hackathons typically last between a day and a week. Some hackathons are intended simply for educational or social purposes, although in many cases the goal is to create usable software. Hackathons tend to have a specific focus, which can include the programming language used, the operating system, an application, an API, the subject, or the demographic group of the programmers. In other cases, there is no restriction on the type of software being created. —source

Information

Information, as defined in OMB Circular A-130, means any communication or representation of knowledge such as facts, data, or opinions in any medium or form, including textual, numerical, graphic, cartographic, narrative, or audiovisual forms. —source

Information Life Cycle

Information life cycle, as defined in OMB Circular A-130, means the stages through which information passes, typically characterized as creation or collection, processing, dissemination, use, storage, and disposition. —source

Information System

Information system, as defined in OMB Circular A-130, means a discrete set of information resources organized for the collection, processing, maintenance, transmission, and dissemination of information, in accordance with defined procedures, whether automated or manual. —source

Information System Life Cycle

Information system life cycle, as defined in OMB Circular A-130, means the phases through which an information system passes, typically characterized as initiation, development, operation, and termination. —source

JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language. —source
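
A minimal Python sketch of the interchange round trip: serialize native objects to JSON text, then parse the text back into objects (the record itself is invented for the example).

```python
import json

# A dataset record as native Python objects (dict, list, int, str).
record = {"title": "Population Estimates", "formats": ["CSV", "JSON"], "rows": 3144}

text = json.dumps(record)      # serialize to a JSON text string
roundtrip = json.loads(text)   # parse the string back into Python objects

print(roundtrip == record)  # True
```

Because JSON is language independent, the same `text` string could just as easily be parsed by JavaScript, Java, or any other language with a JSON library.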

JSONP

JSONP or “JSON with padding” is a JSON extension wherein the name of a callback function is specified as an input argument of the underlying JSON call itself. JSONP makes use of runtime script tag injection. —source
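
On the server side, the “padding” is just string concatenation around an ordinary JSON body, so the response can be loaded through a script tag and invoked as a function call. A sketch in Python, with an invented callback name and payload:

```python
import json

def jsonp(callback: str, payload: dict) -> str:
    # Wrap the JSON body in the caller-supplied callback function name.
    # The browser executes the result as a script, invoking callback(payload).
    return f"{callback}({json.dumps(payload)});"

print(jsonp("handleData", {"status": "ok"}))
# handleData({"status": "ok"});
```

Real implementations should validate the callback name before echoing it, since it is client-supplied input.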

Machine-Readable File

Refers to information or data that is in a format that can be easily processed by a computer without human intervention while ensuring no semantic meaning is lost. —source

Metadata

To facilitate common understanding, a number of characteristics, or attributes, of data are defined. These characteristics of data are known as “metadata”, that is, “data that describes data.” For any particular datum, the metadata may describe how the datum is represented, ranges of acceptable values, its relationship to other data, and how it should be labeled. Metadata also may provide other relevant information, such as the responsible steward, associated laws and regulations, and access management policy. Each of the types of data described above has a corresponding set of metadata. Two of the many metadata standards are the Dublin Core Metadata Initiative (DCMI) and the Department of Defense Discovery Metadata Standard (DDMS).

The metadata for structured data objects describes the structure, data elements, interrelationships, and other characteristics of information, including its creation, disposition, access and handling controls, formats, content, and context, as well as related audit trails. Metadata includes data element names (such as Organization Name, Address, etc.), their definition, and their format (numeric, date, text, etc.). In contrast, data is the actual data values, such as “US Patent and Trade Office” or “Social Security Administration” for the metadata element “Organization Name”. Metadata may also include metrics about an organization’s data, including its data quality (accuracy, completeness, etc.). —source

OAuth

An open standard for authorization. It allows users to share their private resources stored on one site with another site without having to hand out their credentials, typically username and password. —source

Open Source Software

Computer software that is available in source code form: the source code and certain other rights normally reserved for copyright holders are provided under an open-source license that permits users to study, change, improve and at times also to distribute the software.

Open source software is very often developed in a public, collaborative manner. Open source software is the most prominent example of open source development and often compared to (technically defined) user-generated content or (legally defined) open content movements. —source

Open Standard

A standard developed or adopted by voluntary consensus standards bodies, both domestic and international. These standards include provisions requiring that owners of relevant intellectual property have agreed to make that intellectual property available on a non-discriminatory, royalty-free or reasonable royalty basis to all interested parties. —source

Parameter

A special kind of variable, used in a subroutine to refer to one of the pieces of data provided as input to the subroutine. The semantics for how parameters can be declared and how the arguments get passed to the parameters of subroutines are defined by the language, but the details of how this is represented in any particular computer system depend on the calling conventions of that system. —source

RDF

Resource Description Framework - A family of specifications for a metadata model. The RDF family of specifications is maintained by the World Wide Web Consortium (W3C). The RDF metadata model is based upon the idea of making statements about resources in the form of a subject-predicate-object expression…and is a major component in what is proposed by the W3C’s Semantic Web activity: an evolutionary stage of the World Wide Web in which automated software can store, exchange, and utilize metadata about the vast resources of the Web, in turn enabling users to deal with those resources with greater efficiency and certainty. RDF’s simple data model and ability to model disparate, abstract concepts has also led to its increasing use in knowledge management applications unrelated to Semantic Web activity. —source

REST

A style of software architecture for distributed systems such as the World Wide Web. REST has emerged as a predominant Web service design model. REST facilitates the transaction between web servers by allowing loose coupling between different services. REST is less strongly typed than its counterpart, SOAP. The REST language is based on the use of nouns and verbs, and has an emphasis on readability. Unlike SOAP, REST does not require XML parsing and does not require a message header to and from a service provider. This ultimately uses less bandwidth. —source
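
Because REST leans on URLs, nouns, and standard HTTP verbs, a request is often nothing more than a resource path plus query parameters. A sketch using Python's standard library; the API host, path, and parameters are invented for the example:

```python
from urllib.parse import urlencode

# Hypothetical RESTful resource: nouns in the path, filters as query parameters.
base = "https://api.example.gov/v1/datasets"
query = urlencode({"agency": "DOT", "format": "json", "per_page": 10})

url = f"{base}?{query}"
print(url)
# https://api.example.gov/v1/datasets?agency=DOT&format=json&per_page=10
```

Issuing a GET against such a URL (for example with urllib.request) retrieves the resource; POST, PUT, and DELETE on the same noun express the other operations.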

RSS

A family of web feed formats (often dubbed Really Simple Syndication) used to publish frequently updated works — such as blog entries, news headlines, audio, and video — in a standardized format. An RSS document (which is called a “feed,” “web feed,” or “channel”) includes full or summarized text, plus metadata such as publishing dates and authorship. —source

Schema

An XML schema defines the structure of an XML document: which data elements and attributes can appear in a document; how the data elements relate to one another; whether an element is empty or can include text; which types of data are allowed for specific data elements and attributes; and what the default and fixed values are for elements and attributes. It is a method for specifying constraints on XML documents. A schema is also a description of the data represented within a database; the format of the description varies, but includes a table layout for a relational database or an entity-relationship diagram. —source
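
For illustration, here is a small XML Schema (XSD) fragment constraining a hypothetical dataset record; the element names and types are invented for the example.

```xml
<!-- Illustrative only: a schema requiring a <dataset> with a title,
     a modified date, and an optional non-negative row count. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="dataset">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="modified" type="xs:date"/>
        <xs:element name="rows" type="xs:nonNegativeInteger" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A validating parser can then reject any document whose elements, order, or data types violate these constraints.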

SDK

Software Development Kits (SDKs) are the next step in providing code for developers, after basic code samples. SDKs are more complete code libraries that usually include authentication and production-ready objects that developers can use once they are more familiar with an API and ready for integration.

Just like code samples, SDKs should be provided in as many common programming languages as possible. Code samples help developers understand an API, while SDKs actually facilitate integrating the API into their applications. When providing SDKs, consider a software license that gives your developers as much flexibility as possible in their commercial products. —source

Service-Oriented-Architecture

Expresses a software architectural concept that defines the use of services to support the requirements of software users. In a SOA environment, nodes on a network make resources available to other participants in the network as independent services that the participants access in a standardized way. Most definitions of SOA identify the use of Web services (using SOAP and WSDL) in its implementation. However, one can implement SOA using any service-based technology with loose coupling among interacting software agents. —source

SOAP

SOAP (Simple Object Access Protocol) is a message-based protocol based on XML for accessing services on the Web. It employs XML syntax to send text commands across the Internet using HTTP. SOAP is similar in purpose to the DCOM and CORBA distributed object systems, but is more lightweight and less programming-intensive. Because of its simple exchange mechanism, SOAP can also be used to implement a messaging system. —source
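
For illustration, here is a minimal SOAP 1.1 envelope for a hypothetical GetDataset operation; the operation name, parameter, and service namespace are invented for the example.

```xml
<!-- Illustrative only: a SOAP request carried in an HTTP POST body. -->
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetDataset xmlns="http://example.gov/catalog">
      <DatasetId>d1</DatasetId>
    </GetDataset>
  </soap:Body>
</soap:Envelope>
```

The service replies with a similar envelope whose body contains either the result or a structured SOAP Fault.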

Swagger

A specification and complete framework implementation for describing, producing, consuming, and visualizing RESTful web services. The overarching goal of Swagger is to enable client and documentation systems to update at the same pace as the server. The documentation of methods, parameters and models are tightly integrated into the server code, allowing APIs to always stay in sync. —source

Terms of Service

Terms of Service provide a legal framework for developers to operate within. They set the stage for the business development relationships that will occur within an API ecosystem. Terms of Service should protect the API owner’s company, assets and brand, but should also provide assurances for developers who are building businesses on top of an API. —source

TSV

A simple text format for a database table. Each record in the table is one line of the text file. Each field value of a record is separated from the next by a tab stop character. It is a form of the more general delimiter-separated values format. —source
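
Because TSV differs from CSV only in its delimiter, the same standard-library tooling applies. A minimal Python sketch with invented sample data:

```python
import csv
import io

# The stdlib csv module handles TSV too; only the delimiter changes.
raw = "id\tname\n1\tCensus Tracts\n2\tBroadband Speeds\n"
rows = list(csv.reader(io.StringIO(raw), delimiter="\t"))

print(rows[1])  # ['1', 'Census Tracts']
```

Tab delimiters are convenient when field values themselves commonly contain commas.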

Unstructured Data

Data that is more free-form, such as multimedia files, images, sound files, or unstructured text. Unstructured data does not necessarily follow any format or hierarchical sequence, nor does it follow any relational rules. Unstructured data refers to masses of (usually) computerized information which do not have a data structure which is easily readable by a machine. Examples of unstructured data may include audio, video and unstructured text such as the body of an email or word processor document. Data mining techniques are used to find patterns in, or otherwise interpret, this information. Merrill Lynch estimates that more than 85 percent of all business information exists as unstructured data – commonly appearing in e-mails, memos, notes from call centers and support operations, news, user groups, chats, reports, letters, surveys, white papers, marketing material, research, presentations, and Web pages (“The Problem with Unstructured Data.”) —source

Web Service

A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.—source

WSDL

An XML-based language (Web Services Description Language) used to describe the services a business offers and to provide a way for individuals and other businesses to access those services electronically. —source

XML

Extensible Markup Language (XML) is a flexible language for creating common information formats and sharing both the format and content of data over the Internet and elsewhere. XML is a formatting language recommended by the World Wide Web Consortium (W3C). —source
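
A minimal Python sketch of parsing a small, invented XML catalog with the standard library:

```python
import xml.etree.ElementTree as ET

doc = """<catalog>
  <dataset id="d1"><title>Farmers Markets</title></dataset>
  <dataset id="d2"><title>Air Quality</title></dataset>
</catalog>"""

root = ET.fromstring(doc)
# Collect the <title> text of every <dataset> child of the catalog.
titles = [ds.find("title").text for ds in root.findall("dataset")]
print(titles)  # ['Farmers Markets', 'Air Quality']
```

The same tree API can also build and serialize XML, which is how both the format and the content of data can be shared.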

Frequently Asked Questions

Source: http://project-open-data.github.io/faq/

This section is a growing list of common questions and answers to support agencies when implementing the Open Data Policy.

Project Open Data

What problem does this solve?

Technology moves much faster than policy ever could. Often, by the time an agency publishes a new technology policy, the methods it prescribes are already outdated.

How does it solve that problem?

This project is meant to be a living document, fostering collaboration in the open data ecosystem so that the technical components the policy depends on can be updated at a more rapid pace.

Where do I come in?

Help the United States Government make its Open Data Policy better by collaborating. Please suggest enhancements by editing the content here, or add tools that help anyone publish open data (see “How can I contribute?” below).

How can I contribute?

This project constitutes a collaborative work (“open source”). Federal employees and members of the public are encouraged to improve the project by contributing. This can be done in two ways:

Easy
  1. Click the “Improve this content” button in the top right corner of any page.
  2. Make changes as you would normally.
  3. Click “Submit.”
  4. Your change should appear, once approved.

Note: You will need to create a free GitHub account if you do not already have one.

Advanced
  1. Configure Git by using this basic tutorial, or by downloading GitHub for Mac or GitHub for Windows, and optionally install a Markdown editor like MarkdownPad (for Windows) or Mou (for Mac).
  2. Fork the project.
  3. Make changes as you would normally using the tools installed in step #1.
  4. Push the changes back to your fork.
  5. Submit a pull request to this repository.
  6. Your change should appear once it’s approved.

Note: All contributors retain the original copyright to their contribution, but by contributing to this project, you grant a world-wide, royalty-free, perpetual, irrevocable, non-exclusive, transferable license to all users under the terms of the license(s) under which this project is distributed.

Can I use the project’s content or source code elsewhere?

The project as originally published constitutes a work of the United States Government and is not subject to domestic copyright protection under 17 USC § 105. Subsequent contributions by members of the public, however, retain their original copyright.

In order to better facilitate collaboration, the content of this project is licensed under the Creative Commons 3.0 License, and the underlying source code used to format and display that content is licensed under the MIT License.

Who can participate in Project Open Data?

Anyone – Federal employees, contractors, developers, the general public – can view and contribute to Project Open Data.

Are my interactions to this project subject to any special privacy considerations?

Comments, pull requests, and any other messages received through this repository may be subject to the Presidential Records Act and may be archived. Learn more at http://whitehouse.gov/privacy.

Who is in charge of Project Open Data?

Ultimately? You. While the White House founded and continues to oversee the project, Project Open Data is a collaborative work — commonly known as “open source” — and is supported by the efforts of an entire community. See the “how to contribute” section above to learn more.

At the onset, the General Services Administration is here to provide daily oversight and support, but over time, it is our vision that contributors both inside and outside of government can be empowered to take on additional leadership roles.

Can I create a new page?

Yes! Simply follow the “advanced” instructions above to submit a pull request.

How long will I have to wait to get a response to my suggested change (i.e., pull request)?

Release cycles vary from repo to repo. See the README file within the repo where you submitted a pull request to see how often code pushes and updates are done.

IRM Strategic Plans

What is an IRM Strategic Plan?

Agencies’ Information Resource Management (IRM) plans are comprehensive strategic documents for information management that are intended to align with the agency’s Strategic Plan. IRM plans should provide a description of how information resource management activities help accomplish agency missions, and ensure that related management decisions are integrated with organizational planning, budget, procurement, financial management, human resources management, and program decisions.

How do the IRM plans relate to the open data policy?

In 2012, OMB established PortfolioStat accountability sessions to engage directly with agency leadership to assess the maturity and effectiveness of current IT management practices and address management opportunities and challenges. As part of the annual PortfolioStat process, agencies must update their IRM Strategic Plans to describe how they are meeting new and existing information life cycle management requirements. Specifically, agencies must describe how they have institutionalized and operationalized the interoperability and openness requirements in this Memorandum into their core processes across all applicable agency programs and stakeholders. The FY13 OMB PortfolioStat Guidance was issued on March 27, 2013.

Machine Readable and Open Formats

Does PDF meet the “machine readable and open format” requirement?

While ISO 32000 is an open standard, the Portable Document Format (PDF) does not achieve the same level of openness as CSV, XML, JSON, and other generic formats.

Metadata

What is the relationship of the metadata standard (specifically) to NIEM, ISE, FGDC, and other existing (especially official) government data standards?

The common core metadata schema is based on existing vocabularies and easily mapped to NIEM, Information Sharing Environment, and FGDC.

What is a “persistent identifier”?

A persistent identifier is a unique label assigned to digital objects or data files that is managed and kept up to date over a defined time period (e.g., Unique Investment Identifiers).

Who established the common core metadata schema?

The core metadata schema was the result of recommendations from a government-wide Metadata Working Group at Data.gov combined with research of existing public schemas for data catalogs. Most of the elements trace their roots to the Dublin Core Library.

How can I recommend changes and improvements to the metadata schema?

Submit a pull request for the metadata schema.

Can I extend the metadata schema beyond the terms specified in the common core metadata schema?

Yes. If your data management process includes rich metadata specific to the mission of your agency or the Line of Business your agency participates in, publishing additional metadata that makes your data more useful to the public is welcomed and encouraged. Note that Data.gov will harvest only the metadata in this published schema unless specific arrangements are in place (e.g., geospatial FGDC/ISO).

Security, Privacy and Data Quality

Who is responsible for ensuring that datasets published in the agency.gov/data page (and subsequently Data.gov) meet each agency’s requirements for security, privacy, and quality?

Each agency is responsible for all data made public.

How can I contact the Data.gov staff for assistance in conducting mosaic effect reviews?

For general questions about Data.gov, please contact http://www.data.gov/contact-us. For specific information about the mosaic effect, please contact the Data.gov PMO at GSA.

Public Data Listing

What is the value to the government in placing metadata at agency.gov/data?

Having the metadata available at the agency level provides agencies with a self-managed publishing capability. In addition, having the metadata in a machine-readable format opens the possibility that major search engines will index these metadata in a manner similar to site maps and allow the public to discover public data across the government using a search tool of their choice.

How will agency.gov/open, /developer, and /data pages work together?

The agency.gov/open page contains information regarding an agency’s contributions to Open Government, while the /developer and /data pages pertain to APIs and Open Data. All three pages contribute to an open and transparent government in the United States.

What is the relationship of the /data page and public data listing to Data.gov, and how will this impact current Data.gov processes?

In the near term, Data.gov will continue its current dataset publishing process. As agencies deploy agency.gov/data pages, the publishing process will become a harvesting of metadata from agency websites.

Are redirects allowed for /data pages?

No; the files should be located in the agency.gov/data web space. Each agency should populate files named agency.gov/data.json, agency.gov/data.html, and agency.gov/data.xml.

What options exist for hosting the /data.json file specifically at agency.gov/data.json?

  1. For websites that are composed of static HTML, simply host the data.json file at the designated location.
  2. If needed, one may also host the file by using /data.json/index.html to provide the same functionality.
  3. Sites that utilize WordPress may modify and employ the open-sourced Datafiles WordPress Plugin.
  4. Sites that utilize Drupal may modify and employ the open-sourced Digital Strategy Drupal Module.

How do I get started building this /data file?

Data.gov will (when possible) help agencies get started by creating a /data file for each agency containing the metadata in the correct syntax. The agency will then begin to manage that file for future publishing of datasets.

How should I manage this /data file?

A wide variety of tools are available to manage a data catalog, whether public-facing or for internal data management. The records of metadata in the file can be managed with databases, spreadsheets, or even text editors. Data management systems should be able to export the metadata either in the desired format or in one that can be simply mapped to it with tools.

What formats are required/recommended for the agency.gov/data file?

There are several syntaxes that may be used when publishing the data file. The syntax required to make the data readily available to developers is JSON (JavaScript Object Notation). It is recommended that agencies also create a data.html file and use RDFa Lite (Resource Description Framework) to mark up the metadata using the common core metadata schema. The RDFa Lite file can be easily consumed by major search engines and applications, making your data easier for the public to find. A third alternative for populating your metadata file is XML (eXtensible Markup Language). Agencies are encouraged to maintain all three versions of the metadata file. Tools are available to transform any instance of the file into the alternative formats.
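
As a sketch, an agency could generate the JSON version of the file programmatically. The field names below follow the common core metadata schema for illustration, but the schema documentation, not this example, is authoritative for which fields are required; the dataset itself is invented.

```python
import json

# An illustrative one-entry catalog for publication at agency.gov/data.json.
catalog = [
    {
        "title": "Example Inspections Dataset",
        "description": "Monthly inspection results published by the agency.",
        "keyword": "inspections,safety",
        "modified": "2013-05-09",
        "publisher": "Example Agency",
        "accessLevel": "public",
        "accessURL": "https://agency.gov/data/inspections.csv",
        "format": "text/csv",
    }
]

# Serialize with indentation so the file is readable by humans as well.
data_json = json.dumps(catalog, indent=2)
print(data_json)
```

The same catalog records, held in a database or spreadsheet, could be exported to the HTML and XML versions with equivalent tooling.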

Agency participation with Open Data

What are some of the ways that agencies can become more involved with Open Data?

Having a contact point at the agency who can answer questions and receive comments about published data is extremely important to making your data more open and valuable to the public. This contact point can be centralized at the agency level, but it is extremely valuable to have someone close to the source of the data who understands it well enough to help the public take full advantage of it.

Scope

How should agencies prioritize making improvements to existing systems and data?

Agencies should regularly add to and improve the entries in their data catalog, as well as ensure continuity of access to the data by involving primary users in the changes.

Which agencies are required to implement this policy?

All executive agencies.

Timeline

How long do agencies have to implement the policy?

Agencies are required to implement the Open Data Policy within six months.

National Information Exchange Model (NIEM)

What is the relationship between NIEM and the efforts underway for the Digital Government Strategy, The Open Data Policy, and Data.gov?

Each of these initiatives has a discrete, targeted focus, but all are aimed at increasing access to and use of government data. Data.gov has provided a central place to find data and applications for publicly releasable information. New applications and services to better serve citizens have been produced as a result of the increase in information made available through Data.gov. The DGS/ODP policy establishes a framework to help institutionalize the principles of effective information management at each stage of the information’s life cycle. The framework can help agencies build information systems and processes in a way that increases information and system interoperability, openness, and safeguarding: mutually reinforcing activities that help to promote data discoverability and usability. NIEM, as a government-wide program, provides tools to enhance the way many communities build standardized exchanges to increase mission performance. NIEM fully aligns to the DGS/ODP policy and can be seen as one of the tools for implementation.

What is NIEM?

NIEM provides a commonly understood way for various organizations to connect data that improves government decision making for the greater good. By making it possible for organizations to share critical data, NIEM empowers people to make informed decisions that improve efficiency and advance and fulfill organizational missions.

NIEM is not a standard, database, software, or the actual exchange of information. Rather, NIEM provides the community of users, tools, common terminology, governance, methodologies, and support that enables the creation of standards. As a result, organizations can “speak the same language” to quickly and efficiently exchange meaningful data.

There are 14 domains, or communities, established within NIEM: Biometrics; CBRN (Chemical, Biological, Radiological, Nuclear); Children, Youth, and Family Services; Cyber; Emergency Management; Health; Human Services; Immigration; Infrastructure Protection; Intelligence; International Trade; Justice; Maritime; and Screening.

Additional tools and toolkits can be found at NIEM.gov. Any tools relevant to the NIEM community may also be registered in the NIEM Tools catalog to ensure reuse across the NIEM community at NIEM.gov.

Has the NIEM community embraced the DGS/ODP direction?

Treating information as a national asset is core to both the Open Data Policy and the National Strategy for Information Sharing and Safeguarding. Departments and agencies will need an end-to-end data strategy that accommodates both, codified in IT governance. Both are aimed at liberating data from the bounds of the application and exposing it to unanticipated users and uses (as permitted by law and policy). NIEM has become a best-practice implementation of the new National Information Sharing and Safeguarding Strategy, is fully supportive of the implementation of the Open Data Policy, and is positioned to become an early adopter. NIEM provides a common data model, governance, training, tools, technical support services, and an active community.

Does NIEM conform to the DGS/ODP requirements?

NIEM adheres to the DGS/ODP policy. NIEM communities use open standards such as XML/XSD and UML to assist in the development of standardized ways of exchanging information across and between government agencies. NIEM is vendor and product neutral. The adoption of the UML profile will allow additional open-standards implementations of NIEM-based exchanges as supported by community requirements. Additionally, some NIEM communities submit their NIEM-based information exchanges to external standards development organizations to increase industry adoption, such as NIEM Biometrics with NIST, and NIEM Radiological/Nuclear with the IEC.

Supplemental Guidance on the Implementation of M-13-13 “Open Data Policy – Managing Information as an Asset”

Source: http://project-open-data.github.io/implementation-guide/

I. Introduction

The purpose of this guidance is to provide additional clarification and detailed requirements to assist agencies in carrying out the objectives of Executive Order 13642 of May 9, 2013, Making Open and Machine Readable the New Default for Government Information and OMB Memorandum M-13-13 Open Data Policy-Managing Information as an Asset. Specifically, this document focuses on near-term efforts agencies must take to meet the following five initial requirements of M-13-13, which are due November 1, 2013 (six months from publication of M-13-13):

  1. Create and maintain an Enterprise Data Inventory (Inventory)
  2. Create and maintain a Public Data Listing
  3. Create a process to engage with customers to help facilitate and prioritize data release
  4. Document if data cannot be released
  5. Clarify roles and responsibilities for promoting efficient and effective data release

Agencies will establish an open data infrastructure by implementing this guidance and Memorandum M-13-13 and by taking advantage of the resources provided on Project Open Data. Once established, agencies will continue to evolve the infrastructure by identifying and adding new data assets [^1], enriching the description of those data assets through improved metadata, and increasing the amount of data shared with other agencies and the public.

At a minimum, a successful open data infrastructure must:

  • Provide a robust and usable Enterprise Data Inventory of an agency’s data assets, so that an agency can manage its data as strategic assets,
  • Incorporate iterative and efficient processes for managing and opening data assets, and
  • Create the Public Data Listing as a direct output or subset of the Enterprise Data Inventory.

The “access level” categories described in this document are intended to be used for organizational purposes within agencies and to reflect decisions already made in agencies about whether data assets can be made public; simply marking data assets “public” cannot substitute for the analysis necessary to ensure the data can be made public. Agencies are reminded that this underlying data from the inventory may only be released to the public after a full analysis of privacy, confidentiality, security, and other valid restrictions pertinent to law and policy.

This guidance seeks to balance the need to establish clear and meaningful expectations for agencies to meet, while allowing sufficient flexibility on the approach each agency may take to address their own unique needs. This guidance also includes references to other OMB memoranda that relate to the management of information. Agencies should refer to the definitions included in the attachment in OMB Memorandum M-13-13 Open Data Policy-Managing Information as an Asset.

This guidance introduces an Enterprise Data Inventory framework to provide agencies with improved clarity on specific actions to be taken and minimum requirements to be met. It also provides OMB with a rubric by which to evaluate compliance and progress toward the objectives laid out in the Open Data Policy. Following the November 1, 2013 deadline, agencies shall report progress on a quarterly basis, and performance will be tracked through the Open Data Cross-Agency Priority (CAP) Goal. Meeting the requirements of this guidance will ensure agencies are putting in place a basic infrastructure for inventorying, managing, and opening up data to unlock the value created by opening up information resources.

II. Policy Requirements

A. Create and Maintain an Enterprise Data Inventory

Purpose

To develop a clear and comprehensive understanding of what data assets they possess, Federal Agencies are required to create an Enterprise Data Inventory (Inventory) that accounts for all data assets created or collected by the agency. This includes, but is not limited to, data assets used in the agency’s information systems. The Inventory must be enterprise-wide, accounting for data assets across programs [^2] and bureaus [^3], and must use the required common core metadata available on Project Open Data. After creating the Inventory, agencies should continually improve the usefulness of the Inventory by expanding, enriching, and opening the Inventory (concepts described in the framework below).

The objectives of this activity are to:

  • Build an internal inventory that accounts for data assets used in the agency’s information systems
  • Include data assets produced through agency contracts and cooperative agreements, and in some cases agency-funded grants; include data assets associated with, but not limited to, research, program administration, statistical, and financial activities
  • Indicate if the data may be made publicly available and if currently available
  • Describe the data with common core metadata available on Project Open Data

Framework to Create and Maintain the Enterprise Data Inventory: Expand, Enrich, Open

Since agencies have varying levels of visibility into their data assets, the size and maturity of agencies’ Enterprise Data Inventories will differ across agencies. OMB will assess agency progress toward overall maturity of the Enterprise Data Inventory through the maturity areas of “Expand,” “Enrich,” and “Open.”

Expand: Expanding the inventory refers to adding additional data assets to the Inventory. Agencies should develop their own strategy to expand the inventory and break down the work according to agency-defined classes of data [^4]. Agencies should communicate their plans for expanding the Inventory in the Inventory Schedule (described in the minimum requirements). As agencies develop an Inventory Schedule, they may find it helpful to group their data assets into classes of data. The following list provides examples of classes agencies may use as they schedule the expansion of the Inventory:

  • Agency operating units (for example, bureaus or offices)
  • Federal Program Inventory on Performance.gov
  • Common business areas or segments, such as those described in the Business Reference Model or the Budget Function Codes of budget accounts
  • Agency strategic objectives on Performance.gov and the Performance Reference Model
  • Types of data from the Data Reference Model (http://www.whitehouse.gov/omb/e-gov/fea)
  • Existing listings of certain types of data assets, such as Information Collection Requests (ICR) submitted to OMB under the Paperwork Reduction Act (as listed on reginfo.gov [^5]) and/or files posted on the agency’s public website
  • Data assets already prioritized by the agency in response to other Administration initiatives [^6]
  • Primary related IT investments from the Federal IT Dashboard [^7]
  • Agency-defined prioritizations of data assets
  • Other classes or criteria

Example ways to evaluate “Expand” maturity: How has the Inventory expanded over time to include additional data assets? What “classes” of data (for example, financial, performance, scientific, regulatory, etc.) have been added or are planned to be added? Are all bureaus and programs represented in the Inventory? If not, what percentage is?

Enrich: To improve the discoverability, management, and re-usability of data assets, agencies should enrich the Inventory over time by improving the quality of metadata describing each data asset. For example, agencies may:

  • increase the number of keyword tags,
  • clarify descriptions of data, or
  • add additional metadata fields consistent with existing communities of practice or use cases.

Project Open Data provides metadata requirements, additional optional metadata fields, and examples of metadata areas (see Appendix for examples). To improve the management of IT systems through the Inventory, agencies are encouraged to include the Primary Related IT Investment Unique Investment Identifier (UII) as a metadata field. As they work to enrich data assets, agencies should carefully weigh the potential value of efforts to improve data description or increase the number of metadata fields against the potential associated burden. Agencies should work to avoid the risk of duplicative metadata and work toward adopting uniform schema. To that end, agencies should draw on the expertise of existing communities of practice [^8], review standard taxonomies [^9], and coordinate across the government to harmonize definitions when adopting additional metadata fields.
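Once the Inventory is machine-readable, enrichment passes of this kind are mechanical. The sketch below is hypothetical: the record layout, helper name, and UII value are invented for illustration, with only the keyword field and the Primary Related IT Investment UII concept taken from the guidance above:

```python
# Hypothetical enrichment pass over inventory records: add keyword tags
# and a Primary Related IT Investment UII field where one is missing.
inventory = [
    {"title": "Grant Awards 2013", "keyword": ["grants"]},
]

def enrich(record, extra_keywords, uii):
    # Merge new keyword tags, preserving order and dropping duplicates.
    merged = list(dict.fromkeys(record.get("keyword", []) + extra_keywords))
    record["keyword"] = merged
    # Link the data asset to federal IT management via its investment id,
    # without overwriting an identifier that is already present.
    record.setdefault("PrimaryITInvestmentUII", uii)
    return record

enriched = enrich(inventory[0], ["education", "grants"], "015-000000001")
print(enriched["keyword"])  # ['grants', 'education']
```

Runs like this are also a natural place to enforce a uniform schema, since every record passes through one function.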

Example ways to evaluate “Enrich” maturity: How has the agency improved the quality of metadata for each record? Are effective keywords and clear language used in data descriptions? Are additional metadata fields applying best practices from Project Open Data? Has the agency developed policies and procedures for populating these fields consistently? Has the agency linked the Inventory to federal IT management by including the Primary Related IT Investment Unique Investment Identifier (UII)?

Open: Agencies should implement tools and processes that will accelerate the opening of additional valuable data assets by making them public and machine-readable, while ensuring adequate policy, process, and technical safeguards are in place to prevent the release of sensitive data. Agencies are required to increase the number of public data assets included in the Public Data Listing (described in the next section) over time. Agencies should work toward increasing the ratio of data that are public and machine-readable to data that can be made public as measured in the Inventory.

Example ways to evaluate “Open” maturity: How many releasable data assets have been released in the Public Data Listing? How have more data assets been released in accordance with the “open data” principles over time?

Minimum Requirements to Create and Maintain an Enterprise Data Inventory
Develop and Submit to OMB an Inventory Schedule (by November 1, 2013)
  • Describe how the agency will ensure that all data assets from each bureau and program in the agency have been identified and accounted for in the Inventory, to the extent practicable, no later than November 1, 2014.
  • Describe how the agency plans to expand, enrich, and open their Inventory each quarter through November 1, 2014 at a minimum; include a summary and milestones in the schedule. [^10]
  • Publish Inventory Schedule on the www.[agency].gov/digitalstrategy page by November 1, 2013. [^11]
Create an Enterprise Data Inventory (by November 1, 2013)
  • Include, at a minimum, all data assets which were posted on Data.gov before August 1, 2013 and additional representative data assets from programs and bureaus.
  • Ensure the Inventory contains one metadata record for each data asset. A data asset can describe a collection of datasets (such as a CSV file for each state).
  • Use common core “required” fields and “required-if-applicable” fields on Project Open Data (includes indicating whether data can be made publicly available).
  • Submit to OMB via MAX Community [^12] the inventory as a single JSON file using the defined schema from Project Open Data. OMB invites agency input on the option of replacing future submission with an API via a discussion on Project Open Data.
Maintain the Enterprise Data Inventory (ongoing after November 1, 2013)
  • Continue to expand, enrich, and open the Inventory on an on-going basis.
  • Update the Inventory Schedule submitted on November 1, 2013 on a quarterly basis on the www.[agency].gov/digitalstrategy page. [^13]
Tools and Resources on Project Open Data
  • Out-of-the-box Inventory Tool: OMB and GSA have provided a data inventory tool (CKAN) that is customized to be compliant with the Open Data Policy out of the box. Customization includes the ability to generate the compliant Public Data Listing directly from the Inventory, as well as integration of the required common core metadata schema. Agencies may choose to install CKAN on their servers or use the centrally hosted tool.
  • Definitions and schema of “common core metadata fields” and selected “extensible metadata fields”
  • The JSON schema for each Inventory’s “JSON Snapshot” as well as a schema generator and validator tools to facilitate agency efforts to create metadata
  • Additional best practices, case studies, and tools

B. Create and Maintain a Public Data Listing

Purpose

To improve the discoverability and usability of data assets, all federal agencies must develop a Public Data Listing, which contains a list of all data assets that are or could be made available to the public. This Public Data Listing, posted at www.[agency].gov/data.json, would typically be a subset of the agency’s Inventory. This will allow the public to view agencies’ open data assets and subsequent progress as additional data assets are published.

Agencies, at their discretion, may choose to include entries for non-public data assets in their Public Data Listings, taking into account guidance in section D. For example, an agency may choose to list data assets with an ‘accessLevel’ of ‘restricted public’ to make the public aware of their existence and the process by which these data may be obtained.

Agencies’ Public Data Listings will be used to dynamically populate the newly renovated Data.gov, the main website to find data assets generated and held by the U.S. Government. Data.gov allows anyone from the public to find, download, and use government data. The upcoming re-launch of Data.gov (currently in beta at next.data.gov) will automatically aggregate the agency-managed Public Data Listings into one centralized location, using the common core metadata standards and tagging to improve users’ ability to find and use government data.

The objectives of this activity are to:

  • List any data assets in the agency’s Enterprise Data Inventory that can be made publicly available
  • Publish the Public Data Listing at www.[agency].gov/data.json
  • Include data assets produced through agency-funded grants, contracts, and cooperative agreements

Minimum Requirements to Create and Maintain a Public Data Listing

Publish a Public Data Listing (by November 1, 2013)
  • Include, at a minimum, all data assets where ‘accessLevel’ = ‘public’ [^14] in the Inventory. By design, an agency should be able to filter the Inventory to all entries where ‘accessLevel’ = ‘public’ to easily generate the Public Data Listing.
  • Publish the Public Data Listing at www.[agency].gov/data.json.
  • Follow the schema available on Project Open Data.
  • Include an accessURL [^15] link in the data asset’s metadata for all data assets in the Public Data Listing that are already publicly available [^16] (as opposed to those that could be made publicly available).
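By design, generating the Public Data Listing is just a filter over the Inventory. A minimal sketch, assuming the flat record layout shown (the titles are invented):

```python
import json

# Hypothetical Enterprise Data Inventory with mixed access levels.
inventory = [
    {"title": "A", "accessLevel": "public"},
    {"title": "B", "accessLevel": "non-public"},
    {"title": "C", "accessLevel": "restricted public"},
]

# The Public Data Listing is the subset where accessLevel == "public".
public_listing = [r for r in inventory if r["accessLevel"] == "public"]

# Serialize for publication at www.[agency].gov/data.json.
print(json.dumps(public_listing))  # [{"title": "A", "accessLevel": "public"}]
```

Keeping the listing as a derived output of the Inventory, rather than a separately maintained file, means the two cannot drift apart.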

Tools and Resources on Project Open Data
  • Schema Generator
  • CKAN
  • JSON Validator

C. Create a Process to Engage With Customers to Help Facilitate and Prioritize Data Release

Purpose

Identifying and engaging with key data customers to help determine the value of federal data assets can help agencies prioritize those of highest value for quickest release. Data customers include public as well as government stakeholders [^17]. All federal agencies are required to solicit public input and to consider how to incorporate customer feedback into their data management practices. Agencies may develop criteria at their discretion for prioritizing the opening of data assets, accounting for a range of factors, such as the quantity and quality of user demand, internal management priorities, and agency mission relevance. As customer feedback mechanisms and internal prioritization criteria will likely evolve over time and vary across agencies, agencies should share successful innovations in incorporating customer feedback through interagency working groups and Project Open Data to disseminate best practices. Agencies should regularly review their evolving customer feedback and public engagement strategy.

The objectives of this activity are to:

  • Create a process to engage with customers through www.[agency].gov/data pages and other appropriate channels
  • Make data available in multiple formats according to customer needs
  • Help agencies prioritize data release through the Public Data Listing and management efforts to improve data discoverability and usability

Minimum Requirements to Create a Process to Engage With Customers to Help Facilitate and Prioritize Data Release

Establish Customer Feedback Mechanism (by November 1, 2013)
  • Through the common core metadata requirements, agencies are already required to include a point of contact within each data asset’s metadata listed.
  • Agencies should create a process to engage with customers on the www.[agency].gov/data page or other appropriate mechanism. If the feedback tool is in an external location, it must be linked to the www.[agency].gov/data page.
  • Agencies should consider utilizing tools available on Project Open Data, such as the “Kickstart” plug-in, to organize feedback around individual data assets.

Describe Customer Feedback Processes (by November 1, 2013)
  • Update the www.[agency].gov/digitalstrategy [^18] page to describe the agency’s process to engage with customers.
  • Moving forward, agencies should consider updating their customer feedback strategy and reflecting changes on www.[agency].gov/digitalstrategy beyond November 1, 2013.

Tools and Resources on Project Open Data
  • Data “Kickstart” Plug-in
  • GSA’s Innovation Center API Resources

D. Document if Data Cannot be Released

Purpose

The Open Data Policy requires agencies to strengthen and develop policies and processes to ensure that only the appropriate data are made available publicly. Agencies should work with their Senior Agency Official for Privacy and other relevant officials to ensure a complete analysis of issues that could preclude public disclosure of information collected or created. If the agency determines the data should not be made publicly available because of law, regulation, or policy or because the data are subject to privacy, confidentiality, security, trade secret, contractual, or other valid restrictions to release, agencies must document the determination in consultation with their Office of General Counsel or equivalent. The agency should designate one of three “access levels” for each data asset listed in the inventory: public, restricted public, and non-public. The descriptions of these categories can be found below and on Project Open Data.

The objectives of this activity are to:

  • Review information for valid restrictions to public release in order to ensure proper safeguarding of privacy, security, and confidentiality of government information
  • Document reasons why a data asset or certain components of a data asset should not be made public at this time
  • Consult with the agency’s Senior Agency Official for Privacy and general counsel regarding the barriers identified
  • Encourage dialogue regarding resources necessary to make more data assets public

As part of an agency’s analysis to assign a general access level to each data asset [^19], agencies should consult section III.4 of OMB Memorandum M-13-13, and Executive Order 13556. Specifically, agencies are required to incorporate the National Institute of Standards and Technology (NIST) Federal Information Processing Standard (FIPS) Publication 199 “Standards for Security Categorization of Federal Information and Information Systems,” which includes guidance and definitions for confidentiality, integrity, and availability. Agencies should also consult with the Controlled Unclassified Information (CUI) program to ensure compliance with CUI requirements, the National Strategy for Information Sharing and Safeguarding, and the best practices found in Project Open Data. In addition to complying with the Privacy Act of 1974, the Paperwork Reduction Act, the E-Government Act of 2002, the Federal Information Security Management Act (FISMA), the Confidential Information Protection and Statistical Efficiency Act (CIPSEA), and other applicable laws, agencies should implement information policies based upon Fair Information Practice Principles, OMB guidance, and NIST guidance on Security and Privacy Controls for Federal Information Systems and Organizations.

  • Public: Data asset is or could be made publicly available to all without restrictions. The accessLevelComment field may be used to provide information on technical or resource barriers to increasing access to that data asset.

  • Restricted Public: Data asset is available under certain use restrictions. One example, among many, is a data asset that can only be made available to select researchers under certain conditions, because the data asset contains sufficient granularity or linkages that make it possible to re-identify individuals, even though the data asset is stripped of Personally Identifiable Information (PII). Another example would be a data asset that contains PII and is made available to select researchers under strong legal protections. This category includes some but not all data assets designated as Controlled Unclassified Information (CUI), consistent with Executive Order 13556. The accessLevelComment field must be filled in with details on how one can obtain access.

  • Non-Public: Data asset is not available to members of the public. This category includes data assets that are only available for internal use by the Federal Government, such as by a single program, single agency, or across multiple agencies. This category might include some but not all data assets designated as Controlled Unclassified Information (CUI), consistent with Executive Order 13556. Some non-public data assets may still potentially be available to other intra-agency operating units and/or other government agencies, as discussed in OMB Memorandum M-11-02: Sharing Data While Protecting Privacy. The accessLevelComment field for non-public datasets must contain an explanation for the reasoning behind why these data cannot be made public.
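The accessLevelComment requirements above lend themselves to a simple consistency check over the Inventory before publication. A sketch, assuming the flat record layout shown (the titles and comment text are invented):

```python
# Check that every "restricted public" or "non-public" record carries the
# required accessLevelComment: how to obtain access, or why the data
# cannot be made public.
def missing_comments(inventory):
    needs_comment = {"restricted public", "non-public"}
    return [
        r["title"]
        for r in inventory
        if r["accessLevel"] in needs_comment and not r.get("accessLevelComment")
    ]

inventory = [
    {"title": "A", "accessLevel": "public"},
    {"title": "B", "accessLevel": "non-public", "accessLevelComment": ""},
    {"title": "C", "accessLevel": "restricted public",
     "accessLevelComment": "Available to researchers under a data use agreement."},
]
print(missing_comments(inventory))  # ['B']
```

Record B is flagged because its comment is empty; a check like this can run every time the Inventory snapshot is regenerated.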

Minimum Requirements to Document if Data Cannot be Released

Describe Data Publication Process (by November 1, 2013)
  • Agencies must develop a new process, in consultation with their General Counsel or equivalent, to determine whether data assets have a valid restriction to release.
  • Agencies must publish a general overview of this process on the www.[agency].gov/digitalstrategy page. Overviews should include information on the actual process by which data is determined to have a valid restriction to release, as well as examples of the characteristics of a data asset that lead to a determination not to release.

E. Clarify Roles and Responsibilities for Promoting Efficient and Effective Data Release

Purpose

Agencies should identify points of contact for the following roles and responsibilities related to managing information as an asset:

  • Communicating the strategic value of open data to internal stakeholders and the public;
  • Ensuring that data released to the public are open, as appropriate, and a point of contact is designated to assist open data use and to respond to complaints about adherence to open data requirements;
  • Engaging entrepreneurs and innovators in the private and nonprofit sectors to encourage and facilitate the use of agency data to build applications and services;
  • Working with agency components to scale best practices from bureaus and offices that excel in open data practices across the enterprise;
  • Working with the agency’s Senior Agency Official for Privacy (SAOP) or other relevant officials to ensure that privacy and confidentiality are fully protected; and
  • Working with the Chief Information Security Officer (CISO) and mission owners to assess overall organizational risk, based on the impact of releasing potentially sensitive data, and make a risk-based determination.

Minimum Requirements to Clarify Roles and Responsibilities for Promoting Efficient and Effective Data Release

Report the point of contact for each of these roles and responsibilities via the E-Gov IDC [^20] by November 1, 2013

Tools and Resources on Project Open Data
  • Sample Chief Data Officer Job Descriptions
  • Best practices such as Data Governance Board

III. Summary of Agency Actions and Reporting Requirements

This section includes a high-level summary of agency actions and reporting requirements which are described in detail in the Policy Requirements section. Some requirements are one-time requirements, and others shall be updated quarterly as a part of the E-Gov IDC. This guidance uses three reporting channels:

  • MAX Collect [^21]
  • MAX Community [^22]
  • Agency www.[agency].gov/digitalstrategy pages [^23]


Agency Actions and Reporting Requirements

A. Create and maintain an Enterprise Data Inventory (Inventory)
  • Develop an Inventory Schedule (by November 1, 2013)
  • Publish Inventory Schedule on the www.[agency].gov/digitalstrategy page [^24] (by November 1, 2013)
  • Create an Enterprise Data Inventory (by November 1, 2013)
  • Submit Inventory Snapshot in a JSON format to the MAX Community [^25] (by November 1, 2013)
  • Maintain the Enterprise Data Inventory: Expand, Enrich, Open (after November 1, 2013)
  • Update Inventory Snapshot quarterly in a JSON format in MAX Community [^26] (after November 1, 2013)
  • Update the Inventory Schedule on the www.[agency].gov/digitalstrategy [^27] page, revising plans and describing actual results as each quarter completes (after November 1, 2013)
B. Create and maintain a Public Data Listing
  • Create and publish Public Data Listing in JSON format at www.[agency].gov/data.json (by November 1, 2013)
  • Maintain the Public Data Listing (after November 1, 2013)
C. Create a process to engage with customers to help facilitate and prioritize data release
  • Establish Customer Feedback Mechanism (by November 1, 2013)
  • Describe Customer Feedback Processes on www.[agency].gov/digitalstrategy [^28] (by November 1, 2013)
  • Follow and update process as necessary (after November 1, 2013)
D. Document if data cannot be released
  • Develop Data Publication Process (by November 1, 2013)
  • Publish an overview of Data Publication Process on the www.[agency].gov/digitalstrategy [^29] page (by November 1, 2013)
  • Update process as necessary (after November 1, 2013)
E. Clarify roles and responsibilities for promoting efficient and effective data release
  • Report Point of Contact for roles and responsibilities, including contact information for each listed responsibility, in MAX Collect [^30] (by November 1, 2013)
  • Update the Point of Contact and contact information for each listed responsibility in MAX Collect as part of the quarterly E-Gov IDC [^31] (after November 1, 2013)

Appendix

Enterprise Data Inventory Enrichment Examples

Enrichment areas and examples:

  • Tagging (Reference Models and Controlled Vocabulary): These fields describe each data asset in terms which have been standardized government-wide. See Project Open Data for additional examples and best practices. Some examples include: FEAv2 Data Reference Model, FEAv2 Business Reference Model, OMB Budget Function Codes, Related Data.gov Community, Schema.org
  • Cross-Inventory Identifier Mapping: These fields describe related entries in other “Inventory” lists. Some examples include: Program (from OPPM’s Program Inventory), Related IT investment from FY2015 Exhibit 53 (UII), Related OIRA Information Collection Request, Related Performance.gov Agency Strategic Objective, Related Federal Data Center Consolidation Initiative data center ID
  • Information Quality: These fields describe any aspects of data quality evaluated by the agency, consistent with OMB’s Government-Wide Information Quality Guidelines (for example, the type of pre-dissemination review, use of existing standards, or documents characterizing missing data in time or spatial series).
  • Data Value: These fields describe internal and external use, reuse, and demand by customers and users.
  • Openness: These fields describe to what extent each data asset achieves the criteria for “open data” in M-13-13.

Footnotes

[^1]

Data Asset: A collection of data elements or datasets that make sense to group together. Each community of interest identifies the Data Assets specific to supporting the needs of their respective mission or business functions. Notably, a Data Asset is a deliberately abstract concept. A given Data Asset may represent an entire database consisting of multiple distinct entity classes, or may represent a single entity class.

[^2]

Programs from the Federal Program Inventory: http://goals.performance.gov/federalprograminventory

[^3]

Bureaus from OMB Circular A-11 Appendix C: http://www.whitehouse.gov/sites/defa...year/app_c.pdf

[^4]

For example, by applying the categorizations of “bureau” and “business,” an agency might create classes of “bureau and business,” and choose to tackle “Bureau A, B, & C’s education grants-related data” first.

[^5]

Information collection requests (ICR): http://www.reginfo.gov/public/jsp/PR...aDashboard.jsp

[^6]

For example: * OSTP Memorandum Increasing Access to Results of Federally Funded Scientific Research: http://www.whitehouse.gov/sites/defa..._memo_2013.pdf * OMB Memorandum M-13-17 Next Steps in the Evidence and Innovation Agenda: http://www.whitehouse.gov/sites/defa...13/m-13-17.pdf * OMB Memorandum M-12-14, Use of Evidence and Evaluation in the 2014 Budget: http://www.whitehouse.gov/sites/defa...12/m-12-14.pdf

[^7]

IT Dashboard for Exhibit 53 and 300 reporting on IT investments: https://www.itdashboard.gov/

[^8]

For example, the statistical and geospatial communities have mature metadata standards.

[^9]

For example, discipline-specific standards.

[^10]

By following the instructions at: https://max.omb.gov/community/x/kIamK

[^11]

By following the instructions at: https://max.omb.gov/community/x/kIamK

[^12]

By following the instructions at: https://max.omb.gov/community/x/8YamK

[^13]

By following the instructions at: https://max.omb.gov/community/x/kIamK

[^14]

The value of “public” in the AccessLevel metadata field should be used to refer to a data asset that is or could be made publicly available to all without restrictions. This includes 1) data assets that have already been openly published online, and 2) data assets that have not yet been made publicly available but could be.

[^15]

The presence of an accessURL value in a data asset’s metadata will indicate whether or not the data asset has been published or released. This avoids human error in manually updating the accessLevel field when there is an automatic, reliable means of determining the same thing.

[^16]

Publicly available refers to data assets whose contents are downloadable from the Public Data Listing by the public via an accessURL.

[^17]

Working with government stakeholders is encouraged through existing initiatives such as: * OMB Memorandum M-13-17 Next Steps in the Evidence and Innovation Agenda: http://www.whitehouse.gov/sites/defa...13/m-13-17.pdf * OMB Memorandum M-12-14, Use of Evidence and Evaluation in the 2014 Budget: http://www.whitehouse.gov/sites/defa...12/m-12-14.pdf * OMB Memorandum M-11-02 Sharing Data While Protecting Privacy: http://www.whitehouse.gov/sites/defa...011/m11-02.pdf

[^18]

Agency Digital Government Strategy page by following the instructions at: https://max.omb.gov/community/x/kIamK

[^19]

The inventory’s “access levels” should be implemented consistent with Executive Order 13556, which sets out the framework for designating the Controlled Unclassified Information (CUI) categories and subcategories that will serve as exclusive designations for identifying unclassified information throughout the Executive branch that requires safeguarding or dissemination controls, pursuant to and consistent with applicable law, regulations, and Government-wide policies.

[^20]

By following the instructions at: https://max.omb.gov/community/x/uIemK

[^21]

By following the instructions at: https://max.omb.gov/community/x/uIemK

[^22]

By following the instructions at: https://max.omb.gov/community/x/8YamK

[^23]

By following the instructions at: https://max.omb.gov/community/x/kIamK

[^24]

By following the instructions at: https://max.omb.gov/community/x/kIamK

[^25]

By following the instructions at: https://max.omb.gov/community/x/8YamK

[^26]

By following the instructions at: https://max.omb.gov/community/x/8YamK

[^27]

By following the instructions at: https://max.omb.gov/community/x/kIamK

[^28]

By following the instructions at: https://max.omb.gov/community/x/kIamK

[^29]

By following the instructions at: https://max.omb.gov/community/x/kIamK

[^30]

By following the instructions at: https://max.omb.gov/community/x/uIemK

[^31]

By following the instructions at: https://max.omb.gov/community/x/uIemK

GeoNames

Source: http://www.geonames.org/

The GeoNames geographical database covers all countries and contains over eight million placenames that are available for download free of charge.

Info

Source: http://download.geonames.org/export/dump/readme.txt

Readme for Geonames.org :
=========================

This work is licensed under a Creative Commons Attribution 3.0 License,
see http://creativecommons.org/licenses/by/3.0/
The Data is provided "as is" without warranty or any representation of accuracy, timeliness or completeness.

The data format is tab-delimited text in utf8 encoding.


Files :
-------
XX.zip                   : features for country with iso code XX, see 'geoname' table for columns
allCountries.zip         : all countries combined in one file, see 'geoname' table for columns
cities1000.zip           : all cities with a population > 1000 or seats of adm div (ca 80.000), see 'geoname' table for columns
cities5000.zip           : all cities with a population > 5000 or PPLA (ca 40.000), see 'geoname' table for columns
cities15000.zip          : all cities with a population > 15000 or capitals (ca 20.000), see 'geoname' table for columns
alternateNames.zip       : two files, alternate names with language codes and geonameId, file with iso language codes
admin1CodesASCII.txt     : ascii names of admin divisions. (beta > http://forum.geonames.org/gforum/posts/list/208.page#1143)
admin2Codes.txt          : names for administrative subdivision 'admin2 code' (UTF8), Format : concatenated codes <tab>name <tab> asciiname <tab> geonameId
iso-languagecodes.txt    : iso 639 language codes, as used for alternate names in file alternateNames.zip
featureCodes.txt         : name and description for feature classes and feature codes 
timeZones.txt            : countryCode, timezoneId, gmt offset on 1st of January, dst offset to gmt on 1st of July (of the current year), rawOffset without DST
countryInfo.txt          : country information : iso codes, fips codes, languages, capital ,...
                           see the geonames webservices for additional country information,
                                bounding box                         : http://ws.geonames.org/countryInfo?
                                country names in different languages : http://ws.geonames.org/countryInfoCSV?lang=it
modifications-<date>.txt : all records modified on the previous day, the date is in yyyy-MM-dd format. You can use this file to daily synchronize your own geonames database.
deletes-<date>.txt       : all records deleted on the previous day, format : geonameId <tab> name <tab> comment.

alternateNamesModifications-<date>.txt : all alternate names modified on the previous day,
alternateNamesDeletes-<date>.txt       : all alternate names deleted on the previous day, format : alternateNameId <tab> geonameId <tab> name <tab> comment.
userTags.zip		: user tags , format : geonameId <tab> tag.
hierarchy.zip		: parentId, childId, type. The type 'ADM' stands for the admin hierarchy modeled by the admin1-4 codes. The other entries are entered with the user interface. The relation toponym-adm hierarchy is not included in the file, it can instead be built from the admincodes of the toponym.


The main 'geoname' table has the following fields :
---------------------------------------------------
geonameid         : integer id of record in geonames database
name              : name of geographical point (utf8) varchar(200)
asciiname         : name of geographical point in plain ascii characters, varchar(200)
alternatenames    : alternatenames, comma separated varchar(5000)
latitude          : latitude in decimal degrees (wgs84)
longitude         : longitude in decimal degrees (wgs84)
feature class     : see http://www.geonames.org/export/codes.html, char(1)
feature code      : see http://www.geonames.org/export/codes.html, varchar(10)
country code      : ISO-3166 2-letter country code, 2 characters
cc2               : alternate country codes, comma separated, ISO-3166 2-letter country code, 60 characters
admin1 code       : fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20)
admin2 code       : code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80) 
admin3 code       : code for third level administrative division, varchar(20)
admin4 code       : code for fourth level administrative division, varchar(20)
population        : bigint (8 byte int) 
elevation         : in meters, integer
dem               : digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90mx90m) or 30''x30'' (ca 900mx900m) area in meters, integer. srtm processed by cgiar/ciat.
timezone          : the timezone id (see file timeZone.txt) varchar(40)
modification date : date of last modification in yyyy-MM-dd format
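The field list above maps directly onto a parser for the tab-delimited dump files. A minimal Python sketch follows; the snake_case column names are my own renderings of the fields listed above, and only a few fields are converted to native types.

```python
import csv

# Column layout of the main 'geoname' table (tab-delimited, UTF-8),
# in the order given by the GeoNames readme above.
GEONAME_FIELDS = [
    "geonameid", "name", "asciiname", "alternatenames",
    "latitude", "longitude", "feature_class", "feature_code",
    "country_code", "cc2", "admin1_code", "admin2_code",
    "admin3_code", "admin4_code", "population", "elevation",
    "dem", "timezone", "modification_date",
]

def iter_geonames(path):
    """Yield one dict per record from a GeoNames dump file such as
    allCountries.txt or cities15000.txt (unzipped)."""
    with open(path, encoding="utf-8", newline="") as fh:
        # QUOTE_NONE: the dump uses no quoting, only tab delimiters.
        reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            rec = dict(zip(GEONAME_FIELDS, row))
            rec["latitude"] = float(rec["latitude"])
            rec["longitude"] = float(rec["longitude"])
            rec["population"] = int(rec["population"] or 0)
            yield rec
```

Remaining fields (elevation, dem, admin codes) are left as strings, since several of them may be empty in the dump.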


AdminCodes:
Most adm1 are FIPS codes. ISO codes are used for US, CH, BE and ME. UK and Greece are using an additional level between country and fips code.



The table 'alternate names' :
-----------------------------
alternateNameId   : the id of this alternate name, int
geonameid         : geonameId referring to id in table 'geoname', int
isolanguage       : iso 639 language code 2- or 3-characters; 4-characters 'post' for postal codes and 'iata','icao' and faac for airport codes, fr_1793 for French Revolution names,  abbr for abbreviation, link for a website, varchar(7)
alternate name    : alternate name or name variant, varchar(200)
isPreferredName   : '1', if this alternate name is an official/preferred name
isShortName       : '1', if this is a short name like 'California' for 'State of California'
isColloquial      : '1', if this alternate name is a colloquial or slang term
isHistoric        : '1', if this alternate name is historic and was used in the past

Remark : the field 'alternatenames' in the table 'geoname' is a short version of the 'alternatenames' table without links and postal codes but with ascii transliterations. You probably don't need both. 
If you don't need to know the language of a name variant, the field 'alternatenames' will be sufficient. If you need to know the language
of a name variant, then you will need to load the table 'alternatenames' and you can drop the column in the geoname table.



Statistics on the number of features per country and the feature class and code distributions : http://www.geonames.org/statistics/ 


Continent codes :
AF : Africa			geonameId=6255146
AS : Asia			geonameId=6255147
EU : Europe			geonameId=6255148
NA : North America		geonameId=6255149
OC : Oceania			geonameId=6255151
SA : South America		geonameId=6255150
AN : Antarctica			geonameId=6255152


If you find errors or miss important places, please do use the wiki-style edit interface on our website 
http://www.geonames.org to correct inaccuracies and to add new records. 
Thanks in the name of the geonames community for your valuable contribution.

Data Sources:
http://www.geonames.org/data-sources.html


More Information is also available in the geonames faq :

http://forum.geonames.org/gforum/forums/show/6.page

The forum : http://forum.geonames.org

or the google group : http://groups.google.com/group/geonames

Free Gazetteer Data

Source: http://download.geonames.org/export/dump/

allCountries.zip                           21-Aug-2013 03:58  232M  My Note: I downloaded this
Readme for Geonames.org : My Note: See above

Free Postal Code Data

Source: http://download.geonames.org/export/zip/

 allCountries.zip        21-Aug-2013 06:39  9.1M My Note: I downloaded this
Readme for GeoNames.org Postal Code files :

This work is licensed under a Creative Commons Attribution 3.0 License.
This means you can use the dump as long as you give credit to geonames (a link on your website to www.geonames.org is ok)
see http://creativecommons.org/licenses/by/3.0/
UK: Contains Royal Mail data Royal Mail copyright and database right 2010.
The Data is provided "as is" without warranty or any representation of accuracy, timeliness or completeness.

This readme describes the GeoNames Postal Code dataset.
The main GeoNames data dump is here: http://download.geonames.org/export/dump/


For many countries lat/lng are determined with an algorithm that searches the place names in the main geonames database 
using administrative divisions and numerical vicinity of the postal codes as factors in the disambiguation of place names. 
For postal codes and place names for which no corresponding toponym in the main geonames database could be found, an average 
lat/lng of 'neighbouring' postal codes is calculated.
Please let us know if you find any errors in the data set. Thanks

Warning:
  The lat/lng accuracy for Turkey and Indian Postal Index Numbers (PIN) is not very high; we have been asked to include the data for India in the dump despite these inaccuracies.
For Canada we have only the first letters of the full postal codes (for copyright reasons)

The Argentina data file contains 4-digit postal codes which were replaced with a new system in 1999.

For Brazil only major postal codes are available (only the codes ending with -000 and the major code per municipality).

The data format is tab-delimited text in utf8 encoding, with the following fields :

country code      : iso country code, 2 characters
postal code       : varchar(20)
place name        : varchar(180)
admin name1       : 1. order subdivision (state) varchar(100)
admin code1       : 1. order subdivision (state) varchar(20)
admin name2       : 2. order subdivision (county/province) varchar(100)
admin code2       : 2. order subdivision (county/province) varchar(20)
admin name3       : 3. order subdivision (community) varchar(100)
admin code3       : 3. order subdivision (community) varchar(20)
latitude          : estimated latitude (wgs84)
longitude         : estimated longitude (wgs84)
accuracy          : accuracy of lat/lng from 1=estimated to 6=centroid
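The twelve fields above can be read into a simple record type. A Python sketch, assuming the unzipped postal dump (the trailing accuracy field may be empty, so short rows are padded); the sample line in the usage note is illustrative:

```python
from collections import namedtuple

# Field order of the GeoNames postal code dump (tab-delimited, UTF-8),
# as listed in the readme above.
PostalRecord = namedtuple("PostalRecord", [
    "country_code", "postal_code", "place_name",
    "admin_name1", "admin_code1", "admin_name2", "admin_code2",
    "admin_name3", "admin_code3", "latitude", "longitude", "accuracy",
])

def parse_postal_line(line):
    """Parse one line of the postal code dump into a PostalRecord.
    Trailing fields (notably 'accuracy') may be missing or empty."""
    parts = line.rstrip("\n").split("\t")
    parts += [""] * (12 - len(parts))  # pad short rows
    return PostalRecord(*parts[:12])
```

For example, a line such as `US\t99553\tAkutan\tAlaska\tAK\t...` parses into a record whose `place_name` is `Akutan`.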

Premium Data

Source: http://www.geonames.org/products/premium-data.html

The GeoNames Monthly Premium Data is a high-quality release that has passed a large number of consistency checks. The modifications and differences from the previous data release have been monitored and verified by members of the GeoNames team. The Monthly Premium Data is an excellent choice for enterprise users. Applications using the Premium Data can reduce their own consistency checks and let the GeoNames team do this job.


The Premium Data Subscription covers the toponyms; the postal code files are not yet available as a premium offering.

Data formats

The premium data is available in tab-separated flat files and in RDF format. The files have the same format as the freely available files.

Sample

A small sample subset is available. My Note: I did this.

Federal Enterprise Architecture (FEA)

Source: http://www.whitehouse.gov/omb/e-gov/fea

Got questions? Send email to fea@omb.eop.gov

Guidance

FEA Reference Models

Historical Information

Management Tools

  • IT Dashboard - Your window into the Federal IT portfolio

Success Stories

Performance Reference Model

Source: http://www.whitehouse.gov/sites/defa...ectivesv3.xlsx

Introduction

The Performance Reference Model (PRM) provides a structured taxonomy for the Federal performance information required by the GPRA Modernization Act and collected and published through the required central website http://Performance.gov.

The PRM taxonomy provides a unique identifier for the performance elements currently published through Performance.gov. In addition to the unique identifier, the PRM structure provides a context for each performance element.

Consistent with the other reference models, the PRM is structured using a 9-digit value; the form and interpretation of those digits, however, differs between reference models.

Taxonomy Structure Description

Consistent with the other reference models under revision, the PRM uses a 9-digit code to describe the reference model taxonomy. There is a logical structure to the coding, although for Exhibit 300 use the key benefit is the availability of a unique identifier that relates directly to the information published through Performance.gov. The PRM coding structure provides a scalable numbering convention to describe all currently published performance goals and objectives on Performance.gov and can extend to include broader performance contexts in the future. The general form of the PRM is PGO-PPP-NNN.


  • The first P is the reference model identifier prefix, indicating that this string is part of the PRM.
  • The G indicates the goal type: 4 for an Agency Strategic Objective (OBJ).
  • The O indicates the objective number that corresponds with an Agency Strategic Goal. Objectives are numbered sequentially.
  • The next three PPP digits indicate the organization code for the identified goal or objective.
  • The next three NNN digits indicate the goal number within the goal type.
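As an illustration of the coding convention just described, a PGO-PPP-NNN string can be decomposed mechanically. This is a hypothetical helper based on my reading of the description above, not an official parser, and the example code value in the usage note is invented rather than an actual Performance.gov identifier.

```python
import re

# Pattern for the PRM code form PGO-PPP-NNN described above:
# P   = literal reference-model prefix "P"
# G   = one-digit goal type (e.g. 4 for an Agency Strategic Objective)
# O   = one-digit objective number
# PPP = three-digit organization code
# NNN = three-digit goal number within the goal type
PRM_PATTERN = re.compile(
    r"^P(?P<goal_type>\d)(?P<objective>\d)-(?P<org>\d{3})-(?P<goal>\d{3})$"
)

def parse_prm(code):
    """Split a PRM identifier into its named components."""
    m = PRM_PATTERN.match(code)
    if not m:
        raise ValueError(f"not a PRM code: {code!r}")
    return m.groupdict()
```

For instance, the invented code `P41-005-001` would decompose into goal type 4 (strategic objective), objective 1, organization code 005, and goal number 001.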

Description of columns

The structure of the columns in the spreadsheet closely aligns with the taxonomy. The individual elements of the code are kept in separate columns to readily enable use of the inherent taxonomy structure through sorting functions.

  • In the PRM spreadsheet, the first column identifies the goal type name that corresponds with the G value in the third column.
  • The second column is the reference model designation character. All values are “P”.
  • The third column indicates the goal type (Strategic Goal and Objective, OBJ-4). Note: in the future, other kinds of values may be added to include program-specific goals or performance indicators.
  • The fourth column indicates the objective number. Although this view of the PRM focuses exclusively on strategic objectives, parent strategic goals can be identified with a “0” in the objective field.
  • The fifth column includes the goal prefix (the three-digit org code) for the associated goal.
  • The sixth column has the three-digit goal number for the performance element.
  • The seventh column identifies the agency name corresponding to the fifth column. For cross-agency goals the code would be “000”.
  • Ignore the column labeled NID if visible. This is a unique system identifier, included to facilitate system translation functions in the future.
  • The eighth column includes the title for the performance element.
  • The last column provides the goal statement associated with each title.
