Table of contents
  1. Story
  2. Slides
    1. TheDataMap
    2. TheDataMap Data Science Spreadsheet
    3. Cover Page
    4. Ontology-Linked Data Relationships
    5. The DataMap Map of Categories with Linked Data
    6. State Costs Data
    7. State Data Ecosystem
    8. USA States Repositioned
  3. Spotfire Dashboard
  4. Research Notes
  5. Ontology Summit 2014, April 28 & 29, 2014 Symposium (Face-to-Face Workshop)
  6. Ontology Summit 2014 Hackathon, March 29, 2014
    1. March 27, 2014 12:16 p.m.
    2. March 27, 2014 11:45 p.m.
    3. March 24, 2014 11:35 p.m.
    4. March 24, 2014 10:05 a.m.
    5. Ontology Design Patterns and Semantic Abstractions in Ontology Integration
    6. Standards and Semantics for Biomedicine
      1. Outline
        1. Standard vocabularies
          1. SNOMED CT
          2. RxNorm
          3. LOINC
        2. Semantics across standards
          1. Unified Medical Language System
          2. Slide 26 Integrating Subdomains 1
          3. Slide 27 Integrating Subdomains 2
          4. Slide 28 Terminology integration
          5. NCBO BioPortal
          6. CTS2 – Common Terminology Services
        3. Data elements
          1. Common data elements at NIH
        4. Information models
          1. Clinical research data
          2. Clinical information modeling initiative (CIMI)
        5. Document markup standards
          1. Clinical Document Architecture
        6. Exchanging information with patients Blue Button
        7. Biomedical standards and semantics in action
        8. Getting involved
          1. Health IT Standards Committee
          2. Standards and Interoperability (S&I) Framework
          3. IHE
          4. Semantic Web – Health Care and Life Sciences
          5. Linked Open Data Cloud
      2. Medical Ontology Research
  7. Information Artifact Ontologies Workshop
  8. theDatamap
    1. About
      1. Introduction
      2. Health Data Motivation
      3. Beyond Health Data
    2. Demographic Fields
    3. Admission and Discharge Fields
    4. HIPAA Equivalence
    5. Costs and Contacts
    6. Overview of State Survey
    7. Top Buyers of Publicly Available State Health Databases
      1. Matching Known Patients to Health Records in Washington State Data
  9. NEXT
  10. Survey of Publicly Available State Health Databases
    1. Abstract
    2. Introduction
    3. Background
    4. Methods
    5. Results
      1. Table for Figure 1
      2. Figure 1. United States map showing states that release patient-level hospital data in blue, for a total of 33 states
      3. Table 1 Demographic information released by each state
      4. Table 2 Release of state data
      5. Table 3 Where to get state data
      4. Table 4 Where state data is HIPAA compliant
      7. Figure 2. United States map showing states where demographic data is HIPAA equivalent (yellow) or non-HIPAA equivalent (red)
      8. Figure 3. United States map showing states where demographic and admission/discharge data is HIPAA equivalent (yellow) versus non-HIPAA equivalent (red)
    6. Discussion
    7. Acknowledgments
    8. References
      1. 1
      2. 2
      3. 3
      4. 4
      5. 5
      6. 6
      7. 7
      8. 8

State Health Databases

Story

theDataMap as a Data Paper

Data Science uses the Data Mining Standard to turn content into Data Papers and Publications (see, for example, Euretos BRAIN, Data Science for VIVO and IVMOOC, and NIST Scientific Data). This was done earlier for the Ontology Summit 2012 and is done now for the Ontology Summit 2014 Hackathon, the Information Artifact Ontologies Workshop, and the Federal Big Data Working Group Meetup.

I selected theDataMap because it looks and functions like an ontology but actually consists of static graphics, not interactive networks (see table below). In addition, the paper Survey of Publicly Available State Health Databases is a PDF file in which the tables are images (not data tables) and are not as current as the web tables (see below), so it is a challenge to make it a Data Paper.

Recently, Professor Barry Smith called the author's attention to the fact that there is an ontology for Data Mining, as follows:

The OntoDM ontology

ontodmmodules.png

The domain of data mining (DM) deals with analyzing different types of data. The data typically used in data mining is in the format of a single table, with primitive datatypes as attributes. However, structured (complex) data, such as graphs, sequences, networks, text, image, multimedia and relational data, are receiving an increasing amount of interest in data mining. A major challenge is to treat and represent the mining of different types of structured data in a uniform fashion.

A theoretical framework that unifies different data mining tasks, on different types of data can help to formalize the knowledge about the domain and provide a base for future research, unification and standardization. Next, automation and overall support of the Knowledge Discovery in Databases (KDD) process is also an important challenge in the domain of data mining. A formalization of the domain of data mining is a solution that addresses these challenges. It can directly support the development of a general framework for data mining, support the representation of the process of mining structured data, and allow the representation of the complete process of knowledge discovery.

We propose a reference modular ontology for the domain of data mining OntoDM, directly motivated by the need for formalization of the data mining domain. The OntoDM ontology is designed and implemented by following ontology best practices and design principles. Its distinguishing feature is that it uses Basic Formal Ontology (BFO) as an upper-level ontology and a template, a set of formally defined relations from Relational Ontology (RO) and other state-of-the-art ontologies, and reuses classes and relations from the Ontology of Biomedical Investigations (OBI), the Information Artifact Ontology (IAO), and the Software Ontology (SWO). This will ensure compatibility and connections with other ontologies and allow cross-domain reasoning capabilities.

 

Data mining is a craft. It involves the application of a substantial amount of science and technology, but the proper application still involves art as well. But as with many mature crafts, there is a well-understood process that places a structure on the problem, allowing reasonable consistency, repeatability, and objectiveness. A useful codification of the data mining process is given by the Cross Industry Standard Process for Data Mining (CRISP-DM; Shearer, 2000), illustrated in Figure 2-2 below. (Footnote 1: See the Wikipedia page on the CRISP-DM process model.)

This process diagram makes explicit the fact that iteration is the rule rather than the exception. Going through the process once without having solved the problem is, generally speaking, not a failure. Often the entire process is an exploration of the data, and after the first iteration the data science team knows much more. The next iteration can be much more well-informed. Let’s now discuss the steps in detail. Source: http://www.data-science-for-biz.com/

The Data Mining Standard was applied recently to NIST Scientific Data for Data Science as follows:

  • Business Understanding:
    • NIST Mission
    • Standardize measurement
  • Data Understanding:
    • NIST Digital Archives
      • Promised to publish raw data sets
  • Data Preparation:
    • Knowledge Base of the Above
      • Need raw data for figures
  • Modeling:
    • Semantic Knowledge Base, Data Papers, and NanoPublications
      • See White Paper on “Making Big Data Small” using Data Science and Semantics
  • Evaluation:
    • Searchability, Discovery, and Reasoning
      • Relational Queries and Graph Traversal
  • Deployment:
    • Story and Knowledge Base in MindTouch, Excel, NodeXL, Spotfire, and Be Informed
      • Data ecosystem

So the OntoDM ontology deals with ontologies, and the Data Mining Standard deals with data and broader business issues.
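As a rough illustration, the six phases and the NIST entries listed above can be held in an ordered structure and walked more than once, since iteration is the rule rather than the exception. This is only a sketch: the list layout and the `run_iteration` and `run_process` helpers are hypothetical, and the entries simply restate the bullets above.

```python
# Six CRISP-DM phases, in order, with the NIST Scientific Data entries
# copied from the list above. The structure itself is an illustrative choice.
CRISP_DM_NIST = [
    ("Business Understanding", ["NIST Mission", "Standardize measurement"]),
    ("Data Understanding", ["NIST Digital Archives", "Promised to publish raw data sets"]),
    ("Data Preparation", ["Knowledge Base of the Above", "Need raw data for figures"]),
    ("Modeling", ["Semantic Knowledge Base, Data Papers, and NanoPublications"]),
    ("Evaluation", ["Searchability, Discovery, and Reasoning", "Relational Queries and Graph Traversal"]),
    ("Deployment", ["Story and Knowledge Base in MindTouch, Excel, NodeXL, Spotfire, and Be Informed", "Data ecosystem"]),
]


def run_iteration(phases, iteration):
    """Return one log line per phase for a single pass through the process."""
    return [
        f"iteration {iteration}: {name} -> {'; '.join(items)}"
        for name, items in phases
    ]


def run_process(phases, iterations=2):
    """Walk the phases in order, repeating the whole pass `iterations` times."""
    log = []
    for i in range(1, iterations + 1):
        log.extend(run_iteration(phases, i))
    return log


if __name__ == "__main__":
    for line in run_process(CRISP_DM_NIST):
        print(line)
```

In practice each pass would revise the entries based on what the previous pass found; the sketch only shows the ordering and the repetition.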

The ontology for theDataMap is the graphic below.

map2013.jpg

The correspondence is as follows:

  • Upper-level Ontology: The graphic above and the Knowledge Base Ontology
  • Mid-level ontologies: The 5 state data tables
  • Domain ontology: The Categories Linked Data Table

theDataMap was mined for as many data tables as could be found and readily digitized, inter-related, and visualized.

The Hackathon (e.g., Meaning) and this application (e.g., Categories in theDataMap) are focused on:

  • Meaning: Categories in theDataMap
  • Re-use of ontology material in the form of reusable patterns: The OntoDM ontology and theDataMap ontology
  • Specific area (risk): theDataMap is a detailed description of personal data flows in the United States. The effort started with health data and is expanding to other kinds of personal data.
  • Collaborative drawing tools and ontology integration: Spotfire (Data Ecosystem in Information Designer)
  • Ontology editor: Excel (also used TopBraid for Data Science for FIBO)
  • Identify suitable resources both for concepts and for available data: theDataMap Categories and other Tables 

The results are shown below in screen captures and in an interactive Spotfire Dashboard of the Spotfire visualizations of theDataMap Data Science spreadsheet.

It may be obvious, but it bears restating: ontology requires strong, quantifiable relationships in order to realize its full benefits, such as data integration and reasoning. Real-world activities and their data usually lack those strong relationships and the ability to quantify them. Data science that follows the data mining process standard to produce semantically linked data is therefore a way to "cross the chasm" with semantic technologies into the mainstream, because semantics are an important part of the data mining process and data science has already crossed the chasm.

MORE TO FOLLOW

Slides

Cover Page

theDataMapDataScience-Spotfire-Cover Page.png

Ontology-Linked Data Relationships

theDataMapDataScience-Spotfire-Ontology-Linked Data Relationships.png

The DataMap Map of Categories with Linked Data

theDataMapDataScience-Spotfire-The DataMap Map of Categories with Linked Data.png

State Costs Data

theDataMapDataScience-Spotfire-State Costs Data.png

State Data Ecosystem

theDataMapDataScience-Spotfire-State Data Ecosystem.png

USA States Repositioned

theDataMapDataScience-Spotfire-USA States Repositioned.png

Spotfire Dashboard

For Internet Explorer users and those wanting a full-screen display, use: Web Player. Get Spotfire for iPad App.

Research Notes

Ontology Summit 2014, April 28 & 29, 2014 Symposium (Face-to-Face Workshop)

http://ontolog.cim3.net/cgi-bin/wiki...posium#nid482Q

Big Data and Semantic Web Meet Applied Ontology
Within Big Data applications, ontologies appear to have had little impact.

My Comment: This statement is not true. See below for Standards and Semantics for Biomedicine, Dr. Olivier Bodenreider, Semantics - Crossing the Chasm Workshop, March 25, 2014. I thought this was the best presentation of the Workshop.

My comments:

  • Big data starts with the data (unstructured and structured)
  • Semantic Web starts with RDF/SPARQL on the data
  • Applied Ontology starts with ontologies in ontology editors

My comments:

Semantic Medline uses:

  • NLP on PubMed
  • Unified Medical Language System (Ontology-see below)
  • MySQL and now the YarcData Graph Appliance (RDF/SPARQL - compliant)

Ontology Summit 2014 Hackathon, March 29, 2014

http://ontolog.cim3.net/cgi-bin/wiki...2014_Hackathon

March 27, 2014 12:16 p.m.

Mike, Yes, thank you. I am working on theDataMap for this, Barry Smith's Information Artifact Ontologies Workshop, and the Federal Big Data Working Group Meetup: http://semanticommunity.info/Data_Science/State_Health_Databases#Story

Brand

March 27, 2014 11:45 p.m.

Thanks Brand. Does this also mean you will be joining us on Saturday? I am trying to finalize the timings and other details today and will put up a list of the people who we know are participating so far, so I was wondering whether I can add your name to that list.

If so, I look forward to seeing you online on Saturday.

Best regards, Mike

March 24, 2014 11:35 p.m.

Mike, I am also very interested in using TheDataMap: http://thedatamap.org/

Brand

March 24, 2014 10:05 a.m.

Msg Archives: http://ontolog.cim3.net/forum/ontology-summit/

Everyone,

We would like to invite you to join a hackathon with a difference. This is where we will be hacking at the level of pure meaning (!)

We are inviting you to take part in the hackathon titled "Ontology Design Patterns and Semantic Abstractions in Ontology Integration" which will take place this coming Saturday, March 29th, as part of this year's Ontology Summit.

Here is the Hackathon page with details:

http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2014_Hackathon_DesignPatternsAndAbstractions

The aim of this hackathon is to get some hands-on experience with the issues around re-use of ontology material in the form of reusable patterns, and the role of semantic abstraction in this. We have chosen a specific area (risk) which by its nature requires the integration and use of concepts from multiple subject areas (in particular, patterns for events, goals and probability).

We will be exploring a number of different tools and notations, including pure graphical representation of concepts; modeling tools, and RDF/OWL tools. Tools will include collaborative drawing tools like Cacoo and CMapTools, and for ontology integration we will use the Visual Ontology Modeler from Thematix. Participants may create or edit ontologies within their own RDF/OWL ontology editor such as Protégé or TopBraid Composer.

There is no need for you to pre-install anything for this hackathon.

The plan is to have an opening session where we will define the problem and identify suitable resources both for concepts and for available data.

This will be followed by a period of time for people to find or develop ontologies, and then we will have a second collaborative session later in the day to pull it all together.

Precise times will be agreed by the team, but we will ensure that both sessions are at a time suitable for folks both in Europe and the US, as a minimum.

Please mail me at this email address if you wish to participate. Please get your request in early so we can finalize the meeting times to suit you. My Note: See Above

Best regards, Mike Bennett

Ontology Design Patterns and Semantic Abstractions in Ontology Integration

http://ontolog.cim3.net/cgi-bin/wiki...ndAbstractions

Description: This hackathon will bring together a number of ontologies, ontology design patterns and high level semantic abstractions to create an ontology around the area of risk. The aim would be to think about what it would take to create a basic risk application. A possible outcome would be the specification of such an application for future development.

Future work/scope out:

  • Scope out and specify a possible application
  • Capture and / or simulate sample data
  • Specify and create SPARQL queries to interrogate the integrated ontology
  • Feed query results into a spreadsheet or a calculation application

My Comments:

  • So in our Hackathon we started and ended with ontology.
  • In my preparation for the Hackathon, I started with theDataMap (risk of loss of personal health data), which contained both an ontology (a schematic based on state health databases) and the databases themselves. I first created a "data paper", which is the actual data in semantic linked data format in a spreadsheet, and that formed the basis for the application. I have not yet "specified and created SPARQL queries to interrogate the integrated ontology".
    • http://semanticommunity.info/Data_Science/State_Health_Databases (this page)
  • So as a semantic data scientist that works with both big and small data, I can see the need for cross-communication across these communities that each start in a different way and may or may not end up with using the other two.
  • I find it more logical to start with the databases, especially when they are large, apply data science to see if one can scale down the problem, follow the Data Mining Standard (6 steps and iterate) to deploy an application, and build at least a knowledge base that serves to develop a knowledge model (which can be expressed as an ontology).
  • Examples of this from the Federal Big Data Working Group Semantic Data Science Teams are Semantic Medline - YarcData and Healthcare.gov/Be Informed.
  • Our Semantic Data Science Teams include individuals from all three communities (Big Data, Semantic Web, and Applied Ontology).
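To make the "data paper" idea a little more concrete, the following minimal sketch turns spreadsheet-style rows into subject-predicate-object triples and runs a simple pattern match over them, which is roughly the job a SPARQL query would do once one is written. Everything here is an assumption for illustration: the column names, the two sample rows (loosely taken from theDataMap category table further down this page), and the `rows_to_triples` and `match` helpers are hypothetical, and no RDF library is involved.

```python
import csv
import io

# Hypothetical spreadsheet export; the columns and rows are illustrative only.
SHEET = """category,example,url
Pharmacies,CVS,http://thedatamap.org/map2013/p3.html
Clearing Houses,Navicure,http://thedatamap.org/map2013/p5.html
"""


def rows_to_triples(text):
    """Turn each row into (subject, predicate, object) triples keyed on 'category'."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = row["category"]
        for column, value in row.items():
            # Skip the subject column itself and any empty cells.
            if column != "category" and value:
                triples.append((subject, column, value))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    triples = rows_to_triples(SHEET)
    # Roughly: SELECT ?category ?url WHERE { ?category url ?url }
    for subject, _, url in match(triples, p="url"):
        print(subject, url)
```

Starting from the rows and deriving the triples mirrors the order argued for above: data first, knowledge model second.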

It may be obvious, but it bears restating: ontology requires strong, quantifiable relationships in order to realize its full benefits, such as data integration and reasoning. Real-world activities and their data usually lack those strong relationships and the ability to quantify them. Data science that follows the Data Mining Process Standard to produce semantically linked data is therefore a way to "cross the chasm" with semantic technologies into the mainstream, because semantics are an important part of the data mining process and data science has already crossed the chasm.

END OF Ontology Summit 2014 - Ontology Summit 2014 Hackathon COMMENTS

Standards and Semantics for Biomedicine

http://mor.nlm.nih.gov/pubs/pres/201..._Semantics.pdf

Olivier Bodenreider
Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA

OMG TECHNICAL MEETING SPECIAL EVENT
Semantics - Crossing the Chasm Workshop
Reston, Virginia
March 26, 2014

Outline

  • Standard vocabularies
  • Data elements
  • Information models
  • Document markup standards
  • Protocols
  • Interfaces
  • Biomedical standards and semantics in action
    • Clinical quality measures (Meaningful Use)

Standard vocabularies

SNOMED CT

http://www.ihtsdo.org/
Formalism: Description logics (EL++)

RxNorm

http://www.nlm.nih.gov/research/umls/rxnorm/
Formalism: UMLS RRF format

LOINC

http://www.regenstrief.org/loinc/loinc.htm
Formalism: “DL-like”

Semantics across standards

Terminology integration systems

Unified Medical Language System

Slides 26-28
https://uts.nlm.nih.gov/

Slide 26 Integrating Subdomains 1

OlivierBodenreider03252014Slide26.png

Slide 27 Integrating Subdomains 2

OlivierBodenreider03252014Slide27.png

Slide 28 Terminology integration

OlivierBodenreider03252014Slide28.png

CTS2 – Common Terminology Services

http://informatics.mayo.edu/cts2/

Data elements

Individual variables

Common data elements at NIH

http://www.nlm.nih.gov/cde/

Information models

Questionnaires, forms

Clinical research data

http://www.cdisc.org/

Clinical information modeling initiative (CIMI)

http://informatics.mayo.edu/CIMI/

Document markup standards

Information exchange

Clinical Document Architecture

https://www.hl7.org/

Exchanging information with patients Blue Button

Getting relevant information – InfoButton
https://www.hl7.org/

Biomedical standards and semantics in action

VALUE SETS IN CLINICAL QUALITY MEASURES
http://cms.gov

https://vsac.nlm.nih.gov/

Getting involved

A few pointers

Health IT Standards Committee

Standards and Interoperability (S&I) Framework

http://wiki.siframework.org/

Semantic Web – Health Care and Life Sciences

http://www.w3.org/blog/hcls/

Linked Open Data Cloud

Linked Open Data Cloud – Biomedical resources

Medical Ontology Research

Olivier Bodenreider
Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA

Contact: olivier@nlm.nih.gov
Web: http://mor.nlm.nih.gov

Information Artifact Ontologies Workshop

FOIS 2014, Rio de Janeiro, Brazil

http://fois2014.inf.ufes.br/p/home.html

http://ncorwiki.buffalo.edu/index.ph...act_Ontologies

Summary

This workshop, held in conjunction with the FOIS 2014 conference, is designed to provide a forum for discussion of both foundational and practical issues relating to the ontological representation of information artifacts. We welcome three types of submissions: tutorial proposals (1-page), longer papers (up to 6 pages), and short progress reports (1-page).

Background

Information artifacts such as photographs, newspaper articles, books, entries in databases, computer programs, emails, video clips are entities which can be used in a variety of ways that depend on their being about something (having a topic or content or subject-matter). Information artifacts also have a variety of further attributes, including format, purpose, evidence, provenance, operational relevance, security markings. Data concerning such attributes (often called ‘metadata’) are vital to the effective exploitation of the reports, images, or signals documents for purposes of discovery and analysis.

Various attempts have been made to create controlled vocabularies for the consistent formulation of such metadata in order to enhance the degree to which the content formulated with their aid will be available to computational reasoning. The goal of this workshop is to advance work on resources of this sort, with a view towards coordination and convergence.

Definition and Scope

The goal of this workshop is to advance work on information artifact ontologies along the following axes:

1. introductory tutorials providing training in the development and use of specific ontologies
2. discussion of foundational issues concerning the ontological treatment of information artifacts and information entities and also concerning issues of dissemination (how can we advance the degree to which different communities use common, useful and usable ontologies)
3. advancing convergence among resources developed to represent information artifacts in various domains
4. sharing of information on existing initiatives and on plans for further development.

The workshop will accordingly consist of a mixture of tutorials, longer papers under headings 2. and 3., and short progress reports under heading 4. All tutorial proposals and papers will be refereed.

Interested participants can submit:

* A full paper (5-6 pages) that addresses foundational issues
* A one-page progress report discussing existing initiatives
* A one-page proposal for a tutorial session

Deadline for submissions: May 22, 2014
Deadline for notification of acceptance: June 15, 2014
Deadline for camera-ready copy: August 15, 2014

Rules for Submission

A limited number of 1.5 hour tutorial sessions will be offered in parallel at the beginning of the workshop. Tutorial proposals should be 1-page in length and must include a title, abstract, motivation as well as description of the content, aims, presentation style, and tutorial format. We expect the tutorials to have practical examples and exercises for participants.

Short papers should be 1 page in length and should provide an overview of on-going work on some specific information artifact ontology initiative. Long papers should be between 5 and 6 pages.

All submissions will be refereed prior to publication. Accepted papers will be published in the CEUR workshop proceedings.

Submissions may not have been published previously, nor be under review elsewhere. Papers should be submitted in PDF format to the Easychair submission page here. They should be prepared in accordance with the LNCS formatting guidelines. The workshop proceedings will be published in the CEUR-WS workshop series.

Organizing Committee

* Mauricio B. Almeida (Minas Gerais)
* Mathias Brochhausen (Arkansas)
* Laura Slaughter (Oslo)
* Barry Smith (Chair, Buffalo)

Sponsors

theDatamap

Source: http://thedatamap.org/

My Note: How does this work?

See: view-source:http://thedatamap.org/

Source: http://thedatamap.org/map2013/map2013.jpg

map2013.jpg

My Note: 12, 20, and 32 are missing
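A note like this can be checked mechanically. The sketch below pulls the page number out of each `pN.html` URL and reports gaps in the sequence. The URL list is limited to the rows shown on this page in the table that follows (p1 through p19), so it only surfaces the gap at 12; the `missing_pages` helper is hypothetical.

```python
import re

BASE = "http://thedatamap.org/map2013/"
# Page numbers present in the rows listed on this page (p1-p11, p13-p19).
URLS = [f"{BASE}p{n}.html" for n in list(range(1, 12)) + list(range(13, 20))]


def missing_pages(urls):
    """Return page numbers absent between the lowest and highest number seen."""
    seen = set()
    for url in urls:
        found = re.search(r"/p(\d+)\.html$", url)
        if found:
            seen.add(int(found.group(1)))
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


if __name__ == "__main__":
    print(missing_pages(URLS))
```

Extending `URLS` with the remaining rows of the full table would be needed to confirm the other two gaps.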

Description | Example | URL
Prescription Analytics are companies that use prescriptions to develop data products (e.g., marketing sales reports). | IMS Health | http://thedatamap.org/map2013/p1.html
Companies develop, produce, and market drugs and pharmaceuticals for use as medication. | Merck, Pfizer | http://thedatamap.org/map2013/p2.html
Pharmacies include local stores charged with ensuring the safe and effective use and sale of pharmaceutical drugs. | CVS, Rite-Aid, Walmart | http://thedatamap.org/map2013/p3.html
Pharmacy Benefits Managers are companies hired by health payers to be responsible for approving, processing and paying prescription drug claims. |  | http://thedatamap.org/map2013/p4.html
Clearing Houses are companies hired by physicians and healthcare providers to process and manage payment issues with health payers. | Navicure, Office Ally, ZirMed | http://thedatamap.org/map2013/p5.html
Life Insurance Companies provide a lump-sum payment to beneficiaries upon your death in exchange for regularly paid payments (premiums) by you while you are alive. | MetLife, Prudential | http://thedatamap.org/map2013/p6.html
Vital Statistics offices at the local, regional or state level register births, marriages, divorces and deaths. |  | http://thedatamap.org/map2013/p7.html
Transcription services transcribe voice recordings of reports from physicians and healthcare providers into written text. Many are off-shored (e.g. in India). |  | http://thedatamap.org/map2013/p8.html
Coding services help healthcare providers with billing by completing insurance claim forms with diagnoses (e.g., ICD-9 codes) and procedure codes. |  | http://thedatamap.org/map2013/p9.html
ICU Management software integrates data from various critical care instruments in intensive care units. Software improvements require sharing ICU data with vendors. | Philips | http://thedatamap.org/map2013/p10.html
Employer's Wellness Programs are operated by companies hired by an employer to help improve the health of employees deemed to be at risk for illness and prod them to change their behavior (e.g. stop smoking, lose weight). |  | http://thedatamap.org/map2013/p11.html
You, the Patient are the source of your personal health information. Not every person is in every data sharing arrangement shown, but each arrow denotes a known sharing arrangement for some person's data. |  | http://thedatamap.org/map2013/p13.html
Medical Devices include personal medical devices (e.g., pacemakers, walkers, glucometers) and medical equipment (e.g., mammography machine, X-ray machines). |  | http://thedatamap.org/map2013/p14.html
Employers (Yours, Spouse's) decide on the health plan options made available to employees and usually contribute a significant portion of the payment for each employee. |  | http://thedatamap.org/map2013/p15.html
Consulting Physicians are usually qualified specialists that make a diagnosis, treatment plan, or prognosis about your disease. |  | http://thedatamap.org/map2013/p16.html
Health Payers (Insurers) are entities other than you that pay for the cost of your health services. There are many options, including insurance companies, healthcare organizations, the government, and even employers. |  | http://thedatamap.org/map2013/p17.html
Providers (Physicians, Hospitals, Dentists, Eye Care providers) provide preventive, curative or rehabilitative health care services in a systematic way to individuals, families or communities. |  | http://thedatamap.org/map2013/p18.html
Accreditation organizations review providers and facilities to make sure they operate within the industry's best practices. | Joint Commission, NCQA | http://thedatamap.org/map2013/p19.html
De-identification services offer advice on how to redact patient data so that it is sufficiently de-identified and may certify the results. Privacert http://thedatamap.org/map2013/p21.html
Law Firms involved in lawsuits may obtain your personal health information, especially in medical malpractice lawsuits, even if you are not a complaining patient.   http://thedatamap.org/map2013/p22.html
Public Health agencies at the local, regional, or state level require healthcare providers and clinical labs to report certain patient information whenever cases of specific diseases or measurements occur. HIV, cancer, lead poisoning http://thedatamap.org/map2013/p23.html
Clinical or medical laboratories conduct tests on your body specimens to get information about your health pertaining to diagnosis, treatment or the prevention of a disease.   http://thedatamap.org/map2013/p24.html
Analytic companies or consultants use methods to explore, investigate, and locate patterns within collections of personal information in order to provide strategic insights or produce data or marketing products.   http://thedatamap.org/map2013/p25.html
Disease Management companies coordinate healthcare and communications between stakeholders in cases where patient self-care efforts are significant. Aetna Health Management, CIGNA Health Support http://thedatamap.org/map2013/p26.html
Personal Health Record companies allow individuals to deposit, retrieve and share personal health information. Microsoft Health Vault, Dossier, Health Record Bank Alliance http://thedatamap.org/map2013/p27.html
Discharge Data (State, Hospital Consortium) is a statewide collection of information on each hospital or physician office visit and includes your demographics, diagnoses, procedures, costs and payment information.   http://thedatamap.org/map2013/p28.html
Researchers engage a full spectrum of data affecting all aspects of personal and social life, business practices, and policy assessments.   http://thedatamap.org/map2013/p29.html
The Centers for Disease Control and Prevention (CDC) is a federal agency that operates as the national public health institute in the United States.   http://thedatamap.org/map2013/p30.html
Online Websites allow many possible opportunities for you to share your personal information publicly or among large groups. Discussion forums, Personal Genome Project, 23andMe, Facebook http://thedatamap.org/map2013/p31.html
Employee Unions legally represent workers in many industries in the United States in issues involving wages, benefits and working conditions. SEIU (includes healthcare workers) http://thedatamap.org/map2013/p33.html
Health IT companies provide information technology for the use, management and exchange of health information within and across computerized systems, including for provider or for personal use. 3M, GE http://thedatamap.org/map2013/p34.html
Other Government agencies include but are not limited to the FDA, motor vehicle registries, regional planning and development offices and professional licensing boards.   http://thedatamap.org/map2013/p35.html
The Federal Trade Commission or FTC is the national consumer protection agency in the United States and collects complaints about companies, practices and identity theft.   http://thedatamap.org/map2013/p36.html
Financial Firms include financial market institutions, but also organizations providing personal loans, mortgages, credit cards and other personal financial services.   http://thedatamap.org/map2013/p37.html
Media refers to outlets intended to reach large audiences through mass communications. Ex. television, newspapers, heavily trafficked online blogs.   http://thedatamap.org/map2013/p38.html
Real Estate Firms provide advice and marketing and related services to the real estate industry, websites, and specific sales of land-related property.   http://thedatamap.org/map2013/p39.html
Blood & Tissue   http://thedatamap.org/map2013/p40.html
Retirement & Disability   http://thedatamap.org/map2013/p41.html
Human Resources   http://thedatamap.org/map2013/p42.html
The Social Security Administration (or SSA)   http://thedatamap.org/map2013/p43.html
Personal Transport   http://thedatamap.org/map2013/p44.html
Mental Health   http://thedatamap.org/map2013/p45.html
Dental/Vision   http://thedatamap.org/map2013/p46.html
Home Health   http://thedatamap.org/map2013/p47.html
Care Facility   http://thedatamap.org/map2013/p48.html
Debt Collection   http://thedatamap.org/map2013/p49.html
Law & Justice   http://thedatamap.org/map2013/p50.html
Social Services   http://thedatamap.org/map2013/p51.html
Education   http://thedatamap.org/map2013/p52.html
Copy & Transport   http://thedatamap.org/map2013/p53.html
Registries   http://thedatamap.org/map2013/p54.html
Associations   http://thedatamap.org/map2013/p55.html
Licensing   http://thedatamap.org/map2013/p56.html
Social Support   http://thedatamap.org/map2013/p57.html

About

Source: http://thedatamap.org/about.html

Introduction

theDataMap™ is an online portal for documenting flows of personal data. It tells you where your data goes. The goal is to produce a detailed description of personal data flows in the United States. The effort started with health data and is expanding to other kinds of personal data.

A comprehensive data map will encourage new uses of personal data, help innovators find new data sources, and educate the public and inform policy makers on data sharing practices so society can act responsibly to reap benefits from sharing while addressing risks for harm. To accomplish this goal, the portal engages members of the public in a game-like environment to report and vet reports of personal data sharing.

When you interact with an organization, you often leave behind personal information. The receiving organization may then share your personal information with other organizations without your explicit knowledge. This hidden data sharing can harm you by making personal data available to third parties without your knowledge; at the same time, it can make it difficult for third parties with a legitimate interest in your data to obtain it in ways that benefit you and/or society.


theDataMap™ operates as a research project in the Data Privacy Lab, a program in the Institute for Quantitative Social Science (IQSS) at Harvard University. The project leader is Professor Latanya Sweeney.

 

Health Data Motivation

Only recently has health data been moving from paper to electronic form. Most Americans pay bills electronically, email photographs, search the Web for supplemental health information, and complete so many functions online that they rarely visit brick-and-mortar offices anymore for basic transactions. Yet chances are a child's pediatrician uses paper records like those of his grandfather's pediatrician. The technology mismatch is striking. About 61% of Americans looked online for health information in 2009 [Fox and Jones 2009], but only 4% of American office-based physicians used fully functional electronic medical record systems in 2007 [Hing and Hsiao 2010].

The American Recovery and Reinvestment Act of 2009 ("ARRA") attempts to ignite a mass exodus from this prehistoric paper age into a tech-savvy networked cosmos by 2015 using political will and billions of dollars. If successful, patient measurements, diagnoses, procedures, medications, and demographics, along with physician notes and lab results, will no longer be stored on paper but in digital format, enabling widespread sharing beyond the doctor-patient encounter. The vision of the benefits of widespread data sharing of medical information is clear. Relevant medical information should flow seamlessly across computers, devices, organizations and locations as needed. Evidence exists that doing so can offer significant improvements to patient care and possibly reduce costs [Chaudhry et al. 2006]. There are signs that health data are flowing. For example, PricewaterhouseCoopers estimates that the sharing of personal health information beyond the doctor-patient encounter is now a two-billion-dollar market.

With so much personal data readily available in today's data-rich network savvy world and the sharing of health data rising, it is reasonable to expect to see a litany of personal harms, but pronouncements seem rare. There are many reasons for this, but perhaps the most important is the lack of transparency in data sharing arrangements. These hidden activities make personal harms difficult to detect. How then can policy makers and individuals make educated decisions about privacy and data utility in the absence of such knowledge? There are many worthy uses for personal data beyond the person, so the goal is not to stop data sharing, but to understand the risks so society can address the risks responsibly and reap benefits.

Beyond Health Data

Much of the initial attention given to theDataMap™ stems from health data, but the project is in no way limited to health data. theDataMap™ project includes the full spectrum of sharing personal information. Even if there were a desire to limit attention to health data, the reality is that health data appears in all kinds of other data. Bob Gellman points out that health data does not respect a silo. For example, schools have records about a student's vaccinations, medications, special education needs, illnesses, and more. Motor vehicle departments have records about a driver's medical restrictions (eyeglasses, etc.) and disabilities for special license plates. Gyms, websites, banks, casualty insurers, and many others have health information, often mixed together with other data about individuals. Below are kinds of data other than health to be included in theDataMap™:

  • driver license information
  • voter registration records
  • birth information
  • marriage information
  • death information
  • real estate property records
  • court records
  • divorce records
  • arrest records
  • postal address information

Admission and Discharge Fields

http://thedatamap.org/states-other.html

My Note: See Table 2 Release of state data

Overview of State Survey

http://thedatamap.org/states.html

My Note: See Table for Figure 1

Top Buyers of Publicly Available State Health Databases

Source: http://thedatamap.org/buyers.html

Public and private corporations top the most frequent multi-state buyers.

Purchaser States that Sold Purchaser Data
Truven Health Analytics AZ, CA, FL, IL, MD, MA, NJ, NY, PA, TN, WA
Optuminsight (Ingenix) CA, FL, IL, MD, MA, NJ, NY, PA, TX, WA
Milliman AZ, CA, FL, IL, MD, MA, NY, TN, TX, WA
WebMD Health AZ, CA, IL, MD, NJ, NY, PA, TN, WA
IMS Health (SDI Health and Verispan) AZ, FL, IL, MD, NJ, NY, PA, TN, WA
Intellimed International AZ, CA, FL, MD, NY, TX, WA
Service Employees International Union (SEIU) CA, FL, MD, MA, PA, TN, WA
DataBay Resources CA, FL, MA, NY, PA, WA
iVantage Health Analytics (Health Info Technics) AZ, CA, FL, NY, TN, WA
Health Market Science AZ, CA, FL, NJ, TN, WA

Above are the top 10 multi-state purchasers for 2011.

Jordan Robertson at Bloomberg News made records requests to 20 states for lists of who's buying publicly available health data sold by the state. Twelve states supplied data. For more, see his Bloomberg article, also available on Businessweek.

Washington State appears in all the top entries. Patient names can be matched to Washington State data (more...) My Note: See below

The personal data elements being bought and sold vary by state (more...) My Note: See above

Hundreds of organizations purchase state health databases (more...) My Note: See above
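My Note: The statement that Washington State appears in every top entry can be checked directly against the purchaser table above. The following is a minimal Python sketch; the purchaser names and state lists are transcribed from that table, while the function names are my own and purely illustrative.

```python
from collections import Counter

# Purchaser -> states that sold it data, transcribed from the 2011 table above.
PURCHASES = {
    "Truven Health Analytics": "AZ, CA, FL, IL, MD, MA, NJ, NY, PA, TN, WA",
    "Optuminsight (Ingenix)": "CA, FL, IL, MD, MA, NJ, NY, PA, TX, WA",
    "Milliman": "AZ, CA, FL, IL, MD, MA, NY, TN, TX, WA",
    "WebMD Health": "AZ, CA, IL, MD, NJ, NY, PA, TN, WA",
    "IMS Health (SDI Health and Verispan)": "AZ, FL, IL, MD, NJ, NY, PA, TN, WA",
    "Intellimed International": "AZ, CA, FL, MD, NY, TX, WA",
    "Service Employees International Union (SEIU)": "CA, FL, MD, MA, PA, TN, WA",
    "DataBay Resources": "CA, FL, MA, NY, PA, WA",
    "iVantage Health Analytics (Health Info Technics)": "AZ, CA, FL, NY, TN, WA",
    "Health Market Science": "AZ, CA, FL, NJ, TN, WA",
}


def states_by_purchaser(purchases):
    """Turn each comma-separated state list into a set of state codes."""
    return {name: set(states.split(", ")) for name, states in purchases.items()}


def states_sold_to_every_purchaser(purchases):
    """States that appear in the list of every purchaser."""
    return set.intersection(*states_by_purchaser(purchases).values())


def purchaser_count_per_state(purchases):
    """Number of listed purchasers that bought data from each state."""
    counts = Counter()
    for states in states_by_purchaser(purchases).values():
        counts.update(states)
    return counts


if __name__ == "__main__":
    print(states_sold_to_every_purchaser(PURCHASES))
    print(purchaser_count_per_state(PURCHASES)["WA"])
```

Only the ten purchasers listed in the table are covered, so the counts say nothing about the hundreds of other organizations that purchase state health databases.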

Matching Known Patients to Health Records in Washington State Data

Source: http://thedatamap.org/risks.html

 

Step 1
Step 2

 

Information from news accident reports uniquely and exactly matched medical records in publicly available Washington State health data in 43% of the cases, thereby putting names to patient records.

Sweeney L. Matching Known Patients to Health Records in Washington State Data. Harvard University. Data Privacy Lab. 1089-1. June 2013. PDF

   
Patient-level health data from the State of Washington can be purchased for $50. This publicly available dataset has virtually all hospitalizations occurring in the State in a given year, including patient demographics, diagnoses, procedures, attending physician, hospital, a summary of charges, and how the bill was paid. It does not contain patient names or addresses (only ZIPs). Newspaper stories that contained the word "hospitalized" and printed in the State of Washington were surveyed for the same year. Most news stories included a patient's name and residential information and explain why the person was hospitalized, such as vehicle accident or assault. This is the same kind of information an employer may know about an employee taking a medical leave, a creditor may know about a debtor citing health concerns as a reason for tardy payments, and family, friends or neighbors may know about a patient in a hospital.  

The first step was to look up the person's name, age and residence information in online Public Records to learn the person's date of birth and associated ZIP codes. Then, a direct comparison matched news story information and ZIP to hospital data, thereby putting a name to a medical record. The fields used for matching were a combination of gender, age, hospital, admit month, diagnoses related to the incident, and ZIP. Age is in years and months. For more details, see PDF paper.
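My Note: The direct comparison described above can be sketched in Python as an exact match on the listed fields, accepted only when exactly one hospital record fits. This is an illustration under stated assumptions, not the study's code: the field names, the diagnosis labels, and every record below are made up, and age is simplified to whole years.

```python
# Fields compared for exact equality (hypothetical field names).
MATCH_FIELDS = ("gender", "age_years", "hospital", "admit_month", "zip")


def candidate_records(news_case, hospital_records):
    """Hospital records that agree with the news case on every match field
    and share at least one diagnosis related to the incident."""
    wanted = set(news_case["diagnoses"])
    return [
        rec for rec in hospital_records
        if all(rec[field] == news_case[field] for field in MATCH_FIELDS)
        and wanted & set(rec["diagnoses"])
    ]


def unique_match(news_case, hospital_records):
    """Return the single matching record, or None if zero or several match."""
    candidates = candidate_records(news_case, hospital_records)
    return candidates[0] if len(candidates) == 1 else None


# Fictitious hospital discharge records (no names, ZIP only).
HOSPITAL_RECORDS = [
    {"record_id": 1, "gender": "M", "age_years": 61, "hospital": "Hospital A",
     "admit_month": 10, "zip": "00001", "diagnoses": ["fracture", "vehicle accident"]},
    {"record_id": 2, "gender": "M", "age_years": 61, "hospital": "Hospital A",
     "admit_month": 10, "zip": "00002", "diagnoses": ["pneumonia"]},
    {"record_id": 3, "gender": "F", "age_years": 34, "hospital": "Hospital B",
     "admit_month": 3, "zip": "00003", "diagnoses": ["assault"]},
    {"record_id": 4, "gender": "F", "age_years": 34, "hospital": "Hospital B",
     "admit_month": 3, "zip": "00003", "diagnoses": ["assault", "laceration"]},
]

# Fictitious news cases, already enriched with ZIP from a public-records lookup.
NEWS_CASE = {"name": "Person One", "gender": "M", "age_years": 61,
             "hospital": "Hospital A", "admit_month": 10, "zip": "00001",
             "diagnoses": ["vehicle accident"]}
AMBIGUOUS_CASE = {"name": "Person Two", "gender": "F", "age_years": 34,
                  "hospital": "Hospital B", "admit_month": 3, "zip": "00003",
                  "diagnoses": ["assault"]}

if __name__ == "__main__":
    print(unique_match(NEWS_CASE, HOSPITAL_RECORDS))
    print(unique_match(AMBIGUOUS_CASE, HOSPITAL_RECORDS))
```

Requiring a single candidate mirrors the "uniquely and exactly matched" wording above: a news case that fits two or more hospital records does not put a name to any one of them.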

 

Are matched results correct? See Bloomberg article.

Public and private corporations top the list of multi-state buyers of this data (more...) My Note: See above

Other states release similar data (more...) My Note: See above


Survey of Publicly Available State Health Databases

http://dataprivacylab.org/projects/50states (PDF)

Sean Hooley and Latanya Sweeney Harvard University

Cambridge, Massachusetts

shooley@fas.harvard.edu, latanya@fas.harvard.edu

Abstract

We surveyed every state and the District of Columbia to see what patient-specific information states release on hospital visits and how much potentially identifiable information is released in those records. Thirty-three states release hospital discharge data in some form, with varying levels of demographic information and hospital stay details such as hospital name, admission and discharge dates, diagnoses, doctors who attended to the patient, payer, and cost of the stay. We compared the level of demographic and other data to federal standards set by the Health Insurance Portability and Accountability Act (HIPAA), which states do not have to adhere to for this type of data. We found that states varied widely in whether their data was HIPAA equivalent; while 13 were equivalent (or stricter) in demographic fields, only 3 of the 33 states that released data did so in a form that was HIPAA equivalent across all fields.

Introduction

People expect that the information they tell their doctor will remain private, and that expectation extends to doctors they see at the hospital. Doctor-patient confidentiality makes the relationship work to its fullest - if there is no fear the doctor will discuss private medical issues, a patient can feel secure telling his doctor important details, leading to better care and better data. Some states require hospitals to share information about each patient encounter, and the states, in turn, may sell or give the data away. The released version does not include people’s names, but does include demographic information about the patient and details about the visit. Most people are unaware these data exist, much less that they are shared publicly. Individuals could be harmed if the data could be matched back to the patient because it contains diagnoses that may include drug and alcohol dependency, tobacco use, venereal diseases, and other sensitive information, even if that was not the reason for the hospitalization. It seems prudent to survey the decisions states make when sharing these data and to compare them with the federal standard for sharing patient-level health information.

It is important to understand that sharing data beyond the patient encounter offers many worthy benefits to society. These data may be particularly useful because they contain a complete set of hospital discharges within the state, thereby allowing comparisons across regions and states such as rating hospital and physician performances and assessing variations and trends in care, access, charges and outcomes. Research studies that have used these datasets include: examinations of utilization differences based on proximity [1], patient safety [2,3], and procedures [4]; and, a comparison of motorcycle accident results in states with and without helmet laws [5]. The very completeness that makes these studies informative makes it impossible to rely on patients to consent to sharing because the resulting data would not be as complete.

Of course, when data are shared publicly, the information becomes available for many other purposes too, some that may not be as motivating. A recent Bloomberg news article reported that the top multi-state buyers of patient level hospital data are commercial and other for-profit organizations, not researchers [6]. The challenge is to find ways comprehensive patient level data can be shared widely so society can enjoy the benefits of data sharing without risks of harms to individuals.

Background

When a person goes to the hospital, information about her is recorded and in most states it is passed on to the state government or a separate nonprofit organization that collects that information for the state. Additionally, many states use the Federal-State-Industry partnership Healthcare Cost and Utilization Project (HCUP) to collect the information for them. Then the state, nonprofit or HCUP distributes this information (with names and some geographic and temporal information redacted) to the public. This flow of information is authorized in most states by a legislative mandate. Depending on the state, different levels of information are publicly available at varying costs, and some states require approval to obtain the information. Some even have different tiers of data, with variable restrictions and costs.

The Health Insurance Portability and Accountability Act (HIPAA) in the United States is the federal regulation that dictates sharing of medical information beyond the immediate care of the patient, prescribing to whom and how physicians, hospitals and insurers may share a patient’s medical information broadly. Not all health data is covered by HIPAA, but for medical data covered by HIPAA to be shared publicly, all dates must be in years and only the first 3 digits of the patient’s ZIP code (totally omitted, with only the state name if the population in the ZIP code is less than 20,000) can be released.1 The information states distribute about hospital stays is not covered by HIPAA, so states may make different decisions.
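My Note: The rule as stated above (dates reduced to years, ZIP reduced to its first 3 digits, ZIP omitted where the population is under 20,000) can be illustrated with a short Python sketch. The record layout, the function name, and the population figures are hypothetical, and this follows only the paper's summary of the rule, not the full regulation.

```python
from datetime import date

# Threshold taken from the paragraph above.
POPULATION_THRESHOLD = 20_000


def generalize_record(record, zip3_population):
    """Return a reduced copy of a record: dates become years and the ZIP code
    becomes its first 3 digits, or None when that area is too small.

    zip3_population maps a 3-digit ZIP prefix to its population
    (hypothetical lookup table supplied by the caller).
    """
    zip3 = record["zip"][:3]
    keep_zip = zip3_population.get(zip3, 0) >= POPULATION_THRESHOLD
    return {
        "state": record["state"],
        "zip3": zip3 if keep_zip else None,  # only the state remains if omitted
        "admit_year": record["admit_date"].year,
        "discharge_year": record["discharge_date"].year,
    }


# Made-up population figures for two ZIP prefixes.
ZIP3_POPULATION = {"981": 500_000, "059": 12_000}

# Made-up records.
LARGE_AREA = {"state": "WA", "zip": "98101",
              "admit_date": date(2011, 3, 14), "discharge_date": date(2011, 3, 20)}
SMALL_AREA = {"state": "VT", "zip": "05901",
              "admit_date": date(2010, 12, 30), "discharge_date": date(2011, 1, 2)}

if __name__ == "__main__":
    print(generalize_record(LARGE_AREA, ZIP3_POPULATION))
    print(generalize_record(SMALL_AREA, ZIP3_POPULATION))
```

A ZIP prefix missing from the lookup table is treated as too small and omitted, which is the cautious default in this sketch.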

Methods

We performed a survey of the information each state makes publicly available as well as the cost and restrictions for the data. This was done by visiting a state’s website, using online search engines to search for “inpatient data” or “discharge data” for each state, and utilizing a subscription service to see what information each state released to the public. Some states were also contacted by email or phone.

We began by using the website of the National Association of Health Data Organizations (NAHDO), a membership and educational association that maintains information on 49 states (all but Alabama) and the District of Columbia. Each state has a web page with information about the collection and release policies of their healthcare data as well as links related to that information such as the state's health care organization or department, contact people, and, where relevant, the law(s) that mandate the availability of the information. Many states also had information on their websites about obtaining this data. A few, like Vermont, were free but most had at least a nominal fee and several were thousands of dollars (see Table 3).

Given time and monetary restrictions, we did not acquire every state’s health data. However, some of the state data we did acquire differed in the information released from that reported on the state's website. For instance, Washington State reported on their website that they release age in years, but in fact release age in months as well in a separate data field. Virginia reported releasing 9 digit zip codes, but the data we received showed only the first 5 digits. To populate the tables in the Results section, we used fields with naming that we understood such as “AGE_GROUP” or “ZIPCODE” assisted by data dictionaries the states provide to decode the fields. However, some fields may be reporting information not readily apparent without intimate knowledge of that state’s data. The information presented here was culled from many sources and the best effort was made to collect the most accurate and up-to-date information.

Some states have Data Use Agreements that require acknowledging (by clicking on the state’s website) or signing forms agreeing to comply with the agreement. There was a great deal of variability in what the agreements required including restrictions on who could use the data and what they could do with it as well as who they could share it with and how long they were allowed to keep and use the data. Since HIPAA does not allow a Data Use Agreement to offset its standards, terms of Data Use Agreements are not considered in this paper.

Results

We organized our information into four tables. Thirty-three states provide some publicly available hospital information (see map in Figure 1). Nebraska was listed on HCUP as providing data, but NAHDO says they do not, so we considered Nebraska as not sharing hospital data for the purposes of this paper. The information presented here was culled from many sources and the best effort was made to collect the most accurate and up-to-date information. Please send the authors any updates/corrections for rectification. Check dataprivacylab.org/projects/50states for updates.

Table for Figure 1

Source: http://thedatamap.org/states-admin.html My Note: This table is not in the PDF paper

State Mandate Organization Cost
Alabama      
Alaska      
Arizona Yes Arizona Department of Health Services. http://www.azdhs.gov/ $135
Arkansas Yes Arkansas Department of Health. http://www.healthy.arkansas.gov/ $485
California Yes California Office of Statewide Health Planning & Development. http://www.oshpd.ca.gov/ $200
Colorado No Colorado Hospital Association. http://www.cha.com/ $0.02 per record
Connecticut      
Delaware      
District of Columbia      
Florida Yes Florida Center for Health Information and Policy Analysis. http://ahca.myflorida.com/SCHS/ $100 / year ($25/quarter)
Georgia      
Hawaii No Hawaii Health Information Corporation. http://www.hhic.org/ $1,035
Idaho      
Illinois Yes Illinois Department of Public Health. http://www.idph.state.il.us/ $100+
Indiana      
Iowa Yes Iowa Hospital Association. http://www.ihaonline.org/ $585
Kansas      
Kentucky Yes Kentucky Cabinet for Health and Family Services- Office of Health Policy. http://www.chfs.ky.gov/ohp/ $1,535
Louisiana      
Maine Yes Maine Health Data Organization. http://www.maine.gov/mhdo/ $675
Maryland Yes Health Services Cost Review Commission. http://www.hscrc.state.md.us/ $35
Massachusetts Yes Division of Health Care Finance and Policy. http://www.mass.gov/dhcfp $835
Michigan      
Minnesota      
Mississippi      
Missouri Yes Missouri Department of Health and Senior Services. http://www.dhss.mo.gov/ $150+
Montana      
Nebraska      
Nevada Yes Healthcare Cost and Utilization Project (HCUP). http://www.hcup-us.ahrq.gov/sidoverview.jsp $435
New Hampshire Yes New Hampshire Department of Health & Human Service. http://www.dhhs.nh.gov/ Free
New Jersey Yes New Jersey Department of Health & Senior Services. http://www.state.nj.us/health/ $60
New Mexico Yes New Mexico Department of Health. http://www.health.state.nm.us/ $485 (only 2009 and 2010 available)
New York Yes New York State Dept of Health. http://www.health.state.ny.us/ About $350, final price determined at time of request
North Carolina Yes Cecil G. Sheps Center for Health Services Research at the University of North Carolina at Chapel Hill. http://www.shepscenter.unc.edu $535
North Dakota      
Ohio      
Oklahoma Yes Oklahoma State Department of Health. http://www.ok.gov/health/ $50-7500
Oregon Yes Office for Oregon Health Policy and Research. http://www.oregon.gov/OHPPR/ $250
Pennsylvania Yes Pennsylvania Health Care Cost Containment Council (PHC4). http://www.phc4.org/ $4,500
Rhode Island Yes Rhode Island Department of Health. http://www.health.state.ri.us/ $115
South Carolina Yes South Carolina State Budget & Control Board, Office of Research and Statistics. http://ors.sc.gov/ $1.25 per 1,000 records with a minimum of a $100 charge
South Dakota Yes South Dakota Association of Healthcare Organizations. http://www.sdaho.org/ $785
Tennessee Yes Tennessee Hospital Association (THA) Health Information Network. http://www.tha-hin.com/ $10,000
Texas Yes Texas Health Care Information Collection, Center for Health Statistics, Texas Department of State Health Services. www.dshs.state.tx.us/thcic/ $6,000 recent years, 1999-2006 free
Utah Yes Office of Health Care Statistics, Utah Department of Health. http://www.health.utah.gov/hda/ $3,150
Vermont Yes Division of Health Care Administration. http://www.bishca.state.vt.us/ Free
Virginia Yes Virginia Health Information. http://www.vhi.org/ $975-$2500
Washington Yes Washington State Department of Health. http://www.doh.wa.gov/ $50
West Virginia Yes West Virginia Health Care Authority. http://www.hcawv.org/ $510
Wisconsin Yes Wisconsin Hospital Association. http://www.wha.org/ $835
Wyoming      

Notes: Costs are those to research institutions for inpatient unrestricted versions of public data files for most recent year available. A blank value indicates that State does not release public data. Some States have data available through HCUP, which may be at a different price and may offer different fields than those available directly from the State.

Figure 1. United States map showing states that release patient-level hospital data in blue, for a total of 33 states

SurveyofPubliclyAvailableStateHealthDatabases-Figure1.png

Table 1 lists the demographic information released by each state including gender, address, and age. All the states that provide data give the patient's gender. Address information was the address of the insurance policy holder, which is usually the patient’s home. Ten states release 3 digit ZIP codes for addresses, subject to further masking if the ZIP code has a small population. Maine and South Carolina only provide the county name. West Virginia and Nevada provide no address information and Rhode Island stopped providing any in 2007. While all the geographic information would be HIPAA equivalent, Colorado, New York and Washington State provide birth month information, which would not be allowed if the data were covered by HIPAA. Seven states released age in age groups, which is stricter than HIPAA regulations. Age groups were variable in size though most were 5-year groups with different size groupings for children and infants. Missouri released the birth year and the rest of the states’ age data were released as age in years. Either would be HIPAA compliant.

Table 1a. Comparison of demographic data in patient-specific hospital discharge data by state, Alabama through Massachusetts. Diagonal line pattern indicates that State does not release public data. 1 Age groups are variable in size. Many are groups of 5 years with different size groupings for children and infants.

Table 1b. Comparison of demographic data in patient-specific hospital discharge data by state, Michigan through Texas. Diagonal line pattern indicates that State does not release public data. 1 Age groups are variable in size. Many are groups of 5 years with different size groupings for children and infants. 2 Nebraska (via NAHDO) says they do not release but HCUP says that they release data.

Table 1c. Comparison of demographic data in patient-specific hospital discharge data by state, Utah through Wyoming. Diagonal line pattern indicates that State does not release public data. 1 Age groups are variable in size. Many are groups of 5 years with different size groupings for children and infants.

Table 1 Demographic information released by each state

Most States are Stricter than HIPAA on Patient Demographic Fields

Source: http://thedatamap.org/states-demogs.html My Note: PDF is just an image not fielded data

State | Gender | Address | Age
Alabama | | |
Alaska | | |
Arizona | Yes | 5 digit zip code | In Years
Arkansas | Yes | 3 digit zip code | In Years
California | Yes | 3 digit (or nothing if not unique) subject to masking | In Years (subject to masking)
Colorado | Yes | 3 digit zip code | Birth month and year
Connecticut | | |
Delaware | | |
District of Columbia | | |
Florida | Yes | 5 digit zip code | In Years
Georgia | | |
Hawaii | Yes | 5 digit zip code | Age Group (Birth Year in HCUP)
Idaho | | |
Illinois | Yes | 3 digit zip code | Age Group
Indiana | | |
Iowa | Yes | 5 digit zip code | In years
Kansas | | |
Kentucky | Yes | 5 digit zip code | In years
Louisiana | | |
Maine | Yes | County | In Years
Maryland | Yes | 3 digit zip code | In Years
Massachusetts | Yes | 3 digit zip code | In Years
Michigan | | |
Minnesota | | |
Mississippi | | |
Missouri | Yes | First 3 digits if first 3 digits of ZIP has population >20,000 | Birth year
Montana | | |
Nebraska | | |
Nevada | Yes | State | In Years
New Hampshire | Yes | 5 digit zip code | In Years
New Jersey | Yes | 5 digit zip code | In Years
New Mexico | Yes | 3 digit zip code | In Years
New York | Yes | 5 digit zip code | Birth month and year
North Carolina | Yes | 5 digit zip code | In Years
North Dakota | | |
Ohio | | |
Oklahoma | Yes | 5 digit zip code | Age Group
Oregon | Yes | 3 digit zip code | In Years
Pennsylvania | Yes | 5 digit zip code | In Years
Rhode Island | Yes | Removed in 2007 | In Years
South Carolina | Yes | County | Age Group
South Dakota | Yes | 5 digit zip code | In Years
Tennessee | Yes | County, 5 digit zip code | In Years
Texas | Yes | 5 digit zip code (last two digits are blank if a ZIP code has fewer than 30 cases) | Age Group (expanded for HIV/drug/alcohol)
Utah | Yes | 5 digit zip code | Age Group
Vermont | Yes | 3 digit zip code (categories; 5-digit if pop>10k) | Age Group
Virginia | Yes | 5 digit zip code | In Years
Washington | Yes | 5 digit zip code | In months
West Virginia | Yes | State | In years
Wisconsin | Yes | 5 digit zip code | In years
Wyoming | | |

Notes: A blank value indicates that State does not release public data. Age groups are variable in size; many are groups of 5 years with different size groupings for children and infants. The HIPAA Safe Harbor standard requires dates be in years and geography in only 3 digits (or none for populations less than 20,000).
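Several rows above describe population-based ZIP masking; Missouri, for example, releases the first 3 digits only when the 3-digit area has a population over 20,000. A minimal sketch of that kind of rule follows. It is an illustration only, not any state's actual procedure: the function name `mask_zip`, the `zip3_population` lookup, and the population figures are all hypothetical.

```python
from typing import Mapping, Optional

# Threshold quoted in the Missouri row of Table 1.
POPULATION_THRESHOLD = 20_000


def mask_zip(zip5: str, zip3_population: Mapping[str, int]) -> Optional[str]:
    """Return the 3-digit ZIP prefix, or None when the prefix area is too small.

    zip3_population maps a 3-digit prefix to the number of people living in
    all ZIP codes that share that prefix (a hypothetical input for this sketch).
    """
    if len(zip5) != 5 or not zip5.isdigit():
        raise ValueError(f"expected a 5-digit ZIP code, got {zip5!r}")
    prefix = zip5[:3]
    # Unknown prefixes are treated as population 0, so they are suppressed.
    if zip3_population.get(prefix, 0) > POPULATION_THRESHOLD:
        return prefix
    return None


# Made-up populations, for illustration only.
populations = {"123": 350_000, "456": 12_000}
masked = [mask_zip(z, populations) for z in ("12345", "45678", "99999")]
```

Only the first ZIP code keeps its prefix here; the other two fall in areas at or below the threshold (or of unknown size) and are suppressed entirely.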

Table 2 shows whether and how admission date, discharge date and discharge status are released. For example, Virginia releases the year and quarter of admission and discharge as well as length of stay. All the states that released discharge data released discharge status, such as “Routine discharge” or “Dsch/Trnf to skilled nursing facility w/Medicare”, or whether the patient died at the hospital. Five states released the date in the admission and/or discharge data, and 21 others released the month or quarter along with year. None of this information would be HIPAA compliant; only 7 states released HIPAA-conforming, year-only dates (hour and day of week are not restricted by HIPAA) for both admission and discharge fields.

Table 2a. Comparison of admission and discharge fields in patient-specific hospital discharge data by state, Alabama through Massachusetts. Diagonal line pattern indicates that State does not release public data.

Table 2b. Comparison of admission and discharge fields in patient-specific hospital discharge data by state, Michigan through Texas. Diagonal line pattern indicates that State does not release public data. Nebraska (via NAHDO) says they do not release but HCUP says that they release data.

Table 2c. Comparison of admission and discharge fields in patient-specific hospital discharge data by state, Utah through Wyoming. Diagonal line pattern indicates that State does not release public data.

Table 2 Release of state data

Most States are Less Strict than HIPAA on Admission and Discharge Fields

Source: http://thedatamap.org/states-other.html My Note: PDF is just an image not fielded data

State | Admission Date | Discharge Date | Discharge Status
Alabama | | |
Alaska | | |
Arizona | Year, Month, Hour | Year, Month, Hour, Length of Stay | Yes
Arkansas | Year, Month, Hour | Year, Month, Hour, Length of Stay | Yes
California | Year, Quarter | Year, Length of Stay | Yes
Colorado | Year, Month, Date, Hour | Year, Month, Day of Week, Length of Stay | Yes
Connecticut | | |
Delaware | | |
District of Columbia | | |
Florida | Year, Hour | Year, Length of Stay | Yes
Georgia | | |
Hawaii | Year, Month | Year, Month, Length of Stay | Yes
Idaho | | |
Illinois | Year, Quarter | Year, Quarter | Yes
Indiana | | |
Iowa | Year, Month, Date | Year, Month, Day of Week | Yes
Kansas | | |
Kentucky | | Year, Quarter, Length of Stay | Yes
Louisiana | | |
Maine | Date | Date | Yes
Maryland | Year, Month | Year, Length of Stay | Yes
Massachusetts | Year, Month | Year, Length of Stay | Yes
Michigan | | |
Minnesota | | |
Mississippi | | |
Missouri | Year, Hour | Year, Hour, Length of Stay | Yes
Montana | | |
Nebraska | | |
Nevada | Year, Month, Hour | Year, Month, Hour, Length of Stay | Yes
New Hampshire | Year, Hour | Year, Hour | Yes
New Jersey | Year, Month, Hour | Year, Hour, Length of Stay | Yes
New Mexico | Year, Month, Hour | Year, Month, Hour, Length of Stay | Yes
New York | Year, Month | Date | Yes
North Carolina | Year, Month | Year, Month | Yes
North Dakota | | |
Ohio | | |
Oklahoma | Year, Month, Day of Week | Year, Month, Day of Week | Yes
Oregon | | Year, Length of Stay | Yes
Pennsylvania | Year, Day of Week, Hour | Year, Day of Week, Hour, Length of Stay | Yes
Rhode Island | Year, Month | Year, Month, Length of Stay | Yes
South Carolina | Year, Month, Day of Week | Year, Month, Day of Week, Length of Stay | Yes
South Dakota | Year, Month | Year, Length of Stay | Yes
Tennessee | Date, Hour | Date | Yes
Texas | Year, Day of Week | Year, Quarter, Length of Stay | Yes
Utah | | Year, Quarter, Length of Stay | Yes
Vermont | | Year, Length of Stay | Yes
Virginia | Year, Quarter | Year, Quarter, Length of Stay | Yes
Washington | Year, Hour | Month, Hour, Length of Stay | Yes
West Virginia | | Year, Length of Stay | Yes
Wisconsin | | Year, Quarter, Length of Stay | Yes
Wyoming | | |

Notes: A blank value indicates that State does not release public data. The HIPAA Safe Harbor standard requires dates be in years and geography in only 3 digits (or none for populations less than 20,000).
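To make the date granularities in Table 2 concrete, the sketch below renders one admission date at the year, quarter, month and full-date levels, and computes a length of stay. The function names `coarsen` and `length_of_stay`, the output formats, and the example dates are assumptions chosen for illustration; they do not reflect any state's file layout.

```python
from datetime import date


def coarsen(d: date, level: str) -> str:
    """Render a date at one of the granularities that appear in Table 2."""
    if level == "year":
        return f"{d.year}"
    if level == "quarter":
        # Months 1-3 map to Q1, 4-6 to Q2, and so on.
        return f"{d.year} Q{(d.month - 1) // 3 + 1}"
    if level == "month":
        return f"{d.year}-{d.month:02d}"
    if level == "date":
        return d.isoformat()
    raise ValueError(f"unknown level: {level!r}")


def length_of_stay(admit: date, discharge: date) -> int:
    """Number of days between admission and discharge."""
    return (discharge - admit).days


# Hypothetical stay, for illustration only.
admit = date(2011, 11, 28)
discharge = date(2011, 12, 3)

for level in ("year", "quarter", "month", "date"):
    print(level, coarsen(admit, level))
print("length of stay", length_of_stay(admit, discharge))
```

Of these levels, only the year-only rendering matches the Safe Harbor date rule described in the Notes; a year plus a length of stay does not by itself disclose the month or day.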

Table 3 lists where to get the data, its cost, and whether the state has a mandate to release the data. Forty states have a legal mandate to collect hospital data (though not all distribute it). Some states, like Washington and New Hampshire, distribute the data directly; some, like Virginia, work through separate nonprofits to do so; and 14 (not including Nebraska) rely on HCUP to collect and distribute the information. Several states that distribute information directly or through a nonprofit also have their information available through HCUP, though the data available through HCUP may be a different price and may offer different fields than data directly from the state. Prices ranged from free to ten thousand dollars for a year’s worth of data, often with discounts for educational institutions or other nonprofits and with different pricing for data sets containing more potentially identifiable information. The costs reported here are to research institutions for the inpatient-unrestricted version of the public data file from the most recent year available.

Table 3a. Administrative information by state, Alabama through Massachusetts. 2 Data available through HCUP may be a different price and may offer different fields than data from the state [7]. 3 Cost to research institutions for inpatient unrestricted version of public data file for most recent year available.

Table 3b. Administrative information by state, Michigan through Texas. 1 Nebraska (via NAHDO) does not release but HCUP says they release data. 2 Data available through HCUP may be a different price and offer different fields than data from the state [7]. 3 Cost to research institutions for inpatient unrestricted public data file for most recent year available. 4 Office of Health Statistics, Tennessee Department of Health also has an order form for a public use file, though the fields released are not listed. http://health.state.tn.us/statistics/index.htm

Table 3c. Administrative information by state, Utah through Wyoming. 2 Data available through HCUP may be a different price and offer different fields than data from the state [7]. 3 Cost to research institutions for inpatient unrestricted public data file for most recent year available.

Table 3 Where to get state data

States sell and give away patient-level health information

Source: http://thedatamap.org/states-admin.html My Note: PDF is just an image not fielded data, but includes an additional field Through HCUP

State | Mandate | Organization | Through HCUP | Cost
Alabama | | | |
Alaska | | | |
Arizona | Yes | Arizona Department of Health Services. http://www.azdhs.gov/ | Yes | $135
Arkansas | Yes | Arkansas Department of Health. http://www.healthy.arkansas.gov/ | Yes | $485
California | Yes | California Office of Statewide Health Planning & Development. http://www.oshpd.ca.gov/ | Option (note 2) | $200
Colorado | No | Colorado Hospital Association. http://www.cha.com/ | | $0.02 per record
Connecticut | | | |
Delaware | | | |
District of Columbia | | | |
Florida | Yes | Florida Center for Health Information and Policy Analysis. http://ahca.myflorida.com/SCHS/ | Option (note 2) | $100 / year ($25/quarter)
Georgia | | | |
Hawaii | No | Hawaii Health Information Corporation. http://www.hhic.org/ | Yes | $1,035
Idaho | | | |
Illinois | Yes | Illinois Department of Public Health. http://www.idph.state.il.us/ | | $100+
Indiana | | | |
Iowa | Yes | Iowa Hospital Association. http://www.ihaonline.org/ | Yes | $585
Kansas | | | |
Kentucky | Yes | Kentucky Cabinet for Health and Family Services- Office of Health Policy. http://www.chfs.ky.gov/ohp/ | Yes | $1,535
Louisiana | | | |
Maine | Yes | Maine Health Data Organization. http://www.maine.gov/mhdo/ | Option for some years (note 2) | $675
Maryland | Yes | Health Services Cost Review Commission. http://www.hscrc.state.md.us/ | Yes | $35
Massachusetts | Yes | Division of Health Care Finance and Policy. http://www.mass.gov/dhcfp | Yes | $835
Michigan | | | |
Minnesota | | | |
Mississippi | | | |
Missouri | Yes | Missouri Department of Health and Senior Services. http://www.dhss.mo.gov/ | | $150+
Montana | | | |
Nebraska | | | |
Nevada | Yes | Healthcare Cost and Utilization Project (HCUP). http://www.hcup-us.ahrq.gov/sidoverview.jsp | | $435
New Hampshire | Yes | New Hampshire Department of Health & Human Service. http://www.dhhs.nh.gov/ | | Free
New Jersey | Yes | New Jersey Department of Health & Senior Services. http://www.state.nj.us/health/ | | $60
New Mexico | Yes | New Mexico Department of Health. http://www.health.state.nm.us/ | | $485 (only 2009 and 2010 available)
New York | Yes | New York State Dept of Health. http://www.health.state.ny.us/ | | About $350, final price determined at time of request
North Carolina | Yes | Cecil G. Sheps Center for Health Services Research at the University of North Carolina at Chapel Hill. http://www.shepscenter.unc.edu | | $535
North Dakota | | | |
Ohio | | | |
Oklahoma | Yes | Oklahoma State Department of Health. http://www.ok.gov/health/ | | $50-7500
Oregon | Yes | Office for Oregon Health Policy and Research. http://www.oregon.gov/OHPPR/ | | $250
Pennsylvania | Yes | Pennsylvania Health Care Cost Containment Council (PHC4). http://www.phc4.org/ | | $4,500
Rhode Island | Yes | Rhode Island Department of Health. http://www.health.state.ri.us/ | | $115
South Carolina | Yes | South Carolina State Budget & Control Board, Office of Research and Statistics. http://ors.sc.gov/ | | $1.25 per 1,000 records with a minimum of a $100 charge
South Dakota | Yes | South Dakota Association of Healthcare Organizations. http://www.sdaho.org/ | | $785
Tennessee | Yes | Tennessee Hospital Association (THA) Health Information Network. http://www.tha-hin.com/ | | $10,000
Texas | Yes | Texas Health Care Information Collection, Center for Health Statistics, Texas Department of State Health Services. www.dshs.state.tx.us/thcic/ | | $6,000 recent years, 1999-2006 free
Utah | Yes | Office of Health Care Statistics, Utah Department of Health. http://www.health.utah.gov/hda/ | | $3,150
Vermont | Yes | Division of Health Care Administration. http://www.bishca.state.vt.us/ | | Free
Virginia | Yes | Virginia Health Information. http://www.vhi.org/ | | $975-$2500
Washington | Yes | Washington State Department of Health. http://www.doh.wa.gov/ | | $50
West Virginia | Yes | West Virginia Health Care Authority. http://www.hcawv.org/ | | $510
Wisconsin | Yes | Wisconsin Hospital Association. http://www.wha.org/ | | $835
Wyoming | | | |

Notes: Costs are those to research institutions for inpatient unrestricted versions of public data files for most recent year available. A blank value indicates that State does not release public data. Some States have data available through HCUP, which may be at a different price and may offer different fields than those available directly from the State.
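Two of the prices in Table 3 are quoted per record rather than per file, so the total depends on how many records are requested. The sketch below works through that arithmetic. The record counts are hypothetical, and the assumption that South Carolina's rate is prorated (rather than charged per started block of 1,000 records) is a guess, since the table does not say.

```python
from decimal import Decimal


def south_carolina_cost(records: int) -> Decimal:
    """$1.25 per 1,000 records with a $100 minimum (prorating is an assumption)."""
    prorated = Decimal("1.25") * Decimal(records) / Decimal(1000)
    return max(Decimal("100"), prorated)


def colorado_cost(records: int) -> Decimal:
    """$0.02 per record, as listed in Table 3."""
    return Decimal("0.02") * Decimal(records)


# Hypothetical request sizes, for illustration only.
for n in (50_000, 500_000):
    print(n, south_carolina_cost(n), colorado_cost(n))
```

Under these assumptions a small request is dominated by the South Carolina minimum charge, while the Colorado total grows in direct proportion to the number of records.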

Table 4 shows whether a state’s data would be HIPAA compliant. The hospital data released by states is not covered under HIPAA, but Table 4 assesses whether it would be equivalent to HIPAA rules. The assessment was derived from Tables 1 and 2 by checking whether the data met HIPAA standards: in this case, all dates reported no more finely than the year, and geographic information no more finely than 3-digit ZIP codes. Interestingly, six states’ demographic data not only adhered to HIPAA standards but was stricter. However, for many states the admission and discharge information released was not HIPAA equivalent; only one of those six states was among the three states whose data would be fully HIPAA compliant.

Table 4a. Assessment of HIPAA equivalence by state, Alabama through Massachusetts. Diagonal line pattern indicates that State does not release public data. 1,2 HIPAA equivalent if ZIP is only 3 digits and dates (including age) given only in years. "Stricter" if all fields were HIPAA equivalent and at least one is stricter. "Less strict" if any fields were not equivalent to HIPAA standard. 3 Only "yes" responses reported. All blanks that do not have diagonal pattern are "no".

Table 4b. Assessment of HIPAA equivalence by state, Michigan through Texas. Diagonal line pattern indicates that State does not release public data. 1,2 HIPAA equivalent if ZIP is only 3 digits and dates (including age) given only in years. "Stricter" if all fields were HIPAA equivalent and at least one is stricter. "Less strict" if any fields were not equivalent to HIPAA standard. 3 Only "yes" responses reported. All blanks that do not have diagonal pattern are "no". 4 Nebraska (via NAHDO) says they do not release but HCUP says that they release data.

Table 4c. Assessment of HIPAA equivalence by state, Utah through Wyoming. Diagonal line pattern indicates that State does not release public data. 1,2 HIPAA equivalent if ZIP is only 3 digits and dates (including age) given only in years. "Stricter" if all fields were HIPAA equivalent and at least one is stricter. "Less strict" if any fields were not equivalent to HIPAA standard. 3 Only "yes" responses reported. All blanks that do not have diagonal pattern are "no".

Table 4 Where state data is HIPAA compliant

Only 3 States Provide Protections Equivalent to HIPAA

Source: http://thedatamap.org/states-hipaa.html My Note: PDF is just an image not fielded data

State | HIPAA Equivalence for Demographic Data | HIPAA Equivalence for Admission and Discharge | HIPAA Equivalence or Better for Both
Alabama | | |
Alaska | | |
Arizona | No | No | No
Arkansas | Yes | No | No
California | Yes | No | No
Colorado | No | No | No
Connecticut | | |
Delaware | | |
District of Columbia | | |
Florida | No | Yes | No
Georgia | | |
Hawaii | No | No | No
Idaho | | |
Illinois | Stricter | No | No
Indiana | | |
Iowa | No | No | No
Kansas | | |
Kentucky | No | No | No
Louisiana | | |
Maine | Stricter | No | No
Maryland | Yes | No | No
Massachusetts | Yes | No | No
Michigan | | |
Minnesota | | |
Mississippi | | |
Missouri | Yes | Yes | Yes
Montana | | |
Nebraska | | |
Nevada | Stricter | No | No
New Hampshire | No | Yes | No
New Jersey | No | No | No
New Mexico | Yes | No | No
New York | No | No | No
North Carolina | No | No | No
North Dakota | | |
Ohio | | |
Oklahoma | No | No | No
Oregon | Yes | Yes | Yes
Pennsylvania | No | Yes | No
Rhode Island | Stricter | No | No
South Carolina | Stricter | No | No
South Dakota | No | No | No
Tennessee | No | No | No
Texas | No | No | No
Utah | No | No | No
Vermont | No | Yes | No
Virginia | No | No | No
Washington | No | No | No
West Virginia | Stricter | Yes | Yes
Wisconsin | No | No | No
Wyoming | | |

Notes: A blank value indicates that State does not release public data. HIPAA equivalent if ZIP is only 3 digits and dates given only in years. "Stricter" if all fields were HIPAA equivalent or stricter. "No" (Less Strict) if any fields were not equivalent to HIPAA standard. Less strict if date, month, or quarter are given, but hour, length of stay and day of week do not affect strictness.
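The rule in these Notes for the admission and discharge columns can be sketched as a small check: a field list is less strict than HIPAA if it contains a date, month, or quarter, while hour, length of stay and day of week are ignored. This is a reading of the Notes, not the authors' code; the function names and the comma-separated input format are assumptions, and the demographic rating is taken as given from Table 4 rather than recomputed.

```python
# Components that make a release less strict than HIPAA, per the Notes.
REVEALING = {"date", "month", "quarter"}


def dates_equivalent(fields: str) -> bool:
    """True if a comma-separated field list contains no date, month, or quarter."""
    parts = {p.strip().lower() for p in fields.split(",") if p.strip()}
    return not (parts & REVEALING)


def both_equivalent(demographic: str, admission: str, discharge: str) -> bool:
    """Combine a demographic rating ("Yes", "Stricter", "No") with the date check."""
    return (
        demographic in {"Yes", "Stricter"}
        and dates_equivalent(admission)
        and dates_equivalent(discharge)
    )


# A few rows copied from Tables 2 and 4 (an empty string is a blank cell).
rows = {
    "Missouri": ("Yes", "Year, Hour", "Year, Hour, Length of Stay"),
    "Pennsylvania": ("No", "Year, Day of Week, Hour", "Year, Day of Week, Hour, Length of Stay"),
    "Virginia": ("No", "Year, Quarter", "Year, Quarter, Length of Stay"),
    "West Virginia": ("Stricter", "", "Year, Length of Stay"),
}
results = {state: both_equivalent(*row) for state, row in rows.items()}
```

For these four rows the sketch agrees with the last column of Table 4: Missouri and West Virginia pass both checks, while Pennsylvania fails on demographics and Virginia fails on both.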

Figure 2 shows a map of the states whose demographic data would be HIPAA compliant, the states whose data release is stricter than HIPAA, and the three states that would not be HIPAA equivalent, as detailed in Table 1. Figure 3 shows how the 13 states whose demographic data is HIPAA equivalent drop to 3 states when admission and discharge data are also screened for HIPAA equivalence.

Figure 2. United States map showing states where demographic data is HIPAA equivalent (yellow) or non-HIPAA equivalent (red)

White states do not release data.

http://thedatamap.org/states.html

SurveyofPubliclyAvailableStateHealthDatabases-Figure2.png

Figure 3. United States map showing states where demographic and admission/discharge data is HIPAA equivalent (yellow) versus non-HIPAA equivalent (red)

White states do not release data.

http://thedatamap.org/states.html

SurveyofPubliclyAvailableStateHealthDatabases-Figure3.png

Discussion

Is there vulnerability in using a standard less strict than HIPAA’s? Is the HIPAA standard too stringent? Washington State releases data less strict than the HIPAA standard, and in recent work Sweeney showed how patients could be matched to records in the Washington State dataset to put names to the records [8]. Table 4 shows that Washington does not seem to be alone in its vulnerability to re-identification; re-identification may be just as possible with data from the other 30 states that release fields less strict than the HIPAA equivalent. If so, these vulnerabilities may unnecessarily threaten worthy and viable uses of the data.

Having more identifiable data readily available also makes it difficult for other entities to share their data widely. Data with some of the same fields as these hospital records becomes vulnerable to re-identification if the data to be shared can be linked to the more identifiable hospital data. The goal is not to deprive society of the many worthy uses of the data made possible by sharing, but to match access requirements with risk, so society can enjoy the benefits of data sharing without unnecessary risks to patients. This seems achievable by making a public version of the data HIPAA equivalent and making more detailed information available under more stringent requirements.
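The linking risk described above can be illustrated with a toy example: when a released record and an outside source share enough fields, a unique match on those fields attaches a name to the record. This is a generic sketch of linkage on shared fields and is not the method used in [8]; the field names, the `link` function, and every record below are invented for illustration.

```python
from collections import defaultdict

# Fields assumed to appear in both data sets (hypothetical choice).
SHARED_FIELDS = ("gender", "zip", "age", "admit_month")


def link(released, known):
    """Return {name: diagnosis} for people who match exactly one released record."""
    index = defaultdict(list)
    for record in released:
        index[tuple(record[f] for f in SHARED_FIELDS)].append(record)
    matches = {}
    for person in known:
        candidates = index.get(tuple(person[f] for f in SHARED_FIELDS), [])
        # Only a unique match puts a name to a record.
        if len(candidates) == 1:
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


# Invented records, for illustration only.
released = [
    {"gender": "F", "zip": "12345", "age": 61, "admit_month": "2011-10", "diagnosis": "fracture"},
    {"gender": "M", "zip": "12345", "age": 34, "admit_month": "2011-10", "diagnosis": "concussion"},
    {"gender": "M", "zip": "12345", "age": 34, "admit_month": "2011-10", "diagnosis": "burn"},
]
known = [
    {"name": "Person A", "gender": "F", "zip": "12345", "age": 61, "admit_month": "2011-10"},
    {"name": "Person B", "gender": "M", "zip": "12345", "age": 34, "admit_month": "2011-10"},
]
matches = link(released, known)
```

Person A is matched because only one released record shares all four fields, while Person B is not, because two records are indistinguishable on those fields. Coarser fields, such as 3-digit ZIP codes and year-only dates, tend to make more records indistinguishable in this way.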

Acknowledgments

The authors thank Amanda Black and Ryan Joyce for help locating and reviewing materials. The information presented here was culled from many sources and the best effort was made to collect the most accurate and up-to-date information. Please send the authors any updates or corrections for rectification. Check http://dataprivacylab.org/projects/50 states and thedatamap.org for the latest information. Sean Hooley’s work on this paper has been supported in part by a National Institutes of Health Grant (1R01ES021726), and Dr. Sweeney’s in part by an NSF grant (CNS-1237235).

References

1. Basu J, Friedman B. A Re-examination of Distance as a Proxy for Severity of Illness and the Implications for Differences in Utilization by Race/Ethnicity. Health Economics 2007;16(7):687-701.

2. Li P, Schneider JE, Ward MM. Effect of Critical Access Hospital Conversion on Patient Safety. Health Services Research 2007;42(6 Pt 1):2089-2108.

3. Smith RB, Cheung R, Owens P, Wilson RM, Simpson L. Medicaid Markets and Pediatric Patient Safety in Hospitals. Health Services Research 2007;42(5):1981-1998.

4. Misra A. Impact of the HealthChoice Program on Cesarean Section and Vaginal Birth after C-Section Deliveries: A Retrospective Analysis. Maternal and Child Health Journal 2007;12(2):266-74.

5. Coben JH, Steiner CA, Miller TR. Characteristics of Motorcycle-Related Hospitalizations: Comparing States with Different Helmet Laws. Accident Analysis and Prevention 2007;39(1):190-196.

6. Robertson J. States’ Hospital Data for Sale Puts Privacy in Jeopardy. Bloomberg News. June 5, 2013. http://www.businessweek.com/news/201...rivacy-at-risk

7. Healthcare Cost and Utilization Project (HCUP). SID/SASD/SEDD Application Kit. May 15, 2013. http://www.hcup-us.ahrq.gov/db/state...SEDD_Final.pdf

8. Sweeney L. Matching Known Patients to Health Records in Washington State Data. Harvard University. Data Privacy Lab. 1089-1. June 2013. http://dataprivacylab.org/projects/wa/index.html
