Observational Medical Outcomes Partnership

Research Notes

http://omop.org/contact: Sent request on 111/2014

Standard Vocabulary Specification Version 4.5: http://omop.org/VocabV4.5

http://vocabqueries.omop.org/: OMOP Vocabulary Queries Downloads


Observational Medical Outcomes Partnership

OMOP has been transitioned from the Foundation for the National Institutes of Health to the Reagan-Udall Foundation for the Food and Drug Administration. OMOP has demonstrated the feasibility of establishing a common infrastructure which can accommodate observational data of different types (both claims and EHRs) from sources around the world, and successfully developed and executed large-scale statistical analyses capable of enabling active drug safety surveillance across prescription medications.

About Us

Source: http://omop.org/node/22

America’s drug-approval process has set the global standard for rigorous safety and effectiveness review, but even with clinical trials and other safeguards, it is impossible to fully understand the impact of any particular medical intervention until it is widely used. Once on the market, drugs are further studied by pharmacoepidemiologists and other researchers, who work diligently to identify safety issues and potential unanticipated benefits. Many important discoveries have been made, but researchers have long been hampered by a reliance on voluntary reporting as the primary source of data. As a result, there is growing interest in using other sources of data generated within the healthcare setting. Unlike clinical trials, however, the use of observational data and methods to monitor medical product safety is challenged by the lack of accepted research methods and practices, which leads to the well-known phenomenon of multiple studies of the same intervention yielding vastly different results.

OMOP’s Challenge

In 2007, recognizing that the increased use of electronic health records (EHR) and availability of other large sets of marketplace health data provided new learning opportunities, Congress directed the FDA to create a new drug surveillance program to more aggressively identify potential safety issues. The FDA launched several initiatives to achieve that goal, including the well-known Sentinel program to create a nationwide data network.

In partnership with PhRMA and the FDA, the Foundation for the National Institutes of Health (FNIH) launched the Observational Medical Outcomes Partnership (OMOP), a public-private partnership. This interdisciplinary research group has tackled a surprisingly difficult task that is critical to the research community’s broader aims: identifying the most reliable methods for analyzing huge volumes of data drawn from heterogeneous sources.

Employing a variety of approaches from the fields of epidemiology, statistics, computer science and elsewhere, OMOP seeks to answer a critical challenge: what can medical researchers learn from assessing these new health databases, could a single approach be applied to multiple diseases, and could their findings be proven? Success would mean the opportunity for the medical research community to do more studies in less time, using fewer resources and achieving more consistent results. In the end, it would mean a better system for monitoring drugs, devices and procedures so that the healthcare community can reliably identify risks and opportunities to improve patient care.

OMOP’s Work

Officially launched in late 2008 as a two-year pilot, OMOP worked to design experiments testing a variety of analytical methodologies in a range of data types to look for drug impacts that are already well-known. The initial findings were inconclusive, and the team found more challenges than answers.

From this initial base, the project moved forward with additional research, and found greater success. Over the course of 2011 and 2012, research yielded greater confidence that particular methods used with particular types of data can reliably identify correlations between individual medical interventions and specific health outcomes. While there is still work to be done, the findings suggest meaningful progress toward the ultimate goal.

OMOP’s Future

Having achieved its mission as envisioned by the founding members of the partnership, the OMOP Pilot concluded at FNIH in June 2013, and the OMOP program was transferred to the Reagan-Udall Foundation for the FDA. Moving forward, OMOP will continue refining its experimental approach and expanding its dialogue with the medical and research community. Project investigators have been regularly publishing findings and discussing their work at a range of scientific meetings.


The OMOP program will be governed under the structure of the Innovation in Medical Evidence Development and Surveillance (IMEDS) Program.
The committee members and responsibilities of the IMEDS governing bodies are listed on the RUF website.


The Observational Medical Outcomes Partnership (OMOP) is currently funded by the Reagan-Udall Foundation for the FDA through contributions from AstraZeneca and the Pharmaceutical Research and Manufacturers of America (PhRMA).

Research Investigators

Source: http://omop.org/Investigators

William DuMouchel, PhD

Chief Statistical Scientist, Oracle Health Sciences

Bill's current research focuses on statistical computing and Bayesian hierarchical models, including applications to meta-analysis and data mining. He is the inventor of the empirical Bayesian data mining algorithm known as Gamma-Poisson Shrinker (GPS) and its successor MGPS, which have been applied to the detection of safety signals in databases of spontaneous adverse drug event reports. These methods are now used within the FDA and industry. From 1996 through 2004 he was a senior member of the data mining research group at AT&T Labs. Before that, he was Chief Statistical Scientist at BBN Software Products, where he was lead statistical designer of software advisory systems for experimental design and data analysis called RS/Discover and RS/Explore. He has been on the faculties of the University of California at Berkeley, the University of Michigan, and MIT, and most recently was Professor of Biostatistics and Medical Informatics at Columbia University from 1994 to 1996. He has authored approximately fifty papers in peer-reviewed journals and has also been an associate editor of the Journal of the American Statistical Association, Statistics in Medicine, Statistics and Computing, and the Journal of Computational and Graphical Statistics.

Abraham G. Hartzema, PharmD, MSPH, PhD, FISPE

Professor and Eminent Scholar, Perry A. Foote Chair in Health Outcomes and Pharmacoeconomics
Professor, Department of Epidemiology and Biostatistics, College of Public Health and Health Professions, University of Florida

Dr. Hartzema investigates health outcomes with an emphasis on pharmacoepidemiology, passive and active drug safety surveillance systems, therapeutic risk management, and program evaluation. He has served as principal and co-investigator on major grants from the National Institutes of Health, government entities, foundations and the pharmaceutical industry. He has co-authored and edited three books, two of which are in multiple editions, and two monographs, one of which has been translated into several languages, and has published and presented more than 100 chapters, journal articles, abstracts and presentations. He has served on the Scientific Board of the International Pharmaceutical Federation, and serves or has served on eight editorial boards. He is the recipient of the 2007 UF Foundation Research Award. He is an elected Fellow of the International Society for Pharmacoepidemiology. In 2008-2009, he spent his sabbatical in the Immediate Office of the Food and Drug Administration’s Commissioner working on the Sentinel system.

David Madigan, PhD

Executive Vice President for Arts and Sciences and Dean of the Faculty of Arts and Sciences, Columbia University

Dr. Madigan received a bachelor’s degree in Mathematical Sciences and a Ph.D. in Statistics, both from Trinity College Dublin. He has previously worked for AT&T Inc., Soliloquy Inc., the University of Washington, Rutgers University, and SkillSoft, Inc. He has over 100 publications in such areas as Bayesian statistics, text mining, Monte Carlo methods, pharmacovigilance and probabilistic graphical models. He is an elected Fellow of the American Statistical Association and of the Institute of Mathematical Statistics. He recently completed a term as Editor-in-Chief of Statistical Science.

Marc Overhage, MD, PhD

Chief Medical Informatics Officer, Siemens Health Services

Prior to joining Siemens where he leads product strategy and research, Dr. Overhage was the founding Chief Executive Officer of the Indiana Health Information Exchange and was Director of Medical Informatics at the Regenstrief Institute, Inc., and a Sam Regenstrief Professor of Medical Informatics at the Indiana University School of Medicine. He has spent over 25 years developing and implementing scientific and clinical systems and evaluating their value. With his colleagues from the Regenstrief Institute, he created a community wide electronic medical record (Indiana Network for Patient Care) containing data from many sources including laboratories, pharmacies, physician practices and hospitals in central Indiana. In order to create a sustainable financial model, he helped create the Indiana Health Information Exchange, a not-for-profit corporation. He has developed and evaluated clinical decision support including inpatient and outpatient computerized physician order entry and the underlying knowledge bases to support them.

Over the last decade, Dr. Overhage has played a significant regional and national leadership role in advancing the policy, standards, financing and implementation of health information exchange. He serves on the Health Information Technology Standards Committee and the Board of Directors of the National Quality Forum, and is engaged in a number of national healthcare initiatives. He is a member of the Institute of Medicine and a fellow of the American College of Medical Informatics and the American College of Physicians. He received the Davies Recognition Award for Excellence in Computer-Based Patient Recognition for the Regenstrief Medical Record System. Dr. Overhage was a resident in internal medicine, a medical informatics and health services research fellow and then chief medical resident at the Indiana University School of Medicine. He practiced general internal medicine for over 20 years, including in the ambulatory, inpatient and emergency care settings.

Christian Reich, MD, PhD

Global Head of Informatics, AstraZeneca PLC

Christian has more than 15 years of experience in life science research and medicine. He was a practicing physician in Berlin and Ulm, Germany before moving to the European Bioinformatics Institute to work on the Human Genome Project. He then joined the biotech industry in 1998, where he worked in various positions on typical challenges in drug research and development, such as gene sequence and expression analysis, clinical trial design and analysis, systems biology, and outcome research, applying computational methods to large scale biological data. Christian is the Head of Discovery Informatics at AstraZeneca. He received his bachelor’s degree in preclinical training from Humboldt University in Berlin and holds his M.D. and doctorate from the Medical University of Lübeck, Germany where he focused his research on T-cell activation and regulation.

Patrick Ryan, PhD

Head of Epidemiology Analytics, Janssen Research and Development

Dr. Ryan is leading efforts at Janssen to develop and apply analysis methods to better understand the real-world effects of medical products. As part of OMOP, he is conducting methodological research to assess the appropriate use of observational health care data to identify and evaluate drug safety issues. Patrick received his undergraduate degrees in Computer Science and Operations Research at Cornell University, his Master of Engineering in Operations Research and Industrial Engineering at Cornell, and his PhD in Pharmaceutical Outcomes and Policy from University of North Carolina at Chapel Hill. Patrick has worked in various positions within the pharmaceutical industry at Pfizer and GlaxoSmithKline, and also in academia at the University of Arizona Arthritis Center.

Martijn Schuemie, PhD

Associate Director, Epidemiology Analytics at Janssen Research and Development

Dr. Schuemie received his Master’s degree in Economics from the Erasmus University in Rotterdam, and his PhD in informatics from the Delft University of Technology. His past research includes phobia treatment using virtual reality, and text-mining in scientific literature. More recently, his work at the Erasmus University Medical Center is focused on the re-use of EHRs for research. He was one of the principal investigators of the EU-ADR project, heading the development and comparison of statistical methods for signal detection using EHRs, and developed techniques for text-mining patient records.

Paul Stang, PhD

Senior Director, Epidemiology, Janssen Research and Development

Dr. Stang has held a number of positions over the past 25 years in epidemiology and pharmacoepidemiology including the past 5 years as Senior Director of Epidemiology at Janssen Research and Development. Previously, he was a Vice-President at Cerner Corporation, which he joined after co-founding and serving as the Chief Scientific Officer of Galt Associates, a health care consulting and informatics start-up that was acquired by Cerner. He previously served in positions at other health care companies and academic medical centers including SUNY-Stony Brook Department of Neurosurgery, and the UNC Department of Neurosurgery.

Marc Suchard, MD, PhD

Professor, University of California, Los Angeles

Marc is at the forefront of high-performance statistical computing. He is a leading Bayesian statistician who focuses on inference of stochastic processes in biomedical research and on the clinical application of statistics. His training in both Medicine and Applied Probability helps bridge the gap of understanding between statistical theory and clinical practicality. He has been awarded several prestigious statistical awards such as the 2003 Savage Award and the 2006 Mitchell Prize, as well as a 2007 Alfred P. Sloan Research Fellowship in computational and molecular evolutionary biology and a 2008 Guggenheim Fellowship to further computational statistics. Recently, he received the 2011 Raymond J. Carroll Young Investigator Award for a leading statistician within 10 years post-Ph.D. He is an elected Fellow of the American Statistical Association.


Source: http://omop.org/Staff

Emily Welebob

Senior Program Manager, Research

Mark Khayter

Statistics and Programming Team, Ephir, Inc.

Don Torok

Database Admin/Programming, Ephir, Inc.

2013 Symposium

Source: http://omop.org/2013Symposium

The OMOP-IMEDS 2013 Symposium, sponsored by the Reagan-Udall Foundation for the FDA, had over 200 people in attendance, including representatives from the FDA and from academic, pharmaceutical and research organizations. If you missed this year’s symposium or want to hear it again, the presentations listed below are available to download as audio and slides. Some files may take a few seconds to download.


I'm in the audience at OMOP-watching talks as they are told-Searching through the SQL-Just to find the perfect code
Must be a signal in these data-Have no doubt deep in my mind-With the OMOP toolbox-I'll be able to find.

November 5 - 6, 2013 Symposium Materials

Day 1, November 5, 2013

For each presentation, the first link downloads the slides in Adobe format. The second link starts the slides/audio presentation.

Day 2, November 6, 2013


Attachment Size
01_MapReduce_ETL_Ephir.png 1.04 MB
02_Harpaz_New Interfaces to Established Methods.pdf 430.86 KB
03_Gouripeddi_OpenFurther_OMOP Datasets.pdf 606.92 KB
04_Knoll_OMOP CDM for Clincial Trial Feasibility Assessment.pdf 499.16 KB
05_Meeker_System and User Interface.pdf 835.98 KB
06_Huser_NIH_IDR comparison and desiderata.pdf 424.06 KB
07_Rocca_FDA_Harminization of the OMOP CDM with BRIDG Model.pdf 906.42 KB
08_Jones_Using an Online DB Resource to Characterize HC Data Linkage.pdf 186.92 KB
09_Fraeman_Evidera_Creating Modified OMOP Drug Eras.pdf 435.64 KB
10_Karr_Effects of Method Parameters.pdf 8.31 MB
11_Obenchain_NISS_Observational CER.pdf 871.84 KB
12_Wu_Sanofi_Evaluation of Analytical Method Performance Using SCCS.pdf 1009.68 KB
13_Gruber_Impact of OSIM2.pdf 1.23 MB
14_Zhou_Leverging THIN Data in the OMOP CDM.pdf 929.38 KB
15_Boyce-GeriOMOP-OSIM-MDS-2013-CORRECTED.pdf 416.38 KB
16_Hosokawa_Operationalizing Asthma Analytic Plan Using OMOP CDM.pdf 1.19 MB
17_Kawabata_An Examination of OMOP CDM Vocabulary.pdf 245.52 KB
18_Voss_Quality Checks and Measures into a CDM.pdf 383.06 KB
19_Zhang_OMOP CDM conversion into AsPEN for SCAN project.pdf 649.08 KB
20_Murray_Premier Alliance Data into the OMOP CDM.pdf 1.12 MB
21_Makadia_Premier DB to the OMOP CDM.pdf 362.04 KB
22_Yamashita_Mapping Medicaid and Medicare claims.pdf 427.18 KB
23_Matcho_CPRD to the OMOP CDM.pdf 290.44 KB
26_Juneau_Graphically Examining Potential Hetergeneous Tx Effects.pdf 1.1 MB
27_Esposito_IPF cases in an electronic insurance claims db.pdf 180.52 KB
28_West_Multistate Markov Modeling.pdf 1.04 MB
29_Ofner_Regenstrief_Sensitivity Analysis of Methods for AMI.pdf 231.17 KB
30_Li_Regenstrief_Calibrating Strength of Association of Drug-Outcome Pairs.pdf 668.23 KB
31_Shaddox_Extending Bayesian.pdf 247.36 KB
32_Zhu_Switching in Medication Adherence.pdf 316.14 KB
33_Lo_Mining ADR from EHR.pdf 151.92 KB
34_Weinstein_ALI seasonality.pdf 948.77 KB
35_Shahn_Predicting Strokes.pdf 545.22 KB


Source: http://omop.org/Research

OMOP is working with research investigators and organizations to conduct methodological research and make the findings available to the public.

OMOP 2013-2016 Research Agenda
OMOP's direction is defined by the OMOP Strategic Objectives and Tasks 2013-2016. The overall strategy has an emphasis on building upon OMOP's prior research and empirical evidence base while increasing alignment with the needs of the FDA Sentinel Initiative and the community. Please contact OMOP if you have any questions or comments about the objectives and tasks.

OMOP Strategic Objectives and Tasks 2013-2016

  1. Objectives

Through its research, the Observational Medical Outcomes Partnership aims to:

  • Create a foundation for the common representation of disparate healthcare data so they can be subjected to reproducible analysis;
  • Create an empirical evidence base to support study of the association between medical products and outcomes, both known and unknown; and
  • Create a framework for incorporation of observational data into evidence-based decision making.

If this work is successfully completed, appropriately skilled entities:

  • will be able to investigate the association between drugs of interest and the most important adverse outcomes, with an empirically based foundation for interpretation of findings
  • will have an understanding of the utility of observational healthcare data for the identification of unknown outcomes associated with medical products
  • will have a framework for incorporating the results of observational studies into decisions about medical interventions.

  2. 2013-2016 Infrastructure and Operations Tasks

  • Continue the development of the Common Data Model (CDM), standardized vocabularies, and data characterization tools that support the use of healthcare system data for the assessment of medical product safety and effectiveness.
  • Function as the catalyst for active engagement of the research, regulatory, and healthcare communities to share in the development, adoption, and implementation of the CDM and standardized vocabularies.
  • A long-term goal of this task is for the healthcare system data user community to converge on a single CDM that it adopts and maintains as a data standard.
  • Coordinate, conduct and collaborate in research that will enable the implementation of the FDA Sentinel project as a widely accessible, empirically characterized risk identification and analysis system, and the practice of others in the fields of risk and effectiveness identification and analysis (pharma company projects, PCORI, and others).
  • Provide a testbed to generate empirical evidence on the performance of methods and data sources prior to implementation within the FDA Sentinel Project.
  • In collaboration with key stakeholders, translate key research findings into standards of practice.
  • Actively communicate the OMOP research and results within the scientific, medical and healthcare community.

  3. 2013-2016 Research Tasks

  • Expand the empirical evidence to support the study of the effects of medical products in observational data
    1. Extend the evidence base from the 4 health outcomes of interest that were the subject of the 2011-12 OMOP research to the approximately 20 key health outcomes of interest for pharmacovigilance (Ref: Trifiro, G., et al., Data mining on electronic health record databases for signal detection in pharmacovigilance: which events to monitor? Pharmacoepidemiol Drug Saf, 2009. 18(12): p. 1176-84)
    2. Evaluate the contribution of observational clinical data (e.g., electronic health record data), compared to claims data, to the study of medical product effects
    3. Continue the development of novel methods and the improvement of existing methods for observational data analysis
    4. Begin to explore the use of observational data for the identification of benefits
  • Develop a framework for incorporating observational data and stakeholder perspectives into decision making.
  • Develop analytical methods and evidence to enable active surveillance of newly marketed medical products over time, including establishing a framework for evaluating how and when observational data can be used to rule out medical product-outcome associations of concern or interest.
  • Develop empirical evidence to evaluate the performance of methods for detecting unknown associations between medical products and outcomes

Research Lab

Source: http://omop.org/ResearchLab

The OMOP Research Laboratory is a secure, interactive computing environment available to OMOP researchers. It provides tools for programming and managing a large library of analytical methods, a robust database environment to query and transform very large data sets for analysis, and a computational environment to test and analyze the performance of these methods. The OMOP Research Lab also includes all processes and mechanisms for collecting and visualizing method-produced results.

Technology behind the OMOP Research Lab
The OMOP Cloud Research Lab is based on Amazon.com's Elastic Compute Cloud (EC2) technology. This technology allows users to create and use virtual computers (called instances) based on predefined configurations (called machine images) on which to run their own research applications. The images contain all necessary computing environments (SAS, R, C++, Perl, etc.). Observational data that OMOP acquired for its research purposes are maintained in encrypted form in cloud storage. These datasets are attached to the instances during the initialization process. Cloud computing reduces the time required to obtain capacity and lets that capacity scale quickly, both up and down, as a researcher's computing requirements change.

Depending on the need, the following virtual machines are available:

Within the instances, the observational data are available as SAS datasets. In addition, data are stored in an Oracle database for maintenance and access to non-SAS tools. Users interact with the system in two ways:

1. Provisioning of virtual machine RL instances using the RL Web Application. The RL Launcher (RL Web App) is an EC2-hosted web application developed by OMOP. User authentication and budget management are also handled through this website.

2. Logging on to the RL instances provisioned to them. Storage containing the observational data is automatically attached to each new instance, along with additional workspace for the research as required. Each user also has permanent storage that is attached to each new instance belonging to that user. Only users with the appropriate security credentials get access to the instances, and no other staff (Amazon, OMOP) can be in possession of the keys.
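
The two interaction steps above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the class and field names (ResearchLabLauncher, Volume, Instance) are hypothetical, nothing here calls AWS or the real RL Web App, and budget management is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    name: str
    encrypted: bool = True  # the text says data are kept in encrypted form


@dataclass
class Instance:
    owner: str
    image: str  # name of the machine image the instance was created from
    volumes: list = field(default_factory=list)


class ResearchLabLauncher:
    """Toy stand-in for the RL Web App; not the real OMOP or AWS API."""

    def __init__(self, authorized_users, data_volumes):
        self.authorized_users = set(authorized_users)
        self.data_volumes = list(data_volumes)
        self.permanent = {}  # one permanent volume per user

    def launch(self, user, image):
        # Only users with credentials may provision an instance.
        if user not in self.authorized_users:
            raise PermissionError(f"{user} has no credentials for the lab")
        # Reuse the user's permanent storage, creating it on first launch.
        home = self.permanent.setdefault(user, Volume(f"home-{user}"))
        # Storage is attached automatically: shared data first, then home.
        return Instance(owner=user, image=image,
                        volumes=[*self.data_volumes, home])


lab = ResearchLabLauncher(["alice"], [Volume("observational-data")])
first = lab.launch("alice", "r-image")
second = lab.launch("alice", "sas-image")
print([v.name for v in first.volumes])
print(first.volumes[-1] is second.volumes[-1])
```

The point of the model is the storage behaviour: every new instance receives the shared observational data, and the same permanent volume follows the user from one instance to the next.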

Data Security
OMOP is committed to effective protection of the sensitive health information in the large-scale observational healthcare data obtained for OMOP research. Though these data do not contain direct Electronic Protected Health Information (ePHI), the RL was designed to be able to handle such data. In addition, OMOP needs to safeguard derivable Personally Identifiable Information (PII), which could lead to ePHI and privacy compliance issues unless proper protective measures are applied. OMOP has developed a comprehensive set of security policies and infrastructure to address these security issues.

Archived Research

Source: http://omop.org/ResearchArchive

OMOP has conducted a series of experiments to generate empirical evidence about the performance of observational analysis methods in their ability to identify true risks of medical products and discriminate them from false findings. These experiments were designed to inform the development of a risk identification and analysis system, as envisioned by various pharmaceutical research companies and now mandated for the FDA by Congress through the FDA Amendments Act of 2007. We define a ‘risk identification and analysis system’ as a systematic and reproducible process to generate evidence efficiently to support the characterization of the potential effects of medical products from across a network of disparate observational healthcare data sources.

2011-2012 Experiments

In June 2012, the OMOP research team presented results from its latest experiments, which shed light on recommendations for building a risk identification and analysis system and offer guidance for interpreting observational studies. The proceedings from the 2012 OMOP Symposium are available to listen to, and the materials and presentations are available to download. Below are the OMOP 2011-2012 Test Case Reference and Research Results:

(The results file is large and will take several minutes to download)

In OMOP's latest experiment, the team evaluated the performance of a risk identification system for four health outcomes of interest: acute myocardial infarction, acute liver injury, acute renal failure, and gastrointestinal bleeding. For these outcomes, OMOP established a reference set of 399 test cases: 165 ‘positive controls’ that represent medical product exposures for which there is evidence to suspect an association with the outcome, and 234 ‘negative controls’ that are drugs for which there is no evidence that they are associated with the outcome. The fundamental goal of OMOP’s research is to develop and evaluate standardized algorithms that can reliably discriminate the positive controls from the negative controls, and to understand how an estimated effect from an observational study relates to the true relationship between medical product exposure and adverse events.

From the current experiment, several insights were gained about expected behavior of a risk identification system. We observed that self-controlled designs are optimal across all outcomes and all sources, but the specific settings are different in each scenario. All sources achieve good performance (area under the ROC curve > 0.80) for acute kidney injury, acute MI, and GI bleed, while acute liver injury has consistently lower predictive accuracy. A risk identification system should confidently discriminate positive effects with relative risk > 2 from negative controls, but smaller effect sizes will be more difficult to detect. There was no evidence that any of the five data sources were consistently better or worse than others, but we did observe substantial variation in estimates across sources, pointing to the need to routinely assess consistency across a network of databases. The results underscore the importance of transparency and complete specification and reporting of analyses, as all study design choices were shown to have the potential to substantially shift effect estimates.
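
To make the "area under ROC curve" criterion concrete, here is a minimal sketch of how such a figure can be computed when a method assigns one score per test case. It uses the pairwise (Mann-Whitney) form of the AUC: the fraction of positive-negative control pairs in which the positive control receives the higher score, with ties counted as one half. The scores below are invented for illustration and are not OMOP results; the function name is an assumption.

```python
def auc(positive_scores, negative_scores):
    """Probability that a random positive control outranks a random
    negative control, counting ties as one half (Mann-Whitney form)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up effect estimates for four positive and four negative controls.
positives = [2.4, 1.8, 1.1, 3.0]
negatives = [0.9, 1.2, 1.0, 0.7]
print(auc(positives, negatives))  # 15 of 16 pairs ranked correctly: 0.9375
```

An AUC of 0.5 corresponds to no discrimination between the two groups and 1.0 to perfect separation, which is why a threshold such as 0.80 can serve as a benchmark for good performance.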

Diversity in performance and heterogeneity in estimates arose not only from different study design choices (e.g., cohort versus case-control) but also from analytic choices within study design (e.g., number of controls per case in a case-control study). We caution against generalizing these results to other outcomes or other data sources. However, we do think OMOP has now provided a well-defined procedure for how to profile a database and construct an optimal analysis strategy for a given outcome, which can be systematic, reproducible, and yield defined performance characteristics that can directly inform decision-making. OMOP’s research continues to reaffirm the notion that advancing the science of observational research requires an empirical and reproducible approach to methodology and systematic application.

2010 Experiments

The 2010 Experiment Results contain results for all of the original data partners (central and distributed partners), in addition to meta-analytic composite estimates, in the OMOP2010_METHOD_RESULTS file. The file OMOP2010_ANALYSIS_REF shows the parameter settings that were chosen for a given analysis_id in the results file.
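
As a sketch of how the two files relate, the snippet below attaches each analysis's parameter settings to its result rows by matching on analysis_id. Only the analysis_id key comes from the description above; the file format (CSV), every other column name, and all values are hypothetical placeholders, so the real files will need different parsing.

```python
import csv
import io

# Hypothetical excerpts; the real files have different columns and contents.
METHOD_RESULTS = """analysis_id,source,estimate
101,DB_A,1.5
102,DB_A,0.9
101,DB_B,1.7
"""
ANALYSIS_REF = """analysis_id,method,setting
101,method_x,window=30
102,method_y,controls=4
"""


def load(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_settings(results, reference):
    """Add each analysis's parameter settings to its result rows."""
    by_id = {row["analysis_id"]: row for row in reference}
    joined = []
    for row in results:
        ref = by_id.get(row["analysis_id"])
        if ref is None:
            continue  # a result without documented settings is skipped
        joined.append({**ref, **row})
    return joined


rows = attach_settings(load(METHOD_RESULTS), load(ANALYSIS_REF))
for row in rows:
    print(row["analysis_id"], row["source"], row["method"], row["setting"])
```

Keeping the settings in a separate reference file means one analysis_id can appear many times in the results (once per data source, for instance) without repeating its configuration.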

Background Materials

Public License

Source: http://omop.org/publiclicense

The following is the license agreement for use of OMOP material:

Terms and Conditions For Use, Reproduction, and Distribution

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:

a. You must give any other recipients of the Work or Derivative Works a copy of this License; and

b. You must cause any modified files to carry prominent notices stating that You changed the files; and 

c. You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and 

d. If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.

You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.


Appendix: How to apply the Apache License to your work

To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.

Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
You may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.omop.org/publiclicense.
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
either express or implied.

Any redistributions of this work or any derivative work or modification
based on this work should be accompanied by the following source attribution: 
"This work is based on work by the Observational Medical Outcomes Partnership
(OMOP) and used under license from the RUF at
Any scientific publication that is based on this work should include
a reference to http://www.omop.org.

© http://www.apache.org/licenses

OMOP Implementation

Source: http://omop.org/OMOPimplementation

OMOP created a framework for observational research (My Note: See Above). This page contains the components and step-by-step instructions of how to implement this framework for your own environment.

Please note that all tools are Open Source (My Note: See Above) and research grade. If you need help with implementation, contact us and we will bring you into the community of peers and vendors.


Data Transformation and Cohorts

  • Common Data Model - This gives you the description and the DDL to create your own CDM database instance. It also contains ETL implementations for a number of popular databases, including code.
  • Vocabularies - This provides all the vocabularies used in the CDM. It also gives you the mapping tables you will need to convert from the source codes to the Standard Vocabularies (e.g. ICD-9 to SNOMED-CT or NDC to RxNorm).
  • Health Outcomes of Interest - This is a library of definitions of outcomes routinely studied for drugs.

Data Characterization and Quality Control

  • OSCAR - Tool for systematic counts and summary statistics of the data.
  • Data Quality and GROUCH - Tool for systematic testing for outliers in frequencies, data over time and boundaries.
  • NATHAN - This tool creates information about the natural history of disease.

Queries, Analyses and Methods

  • Standardized Vocabulary Queries - A collection of queries to find concepts through use of the Standard Vocabulary.
  • Standardized CDM Data Queries - A collection of queries you can use to interrogate the data.
  • Methods Library - A suite of analytical methods to detect association between intervention and outcome.


Common Data Model

Source: http://omop.org/CDM

The purpose of the Common Data Model (CDM) is to standardize the format and content of the observational data, so standardized applications, tools and methods can be applied to them.

This page explains the Common Data Model. It also provides a collection of Data ETL for a number of popular databases.


Common Data Model

This is the latest CDM in Version 4.0. In addition to person, condition, drug, procedure and visit information, it now models provider and cost information. This will support health economics use cases and medical treatment outcome studies, including medical device safety, comparative effectiveness and healthcare quality.

Extract, Transform and Load (ETL)

This is the most time-consuming part of creating a database in OMOP format. You need to write a script or program to convert your data to meet the specifications. To make things easier and more organized, we provide a template for mapping:
  Dec. 2013
  • Background
  • HOI Implementation
  • OMOP Experiments


Related Publications

Drug Safety 36(1) Supplement. Published online: 29 Oct. 2013
Alternative outcome definitions and their effect on the performance of methods for observational outcome studies

Christian Reich, Patrick B. Ryan, Martijn J. Schuemie.

Health outcomes, definitions, claims database, electronic health record database, surveillance, drug safety surveillance, acute kidney injury, acute liver injury, acute myocardial infarction

29th ICPE, August 25-28, 2013
Alternative outcome definitions and their effect on observational outcome studies

Christian Reich, on behalf of the OMOP research team

HOI Implementation

The majority of the HOI definitions can be implemented with Regularized Identification of Cohorts (RICO), a procedure that standardizes patient cohort selection. To create HOIs, RICO is run with cohort definitions whose criteria are specified in input parameters; patients meeting the criteria are selected from the common data model.

Two exceptions are the "Acute Myocardial Infarction Definition #4" and the "Mortality after MI Definition #4", which both involve application of the American College of Cardiology definition that includes a complicated set of criteria about troponin, CK, and EKG results. A stand-alone SAS procedure for identifying the AMI#4 cohort was developed, but is untested for lack of relevant EKG data in the central databases within the OMOP Research Lab. However, the SAS code is available to the broader OMOP research community to adapt to their data, at http://omop.org/PrepData. There is no SQL version of this script.

Another exception is Hip Fracture Definition #3, which involves text evidence from radiology reports that is not available for development and testing within the Research Lab. An implementation of this definition should be undertaken if the data source supports this kind of


The Implementation of the HOI Definition Process in OMOP
A Request for Proposal was disseminated, and members of the OMOP research team, Scientific Advisory Board, and Executive Committee selected two independent research organizations after careful evaluation of proposals. Both organizations had extensive experience in systematic reviews of the literature to inform meta-analysis, guideline development, and evidence-based medicine reviews.

The OMOP research team defined the following process to be independently followed by the two HOI research organizations:

  1. Develop an optimal search strategy to identify published manuscripts of studies of an HOI in observational datasets
  2. Identify the relevant literature of studies conducted in observational databases that would inform our definition of an HOI. These were to include any studies reporting definitions, validation studies that measure case ascertainment performance (including sensitivity, specificity, positive predictive value), coding guidelines and clinical diagnostic guidelines
  3. For each paper, summarize results in an evidence table to help inform the final definition to be implemented in OMOP studies

No communication between the two HOI research organizations was permitted. The research organizations were also asked to identify and abstract clinical guidelines for a given HOI to help further inform the OMOP research team.

The OMOP research team, in collaboration with the two research organizations, developed the evidence table format. As the effort progressed, the OMOP researchers evaluated the actual search strategies and the articles retrieved by each research organization to identify and correct obvious shortcomings in search strategy or results, based mainly on relevant citations that the OMOP researchers had identified but that the research organization searches had not captured. These article citations were provided to the research organizations to help them identify gaps in their search strategies.

An analysis of the systematic reviews was conducted to assess the concordance between the reviews and to develop a composite search strategy that took advantage of the positive attributes and capture of each review. The goal of the composite search strategy was to increase the capture of relevant observational database and validation studies, and it was based on the findings from the independent research organizations. The composite search strategy was implemented in PubMed, and all returned articles were abstracted to determine their relevance based on the original criteria provided to the independent reviewers. Studies conducted outside an observational database, or that did not include a specific definition, were intended to be excluded from the evidence tables. Relevant articles identified by designated members of the OMOP research team using the composite search strategy were compared to the studies identified by each of the two independent reviews. These comparisons were undertaken to determine the effectiveness of the composite search strategy and the extent to which the findings of the two research organizations agreed.
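The comparison of the two independent reviews with the composite search can be pictured as simple set arithmetic over citation identifiers. The following is only a sketch: the identifiers are made up, and the choice of measures (overlap counts, a Jaccard index, and the share of independently found citations that the composite search recaptures) is an assumption, not the measure the OMOP team reported.

```python
def compare_reviews(review_a, review_b, composite):
    """Summarize agreement between two citation sets and a composite search.

    All three arguments are iterables of citation identifiers.
    """
    a, b, c = set(review_a), set(review_b), set(composite)
    union = a | b
    return {
        "found_by_both": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared citations over all citations found by either review.
        "jaccard_a_b": len(a & b) / len(union) if union else 0.0,
        # Share of independently found citations that the composite search also returns.
        "composite_recall": len(c & union) / len(union) if union else 0.0,
        # Citations returned only by the composite search.
        "new_in_composite": len(c - union),
    }


# Hypothetical citation identifiers, for illustration only.
review_a = {"id1", "id2", "id3", "id4"}
review_b = {"id3", "id4", "id5"}
composite = {"id1", "id3", "id4", "id5", "id6"}
summary = compare_reviews(review_a, review_b, composite)
```

A low Jaccard value between the two reviews would suggest that their search strategies captured rather different literature, which is the situation a composite strategy is meant to address.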

OMOP Phase

HOI process being conducted within the OMOP Project


Phase 1

a. Perform systematic literature review for each Health Outcome of Interest

  • Identify clinical diagnostic criteria and coding guidelines
  • Extract operational definitions previously used in observational studies
  • Describe any validation studies that measure case ascertainment performance (including sensitivity, specificity, positive predictive value)
  • Synthesize national estimates of prevalence of disease (if available)

b. Draft HOI design document characterizing the pool of potential definitions to consider in observational data

  • Submit HOI definition design document for public comment and review by Scientific Advisory Board
  • Elicit feedback from experts in specific clinical domains


Phase 2

a. Develop software code to apply definitions to Research Core databases (on raw data and common data model)
b. Compare the prevalence of HOI in each observational data source with a national estimate or other published data
c. Conduct descriptive analysis of the ‘natural history’ of the HOI (including demographics, prior conditions, prior drug utilization) in each source
d. Validate HOIs in the data source if available

Phase 3

a. Integrate summary of OMOP analyses for each HOI
b. Draft recommended best practices for HOI definition and solicit public feedback
c. Publish standardized HOI summary document


Source: http://omop.org/OSCAR

Observational Source Characteristics Analysis Report (OSCAR) Design Specification and Feasibility Assessment

In order to interpret the results of any analysis on a data source, the characteristics of the data source must be clearly understood. The Observational Source Characteristics Analysis Report (OSCAR) provides a systematic approach for summarizing all observational healthcare data within the OMOP common data model. The procedure creates structured output of descriptive statistics for all relevant tables within the model to facilitate rapid summary and interpretation of the potential merits of a particular data source for addressing active surveillance needs.

Observational Source Characteristics Analysis Report (OSCAR) and Source Code:
If you have implemented CDM v4.0, use OSCAR for CDM v4.0; otherwise, use OSCAR for CDM v2.0.

OSCAR code and specifications for CDM v4.0

OSCAR code and specifications for CDM v2.0

OSCAR has many uses, including:

  • Automatic summarization of available data from a given source
  • Providing context for interpreting and analyzing findings of drug safety studies
  • Facilitating comparisons between data sources
  • Enabling comparison of overall database to specific subpopulations of interest
  • Supporting validation of transformation from raw data to OMOP common data model

OSCAR provides descriptive statistics that summarize the entire database as a means to benchmark all studies. The diagram below outlines how we envision OSCAR fitting into the workflow for validating the transformation from raw data to the OMOP common data model.

The only prerequisites for OSCAR are that the program must be applied to a data source that conforms to the OMOP common data model, including all necessary tables and fields, and that SAS 9.1 is available. OSCAR creates a summary result dataset in a structured format. This dataset contains descriptive statistics for the various data elements within the common data model, but does not contain any person-level data. Organizations within the OMOP Data Community are encouraged to share these aggregate summary results by loading them into the OMOP Research Lab, where comparative analyses across the different sources can be conducted.
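To make the idea of a person-free summary dataset concrete, here is a minimal Python sketch. OSCAR itself is a SAS program and its real output layout is not shown here; the row format, the simplified table and field names, and the sample data below are assumptions used only to illustrate reducing person-level records to aggregate statistics.

```python
import statistics
from collections import Counter


def summarize(table_name, field_name, values):
    """Reduce a list of numeric values to one aggregate row with no person identifiers."""
    values = sorted(values)
    if not values:
        raise ValueError("no values to summarize")
    return {
        "table": table_name,
        "field": field_name,
        "n": len(values),
        "min": values[0],
        "max": values[-1],
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Hypothetical person-level input: (person_id, year_of_birth).
persons = [(1, 1950), (2, 1962), (3, 1962), (4, 1980)]
# Hypothetical drug exposure records: (person_id, drug_concept_id).
exposures = [(1, 501), (1, 501), (2, 501), (4, 777)]

# Number of exposure records per person; persons without any record are not counted.
records_per_person = Counter(person_id for person_id, _ in exposures)

report = [
    summarize("person", "year_of_birth", [yob for _, yob in persons]),
    summarize("drug_exposure", "records_per_person", list(records_per_person.values())),
]
```

Because each row carries only counts and summary statistics, a report of this shape could be shared across organizations without exposing individual records.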

Data Quality

Source: http://omop.org/DataQualityGROUCH



Source: http://omop.org/GROUCH

Generalized Review of OSCAR Unified Checking

Generalized Review of OSCAR Unified Checking (GROUCH) is a program that produces a summary report for each data source of warnings of implausible and suspicious data observed from the OSCAR summary. It identifies potential issues across all OMOP common data model tables, including potential concerns with all drug exposures and all conditions. GROUCH allows for data quality review of specific drugs (such as the ingredients that comprise the OMOP drugs of interest) or specific conditions (including population-level prevalence of the health outcomes of interest, and unexpected gender-specific rates, such as males with pregnancy, and females with prostate cancer).
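The kind of check described above can be sketched in a few lines of Python. This is not the GROUCH code: the rule table, the gender labels, the prevalence bounds, and the counts are all hypothetical, and the sketch works on aggregate counts only, in the spirit of reviewing an OSCAR summary rather than raw data.

```python
MALE, FEMALE = "M", "F"

# Hypothetical rules: condition label -> gender for which any record is implausible.
IMPLAUSIBLE_FOR = {"pregnancy": MALE, "prostate cancer": FEMALE}


def gender_warnings(counts):
    """counts: {(condition, gender): number_of_persons} taken from an aggregate summary."""
    warnings = []
    for (condition, gender), n in sorted(counts.items()):
        if n > 0 and IMPLAUSIBLE_FOR.get(condition) == gender:
            warnings.append(f"{condition}: {n} person(s) with gender {gender}")
    return warnings


def prevalence_warning(condition, cases, population, low, high):
    """Return a warning if prevalence falls outside an expected [low, high] range.

    The bounds are supplied by the caller; no reference values are implied here.
    """
    prevalence = cases / population
    if not low <= prevalence <= high:
        return f"{condition}: prevalence {prevalence:.4f} outside [{low}, {high}]"
    return None


# Made-up aggregate counts for illustration.
summary_counts = {
    ("pregnancy", FEMALE): 1200,
    ("pregnancy", MALE): 3,
    ("prostate cancer", MALE): 450,
    ("prostate cancer", FEMALE): 0,
}
warnings = gender_warnings(summary_counts)
```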

The GROUCH Specification and source code are available. Click on the document titles below to obtain the following:

You are encouraged to implement GROUCH within your own data environments.


Source: http://omop.org/NATHAN

Utility of Natural History Information

The natural history report is a standardized summary of information about populations of interest. Natural history information is descriptive, with the intent to provide some context and expected rates of drug utilization and condition occurrence to facilitate the interpretation of benefit and risk information. Observational data offers potential value in providing summary information, both about the populations that experience a condition (the disease natural history) as well as the populations that are exposed (natural history of drug utilization). OMOP has developed the Natural History Analysis program, NATHAN, that produces a standardized report to summarize characteristics about the population of interest, including demographic factors (age and gender), co-morbidities and concomitant medications, and health service utilization prior to, during, and after the event onset.

Download NATHAN Specification and SAS Code:
OMOP NATHAN Specification, Code and Release Notes

The Observational Source Characteristics Analysis Report (OSCAR) provides a systematic approach for summarizing all data within the OMOP common data model. NATHAN is an extension of OSCAR, where data characteristics can be produced for a particular subpopulation of interest. When OSCAR is used on a data source, the following questions are addressed for all patients represented in the data:

• “How many patients were exposed to a particular drug?”
• “What is the average number of drug exposure records that comprise a drug era?”
• “What is the median age at time of diagnosis?”
• “What is the largest number of condition occurrences for a given person?”
• “What is the median length of a drug era?”
• “What is the maximum time a person was observed within the database?”
• “On average, how many visits does each person have recorded?”
• “What is the distribution of age at the start of a drug therapy?”

When using NATHAN, the same questions can be answered for subgroups of patients. For example, “For patients with an MI, how many had an exposure to a particular drug?” NATHAN takes this a step further and provides some temporal association. From the first MI, NATHAN can address:

• “For patients with an MI, how many had an exposure to a particular drug in the previous 30 days?”
• “For patients with an MI, how many had an exposure to a particular drug in the previous 6 months?”
• “For patients with an MI, how many had an exposure to a particular drug in the previous 1 year?”
• “For patients with an MI, how many had an exposure to a particular drug at any time in the patient's observed history?”
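The temporal questions above amount to counting, for each look-back window, the patients with an exposure start date that falls before their first MI and within the window. A rough Python sketch follows; it is not the NATHAN SAS code, and the record layout, the drug identifier, and the day counts used for "6 months" (182) and "1 year" (365) are assumptions.

```python
from datetime import date

# Look-back windows in days; None means any time before the index event.
WINDOWS = {"30 days": 30, "6 months": 182, "1 year": 365, "any time": None}


def prior_exposure_counts(first_event, exposures, drug_id):
    """Count persons exposed to drug_id within each window before their first event.

    first_event: {person_id: date of first MI}
    exposures: iterable of (person_id, drug_id, exposure_start_date)
    """
    exposed = {label: set() for label in WINDOWS}
    for person_id, d_id, start in exposures:
        index_date = first_event.get(person_id)
        # Skip other drugs, persons without an MI, and exposures starting after the MI.
        if d_id != drug_id or index_date is None or start > index_date:
            continue
        days_before = (index_date - start).days
        for label, limit in WINDOWS.items():
            if limit is None or days_before <= limit:
                exposed[label].add(person_id)
    return {label: len(ids) for label, ids in exposed.items()}


# Made-up data: three patients with a first MI on the same day.
first_mi = {1: date(2008, 6, 1), 2: date(2008, 6, 1), 3: date(2008, 6, 1)}
exposures = [
    (1, 501, date(2008, 5, 20)),  # 12 days before the MI
    (2, 501, date(2008, 2, 1)),   # 121 days before the MI
    (3, 501, date(2005, 1, 1)),   # several years before the MI
    (3, 501, date(2008, 7, 1)),   # after the MI, ignored
    (4, 501, date(2008, 5, 1)),   # person without an MI, ignored
]
counts = prior_exposure_counts(first_mi, exposures, 501)
```

The sketch looks only at exposure start dates; a fuller treatment would also consider exposures that began earlier but were still ongoing inside the window.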

Standardized Vocabulary Queries

Source: http://vocabqueries.omop.org/

Many researchers have implemented the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) and Standard Vocabularies (http://omop.org/CDMvocabV4). The purpose of this document is to provide the research community with a set of queries that simplify access to the most common vocabulary questions in Version 4.0 of the OMOP CDM and Standard Vocabularies.

The queries are designed to work in an OMOP CDM environment. The queries were tested on an Oracle database. While the queries are "generic", they may require minor syntax modifications to work on other Relational Databases.
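As an illustration of what such a vocabulary query might look like, the sketch below looks up the standard concept that a source code maps to. It is not one of the published queries. It runs against an in-memory SQLite database through Python's `sqlite3` module so that it is self-contained; the two tables are heavily simplified stand-ins for the vocabulary tables, and every code, name, and identifier in them is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical versions of a concept table and a source-to-concept mapping table.
conn.executescript("""
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        vocabulary_id TEXT,
        concept_code TEXT
    );
    CREATE TABLE source_to_concept_map (
        source_code TEXT,
        source_vocabulary_id TEXT,
        target_concept_id INTEGER
    );
    INSERT INTO concept VALUES
        (9001, 'Example condition A', 'STANDARD', 'A-1'),
        (9002, 'Example condition B', 'STANDARD', 'B-1');
    INSERT INTO source_to_concept_map VALUES
        ('SRC-001', 'SOURCE', 9001),
        ('SRC-002', 'SOURCE', 9002);
""")


def map_source_code(connection, source_code, source_vocabulary_id):
    """Return (concept_id, concept_name) rows that a source code maps to."""
    rows = connection.execute(
        """
        SELECT c.concept_id, c.concept_name
        FROM source_to_concept_map m
        JOIN concept c ON c.concept_id = m.target_concept_id
        WHERE m.source_code = ? AND m.source_vocabulary_id = ?
        ORDER BY c.concept_id
        """,
        (source_code, source_vocabulary_id),
    )
    return rows.fetchall()


mapped = map_source_code(conn, "SRC-001", "SOURCE")
```

The `?` placeholders are the parameter style used by `sqlite3`; drivers for other databases may expect a different placeholder style, which is one example of the minor syntax changes mentioned above.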

Standardized CDM Data Queries

Source: http://cdmqueries.omop.org/

Many researchers have implemented the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The purpose of this document is to provide the research community with a set of queries that simplify access to the most common questions in Version 4.0 of the OMOP CDM.

The queries are designed to work in an OMOP CDM environment. Unless otherwise specified, the queries were tested on Oracle and Redshift databases. While the queries are "generic", they may require minor syntax modifications to work on other Relational Databases.

Methods Library

Source: http://omop.org/MethodsLibrary

OMOP has built a library of methods, developed for the OMOP Common Data Model, to address the analysis problems of Monitoring of Health Outcomes of Interest and Identification of Non-Specified Conditions. These methods are tested across the OMOP Data Community. These methods are available under the Apache public license. If you would like to contribute to the methods, please contact OMOP by adding a new comment below.

In 2011, OMOP completed its originally defined set of research experiments to empirically evaluate the performance of alternative methods on their ability to identify true associations between drugs and outcomes. This initial research highlighted opportunities for methods enhancement. The links below contain source code and instructions on how to execute these updated methods.

OMOP Initial Research Methods

The links below contain source code and instructions on how to execute these methods for the OMOP performance measurement experiments.

Regularized Identification of Cohorts (RICO) - ProSanos Corporation
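
RICO is described earlier in these notes as a procedure that standardizes patient cohort selection, with criteria specified in input parameters and patients selected from the common data model. The sketch below illustrates that general idea only. It is not the RICO code; the `CohortCriteria` fields, the record layout, and the concept IDs are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CohortCriteria:
    # Hypothetical input parameters; these are not RICO's actual parameter names.
    concept_ids: frozenset
    min_occurrences: int = 1


def select_cohort(condition_records, criteria):
    """Return {person_id: index_date} for persons meeting the criteria.

    condition_records: iterable of (person_id, concept_id, start_date).
    The index date is the earliest qualifying record for the person.
    """
    dates_by_person = {}
    for person_id, concept_id, start_date in condition_records:
        if concept_id in criteria.concept_ids:
            dates_by_person.setdefault(person_id, []).append(start_date)
    return {
        person_id: min(dates)
        for person_id, dates in dates_by_person.items()
        if len(dates) >= criteria.min_occurrences
    }


# Made-up condition records for illustration.
records = [
    (1, 1001, date(2008, 3, 1)),
    (1, 1001, date(2008, 5, 9)),
    (2, 1001, date(2009, 1, 15)),
    (3, 2002, date(2007, 7, 4)),
]
criteria = CohortCriteria(concept_ids=frozenset({1001}), min_occurrences=2)
cohort = select_cohort(records, criteria)
```

Keeping the criteria in a separate parameter object means the same selection routine can be reused for different outcome definitions by changing only the inputs.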

Technology Requirements

Source: http://omop.org/TechnologyRequirements

OMOP Minimal Technology Requirements


  1. Space requirements depend on the size of the raw databases being used.
  2. To create the CDM, you will need 3 times as much free space as the size of the database.
  3. Data can be stored in either SAS or an RDBMS; the OMOP Research Lab stores data in both SAS and Oracle.
  4. You will need an additional 3 times the size of the database for temp space (TEMP, UTILLOC) to run the methods.
  5. For running methods, minimal recommendations include 20 processors, a 64-bit OS (Windows or Unix), and 32 gigabytes of RAM.
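
The space rules in items 2 and 4 reduce to simple arithmetic. The sketch below works through a hypothetical 500 GB source database; it assumes that "the database" in both rules refers to the raw source database and that the two allowances add up, neither of which the list states explicitly.

```python
def estimate_space_gb(raw_database_gb, cdm_factor=3, temp_factor=3):
    """Estimate disk space from the rules of thumb above (both factors default to 3)."""
    cdm_free_space = cdm_factor * raw_database_gb
    temp_space = temp_factor * raw_database_gb
    return {
        "cdm_free_space_gb": cdm_free_space,
        "temp_space_gb": temp_space,
        "total_gb": cdm_free_space + temp_space,
    }


# Hypothetical raw database of 500 GB.
estimate = estimate_space_gb(500)
```

Under these assumptions, a 500 GB source would call for 1,500 GB of free space to build the CDM and a further 1,500 GB of temp space to run the methods.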

SAS is the minimal software requirement, with some analysis methods requiring additional software packages. The OMOP tools and procedures below are listed with their software dependencies.

Tools and Procedures: Software Dependencies

  • Thomson Reuters ETL: Oracle
  • GE ETL: Oracle
  • Any other ETL: Your choice
  • Observational Source Characteristics Analysis Report (OSCAR): Base SAS, SAS Access (if CDM is in RDB)
  • Natural History Analysis (NATHAN): Base SAS, SAS Access (if CDM is in RDB)
  • Regularized Identification of Cohorts (RICO): Oracle or SAS
  • Generalized ERA Logic Developer (GERALD): Oracle or SAS
  • Era Integration (ERIN): Oracle or SAS
  • Generalized Review of OSCAR Unified Checking (GROUCH): Base SAS
  • Standard terminologies datasets: SAS or text
  • Vocabulary Use Cases: Oracle

Analysis Methods: Software Dependencies

  • Disproportionality Analysis (DP): Base SAS, SAS/IML
  • Multiset Case-Control Estimation (MSCCE): Base SAS, SAS Access (if CDM is in RDB)
  • Univariate Self-controlled case series (USCCS): Base SAS, SAS/IML, SAS Access (if CDM is in RDB)
  • Bayesian logistic regression (BLR): Base SAS, SAS/IML, BBRtrain, C++
  • Observational Screening (OS): Base SAS, SQL, SAS Access (if CDM is in RDB)
  • Case-Control Surveillance (CCS): Base SAS, SAS Access (if CDM is in RDB)
  • Case-crossover (CCO): Base SAS, SAS Access (if CDM is in RDB)
  • Statistical Relational Learning (SRL): PROLOG, Python
  • Maximized Sequential Probability Ratio Test (MSPRT): Base SAS
  • Conditional Sequential Sampling Procedure (CSSP): Base SAS
  • Temporal Pattern Discovery (ICTPD): R, Oracle or SQL Server
  • High-dimensional propensity scoring cohort (HDPS): Base SAS, SAS/IML optional
  • Incident user design (IUD-HOI): Base SAS
  • Highthroughput Screening by Indiana University (HSIU): Base SAS
  • Multivariate self-controlled case series (MSCCS): Base SAS, C++

The software packages are required in the following versions or higher:


GCC 3.4.6

Perl 5.8.3 built for sun4-solaris-64int

Prolog 5.1.3

Python 2.4.3

R (R Foundation for Statistical Computing) 2.9.0

SAS 9.1.3
SAS Analytics Pro
Base SAS Software
SAS/GRAPH Software
SAS/STAT Software
SAS/IML Software

Distributed Partner Technology
Click on the Distributed Research Partner to view their hardware and software requirements implemented to perform OMOP research.

1. Humana, Inc.

2. Partners Healthcare System

3. Regenstrief Institute / Indiana University

4. SDI Health

5. Department of Veterans Affairs Center for Medication Safety / Outcomes Research, Pharmacy Benefits Management Services

Simulated Data

Source: http://omop.org/node/70

Methodological research typically requires some benchmark or ‘gold standard’ against which to measure performance. In this context, a desired gold standard would be a true causal relationship between a drug and a health outcome. Unfortunately, most observational data sources are poorly characterized, clinical observations may be insufficiently recorded or poorly validated, and actual ‘truth’ may not be absolutely determined. True relationships between drugs and outcomes may be difficult to ascertain as these ‘known associations’ may be affected by issues including sample size, adequacy of data capture, and confounding.

Because of these issues and the desire to have a common, acceptable test set, OMOP designed and developed an automated procedure to construct simulated datasets to supplement the methods evaluation. The simulated datasets (OSIM - Observational Medical Dataset Simulator) are modeled after real observational data sources, and comprised of hypothetical persons with fictional drug exposure and health outcomes occurrence, but representative of the types of relationships expected to be observed within real observational data sources. Because the simulated data will represent hypothetical patients, fictional drug classes and outcomes types, there can be no clinical interpretations drawn from the data.

The simulated datasets will only be used to perform statistical evaluations of the analytical methods offered to identify drug-outcome associations. The known characteristics of the data will enable the classification of the drug-outcome relationships as ‘true’ or ‘false’, and methods will be executed to classify the drug-outcome pairs as ‘positive’ or ‘negative’. The performance characteristics (sensitivity, specificity, positive and negative predictive value) of the analytical methods can then be empirically measured.
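
Given a known true/false label for each drug-outcome pair and a positive/negative call from a method, the four performance characteristics follow from the standard two-by-two table. The Python sketch below shows that calculation; the pair names and labels are made up and do not come from any OMOP experiment.

```python
def performance(truth, predicted):
    """Compute sensitivity, specificity, PPV and NPV.

    truth, predicted: dicts mapping the same drug-outcome pair keys to booleans
    (True = real association in truth, True = flagged positive in predicted).
    """
    tp = sum(1 for k in truth if truth[k] and predicted[k])
    fn = sum(1 for k in truth if truth[k] and not predicted[k])
    fp = sum(1 for k in truth if not truth[k] and predicted[k])
    tn = sum(1 for k in truth if not truth[k] and not predicted[k])

    def ratio(numerator, denominator):
        # Undefined when the denominator is zero.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical drug-outcome pairs: four true associations and four false ones.
truth = {"p1": True, "p2": True, "p3": True, "p4": True,
         "p5": False, "p6": False, "p7": False, "p8": False}
predicted = {"p1": True, "p2": True, "p3": True, "p4": False,
             "p5": True, "p6": True, "p7": False, "p8": False}
metrics = performance(truth, predicted)
```

In this made-up example the method finds three of the four true associations (sensitivity 0.75) but also flags two of the four false ones (specificity 0.5).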

OSIM Publications 
Murray RE, Ryan PB, & Reisinger SJ. (2011). Design and Validation of a Data Simulation Model for Longitudinal Healthcare Data. AMIA Annu Symp Proc., USA, 2011: 1176–1185.


Source: http://omop.org/OSIM2

My Note: I downloaded here

Observational Medical Dataset Simulator Generation 2

Simulated Observational Data - OSIM2 Available for CDM V2 only

This web page presents the Observational Medical Dataset Simulator (OSIM) Version 2 (updated April 11, 2012).

The initial Observational Medical Dataset Simulator was released in 2009 and used to generate datasets with millions of hypothetical patients with drug exposure, background conditions, and known adverse events for the purpose of benchmarking methods performance. OSIM has provided large-scale datasets to methodologists and facilitated the establishment of the OMOP Cup Competition. It also advanced the OMOP Research Team's insights about the complex interdependencies between clinical observations in real data, and how those relationships may influence a method's behavior in identifying true associations and distinguishing them from false positive findings.

Based on these insights, continued research has resulted in the development of a second-generation simulated dataset procedure, known as OSIM2. OSIM2 represents an alternative design to accommodate additional complexities observed in real-world data, including advanced modeling of the correlations between drugs and conditions. OSIM2 allows for more direct comparisons between simulated data and real observational databases, and should enable greater methods evaluation by allowing assessment of how methods accommodate these complex interrelationships. At OMOP, OSIM2 is used to benchmark the performance of methods to estimate the strength of association between drug treatment and outcome.

OSIM2 source code, documentation, and databases are available for download:

Download of OSIM2 Datasets
We have generated 16 OSIM2 datasets that are now available for download. Each is a 10-million-person dataset modeled after the Thomson Reuters MarketScan® Lab Database (MSLR). One dataset has no signals injected; the other 15 have signals of different sizes and types (relative risk: 1.25, 1.5, 2, 4, 10; risk type: acute onset (equals 'any exposure': events occurring within 30d of exposure start), insidious, and accumulative). MSLR, covering 2003 – 2009, represents a privately insured population, with administrative claims from inpatient, outpatient, and pharmacy services supplemented by laboratory results.

The datasets listed below are freely available for download through OMOP’s anonymous FTP server. For example, you can download: OSIM2_10M_MSLR_MEDDRA_6, which has a set of signals injected at RR=1.50 and with insidious onset (during exposure or 30d afterwards).
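The two onset windows described so far can be sketched in Python as follows. This is only one reading of the parenthetical descriptions, not the OSIM2 implementation: the function name `in_risk_window`, the lowercase labels, and the inclusive window boundaries are assumptions, and the accumulative risk type is left out because this page does not define it.

```python
from datetime import date, timedelta

# 30-day window taken from the descriptions of acute and insidious onset.
WINDOW = timedelta(days=30)


def in_risk_window(risk_type, exposure_start, exposure_end, event_date):
    """Return True if event_date falls in the window for the given risk type.

    Boundaries are treated as inclusive, which is an assumption.
    """
    if risk_type == "acute":
        # Events occurring within 30 days of exposure start.
        return exposure_start <= event_date <= exposure_start + WINDOW
    if risk_type == "insidious":
        # Events during exposure or up to 30 days afterwards.
        return exposure_start <= event_date <= exposure_end + WINDOW
    raise ValueError(f"risk type not covered by this sketch: {risk_type!r}")


# Hypothetical exposure lasting from 1 January to 30 June 2005.
start, end = date(2005, 1, 1), date(2005, 6, 30)
for event in (date(2005, 1, 20), date(2005, 3, 1), date(2005, 7, 15)):
    print(event,
          in_risk_window("acute", start, end, event),
          in_risk_window("insidious", start, end, event))
```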

OSIM2 Dataset               Injected Relative Risk   Risk Type      Size
OSIM2_10M_MSLR_MEDDRA_0     None                     None           3.5GB
OSIM2_10M_MSLR_MEDDRA_3     1.25                     Insidious      3.5GB
OSIM2_10M_MSLR_MEDDRA_6     1.5                      Insidious      3.5GB
OSIM2_10M_MSLR_MEDDRA_9     2                        Insidious      3.5GB
OSIM2_10M_MSLR_MEDDRA_12    4                        Insidious      3.5GB
OSIM2_10M_MSLR_MEDDRA_15    10                       Insidious      3.8GB
OSIM2_10M_MSLR_MEDDRA_2     1.25                     Any Exposure   3.5GB
OSIM2_10M_MSLR_MEDDRA_5     1.5                      Any Exposure   3.5GB
OSIM2_10M_MSLR_MEDDRA_8     2                        Any Exposure   3.5GB
OSIM2_10M_MSLR_MEDDRA_11    4                        Any Exposure   3.5GB
OSIM2_10M_MSLR_MEDDRA_14    10                       Any Exposure   3.6GB
OSIM2_10M_MSLR_MEDDRA_1     1.25                     Accumulative   3.5GB
OSIM2_10M_MSLR_MEDDRA_4     1.5                      Accumulative   3.5GB
OSIM2_10M_MSLR_MEDDRA_7     2                        Accumulative   3.5GB
OSIM2_10M_MSLR_MEDDRA_10    4                        Accumulative   3.5GB
OSIM2_10M_MSLR_MEDDRA_13    10                       Accumulative   3.7GB
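The dataset numbers in the table above follow a regular pattern: the five relative risks in ascending order, each crossed with the three risk types in the order Accumulative, Any Exposure, Insidious. The Python sketch below reproduces the names from that pattern. The pattern is inferred from the table rather than stated by OMOP, and the helper name `dataset_name` is hypothetical.

```python
# Values copied from the table; the ordering is inferred from the numbering.
RELATIVE_RISKS = (1.25, 1.5, 2, 4, 10)
RISK_TYPES = ("Accumulative", "Any Exposure", "Insidious")
PREFIX = "OSIM2_10M_MSLR_MEDDRA_"


def dataset_name(relative_risk=None, risk_type=None):
    """Return the dataset name for a given injected signal.

    Calling with no arguments returns the dataset without injected signals.
    """
    if relative_risk is None and risk_type is None:
        return PREFIX + "0"
    if relative_risk is None or risk_type is None:
        raise ValueError("give both relative_risk and risk_type, or neither")
    # tuple.index raises ValueError for values that are not in the table.
    rr_index = RELATIVE_RISKS.index(relative_risk)
    type_index = RISK_TYPES.index(risk_type)
    return f"{PREFIX}{3 * rr_index + type_index + 1}"


print(dataset_name(1.5, "Insidious"))  # OSIM2_10M_MSLR_MEDDRA_6
print(dataset_name())                  # OSIM2_10M_MSLR_MEDDRA_0
```

A lookup like this can help when scripting comparisons across all 16 datasets, but the table remains the authoritative list.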

Please note that these are very large files. We have tested the OSIM2 dataset downloads using FileZilla and WS-FTP. FileZilla is a free, open-source client that can be downloaded from: http://filezilla-project.org/download.php

To log in to the anonymous FTP server use the following credentials:

Login: anonymous
Password: blank
Our FTP server supports SFTP protocol (port 22)

On the server, there are two main folders:
● MedDRA: All data in this folder use MedDRA based condition concepts.
○ Transition Matrices. Currently there are transition matrices available for the following databases: GE, MDCD, MDCR, MSLR
○ OSIM2 dataset. All 16 OSIM2 datasets are available in individual directories. These folders contain simulated data in Common Data Model Version 2. OSIM2 is not available in CDM V3, only in V2 format.

● SNOMED: All data in this folder use SNOMED-CT based condition concepts.
○ Transition Matrices. Currently there are transition matrices available for the following databases: CCAE, MDCD, MDCR, MSLR
○ IN THE FUTURE: OSIM2 data will be available in SNOMED format.

Please contact OMOP to share with us your experience with OSIM2 datasets.


Source: http://omop.org/OSIM

Observational Medical Dataset Simulator Generation 1

The Observational Medical Dataset Simulator (OSIM) is an open-source software application, written in R, that allows users to create simulated datasets that conform to the OMOP Common Data Model.  The simulation creates hypothetical persons with fictitious drug exposure and condition occurrence, with known characteristics that represent the types of scenarios expected in real observational sources.  The procedure is being used to create simulated datasets to support OMOP's central methods development activities, as well as to facilitate the OMOP Cup methods competition.  OMOP hopes OSIM will provide the broader research community a valuable tool to support the implementation and evaluation of alternative approaches for observational analyses.  

Click on the document titles below to obtain the following:

