Table of contents
  1. Story
  2. Slides
  3. Spotfire Dashboard
  4. Research Notes
  5. NASA Partners with Amazon to Put Climate "Big Data" Online
  6. Amazon Public Data Sets Program
    1. Featured Public Data Sets
      1. Common Crawl Corpus
      2. 1000 Genomes Project
      3. Google Books Ngrams
    2. Other Public Data Sets
      1. Common Crawl Corpus
      2. NASA NEX
      3. Ensembl Annotated Human Genome Data (MySQL Release 73)
      4. Ensembl Annotated Human Genome Data (FASTA Release 73)
      5. Human Microbiome Project
      6. 1000 Genomes Project
      7. Model Organism Encyclopedia of DNA Elements (modENCODE)
      8. Japan Census Data
      9. Enron Email Data
      10. Denisova Genome
      11. Google Books Ngrams
      12. Sloan Digital Sky Survey DR6 Subset
      13. The Cannabis Sativa Genome
      14. Apache Software Foundation Public Mail Archives
      15. Freebase Simple Topic Dump
      16. Freebase Quad Dump
      17. Wikipedia Page Traffic Statistic V3
      18. Material Safety Data Sheets
      19. Million Song Dataset
      20. Million Song Sample Dataset
      21. Marvel Universe Social Graph
      22. The WestburyLab USENET corpus
      23. Wikipedia Traffic Statistics V2
      24. Human Liver Cohort (Sage Bionetworks)
      25. C57BL/6J by C3H/HeJ Mouse Cross (Sage Bionetworks)
      26. DBpedia 3.5.1
      27. Illumina - Jay Flatley (CEO of Illumina) Human Genome Data Set
      28. M-Lab dataset: Network Diagnostic Tool (NDT)
      29. M-Lab dataset: Network Path and Application Diagnosis tool (NPAD)
      30. GenBank
      31. YRI Trio Dataset
      32. Petroleum Public Data Set (working Title)
      33. OpenStreetMap Rendering Database
      34. Ensembl - FASTA Database Files
      35. Wikipedia XML Data
      36. Daily Global Weather Measurements, 1929-2009 (NCDC, GSOD)
      37. Wikipedia Page Traffic Statistics
      38. Federal Reserve Economic Data - Fred
      39. Twilio/Wigle.net Street Vector Data Set
      40. 2008 TIGER/Line Shapefiles
      41. Transportation Databases
      42. Labor Statistics Databases
      43. 2000 US Census
      44. 2003-2006 US Economic Data
      45. Business and Industry Summary Data
      46. 1990 US Census
      47. 1980 US Census
      48. Federal Contracts from the Federal Procurement Data Center (USASpending.gov)
      49. University of Florida Sparse Matrix Collection
      50. Freebase Data Dump
      51. Wikipedia Extraction (WEX)
      52. 3D Version of the PubChem Library
      53. PubChem Library
      54. Unigene
      55. AnthroKids - Anthropometric Data of Children
      56. Influenza Virus (including updated Swine Flu sequences)
  7. Overview of NASA NEX Public Data Sets on AWS
    1. Accessing NASA NEX Data
    2. Available NASA NEX Data Sets
    3. Learn More
    4. Education Grants Program
  8. NASA Ames Research Center
    1. Open Government at NASA
      1. Open Government Plan Version 3 Released
      2. International Space Apps Challenge 2014
      3. Open Government 2014 Plan Update
      4. Latest Posts from the open.NASA blog
        1. Open Government Version 3 Plan Released
        2. International Space Apps 2014 Global Winners
        3. Citizen Engagement
        4. Share Your Thoughts: Open Gov 3.0 Plan
        5. Space Apps: Inspiration Beyond the Stars
        6. International Space Apps Challenge: where in the WORLD will YOU be this weekend?
        7. Innovating Together
        8. A 48-Hour Career: Space Apps By Numbers
      5. About Open Government
        1. Highlights of NASA's Open Government Activities
        2. Unique to NASA
  9. Big Data. Big Future
  10. Earth Observatory
    1. Atmosphere: All Images
      1. Images: Climate Change in the United States
  11. Earth Right Now
    1. Overview: A Big Year for NASA Earth Science
  12. NEXT

NASA Data Publications

Story

A Big Year for NASA Earth Science Big Data

Since the agency's inception in 1958, NASA has established itself as a world leader in Earth science and climate studies. That will never be more apparent than in the next 12 months, when five NASA Earth-observing missions will be launched – more than NASA has conducted in a single year in over a decade.

Source: http://www.nasa.gov/content/overview...earth-science/

My Comment: NASA is back down to earth, and this complements what NOAA does for the United States.

The web pages have been copied to a spreadsheet and formatted as linked data tables. The linked data tables are in both relational and graph formats to capture the relationships and visualize them. There is considerable data science and art involved in building the knowledge base, spreadsheets, and interactive dashboard.

There are three parts to this activity:

  • A Scraper Wiki (MindTouch) for Web pages, to produce a detailed Wiki Table of Contents and multiple Spreadsheet Tables for Spotfire analytics;
  • A Visualization Tool (Spotfire), so that very large relational databases (INSERT SELECTED DATA SET HERE) can be used entirely in memory for Spotfire analytics; and
  • A Meetup to mentor and train data scientists and others in creating a series of Data Publications in Data Browsers for NASA (in process).

The Slides below are screen captures to show the methodology and results.

MORE TO FOLLOW

Slides

Spotfire Dashboard

Research Notes

I am working with Brand Niemann on an approach for data publications for NASA earth science that builds on many strands (analogous work in other subject areas, NSF Earth Cube, etc.). This is in response to the new federal policy for data openness (OSTP Increasing Access to the Results of Federally Funded Scientific Research; OMB Open Data Policy - Managing Information as an Asset). Brand and I could put together a piece to see if it might be of interest to NASA, working through USRA. One key element needed is a compelling example of incorporating a data publication with a traditional research publication. Given the traditional publication, Brand and I could work up the data focus. Can you suggest a publication from NASA, perhaps related to NASA Ames putting Climate Big Data in the Cloud? See NASA Partners with Amazon to Put Climate "Big Data" Online, November 22, 2013: http://www.ccst.us/news/2013/1122cloud.php

How about linking to the raw data from stories that appear in The Earth Observatory? Or Earth Right Now? See: http://www.nasa.gov/content/earth-right-now/#.U43uNyiorq4 and http://earthobservatory.nasa.gov/

The other thought would be to add links to raw data in EOS from AGU or some of the other professional journals (JGR, etc ...)

Regarding the NASA Ames story and NEX, Forrest Melton is well connected and part of that group. I think you might keep track of ROSES announcements at NASA to identify a funding opportunity that could support this kind of work: http://science.nasa.gov/researchers/sara/grant-solicitations/

NASA Partners with Amazon to Put Climate "Big Data" Online

Source: http://www.ccst.us/news/2013/1122cloud.php

 

November 22, 2013
Datasets from the NASA Terra satellite, the flagship of NASA's Earth Observation System, are part of the collection uploaded to Amazon Web Services in an experimental partnership between NASA and Amazon. Image courtesy of NASA. 
 

A large collection of NASA climate and Earth science satellite data are being made available through the Amazon Web Services (AWS) cloud in a novel partnership designed to enhance research and educational opportunities nationwide.

To date, several terabytes of data from three satellite and computer modeling datasets have been uploaded to the AWS platform: a set of high-resolution climate change projections from the NASA Earth Exchange (NEX) research program; a global view of Earth's surface provided by the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument on NASA's Terra and Aqua satellites; and the Landsat data record from the U.S. Geological Survey, which provides the longest existing space-based record of Earth's land. Ultimately, the service will include a wider variety of NASA satellite data sets, including metrics such as temperature, precipitation, and forest cover.

The service also includes data processing tools from the NEX program, based at the NASA Ames Research Center. (NASA Ames is a CCST Affiliate Institution.) NEX is a collaboration and analytical platform that combines state-of-the-art supercomputing, Earth system modeling, workflow management and NASA remote-sensing data. Through NEX, users can explore and analyze large Earth science data sets, run and share modeling algorithms, collaborate on new or existing projects and exchange workflows and results within and among other science communities.

"We are excited to grow an ecosystem of researchers and developers who can help us solve important environmental research problems," said Rama Nemani, principal scientist for the NEX project at Ames, in a statement from NASA on the launch of the partnership. "Our goal is that people can easily gain access to and use a multitude of data analysis services quickly through AWS to add knowledge and open source tools for others' benefit."

 
Related links:
*My Note: Look at content at these links - see below
 

As CCST has previously noted*, increasing access to and use of "big data" - extremely large data sets - has rapidly become a key basis of competition, underpinning new waves of productivity, growth and innovation. NASA has always supported and provided open public access to research data, but the collaboration with AWS makes these data more accessible to a much wider audience, allowing the ability to access and process information without the need to directly download unwieldy amounts of data.

Increased access to, and the ability to analyze, data sets such as this are particularly important for states such as California, which are working to plan long-term strategies for more effective water system management - a planning process which depends greatly upon the most accurate understanding possible of climate trends and impacts on the state's water resources.


An example of data similar to that available from the website. The animation shows the results of analyzing satellite radar data collected between mid-2007 and 2011 - about 3.5 years - in the southern San Joaquin Valley. Basically, the analysis technique subtracts 1) the time it takes a radar pulse to leave a satellite, reflect off the ground, and return to the satellite on, for example, July 1, 2007, from 2) a subsequently measured time of travel collected, for example, on December 30, 2011. If it takes longer for the radar signal to travel back to the satellite on the second pass (Dec. 2011), that means the distance between the satellite and the ground has increased. The increased distance is mostly due to land subsidence. The orbits of the satellite are known so precisely that the subsidence can be measured at sub-centimeter levels, over great areas on the ground, at a pixel size of about 90 meters square. The technique is called InSAR (Interferometric Synthetic Aperture Radar). Video courtesy of JPL/NASA.
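
My Note: The timing comparison described in the caption can be sketched numerically. This is only a minimal illustration, assuming a simple two-way radar path so that the change in distance is half the extra round-trip time multiplied by the speed of light. The function name and the sample timings are hypothetical, and operational InSAR processing works with phase differences between passes rather than raw pulse timing.

```python
# Speed of light in vacuum, metres per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_change_m(travel_time_first_s: float, travel_time_second_s: float) -> float:
    """Change in satellite-to-ground distance between two passes, in metres.

    The pulse travels down and back, so the one-way distance change is
    half of the extra round-trip time multiplied by the speed of light.
    A positive result means the ground is farther away on the second pass.
    """
    delta_t = travel_time_second_s - travel_time_first_s
    return SPEED_OF_LIGHT_M_PER_S * delta_t / 2.0


# Hypothetical timings: the second round trip takes 0.1 nanoseconds longer.
first_pass_s = 0.005
second_pass_s = first_pass_s + 1e-10
change = range_change_m(first_pass_s, second_pass_s)
print(f"range change: {change * 100:.2f} cm")
```

A delay of a fraction of a nanosecond already corresponds to a centimeter-scale change, which gives a feel for why the caption stresses how precisely the orbits must be known.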



CCST Spotlight is a weekly newsletter focusing on CCST activities and highlighting innovative science and technology research, applications, and policy issues in California. The Spotlight editor is Danny DeCillis. We welcome information and feedback from our readers about science and technology at work in the private, public, and education sectors. To send us questions or comments, contact us at ccst@ccst.us, or (951) 682-8701.

 

Amazon Public Data Sets Program

Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications. Learn more about Public Data Sets on AWS and visit the Public Data Sets forum.

My Note: I have worked with a few of the Other Public Data Sets, such as OpenStreetMap Rendering Database, Daily Global Weather Measurements, 1929-2009 (NCDC, GSOD), 2008 TIGER/Line Shapefiles, Transportation Databases, Labor Statistics Databases, 1980 US Census, and Federal Contracts from the Federal Procurement Data Center (USASpending.gov).

Featured Public Data Sets

Common Crawl Corpus

A corpus of web crawl data composed of over 5 billion web pages. This data set is freely available on Amazon S3 and is released under the Common Crawl Terms of Use.

1000 Genomes Project

The 1000 Genomes Project, initiated in 2008, is an international public-private consortium that aims to build the most detailed map of human genetic variation available.

Google Books Ngrams

A data set containing Google Books n-gram corpuses. This data set is freely available on Amazon S3 in a Hadoop friendly file format and is licensed under a Creative Commons Attribution 3.0 Unported License. The original dataset is available from http://books.google.com/ngrams/.

Other Public Data Sets

Common Crawl Corpus

A corpus of web crawl data composed of over 5 billion web pages. This data set is freely available on Amazon S3 and is released under the Common Crawl Terms of Use.
Last Modified: Mar 17, 2014 17:51 PM GMT

NASA NEX

Three NASA NEX datasets are now available, including climate projections and satellite images of Earth.
Last Modified: Nov 12, 2013 13:27 PM GMT

Ensembl Annotated Human Genome Data (MySQL Release 73)

The Ensembl project produces genome databases for human as well as over 50 other species, and makes this information freely available.
Last Modified: Oct 8, 2013 14:38 PM GMT

Ensembl Annotated Human Genome Data (FASTA Release 73)

The Ensembl project produces genome databases for human as well as over 50 other species, and makes this information freely available.
Last Modified: Oct 8, 2013 14:37 PM GMT

Human Microbiome Project

Human Microbiome Project Data Set
Last Modified: Sep 26, 2013 17:58 PM GMT

1000 Genomes Project

The 1000 Genomes Project, initiated in 2008, is an international public-private consortium that aims to build the most detailed map of human genetic variation available.
Last Modified: Jul 18, 2012 16:34 PM GMT

Model Organism Encyclopedia of DNA Elements (modENCODE)

A collection of data from the modENCODE project ( http://www.modencode.org )
Last Modified: Apr 24, 2012 21:18 PM GMT

Japan Census Data

Multiple data sets including: (1) Population Census of Japan (1995, 2000, 2005, 2010), (2) Establishment and Enterprise Census of Japan (1999, 2001, 2004, 2006), and (3) Economic Census of Japan (2009).
Last Modified: Mar 4, 2012 3:22 AM GMT

Enron Email Data

Enron email data publicly released as part of FERC's Western Energy Markets investigation converted to industry standard formats by EDRM. The data set consists of 1,227,255 emails with 493,384 attachments covering 151 custodians. The email is provided in Microsoft PST, IETF MIME, and EDRM XML formats.
Last Modified: Feb 15, 2012 2:26 AM GMT

Denisova Genome

The high-coverage genome sequence of a Denisovan individual sequenced to ~30x coverage on the Illumina platform. Together with their sister group the Neandertals, Denisovans are the most closely related extinct relatives of currently living humans.
Last Modified: Feb 15, 2012 2:22 AM GMT

Google Books Ngrams

A data set containing Google Books n-gram corpuses. This data set is freely available on Amazon S3 in a Hadoop friendly file format and is licensed under a Creative Commons Attribution 3.0 Unported License. The original dataset is available from http://books.google.com/ngrams/.
Last Modified: Jan 21, 2012 2:12 AM GMT

Sloan Digital Sky Survey DR6 Subset

The Sloan Digital Sky Survey is the most ambitious astronomical survey ever undertaken.
Last Modified: Jan 20, 2012 21:49 PM GMT

The Cannabis Sativa Genome

Whole Genome Shotgun Sequencing of the Cannabis Sativa Cultivar "Chemdawg"
Last Modified: Aug 22, 2011 22:33 PM GMT

Apache Software Foundation Public Mail Archives

A collection of all publicly available Apache Software Foundation mail archives as of July 11, 2011
Last Modified: Aug 15, 2011 22:00 PM GMT

Freebase Simple Topic Dump

A data dump of the basic identifying facts about every topic in Freebase
Last Modified: Jun 24, 2011 18:08 PM GMT

Freebase Quad Dump

A data dump of all the current facts and assertions in Freebase
Last Modified: Jun 24, 2011 18:04 PM GMT

Wikipedia Page Traffic Statistic V3

This dataset contains a 150 GB sample of the data used to power trendingtopics.org. It includes a full 3 months of hourly page traffic statistics from Wikipedia (1/1/2011-3/31/2011).
Last Modified: Apr 28, 2011 0:00 AM GMT

Material Safety Data Sheets

230,000 Material Safety Data Sheets.
Last Modified: Apr 1, 2011 0:00 AM GMT

Million Song Dataset

The Million Songs Collection is a collection of 28 datasets containing audio features and metadata for a million contemporary popular music tracks.
Last Modified: Feb 8, 2011 0:00 AM GMT

Million Song Sample Dataset

This is a 10,000 song subset of audio features and metadata from the Million Songs collection - a collection of 28 datasets containing audio features and metadata for a million contemporary popular music tracks.
Last Modified: Feb 8, 2011 0:00 AM GMT

Marvel Universe Social Graph

This dataset is an example of a social collaboration network based on the characters in The Marvel Universe, that is, the artificial world that takes place in the universe of the Marvel comic books.
Last Modified: Feb 3, 2011 0:00 AM GMT

The WestburyLab USENET corpus

The WestburyLab USENET corpus is an anonymized compilation of postings from 47,860 English-language newsgroups from 2005-2010.
Last Modified: Nov 17, 2010 1:17 AM GMT

Wikipedia Traffic Statistics V2

Contains 16 months of hourly pageview statistics for all articles in Wikipedia
Last Modified: Sep 8, 2010 21:47 PM GMT

Human Liver Cohort (Sage Bionetworks)

Human Liver Cohort characterizing gene expression in liver samples
Last Modified: Sep 8, 2010 20:56 PM GMT

C57BL/6J by C3H/HeJ Mouse Cross (Sage Bionetworks)

C57BL/6J by C3H/HeJ mouse cross from the Jake Lusis lab at UCLA
Last Modified: Sep 8, 2010 20:53 PM GMT

DBpedia 3.5.1

DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web
Last Modified: Aug 10, 2010 15:23 PM GMT

Illumina - Jay Flatley (CEO of Illumina) Human Genome Data Set

Jay Flatley (CEO of Illumina) human genome data set.
Last Modified: Jan 20, 2010 21:54 PM GMT

M-Lab dataset: Network Diagnostic Tool (NDT)

NDT test results created through Measurement Lab (M-Lab) between February 2009 and September 2009
Last Modified: Dec 10, 2009 2:00 AM GMT

M-Lab dataset: Network Path and Application Diagnosis tool (NPAD)

NPAD test results created through Measurement Lab (M-Lab) between February 2009 and September 2009
Last Modified: Dec 10, 2009 2:00 AM GMT

GenBank

An annotated collection of all publicly available DNA sequences including more than 85.7B bases and 82.8M sequence records.
Last Modified: Dec 9, 2009 2:49 AM GMT

YRI Trio Dataset

Complete genome sequence data for three Yoruba individuals from Ibadan, Nigeria
Last Modified: Oct 19, 2009 16:57 PM GMT

Petroleum Public Data Set (working Title)

Public-domain data for the oil & gas industry, assembled from the contributions of participating agencies in the United States, Canada and around the world. This data provides industry stakeholders with an opportunity to focus their efforts on the analysis and interpretation of this data without concern for the trivial and time-consuming tasks of locating, downloading, reformatting and integrating the data prior to value-added work being performed.
Last Modified: Oct 19, 2009 15:32 PM GMT

OpenStreetMap Rendering Database

A PostGIS 8.3 data cluster of all OpenStreetMap data for the planet.
Last Modified: Oct 9, 2009 21:34 PM GMT

Ensembl - FASTA Database Files

Ensembl sequence databases of transcript and translation models
Last Modified: Oct 1, 2009 22:34 PM GMT

Wikipedia XML Data

A complete copy of all Wikimedia wikis, in the form of wikitext source and metadata embedded in XML.
Last Modified: Sep 29, 2009 1:09 AM GMT

Daily Global Weather Measurements, 1929-2009 (NCDC, GSOD)

A collection of daily weather measurements (temperature, wind speed, humidity, pressure, &c.) from 9000+ weather stations around the world.
Last Modified: Sep 29, 2009 0:48 AM GMT

Wikipedia Page Traffic Statistics

Contains 7 months of hourly pageview statistics for all articles in Wikipedia
Last Modified: Jun 10, 2009 3:29 AM GMT

Federal Reserve Economic Data - Fred

Database of 20,059 U.S. economic time series.
Last Modified: Jun 5, 2009 22:50 PM GMT

Twilio/Wigle.net Street Vector Data Set

Twilio/Wigle.net database of mapped US street names and address ranges.
Last Modified: Jun 4, 2009 20:26 PM GMT

2008 TIGER/Line Shapefiles

Census 2000 and Current United States shapefiles
Last Modified: Jun 4, 2009 20:26 PM GMT

Transportation Databases

Various transportation statistics
Last Modified: Jun 4, 2009 20:26 PM GMT

Labor Statistics Databases

Various Labor Statistics
Last Modified: Jun 4, 2009 20:25 PM GMT

2000 US Census

Data from the 2000 US Census
Last Modified: Jun 4, 2009 20:25 PM GMT

2003-2006 US Economic Data

US Economic Data for years 2003 to 2006
Last Modified: Jun 4, 2009 20:25 PM GMT

Business and Industry Summary Data

US Business and Industry Summary Data
Last Modified: Jun 4, 2009 20:24 PM GMT

1990 US Census

Data from the 1990 US Census
Last Modified: Jun 4, 2009 20:24 PM GMT

1980 US Census

Data from the 1980 US Census
Last Modified: Jun 4, 2009 20:24 PM GMT

Federal Contracts from the Federal Procurement Data Center (USASpending.gov)

A data dump of all federal contracts from the Federal Procurement Data Center found at USASpending.gov.
Last Modified: Jun 4, 2009 20:23 PM GMT

University of Florida Sparse Matrix Collection

The University of Florida Sparse Matrix Collection is a large, widely available, and actively growing set of sparse matrices that arise in real applications.
Last Modified: Jun 4, 2009 20:23 PM GMT

Freebase Data Dump

Freebase is an open database of the world's information, covering millions of topics in hundreds of categories
Last Modified: Jun 4, 2009 20:22 PM GMT

Wikipedia Extraction (WEX)

A processed dump of the English language Wikipedia
Last Modified: Jun 4, 2009 20:21 PM GMT

3D Version of the PubChem Library

3D Version of the PubChem Library
Last Modified: Jun 4, 2009 20:21 PM GMT

PubChem Library

A data set of information on the biological activities of small molecules.
Last Modified: Jun 4, 2009 20:21 PM GMT

Unigene

UniGene: An Organized View of the Transcriptome.
Last Modified: Jun 4, 2009 20:19 PM GMT

AnthroKids - Anthropometric Data of Children

Anthropometric data on children from two studies in 1975 and 1977
Last Modified: Jun 4, 2009 20:19 PM GMT

Influenza Virus (including updated Swine Flu sequences)

NCBI Influenza Resource Center Data.
Last Modified: Jun 4, 2009 20:18 PM GMT

Overview of NASA NEX Public Data Sets on AWS

NASA NEX is a collaboration and analytical platform that combines state-of-the-art supercomputing, Earth system modeling, workflow management and NASA remote-sensing data. Through NEX, users can explore and analyze large Earth science data sets, run and share modeling algorithms, collaborate on new or existing projects and exchange workflows and results within and among other science communities.

Three NASA NEX data sets are now available to all via Amazon S3. One data set, the NEX downscaled climate simulations, provides high-resolution climate change projections for the 48 contiguous U.S. states. The second data set, provided by the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument on NASA's Terra and Aqua satellites, offers a global view of Earth's surface every 1 to 2 days. Finally, the Landsat data record from the U.S. Geological Survey provides the longest existing continuous space-based record of Earth's land.

Accessing NASA NEX Data

AWS is making the NASA NEX data available to the community free of charge. There are a variety of ways to access the data:

The data is hosted for free by AWS as part of the AWS Public Data Sets program.

Available NASA NEX Data Sets

Downscaled Climate Projections (NEX-DCP30)

The NASA Earth Exchange (NEX) Downscaled Climate Projections (NEX-DCP30) dataset is comprised of downscaled climate scenarios for the conterminous United States that are derived from the General Circulation Model (GCM) runs conducted under the Coupled Model Intercomparison Project Phase 5 (CMIP5) [Taylor et al. 2012] and across the four greenhouse gas emissions scenarios known as Representative Concentration Pathways (RCPs) [Meinshausen et al. 2011] developed for the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR5). The dataset includes downscaled projections from 33 models, as well as ensemble statistics calculated for each RCP from all model runs available. The purpose of these datasets is to provide a set of high resolution, bias-corrected climate change projections that can be used to evaluate climate change impacts on processes that are sensitive to finer-scale climate gradients and the effects of local topography on climate conditions.

Each of the climate projections includes monthly averaged maximum temperature, minimum temperature, and precipitation for the periods from 1950 through 2005 (Retrospective Run) and from 2006 to 2099 (Prospective Run).

Available at s3://nasanex/NEX-DCP30

MOD13Q1 (Vegetation Indices 16-Day L3 Global 250m)

Global MODIS vegetation indices are designed to provide consistent spatial and temporal comparisons of vegetation conditions. Blue, red, and near-infrared reflectances, centered at 469-nanometers, 645-nanometers, and 858-nanometers, respectively, are used to determine the MODIS daily vegetation indices.

The MODIS Normalized Difference Vegetation Index (NDVI) complements NOAA's Advanced Very High Resolution Radiometer (AVHRR) NDVI products and provides continuity for time series historical applications. MODIS also includes a new Enhanced Vegetation Index (EVI) that minimizes canopy background variations and maintains sensitivity over dense vegetation conditions. The EVI also uses the blue band to remove residual atmosphere contamination caused by smoke and sub-pixel thin clouds. The MODIS NDVI and EVI products are computed from atmospherically corrected bi-directional surface reflectances that have been masked for water, clouds, heavy aerosols, and cloud shadows.

Global MOD13Q1 data are provided every 16 days at 250-meter spatial resolution as a gridded level-3 product in the Sinusoidal projection. Lacking a 250m blue band, the EVI algorithm uses the 500m blue band to correct for residual atmospheric effects, with negligible spatial artifacts.
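
As a rough illustration of how the two indices combine the band reflectances described above, here is a minimal Python sketch. The NDVI ratio is the standard definition. The EVI coefficients (gain 2.5, C1 = 6, C2 = 7.5, L = 1) are the values commonly documented for the MODIS EVI algorithm; they are not stated in this page and should be treated as an assumption. The function names and the sample reflectances are hypothetical.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from surface reflectances (0..1)."""
    denom = nir + red
    if denom == 0:
        return math.nan  # undefined when both bands are zero
    return (nir - red) / denom


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy_l=1.0):
    """Enhanced Vegetation Index.

    The blue band term (c2 * blue) is the atmospheric correction the text
    refers to; canopy_l is the canopy background adjustment.
    Default coefficients are assumed MODIS values, not taken from this page.
    """
    denom = nir + c1 * red - c2 * blue + canopy_l
    if denom == 0:
        return math.nan
    return gain * (nir - red) / denom


if __name__ == "__main__":
    # Hypothetical reflectances for a vegetated pixel.
    nir, red, blue = 0.5, 0.1, 0.05
    print("NDVI:", round(ndvi(nir, red), 4))
    print("EVI: ", round(evi(nir, red, blue), 4))
```

Both functions take plain reflectance values, so integer pixel values read from a MOD13Q1 file would first need whatever scaling the product documentation specifies.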

Vegetation indices are used for global monitoring of vegetation conditions and are used in products displaying land cover and land cover changes. These data may be used as input for modeling global biogeochemical and hydrologic processes and global and regional climate. These data also may be used for characterizing land surface biophysical properties and processes, including primary production and land cover conversion.

Available at s3://nasanex/MODIS

Landsat GLS (Global Land Survey)

In the past, the U.S. Geological Survey (USGS) and NASA collaborated on the creation of four global land data sets from Landsat images: one from the 1970s, and one each from circa 1990, 2000, and 2005. Each of these global data sets was created from the primary Landsat sensor in use at the time: the Multispectral Scanner (MSS) in the 1970s, the Thematic Mapper (TM) in 1990, Enhanced Thematic Mapper Plus (ETM+) in 2000, and a combination of TM and ETM+ in 2005.

Available at s3://nasanex/Landsat
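
The three `s3://nasanex/...` locations above can be explored without special tooling. The sketch below uses only the Python standard library to turn one of those locations into an Amazon S3 REST listing request and to read object keys from the XML reply. It assumes the `nasanex` bucket still allows anonymous listing through the global `s3.amazonaws.com` endpoint, which this page does not confirm; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urlparse
from urllib.request import urlopen

# XML namespace used by S3 listing responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def listing_url(s3_uri, max_keys=10):
    """Build an HTTPS ListObjectsV2 URL for a location such as s3://nasanex/MODIS."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// location: {s3_uri!r}")
    prefix = parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    query = urlencode({"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)})
    return f"https://{parsed.netloc}.s3.amazonaws.com/?{query}"


def parse_keys(xml_text):
    """Extract object keys from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{S3_NS}Key")]


def list_keys(s3_uri, max_keys=10):
    """Fetch up to max_keys object keys (needs network access and a public bucket)."""
    with urlopen(listing_url(s3_uri, max_keys), timeout=30) as resp:
        return parse_keys(resp.read())


if __name__ == "__main__":
    for location in ("s3://nasanex/NEX-DCP30", "s3://nasanex/MODIS", "s3://nasanex/Landsat"):
        print(listing_url(location))
```

Calling `list_keys("s3://nasanex/NEX-DCP30")` would perform the actual request; it is left out of the demo so the sketch runs offline.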

Learn More

  • NASA NEX My Note: Resolves to https://nex.nasa.gov/nex/static/htdo...extra/opennex/
    • OpenNEX is designed to engage the global community of Earth scientists in cross-disciplinary research by combining global Earth observation datasets, shared scientific tools and workflows, and the power of cloud computing to enhance scientific collaboration and accelerate progress towards understanding emerging changes in the Earth system. OpenNEX is developing resources for scientists seeking to enhance their skills on a variety of research topics. OpenNEX learning resources include online lectures from the world's leading scientific experts and hands-on data analysis and modeling exercises enabled through virtual machines and shared workflows.
  • AWS Public Data Sets My Note: I have this above

Education Grants Program

Educators, researchers and students can apply for free credits to take advantage of the utility computing platform offered by AWS, along with Public Datasets such as the NASA NEX data. If you have a research project which could take advantage of the hosted NASA NEX data set, you can apply for an AWS Grant.

NASA Ames Research Center

Ames Research Center, one of 10 NASA field centers, is located in the heart of California's Silicon Valley. For more than 70 years, Ames has been a leader in conducting world-class research and development.

Open Government at NASA

Source: http://www.nasa.gov/open/index.html

Open Government Plan Version 3 Released

The Open Innovation team is pleased to announce the publication of the NASA Open Government Plan Version 3. Please view or download the plan at http://open.nasa.gov/plan

International Space Apps Challenge 2014


On April 12-13, 2014, 8,126 participants registered to attend events at 94 physical locations in 46 countries (not including student participation), joined by 735 virtual participants. Together they created 630 projects over the weekend, 69 of them by teams that were completely virtual. Now it's time to recognize the incredible work they have created. Visit spaceappschallenge.org to see the projects, and check back later this month to find out which projects our judges have chosen as global winners.

Open Government 2014 Plan Update

The Open Innovation team is currently working on the Open Government Version 3.0 Plan. Whether NASA is using social networks to allow students to interact directly with astronauts or to give unprecedented access to scientific data, we have embraced Open Government. But there is still more to do: NASA is expanding transparency, participation, and collaboration and creating a new level of openness and accountability. The release of the Open Government Version 3.0 Plan will be in June 2014.

Latest Posts from the open.NASA blog

Open Government Version 3 Plan Released
June 2, 2014
Over the past five years at NASA, I have been blessed to find myself surrounded by a family of dreamers, darers, and doers...
International Space Apps 2014 Global Winners
May 15, 2014
We’re thrilled to announce this year’s winners of the 2014 International Space Apps Challenge...
Citizen Engagement
May 7, 2014
Recently, the White House hosted Stakeholder Engagement Workshops – an informal meet-up for citizens and federal agencies to discuss progress on Open Government...
Share Your Thoughts: Open Gov 3.0 Plan

April 23, 2014
Four years ago, NASA released the first gen version of our Open Government Plan. We’re working on our third revision: Open Gov 3.0, which updates our progress from the last plan, and offers three new Flagship Initiatives...

Space Apps: Inspiration Beyond the Stars

April 10, 2014
This year, we kicked off our activities in Doha, Qatar at midnight on Thursday night Qatar time and closed out in Seattle, Washington at 6 p.m. Pacific time — 76 hours of uninterrupted around-the-world hacking on NASA challenges...

International Space Apps Challenge: where in the WORLD will YOU be this weekend?

April 10, 2014
Greetings from Toronto! We’re on the eve of Space Apps 2014 in 95+ locations around the world. This is our third year, and my first. After a six-month sprint to get Space Apps off the ground with a new team ...

Innovating Together

April 10, 2014
Welcome back to open.nasa.gov! We have a new team to tackle new challenges and exciting opportunities to allow you to engage with NASA’s data, tools, and resources ...

A 48-Hour Career: Space Apps By Numbers

August 6, 2013
When the International Space Apps Challenge wrapped up on the evening of April 21st 2013 in Maui, it had become the largest ...

About Open Government

NASA’s founding legislation in 1958 instructed NASA to “…provide for the widest practicable and appropriate dissemination of information…” The principles of Open Government have been embedded in our operations for 50 plus years. We recognize that open government is a process rather than a product, and have taken a continuous-learning approach. Please browse and read Version 2.0 of the NASA Open Government Plan. We invite you to contribute your ideas, solutions, and comments on every element of our plan.

Highlights of NASA's Open Government Activities
Unique to NASA

Big Data. Big Future

Source: http://www.ccst.us/annualreport/2011...2011-12AR1.php

My Note: Interesting figures on sizes of big data and California big data companies.

Letter from the Chairs

An important key to California's future - economic, technological, and social - is information. Research indicates that analyzing large data sets is rapidly becoming a key basis of competition, underpinning new waves of productivity, growth and innovation. This genuinely is the beginning of a new information age.

We have been hearing about the 'information age' for so long that the phrase has become trite. Nonetheless we are reaching a point where the knowledge available to us has exceeded our ability to easily grasp it. According to a recent report by Cisco, by 2015, global internet traffic may reach 966 exabytes (10^18) per year. The Pentagon is working to expand its worldwide communication network to go beyond these limits, handling yottabytes (10^24) of data, each of which is the equivalent of 500 quintillion (10^18) pages of text.
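
To put those magnitudes side by side, the following Python sketch works through the arithmetic using the decimal (SI) prefixes, exa = 10^18 and yotta = 10^24. The only input taken from the letter is the 966 exabytes per year figure; the 365-day year and the variable names are assumptions made for illustration.

```python
EXA = 10**18    # bytes in one exabyte (SI prefix)
YOTTA = 10**24  # bytes in one yottabyte (SI prefix)
TERA = 10**12   # bytes in one terabyte (SI prefix)

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def bytes_per_second(bytes_per_year):
    """Average sustained rate implied by a yearly total."""
    return bytes_per_year / SECONDS_PER_YEAR


# Yearly traffic figure quoted in the letter.
traffic_per_year = 966 * EXA
rate = bytes_per_second(traffic_per_year)

# How many exabytes fit in one yottabyte.
exabytes_per_yottabyte = YOTTA // EXA

if __name__ == "__main__":
    print(f"Average rate: {rate / TERA:.1f} terabytes per second")
    print(f"One yottabyte = {exabytes_per_yottabyte:,} exabytes")
```

Under these assumptions the yearly total averages out to roughly 30 terabytes every second, and a single yottabyte is a million exabytes.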

The rise in information available coincides with the increasing ability to gather information inexpensively in a wide range of new settings. Networks of inexpensive sensors gather vastly expanded data collection on geochemical characteristics of land areas under environmental scrutiny. The costs of sequencing genomes have dropped from millions of dollars to hundreds, significantly expanding personalized diagnoses and treatments.

California is at the forefront of the big data revolution in a number of ways. It is home to many of the companies pioneering the acquisition of information (e.g., Google and 23andMe) and the integration of large data sets into practice, not to mention the Blue Gene Q supercomputer at Lawrence Livermore National Laboratory, currently the fastest in the world. California is also home to The Global Information Industry Center at UC San Diego, a nationally renowned interdisciplinary center that seeks to identify and describe the underlying issues and consequences of technology-enabled change in information and communications practices in government and industry.

CCST has long advocated an approach to policy based on the best and most complete scientific knowledge available; being able to access and use substantially more data in its decision-making processes would, in principle, allow the state to adopt more efficient and effective approaches to infrastructure and environmental issues. In some cases, there is the possibility of solving highly complex technical problems, such as environmental management, in a more systematic way. Even more promising, though, is the notion that the wealth of data being gathered - still a largely untapped resource - stands to benefit most those research institutions and communities which are able to collaborate in new and potentially unprecedented ways.

The advent of big data poses challenges, as well. Concerns about privacy and security are real and significant. In addition, amassing overwhelming quantities of data without effective systems for storage and analysis may hinder, rather than enhance, productive discourse. Solutions to these issues will require state, national, and international coordination, but California can be an important trendsetter. Indeed, if there is any state poised to benefit from integrating and analyzing unprecedented amounts of information, it is California. CCST's role, as an unbiased facilitator for bringing together all sectors of the S&T community to advise the state and develop long-term visions for California, has never been more important.

Earth Observatory

Source: http://earthobservatory.nasa.gov/

Atmosphere: All Images

Source: http://earthobservatory.nasa.gov/Ima...y.php?cat_id=1

My Note: See example of one of 26 below that needs a link to the data source.

NASAEarthObservatoryAtmosphereAllImages.png

 

Images: Climate Change in the United States

Source: http://earthobservatory.nasa.gov/IOT...w.php?id=83624

Image 1: Climate Changes in the United States, with color bar (acquired 1991 - 2012; large image: 1 MB, JPEG, 3219x1776)
Image 2: Climate Changes in the United States, with color bar (acquired 1991 - 2012; large image: 2 MB, JPEG, 3178x1897)

“Climate change is already affecting the American people in far-reaching ways.” So begins an extensive report issued by the U.S. Global Change Research Program on May 6, 2014. The Global Change Research Act of 1990 requires that Congress and the President be presented every four years with an assessment of the effects of climate change on the United States, so a team of more than 300 experts assembled the report from peer-reviewed science and observations. It contains 12 key findings, the first of which notes that climate is changing globally and that change is apparent in the United States.

Among the changes is an increase in temperature, as illustrated in the above image. Since consistent record-keeping began in 1895, the average temperature in the United States has increased by 1.3 to 1.9 degrees Fahrenheit (0.8 to 1.1° Celsius), and most of that change has happened since 1970. The warmest year on record for the United States was 2012. The map above shows temperature changes between 1991 and 2012 compared to the average temperature between 1901 and 1960. Bold lines divide the country into regions, and the change is uneven across the regions. “Multiple lines of independent evidence confirm that human activities are the primary cause of the global warming of the past 50 years,” says the report.

Scientists also observed changes in precipitation. On average, precipitation has increased since 1900, but some areas have gotten drier. This map shows how much precipitation changed between 1991 and 2012 compared to the average precipitation observed between 1901 and 1960. While there is an increase, it is harder to attribute the change to human activity, since rain and snowfall totals vary widely from year to year. For example, significant droughts in the 1930s and 1950s dropped the average precipitation for the earlier period, making recent years seem much wetter in comparison.
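
The baseline effect described above can be made concrete with a small Python sketch. It compares a recent-period mean to a baseline-period mean, then shows how a few drought years inside the baseline inflate the apparent increase. All numbers are invented toy values, and the function names are hypothetical; this is not the report's data or its actual method.

```python
from statistics import mean


def period_mean(series, start, end):
    """Mean of yearly values for years in [start, end], inclusive."""
    values = [v for year, v in series.items() if start <= year <= end]
    if not values:
        raise ValueError(f"no data between {start} and {end}")
    return mean(values)


def percent_change(series, baseline=(1901, 1960), recent=(1991, 2012)):
    """Recent-period mean relative to the baseline mean, in percent."""
    base = period_mean(series, *baseline)
    return 100.0 * (period_mean(series, *recent) - base) / base


# Toy yearly precipitation (arbitrary units): flat baseline, slightly wetter recent years.
toy = {year: 100.0 for year in range(1901, 1961)}
toy.update({year: 105.0 for year in range(1991, 2013)})

# Same series, but with a hypothetical ten-year drought inside the baseline period.
toy_with_drought = dict(toy)
for year in range(1930, 1940):
    toy_with_drought[year] = 70.0

if __name__ == "__main__":
    print(f"No drought in baseline: {percent_change(toy):+.1f}%")
    print(f"Drought in baseline:    {percent_change(toy_with_drought):+.1f}%")
```

The recent years are identical in both series; only the baseline changes, yet the computed increase roughly doubles, which is why the text cautions against reading the precipitation map as a purely human-caused signal.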

However, if warming continues, climate models project that wet areas will continue to get wetter while dry areas will get drier. The changes will include more frequent downpours, floods, droughts, and heat waves. In fact, heavy rainfall events have already become more common throughout the United States, while heat waves and droughts have become more frequent and intense, particularly in the West.

Additional findings discuss the impacts of climate change on a variety of sectors, including human health, infrastructure, water quality and availability, agriculture, ecosystems, and the ocean. The report also looks at actions being taken to prepare for impacts and to mitigate future changes.

Reference

  1. U.S. Global Change Research Program (2014) Climate change impacts in the United States. Accessed May 7, 2014.
  2. Temperature and precipitation images courtesy the U.S. Global Change Research Program. Caption by Holli Riebeek.
 
Instrument(s): 
In situ Measurement

Earth Right Now

Source: http://www.nasa.gov/content/earth-right-now/

Overview: A Big Year for NASA Earth Science

Source: http://www.nasa.gov/content/overview...earth-science/

My Comment: NASA is back down to Earth, and this complements what NOAA does for the United States!

Key Bullet: Since the agency's inception in 1958, NASA has established itself as a world leader in Earth science and climate studies. That will never be more apparent than in the next 12 months, when five NASA Earth-observing missions will be launched – more than NASA has conducted in a single year in over a decade.

 
January 22, 2014

Of all the planets NASA has explored, none yet have matched the dynamic complexity of our own Earth. Earth teems with life and liquid water; massive storms rage over land and oceans; environments range from deserts to tropical forests to the icy poles. And amid all of that, seven billion people carve out a daily life.

And our planet is changing. Through the gradual build-up of more greenhouse gases in the atmosphere, Earth is warming. As Earth warms, ocean waters expand and ice melts to make sea levels rise. The cycle of rainfall and evaporation accelerates, leading to more severe droughts and more severe bouts of rainfall. Heat waves become more frequent and more intense.

It is this changing world that NASA continues to explore and strives to understand, so that societies can meet the challenges of the future.

Since the agency's inception in 1958, NASA has established itself as a world leader in Earth science and climate studies. That will never be more apparent than in the next 12 months, when five NASA Earth-observing missions will be launched – more than NASA has conducted in a single year in over a decade. This is Earth Right Now.

  • The launch of the Global Precipitation Measurement Core Observatory will inaugurate an unprecedented international satellite constellation to produce frequent global observations of rainfall and snowfall -- revolutionary new data that will help answer questions about our planet's life-sustaining water cycle and improve weather forecasting and water resource management.
  • The Soil Moisture Active Passive satellite will take its place in the fleet of NASA satellites now observing every phase of Earth's critical water cycle, allowing the agency to "follow the water" from underground aquifers to the oceans to moisture and rainfall in the clouds. Scientists look to the changes in this cycle as a signature of climate change. Understanding how and how quickly those changes will happen will be vital toward allowing cities and countries to adapt.
  • As carbon dioxide levels in Earth's atmosphere continue to rise, NASA will launch the Orbiting Carbon Observatory-2 to make a completely new set of global, satellite measurements of the still mysterious ways that carbon moves through the atmosphere, land and ocean.
  • The deployment of two new instruments on the International Space Station will for the first time convert the orbiting astronaut lab into a 24-7 platform for Earth science. The ISS-RapidScat instrument will observe how winds behave around the globe to benefit weather forecasts and hurricane monitoring, while the Cloud-Aerosol Transport System, or CATS, instrument will make critical measurements of clouds and aerosols – still the two climate change variables most difficult to measure and predict.

NASA does more than develop and build Earth-observing spacecraft and sensors. The agency's multi-disciplinary team of scientists, engineers and computer modelers also analyze vast archives of data for insights into Earth's interconnected systems -- atmosphere, ocean, ice, land, biosphere -- and openly provide that data to the global community. They design and deploy airborne, ground-based and ocean-going field campaigns to study Earth from the heights of the stratosphere to the depths of the ocean to the remote ice caps at the poles. And they work with other government agencies and partner organizations to apply NASA data and computer models to improve decision-making and solve problems.

NEXT
