Term

Definition

Catalog

(source: Data.gov)

A catalog is a collection of datasets. Data.gov has three types of searchable data catalogs: the "Raw Data Catalog" features instant view/download of datasets; the "Tool Catalog" contains simple, application-driven access to federal data; and the "Geodata Catalog" contains federal geospatial data.
Category 

(source: Data.gov)

The category identifies the type of dataset (e.g., administrative, geospatial, research, statistical). Some types of data have additional metadata requirements.
CSV 

(source: Wikipedia)

A comma-separated values (CSV) file is a computer data file that implements a long-established organizational tool, the comma-separated list. The CSV file is used for the digital storage of data structured as a table of lists. Each line in the CSV file corresponds to a row in the table. Within a line, fields are separated by commas, and each field belongs to one table column. CSV files are often used for moving tabular data between two different computer programs (for example, between a database program and a spreadsheet program).
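The row-and-column layout described above can be sketched in a few lines of Python using the standard-library csv module. The sample text and its column names are invented for illustration only.

```python
import csv
import io

# Hypothetical sample: one header line, then one line per table row.
csv_text = "city,year,count\nAlpha,2009,12\nBeta,2009,7\n"

# csv.reader splits each line into fields at the commas.
rows = list(csv.reader(io.StringIO(csv_text)))
header, records = rows[0], rows[1:]

# Each field belongs to one column, so a column is the same
# position taken from every row.
city_index = header.index("city")
cities = [record[city_index] for record in records]
print(cities)
```

Note that csv.reader returns every field as a string; converting "count" to a number is left to the program that receives the data.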
Data

A value or set of values representing a specific concept or concepts. Data become "information" when analyzed and possibly combined with other data in order to extract meaning and to provide context. The meaning of data can vary depending on context.
Data Extraction Tool

(source: Data.gov)

Data extraction tools allow a user to select a data basket full of variables and then recode those variables into a form that the user desires. The user can then develop customized displays of any selected data.
Dataset

(adapted from: Wikipedia)

A dataset is an organized collection of data. The most basic representation of a dataset is data elements presented in tabular form. Each column represents a particular variable. Each row corresponds to a given value of that column's variable. A dataset may also present information in a variety of non-tabular formats, such as an Extensible Markup Language (XML) file, a geospatial data file, or an image file.
KML

(source: Wikipedia)

Keyhole Markup Language (KML) is an XML-based language schema for expressing geographic annotation and visualization on existing or future Web-based, two-dimensional maps and three-dimensional Earth browsers.
KMZ

(source: Wikipedia)

KML files are very often distributed in KMZ files, which are zipped files with a ".KMZ" extension. When a KMZ file is unzipped, a single "doc.kml" is found along with any overlay and icon images referenced in the KML as well as any network-linked KML files.
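Because a KMZ file is a zipped archive and KML is XML, both can be handled with Python's standard library. The sketch below builds a tiny KMZ in memory and reads its "doc.kml" back; the placemark name and coordinates are arbitrary placeholders, not real data.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Minimal hypothetical KML document with a single placemark.
kml_text = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
    "  <Placemark><name>Example</name>"
    "<Point><coordinates>-77.0,38.9,0</coordinates></Point></Placemark>\n"
    "</kml>\n"
)

# A KMZ file is a ZIP archive; build one in memory for the demo.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as kmz:
    kmz.writestr("doc.kml", kml_text)

# Unzip it again: list the members and read the main KML document.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as kmz:
    member_names = kmz.namelist()
    kml_bytes = kmz.read("doc.kml")

# KML is XML, so a generic XML parser can read the placemark.
ns = {"kml": "http://www.opengis.net/kml/2.2"}
root = ET.fromstring(kml_bytes)
placemark_name = root.find("kml:Placemark/kml:name", ns).text
print(member_names, placemark_name)
```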
Metadata

Metadata describes a number of characteristics, or attributes, of data; that is, "data that describes data" (ISO 11179-3). For any particular datum, the metadata may describe how the datum is represented, ranges of acceptable values, how it should be labeled, as well as its relationship to other data. Metadata also may provide other relevant information, such as the responsible steward, associated laws and regulations, and access management policy. The metadata for structured data objects describes the structure, data elements, interrelationships, and other characteristics of information, including its creation, disposition, access and handling controls, formats, content, and context, as well as related audit trails.
Shapefile

(source: ESRI Shapefile Technical Description)

A shapefile stores nontopological geometry and attribute information for the spatial features in a dataset. The geometry for a feature is stored as a shape comprising a set of vector coordinates. Shapefiles can support point, line, and area features.
XML

(source: Wikipedia)

XML (Extensible Markup Language) is a general-purpose specification for creating custom markup languages. It is classified as an extensible language, because it allows the user to define the mark-up elements. XML's purpose is to aid information systems in sharing structured data, especially via the Internet, to encode documents, and to serialize data.
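As a minimal sketch of user-defined markup and serialization, the Python snippet below builds a small XML document, turns it into text, and parses it back. The element names ("dataset", "title", "frequency") and the id value are invented for this example and do not come from any Data.gov schema.

```python
import xml.etree.ElementTree as ET

# Element names are made up for the example; XML lets the author
# define whatever markup elements suit the data.
dataset = ET.Element("dataset", attrib={"id": "example-001"})
ET.SubElement(dataset, "title").text = "Example dataset"
ET.SubElement(dataset, "frequency").text = "annual"

# Serialize to text so that another system can receive it...
xml_text = ET.tostring(dataset, encoding="unicode")

# ...and parse it back into a tree on the receiving side.
parsed = ET.fromstring(xml_text)
title = parsed.findtext("title")
print(xml_text)
```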
From MetaData Page
Date Released

(source: Data.gov)

The date on which the dataset originated.
Date Updated

(source: Data.gov)

The date that the dataset was last modified.
Time Period

(source: Data.gov)

Date or time interval(s) for which the dataset provides data.
Frequency

(source: Data.gov)

Frequency of data collection (one-time, annual, hourly, etc.).
Data.gov Data Category Type

(source: Data.gov)

The category designation for the entry as either an instantly downloadable raw data file or tool (i.e., data extraction and mining or widget).
Specialized Data Category Designation

(source: Data.gov)

The type of dataset (e.g., administrative, geospatial, research, or statistical). Some types of data have additional metadata requirements.
Keywords

(source: Dublin Core)

Used to describe the content of the resource. The element may use controlled vocabularies or words or phrases that describe the subject or content of the resource.
Unique ID

(source: Data.gov)

An unambiguous reference to the resource within a given context. Dublin Core defines best practice for this field as identifying the resource by a unique number (e.g., ISBN, ISSN, URL/URI, etc.). The Unique ID is intended for Data.gov internal reference only.
Citation

(source: FGDC-STD-001-1998)

The recommended reference citation to be used to cite the dataset.
Agency Program Page

(source: Data.gov)

The URL link (and name, if applicable) to the home page of the agency or program that is the dataset owner.
Agency Data Series Page

(source: Data.gov)

The URL link (and name, if applicable) to the agency web page where the link to the dataset is located. This is different from the URL for the actual dataset.
Unit of Analysis

(source: Data.gov)

The level of granularity or aggregation that is represented by a single record or observation in a dataset (e.g., person, household, production workers, establishment, city, country).
Granularity

(source: Dublin Core)

The level of detail at which an information object or resource is viewed or described.
Geographic Coverage

(source: Dublin Core)

Used to designate the extent or scope of the content of the resource and typically includes spatial location (a place name or geographic co-ordinates).
Collection Mode

(source: Data.gov)

Identifies the modality of the instrument used to gather data for the dataset (e.g., phone/paper, phone/computer, person/paper, person/computer, web, fax, other).
Data Collection Instrument

(source: Data.gov)

Identifies the specific instrument or tool (e.g., form, survey questionnaire) used to collect the data in the dataset corresponding to the collection mode.
Data Dictionary/Variable List

(source: Federal Enterprise Architecture: Data Reference Model)

A database used for data that refers to the use and structure of other data; that is, a database for the storage of metadata [ANSI X3.172-1990].
Data Quality

(source: OMB Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Information Disseminated by Federal Agencies, 67 FR 8452)

"Quality" is an encompassing term comprising objectivity, utility, and integrity. Sometimes these terms are referred to collectively as "quality." Any agency contributing a dataset to Data.gov must certify that the dataset conforms to the agency's information quality guidelines.
Privacy and Confidentiality

(source: 44 U.S.C. 3542)

Preserving authorized restrictions on information access and disclosure, including means for protecting personally identifiable and proprietary information. Any agency contributing a dataset to Data.gov must certify that dissemination of the data is consistent with the agency's responsibilities under the Privacy Act and, if applicable, the Confidential Information Protection and Statistical Efficiency Act of 2002.
Technical Documentation

(source: Data.gov)

Additional documentation that describes a dataset and its intended use.
Additional Metadata

(source: Data.gov)

Additional metadata that may be available for a dataset. Such metadata may conform to an existing standard (e.g., FGDC Metadata Standard).
Statistical Methodology

(source: Data.gov)

A description of the overall approach used for statistical design, sampling, data collection, statistical analysis, and estimation.
Sampling

(source: Box, Hunter, and Hunter, Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, 1978)

The procedure used to determine the total number of statistical observations (i.e., samples) drawn from an overall population.
Estimation

(source: Box, Hunter, and Hunter, Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, 1978)

The approach used to compute statistical quantities based on the observations (e.g., mean, mode, standard deviation).
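The quantities named in the definition above can be computed with Python's standard statistics module. This is only an illustration; the observations are made-up numbers.

```python
from statistics import mean, mode, stdev

# Hypothetical observations.
observations = [2, 4, 4, 4, 5, 5, 7, 9]

sample_mean = mean(observations)   # arithmetic average
sample_mode = mode(observations)   # most frequent value
sample_sd = stdev(observations)    # sample standard deviation (n - 1 divisor)

print(sample_mean, sample_mode, round(sample_sd, 3))
```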
Weighting

(source: Box, Hunter, and Hunter, Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, 1978)

An approach for applying a scaling factor to observations from one or more combined data series in order to normalize or otherwise adjust the observations.
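One simple form of weighting is a weighted mean, where each observation is scaled by a factor before averaging. The values and weights below are hypothetical; how weights are actually chosen depends on the survey design.

```python
# Hypothetical observations and their scaling factors (weights).
values = [10.0, 20.0, 30.0]
weights = [1.0, 1.0, 2.0]

# Scale each observation by its weight, then normalize by the total weight.
weighted_sum = sum(v * w for v, w in zip(values, weights))
weighted_mean = weighted_sum / sum(weights)

# For comparison: the unweighted mean treats every observation equally.
unweighted_mean = sum(values) / len(values)
print(weighted_mean, unweighted_mean)
```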
Disclosure avoidance

(source: Federal Committee on Statistical Methodology)

Techniques (e.g., aggregation) that are applied to statistical data to ensure published data cannot be used to attribute a specific value to an individual.
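A rough sketch of the aggregation technique mentioned above: individual records are replaced by group averages, and groups that are too small are withheld. The records, the threshold MIN_CELL_SIZE, and the suppression rule are all assumptions made for illustration, not a rule taken from any agency guideline.

```python
from collections import defaultdict

# Hypothetical individual-level records: (region, income).
records = [("North", 40), ("North", 50), ("North", 60), ("South", 90)]

# Assumed minimum group size; real thresholds vary by agency.
MIN_CELL_SIZE = 3

# Group the individual values by region.
groups = defaultdict(list)
for region, income in records:
    groups[region].append(income)

# Publish only group averages, and suppress groups so small that
# the average would reveal an individual's value.
published = {}
for region, incomes in groups.items():
    if len(incomes) >= MIN_CELL_SIZE:
        published[region] = sum(incomes) / len(incomes)
    else:
        published[region] = None  # suppressed
print(published)
```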
Questionnaire design

(source: Data.gov)

A structured approach used to develop a questionnaire or survey that describes the structure and content of the survey instrument and the approach intended to be used for analyzing the survey results.
Series breaks

(source: Data.gov)

A discrete event or change to the sample, the population, their environment, or the survey instrument occurring within a data collection that may affect statistical estimates or inferences.
Non-response adjustment

(source: Box, Hunter, and Hunter, Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, 1978)

The approach for adjusting observations to account for missing or incomplete data within a series.
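One commonly used approach, shown here only as an illustration, is to inflate the weights of responding units so that they also represent the units that did not respond. The sample, the base weights, and the single adjustment class are hypothetical.

```python
# Hypothetical sample: each unit has a base weight and a response flag.
sample = [
    {"id": 1, "base_weight": 10.0, "responded": True},
    {"id": 2, "base_weight": 10.0, "responded": True},
    {"id": 3, "base_weight": 10.0, "responded": False},
    {"id": 4, "base_weight": 10.0, "responded": True},
    {"id": 5, "base_weight": 10.0, "responded": True},
]

# Ratio of the weight of all sampled units to the weight of respondents.
total_weight = sum(u["base_weight"] for u in sample)
respondent_weight = sum(u["base_weight"] for u in sample if u["responded"])
adjustment_factor = total_weight / respondent_weight

# Respondents carry the extra weight; non-respondents drop out.
adjusted_weights = {
    u["id"]: u["base_weight"] * adjustment_factor
    for u in sample
    if u["responded"]
}
print(adjustment_factor, adjusted_weights)
```

After the adjustment, the respondents' weights sum to the same total as the full sample's base weights.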
Seasonal adjustment

(source: Wikipedia)

A statistical method for removing the effects of seasonal variation from a time series, used when analyzing non-seasonal trends.
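A very simple additive version of this idea subtracts each season's average deviation from the overall mean. The quarterly series below is invented, and production seasonal adjustment methods are considerably more elaborate than this sketch.

```python
from statistics import mean

# Hypothetical quarterly series covering two years.
series = [10, 20, 30, 20, 14, 24, 34, 24]
PERIOD = 4  # four quarters per year

# Seasonal effect of each quarter: its average minus the overall average.
overall_mean = mean(series)
seasonal_effects = [
    mean(series[quarter::PERIOD]) - overall_mean for quarter in range(PERIOD)
]

# Subtract the matching seasonal effect from every observation.
adjusted = [
    value - seasonal_effects[i % PERIOD] for i, value in enumerate(series)
]
print(seasonal_effects)
print(adjusted)
```

In the adjusted series the within-year swings disappear and only the change in level between the two years remains.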
Statistical Characteristics (CV, CI, variance, etc.)

(source: Box, Hunter, and Hunter, Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, 1978)

Summary of statistical characteristics that reflect the overall accuracy and correlation of a statistical data sample relative to the overall population, including coefficients of variation, confidence intervals, and variance.
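The three characteristics named above can be sketched with Python's standard library as follows. The sample is hypothetical, and the confidence interval uses a normal approximation, which is an assumption made here for brevity; for a sample this small a t-based interval would normally be preferred.

```python
import math
from statistics import NormalDist, mean, stdev, variance

# Hypothetical sample of observations.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

sample_mean = mean(sample)
sample_variance = variance(sample)  # sample variance (n - 1 divisor)
sample_sd = stdev(sample)

# Coefficient of variation (CV): standard deviation relative to the mean.
cv = sample_sd / sample_mean

# Approximate 95% confidence interval (CI) for the mean,
# assuming a normal approximation.
z = NormalDist().inv_cdf(0.975)
half_width = z * sample_sd / math.sqrt(n)
ci = (sample_mean - half_width, sample_mean + half_width)

print(round(sample_variance, 3), round(cv, 3), ci)
```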