Darwin Core Data Package guide
- Title: Darwin Core Data Package guide
- Date version issued: 2025-09-10
- Date created: 2025-08-12
- Part of TDWG Standard: http://www.tdwg.org/standards/450
- This version: http://rs.tdwg.org/dwc/doc/dp/2025-09-10
- Latest version: http://rs.tdwg.org/dwc/dp/
- Previous version: none
- Abstract: Specification for creating Darwin Core Data Packages.
- Contributors: Peter Desmet (INBO), Tim Robertson (Global Biodiversity Information Facility), John Wieczorek (Rauthiflor LLC)
- Creator: Darwin Core Maintenance Group
- Bibliographic citation: Darwin Core Maintenance Group. 2025. Darwin Core Data Package guide. Biodiversity Information Standards (TDWG). http://rs.tdwg.org/dwc/doc/dp/2025-09-10.
This guide references non-production URLs for DwC-DP. Once DwC-DP is released, every link containing https://raw.githubusercontent.com/gbif/dwc-dp/refs/heads/master/dwc-dp/0.1/ should be replaced with http://rs.tdwg.org/dwc-dp/1.0/.
1 Introduction
Darwin Core Data Package (hereafter referred to as “DwC-DP”) is a community-developed container format to exchange biodiversity data. It extends the Data Package specification (developed by Frictionless Data) as an implementation for the Darwin Core Conceptual Model. This document specifies the requirements for datasets to comply with DwC-DP.
1.1 Audience (non-normative)
This guide is intended for biodiversity data providers, curators, aggregators, researchers, software implementers, and standards developers who prepare or consume datasets using Darwin Core. It assumes familiarity with tabular data, but not with the Data Package specification. Where helpful, it references relevant parts of the Data Package specification and the Darwin Core standard.
1.2 Status of the content of this document
All sections of this document are normative (they define what is required to comply with the standard), except for sections that are explicitly marked as non-normative (they support understanding but are not binding).
1.3 RFC 2119 key words
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
2 Example (non-normative)
Consider a dataset containing four bird Occurrences observed during a single parent Event. It can be described with two CSV files, each representing a DwC-DP table:
event.csv
eventID,eventDate,locationID
S229876476,2025-04-26T20:57:00+02:00,https://ebird.org/hotspot/L43523233
occurrence.csv
occurrenceID,eventID,scientificName,organismQuantity,organismQuantityType
1,S229876476,Apus apus,3,individuals
2,S229876476,Troglodytes troglodytes,1,individuals
3,S229876476,Turdus merula,1,individuals
4,S229876476,Erithacus rubecula,1,individuals
This dataset can be described as a DwC-DP with the following descriptor (datapackage.json):
{
  "profile": "https://raw.githubusercontent.com/gbif/dwc-dp/refs/heads/master/dwc-dp/0.1/dwc-dp-profile.json",
  "id": "dwc-dp-example-dataset",
  "created": "2025-09-08T09:52:03Z",
  "version": "1.0",
  "resources": [
    {
      "name": "event",
      "path": "event.csv",
      "profile": "tabular-data-resource",
      "format": "csv",
      "mediatype": "text/csv",
      "schema": {
        "fields": [
          {
            "name": "eventID",
            "title": "Event ID",
            "description": "An identifier for a dwc:Event.",
            "type": "string",
            "format": "default",
            "dcterms:isVersionOf": "http://rs.tdwg.org/dwc/terms/eventID",
            "dcterms:references": "http://rs.tdwg.org/dwc/terms/version/eventID-2023-06-28"
          },
          {
            "name": "eventDate",
            "title": "Event Date",
            "description": "A date or time interval during which a dwc:Event occurred.",
            "type": "string",
            "format": "default",
            "dcterms:isVersionOf": "http://rs.tdwg.org/dwc/terms/eventDate",
            "dcterms:references": "http://rs.tdwg.org/dwc/terms/version/eventDate-2025-06-12"
          },
          {
            "name": "locationID",
            "title": "Location ID",
            "description": "An identifier for a dcterms:Location.",
            "type": "string",
            "format": "default",
            "dcterms:isVersionOf": "http://rs.tdwg.org/dwc/terms/locationID",
            "dcterms:references": "http://rs.tdwg.org/dwc/terms/version/locationID-2023-06-28"
          }
        ],
        "primaryKey": ["eventID"]
      }
    },
    {
      "name": "occurrence",
      "path": "occurrence.csv",
      "profile": "tabular-data-resource",
      "format": "csv",
      "mediatype": "text/csv",
      "schema": {
        "fields": [
          {
            "name": "occurrenceID",
            "title": "Occurrence ID",
            "description": "An identifier for a dwc:Occurrence.",
            "type": "string",
            "format": "default",
            "dcterms:isVersionOf": "http://rs.tdwg.org/dwc/terms/occurrenceID",
            "dcterms:references": "http://rs.tdwg.org/dwc/terms/version/occurrenceID-2023-06-28"
          },
          {
            "name": "eventID",
            "title": "Event ID",
            "description": "An identifier for a dwc:Event.",
            "type": "string",
            "format": "default",
            "dcterms:isVersionOf": "http://rs.tdwg.org/dwc/terms/eventID",
            "dcterms:references": "http://rs.tdwg.org/dwc/terms/version/eventID-2023-06-28"
          },
          {
            "name": "scientificName",
            "title": "Scientific Name",
            "description": "A full scientific name, with authorship and date information if known. When forming part of a dwc:Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in dwc:verbatimIdentification.",
            "type": "string",
            "format": "default",
            "dcterms:isVersionOf": "http://rs.tdwg.org/dwc/terms/scientificName",
            "dcterms:references": "http://rs.tdwg.org/dwc/terms/version/scientificName-2023-06-28"
          },
          {
            "name": "organismQuantity",
            "title": "Organism Quantity",
            "description": "A number or enumeration value for the quantity of dwc:Organisms.",
            "type": "string",
            "format": "default",
            "dcterms:isVersionOf": "http://rs.tdwg.org/dwc/terms/organismQuantity",
            "dcterms:references": "http://rs.tdwg.org/dwc/terms/version/organismQuantity-2023-06-28"
          },
          {
            "name": "organismQuantityType",
            "title": "Organism Quantity Type",
            "description": "A type of quantification system used for the quantity of dwc:Organisms.",
            "type": "string",
            "format": "default",
            "dcterms:isVersionOf": "http://rs.tdwg.org/dwc/terms/organismQuantityType",
            "dcterms:references": "http://rs.tdwg.org/dwc/terms/version/organismQuantityType-2023-06-28"
          }
        ],
        "primaryKey": ["occurrenceID"],
        "foreignKeys": [
          {
            "fields": "eventID",
            "reference": {
              "resource": "event",
              "fields": "eventID"
            }
          }
        ]
      }
    }
  ]
}
Together with an eml.xml file containing dataset-level metadata, the dataset would consist of the following files, which may be zipped for easier transfer:
datapackage.json
eml.xml
event.csv
occurrence.csv
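(non-normative) The zipping step can be sketched in Python with the standard library. The function name zip_dwc_dp and its arguments are hypothetical, and the sketch assumes that all files of the dataset sit at the top level of one directory; it is an illustration, not a prescribed packaging tool.

```python
import zipfile
from pathlib import Path


def zip_dwc_dp(package_dir, zip_path):
    """Zip every top-level file of package_dir and return the archived names.

    Hypothetical helper: assumes a flat directory that holds datapackage.json,
    an optional eml.xml and the data files.
    """
    package_dir = Path(package_dir)
    if not (package_dir / "datapackage.json").is_file():
        raise FileNotFoundError("datapackage.json not found in " + str(package_dir))
    # Collect the names before the archive is created.
    names = sorted(p.name for p in package_dir.iterdir() if p.is_file())
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in names:
            # Store files at the archive root so eml.xml stays at the same
            # level as datapackage.json.
            archive.write(package_dir / name, arcname=name)
    return names
```

Writing the archive outside package_dir avoids picking up an older archive on a second run.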
3 Descriptor content
A DwC-DP has a descriptor: a JSON file named datapackage.json, which acts as an entry point to the dataset. It contains a reference to the profile the dataset conforms to, a list of data files (resources) and (optionally) dataset-level metadata. The requirements for these elements are described below.
All requirements and examples in this guide use version 1 of the Data Package specification, which is RECOMMENDED for DwC-DPs. Users MAY create descriptors using version 2 of the Data Package specification, which offers functionality that can relax some of the requirements below (e.g., fieldMatch), but which has limited software support at the time of writing.
3.1 Descriptor file
- The descriptor MUST follow the Data Package specification and MUST be named datapackage.json.
- Dataset metadata MAY be expressed in an eml.xml file. It MUST follow the Ecological Metadata Language specification and MUST be placed at the same level as the datapackage.json file.
3.2 Package-level properties
- The descriptor MUST have a resources property, with an array of data files that are considered part of a dataset. It MUST follow the Data Package specification and MUST contain at least one resource. See section 3.3 for details.
- The descriptor MUST have a profile property, with a URL referencing the profile the dataset conforms to. This MUST be a string representing the URL to a DwC-DP profile served from http://rs.tdwg.org. The URL MUST include the version of the profile (e.g., http://rs.tdwg.org/dwc-dp/1.0/dwc-dp-profile.json, where 1.0 is the version).
  (non-normative) The DwC-DP profile imports all Data Package requirements. A dataset that conforms to the DwC-DP profile will therefore also conform to the Data Package requirements. In other words: a DwC-DP is also a Data Package.
- The descriptor SHOULD have an id property, with an identifier for the dataset, preferably a DOI. It MUST follow the Data Package specification.
- The descriptor SHOULD have a created property, with a timestamp indicating when the dataset was created. It MUST follow the Data Package specification.
- The descriptor SHOULD have a version property, indicating the version of the dataset. It MUST follow the Data Package specification.
- The descriptor MAY have additional package-level properties. This includes dataset-level metadata defined by the Data Package specification (e.g., title, description, contributors, sources, licenses) or custom properties.
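(non-normative) The package-level requirements above could be checked along the following lines. This is a minimal sketch: the function name check_package_properties is hypothetical, and the regular expression assumes that profile versions appear as a dotted number in a path segment such as /1.0/, which this guide does not state as a rule. It does not replace validation against the DwC-DP profile itself.

```python
import re

PROFILE_HOST = "http://rs.tdwg.org/"
# Assumption: the version shows up as a dotted number path segment, e.g. /1.0/.
VERSION_SEGMENT = re.compile(r"/\d+(\.\d+)+/")


def check_package_properties(descriptor):
    """Return (errors, warnings) for the package-level properties of a descriptor."""
    errors, warnings = [], []

    resources = descriptor.get("resources")
    if not isinstance(resources, list) or not resources:
        errors.append("resources MUST be an array with at least one resource")

    profile = descriptor.get("profile")
    if not isinstance(profile, str):
        errors.append("profile MUST be a string with the URL of a DwC-DP profile")
    else:
        if not profile.startswith(PROFILE_HOST):
            errors.append("profile MUST be served from http://rs.tdwg.org")
        if not VERSION_SEGMENT.search(profile):
            errors.append("profile URL MUST include the version of the profile")

    # SHOULD-level properties only produce warnings.
    for name in ("id", "created", "version"):
        if name not in descriptor:
            warnings.append(name + " SHOULD be provided")

    return errors, warnings
```

Applied to the descriptor in section 2, this sketch would report the profile host as an error, because that example uses the non-production URL.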
3.3 Resources
Each data file included in DwC-DP is a resource. Each resource MUST follow the Data Resource specification.
Of special interest are resources with (biodiversity) data organized in tables that implement the Darwin Core Conceptual Model (DwC-CM). These resources/tables (hereafter referred to as “DwC-DP tables”) have additional requirements.
3.3.1 DwC-DP table file requirements
Data files representing a DwC-DP table MUST be delimited text files (hereafter referred to as “CSV files”, irrespective of the chosen delimiter). CSV files MUST follow RFC 4180, with the following exceptions:
- A CSV file MUST be encoded as UTF-8 or, when deviating from that encoding, the DwC-DP table MUST have an encoding property that MUST follow the Data Resource specification and the file MUST follow that encoding.
- When a CSV file deviates from RFC 4180 regarding dialect (e.g., line terminators, field delimiters, quote characters), the DwC-DP table MUST have a dialect property describing the dialect. That property MUST follow the CSV Dialect specification. Only dialect properties deviating from the default SHOULD be provided. If the CSV file follows all defaults, a dialect property SHOULD NOT be provided.
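(non-normative) A consumer could honour the encoding and dialect properties roughly as follows. The sketch uses Python's csv module and handles only the delimiter, quoteChar, doubleQuote and skipInitialSpace properties of the CSV Dialect specification. The function name read_table is hypothetical, and the sketch assumes that the encoding value is a name Python's codec registry recognizes.

```python
import csv
import io


def read_table(raw_bytes, resource):
    """Decode and parse a delimited text file using the resource's properties.

    Falls back to UTF-8 and the RFC 4180 defaults when encoding or dialect
    are absent from the resource.
    """
    encoding = resource.get("encoding", "utf-8")
    dialect = resource.get("dialect", {})
    text = raw_bytes.decode(encoding)
    reader = csv.reader(
        # newline="" leaves line terminators untouched for the csv module.
        io.StringIO(text, newline=""),
        delimiter=dialect.get("delimiter", ","),
        quotechar=dialect.get("quoteChar", '"'),
        doublequote=dialect.get("doubleQuote", True),
        skipinitialspace=dialect.get("skipInitialSpace", False),
    )
    return list(reader)
```

A tab-delimited file would then be read with a resource that carries "dialect": {"delimiter": "\t"}, while a plain UTF-8 CSV file needs neither property.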
3.3.2 DwC-DP table properties
- A DwC-DP table MUST have a name property, with the name of the table. It MUST follow the Data Resource specification and MUST be one of the reserved table names defined in the DwC-DP profile (e.g., "event", "occurrence"). See section 4 for an overview.
- A DwC-DP table MUST have a path property, with the path to the data file. It MUST follow the Data Resource specification.
- A DwC-DP table MUST have a profile property, indicating the type of resource. It MUST be the value "tabular-data-resource", thereby indicating that it follows the Tabular Data Resource specification.
- A DwC-DP table SHOULD have a format property, indicating the standard file extension of the data file (e.g., "csv", "tsv"). It MUST follow the Data Resource specification.
- A DwC-DP table MUST have a mediatype property, indicating the media type of the data file. It MUST follow the Data Resource specification and MUST be the value "text/csv".
- A DwC-DP table MUST have a schema property, with a table schema describing the fields and relationships of the table. It MUST follow the Data Resource specification, but MUST be an object representing the schema (and not a string referencing it). See section 3.4 for details.
  (non-normative) By verbosely including the schema, a descriptor does not rely on externally hosted files (except for the DwC-DP profile) to describe the data it represents.
- A DwC-DP table MAY have additional properties. This includes those defined by the Data Resource specification (e.g., bytes, hash) or custom properties.
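(non-normative) The table properties above translate into a short check per resource. In this sketch, the function name check_table_resource is hypothetical and the reserved table names are passed in by the caller, since the authoritative list lives in the DwC-DP profile.

```python
def check_table_resource(resource, reserved_names):
    """Return (errors, warnings) for one resource that claims to be a DwC-DP table.

    reserved_names is supplied by the caller; this sketch does not read it
    from the DwC-DP profile.
    """
    errors, warnings = [], []
    if resource.get("name") not in reserved_names:
        errors.append("name MUST be one of the reserved table names")
    if "path" not in resource:
        errors.append("path MUST be provided")
    if resource.get("profile") != "tabular-data-resource":
        errors.append('profile MUST be "tabular-data-resource"')
    if resource.get("mediatype") != "text/csv":
        errors.append('mediatype MUST be "text/csv"')
    # A string here would be a reference to an external schema, which is not allowed.
    if not isinstance(resource.get("schema"), dict):
        errors.append("schema MUST be an object, not a reference")
    if "format" not in resource:
        warnings.append("format SHOULD be provided")
    return errors, warnings
```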
3.3.3 Other resources
A DwC-DP MAY include other resources that do not represent a DwC-DP table. They MUST NOT have a name that is one of the reserved table names defined in the DwC-DP profile. See section 4 for an overview.
3.4 Table Schemas
A table schema describes the fields, relationships and missing values of a tabular data file. A table schema MUST follow the Table Schema specification.
Table schemas are provided at rs.tdwg.org for each DwC-DP table. See section 4 for an overview. These include all possible fields, primary keys and foreign key relationships a table can have. Use these to select the fields and keys that are applicable to your data.
- A DwC-DP table schema MUST have a fields property, with an array of field descriptors describing the fields/columns in the data file. It MUST follow the Table Schema specification, but the order and number of elements in fields MUST be the order and number of fields in the CSV file. See section 3.5 for details.
- Each field in a DwC-DP table schema MUST be described with the field descriptor of the table schema provided at rs.tdwg.org for that table. For example, if you want to describe an "eventID" field in an "event" table, you MUST use the field descriptor for "eventID" in the table schema for "event" provided at rs.tdwg.org. Fields MUST NOT be misrepresented. Custom fields SHOULD NOT be added.
- A DwC-DP table schema SHOULD have a primaryKey property, indicating the field(s) that act as primary keys. It MUST follow the Table Schema specification. The primaryKey property is REQUIRED if the field is referenced by another table. primaryKey values MUST be one or more of the primaryKey values defined in the table schema provided at rs.tdwg.org for that table (i.e., do not define primary keys not defined there).
- A DwC-DP table schema SHOULD have a foreignKeys property, with an array of relationships the table has with other tables. It MUST follow the Table Schema specification. If the table has foreign key relationships with other tables, then the foreignKeys property is REQUIRED and every relationship MUST be expressed. foreignKeys values MUST be one or more of the foreignKeys values defined in the table schema provided at rs.tdwg.org (i.e., do not define foreign key relationships not defined there). foreignKeys MAY have a predicate property to document relationship semantics.
- A DwC-DP table schema MAY have a missingValues property, indicating what values should be interpreted as null. It MUST follow the Table Schema specification.
- A DwC-DP table schema MAY have custom properties.
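(non-normative) One way to meet these requirements is to derive the table schema from the reference schema, using the header of the CSV file to select and order the field descriptors. The sketch below assumes the reference schema has already been loaded as a Python dict. The function name build_table_schema and its parameters are hypothetical. Foreign keys are kept only when their fields are present and the referenced table is part of the package, which is an interpretation of the rules above rather than a quotation of them.

```python
import copy


def as_list(value):
    """Table Schema allows key fields as a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def build_table_schema(reference_schema, header, tables_in_package):
    """Select field descriptors, primary key and foreign keys from a reference schema.

    header is the list of column names in the CSV file, in file order.
    tables_in_package is the set of table names present in the descriptor.
    """
    by_name = {field["name"]: field for field in reference_schema["fields"]}
    unknown = [column for column in header if column not in by_name]
    if unknown:
        raise ValueError("columns not defined in the reference schema: " + ", ".join(unknown))

    # Field descriptors are copied unchanged and follow the column order of the file.
    schema = {"fields": [copy.deepcopy(by_name[column]) for column in header]}
    columns = set(header)

    primary_key = as_list(reference_schema.get("primaryKey", []))
    if primary_key and set(primary_key) <= columns:
        schema["primaryKey"] = primary_key

    foreign_keys = []
    for foreign_key in reference_schema.get("foreignKeys", []):
        target = foreign_key["reference"]["resource"]
        # An empty resource name refers to the table itself.
        target_present = target == "" or target in tables_in_package
        if set(as_list(foreign_key["fields"])) <= columns and target_present:
            foreign_keys.append(copy.deepcopy(foreign_key))
    if foreign_keys:
        schema["foreignKeys"] = foreign_keys
    return schema
```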
3.4.1 Relationships example (non-normative)
Consider an "event" table with the following table schema:
{
  "fields": [],
  "primaryKey": "eventID",
  "foreignKeys": [
    {
      "fields": "eventConductedByID",
      "reference": {
        "resource": "agent",
        "fields": "agentID"
      }
    },
    {
      "fields": "parentEventID",
      "reference": {
        "resource": "",
        "fields": "eventID"
      }
    }
  ]
}
For brevity, let’s name fields as table_name.field_name (e.g., event.eventID refers to the "eventID" field in the "event" table). The above schema expresses:
- A relationship between the "event" and "agent" tables. For each value in event.eventConductedByID a corresponding value is expected in agent.agentID, linking those records.
- A relationship between the "event" table and itself. For each value in event.parentEventID a corresponding value is expected in event.eventID, linking those records.
- Since event.eventID is the target of a foreign key relationship, it must be a primary key.
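The expectation that every foreign key value has a corresponding target value can be tested with a few lines of Python. This sketch assumes the tables are already loaded as lists of dicts keyed by field name, and it treats empty strings as missing values that are not checked. The function name find_dangling_references is hypothetical.

```python
def as_list(value):
    """Table Schema allows key fields as a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def find_dangling_references(tables, table_name, schema):
    """Return (fields, values) pairs in table_name without a match in the target table.

    tables maps a table name to its rows; each row is a dict of strings.
    """
    dangling = []
    for foreign_key in schema.get("foreignKeys", []):
        source_fields = as_list(foreign_key["fields"])
        target_fields = as_list(foreign_key["reference"]["fields"])
        # An empty resource name refers to the table itself.
        target_name = foreign_key["reference"]["resource"] or table_name
        known = {
            tuple(row.get(field, "") for field in target_fields)
            for row in tables[target_name]
        }
        for row in tables[table_name]:
            values = tuple(row.get(field, "") for field in source_fields)
            # Assumption: empty strings are missing values and are skipped.
            if all(values) and values not in known:
                dangling.append((tuple(source_fields), values))
    return dangling
```

With the schema above, an event whose parentEventID does not occur in event.eventID would be reported as (("parentEventID",), (value,)).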
3.5 Field descriptors
A field descriptor describes a single field in a table schema (e.g., name, description, format, constraints).
- A field descriptor MUST have a name property, with the machine-readable name of the field (e.g., "eventID"). It MUST follow the Table Schema specification and SHOULD correspond to the name of the field/column in the data file (if a header is present).
- A field descriptor MUST have a title property, with the human-readable label of the field (e.g., "Event ID"). It MUST follow the Table Schema specification.
- A field descriptor MUST have a description property, with a human-readable description of the field, such as the Darwin Core definition. It MUST follow the Table Schema specification.
- A field descriptor MAY have a comments property, with usage notes.
- A field descriptor MUST have a type property, indicating the data type of values in the field (e.g., "string", "number"). It MUST follow the Table Schema specification.
- A field descriptor SHOULD have a format property, indicating how values should be parsed. It MUST follow the Table Schema specification.
- A field descriptor MUST have a dcterms:isVersionOf property, with the URL of the unversioned source term describing the field (e.g., "http://rs.tdwg.org/dwc/terms/eventID").
- A field descriptor MAY have a dcterms:references property, with the URL of the versioned source term describing the field (e.g., "http://rs.tdwg.org/dwc/terms/version/eventID-2023-06-28").
- A field descriptor MAY have an rdfs:comment property, with the canonical definition of the source term.
- A field descriptor MAY have a namespace property, with an abbreviation of the namespace of the source term (e.g., "dwc", "dcterms").
- A field descriptor MAY have a constraints property, indicating value requirements that SHOULD be used in validation. It MUST follow the Table Schema specification.
- A field descriptor MAY have additional properties. This includes those defined by the Table Schema specification (e.g., example) or custom properties.
(non-normative) You will meet the requirements for field descriptors by copying field descriptors from the table schemas provided at rs.tdwg.org.
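(non-normative) For hand-written descriptors, the presence of the MUST and SHOULD properties listed in this section can be checked with a small helper. The function name check_field_descriptor is hypothetical. The sketch only tests that properties are present; it does not verify that their values match the reference table schema.

```python
# Properties this section marks as MUST and SHOULD for a field descriptor.
REQUIRED_PROPERTIES = ("name", "title", "description", "type", "dcterms:isVersionOf")
RECOMMENDED_PROPERTIES = ("format",)


def check_field_descriptor(field):
    """Return (missing_required, missing_recommended) property names for one field."""
    missing_required = [p for p in REQUIRED_PROPERTIES if p not in field]
    missing_recommended = [p for p in RECOMMENDED_PROPERTIES if p not in field]
    return missing_required, missing_recommended
```

For the "eventID" field descriptor shown in section 2, both lists are empty.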