To foster the sharing and dissemination of data produced by sponsored research, the National Science Foundation requires a data management plan for all proposals. The kinds of data that must be shared generally include whatever the scientific community needs to validate research findings. In particular, researchers must present a plan to share (1) analyzed data, (2) metadata that provide provenance information, and (3) metadata that describe how the data were generated.
For phylogenetics, the three kinds of data mentioned above and required by NSF are all accepted by TreeBASE, whether submitted directly to TreeBASE or indirectly by way of Dryad. For data type (1), we accept NEXUS-formatted data with characters of datatype standard, continuous, DNA, RNA, and protein, and non-reticulating phylogenetic trees with branch lengths and clade support values. For metadata type (2), we parse and store morphological character labels and state labels in submitted NEXUS files, and we map taxon labels to the NCBI and uBio external taxonomies. Additionally, we accept the following metadata: museum specimen numbers in accordance with the Registry of Biological Repositories (RBR), GenBank accession numbers, other accession numbers, and Darwin Core compatible specimen metadata (collecting date, collector, latitude/longitude, elevation, country, state, and locality). For metadata type (3), we store and share the original uploaded NEXUS files (including any program-specific command blocks that can define substitution models and search parameters) and provide data entry fields to describe the software, algorithm, and commands used. TreeBASE only shares data that are linked to a manuscript accepted by a peer-reviewed publication (e.g. journal article, reviewed book or book section, or academic thesis approved by a thesis committee).
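As an illustration of the specimen metadata fields listed above, the following minimal sketch checks a specimen record for completeness before submission. The term names come from the Darwin Core standard, but the pairing of each listed field with a particular term, the function name `missing_terms`, and the sample values are assumptions made for this example; they do not describe TreeBASE's actual schema or submission format.

```python
# Assumed mapping from the specimen fields listed above to Darwin Core terms.
# Illustrative only: this is not TreeBASE's internal schema.
DARWIN_CORE_FIELDS = {
    "collecting date": "eventDate",
    "collector": "recordedBy",
    "latitude": "decimalLatitude",
    "longitude": "decimalLongitude",
    "elevation": "minimumElevationInMeters",
    "country": "country",
    "state": "stateProvince",
    "locality": "locality",
}


def missing_terms(record):
    """Return the Darwin Core terms that are absent or blank in a record."""
    missing = []
    for term in DARWIN_CORE_FIELDS.values():
        value = record.get(term)
        if value is None or not str(value).strip():
            missing.append(term)
    return missing


# Made-up sample record, for demonstration only.
sample = {
    "eventDate": "2010-06-15",
    "recordedBy": "A. Collector",
    "decimalLatitude": 35.9,
    "decimalLongitude": -79.0,
    "country": "United States",
    "locality": "Example locality",
}

print(missing_terms(sample))
```

A check like this could be run over a spreadsheet of specimens before data entry, so that gaps in provenance metadata are caught before the submission is reviewed.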
TreeBASE helps to certify data integrity by:
TreeBASE provides an advance-access URL so that anonymous reviewers and referees can provide additional quality control before the data are made public. Although the provenance and data-generation metadata that NSF additionally requires are not normally required or scrutinized by TreeBASE staff, submitters who flag their submission (in the submission notes section) as NSF-sponsored data will receive special attention from our staff. In these cases, TreeBASE staff will check that provenance and analysis metadata are adequately provided and, as needed, communicate with the submitter and assist in properly formatting and ingesting these data.
TreeBASE plans to remain in compliance with the emerging, but still evolving, standard of Minimal Information for a Phylogenetic Analysis (MIAPA). In addition, TreeBASE publishes persistent and resolvable globally unique identifiers (GUIDs) for all major data objects and disseminates data and metadata using commonly accepted standards. A RESTful PhyloWS API exposes metadata using RSS feeds in RDF; a NeXML serialization exposes data marked up with metadata using published vocabularies and fully qualified URIs in compliance with Linked Data standards. Basic record metadata are published through an OAI-PMH service, and TreeBASE records are mirrored by Dryad, which provides a secondary Dryad DataCite DOI. However, most people in the scientific community will retrieve data through the web user interface and download them in the NEXUS format, while metadata can be downloaded separately in a tab-separated text format.
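For readers who want to harvest record metadata programmatically, the sketch below shows how an OAI-PMH request URL is typically assembled. The `ListRecords` verb and the `oai_dc` metadata prefix are part of the OAI-PMH protocol itself; the base URL is a placeholder and the helper name `oai_request_url` is hypothetical, so substitute the endpoint given in the TreeBASE documentation.

```python
from urllib.parse import urlencode

# Placeholder endpoint (an assumption): replace with the real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"


def oai_request_url(verb, base_url=BASE_URL, **params):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    query = {"verb": verb}
    query.update(params)
    return base_url + "?" + urlencode(query)


# Request all records in unqualified Dublin Core, which every
# OAI-PMH repository is required to support.
url = oai_request_url("ListRecords", metadataPrefix="oai_dc")
print(url)
```

The resulting URL can be fetched with any HTTP client; the response is an XML document that a harvester would then parse and page through using the protocol's resumption tokens.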
Although no data service can guarantee indefinite persistence, TreeBASE will make every effort to preserve its services for as long as possible. Additionally, the Articles of Incorporation of the Phyloinformatics Research Foundation, which oversees TreeBASE activities, specify that if dissolution is ever required, the assets will be transferred to a similar entity with a comparable mission.
Scientists are welcome to designate TreeBASE as their selected repository and dissemination service for phylogenetic data generated by sponsored research. In their Data Management Plan, we suggest that the following be mentioned:
TreeBASE suggests that for each submission of data from sponsored research you contribute at least $100 towards defraying the costs of storage and dissemination, as well as in support of the additional scrutiny by TreeBASE staff for NSF data management compliance. This fee is collected by the Phyloinformatics Research Foundation, which oversees TreeBASE activities. Anticipated costs can be budgeted under publication expenses in your grant proposal's budget.
Data storage fee for submissions resulting from sponsored research where TreeBASE provides added validation to help with data preparation and to ensure compliance with NSF directives:

Alternatively, for sponsored research that had not budgeted for data sharing with TreeBASE, please consider making a voluntary donation: