
Page manager: Web editorial team
Page updated: 2012-09-11 15:12



CoordinateCleaner: Standardized cleaning of occurrence records from biological collection databases

Journal article
Authors Alexander Zizka
Daniele Silvestro
Tobias Andermann
Josué Azevedo
Camila Ritter
Daniel Edler
Harith Farooq
Andrei Herdean
María Ariza
Ruud Scharn
Sten Svantesson
Niklas Wengström
V. Zizka
Alexandre Antonelli
Published in Methods in Ecology and Evolution
Volume 10
Issue 5
Pages 744-751
ISSN 2041-210X
Publication year 2019
Published at Department of Biological and Environmental Sciences
Language en
Links dx.doi.org/10.1111/2041-210x.13152
Keywords biodiversity institutions, data quality, fossils, GBIF, geo-referencing, Palaeobiology Database (PBDB), R, big data, diversity, Environmental Sciences & Ecology
Subject categories Environmental science


Species occurrence records from online databases are an indispensable resource in ecological, biogeographical and palaeontological research. However, issues with data quality, especially incorrect geo-referencing or dating, can diminish their usefulness. Manual cleaning is time-consuming, error-prone, difficult to reproduce and limited to known geographical areas and taxonomic groups, making it impractical for datasets with thousands or millions of records. Here, we present CoordinateCleaner, an R package to scan datasets of species occurrence records for geo-referencing and dating imprecisions and data entry errors in a standardized and reproducible way. CoordinateCleaner is tailored to problems common in biological and palaeontological databases and can handle datasets with millions of records. The software includes (a) functions to flag potentially problematic coordinate records based on geographical gazetteers, (b) a global database of 9,691 geo-referenced biodiversity institutions to identify records that are likely from horticulture or captivity, (c) novel algorithms to identify datasets with rasterized data, conversion errors and strong decimal rounding and (d) spatio-temporal tests for fossils. We describe the individual functions available in CoordinateCleaner and demonstrate them on more than 90 million occurrences of flowering plants from the Global Biodiversity Information Facility (GBIF) and 19,000 fossil occurrences from the Palaeobiology Database (PBDB). We find that in GBIF more than 3.4 million records (3.7%) are potentially problematic and that 179 of the tested contributing datasets (18.5%) might be biased by rasterized coordinates. In PBDB, 1,205 records (6.3%) are potentially problematic. All cleaning functions and the biodiversity institution database are open-source and available within the CoordinateCleaner R package.
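
To make the idea of flagging potentially problematic coordinate records concrete, the following is a minimal sketch in Python. It is not the CoordinateCleaner API (the package itself is written in R), and the function name `flag_record`, the flag labels and the example records are all hypothetical, chosen only for illustration. It covers three simple checks that need no gazetteer: out-of-range values, coordinates at exactly (0, 0), and identical latitude and longitude.

```python
from typing import List


def flag_record(lat: float, lon: float) -> List[str]:
    """Return the names of the simple checks that a coordinate pair fails.

    Hypothetical helper for illustration; not part of CoordinateCleaner.
    """
    flags = []
    # Values outside the valid latitude/longitude range cannot be real locations.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        flags.append("invalid")
    # Exactly (0, 0) often indicates a missing value that was stored as zero.
    if lat == 0.0 and lon == 0.0:
        flags.append("zero")
    # Identical latitude and longitude can indicate a data entry error.
    elif lat == lon:
        flags.append("equal")
    return flags


# Made-up example records, not taken from GBIF or PBDB.
records = [
    {"species": "Example species A", "lat": 57.7, "lon": 11.97},
    {"species": "Example species B", "lat": 0.0, "lon": 0.0},
    {"species": "Example species C", "lat": 95.0, "lon": 10.0},
    {"species": "Example species D", "lat": 12.5, "lon": 12.5},
]

flagged = {r["species"]: flag_record(r["lat"], r["lon"]) for r in records}
for species, flags in flagged.items():
    print(species, flags or "ok")
```

Records are flagged rather than deleted, so a user can inspect each flagged record before deciding whether to drop it. This matches the abstract's wording of "potentially problematic" records.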

