Package org.gbif.api.service.crawler
Interface DatasetProcessService
public interface DatasetProcessService
This service exposes information about the current crawling process and is not intended to provide historical
information. Only information about currently queued and running crawl jobs is exposed.
We distinguish between XML-based (BioCASe, DiGIR, TAPIR), Darwin Core archive (DwC-A), ABCD archive (ABCD-A) and
Camtrap Data Package (CamtrapDP) datasets. These do not share the same work queues because their processing differs
at the start (downloading an archive vs. iterating over the endpoint in a request-response fashion). They do,
however, share the same pipeline for processing the gathered data.
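The split described above (one work queue per dataset type, a type-specific first stage, then one shared pipeline) can be sketched as follows. This is a hypothetical, in-memory illustration only: the class `CrawlQueues`, the enum `DatasetType` and the methods `enqueue`, `gather`, `process` and `next` are invented for this sketch and are not part of the GBIF API.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;
import java.util.Queue;
import java.util.UUID;

/** Hypothetical sketch: one pending queue per dataset type, one shared processing stage. */
class CrawlQueues {

  // Hypothetical enum mirroring the four dataset kinds named in the text.
  enum DatasetType { XML, DWCA, ABCDA, CAMTRAP_DP }

  private final Map<DatasetType, Queue<UUID>> pending = new EnumMap<>(DatasetType.class);

  CrawlQueues() {
    // Each dataset type gets its own FIFO work queue.
    for (DatasetType t : DatasetType.values()) {
      pending.put(t, new ArrayDeque<>());
    }
  }

  void enqueue(DatasetType type, UUID datasetKey) {
    pending.get(type).add(datasetKey);
  }

  int pendingCount(DatasetType type) {
    return pending.get(type).size();
  }

  /** Type-specific first stage: how data is gathered depends on the dataset type. */
  static String gather(DatasetType type, UUID key) {
    return switch (type) {
      case XML -> "paged request-response crawl of " + key;
      case DWCA, ABCDA, CAMTRAP_DP -> "archive download of " + key;
    };
  }

  /** Shared second stage: the same (here trivial) pipeline for every type. */
  static String process(String gathered) {
    return "interpreted(" + gathered + ")";
  }

  /** Takes the next dataset from one type's queue and runs both stages; null if that queue is empty. */
  String next(DatasetType type) {
    UUID key = pending.get(type).poll();
    return key == null ? null : process(gather(type, key));
  }

  public static void main(String[] args) {
    CrawlQueues q = new CrawlQueues();
    UUID key = UUID.fromString("00000000-0000-0000-0000-000000000001");
    q.enqueue(DatasetType.XML, key);
    System.out.println(q.next(DatasetType.XML));
    System.out.println(q.pendingCount(DatasetType.XML)); // 0
  }
}
```

Keeping the queues separate lets an archive download and a paged XML crawl be scheduled independently, while `process` stays identical for all types.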
-
Method Summary
Method | Description
getDatasetProcessStatus(UUID datasetKey) | Returns the processing status for a particular dataset identified by a UUID key.
-
Method Details
-
getDatasetProcessStatus
Returns the processing status for a particular dataset identified by a UUID key.
Parameters:
datasetKey - the dataset key
Returns:
a consolidated object populated with the crawl status for the specific dataset, or null if the dataset is not currently being processed.
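Because this method signals "not currently being processed" by returning null, callers may want to wrap the result. The sketch below does that with `Optional`. The return type is not shown on this page, so `ProcessStatus` is a hypothetical stand-in record, and `DatasetProcessLookup` is a hypothetical one-method interface that only borrows the documented method name; neither is the real GBIF type.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical stand-in for the status object; only the dataset key is modelled.
record ProcessStatus(UUID datasetKey) {}

// Hypothetical one-method interface reusing the documented method name; not the full GBIF interface.
interface DatasetProcessLookup {
  ProcessStatus getDatasetProcessStatus(UUID datasetKey);
}

class StatusLookup {
  /** Converts the null-means-idle convention into an Optional the caller must handle. */
  static Optional<ProcessStatus> currentStatus(DatasetProcessLookup service, UUID key) {
    return Optional.ofNullable(service.getDatasetProcessStatus(key));
  }

  public static void main(String[] args) {
    UUID running = UUID.randomUUID();
    UUID idle = UUID.randomUUID();
    Map<UUID, ProcessStatus> inProgress = Map.of(running, new ProcessStatus(running));
    // In-memory fake: Map.get returns null for keys that are not being processed.
    DatasetProcessLookup fake = inProgress::get;
    System.out.println(currentStatus(fake, running).isPresent()); // true
    System.out.println(currentStatus(fake, idle).isPresent());    // false
  }
}
```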
-
getRunningDatasetProcesses
Returns:
the processing status for all datasets that are currently being worked on (XML and DwC-A). These may be in different states: some may still be crawled page by page, some (DwC-A) may still be downloading, and for others only the interpretation may still be running. The result may also include some processes that have already finished.
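Since the running list may contain processes that have already finished, a client could filter them out before display. This page does not document the fields of the status object, so `RunningProcess` and its `stage` and `finishedAt` components are hypothetical; the sketch only shows the filtering idea, assuming some "finished" indicator exists.

```java
import java.time.Instant;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

// Hypothetical status shape: finishedAt is null while the process is still active.
record RunningProcess(UUID datasetKey, String stage, Instant finishedAt) {
  boolean isFinished() {
    return finishedAt != null;
  }
}

class RunningFilter {
  /** Drops entries that are already finished but were still returned as running. */
  static List<RunningProcess> stillActive(List<RunningProcess> running) {
    return running.stream()
        .filter(p -> !p.isFinished())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<RunningProcess> running = List.of(
        new RunningProcess(UUID.randomUUID(), "crawling page by page", null),
        new RunningProcess(UUID.randomUUID(), "downloading archive", null),
        new RunningProcess(UUID.randomUUID(), "interpreting", Instant.now()));
    System.out.println(stillActive(running).size()); // 2
  }
}
```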
-
getPendingXmlDatasetProcesses
Returns:
an ordered list of dataset processing statuses for all XML-based datasets that are currently waiting to be crawled
-
getPendingDwcaDatasetProcesses
Returns:
an ordered list of dataset processing statuses for all DwC-A-based datasets that are currently waiting to be crawled
-
getPendingAbcdaDatasetProcesses
Returns:
an ordered list of dataset processing statuses for all ABCD-A-based datasets that are currently waiting to be crawled
-
getPendingCamtrapDpDatasetProcesses
Returns:
an ordered list of dataset processing statuses for all CamtrapDP-based datasets that are currently waiting to be crawled
-
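The four getPending...DatasetProcesses methods each return an ordered list for one dataset type. The sketch below looks a dataset up across those lists and reports which queue it is in and where. It assumes, without confirmation from this page, that list order is queue order (index 0 is next), and that dataset keys have already been extracted from each status object; `PendingQueues`, `positionOf` and `describe` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;
import java.util.UUID;

class PendingQueues {
  /** Zero-based position of datasetKey in one ordered pending list, if present. */
  static OptionalInt positionOf(List<UUID> pendingKeys, UUID datasetKey) {
    int i = pendingKeys.indexOf(datasetKey);
    return i < 0 ? OptionalInt.empty() : OptionalInt.of(i);
  }

  /** Checks each per-type pending list (e.g. XML, DwC-A, ABCD-A, CamtrapDP) in iteration order. */
  static String describe(Map<String, List<UUID>> pendingByType, UUID datasetKey) {
    for (Map.Entry<String, List<UUID>> e : pendingByType.entrySet()) {
      OptionalInt pos = positionOf(e.getValue(), datasetKey);
      if (pos.isPresent()) {
        return e.getKey() + " queue, position " + pos.getAsInt();
      }
    }
    return "not pending";
  }

  public static void main(String[] args) {
    UUID a = UUID.randomUUID();
    UUID b = UUID.randomUUID();
    UUID c = UUID.randomUUID();
    // Keys stand in for the statuses each pending method would return.
    Map<String, List<UUID>> pending = new LinkedHashMap<>();
    pending.put("XML", List.of(a));
    pending.put("DwC-A", List.of(b, c));
    pending.put("ABCD-A", List.of());
    pending.put("CamtrapDP", List.of());
    System.out.println(describe(pending, c)); // DwC-A queue, position 1
  }
}
```

Because the service exposes only current state, a dataset that is neither pending nor running simply yields "not pending" here.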