Package org.gbif.api.service.crawler
Interface DatasetProcessService
-
public interface DatasetProcessService
This service exposes information about the current crawling process and is not intended to provide historical information. Only information about currently queued and running crawl jobs is exposed. We distinguish between XML-based (BioCASe, DiGIR, TAPIR), Darwin Core Archive, ABCD archive and Camtrap Data Package datasets. These don't share the same work queues because their processing differs at the beginning (downloading an archive vs. iterating over the endpoint in a request-response fashion). They do, however, share the same pipeline for processing the gathered data.
-
Method Summary
Modifier and Type, Method, Description
DatasetProcessStatus getDatasetProcessStatus(UUID datasetKey): Returns the processing status for a particular dataset identified by a UUID key.
List<DatasetProcessStatus> getPendingAbcdaDatasetProcesses()
List<DatasetProcessStatus> getPendingCamtrapDpDatasetProcesses()
List<DatasetProcessStatus> getPendingDwcaDatasetProcesses()
List<DatasetProcessStatus> getPendingXmlDatasetProcesses()
Set<DatasetProcessStatus> getRunningDatasetProcesses()
-
Method Detail
-
getDatasetProcessStatus
@Nullable DatasetProcessStatus getDatasetProcessStatus(UUID datasetKey)
Returns the processing status for a particular dataset identified by a UUID key.
- Parameters:
datasetKey - the dataset key
- Returns:
- a consolidated object populated with the crawl status for the specific dataset. Returns null if the dataset is not currently being processed.
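Because the method is documented as @Nullable, callers need to handle the "not currently being processed" case explicitly. The following is a minimal sketch of one way to do that with Optional. The nested DatasetProcessStatus record and the one-method DatasetProcessService interface are hypothetical stand-ins that keep the example self-contained; they are not the real GBIF types, which carry many more members.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

public class StatusLookupSketch {

  // Hypothetical stand-in for the real DatasetProcessStatus model class.
  record DatasetProcessStatus(UUID datasetKey) {}

  // Reduced copy of the interface, limited to the method documented above.
  interface DatasetProcessService {
    DatasetProcessStatus getDatasetProcessStatus(UUID datasetKey); // may return null
  }

  // Wraps the nullable result so the "not being processed" case cannot be overlooked.
  static Optional<DatasetProcessStatus> findStatus(DatasetProcessService service, UUID key) {
    return Optional.ofNullable(service.getDatasetProcessStatus(key));
  }

  // Turns the lookup result into a short human-readable message.
  static String describe(DatasetProcessService service, UUID key) {
    return findStatus(service, key)
        .map(s -> "Dataset " + s.datasetKey() + " is queued or running")
        .orElse("Dataset " + key + " is not currently being processed");
  }

  public static void main(String[] args) {
    UUID active = UUID.fromString("00000000-0000-0000-0000-000000000001");
    UUID idle = UUID.fromString("00000000-0000-0000-0000-000000000002");
    // In-memory fake: Map.get returns null for unknown keys, mirroring the documented contract.
    Map<UUID, DatasetProcessStatus> store = Map.of(active, new DatasetProcessStatus(active));
    DatasetProcessService service = store::get;
    System.out.println(describe(service, active));
    System.out.println(describe(service, idle));
  }
}
```

With a real client, only findStatus would be needed; the fake service in main exists purely so the sketch runs on its own.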
-
getRunningDatasetProcesses
Set<DatasetProcessStatus> getRunningDatasetProcesses()
- Returns:
- the processing status for all datasets that are currently being worked on (XML and DwC-A). These might be in different states, some can still be crawled on a page by page basis, some may be downloaded in the case of DwC-A and for some only the interpretation is still running. There is a chance that some processes will be returned that are already finished.
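Since the returned set may include processes that have already finished, a caller that needs a tighter view could re-check each entry with getDatasetProcessStatus, which is documented to return null for datasets that are no longer being processed. The sketch below illustrates that idea under stated assumptions: the nested types and the fake helper are hypothetical stand-ins, and whether the extra lookups are worthwhile depends on the actual implementation. The re-check narrows the window for stale entries but cannot close it completely.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.UUID;
import java.util.stream.Collectors;

public class RunningRecheckSketch {

  // Hypothetical stand-in for the real DatasetProcessStatus model class.
  record DatasetProcessStatus(UUID datasetKey) {}

  // Reduced copy of the interface with only the two methods used here.
  interface DatasetProcessService {
    Set<DatasetProcessStatus> getRunningDatasetProcesses();
    DatasetProcessStatus getDatasetProcessStatus(UUID datasetKey); // may return null
  }

  // Looks up every entry of the running snapshot again and drops those that now resolve to null.
  static Set<DatasetProcessStatus> confirmRunning(DatasetProcessService service) {
    return service.getRunningDatasetProcesses().stream()
        .map(s -> service.getDatasetProcessStatus(s.datasetKey()))
        .filter(Objects::nonNull)
        .collect(Collectors.toSet());
  }

  // Hypothetical in-memory fake: "snapshot" is returned by the bulk call, "live" backs single lookups.
  static DatasetProcessService fake(Set<DatasetProcessStatus> snapshot,
                                    Map<UUID, DatasetProcessStatus> live) {
    return new DatasetProcessService() {
      @Override public Set<DatasetProcessStatus> getRunningDatasetProcesses() {
        return snapshot;
      }
      @Override public DatasetProcessStatus getDatasetProcessStatus(UUID datasetKey) {
        return live.get(datasetKey);
      }
    };
  }

  public static void main(String[] args) {
    DatasetProcessStatus running = new DatasetProcessStatus(UUID.randomUUID());
    DatasetProcessStatus finished = new DatasetProcessStatus(UUID.randomUUID());
    // The snapshot still lists the finished process; the single lookup no longer knows it.
    DatasetProcessService service =
        fake(Set.of(running, finished), Map.of(running.datasetKey(), running));
    System.out.println("Snapshot size: " + service.getRunningDatasetProcesses().size());
    System.out.println("Confirmed size: " + confirmRunning(service).size());
  }
}
```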
-
getPendingXmlDatasetProcesses
List<DatasetProcessStatus> getPendingXmlDatasetProcesses()
- Returns:
- an ordered list of dataset processing statuses for all XML based datasets that are currently waiting to be crawled
-
getPendingDwcaDatasetProcesses
List<DatasetProcessStatus> getPendingDwcaDatasetProcesses()
- Returns:
- an ordered list of dataset processing statuses for all DwC-A based datasets that are currently waiting to be crawled
-
getPendingAbcdaDatasetProcesses
List<DatasetProcessStatus> getPendingAbcdaDatasetProcesses()
- Returns:
- an ordered list of dataset processing statuses for all ABCD-A based datasets that are currently waiting to be crawled
-
getPendingCamtrapDpDatasetProcesses
List<DatasetProcessStatus> getPendingCamtrapDpDatasetProcesses()
- Returns:
- an ordered list of dataset processing statuses for all CamtrapDP based datasets that are currently waiting to be crawled
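The four pending-queue methods can be combined into a simple overview of how much work is waiting per dataset type. The sketch below is illustrative only: the nested types and the fake helper are hypothetical stand-ins for the real GBIF classes, and positionInQueue assumes that the documented ordering of each list reflects the order in which datasets will be picked up, which the text above does not state explicitly.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class PendingQueuesSketch {

  // Hypothetical stand-in for the real DatasetProcessStatus model class.
  record DatasetProcessStatus(UUID datasetKey) {}

  // Reduced copy of the interface with only the four pending-queue methods.
  interface DatasetProcessService {
    List<DatasetProcessStatus> getPendingXmlDatasetProcesses();
    List<DatasetProcessStatus> getPendingDwcaDatasetProcesses();
    List<DatasetProcessStatus> getPendingAbcdaDatasetProcesses();
    List<DatasetProcessStatus> getPendingCamtrapDpDatasetProcesses();
  }

  // Collects the length of each pending queue, keyed by an informal label.
  static Map<String, Integer> queueSizes(DatasetProcessService service) {
    Map<String, Integer> sizes = new LinkedHashMap<>();
    sizes.put("XML", service.getPendingXmlDatasetProcesses().size());
    sizes.put("DwC-A", service.getPendingDwcaDatasetProcesses().size());
    sizes.put("ABCD-A", service.getPendingAbcdaDatasetProcesses().size());
    sizes.put("CamtrapDP", service.getPendingCamtrapDpDatasetProcesses().size());
    return sizes;
  }

  // Zero-based index of the dataset in the given queue, or -1 if it is not queued there.
  static int positionInQueue(List<DatasetProcessStatus> queue, UUID datasetKey) {
    for (int i = 0; i < queue.size(); i++) {
      if (queue.get(i).datasetKey().equals(datasetKey)) {
        return i;
      }
    }
    return -1;
  }

  // Hypothetical in-memory fake that returns fixed lists for each queue.
  static DatasetProcessService fake(List<DatasetProcessStatus> xml, List<DatasetProcessStatus> dwca,
                                    List<DatasetProcessStatus> abcda, List<DatasetProcessStatus> camtrap) {
    return new DatasetProcessService() {
      @Override public List<DatasetProcessStatus> getPendingXmlDatasetProcesses() { return xml; }
      @Override public List<DatasetProcessStatus> getPendingDwcaDatasetProcesses() { return dwca; }
      @Override public List<DatasetProcessStatus> getPendingAbcdaDatasetProcesses() { return abcda; }
      @Override public List<DatasetProcessStatus> getPendingCamtrapDpDatasetProcesses() { return camtrap; }
    };
  }

  public static void main(String[] args) {
    DatasetProcessStatus first = new DatasetProcessStatus(UUID.randomUUID());
    DatasetProcessStatus second = new DatasetProcessStatus(UUID.randomUUID());
    DatasetProcessService service = fake(List.of(), List.of(first, second), List.of(), List.of());
    System.out.println(queueSizes(service));
    System.out.println(positionInQueue(service.getPendingDwcaDatasetProcesses(), second.datasetKey()));
  }
}
```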