Interface DatasetProcessService


public interface DatasetProcessService
This service exposes information about the current crawling process and is not intended to provide historical information. Only currently queued and running crawl jobs are exposed.

We distinguish between XML-based (BioCASe, DiGIR, TAPIR), Darwin Core archive (DwC-A), ABCD archive (ABCD-A) and Camtrap Data Package (CamtrapDP) datasets. These do not share the same work queues because their processing differs at the start: an archive is downloaded, whereas an XML endpoint is iterated over in a request-response fashion. They do, however, share the same pipeline for processing the gathered data.
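The split between per-type work queues and a shared processing pipeline can be sketched as follows. This is a hypothetical illustration, not the crawler's actual implementation: `CrawlQueues`, `DatasetKind`, `enqueue`, `pendingCount` and `startNext` are invented names, and the in-memory deques stand in for whatever queueing the real crawler uses.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;
import java.util.Queue;
import java.util.UUID;

/** Hypothetical sketch: one pending queue per dataset kind, one shared processing step. */
class CrawlQueues {

  /** Hypothetical enum; the constants mirror the dataset kinds described above. */
  enum DatasetKind {
    XML(false),        // BioCASe, DiGIR, TAPIR: request-response iteration over the endpoint
    DWCA(true),        // Darwin Core archive: download the archive first
    ABCDA(true),       // ABCD archive: download the archive first
    CAMTRAP_DP(true);  // Camtrap Data Package: download the package first

    final boolean archiveDownload;

    DatasetKind(boolean archiveDownload) {
      this.archiveDownload = archiveDownload;
    }
  }

  // Assumption: a simple in-memory deque per kind stands in for the real work queues.
  private final Map<DatasetKind, Queue<UUID>> pending = new EnumMap<>(DatasetKind.class);

  CrawlQueues() {
    for (DatasetKind kind : DatasetKind.values()) {
      pending.put(kind, new ArrayDeque<>());
    }
  }

  /** Each kind has its own queue because the first processing step differs. */
  void enqueue(DatasetKind kind, UUID datasetKey) {
    pending.get(kind).add(datasetKey);
  }

  int pendingCount(DatasetKind kind) {
    return pending.get(kind).size();
  }

  /** Takes the next dataset of a kind; returns null if that queue is empty. */
  String startNext(DatasetKind kind) {
    UUID key = pending.get(kind).poll();
    if (key == null) {
      return null;
    }
    String gather = kind.archiveDownload ? "download archive" : "page through endpoint";
    // After the kind-specific gathering step, every kind feeds the same shared pipeline.
    return gather + " -> shared pipeline for " + key;
  }
}
```

Only the gathering step branches on the kind; everything after it is common, which is the reason the queues are separate while the pipeline is shared.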

  • Method Details

    • getDatasetProcessStatus

      Returns the processing status for a particular dataset identified by a UUID key.
      Parameters:
      datasetKey - the dataset key
      Returns:
      a consolidated object populated with the crawl status for the specific dataset, or null if the dataset is not currently being processed.
    • getRunningDatasetProcesses

      Returns:
      the processing status for all datasets that are currently being worked on (XML and DwC-A). These may be in different states: some may still be crawled page by page, some DwC-A datasets may still be downloading, and for others only the interpretation is still running.

      Some of the returned processes may already be finished.

    • getPendingXmlDatasetProcesses

      Returns:
      an ordered list of dataset processing statuses for all XML-based datasets that are currently waiting to be crawled.
    • getPendingDwcaDatasetProcesses

      Returns:
      an ordered list of dataset processing statuses for all DwC-A datasets that are currently waiting to be crawled.
    • getPendingAbcdaDatasetProcesses

      Returns:
      an ordered list of dataset processing statuses for all ABCD-A datasets that are currently waiting to be crawled.
    • getPendingCamtrapDpDatasetProcesses

      Returns:
      an ordered list of dataset processing statuses for all CamtrapDP datasets that are currently waiting to be crawled.
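A client of this interface might wrap the nullable single-dataset lookup, drop already-finished entries from the running processes, and group the four pending queues. The sketch below assumes a hypothetical `ProcessService` that mirrors the documented method names; its return types, the `ProcessStatus` record with its `finished` flag, and the `ProcessMonitor` helper are assumptions for illustration and are not taken from the real status class.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical status type; the real status object is assumed to carry more fields. */
record ProcessStatus(UUID datasetKey, boolean finished) {}

/** Hypothetical mirror of the documented methods; the return types are assumptions. */
interface ProcessService {
  ProcessStatus getDatasetProcessStatus(UUID datasetKey); // null if not currently processed
  Collection<ProcessStatus> getRunningDatasetProcesses();
  List<ProcessStatus> getPendingXmlDatasetProcesses();
  List<ProcessStatus> getPendingDwcaDatasetProcesses();
  List<ProcessStatus> getPendingAbcdaDatasetProcesses();
  List<ProcessStatus> getPendingCamtrapDpDatasetProcesses();
}

/** Hypothetical client-side helper built on the interface above. */
final class ProcessMonitor {
  private final ProcessService service;

  ProcessMonitor(ProcessService service) {
    this.service = service;
  }

  /** Wraps the nullable lookup so callers must handle "not currently being processed". */
  Optional<ProcessStatus> status(UUID datasetKey) {
    return Optional.ofNullable(service.getDatasetProcessStatus(datasetKey));
  }

  /** Drops entries flagged as finished, since the running list may still contain some. */
  List<ProcessStatus> activeProcesses() {
    List<ProcessStatus> active = new ArrayList<>();
    for (ProcessStatus s : service.getRunningDatasetProcesses()) {
      if (!s.finished()) {
        active.add(s);
      }
    }
    return active;
  }

  /** Collects each pending queue under a label; each list keeps the order it was returned in. */
  Map<String, List<ProcessStatus>> pendingByQueue() {
    Map<String, List<ProcessStatus>> queues = new LinkedHashMap<>();
    queues.put("XML", service.getPendingXmlDatasetProcesses());
    queues.put("DwC-A", service.getPendingDwcaDatasetProcesses());
    queues.put("ABCD-A", service.getPendingAbcdaDatasetProcesses());
    queues.put("CamtrapDP", service.getPendingCamtrapDpDatasetProcesses());
    return queues;
  }
}
```

Returning `Optional` from `status` is a choice made in this sketch: it makes the "not currently being processed" case explicit at the call site instead of relying on every caller to check for null.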