Interface DatasetProcessService


  • public interface DatasetProcessService
    This service exposes information about the current crawling process and is not intended to provide historical information. Only currently queued and running crawl jobs are exposed.

    We distinguish between XML-based (BioCASe, DiGIR, TAPIR), Darwin Core archive, ABCD archive and Camtrap Data Package datasets. These do not share the same work queues because their processing differs at the start (downloading an archive vs. iterating over the endpoint in a request-response fashion). They do, however, share the same pipeline for processing the gathered data.

    • Method Detail

      • getDatasetProcessStatus

        @Nullable
        DatasetProcessStatus getDatasetProcessStatus(UUID datasetKey)
        Returns the processing status for a particular dataset identified by a UUID key.
        Parameters:
        datasetKey - the dataset key
        Returns:
        a consolidated object populated with the crawl status for the given dataset, or null if the dataset is not currently being processed.
      • getRunningDatasetProcesses

        Set<DatasetProcessStatus> getRunningDatasetProcesses()
        Returns:
        the processing status for all datasets that are currently being worked on (XML and DwC-A). These may be in different states: some may still be being crawled page by page, some may be downloading (in the case of DwC-A), and for some only the interpretation is still running.

        There is a chance that some of the returned processes have already finished.
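
The two methods above could be used together roughly as follows. This is a minimal sketch, not the actual client code: `DatasetProcessStatus` is replaced by a hypothetical stand-in that only carries a dataset key, `InMemoryService` is an invented in-memory implementation used purely so the example runs, and `isBeingProcessed` and `confirmedRunningKeys` are illustrative helper names. It shows two points from the documentation: a null return means "not currently being processed", and entries from `getRunningDatasetProcesses()` may already be finished, so a caller may want to re-check them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

public class DatasetProcessExample {

  // Hypothetical stand-in for the real status class; only the key is modelled.
  static final class DatasetProcessStatus {
    private final UUID datasetKey;
    DatasetProcessStatus(UUID datasetKey) { this.datasetKey = datasetKey; }
    UUID getDatasetKey() { return datasetKey; }
  }

  // Mirrors the two method signatures documented above.
  interface DatasetProcessService {
    DatasetProcessStatus getDatasetProcessStatus(UUID datasetKey); // may return null
    Set<DatasetProcessStatus> getRunningDatasetProcesses();
  }

  // Hypothetical in-memory implementation, for illustration only.
  static final class InMemoryService implements DatasetProcessService {
    private final Map<UUID, DatasetProcessStatus> running = new HashMap<>();
    void start(UUID key) { running.put(key, new DatasetProcessStatus(key)); }
    void finish(UUID key) { running.remove(key); }
    @Override
    public DatasetProcessStatus getDatasetProcessStatus(UUID datasetKey) {
      return running.get(datasetKey); // null when the dataset is not being processed
    }
    @Override
    public Set<DatasetProcessStatus> getRunningDatasetProcesses() {
      return new HashSet<>(running.values());
    }
  }

  // A null status means the dataset is neither queued nor running.
  static boolean isBeingProcessed(DatasetProcessService service, UUID datasetKey) {
    return service.getDatasetProcessStatus(datasetKey) != null;
  }

  // The running set may contain finished processes, so re-check each key individually.
  static Set<UUID> confirmedRunningKeys(DatasetProcessService service) {
    Set<UUID> keys = new HashSet<>();
    for (DatasetProcessStatus status : service.getRunningDatasetProcesses()) {
      UUID key = status.getDatasetKey();
      if (service.getDatasetProcessStatus(key) != null) {
        keys.add(key);
      }
    }
    return keys;
  }

  public static void main(String[] args) {
    InMemoryService service = new InMemoryService();
    UUID key = UUID.randomUUID();
    service.start(key);
    System.out.println("being processed: " + isBeingProcessed(service, key));
    System.out.println("confirmed running: " + confirmedRunningKeys(service).size());
  }
}
```

Whether a second lookup is enough to filter out finished processes depends on how the real service tracks crawl state; treat the re-check as one possible defensive pattern rather than a guarantee.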