Class CrawlJob


  • @Immutable
    @ThreadSafe
    public class CrawlJob
    extends Object
    This class represents a job to be worked on by a crawler. The job targets either one of the XML-based protocols (BioCASe, DiGIR, TAPIR) or a DwC-Archive.

    For now, this object will be used in JSON-serialized form in ZooKeeper.

    • Constructor Detail

      • CrawlJob

        public CrawlJob(UUID datasetKey,
                        EndpointType endpointType,
                        URI targetUrl,
                        int attempt,
                        @Nullable
                        Map<String,String> properties)
        Creates a new crawl job.
        Parameters:
        datasetKey - of the dataset to crawl
        endpointType - of the dataset
        targetUrl - of the dataset
        attempt - a monotonically increasing counter, incremented every time we try to crawl a dataset, whether or not that attempt is successful
        properties - a way to provide protocol or crawl specific options
      • CrawlJob

        public CrawlJob(UUID datasetKey,
                        Integer attempt,
                        EndpointType endpointType,
                        URI targetUrl)
        Constructor with the mandatory fields only. The properties field is set to null.
        Parameters:
        datasetKey - of the dataset to crawl
        endpointType - of the dataset
        targetUrl - of the dataset
        attempt - a monotonically increasing counter, incremented every time we try to crawl a dataset, whether or not that attempt is successful
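
The two constructors documented above can be illustrated with a minimal sketch. Because the real CrawlJob and EndpointType classes are not reproduced on this page, the example uses hypothetical stand-ins (SketchCrawlJob, SketchEndpointType) that only mirror the documented parameter lists; the enum constants, the getters, the defensive copy of the properties map, and the retry helper are assumptions made for illustration, not the library's actual API.

```java
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for EndpointType; constants are named after the protocols listed above.
enum SketchEndpointType { BIOCASE, DIGIR, TAPIR, DWC_ARCHIVE }

// Hypothetical immutable stand-in that mirrors the two documented constructor signatures.
final class SketchCrawlJob {
  private final UUID datasetKey;
  private final SketchEndpointType endpointType;
  private final URI targetUrl;
  private final int attempt;
  private final Map<String, String> properties; // may be null

  // Full constructor: properties is optional and may be null.
  SketchCrawlJob(UUID datasetKey, SketchEndpointType endpointType, URI targetUrl,
                 int attempt, Map<String, String> properties) {
    this.datasetKey = datasetKey;
    this.endpointType = endpointType;
    this.targetUrl = targetUrl;
    this.attempt = attempt;
    // Assumption: copy the map so later changes by the caller cannot affect this object.
    this.properties = properties == null
        ? null
        : Collections.unmodifiableMap(new HashMap<>(properties));
  }

  // Mandatory fields only; note that attempt is the second parameter here.
  SketchCrawlJob(UUID datasetKey, Integer attempt, SketchEndpointType endpointType, URI targetUrl) {
    this(datasetKey, endpointType, targetUrl, attempt, null);
  }

  UUID getDatasetKey() { return datasetKey; }
  SketchEndpointType getEndpointType() { return endpointType; }
  URI getTargetUrl() { return targetUrl; }
  int getAttempt() { return attempt; }
  Map<String, String> getProperties() { return properties; }
}

class CrawlJobExample {
  // Hypothetical helper: a new job for the same dataset with the attempt counter increased by one.
  static SketchCrawlJob retry(SketchCrawlJob previous) {
    return new SketchCrawlJob(previous.getDatasetKey(), previous.getEndpointType(),
        previous.getTargetUrl(), previous.getAttempt() + 1, previous.getProperties());
  }

  public static void main(String[] args) {
    UUID key = UUID.randomUUID();
    URI url = URI.create("https://example.org/archive.zip"); // placeholder URL

    // First attempt, built with the mandatory-fields constructor.
    SketchCrawlJob first = new SketchCrawlJob(key, 1, SketchEndpointType.DWC_ARCHIVE, url);

    // The counter increases for the next attempt regardless of how the first one ended.
    SketchCrawlJob second = retry(first);
    System.out.println(first.getAttempt() + " -> " + second.getAttempt());
  }
}
```

Since the object is immutable, a retry is modelled here as a new instance rather than a mutation of the existing one. Note also that the position of attempt differs between the two constructors, so arguments cannot simply be copied from one call form to the other.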