ScopeMArchiver

An ingester and archiver service that allows uploading data and registering it with SciCat.

Archiver

Prefect.io is used to orchestrate the asynchronous jobs (flows) that archive and retrieve datasets. The detailed sequence of steps can be found here.

Services

There are two kinds of services: runtime services, which keep running and serve the archiver, and configuration containers, which run only to configure the archiver.

Runtime Services

| Name | Description | Endpoint |
|------|-------------|----------|
| traefik | Reverse proxy | http://localhost/traefik/dashboard/#/ |
| backend | Endpoint for client applications and SciCat | http://localhost/api/v1/docs |
| prefect-server | Workflow orchestration (https://www.prefect.io) | http://localhost/prefect-ui/dashboard |
| prefect-worker | Worker that spawns new flows | |
| postgres | Database for Prefect | |
| minio | S3 storage | http://localhost/minio/ |
| scicatmock | Mock implementation of the SciCat API | |
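The endpoints in the table all sit behind the reverse proxy on one host name. As a rough illustration, the following Python sketch builds the documented URLs for a given host and offers an optional reachability probe. The names `ENDPOINT_PATHS`, `service_endpoints` and `is_reachable` are made up for this example and are not part of the repository; the paths are copied from the table, and the assumption that only the host part changes between installations is untested.

```python
import urllib.error
import urllib.request

# Paths copied from the Runtime Services table; services without a
# documented endpoint are left out.
ENDPOINT_PATHS = {
    "traefik": "/traefik/dashboard/#/",
    "backend": "/api/v1/docs",
    "prefect-server": "/prefect-ui/dashboard",
    "minio": "/minio/",
}


def service_endpoints(host: str = "localhost") -> dict:
    """Return the documented URL of each service for the given host.

    Assumption: every service is reachable via plain http on the same host.
    """
    return {name: f"http://{host}{path}" for name, path in ENDPOINT_PATHS.items()}


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET on the URL succeeds without an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and HTTP 4xx/5xx responses.
        return False
```

After the services have been started, calling `is_reachable(url)` for each value of `service_endpoints()` gives a quick smoke test of the setup.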

Configuration Containers

In addition to the services, several docker containers are started that configure the Prefect server with workpools, variables and flows defined in the repository. The workpools use a docker executor, i.e. the prefect-workers start every flow in its own container.

| Name | Description | Input |
|------|-------------|-------|
| prefect-config | | backend/config.toml |
| prefect-deploy | | backend/prefect.yaml |
| prefect-archival-worker | | backend/prefect-jobtemplate-prod.json |
| prefect-retrieval-worker | | backend/prefect-jobtemplate-prod.json |

Deployment

  1. Start Up Services

    All services can be started with docker compose:

    docker compose --env-file .production.env up -d
    

    This sets up all the necessary services, including the Prefect server instance.

Note: The environment variable HOST in .production.env determines where the services are hosted and from where they are accessible.

  2. Configure Secrets

    Before flows can be deployed, the credentials for the GitHub container registry must be configured manually in the Prefect UI, each as a Secret:

    | Name | Description |
    |------|-------------|
    | github-openem-username | Username for the GitHub container registry |
    | github-openem-access-token | Personal access token for the GitHub container registry |

  3. Deploy Flows

    The flows can be deployed using a container:

    docker compose --profile config --env-file .production.env run --rm
    

    This deploys the flows as defined in the prefect.yaml and requires the secrets set up in the previous step.
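The deployment steps above could also be scripted. The sketch below is one possible way to assemble the two docker compose invocations in Python and print them before running anything. The helpers `compose_command`, `deployment_steps` and `run` are hypothetical and not part of the repository. The name of the compose service that performs the flow deployment is not given here, so it is passed in as the `config_service` parameter rather than guessed.

```python
import shlex
import subprocess


def compose_command(args, env_files, profile=None):
    """Build a `docker compose` argument list; global options precede the subcommand."""
    command = ["docker", "compose"]
    if profile is not None:
        command += ["--profile", profile]
    for env_file in env_files:
        command += ["--env-file", env_file]
    return command + list(args)


def deployment_steps(config_service):
    """Return the commands for starting the services and deploying the flows.

    `config_service` is a placeholder for the compose service that deploys
    the flows. The secrets still have to be configured manually in the
    Prefect UI between the two commands.
    """
    env_files = [".production.env"]
    return [
        compose_command(["up", "-d"], env_files),
        compose_command(["run", "--rm", config_service], env_files, profile="config"),
    ]


def run(command, dry_run=True):
    """Print the command and execute it only when dry_run is False."""
    print(shlex.join(command))
    if not dry_run:
        subprocess.run(command, check=True)
```

Keeping `dry_run=True` as the default makes it possible to review the exact commands before anything is started.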

Development

Setup Development Services

  1. Start up all the services

    docker compose --env-file .production.env --env-file .development.env up -d
    

Note: Unlike in the production deployment, secrets and flows don't need to be deployed.
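The development command passes two --env-file options. Docker Compose reads such files in order, and values from later files override those from earlier ones, so .development.env only needs to contain what differs from .production.env. The Python sketch below imitates that layering with a deliberately simplified parser (no quoting or variable interpolation). The functions `parse_env` and `merge_env` and the example values are hypothetical; only the variable name HOST comes from this document.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments (simplified)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def merge_env(*texts):
    """Merge env file contents in order; later files override earlier ones."""
    merged = {}
    for text in texts:
        merged.update(parse_env(text))
    return merged


# Hypothetical file contents, for illustration only.
production = "HOST=archiver.example.org\nLOG_LEVEL=info\n"
development = "# local overrides\nHOST=localhost\n"

print(merge_env(production, development))
# prints {'HOST': 'localhost', 'LOG_LEVEL': 'info'}
```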

Debugging Flows Locally

For development and debugging, a local process can serve flows, for example by running python -m archiver.flows. However, additional configuration is required to fully integrate with the other services; for this, the VS Code launch configuration "Prefect Flows" defined in launch.json can be used.

To a Remote Prefect Server

To deploy to a remote server, the following command can be used:

cd backend
PREFECT_API_URL=http://<host>/api python deploy.py --all

This tells Prefect to use the flows defined in the git repository and branch configured in deploy.py.