
# Archiving Flow

## Upload flow

```mermaid
sequenceDiagram
  autonumber
  participant C as Client
  participant S as SciCat
  participant A as Archiver Service

    C --> C: Select Dataset
    C --) S: Register Dataset: POST /Datasets
   activate S 
      Note left of S: { "isOnCentralDisk" : false, "archivable" : false, "archiveStatusMessage: "filesNotReadyYet"}
      S --) C: PID
   deactivate S
    C -->> A: Upload dataset
      Note left of A: Dataset
    C -->> S: PUT /Datasets/{datasetId}
      Note left of S: { "datasetlifecycle": {"archiveStatusMessage" : "datasetCreated, "archivable" : false, "archiveStatusMessage: "filesNotReadyYet"}}
    C -->> S: POST /api/v4/OrigDatablocks
      Note left of S: {fileBlock, "datasetId": "datasetId"}
    S -->> A: Trigger Archive Job: POST /api/v1/jobs
      Note left of A: {}
```
Open questions:

- Why is the source folder never used?
- Datafile list vs. OrigDatablocks?
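For orientation, here is a minimal client-side sketch of the upload flow in Python with `requests`. The endpoints and payloads are taken from the diagram above; the base URL, the token handling, and the exact payload fields are assumptions.

```python
# Minimal sketch of the upload flow; BASE_URL, the token and the payload
# details are assumptions, the endpoints follow the diagram above.
import requests

BASE_URL = "https://scicat.example.org"        # assumption
HEADERS = {"Authorization": "Bearer <token>"}  # assumption

# 1. Register the dataset; files are not uploaded yet, so not archivable.
dataset = {
    "isOnCentralDisk": False,
    "archivable": False,
    "archiveStatusMessage": "filesNotReadyYet",
    # ... remaining required dataset fields
}
resp = requests.post(f"{BASE_URL}/Datasets", json=dataset, headers=HEADERS)
pid = resp.json()["pid"]  # SciCat replies with the dataset PID

# 2. After uploading the dataset to the archiver service,
#    mark it as created in the dataset lifecycle.
requests.put(
    f"{BASE_URL}/Datasets/{pid}",
    json={"datasetlifecycle": {"archiveStatusMessage": "datasetCreated",
                               "archivable": False}},
    headers=HEADERS,
)

# 3. Register the file listing as an OrigDatablock.
requests.post(
    f"{BASE_URL}/api/v4/OrigDatablocks",
    json={"datasetId": pid, "dataFileList": []},  # file list: assumption
    headers=HEADERS,
)
```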

## Archival Task Flow

Archival is split into two subflows, Create Datablocks and Move Datablocks to LTS, which can be triggered separately. An archival task can contain multiple datasets; for simplicity the case with only one is depicted here.

| Identifier   | Description                                        |
| ------------ | -------------------------------------------------- |
| Flow         | Sequence of tasks necessary for archiving          |
| Subflow      | Flow triggered by a parent flow                    |
| Job          | Schedules flows for multiple datasets              |
| Archive Flow | Subflow that runs archiving for one dataset        |
| User Error   | Dataset is incomplete, not found, ...              |
| System Error | Unrecoverable (non-transient) error in the system  |
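Before the diagram, a structural sketch of this hierarchy in Python. Only the decomposition is taken from this document; the function names and the thread-pool fan-out are assumptions (the service could equally use a workflow orchestrator).

```python
# Structural sketch of Job -> Archive Flow -> Subflows; names and the
# ThreadPoolExecutor fan-out are assumptions, only the decomposition
# mirrors the table and the diagrams.
from concurrent.futures import ThreadPoolExecutor

def create_datablocks(dataset_id: str) -> list[str]:
    """Subflow 1: pack the dataset files into datablocks and stage them."""
    return []  # placeholder

def move_datablocks_to_lts(dataset_id: str, datablocks: list[str]) -> None:
    """Subflow 2: move staged datablocks to LTS and validate them.
    Can also be re-triggered on its own from SciCat."""

def archive_flow(dataset_id: str) -> None:
    """Archive Flow: runs both subflows for a single dataset."""
    datablocks = create_datablocks(dataset_id)
    move_datablocks_to_lts(dataset_id, datablocks)

def archive_job(dataset_ids: list[str]) -> None:
    """Job: schedules one Archive Flow per dataset in the job's datasetlist."""
    with ThreadPoolExecutor() as pool:
        list(pool.map(archive_flow, dataset_ids))
```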
```mermaid
sequenceDiagram
  autonumber
  participant AS as Archival Service
  participant L as Landing Zone
  participant J as Worker
  participant A as Staging
  participant S as SciCat
  participant LTS as LTS


  %% Note over AS, S: Subflow `Create Datablocks`

  S --) AS: Archive: POST /api/v1/jobs
  activate AS
  Note right of AS: Job {"id": "id", "type": "archive", "datasetlist": [], ...} as defined in SciCat

  AS --) J: Create Archival Flow for All Datasets
  activate S
  AS -->> S: Reply?
  S -->> S: scheduleForArchiving status? Already set?
  deactivate S

  loop Retry: Exponential backoff
    activate J
      activate S
        J --) S: PATCH /api/v4/Jobs/{JobId}
        Note left of S: {"jobStatusMessage": "inProgress", <br>"updatedAt": "...",<br>"updatedBy": "..."}, 
      deactivate S
    deactivate J
  end
  deactivate AS

  critical Job for datasetlist
    critical Subflow for {DatasetId}
      activate J
        loop Retry: Exponential backoff
          activate J
            activate S
              J --) S: PATCH /api/v4/Datasets/{DatasetId}
              Note left of S: {"datasetlifecycle": {"archiveStatusMessage": "started"}, <br> "updatedAt": "...", <br>"updatedBy": "..."}  
            deactivate S
          deactivate J
        end
        loop Retry: Exponential backoff
          activate J
            activate S
              J --) S: GET /api/v4/Datasets/{DatasetId}/origdatablocks
              S --) J: OrigDatablocks
              Note right of J: ?
            deactivate S
          deactivate J
        end
        J -->> L: Create Datablocks
        L -->> A: Move Datablocks to Staging
        loop Retry: Exponential backoff
          activate J
          J -->> S: Register Datablocks: POST /api/v4/Datasets/{DatasetId}
          Note left of S: ?
          deactivate J
        end
        J -->> L: Cleanup Dataset files, Datablocks
        Note right of L: Dataset files only get cleaned up when everything succeeds
      deactivate J
    option Failure User Error
      activate J
        %% J --) S: Report Error: PATCH /api/v4/Jobs/{JobId}
        %% Note left of S: {"jobStatusMessage": "finishedWithDatasetErrors" ,<br>  "updatedAt": "...",<br>  "updatedBy": "..."}
        J --) S: Report Error: PATCH /api/v4/Dataset/{DatasetId}
        Note left of S: {"datasetlifecycle:{"archiveStatusMessage": "missingFilesError"}, <br> "updatedAt": "...", <br> "updatedBy": "..."}
        J -->> A: Cleanup Datablocks
        J -->> L: Cleanup Datablocks
        Note right of L: No cleanup of dataset files
      deactivate J
    option Failure System Error
      activate J
        %% J --) S: Report Error: PATCH /api/v4/Jobs/{JobId}
        %% Note left of S: {"jobStatusMessage": "finishedUnsuccessful",<br>  "updatedAt": "...",<br>  "updatedBy": "..."}
        J --) S: Report Error: PATCH /api/v4/Dataset/{DatasetId}
        Note left of S: {"datasetlifecycle:{"archiveStatusMessage": "scheduleArchiveJobFailed"}, <br> "updatedAt": "...", <br> "updatedBy": "..."}
        J -->> A: Cleanup Datablocks
        J -->> L: Cleanup Datablocks
        Note right of L: No cleanup of dataset files
      deactivate J
    end
    option Failure User Error
      activate J
        J --) S: Report Error: PATCH /api/v4/Jobs/{JobId}
        Note left of S: {"jobStatusMessage": "finishedWithDatasetErrors" ,<br>  "updatedAt": "...",<br>  "updatedBy": "..."}
      deactivate J
    option Failure System Error
      activate J
        J --) S: Report Error: PATCH /api/v4/Jobs/{JobId}
        Note left of S: {"jobStatusMessage": "finishedUnsuccessful",<br>  "updatedAt": "...",<br>  "updatedBy": "..."}
      deactivate J
  end


  Note over J, LTS: Subflow `Move Datablocks to LTS`
    S -->> AS: Optional re-trigger from SciCat
    AS --) J: Create `Move Datablocks to LTS` Pipeline
  critical Move to LTS
    activate J
    A -->> LTS: Move Datablocks to LTS
    deactivate J
  option Move to LTS Failure
    activate J
    J -->> LTS: Cleanup LTS folder
    J -->> S: Report Error
    Note left of S: ?
    deactivate J
  end
  critical Validate
    J -->> LTS: Validate Datablocks in LTS
    J -->> L: Cleanup Dataset files, Datablocks
  option Validation Failure
    activate J
    J -->> S: Report Error
    Note left of S: ?
    J -->> LTS: Cleanup Datablocks
    deactivate J
  end
  loop Retry: Exponential backoff
    activate J
    J -->> S:  PATCH /api/v4/Datasets/{DatasetId}
    Note left of S: {"datasetlifecycle": {"retrievable": True, "archiveStatusMessage": "datasetOnArchiveDisk"}, <br> "updatedAt": "...", <br> "updatedBy": "..."} 
    deactivate J
  end 
  loop Retry: Exponential backoff
    activate J
    J -->> S: PATCH /api/v4/Jobs/{JobId}
    Note left of S: {"jobStatusMessage": "finishedSuccessful",<br>  "updatedAt": "...", <br> "updatedBy": "..."} 
    deactivate J
  end
```
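Every SciCat status update in the diagram sits inside a `Retry: Exponential backoff` loop. Below is a minimal sketch of such a wrapper, assuming `requests`; the retry budget, base delay, and jitter are illustrative choices, not specified by the diagram.

```python
# Sketch of the "Retry: Exponential backoff" loops around SciCat
# status updates; max_retries, base_delay and the jitter are assumptions.
import random
import time

import requests

def patch_with_backoff(url: str, payload: dict, headers: dict,
                       max_retries: int = 5,
                       base_delay: float = 1.0) -> requests.Response:
    for attempt in range(max_retries):
        try:
            resp = requests.patch(url, json=payload, headers=headers, timeout=30)
            resp.raise_for_status()
            return resp
        except requests.RequestException:
            if attempt == max_retries - 1:
                raise  # retries exhausted: surfaces as a System Error
            # Wait 1s, 2s, 4s, ... plus jitter before the next attempt.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 1))

# Example (endpoint and payload from the diagram):
# patch_with_backoff(f"{BASE_URL}/api/v4/Jobs/{job_id}",
#                    {"jobStatusMessage": "inProgress"}, HEADERS)
```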

## Retrieval Task Flow

```mermaid
sequenceDiagram
  autonumber
  participant AS as Archival Service
  participant J as Worker
  participant R as Staging
  participant S as SciCat
  participant LTS as LTS

  S --) AS: Retrieve: POST /api/v1/jobs
  activate AS
  Note right of AS: Job {"id": "id", "type": "retrieve", "datasetlist": [], ...} as defined in SciCat

  AS --) J: Create Retrieval Flow

  AS -->> S: Reply?
  Note left of S: {}
  deactivate AS 
  loop Retry: Exponential backoff
    activate J
      J -->> S: PATCH /api/v4/Jobs/{JobId}
      Note left of S: {"jobStatusMessage": "inProgress",<br> "updatedAt": "...", <br> "updatedBy": "..."}  
      J -->> S: PATCH /api/v4/Datasets/{DatasetId}
      Note left of S: {"datasetlifecycle": {"retrieveStatusMessage": "started"},<br> "updatedAt": "...", <br> "updatedBy": "..."}  
    deactivate J
  end
  critical
    activate J
      LTS -->> R: Download Datablocks from LTS
    deactivate J
  option Retrieval Failure
    activate J
      J --) S: Report Error: PATCH /api/v4/Datasets/{DatasetId}
      Note left of S: {"retrieveStatusMessage": "..." (SciCat specific? valid values?),<br> "retrieveReturnMessage": "..." (storage specific? free to choose?),<br> "updatedAt": "...",<br> "updatedBy": "..."}
      J --) S: Report Error: PATCH /api/v4/Jobs/{JobId}
      Note left of S: {"jobStatusMessage": "finishedWithDatasetErrors" (SciCat specific?),<br> "updatedAt": "...",<br> "updatedBy": "...",<br> "jobResultObject": "..." (storage specific?)}
      J -->> R: Cleanup Files
    deactivate J
  end
  critical
    activate J
    J -->> R: Validate Datablocks
    deactivate J
  option Validation Failure
    activate J
      J -->> S: Report Error ?
      J -->> R: Cleanup Files
    deactivate J
  end
  loop Retry: Exponential backoff
    activate J
      J -->> S: PATCH /api/v4/Jobs/{JobId}
      Note left of S: {"jobStatusMessage": "finishedSuccessful"} 
      J -->> S: PATCH /api/v4/Datasets/{DatasetId}
      Note left of S: {"datasetlifecycle":<br> {"retrieveStatusMessage": "datasetRetrieved"}} ? where to put download url? 
    deactivate J
  end
```
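Both flows contain a validation step (`Validate Datablocks in LTS`, `Validate Datablocks`) whose mechanism the diagrams leave open. One plausible reading is a checksum comparison against a manifest recorded when the datablocks were created; the sketch below assumes exactly that, including the manifest format and the choice of SHA-256.

```python
# Hypothetical datablock validation: recompute each datablock's checksum
# and compare against a manifest written at creation time. The manifest
# format and the use of SHA-256 are assumptions.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def validate_datablocks(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of datablocks whose checksum does not match."""
    return [name for name, expected in manifest.items()
            if sha256_of(folder / name) != expected]

# A non-empty result corresponds to the "Validation Failure" branches:
# the worker reports the error to SciCat and cleans up the datablocks.
```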
The following graph sketches how one archival job is scheduled: the per-dataset subflows and their datablock moves run in parallel, while data verification runs sequentially.

```mermaid
graph LR

    Job[Job ID] --> ScheduleJob{ScheduleFlows}
  subgraph parallel flows
    ScheduleJob --> Subflow1(Subflow Dataset ID 1)
    ScheduleJob --> Subflow2(Subflow Dataset ID 2)
  end
  subgraph Parallel move to LTS 1
  Subflow1 --> Move1{Schedule move}
  Move1 --> Task1(Task Datablock 1)
  Move1 --> Task2(Task Datablock 2)
  Move1 --> Task3(Task Datablock 3)
  Move1 --> Task4(Task Datablock 4)
  Task1 --> WaitMove{Wait move}
  Task2 --> WaitMove{Wait move}
  Task3 --> WaitMove{Wait move}
  Task4 --> WaitMove{Wait move}
  end
  subgraph Parallel move to LTS 2
  Subflow2--> Move2{Schedule move}
  Move2 --> Task21(Task Datablock 1)
  Move2 --> Task22(Task Datablock 2)
  Move2 --> Task23(Task Datablock 3)
  Move2 --> Task24(Task Datablock 4)
  Task21 --> WaitMove2{Wait move}
  Task22 --> WaitMove2{Wait move}
  Task23 --> WaitMove2{Wait move}
  Task24 --> WaitMove2{Wait move}
  end

  subgraph Sequential data verification

  WaitMove --> Verification{Schedule Verification}
  WaitMove2 --> Verification{Schedule Verification}

  Verification --> Verify11(Datablock 1)
  Verify11 --> Verify12(Datablock 2)
  Verify12 --> Verify13(Datablock 3)
  Verify13 --> Verify14(Datablock 4)
  Verify14 --> Verify21(Datablock 1)
  Verify21 --> Verify22(Datablock 2)
  Verify22 --> Verify23(Datablock 3)
  Verify23 --> Verify24(Datablock 4)
  end
```
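A sketch of the scheduling pattern the graph encodes: all datablock moves are fanned out in parallel and joined (`Wait move`), after which verification runs strictly one datablock at a time. The executor choice and function names are assumptions.

```python
# Scheduling sketch matching the graph: parallel moves per datablock,
# a join ("Wait move"), then strictly sequential verification.
from concurrent.futures import ThreadPoolExecutor

def move_to_lts(datablock: str) -> str:
    """Move one datablock to LTS (placeholder)."""
    return datablock

def verify(datablock: str) -> None:
    """Verify one datablock in LTS (placeholder)."""

def run_job(datasets: dict[str, list[str]]) -> None:
    with ThreadPoolExecutor() as pool:
        # "Schedule move": one task per datablock, across all datasets.
        futures = [pool.submit(move_to_lts, block)
                   for blocks in datasets.values() for block in blocks]
        # "Wait move": block until every move has finished.
        moved = [f.result() for f in futures]
    # "Sequential data verification": one datablock at a time,
    # e.g. to limit concurrent load on the LTS.
    for block in moved:
        verify(block)
```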