Workflow requirements

Custom Conveyor workflows intended for later use within the MGX platform have to meet several constraints, which are described in more detail below.

First of all, a dedicated GetMGXJob node (Figure 3.4) has to be present within the workflow, and this node has to be named "mgx". When a pipeline is executed within MGX, this node is configured via an external configuration file, which provides the required information about a job's context, e.g. access to the project database and the associated storage.
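The naming requirement lends itself to a simple pre-deployment check. The following Python sketch is purely illustrative: it assumes the workflow has already been parsed into a flat list of nodes, each with a type and a name. The `Node` class and the `has_mgx_context_node` and `check_context_node` functions are hypothetical and do not reflect Conveyor's actual workflow file format or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical, simplified representation of a workflow node;
    # not Conveyor's actual data model.
    node_type: str
    name: str


def has_mgx_context_node(nodes: list[Node]) -> bool:
    """Return True if the workflow contains a GetMGXJob node named "mgx"."""
    return any(n.node_type == "GetMGXJob" and n.name == "mgx" for n in nodes)


def check_context_node(nodes: list[Node]) -> None:
    """Raise ValueError explaining why the MGX context requirement is not met."""
    job_nodes = [n for n in nodes if n.node_type == "GetMGXJob"]
    if not job_nodes:
        raise ValueError("workflow lacks a GetMGXJob node")
    if not any(n.name == "mgx" for n in job_nodes):
        names = ", ".join(repr(n.name) for n in job_nodes)
        raise ValueError(f"GetMGXJob node must be named 'mgx', found: {names}")


# Usage: a workflow with a correctly named context node passes silently.
check_context_node([Node("GetMGXJob", "mgx"), Node("ReadCSF", "reader")])
```

Checking the node type and the name separately lets the error message distinguish a missing GetMGXJob node from one that is present but misnamed.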

Figure 3.4: The GetMGXJob node provides necessary context for executing a workflow within MGX, such as database access. By convention, this node has to be named "mgx".

Access to metagenome DNA sequences is provided by the ReadCSF node, which provides all metagenome sequences of a sequencing run object within MGX except those whose “discard” flag has already been set. As pipelines are always executed for a single analysis job, this node has to be connected to the GetMGXJob node (Figure 3.5). Figure 3.6 shows a minimal example of a Conveyor-based pipeline for use within the MGX framework; once executed, this pipeline would set the discard flag for all sequences.
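The behavior of ReadCSF and of the minimal pipeline in Figure 3.6 can be mimicked in a few lines of Python. This is a sketch under stated assumptions only: sequences are modeled as in-memory objects with a boolean discard flag, and node connections as (source name, target name) pairs. `Sequence`, `read_csf`, `read_csf_connected` and `run_minimal_pipeline` are hypothetical names, not part of MGX or Conveyor.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator


@dataclass
class Sequence:
    # Hypothetical stand-in for a metagenome sequence stored in MGX.
    seq_id: int
    discard: bool = False


def read_csf(sequences: list[Sequence]) -> Iterator[Sequence]:
    """Mimic ReadCSF: yield every sequence whose discard flag is not yet set."""
    for seq in sequences:
        if not seq.discard:
            yield seq


def read_csf_connected(edges: set[tuple[str, str]], reader: str) -> bool:
    """Check that the ReadCSF node named `reader` takes its input from "mgx".

    `edges` holds (source_name, target_name) pairs; this is an assumed,
    simplified connection model, not Conveyor's workflow format.
    """
    return ("mgx", reader) in edges


def run_minimal_pipeline(sequences: list[Sequence]) -> int:
    """Mimic the minimal pipeline: set the discard flag on every sequence read.

    Returns the number of sequences that were newly flagged.
    """
    flagged = 0
    for seq in read_csf(sequences):
        seq.discard = True
        flagged += 1
    return flagged


# Usage: one of three sequences is already discarded, so two get flagged.
run = [Sequence(1), Sequence(2, discard=True), Sequence(3)]
print(run_minimal_pipeline(run))
```

Running the sketch a second time on the same data flags nothing, since the simulated ReadCSF no longer yields sequences whose discard flag has already been set.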

Figure 3.5: The ReadCSF node is used to obtain metagenome sequence data from within MGX; it has a single input, which needs to be connected to the GetMGXJob node.

Figure 3.6: A minimal working example of a pipeline developed for MGX, which would set the discard flag for all sequences.

Sebastian Jaenicke, 2020-04-28