Annotation of metagenome sequences requires an “attribute type” and an “attribute”. As an example, we will illustrate the implementation of a pipeline for the analysis of GC content within metagenome sequences. We use a StringGenerator node configured to generate the string “GC” to create a label for the attribute type. As GC content is a numerical value, we configure the CreateMGXAttributeType node to emit a basic (i.e. non-hierarchical), numerical “attribute type” (3.8).
In a second step, we use the ReadCSF node to obtain access to the individual metagenome sequences; as MGX annotates sequences individually, a connection between ReadCSF and AnnotateAttribute is required (3.9). Subsequently, we implement the actual analysis, which is provided by the GCContent node. It will process all sequences and emit the corresponding GC content for each of them. To convert these values to appropriate “attributes”, an “attribute type” is required for each value; therefore, a Repeat node is inserted between nodes 5 and 7 (3.10).
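The data flow of this step can be illustrated outside the workflow designer with a minimal Python sketch. It is not the implementation of the GCContent or Repeat nodes; the function name `gc_content`, the dictionary standing in for the attribute type, and the choice to report GC content as a fraction between 0 and 1 are assumptions made for illustration only.

```python
from itertools import repeat


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a sequence (0.0 for empty input)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


# Hypothetical stand-in for the single attribute type emitted once
# by the CreateMGXAttributeType node.
attribute_type = {"name": "GC", "structure": "basic", "value_type": "numerical"}

# Hypothetical stand-in for the sequences delivered by the ReadCSF node.
sequences = ["ATGC", "GGCC", "ATAT"]

# One GC value per sequence, as the GCContent node would emit.
values = [gc_content(s) for s in sequences]

# The Repeat node's role: the one attribute type is repeated so that
# every value can be paired with it.
annotations = list(zip(repeat(attribute_type), values))
```

Because `zip` stops at the shorter input, the endlessly repeated attribute type is consumed exactly once per GC value, which mirrors why a single attribute type has to be repeated before it can be combined with a stream of values.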
Finally, as an annotation always refers to a specific part of a sequence, we need to generate the corresponding
start and end coordinates. Since GC content refers to the full sequence, we can use a ULongGenerator node
configured to emit 0 (MGX uses 0-based coordinates) for the start coordinate; this node needs to be connected to
a Repeat node to generate a series of 0s, one for each sequence.
The end coordinate is derived from each sequence's length minus 1, obtained through the
GetLength and MinusOne nodes (3.11).
The GetMGXJob node will retain its red border due to missing configuration; this, however, can be
ignored, as appropriate configuration will be provided by the MGX framework automatically.
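The coordinate generation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MGX or Conveyor code: the function name `full_length_coordinates` is hypothetical, and the comments only indicate which node each step loosely corresponds to.

```python
from itertools import repeat


def full_length_coordinates(sequences):
    """Return one (start, end) pair per sequence, 0-based and covering the full sequence."""
    lengths = [len(s) for s in sequences]   # role of the GetLength node
    starts = repeat(0, len(lengths))        # ULongGenerator emitting 0, fed through Repeat
    ends = (n - 1 for n in lengths)         # role of the MinusOne node
    return list(zip(starts, ends))


coords = full_length_coordinates(["ATGC", "GGCCAT"])
print(coords)  # [(0, 3), (0, 5)]
```

With 0-based coordinates, a sequence of length 4 spans positions 0 to 3, which is why 1 is subtracted from the length to obtain the end coordinate.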