Creating hierarchical attributes

Annotating hierarchical attributes requires a little more effort. The CreateHierarchicalMGXAttribute node obtains the inner structure of the hierarchy in a bottom-up approach; it contains several loops, which are explained in more detail below.

Figure 3.12: The CreateHierarchicalMGXAttribute node requires three loops (note double-ended arrow on third loop between nodes 99 and 79) to create the internal structure of the hierarchy. Several connections were removed from the figure for illustrative purposes.

A single object, e.g. an NCBI taxon generated by the Kraken [Wood and Salzberg, 2014] classifier, is provided as input to the node (Figure 3.12). The first loop obtains the object's parent object, thus defining the hierarchy. In this example it is implemented using the GetParent and GetMajorRankedTaxon nodes, which ensures that only the major taxonomic ranks (superkingdom, phylum, class, ...) are included.
The second loop obtains the corresponding attribute type for an object; it operates on the initial taxon as well as on its parents obtained by the first loop. The GetTaxonRank and GetRankName nodes provide the name of the corresponding rank, e.g. “class”; the StringGenerator and Concat nodes are then used to build the attribute type name, e.g. “NCBI_class”. This value is passed to the CreateMGXAttributeType node to create the corresponding attribute type, which is returned to the CreateHierarchicalMGXAttribute node.
The third and final loop maps a data object to its name, which is used as the attribute's value; it is built using the GetTaxonName node, which delivers its output back to the CreateHierarchicalMGXAttribute node.
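
For readers who prefer code to node diagrams, the first two loops can be approximated in a few lines of Python. This is only an illustrative sketch and not part of the workflow system: the toy TAXA table, the MAJOR_RANKS set and both function names are assumptions made for this example, and the real nodes operate on taxon objects rather than on plain strings.

```python
from typing import Optional

# Assumed set of major ranks; the text only lists "superkingdom, phylum, class, ...".
MAJOR_RANKS = {"superkingdom", "phylum", "class", "order",
               "family", "genus", "species"}

# Toy taxonomy for illustration only: name -> (rank, parent name).
TAXA = {
    "root": ("no rank", None),
    "cellular organisms": ("no rank", "root"),
    "Bacteria": ("superkingdom", "cellular organisms"),
    "Proteobacteria": ("phylum", "Bacteria"),
    "Gammaproteobacteria": ("class", "Proteobacteria"),
}


def get_major_ranked_parent(name: str) -> Optional[str]:
    """Loop 1: walk upwards until a parent with a major rank is found."""
    parent = TAXA[name][1]
    while parent is not None and TAXA[parent][0] not in MAJOR_RANKS:
        parent = TAXA[parent][1]  # skip parents without a major rank
    return parent


def attribute_type_name(name: str) -> str:
    """Loop 2: concatenate a fixed prefix with the name of the taxon's rank."""
    return "NCBI_" + TAXA[name][0]
```

In this toy table, the parent of "Gammaproteobacteria" is "Proteobacteria", while "Bacteria" has no major-ranked parent because both "cellular organisms" and "root" carry no rank.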

Thus, the three loops might be termed Get parent, Get AttributeType for object and Generate value.

The CreateHierarchicalMGXAttribute node emits a hierarchical MGXAttribute for the initial data object, with the corresponding AttributeType provided by loop 2 and the MGXAttribute's value obtained using loop 3. Internally, loop 1 is applied repeatedly until the root node is reached, with all intermediate results passing through loops 2 and 3, thus generating a single path of hierarchical attributes within the taxonomic tree. The output of the CreateHierarchicalMGXAttribute node is connected to the AnnotateAttribute node as in the previous example.
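
The interplay of the three loops can be sketched as a small Python function that takes the loops as callbacks. This is a simplified model under stated assumptions, not the node's actual implementation: the HierarchicalAttribute class, the function create_hierarchical_attribute and the toy PARENT and RANK tables are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class HierarchicalAttribute:
    """Hypothetical stand-in for a hierarchical MGXAttribute."""
    attribute_type: str
    value: str
    parent: Optional["HierarchicalAttribute"] = None


def create_hierarchical_attribute(
    obj: str,
    get_parent: Callable[[str], Optional[str]],   # loop 1
    get_attribute_type: Callable[[str], str],     # loop 2
    generate_value: Callable[[str], str],         # loop 3
) -> HierarchicalAttribute:
    # Bottom-up: apply loop 1 repeatedly until no parent is left.
    path: List[str] = []
    current: Optional[str] = obj
    while current is not None:
        path.append(current)
        current = get_parent(current)

    # Pass every object on the path through loops 2 and 3, starting at
    # the root so that each attribute can reference its parent.
    attribute: Optional[HierarchicalAttribute] = None
    for node in reversed(path):
        attribute = HierarchicalAttribute(
            get_attribute_type(node), generate_value(node), attribute)
    assert attribute is not None  # path always contains at least obj
    return attribute


# Toy data for illustration only.
PARENT = {"Gammaproteobacteria": "Proteobacteria",
          "Proteobacteria": "Bacteria",
          "Bacteria": None}
RANK = {"Gammaproteobacteria": "class",
        "Proteobacteria": "phylum",
        "Bacteria": "superkingdom"}

leaf = create_hierarchical_attribute(
    "Gammaproteobacteria",
    PARENT.get,
    lambda taxon: "NCBI_" + RANK[taxon],
    lambda taxon: taxon,
)
```

The returned object represents the initial taxon; following its parent references yields the single path up to the root that the text describes.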

For brevity's sake, several connections that were already explained in the previous section are hidden in the image: the CreateMGXAttributeType node needs an incoming connection providing an MGXJob, and the AnnotateAttribute node requires additional connections providing the sequence to be annotated and the start/stop coordinates of the subregion described by the annotation.

Sebastian Jaenicke, 2020-04-28