Quality control

Figure 2.14: After selecting a sequencing run object, the Quality Control component can be opened from its menubar icon (red circle).
Image QCopen

After sequence import, the quality control reports generated within MGX should be inspected (Figure 2.14) before proceeding with data analysis. MGX currently offers three types of QC reports: the distribution of GC content, the sequence length distribution, and the nucleotide distribution within the DNA sequences. These can be used to evaluate overall sequence data quality and to check for possible signs of contamination. For demonstration purposes, the data shown is taken from the artificial simHC metagenome dataset created by the FAMeS project [Mavromatis et al., 2007]. The sequence data is publicly available and can be obtained from the FAMeS web site (http://fames.jgi-psf.org/Retrieve_data.html).
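To make the three report types more concrete, the following Python sketch computes comparable statistics from a small list of reads. It is not MGX's implementation: the function names `gc_content` and `qc_summary`, the choice of ten GC bins, and the interpretation of the nucleotide distribution as per-position base counts are all assumptions made for illustration.

```python
from collections import Counter


def gc_content(seq):
    """Return the G+C fraction among called bases (A, C, G, T); 0.0 if none."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def qc_summary(reads, gc_bins=10):
    """Compute three simple QC statistics for an iterable of read sequences.

    Returns (gc_hist, length_hist, per_position):
      gc_hist      - list of read counts per GC-content bin
      length_hist  - Counter mapping read length to number of reads
      per_position - list of Counters with base counts at each read position
                     (anything other than A/C/G/T is counted as 'N')
    """
    gc_hist = [0] * gc_bins
    length_hist = Counter()
    per_position = []
    for read in reads:
        read = read.upper()
        length_hist[len(read)] += 1
        # A GC fraction of exactly 1.0 is placed in the last bin.
        bin_index = min(int(gc_content(read) * gc_bins), gc_bins - 1)
        gc_hist[bin_index] += 1
        for pos, base in enumerate(read):
            if pos == len(per_position):
                per_position.append(Counter())
            per_position[pos][base if base in "ACGT" else "N"] += 1
    return gc_hist, length_hist, per_position


# Hypothetical toy reads, not taken from the simHC dataset.
reads = ["ACGTACGT", "GGCCGGCC", "ATATNNAT", "ACGT"]
gc_hist, length_hist, per_position = qc_summary(reads)
```

In such a summary, a high count of `N` in the per-position counters would correspond to a high fraction of uncalled bases, and a second peak in the GC histogram could hint at contamination.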

Figure 2.15: GC distribution of the simHC dataset.
Image QCgc

Figure 2.16: Nucleotide distribution of the simHC dataset. A high fraction of uncalled bases is apparent from the chart.
Image QCnuc

Figure 2.17: Read length distribution of the simHC dataset.
Image QCreadlen

Figure 2.18: Nucleotide distribution examples.
(a) High-GC (65%) data (Image highGCnucl)
(b) Amplicon data (Image ampliconNucl)
(c) Adapter residue (Image adapterNucl)

Depending on the kind of sequence data, different patterns may emerge (Figure 2.18), which may or may not warrant further action. Small amounts of adapter residue, for example, are sometimes encountered and might be considered acceptable. If additional trimming is desired, it is up to the individual researcher to check back with their sequencing provider and ask for the adapter sequences.
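Once the adapter sequences are known, their prevalence can be estimated and the residue removed before import. The sketch below is a simplified illustration using exact string matching only; it is not part of MGX, the function names are hypothetical, and the adapter string is merely a placeholder for whatever the sequencing provider reports. Dedicated trimming tools additionally tolerate sequencing errors and should be preferred for real data.

```python
def adapter_fraction(reads, adapter):
    """Return the fraction of reads containing the full adapter sequence."""
    if not reads:
        return 0.0
    return sum(adapter in read for read in reads) / len(reads)


def trim_3prime_adapter(read, adapter, min_overlap=5):
    """Remove an adapter from the 3' end of a read using exact matching.

    If the full adapter occurs in the read, everything from its first
    occurrence onwards is removed. Otherwise, a truncated adapter (a prefix
    of at least min_overlap bases) located at the very end of the read is
    removed. Reads without a match are returned unchanged.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    longest = min(len(adapter), len(read)) - 1
    shortest = max(min_overlap, 1)  # never test an empty overlap
    for overlap in range(longest, shortest - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


# Placeholder adapter; use the sequences supplied by the sequencing provider.
ADAPTER = "AGATCGGAAGAGC"
reads = ["ACGTACGTAGATCGGAAGAGCTTT", "ACGTACGTAGATCGG", "ACGTACGT"]
trimmed = [trim_3prime_adapter(read, ADAPTER) for read in reads]
```

The `min_overlap` parameter is a design choice in this sketch: short overlaps match by chance more often, so requiring several matching bases reduces the risk of trimming genuine sequence.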

Sebastian Jaenicke, 2020-04-28