With advances in next-generation sequencing (NGS), bacterial genomics is booming. It makes it possible to explore the genetic diversity of isolates from a bacterial population, identify characteristic functions of pan-genomes, and perform microbial source tracking. Two technologies are mainly used today: the Ion Torrent PGM from Life Technologies and the MiSeq from Illumina. Both require an amplification step to bring the target DNA to a quantity the machines can detect. They generate short reads of 100 to 400 nucleotides and are compatible with paired-end protocols. The short reads are then assembled to reconstruct the entire genome of the cell.
Several methods exist to perform this assembly. As when solving a puzzle, algorithms can rely on a model (a reference genome) to rebuild the genome. In this case, called resequencing, the bioinformatics strategy is mapping, and it requires few computational resources. Otherwise, if the target organism is newly studied, de novo assembly pipelines for short reads are used. De novo assembly remains a complicated process even though reads are getting longer, and its difficulty depends on the nature of the target genome (repeat regions, DNA accessibility, etc.). Various assemblers are available, such as Velvet, Ray, MIRA, SOAPdenovo, and ABySS.
Notably, the efficiency of de novo assemblers depends on the type of sequencer and on the coverage (sequencing depth). Too low a depth leaves some genetic regions without any reads, while too high a depth increases the risk of artefacts and decreases the quality of the assembly. The Ray assembler has been reported to be more effective for MiSeq data at 100-fold coverage, and MIRA for Ion Torrent data at 25-fold coverage. For more information: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3625192/ and https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-14-675.
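To make the notion of coverage concrete, here is a minimal Python sketch of the usual estimate: mean coverage = number of reads × read length / genome size. The function names, the 5 Mb genome size, and the read lengths below are illustrative assumptions, not values from the studies cited above.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Estimate mean coverage (depth) as total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target mean coverage (rounded up)."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical bacterial genome of 5 Mb (assumption for illustration only).
GENOME_SIZE = 5_000_000

# Reads needed for 100-fold coverage with 250-nt reads (assumed read length).
miseq_reads = reads_needed(100, 250, GENOME_SIZE)

# Reads needed for 25-fold coverage with 200-nt reads (assumed read length).
iontorrent_reads = reads_needed(25, 200, GENOME_SIZE)

print(miseq_reads, iontorrent_reads)
```

Such an estimate can help decide whether a run should be subsampled before assembly when the depth is well above the target. It is only an average: real coverage varies along the genome.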
NGS has two main limitations: the short length of the reads and the amplification steps, which introduce errors. Third-generation sequencers can sequence a single DNA molecule in its native form. They also produce longer reads, from 4 to 20 kb. This makes the assembly step easier and leads to better contigs and scaffolds.
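One common way to quantify how contiguous a set of contigs is, although the text above does not name a specific metric, is the N50: the length such that contigs of at least that length contain half or more of all assembled bases. The following is a minimal sketch; the function name and the example lengths are illustrative assumptions.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total reaches half of the assembly size.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Two hypothetical assemblies with the same total size (60 kb):
fragmented = [5_000] * 12          # many short contigs
contiguous = [40_000, 15_000, 5_000]  # a few long contigs

print(n50(fragmented), n50(contiguous))
```

With the same total size, the assembly made of fewer, longer contigs gets the higher N50, which is the kind of improvement longer reads are expected to bring.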