Scaffolding pre-assembled contigs using paired-read data (version 2.0)
Insert size of the paired-end reads. If you are not sure about this value, please consult your sequencing facility.

How it works

This module uses SSPACE, a stand-alone scaffolding tool that connects pre-assembled contigs by using paired-read information. SSPACE begins by mapping paired-end reads back to the contigs. It then extends existing contigs with unmapped reads, estimating where each unmapped read falls in a gap from the position of its mapped mate. After this contig-extension step, SSPACE bridges, or scaffolds, the contigs into larger assemblies using the paired-end information.
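
The bridging idea can be illustrated with a toy sketch. This is not SSPACE's implementation; it only shows, under simplified assumptions, how read pairs whose two mates map to different contigs can be counted as evidence that those contigs belong next to each other. The function name, the input layout, and the min_links threshold are hypothetical and chosen for illustration.

```python
from collections import Counter


def count_links(pair_hits, min_links=2):
    """Count read pairs whose two mates map to different contigs.

    pair_hits: iterable of (contig_of_read1, contig_of_read2) tuples
               (a simplified, hypothetical stand-in for mapping results).
    Returns the contig pairs supported by at least min_links read pairs.
    """
    links = Counter()
    for contig_a, contig_b in pair_hits:
        if contig_a != contig_b:
            # Sort the names so (A, B) and (B, A) count as the same link.
            links[tuple(sorted((contig_a, contig_b)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_links}


# Toy data: three pairs connect c1 and c2, only one connects c2 and c3.
hits = [("c1", "c2"), ("c2", "c1"), ("c1", "c1"), ("c2", "c3"), ("c1", "c2")]
print(count_links(hits))
```

With the toy data above, only the c1/c2 connection is kept, because a single linking pair is treated as too weak to justify joining c2 and c3.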

In the VirAmp pipeline, we recommend using the original paired-end dataset for this step. In addition to the advantages mentioned above, using the original dataset allows SSPACE to recover any information that was lost during the digital normalization or input filtering steps.

What to input

  1. Pre-assembled contigs from de novo assembly in FASTA format
  2. Paired-end sequence reads in either FASTA or FASTQ format
  3. Insert size of the paired-end sequencing library
  4. Format of the paired-end sequence reads, either FASTA or FASTQ

For the parameters in the advanced settings, consult the SSPACE page: http://www.baseclear.com/lab-products/bioinformatics-tools/sspace-standard/
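
When SSPACE is run outside this module, the read files and insert size listed above are typically described in a small library file. The sketch below builds one line of such a file. The column order (library name, two read files, insert size, allowed insert-size error as a fraction, read orientation) is an assumption based on the SSPACE basic documentation and should be checked against the README of the SSPACE version in use. The helper name and the file names are hypothetical.

```python
def sspace_library_line(name, reads_1, reads_2, insert_size,
                        error=0.25, orientation="FR"):
    """Format one library line (assumed SSPACE basic column layout)."""
    if insert_size <= 0:
        raise ValueError("insert size must be a positive number of bases")
    if not 0 <= error <= 1:
        raise ValueError("error is a fraction of the insert size (0 to 1)")
    if orientation not in {"FR", "RF", "FF", "RR"}:
        raise ValueError("orientation must be one of FR, RF, FF, RR")
    return f"{name} {reads_1} {reads_2} {insert_size} {error} {orientation}"


# Hypothetical file names; 400 is an example insert size in bases.
line = sspace_library_line("lib1", "reads_1.fastq", "reads_2.fastq", 400)
print(line)
```

The error fraction describes how far a pair's observed distance may deviate from the stated insert size, which is why an accurate insert size matters for this step.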

How to compute the insert size

The insert size is the observed average size of the library fragments, excluding adaptors, as measured during library preparation. Here is an example:

/static/vamp_images/insertion_size.png
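
The same arithmetic can be written as a short sketch, assuming the sequencing facility reports the average fragment size with adaptors attached and the combined length of both adaptors is known. The function name and the numbers are hypothetical and only illustrate the subtraction.

```python
def insert_size(observed_fragment_bp, adaptor_total_bp):
    """Average fragment size minus the combined adaptor length, in bases."""
    if adaptor_total_bp < 0:
        raise ValueError("adaptor length cannot be negative")
    if observed_fragment_bp <= adaptor_total_bp:
        raise ValueError("fragment size must exceed the adaptor length")
    return observed_fragment_bp - adaptor_total_bp


# Illustrative values only: a 520-base average fragment with 120 bases
# of adaptor sequence in total leaves a 400-base insert.
print(insert_size(520, 120))
```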

Reference

Boetzer, Marten, et al. "Scaffolding pre-assembled contigs using SSPACE." Bioinformatics 27.4 (2011): 578-579.