DECIPHER - Align Profiles

Align Profiles

This short example describes how to use DECIPHER to merge two alignments, as described in:

ES Wright (2015) "DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment." BMC Bioinformatics, doi:10.1186/s12859-015-0749-z.

ES Wright (2020) "RNAconTest: Comparing Tools for Noncoding RNA Multiple Sequence Alignment Based on Structural Consistency." RNA, doi:10.1261/rna.073015.119.

For an in-depth tutorial on sequence alignment, see the vignette "The Art of Multiple Sequence Alignment in R", available from the Documentation page.

How do I align two sets of aligned sequences?

There are two options for merging alignments:

1. If the two existing alignments are small enough to fit in memory, then it is simplest to use the function 'AlignProfiles'.
2. If the two alignments are too large to fit in memory, then they must first be loaded into a sequence database, where they can be aligned efficiently with the function 'AlignDB'.

These two options are described separately below. To begin, install DECIPHER and load the library in R. Then, for option 1, read the two alignments from separate FASTA files and merge them with 'AlignProfiles'.

> # load the DECIPHER library in R
> library(DECIPHER)
> 
> # specify the path to both FASTA files (in quotes)
> fas1 <- "<<REPLACE WITH PATH TO FASTA FILE1>>"
> fas2 <- "<<REPLACE WITH PATH TO FASTA FILE2>>"
> 
> # load the aligned sequences from each file
> # for RNA or amino acid alignments, use readRNAStringSet
> # or readAAStringSet instead of readDNAStringSet
> seqs1 <- readDNAStringSet(fas1)
> seqs2 <- readDNAStringSet(fas2)
> 
> # perform the alignment
> aligned <- AlignProfiles(seqs1, seqs2)
> 
> # view the alignment in a browser (optional)
> BrowseSeqs(aligned, highlight=0)
> 
> # write the alignment to a new FASTA file
> writeXStringSet(aligned,
+    file="<<REPLACE WITH PATH TO OUTPUT FASTA FILE>>")
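The merge step above can be sanity-checked on toy data before running it on real files. The following is a minimal sketch, not part of the original example: the sequences, their names, and the helper `is_aligned` are hypothetical and introduced only for illustration. It assumes that each input set is already aligned (all sequences in a set share one width) and checks that the merged result contains every input sequence at a single common width.

```r
# Minimal sketch with toy data; sequences and names are hypothetical.
library(DECIPHER)

# two small sets that are each already aligned
# (every sequence within a set has the same width)
aln1 <- DNAStringSet(c(seqA="ACGTTGCA-ACGGTCA",
                       seqB="ACGTTGCATACGGTCA"))
aln2 <- DNAStringSet(c(seqC="ACGTGCATACGGTC",
                       seqD="ACGTGCATACGG-C"))

# illustrative helper (not a DECIPHER function):
# TRUE if all sequences in the set have the same width
is_aligned <- function(x) length(unique(width(x))) == 1L

# check the inputs before merging
stopifnot(is_aligned(aln1), is_aligned(aln2))

# merge the two alignments into one
merged <- AlignProfiles(aln1, aln2)

# the result should hold every input sequence at one common width
stopifnot(length(merged) == length(aln1) + length(aln2))
stopifnot(is_aligned(merged))

# merging only inserts gaps, so the ungapped sequences are unchanged
before <- sort(unname(as.character(RemoveGaps(c(aln1, aln2)))))
after <- sort(unname(as.character(RemoveGaps(merged))))
stopifnot(identical(before, after))
```

The same three checks (aligned inputs, expected sequence count, unchanged ungapped sequences) can be applied to `seqs1`, `seqs2`, and `aligned` from the example above.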

For option 2, first build a sequence database containing both alignments. Each alignment must be imported under its own unique 'identifier', which 'AlignDB' then uses to select the two alignments to merge.

> # load the DECIPHER library in R
> library(DECIPHER)
> 
> # specify the path to both FASTA files (in quotes)
> fas1 <- "<<REPLACE WITH PATH TO FASTA FILE1>>"
> fas2 <- "<<REPLACE WITH PATH TO FASTA FILE2>>"
> 
> # specify where to create the new sequence database
> db <- "<<REPLACE WITH PATH TO SEQUENCE DATABASE>>"
> 
> # import the first alignment under the identifier "Alignment1"
> Seqs2DB(fas1, "FASTA", db, "Alignment1")
Reading FASTA file from line  1 to 1e+05


175 total sequences in table DNA.
Time difference of 0.11 secs


> # import the second alignment under the identifier "Alignment2"
> Seqs2DB(fas2, "FASTA", db, "Alignment2")
Reading FASTA file from line  1 to 1e+05


Added 175 new sequences to table DNA.
350 total sequences in table DNA.
Time difference of 0.11 secs


> 
> # perform the alignment
> AlignDB(db,
+    identifier=c("Alignment1", "Alignment2"),
+    add2tbl="OutputAlignment")
  |============================================| 100%
Added 350 aligned sequences to table OutputAlignment
with identifier 'Alignment1_Alignment2'.
> 
> # efficiently write the alignment to a new FASTA file
> DB2Seqs("<<REPLACE WITH PATH TO OUTPUT FASTA.gz FILE>>",
+    db,
+    tblName="OutputAlignment",
+    compress=TRUE)
  |============================================| 100%


Wrote 350 sequences.
Time difference of 0.26 secs
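
Before exporting, it can be useful to confirm that the merged alignment was stored as expected. The following is a minimal sketch, assuming the database path, the table name 'OutputAlignment', and the identifier 'Alignment1_Alignment2' from the session above. The variable names `n` and `firstFew` and the limit of 5 are arbitrary choices made for illustration.

```r
# Minimal sketch: inspect the merged alignment stored by AlignDB.
# Assumes the table and identifier reported in the session above.
library(DECIPHER)

# same database path used when the database was built
db <- "<<REPLACE WITH PATH TO SEQUENCE DATABASE>>"

# count the sequences stored under the merged identifier
n <- SearchDB(db,
   tblName="OutputAlignment",
   identifier="Alignment1_Alignment2",
   countOnly=TRUE)

# load only the first few aligned sequences into memory
firstFew <- SearchDB(db,
   tblName="OutputAlignment",
   identifier="Alignment1_Alignment2",
   limit=5)

# aligned sequences should all share the same width
stopifnot(length(unique(width(firstFew))) == 1L)

# view the table contents in a browser (optional)
BrowseDB(db, tblName="OutputAlignment")
```

Loading only a few sequences keeps memory use low, which matters here because option 2 is intended for alignments too large to hold in memory.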