Algorithm:
FindingSemo constructs a suffix tree of the input sequences. Motifs are found
using this tree.
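
As a rough illustration of how a suffix structure exposes shared substrings, here is a minimal sketch. It is not FindingSemo's implementation: it uses a plain, uncompressed suffix trie truncated at a maximum depth rather than a true suffix tree. The names `Node`, `build_suffix_trie`, `common_substrings`, `max_len` and `min_seqs` are hypothetical and chosen only for this example.

```python
class Node:
    """One trie node; the path from the root spells a substring."""

    def __init__(self):
        self.children = {}
        self.count = 0        # total occurrences of this substring
        self.seq_ids = set()  # indices of the sequences that contain it


def build_suffix_trie(sequences, max_len):
    """Insert every suffix of every sequence, truncated to max_len characters."""
    root = Node()
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq)):
            node = root
            for ch in seq[start:start + max_len]:
                node = node.children.setdefault(ch, Node())
                node.count += 1
                node.seq_ids.add(seq_id)
    return root


def common_substrings(root, length, min_seqs):
    """Return {substring: (occurrences, n_sequences)} for substrings of the
    given length that appear in at least min_seqs sequences."""
    found = {}
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if len(prefix) == length:
            if len(node.seq_ids) >= min_seqs:
                found[prefix] = (node.count, len(node.seq_ids))
            continue
        for ch, child in node.children.items():
            stack.append((child, prefix + ch))
    return found


if __name__ == "__main__":
    seqs = ["ACDEFG", "XXCDEYY", "CDECDE"]
    trie = build_suffix_trie(seqs, max_len=3)
    print(common_substrings(trie, length=3, min_seqs=3))
```

Each node stores both the total number of occurrences and the set of sequences containing the substring, which correspond to the two counts reported per motif. A compressed suffix tree would give the same information in less memory for long inputs.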
FindingSemo can:
- Find common motifs in a set of sequences
- Annotate the likely function of a protein by detecting plausible signature motifs in that sequence
1. To find motifs: FindingSemo takes as input multiple sequences in FASTA format (see the example for a sample file). It finds common motifs within these sequences. The output lists the motifs found, the number of times each motif was detected, the number of sequences in which it was found, and a score for the motif based on the observed/expected ratio. The larger the score, the greater the motif's significance.
2. To annotate function: significant sequence motifs were pre-detected in sequences from the UniProt50 dataset and correlated with the functions of the proteins that contain them. For functional annotation, we pick out the motifs in the input sequence that have strong functional correlations.
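
The kind of output described in step 1 can be sketched as follows. The exact scoring formula used by FindingSemo is not given here, so this example assumes a simple background model in which residues occur independently at their overall frequencies, and it only handles exact (ungapped) motifs. The function names `parse_fasta` and `motif_report` are hypothetical.

```python
from collections import Counter
from math import prod


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def motif_report(sequences, motif):
    """Return (occurrences, n_sequences, observed/expected) for one exact motif."""
    k = len(motif)
    # Count overlapping occurrences of the motif in each sequence.
    per_seq = [
        sum(seq[i:i + k] == motif for i in range(len(seq) - k + 1))
        for seq in sequences
    ]
    observed = sum(per_seq)
    n_sequences = sum(1 for c in per_seq if c > 0)
    # Assumed background: residues drawn independently at their overall frequencies.
    residues = Counter("".join(sequences))
    total = sum(residues.values())
    p_motif = prod(residues[ch] / total for ch in motif)
    # Number of positions at which a motif of length k could start.
    windows = sum(max(len(seq) - k + 1, 0) for seq in sequences)
    expected = p_motif * windows
    ratio = observed / expected if expected > 0 else float("inf")
    return observed, n_sequences, ratio


if __name__ == "__main__":
    fasta = ">a\nACAC\n>b\nAC\nGT\n"
    seqs = [seq for _, seq in parse_fasta(fasta)]
    print(motif_report(seqs, "AC"))
```

A ratio above 1 means the motif occurs more often than the assumed background predicts, which matches the statement that larger scores indicate greater significance. A real scoring scheme would likely use a richer background model than this one.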