Select this option if you have protein motifs but DNA sequences. The DNA sequences will be translated on the fly to protein at each frame for comparison with the motifs.
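
As a rough illustration of what translating "at each frame" means, the sketch below translates the three forward reading frames of a DNA sequence using the standard genetic code. The function names are hypothetical and this is not MAST's own code; dropping trailing partial codons and writing codons with ambiguous bases as X are assumptions of the sketch.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_frame(dna, frame):
    """Translate one forward frame (0, 1 or 2); trailing partial codons are dropped."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Assumption: any codon not in the table (ambiguous bases) becomes 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def forward_frames(dna):
    """Return the protein translations of the three forward reading frames."""
    return [translate_frame(dna, frame) for frame in range(3)]


print(forward_frames("ATGGCCTAA"))
```

The reverse-strand frames could be obtained the same way by first reverse-complementing the sequence.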


Select this option if you want MAST to remove any motifs that are too similar to other motifs in your query. Having pairs of motifs that are too similar will cause the E-values computed by MAST to be inaccurate. If you don't select this option, problematic motifs will be highlighted in your MAST output.


MAST only displays sequences that match your query with E-values below the threshold you specify here. By default, sequences in the database whose matches have E-values less than 10 are displayed. If your motifs are very short or have low information content (are not very specific), it may be impossible for any sequence to achieve a low E-value. If your MAST search returns no hits, you may wish to increase the E-value display threshold and repeat the search.


MAST can ignore motifs in the query with E-values above a threshold you select. This is desirable because motifs with high E-values are unlikely to be biologically significant. If this option is disabled then MAST will use all the motifs in the query, regardless of their E-values.

This option only works for motifs where the E-value is included in the motif file. MAST does not have any capability for re-assessing the significance of a motif.


This option can improve search selectivity when erroneous matches are due to biased sequence composition.

MAST normally computes E-values and p-values using a random sequence model based on the overall letter composition of the database being searched. Selecting this option will cause MAST to use a different random model for each target sequence. The random model for each target sequence will be based on its letter composition, not that of the entire database.

Using this option will tend to give more accurate E-values and increase the E-values of compositionally biased sequences. This option may increase search times substantially if used in conjunction with E-value display thresholds over 10, since MAST must compute a new set of motif score distributions for each high-scoring sequence.
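
To make the difference between the two random models concrete, the sketch below computes letter frequencies once over a whole toy database and once for a single sequence. The function name and the sequences are made up for illustration; MAST's actual background model may involve more than raw letter frequencies.

```python
from collections import Counter


def letter_frequencies(sequences):
    """Return the relative frequency of each letter across the given sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {letter: n / total for letter, n in sorted(counts.items())}


database = ["ACGTACGT", "AAAAAAAT"]  # made-up toy sequences

# Default behaviour: one model based on the whole database.
print(letter_frequencies(database))
# With this option: a separate model for each target sequence.
print(letter_frequencies([database[1]]))
```

In this toy case the second sequence is heavily A-rich, so a model built from it alone assigns A a much higher background probability than the database-wide model does.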


MAST displays motifs that score above a threshold for all high-scoring sequences. By default, this threshold is based on the probability of the motifs without regard to the length of the sequence. The threshold was chosen with protein sequences of average length in mind. Consequently, many positions in very long sequences may match motifs with scores above this threshold by chance, making the results difficult to interpret.

Selecting this option causes the motif display threshold to take sequence length into account. This will reduce the number of weak motifs displayed in long sequences and minimize the size of the output file.


MAST can automatically generate the reverse complement strand for each nucleotide sequence in the database and treat it in one of three ways. ("Given strand" refers to the sequence as it appears in the database MAST is searching.)

combine with given strand
MAST searches for motif occurrences on the given strand and its reverse complement together, not allowing occurrences on the two strands to overlap each other, and displays them as a single sequence. This allows motifs to occur on either strand and still count toward the overall E-value of the match.
treat as separate sequence
MAST searches for motifs in both the given strand and its reverse complement, treating them as two independent sequences. As of version 4.3.2 the results are displayed together in the HTML output; in previous versions the results were displayed separately for the two strands, as though both had occurred in the database.
MAST searches only the given strand of each sequence in the database.

Note: this field has no effect when the database contains protein sequences.
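
For reference, the reverse complement of a DNA strand can be computed as in the minimal sketch below. The function name is hypothetical, and only the four unambiguous bases are handled.

```python
# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement every base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```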


If your sequences are not in a standard alphabet (DNA, RNA or protein), you must input a custom alphabet file.


Click on the menu at the left to see which of the following motif input methods are available.

Type in motifs
When this option is available you may directly input multiple motifs by typing them (or using "cut-and-paste"). First select the desired motif alphabet using the menu immediately to the left. If you select the "Custom" option then you must provide an alphabet definition in the file input that immediately follows. Warning: custom alphabets are case-sensitive. You may optionally give each motif an identifier and alternate name by inputting a line like >Identifier Alternate-Name preceding the motif. You can then enter each motif as a matrix, as sequence sites or as a regular expression. You can enter multiple motifs by typing an empty line after each motif. Individual motifs will be shown in square brackets; errors in your motifs will be highlighted in red and warnings in yellow. Mouse over an individual motif to display its sequence logo. View the examples for more information on what is possible.
Upload motifs
When this option is available you may upload a file containing motifs in MEME motif format.  This includes the outputs generated by MEME and DREME, as well as files you create using the motif conversion scripts or manually following the MEME motif format guidelines.
Databases (select category)
When this option is available you can select the category of motif database desired from the list below it. Then select the motif database from the displayed list.  Consult the motif database documentation for descriptions of all the motif databases present on this MEME Suite server.
Submitted motifs
This option is only available when you have invoked the current program by clicking on a button in the output report of a different MEME Suite program.  By selecting this option you will input the motifs sent by that program.

Typed Motifs - Matrices

You may input both probability and count matrices of either orientation and the rules described below will be used to convert the matrix into a MEME formatted motif.

Alphabet Order

The counts/probabilities are expected to be ordered based on the alphabetical ordering of their codes.  So DNA is ordered ACGT and protein is ordered ACDEFGHIKLMNPQRSTVWY. For custom alphabets the ordering goes uppercase letters (A-Z), lowercase letters (a-z), numbers (0-9) and finally the symbols '*', '-' and '.'.
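
The ordering described above can be sketched as a sort against a fixed reference string. The names below are hypothetical, and symbols outside the listed groups are simply rejected in this sketch.

```python
import string

# Uppercase letters, then lowercase letters, then digits, then '*', '-' and '.'.
SYMBOL_ORDER = (string.ascii_uppercase + string.ascii_lowercase
                + string.digits + "*-.")


def order_symbols(symbols):
    """Sort alphabet symbols into the order expected for matrix columns.

    Raises ValueError for any symbol that is not in SYMBOL_ORDER.
    """
    return sorted(symbols, key=SYMBOL_ORDER.index)


print("".join(order_symbols("TGCA")))
print("".join(order_symbols("a.B1*A-")))
```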

Matrix Orientation

Matrix motifs may be input with either one position per row (preferred) which is called row orientation, or one position per column which is called column orientation.  The orientation is determined by picking which dimension (row or column) is equal to the alphabet size.  If both dimensions are equal to the alphabet size then row orientation is assumed.  If neither dimension is equal to the alphabet size then the closest that is still smaller than the alphabet size is picked, however if both are equally smaller then column orientation is assumed.  Finally if none of the above rules work to determine the orientation then row orientation is assumed.
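
One possible reading of these rules is sketched below. It assumes that "row orientation" means each row holds one value per alphabet symbol, so the number of columns is the dimension compared with the alphabet size. The function name is hypothetical and the actual parser may differ in edge cases.

```python
def guess_orientation(n_rows, n_cols, alphabet_size):
    """Return 'row' (one position per row) or 'column' (one position per column)."""
    # A dimension equal to the alphabet size decides; a tie goes to row orientation.
    if n_cols == alphabet_size:
        return "row"
    if n_rows == alphabet_size:
        return "column"
    # Otherwise prefer the dimension closest to, but smaller than, the alphabet size.
    rows_smaller = n_rows < alphabet_size
    cols_smaller = n_cols < alphabet_size
    if rows_smaller and cols_smaller:
        if n_rows == n_cols:
            return "column"  # both equally smaller
        return "row" if n_cols > n_rows else "column"
    if cols_smaller:
        return "row"
    if rows_smaller:
        return "column"
    return "row"  # fallback when no rule applies


print(guess_orientation(6, 4, 4))  # six positions over a DNA alphabet
```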

Site counts

Once the orientation is determined, the sum of the numbers that make up the first position is calculated and rounded to the nearest integer.  If that value is larger than 1 then the matrix is assumed to be a count matrix and that value is used as the site count, otherwise the matrix is assumed to be a probability matrix and a site count of 20 is used.
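
The site count rule can be sketched as follows. The function name is hypothetical, and rounding exact halves upward is an assumption, since the text only says "nearest integer".

```python
import math

DEFAULT_SITE_COUNT = 20  # used when the matrix looks like a probability matrix


def site_count(first_position):
    """Classify a matrix from the values of its first motif position."""
    # Assumption: exact halves round up.
    total = math.floor(sum(first_position) + 0.5)
    if total > 1:
        return ("count", total)
    return ("probability", DEFAULT_SITE_COUNT)


print(site_count([7, 7, 7, 7]))
print(site_count([0.25, 0.25, 0.25, 0.25]))
```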

Converting to a normalized probability matrix

Once the orientation is determined then each number in the matrix is converted to a normalized probability by dividing by the sum of all the numbers for that motif position.  If any numbers are missing they are assumed to have the value zero.  As a special case if all numbers in a motif position have the value zero then they are given the uniform probability of 1 / alphabet size.
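
A minimal sketch of this normalization for a single motif position is shown below. The function name is hypothetical, and missing values are assumed to be at the end of the position for simplicity.

```python
def normalize_position(values, alphabet_size):
    """Convert one motif position to probabilities that sum to 1."""
    # Missing values are treated as zero (assumed here to be trailing).
    padded = list(values) + [0.0] * (alphabet_size - len(values))
    total = sum(padded)
    if total == 0:
        # Special case: an all-zero position becomes uniform.
        return [1.0 / alphabet_size] * alphabet_size
    return [v / total for v in padded]


print(normalize_position([2, 0, 6, 0], 4))
print(normalize_position([0, 0, 0, 0], 4))
```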

Yellow highlighting and red annotations

Red asterisks (*) indicate where the parser thinks values are missing.  A yellow highlighted row or column with a red number at the end indicates that the counts for that position don't sum to the same count as the first position. The red number shows the difference. If the red number is negative then that position sums to less than the first position; if it is positive then it sums to more than the first position.
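
The red numbers correspond to a simple per-position difference, which might be computed as in this sketch. The function name and the example matrix are made up for illustration.

```python
def position_differences(positions):
    """Return each position's sum minus the sum of the first position."""
    first_total = sum(positions[0])
    return [sum(position) - first_total for position in positions]


matrix = [
    [7, 7, 7, 7],    # first position: sums to 28
    [10, 10, 4, 4],  # also 28, so no annotation
    [7, 7, 7, 6],    # 27, so it would be annotated with -1
]
print(position_differences(matrix))
```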


Typed Motifs - Sequence Sites or Regular Expressions

You may input one or more sites of the motif, using ambiguity codes or bracket expressions to represent multiple possibilities for a single motif position.

Ambiguity Codes

The DNA and protein alphabets include additional codes that each represent several possible letters. For example, the DNA alphabet includes W (for weak), which indicates that the given position could be either an A (for adenosine) or a T (for thymidine).

Bracket Expressions

Bracket expressions also group together multiple codes so they share a single position.  Their syntax is an opening square bracket '[' followed by one or more codes and a closing square bracket ']'. For example, with a DNA motif the bracket expression [AT] means that both A and T are acceptable and is equivalent to the ambiguity code W.  Any repeats of a base in a bracket expression are ignored, so for example the DNA bracket expression [AAT] has the same effect as [AT] or [AW] or W.
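
The sketch below shows one way a DNA site containing ambiguity codes and bracket expressions could be expanded into a set of acceptable bases per position. The function name is hypothetical and the error handling is an assumption; it is not the parser used by the form.

```python
# DNA codes and the bases each one stands for.
DNA_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_site(site):
    """Return a list with one set of acceptable bases per motif position."""
    positions = []
    i = 0
    while i < len(site):
        if site[i] == "[":
            end = site.index("]", i)  # ValueError if the bracket is never closed
            codes = site[i + 1:end]
            i = end + 1
        else:
            codes = site[i]
            i += 1
        if not codes:
            raise ValueError("empty bracket expression")
        bases = set()
        for code in codes:
            bases.update(DNA_CODES[code])  # KeyError for an unknown code
        positions.append(bases)  # a set, so repeated bases collapse
    return positions


print(expand_site("[AAT]CW"))
```

Because each position is a set, [AAT], [AT], [AW] and W all expand to the same result.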

Multiple sites

When only one site is provided the site count is set to 20. However, you can control the motif more precisely by providing multiple sites.  Each of these sites can still contain ambiguity codes and bracket expressions, but a single count will be divided among the selected bases for each position.  When multiple sites are provided the site count will be set to the number of sites provided.
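
As an illustration of how one count per site can be divided among the selected bases, the sketch below builds a fractional count matrix from sites that have already been expanded into sets of bases. The function name and input representation are hypothetical, and how the default count of 20 is applied to a single-site motif is not modeled beyond returning it.

```python
def count_matrix(sites, alphabet="ACGT"):
    """Build per-position counts from sites given as lists of base sets.

    All sites are assumed to have the same length.
    """
    width = len(sites[0])
    matrix = [[0.0] * len(alphabet) for _ in range(width)]
    for site in sites:
        for pos, bases in enumerate(site):
            share = 1.0 / len(bases)  # one count split evenly among the bases
            for base in bases:
                matrix[pos][alphabet.index(base)] += share
    site_count = len(sites) if len(sites) > 1 else 20
    return matrix, site_count


# Two sites of width 2: "AW" and "AT".
sites = [[{"A"}, {"A", "T"}], [{"A"}, {"T"}]]
print(count_matrix(sites))
```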


DNA Alphabet

DNA motifs support the standard 4 codes for the bases: adenosine (A), cytidine (C), guanosine (G) and thymidine (T) as well as supporting the following ambiguity codes.

Weak        W  A, T
Strong      S  C, G
Amino       M  A, C
Keto        K  G, T
Purine      R  A, G
Pyrimidine  Y  C, T
Not A       B  C, G, T
Not C       D  A, G, T
Not G       H  A, C, T
Not T       V  A, C, G
Any         N  A, C, G, T

Protein Alphabet

Protein motifs support the standard 20 codes for the amino acids: Alanine (A), Arginine (R), Asparagine (N), Aspartic acid (D), Cysteine (C), Glutamic acid (E), Glutamine (Q), Glycine (G), Histidine (H), Isoleucine (I), Leucine (L), Lysine (K), Methionine (M), Phenylalanine (F), Proline (P), Serine (S), Threonine (T), Tryptophan (W), Tyrosine (Y) and Valine (V) as well as supporting the following ambiguity codes.

Asparagine or aspartic acid        B  N, D
Glutamine or glutamic acid         Z  E, Q
Leucine or Isoleucine              J  I, L
Unspecified or unknown amino acid  X  A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y

Note that the two amino acids Selenocysteine (U) and Pyrrolysine (O) are not supported by the MEME Suite.


Typed Motifs - Examples

Single site motif using ambiguity codes N and R or bracket expressions. These give an approximation of the other motifs below.


Multiple site motif. This lists all 28 sites and gives the same result as the count matrix below.

Count matrix motif showing row and column orientations.


Note that all of these can be used with an identifier and alternate name like these 3 count matrix motifs from Jaspar.


When enabled this field supports selecting motifs from the file with a space separated list of motif identifiers and/or their positions in the file.

Any numbers in the range 1 to 999 are assumed to refer to the position of the selected motif in the file, so the entry "3" always refers to the third motif.  Any other entry is assumed to be a motif identifier.

Motif identifiers cannot start with a dash and can only contain alphanumeric characters as well as colon ':', underscore '_', dot '.' and dash '-'.
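
The rules above might be applied to the field's contents roughly as follows. The function name is hypothetical, and treating numbers with leading zeros (such as "007") as identifiers rather than positions is an assumption of this sketch.

```python
import re

# Positions: whole numbers from 1 to 999 (no leading zeros, by assumption).
POSITION_RE = re.compile(r"[1-9][0-9]{0,2}")
# Identifiers: alphanumerics, ':', '_', '.', '-', and not starting with a dash.
IDENTIFIER_RE = re.compile(r"[A-Za-z0-9:_.][A-Za-z0-9:_.\-]*")


def parse_selection(text):
    """Split a space separated selection into file positions and motif identifiers."""
    positions, identifiers = [], []
    for token in text.split():
        if POSITION_RE.fullmatch(token):
            positions.append(int(token))
        elif IDENTIFIER_RE.fullmatch(token):
            identifiers.append(token)
        else:
            raise ValueError(f"invalid motif identifier: {token!r}")
    return positions, identifiers


print(parse_selection("3 MA0001.1 1000 my-motif"))
```

Note that "1000" falls outside the range 1 to 999, so it is treated as an identifier rather than a position.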


Select the desired motif database.

Consult the motif database documentation for descriptions of all the DNA and RNA motif databases present on this MEME Suite server.


This option can help change the alphabet of motifs from a base alphabet to a derived alphabet.

This might be useful if you need to compare an extended DNA motif with a library of DNA motifs, or if you wish to compare RNA motifs to DNA motifs.  Note that this option will also let you do nonsensical things like compare Protein motifs to DNA motifs so use it with care.

The derived alphabet must have all the core symbols of the alphabet that it is derived from. For example if the alphabet is derived from DNA it must have ACGT as core symbols. Expanding the alphabet adds frequencies of zero for every symbol in the derived alphabet that did not exist in the base alphabet.
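
The expansion step can be sketched as below, where each motif position gains a zero frequency for every symbol that exists only in the derived alphabet. The function name is hypothetical, and the extra symbol "m" in the example is a made-up addition to DNA used only for illustration.

```python
def expand_motif(positions, base_alphabet, derived_alphabet):
    """Re-express motif positions over a derived alphabet.

    positions: list of frequency lists ordered like base_alphabet.
    """
    missing = set(base_alphabet) - set(derived_alphabet)
    if missing:
        raise ValueError(f"derived alphabet lacks core symbols: {sorted(missing)}")
    index = {symbol: i for i, symbol in enumerate(base_alphabet)}
    return [
        # Symbols new to the derived alphabet get a frequency of zero.
        [position[index[s]] if s in index else 0.0 for s in derived_alphabet]
        for position in positions
    ]


print(expand_motif([[0.1, 0.2, 0.3, 0.4]], "ACGT", "ACGTm"))
```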

Click on the menu at the left to see which of the following sequence input methods are available.
Type in sequences
When this option is available you may directly input multiple sequences by typing them. Sequences must be input in FASTA format.
Upload sequences
When this option is available you may upload a file containing sequences in FASTA format.
Databases (select category)
When this option is available you may first select a category of sequence database from the list below it. Two additional menus will then appear where you can select the particular database and version desired, respectively. The full list of available sequence databases and their descriptions can be viewed here.
Submitted sequences
This option is only available when you have invoked the current program by clicking on a button in the output report of a different MEME Suite program. By selecting this option you will input the sequences sent by that program.
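
For readers unfamiliar with FASTA format, the minimal sketch below reads FASTA text into (name, sequence) pairs: each record starts with a ">" header line followed by one or more lines of sequence. The function name is hypothetical, and any validation performed by the server is not modeled here.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name = header[0] if header else ""  # first word is the sequence name
            chunks = []
        elif name is not None:
            chunks.append(line)
        else:
            raise ValueError("sequence data found before the first header line")
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


example = ">seq1 demo sequence\nACGT\nAC\n>seq2\nTTT\n"
print(parse_fasta(example))
```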

Select an available sequence database from this menu.


Select an available version of the sequence database from this menu.


Select an available tissue/cell-specificity from this menu.


Selecting this option will filter the sequence menu to only contain databases that have additional information that is specific to a tissue or cell line.

This option causes MEME Suite to use tissue/cell-specific information (typically from DNase I or histone modification ChIP-seq data) encoded as a position-specific prior that has been created by the MEME Suite create-priors utility. You can see a description of the sequence databases for which we provide tissue/cell-specific priors here.

Note that you cannot upload or type in your own sequences when tissue/cell-specific scanning is selected.


Enter text naming or describing this analysis. The job description will be included in the notification email you receive and in the job output.


Data Submission Form

Find sequences that match a set of motifs.

Select the sequence alphabet

Use sequences with a standard alphabet or specify a custom alphabet.

Input the motifs

Enter motifs you wish to scan with.

Input the sequences

Enter sequences or select the database you want to scan for matches to motifs.

Input job details

(Optional) Enter your email address.

(Optional) Enter a job description.

Advanced options

Reverse-complement strand handling

Translate DNA sequences to protein?

Set a sequence display threshold

Use individual sequence composition

Scale the motif display threshold

Filter motifs by E-value threshold

Remove redundant motifs from query?


This server has the job quota set to 10 unfinished jobs per hour.

Note: if the combined form inputs exceed 80 MB the job will be rejected.