Mutalyzer Back Translator Help

The purpose of the Mutalyzer Back Translator is to help gene variant database curators and researchers to predict the potential nucleotide substitutions underlying variants reported as predictions at the protein level only.
Back translation from amino acid substitutions to nucleotide substitutions is achieved by considering all single nucleotide substitutions that may lead to the observed amino acid change.
Note: if more than one substitution is needed to explain the observed amino acid change, the Back Translator will return no results.
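The enumeration described above can be sketched as follows. This is only an illustration under stated assumptions, not the Back Translator's actual implementation: it assumes the standard genetic code, one-letter amino acid codes, and codon numbering that starts at c.1. The name back_translate is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative translation table is needed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def back_translate(ref_aa, position, alt_aa):
    """Return all single nucleotide substitutions that turn ref_aa into alt_aa.

    Every codon encoding ref_aa is tried as a possible reference codon.
    """
    first = 3 * (position - 1) + 1  # coding position of the codon's first base
    found = set()
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa:
            continue
        for offset, ref_base in enumerate(codon):
            for alt_base in BASES:
                if alt_base == ref_base:
                    continue
                mutated = codon[:offset] + alt_base + codon[offset + 1:]
                if CODON_TABLE[mutated] == alt_aa:
                    found.add(f"c.{first + offset}{ref_base}>{alt_base}")
    return sorted(found)
```

Calling back_translate("L", 92, "F") enumerates the candidates for p.Leu92Phe. A change that needs more than one substitution, such as Leu to Asp, yields an empty list.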

We distinguish two types of back translation. In one case we know what the reference sequence of the transcript is; in the other case we do not. Knowledge of the transcript reference sequence may be important, as illustrated in the following example.

Suppose we have the predicted amino acid substitution p.Leu92Phe. Among the possible nucleotide substitutions that may lead to this description are c.276A>T, c.276G>T and c.274C>T. By knowing the reference sequence of the transcript at position 276, two out of these three options can be discarded.
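A minimal sketch of this filtering step is given here. It assumes the full reference codon is available, and the reference codon TTG used in the usage note is purely hypothetical, as is the function name filter_by_reference.

```python
import re


def filter_by_reference(candidates, codon_number, reference_codon):
    """Keep only candidates whose reference base matches the transcript.

    candidates      -- descriptions such as "c.276G>T"
    codon_number    -- amino acid position, e.g. 92
    reference_codon -- the three reference bases of that codon (assumed known)
    """
    first = 3 * (codon_number - 1) + 1  # coding position of the first base
    kept = []
    for description in candidates:
        match = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", description)
        if match is None:
            continue  # not a simple substitution; ignored in this sketch
        offset = int(match.group(1)) - first
        if 0 <= offset < 3 and reference_codon[offset] == match.group(2):
            kept.append(description)
    return kept
```

With a hypothetical reference codon TTG at codon 92, only c.276G>T remains from the three candidates above.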

Known transcript reference

The Mutalyzer Back Translator has two methods of retrieving the transcript reference sequence:
1) Directly, by providing the transcript accession number (e.g., NM_003002.3).
2) Indirectly, by providing the protein accession number (e.g., NP_002993.1).
In the latter case, an attempt is made to link the protein accession number to the corresponding transcript accession number using the NCBI databases. When this succeeds, the two methods provide equally reliable results. Otherwise, a warning is issued and the fallback method described below is used.
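The decision logic can be sketched as follows. This is an assumption-laden illustration only: the dictionary protein_links stands in for the NCBI lookup, and resolve_transcript is a hypothetical name rather than part of Mutalyzer.

```python
def resolve_transcript(accession, protein_links):
    """Return (transcript_accession_or_None, warnings).

    protein_links is a hypothetical stand-in for the NCBI lookup: a mapping
    from protein accession numbers to transcript accession numbers.
    """
    warnings = []
    if accession.startswith("NM_"):
        # Direct method: the transcript accession was supplied.
        return accession, warnings
    # Indirect method: try to link the protein to its transcript.
    transcript = protein_links.get(accession)
    if transcript is None:
        warnings.append(
            "No nucleotide reference sequence could be found for "
            + accession
            + "; falling back to back translation without a reference."
        )
    return transcript, warnings
```

A result of None signals that the back translation has to proceed without a transcript reference.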

Unknown transcript reference

Even if the transcript reference is not known, it is possible to do a meaningful back translation by considering all possible reference codons. In general this method will yield more possibilities, but not always. The amino acid substitution p.Asp92Tyr, for example, can only be explained by a single nucleotide substitution (c.274G>T), so in this case the lack of the transcript reference sequence is not detrimental to our results.

The Mutalyzer Back Translator is aware of all substitutions for which the back translation will be more specific when a nucleotide reference sequence is supplied. If a back translation of such a substitution is requested, a warning will be issued informing the user about the possible improvement of the predictions.
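One way to decide whether a reference sequence could narrow the results is to compare the substitutions found for each possible reference codon with their union. The sketch below assumes the standard genetic code and one-letter amino acid codes. It is not necessarily how the Back Translator makes this decision, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def substitutions_for_codon(codon, alt_aa):
    """Single-base changes (offset, ref, alt) that turn codon into alt_aa."""
    return {
        (offset, ref_base, alt_base)
        for offset, ref_base in enumerate(codon)
        for alt_base in BASES
        if alt_base != ref_base
        and CODON_TABLE[codon[:offset] + alt_base + codon[offset + 1:]] == alt_aa
    }


def reference_would_help(ref_aa, alt_aa):
    """True if some reference codon gives fewer results than the union."""
    per_codon = [
        substitutions_for_codon(codon, alt_aa)
        for codon, aa in CODON_TABLE.items()
        if aa == ref_aa
    ]
    union = set().union(*per_codon)
    return any(subs != union for subs in per_codon)
```

Under these assumptions, reference_would_help("D", "Y") is False, matching the p.Asp92Tyr example, while reference_would_help("L", "F") is True.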

Examples

For the following substitution, the Back Translator can find a link to the transcript reference sequence.

NP_002993.1:p.Asp92Glu

The results are fully HGVS compliant and can be used directly in the Name Checker:

NM_003002.3:c.276C>A
NM_003002.3:c.276C>G

For the next substitution, the link to the transcript reference sequence cannot be found. Without this knowledge the number of possibilities cannot be restricted, which is detrimental to the output.

NP_000000.0:p.Leu92Phe

Two warnings will be issued: one stating that no nucleotide reference sequence could be found; the other informing the user that the back translation could be improved by supplying this reference sequence. In addition, a list of possible nucleotide substitutions is given.

UNKNOWN:c.274C>T
UNKNOWN:c.276G>T
UNKNOWN:c.276G>C
UNKNOWN:c.276A>C
UNKNOWN:c.276A>T

These variant descriptions cannot be used directly in the Name Checker or other interfaces because they lack the required accession numbers.

Last modified on Nov 9, 2015 6:38:49 PM