
Mutalyzer Name Checker Help

The Mutalyzer Name Checker can check the correctness of a variant description under the following conditions:

1) The Syntax Checker is able to parse the description.

2) A valid Reference Sequence record is provided.

3) The reference sequence record contains all the sequence affected by the variant description.

4) The reference sequence record annotation contains sufficient information to support the selected position numbering scheme.

5) The semantic nomenclature rules applicable to the variant description are supported by the Name Checker.

Variant description format

If you are not familiar with the HGVS standard human sequence variant nomenclature, try the Name Generator first or check Variant Descriptions.

The Name Checker expects sequence variant descriptions in the following format:

<Accession Number>.<version number>:<sequence type>.<variant>

Examples:
NM_003002.1:c.5delC
AL449423.14:g.61866_85191del

If the reference sequence record contains multiple genes, transcript variants or protein isoforms and position numbering becomes ambiguous, this format is extended to:

<Accession Number>.<version number><(Gene Symbol)>:<sequence type>.<variant>
The gene symbol has to be extended with transcript variant or protein isoform numbers (e.g., _v001 or _i001, respectively), if multiple transcript variants or protein isoforms are annotated.

Example:
The genomic description AL449423.14:g.61866_85191del is equivalent to the following unambiguous descriptions:

8 descriptions relative to CDKN2A transcript variants:
AL449423.14(CDKN2A_v001):c.-271-u19352_234del
AL449423.14(CDKN2A_v002):c.5_400del
AL449423.14(CDKN2A_v003):n.1-u19623_508del
AL449423.14(CDKN2A_v004):n.42_437del
AL449423.14(CDKN2A_v005):n.449+371_705del
AL449423.14(CDKN2A_v006):n.481+371_565del
AL449423.14(CDKN2A_v007):n.53+371_859+d18212del
AL449423.14(CDKN2A_v008):n.1-u23242_84del

1 description relative to an MTAP transcript variant:
AL449423.14(MTAP_v005):n.*60994-u23670_*60994-u345del

2 descriptions relative to CDKN2B transcript variants:
AL449423.14(CDKN2B_v001):c.*3084+d8453_*3084+d31778del
AL449423.14(CDKN2B_v002):c.*303+d11537_*303+d34862del

1 description relative to a C9orf53 transcript variant:
AL449423.14(C9orf53_v001):c.*312+d3374_*312+d26699del
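
The description formats above can be split into their components with a regular expression. The following is a minimal Python sketch, not Mutalyzer's own parser: the pattern `DESCRIPTION_RE` and the function `split_description` are hypothetical names introduced for illustration. They cover only the general shape <Accession Number>.<version number><(Gene Symbol)>:<sequence type>.<variant>, not the full HGVS grammar, and they do not handle identifiers without a version number.

```python
import re

# Hypothetical pattern for illustration only. It captures the overall
# shape of a description, not the full HGVS grammar.
DESCRIPTION_RE = re.compile(
    r"^(?P<accession>[A-Za-z]+_?\d+)"   # accession number, e.g. NM_003002
    r"\.(?P<version>\d+)"                # version number, e.g. 1
    r"(?:\((?P<gene>[^)]+)\))?"          # optional (Gene Symbol), e.g. CDKN2A_v002
    r":(?P<seq_type>[a-z])"              # sequence type, e.g. c, g or n
    r"\.(?P<variant>.+)$"                # the variant itself, e.g. 5delC
)


def split_description(description):
    """Return the components of a variant description as a dict, or None
    if the text does not have the expected overall shape."""
    match = DESCRIPTION_RE.match(description)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for text in ("NM_003002.1:c.5delC", "AL449423.14(CDKN2A_v002):c.5_400del"):
        print(split_description(text))
```

A description that passes this check is not necessarily valid. Whether the positions exist in the reference sequence and whether the nomenclature rules are satisfied is what the Name Checker itself verifies.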

Name Checker output

The Name Checker will try to regenerate the variant sequence and apply the semantic rules of the HGVS standard human sequence variant nomenclature to name it accordingly.

Before presenting the results of the analysis, the Mutalyzer Name Checker issues warnings when it corrects entries, encounters inconsistencies or incomplete sequences or annotation, or identifies variants with potential effects on splicing.
Errors are generated when an entry cannot be processed properly (see the conditions mentioned above).

Click the link below for a Name Checker output example:

AL449423.14:g.61866_85191del

General output items

Within the input box:

  • The submitted description
  • Overview of the raw variants:

Top sequence: part of the reference sequence affected by the variant with 25 nucleotide upstream and downstream flanking sequences in 5' to 3' orientation
Bottom sequence: the variant sequence with 25 nucleotide upstream and downstream flanking sequences in 5' to 3' orientation
The raw variant description shows the variation type and the position of the variant from the start of the reference sequence.

  • The "View original variant in UCSC Genome Browser" link. Click this link to see the Mutalyzer custom variant track in the UCSC Genome Browser. Please note that the Base Position track displayed as Full will show amino acid codons for the forward orientation of the chromosomal reference sequence, whereas the codon affected might be on the reverse strand.
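
The overview of the raw variants described above can be illustrated with a minimal Python sketch for a deletion. Only the 25-nucleotide flank length is taken from the text. The function name `deletion_overview`, the toy reference sequence and the text layout (spaces around the affected part, dashes for deleted bases) are assumptions made for illustration and do not reproduce Mutalyzer's actual output.

```python
FLANK = 25  # flank length stated in the text


def deletion_overview(reference, start, end, flank=FLANK):
    """Show a deletion of positions start..end (1-based, inclusive).

    Returns (top, bottom): the affected reference stretch and the variant
    stretch, each with up to `flank` nucleotides on both sides, in the
    orientation of the reference sequence.
    """
    left = reference[max(0, start - 1 - flank):start - 1]
    deleted = reference[start - 1:end]
    right = reference[end:end + flank]
    top = left + " " + deleted + " " + right
    bottom = left + " " + "-" * len(deleted) + " " + right
    return top, bottom


if __name__ == "__main__":
    toy_reference = "ACGT" * 20  # made-up 80-nucleotide sequence
    top, bottom = deletion_overview(toy_reference, 31, 34)
    print(top)
    print(bottom)
```

Near the ends of the reference sequence the flanks are simply shorter than 25 nucleotides in this sketch.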

Under the "Name checker results:" header

Warnings and errors
See Common errors observed for examples.

Genomic description: (only shown for genomic sequence records)
The genomic description of the variant using the reference sequence specified.
If the reference sequence annotation contains a mapping to a chromosomal reference sequence, the corresponding description will be listed under the heading: Alternative chromosomal position

Description relative to transcription start: (only shown for transcript sequence records)
(Not for use in LSDBs in case of protein-coding transcripts).
The description of the variant using the non-coding transcript position numbering.
The link should not be used in combination with protein-coding transcripts, since the n. position will be interpreted as a c. position!

Affected transcripts:
Lists all descriptions relative to transcript variants of genes affected by the variant.
These descriptions are not predictions of variant effects at the RNA level.
Note: Substitution descriptions for genes transcribed in the opposite orientation use the reverse complement of the nucleotides shown in the genomic description. Positions of insertions and deletions in those transcripts can shift in the opposite direction due to the position shift rule: according to the standard nomenclature, a deletion of a G in a stretch of Gs is described using the position of the most 3' G.
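
The note above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not Mutalyzer's implementation: the function names `reverse_complement` and `shift_deletion_3prime` and the toy sequence are hypothetical, and only deletions are handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def shift_deletion_3prime(sequence, start, end):
    """Move a deletion (1-based, inclusive) to its most 3' equivalent position.

    Deleting start..end yields the same variant sequence as deleting
    start+1..end+1 whenever the first deleted base equals the base that
    directly follows the deletion, so the loop shifts while that holds.
    """
    while end < len(sequence) and sequence[start - 1] == sequence[end]:
        start += 1
        end += 1
    return start, end


if __name__ == "__main__":
    forward = "ATGGGGCA"                    # made-up sequence with a stretch of Gs
    reverse = reverse_complement(forward)   # the same stretch read as Cs
    print(shift_deletion_3prime(forward, 3, 3))
    print(shift_deletion_3prime(reverse, 3, 3))
```

In this toy sequence the deleted G is reported at position 6 on the forward strand. On the reverse strand the deleted C is also reported at position 6 of the reverse complement, which corresponds to position 3 on the forward strand, so the two descriptions point to opposite ends of the same stretch.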

Affected proteins:
Lists all descriptions relative to protein isoforms of genes affected by the variant.
The protein variant descriptions following the p. prefix are shown between parentheses to indicate that they are predictions.
The descriptions are generated by translation of the variant coding sequence under the simple assumption that the annotated splice sites unaffected by the variant are still used.
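
How a protein prediction follows from translating the variant coding sequence can be sketched in Python as below. This is a simplified illustration, not Mutalyzer's code: the coding sequence is made up, splicing is ignored entirely, and the names `translate` and `apply_substitution` are hypothetical. The codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_sequence):
    """Translate an uppercase coding DNA sequence up to the first stop codon."""
    protein = []
    for pos in range(0, len(coding_sequence) - 2, 3):
        amino_acid = CODON_TABLE[coding_sequence[pos:pos + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


def apply_substitution(sequence, position, new_base):
    """Replace the base at a 1-based position."""
    return sequence[:position - 1] + new_base + sequence[position:]


if __name__ == "__main__":
    reference = "ATGGCCGAATAA"                       # made-up CDS: Met-Ala-Glu-stop
    variant = apply_substitution(reference, 7, "T")  # c.7G>T in this toy CDS
    print(translate(reference))
    print(translate(variant))
```

Comparing the two translations shows that the substitution turns the Glu codon GAA into the stop codon TAA. A prediction of this kind is what the parentheses around the p. description signal.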

Detailed information about the selected transcript and predicted protein:
Only displayed when descriptions relative to a specific transcript or protein are checked.
Reference protein:
Reference protein sequence in single letter amino acid code
Amino acids affected by the variant are shown in red

Protein predicted from variant coding sequence:
Predicted variant protein sequence in single letter amino acid code
Amino acids not present in the reference protein are shown in red

Additional information about the transcript:

Exon information:
Transcript information extracted from the reference sequence annotation presented in tabular format
Lists all exons of the transcript with their corresponding numbers, genomic (g.) start and end positions and coding DNA (c.) start and end positions.

CDS information:
Lists the Coding sequence (CDS) start and end positions extracted from the reference sequence annotation.

Effects on Restriction sites:
Lists all restriction sites that are created or deleted by the variant. Restriction sites are identified in the sequence using Biopython. The list is created by comparing the restriction sites present in the reference sequence with those in the variant sequence.
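
The comparison step can be sketched in Python as follows. To stay self-contained, this sketch does not use Biopython. It swaps in a tiny hand-written table of three well-known enzymes, so it illustrates only the comparison of reference and variant, not the site detection Mutalyzer performs. The names `ENZYMES`, `site_counts` and `restriction_site_changes` and the example sequences are assumptions made for illustration.

```python
# Tiny hand-written table for illustration. These three recognition
# sequences are palindromic, so scanning one strand is sufficient.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def site_counts(sequence):
    """Count non-overlapping occurrences of each recognition sequence."""
    return {name: sequence.count(site) for name, site in ENZYMES.items()}


def restriction_site_changes(reference, variant):
    """Return (created, deleted) enzyme names by comparing site counts."""
    before, after = site_counts(reference), site_counts(variant)
    created = sorted(name for name in ENZYMES if after[name] > before[name])
    deleted = sorted(name for name in ENZYMES if after[name] < before[name])
    return created, deleted


if __name__ == "__main__":
    # Made-up sequences: a T>A substitution destroys the EcoRI site.
    print(restriction_site_changes("AAGAATTCAA", "AAGAATACAA"))
    # Made-up sequences: an A>C substitution creates a BamHI site.
    print(restriction_site_changes("TTGGATACTT", "TTGGATCCTT"))
```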

Legend:
Lists all genes, transcript variants and protein isoforms extracted from the reference sequence annotation and the method to link them to each other.

Links:
Allows download of the reference sequence file

Test Examples

AB026906.1:c.3_4insG

AB026906.1:c.[1del;4G>T]

AL449423.14(CDKN2A_v1):c.1_10del

UD_127955523176(DMD_v002):c.136G>T

LRG_1t1:c.266G>T

Common errors observed

Last modified on Jun 30, 2016 4:37:07 PM