Purpose: Highlight and tally predicted N-linked glycosylation sites (Nx[ST] patterns, where x can be any amino acid).
New!
Replace Ox[ST] for Nx[ST]
Details:
During glycosylation, an oligosaccharide chain is attached to asparagine (N) occurring in the tripeptide sequence N-X-S or N-X-T, where X can be any amino acid except Pro. This sequence is called a glycosylation sequon.
The N-GlycoSite tool marks and tallies the locations where this pattern occurs.
The likelihood of N-linked glycosylation at a particular site can be influenced by the sequence context in which it is embedded, so the sequon could be expanded to a 4-amino-acid NX[ST]Z pattern, in which the amino acids at positions X and Z can be important determinants of glycosylation efficiency. For example, a proline in position X or Z strongly disfavors N-linked glycosylation.
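The marking step and the 4-amino-acid context described above can be sketched as follows. This is only an illustration, not the tool's actual code: the function name `sequon_contexts` is hypothetical, the input is assumed to be a single ungapped sequence, and the N-X-[ST] definition with X other than proline is used.

```python
import re

# Zero-width lookahead so that overlapping sequons (e.g. in "NNST") are all found.
# [A-OQ-Z] is any letter except P (assumption: X may not be proline).
SEQUON = re.compile(r"(?=N[A-OQ-Z][ST])")

def sequon_contexts(seq):
    """Return (1-based position, NX[ST]Z context, Z-is-proline) for each sequon."""
    seq = seq.upper()
    hits = []
    for match in SEQUON.finditer(seq):
        start = match.start()
        context = seq[start:start + 4]  # shorter than 4 at the end of the sequence
        z_is_proline = len(context) == 4 and context[3] == "P"
        hits.append((start + 1, context, z_is_proline))
    return hits
```

The flag in the third field only reports a proline at position Z; whether such a site should be discounted is left to the reader.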
O-linked glycosylation signals are more difficult to predict, but one can estimate their positions using the NetOGlyc program at the Center for Biological Sequence Analysis.
Input:
Input can be one amino acid sequence, or an alignment of amino acid sequences, from any organism. If you just want to tally the number of N-glycosylation sites, the protein sequences do not need to be aligned. Standard sequence alignment formats are recognized.
Exclude NP[ST] pattern:
A second position proline (site pattern NP[ST]) is strongly disfavored for glycosylation. Thus the default option excludes these patterns. You may uncheck the box to include them.
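A minimal sketch of how this option could affect the tally, assuming a plain ungapped sequence. The function name `count_sequons` and the `exclude_np` parameter are hypothetical stand-ins for the checkbox, not the tool's own interface.

```python
import re

def count_sequons(seq, exclude_np=True):
    """Count Nx[ST] sites; by default NP[ST] sites are left out of the tally."""
    # [A-OQ-Z] matches any letter except P; [A-Z] also allows proline.
    middle = "[A-OQ-Z]" if exclude_np else "[A-Z]"
    # Lookahead keeps overlapping sites from hiding each other.
    pattern = re.compile(r"(?=N" + middle + r"[ST])")
    return sum(1 for _ in pattern.finditer(seq.upper()))
```

With the default setting, a sequence such as "NPSANGT" would contribute one site (NGT); with `exclude_np=False` it would contribute two.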
Exclude NN[ST][ST] pattern:
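Sites in close proximity to each other may be hindered from being glycosylated at the same time. In an NN[ST][ST] pattern, two sequons overlap: one starting at the first asparagine (N-N-[ST]) and one starting at the second (N-[ST]-[ST]). If you check the box, the tool will exclude the first asparagine but still include the second one.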
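The effect of this option can be illustrated with a short sketch. It assumes an ungapped sequence and, for brevity, does not apply the NP[ST] exclusion; the name `sequon_positions` and the `exclude_nnss` parameter are hypothetical.

```python
import re

SEQUON = re.compile(r"(?=N[A-Z][ST])")   # any Nx[ST] site, overlaps allowed
ADJACENT = re.compile(r"NN[ST][ST]")     # two overlapping sequons

def sequon_positions(seq, exclude_nnss=False):
    """Return 1-based positions of sequon asparagines.

    With exclude_nnss=True, the first asparagine of an NN[ST][ST]
    pattern is skipped and only the second one is reported.
    """
    seq = seq.upper()
    positions = []
    for match in SEQUON.finditer(seq):
        start = match.start()
        if exclude_nnss and ADJACENT.match(seq, start):
            continue  # first N of NN[ST][ST]: drop it, keep the next one
        positions.append(start + 1)
    return positions
```

For the sequence "ANNSTA", both asparagines are reported when the option is off, and only the second when it is on.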
Grouped Sequence Names:
If you are analyzing multiple sequences, you can choose how to group them in the analysis. If you are analyzing a single sequence, or you do not want to group your sequences, just ignore these options. Your sequences can be grouped by the first character in the sequence names, or by a set of characters delimiting the sequence names, or by providing a list of groups.
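The first two grouping modes might look like the sketch below. How the tool actually applies a delimiter is not specified here, so this assumes the group key is the part of the name before the first occurrence of the delimiter; both function names are hypothetical.

```python
from collections import defaultdict

def group_by_first_char(names):
    """Group sequence names by their first character."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:1]].append(name)
    return dict(groups)

def group_by_delimiter(names, delimiter="."):
    """Group names by the text before the first delimiter (an assumption)."""
    groups = defaultdict(list)
    for name in names:
        key = name.split(delimiter, 1)[0]
        groups[key].append(name)
    return dict(groups)
```

Under this assumption, a name such as 1a.US.-.US5 would fall into group "1" by first character and into group "1a" with "." as the delimiter.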
Each sequence must be on a separate line, and groups are separated by an empty line. The first item ending in ':' in a group will be taken as the group name, but this line is optional. If group names are omitted, names will be assigned as Group-1, Group-2, etc. Sequences that are not present in any group will be named 'Others' and colored gray. This is useful for highlighting some groups of sequences out of a target set.
The following can be pasted in as the "grouped sequence names" for testing with the Sample Input:
North America:
1a.US.-.HCV-H
1a.US.-.RBPRESC2C4
1a.US.-.US5
1a.US.-.SCPRESC2C9
1a.US.-.BCS1C13
1a.US.78.FM_78
1a.US.-.HCV-PT
1a.US.81.HW_81
1a.US.-.RHPRESC2D
1a.US.-.RJPRESC2D
1a.US.77.JL_77

Other:
1a.-.-.H77
1a.IT.-.I21
1a.-.-.COLONEL
1a.-.-.HCT23
1a.-.-.PHCV-1/SF9_A
1a.-.-.HCT18
1a.-.-.LTD6-2-XF224
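The group list format described above (one name per line, groups separated by an empty line, optional group name ending in ':') could be parsed roughly as follows. This is a sketch under stated assumptions, not the tool's parser: the function names are hypothetical, and unnamed groups are numbered by their position in the list, which may differ from how the tool numbers them.

```python
import re

def parse_groups(text):
    """Parse a pasted group list into {group name: [sequence names]}."""
    groups = {}
    # Groups are separated by one or more empty (or whitespace-only) lines.
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    for index, block in enumerate(blocks, start=1):
        name = None
        members = []
        for line in (ln.strip() for ln in block.splitlines()):
            if not line:
                continue
            if name is None and line.endswith(":"):
                name = line[:-1].strip()  # first item ending in ':' names the group
            else:
                members.append(line)
        # Assumption: unnamed groups are numbered by their position.
        groups[name or f"Group-{index}"] = members
    return groups

def assign_group(seq_name, groups):
    """Return the group containing seq_name, or 'Others' if it is in none."""
    for group, members in groups.items():
        if seq_name in members:
            return group
    return "Others"
```

Applied to the sample list, `parse_groups` would yield two groups, "North America" and "Other", and any sequence name missing from both would be assigned to 'Others'.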
References: