Machine Learning and Neural Networks group

Dipartimento di Ingegneria dell'Informazione
Università di Firenze
Via Santa Marta 3
50139 Firenze - Italy


DISULFIND

Cysteines Disulfide Bonding State and Connectivity Predictor

Help and References


The server: description

DISULFIND is a server for predicting the disulfide bonding state of cysteines and their disulfide connectivity starting from sequence alone. Disulfide bridges play a major role in the stabilization of the folding process for several proteins. Prediction of disulfide bridges from sequence alone is therefore useful for the study of structural and functional properties of specific proteins. In addition, knowledge about the disulfide bonding state of cysteines may help the experimental structure determination process and may be useful in other genomic annotation tasks.
Our server predicts disulfide patterns in two computational stages: (1) the disulfide bonding state of each cysteine is predicted by a BRNN-SVM binary classifier; (2) cysteines that are known to participate in the formation of bridges are paired by a Recursive Neural Network to obtain a connectivity pattern.

Options:
  • Predict both the bonding state of each cysteine in the query and the corresponding disulfide arrangement. Option: Predict bonding state + connectivity pattern.
  • Predict only the disulfide pattern of some of the cysteines. Option: Predict connectivity from user specified bonding state.
    Note: this option covers two situations:
    • You know the bonding state of each cysteine in the chain and would like to know how the oxidized cysteines are connected.
    • You know the bonding state of some of the cysteines and would like to know how they would be connected.
    If you select this option, pressing the submit button displays a page with a list of checkboxes, one for the position of each cysteine in the submitted query. Please do not use the browser's navigation or reload buttons at this stage.
    On that page, pressing the submit button triggers the prediction of connectivity among the cysteines whose checkboxes are selected, and only among those. An error is reported for obviously inconsistent selections, e.g. an odd number of selected cysteines or none at all.
Important Note: To balance computational burden against prediction reliability, DISULFIND can assign at most 5 disulfide bridges. Two remarks are relevant to this limitation. First, chains with more than 5 bridges are rare (no more than 10% of the SwissProt chains annotated with disulfide bridges). Second, prediction accuracy is already low for chains with 5 bridges because few training examples are available; predictions of patterns with 6 or more bridges would therefore be very inaccurate and could not be trusted.
As a consequence, the connectivity predictor will not return an answer if it is given (either by the bonding state predictor or the user) more than 10 oxidized cysteines.
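
The consistency rules above (a non-empty, even selection of at most 10 cysteines) can be sketched as a small check. This is only an illustration of the stated rules, not the server's own code; the function name check_selection and the wording of the messages are assumptions. Positions are 1-based, as in the output examples further down.

```python
MAX_BRIDGES = 5  # limit stated in the note above


def check_selection(sequence, selected_positions):
    """Return a list of problems with a selection of cysteine positions.

    Positions are 1-based. An empty list means the selection is
    consistent with the rules described in this section.
    """
    problems = []
    positions = sorted(set(selected_positions))
    for pos in positions:
        # Every selected position must exist and hold a cysteine.
        if not 1 <= pos <= len(sequence) or sequence[pos - 1].upper() != "C":
            problems.append(f"position {pos} is not a cysteine")
    if len(positions) == 0:
        problems.append("no cysteine selected")
    elif len(positions) % 2 == 1:
        problems.append("odd number of cysteines selected")
    if len(positions) > 2 * MAX_BRIDGES:
        problems.append(f"more than {2 * MAX_BRIDGES} cysteines selected")
    return problems
```

For example, check_selection("AVITGACERDLQCGKGTCC", [7, 13]) returns an empty list, while selecting position 7 alone is reported as an odd selection.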

Bonding State Predictor

DISULFIND employs an SVM binary classifier to predict the bonding state of each cysteine [3,4], followed by a refinement stage that collectively classifies all the cysteines in the chain. Global dependencies represented by the collective bonding state are modelled in two ways. First, the SVM receives as input both local and global descriptive attributes. Second, we add a post-processing stage consisting of a bidirectional recurrent neural network (BRNN), in an attempt to recover some of the correlation between cysteines, and a Viterbi decoder that finds the maximum likelihood bonding state assignment satisfying the constraint of an even number of disulfide bonded cysteines (interchain bonds are ignored).
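
The effect of the even-number constraint can be illustrated with a small dynamic program. The sketch below is a simplification and not the DISULFIND decoder: it assumes that each cysteine comes with an independent probability of being bonded (the name p_bonded is hypothetical), and it keeps, for each parity of the number of bonded cysteines seen so far, the most likely assignment. It also reports which cysteines end up with a state different from their individually most likely one, which is what DB_flip marks in the output.

```python
import math


def even_parity_decode(p_bonded):
    """Most likely 0/1 assignment with an even number of bonded cysteines.

    p_bonded[i] is an assumed, independent probability that cysteine i is
    disulfide bonded. Returns (states, flipped), where flipped[i] is True
    when the constraint overruled the individually most likely state.
    """
    neg_inf = float("-inf")
    # best[parity] = (log-likelihood, states) over the cysteines seen so far
    best = {0: (0.0, []), 1: (neg_inf, [])}
    for p in p_bonded:
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logarithms finite
        nxt = {0: (neg_inf, []), 1: (neg_inf, [])}
        for parity, (score, states) in best.items():
            if score == neg_inf:
                continue  # this parity is not reachable yet
            for state, logp in ((0, math.log(1.0 - p)), (1, math.log(p))):
                new_parity = parity ^ state
                candidate = score + logp
                if candidate > nxt[new_parity][0]:
                    nxt[new_parity] = (candidate, states + [state])
        best = nxt
    states = best[0][1]  # parity 0: even number of bonded cysteines
    flipped = [s != (1 if p >= 0.5 else 0) for s, p in zip(states, p_bonded)]
    return states, flipped
```

With probabilities 0.9, 0.8 and 0.6, each cysteine taken alone would be called bonded, which gives an odd count; the constrained assignment sets the least confident one to not bonded and marks it as flipped.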

Disulfide Connectivity Predictor

Predicting disulfide patterns can be seen as mapping an input sequence with annotated cysteines into an output graph representing disulfide connectivity. The input is formed by the annotated sequence and a candidate connectivity pattern. The method applied in DISULFIND, documented in [2], employs a bidirectional Recursive Neural Network trained to predict the expected distance of candidate patterns from the real one. Prediction is carried out by running the trained network on all possible connectivity patterns and choosing the one with the maximum score.
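
The search over all possible connectivity patterns can be sketched as an enumeration of every way to pair up the bonded cysteines. The trained network is not reproduced here: the score argument below is a placeholder for it, and the function names are assumptions made for this illustration. With 2B cysteines there are (2B-1)(2B-3)...1 patterns, so 10 cysteines (5 bridges) give 945 candidates.

```python
def connectivity_patterns(cysteines):
    """Yield every pairing of an even-length list of cysteine positions.

    Each pattern is a list of (position, position) tuples. An odd-length
    list yields nothing, since it cannot be fully paired.
    """
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for pattern in connectivity_patterns(remaining):
            yield [(first, partner)] + pattern


def best_pattern(cysteines, score):
    """Return the pattern with the highest score.

    `score` stands in for the trained network: any function that maps a
    pattern to a number. Assumes an even number of cysteines.
    """
    return max(connectivity_patterns(cysteines), key=score)
```

As a toy stand-in for the network, a score that prefers short sequence separations, lambda pattern: -sum(b - a for a, b in pattern), picks the pattern (7,13), (18,19) among the three possible patterns for cysteines 7, 13, 18 and 19.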

Input formats

Email Address

The email address to which the prediction is delivered when the option Batch: send output to email address is selected.
NOTE: Check that you typed your address correctly.

Query Name

An optional name for your query. We strongly suggest that you use one, especially if sending more than one query. The order in which you send your queries may not correspond to the order in which you receive the answers.

Amino Acid Sequence

The sequence of amino acids:

  • A bare sequence is accepted; FASTA format is not.
  • Spaces, newlines and tabs are ignored.
  • Non-alphabetical characters cause the query to be rejected.
  • Only the one-letter amino acid code is accepted. Please do not send nucleotide sequences: if you do, A will be treated as alanine, C as cysteine, etc.
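
If you prepare queries with a script, the rules above can be checked before submission. The following is a minimal sketch of those rules, not the server's own parser; the function name clean_sequence is an assumption, and so is the conversion to upper case, since this page does not say how lower-case letters are handled.

```python
import string


def clean_sequence(raw):
    """Normalize a bare amino acid sequence and locate its cysteines.

    Whitespace (spaces, newlines, tabs) is dropped; any other
    non-alphabetical character, such as the '>' of a FASTA header,
    raises ValueError. Returns (sequence, cysteine_positions) with
    1-based positions.
    """
    seq = "".join(raw.split())
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted({ch for ch in seq if ch not in string.ascii_letters})
    if bad:
        raise ValueError(f"non-alphabetical characters: {''.join(bad)}")
    seq = seq.upper()  # assumption: case is not significant
    cysteines = [i + 1 for i, ch in enumerate(seq) if ch == "C"]
    return seq, cysteines
```

For example, clean_sequence("AVITG ACERD\nLQC\t") returns the sequence AVITGACERDLQC with cysteines at positions 7 and 13.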


Output Options

Replies are sent either by email (option: Batch: send output to email address) or as an HTML page (option: Interactive: receive output on the browser). In the latter case, after you press the send button, an intermediate page appears and refreshes automatically every 10 seconds until the final output is returned.
By default, DISULFIND returns only the most likely connectivity pattern. If the number of alternatives is set to an integer k in the range [2,3], the k best-ranking patterns are returned.


EMAIL output example

Query Name: VPRA_DENPO


                          +----------------------+
                     +----|------------+         |                    +---------------+
               +-----|----|+           |         |                  +-|-----+         |
               |     |    ||           |         |                  | |     |         |
         .........10........20........30........40........50........60........70........
AA       AVITGACERDLQCGKGTCCAVSLWIKSVRVCTPVGTSGEDCHPASHKIPFSGQRKMHHTCPCAPNLACVQTSPKKFKCL
DB_state       1     1    11           1         1                  1 1     1         1
DB_conf        9     9    99           9         9                  9 9     9         9
DB_flip





         80
AA       SK
DB_state
DB_conf
DB_flip

DB_bond   bond(7,19)
DB_bond   bond(13,31)
DB_bond   bond(18,41)
DB_bond   bond(60,68)
DB_bond   bond(62,78)

Conn_conf 0.442914

----------------------------------------------------------------------------------------

SECOND best ranking connectivity pattern


                     +----+                      +------------------+ +---------------+
               +-----|----|+           +---------|------------------|-|-----+         |
               |     |    ||           |         |                  | |     |         |
         .........10........20........30........40........50........60........70........
AA       AVITGACERDLQCGKGTCCAVSLWIKSVRVCTPVGTSGEDCHPASHKIPFSGQRKMHHTCPCAPNLACVQTSPKKFKCL




         80
AA       SK

DB_bond   bond(7,19)
DB_bond   bond(13,18)
DB_bond   bond(31,68)
DB_bond   bond(41,60)
DB_bond   bond(62,78)

Conn_conf 0.422348

----------------------------------------------------------------------------------------

Abbreviations used:

AA        = amino acid sequence
DB_state  = predicted disulfide bonding state
            (1=disulfide bonded, 0=not disulfide bonded)
DB_conf   = confidence of disulfide bonding state prediction (0=low to 9=high)
DB_flip   = an asterisk (*) indicates that the Viterbi aligner overruled the
            most likely prediction for that residue in order to achieve a
            consistent prediction at a protein level (even number of disulfide
            bonded cysteines, as interchain bonds are ignored).
DB_bond   = position in sequence of a pair of cysteines predicted to be forming
            a disulfide bridge.
Conn_conf = confidence of connectivity assignment given the predicted disulfide
            bonding state (real value in [0,1])

----------------------------------------------------------------------------------------

HTML output example



Results for VPRA_DENPO

                          +----------------------+                                      
                     +----|------------+         |                    +---------------+ 
               +-----|----|+           |         |                  +-|-----+         | 
               |     |    ||           |         |                  | |     |         | 
         .........10........20........30........40........50........60........70........
AA       AVITGACERDLQCGKGTCCAVSLWIKSVRVCTPVGTSGEDCHPASHKIPFSGQRKMHHTCPCAPNLACVQTSPKKFKCL
DB_state       1     1    11           1         1                  1 1     1         1 
DB_conf        9     9    99           9         9                  9 9     9         9 

           
           
           
           
         80

AA       SK
DB_state   
DB_conf    

DB_bond   bond(7,19)
DB_bond   bond(13,31)
DB_bond   bond(18,41)
DB_bond   bond(60,68)
DB_bond   bond(62,78)

Conn_conf 0.442914
Second best ranking connectivity pattern

                     +----+                      +------------------+ +---------------+ 
               +-----|----|+           +---------|------------------|-|-----+         | 
               |     |    ||           |         |                  | |     |         | 
         .........10........20........30........40........50........60........70........
AA       AVITGACERDLQCGKGTCCAVSLWIKSVRVCTPVGTSGEDCHPASHKIPFSGQRKMHHTCPCAPNLACVQTSPKKFKCL

           
           
           
         80
AA       SK

DB_bond   bond(7,19)
DB_bond   bond(13,18)
DB_bond   bond(31,68)
DB_bond   bond(41,60)
DB_bond   bond(62,78)

Conn_conf 0.422348
  • Please cite:
    • A. Ceroni, A. Passerini, A. Vullo and P. Frasconi. DISULFIND: a Disulfide Bonding State and Cysteine Connectivity Prediction Server, Nucleic Acids Research, 34(Web Server issue):W177-W181, 2006.
  • Abbreviations used:
    • AA amino acid sequence
    • DB_state predicted disulfide bonding state (1=disulfide bonded, 0=not disulfide bonded)
    • DB_conf confidence of disulfide bonding state prediction (0=low to 9=high)
      A red colour means that the Viterbi aligner overruled the most likely prediction for that residue in order to achieve a consistent prediction at a protein level (even number of disulfide bonded cysteines, as interchain bonds are ignored). See papers for details.
    • DB_bond position in sequence of a pair of cysteines predicted to be forming a disulfide bridge.
    • Conn_conf confidence of connectivity assignment given the predicted disulfide bonding state (real value in [0,1])

References

Please cite:

[1] A. Ceroni, A. Passerini, A. Vullo and P. Frasconi. DISULFIND: a Disulfide Bonding State and Cysteine Connectivity Prediction Server, Nucleic Acids Research, 34(Web Server issue):W177-W181, 2006.

For the disulfide connectivity predictor see also:

[2] A. Vullo and P. Frasconi. Disulfide connectivity prediction using recursive neural networks and evolutionary information, Bioinformatics, 20, 653-659, 2004.

For the cysteine bonding state predictor see also:

[3] P. Frasconi, A. Passerini and A. Vullo. A two-stage SVM architecture for predicting the disulfide bonding state of cysteines, Proc. IEEE Workshop on Neural Networks for Signal Processing, pp. 25-34, 2002.

[4] A. Ceroni, P. Frasconi, A. Passerini and A. Vullo. Predicting the disulfide bonding state of cysteines with combinations of kernel machines, Journal of VLSI Signal Processing, 35, 287-295, 2003.

Copyright notice

The documents listed in this site are provided as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.



© 2003-2006 Machine Learning and Neural Networks Group.
For questions and comments: disulfind at ai dot dinfo dot unifi dot it.