Recent enhancements to the Blocks database servers

Jorja G. Henikoff, Shmuel Pietrokovski and Steven Henikoff1,*

Fred Hutchinson Cancer Research Center, 1124 Columbia Street, Seattle, WA 98104, USA and 1Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, WA 98104, USA. *To whom correspondence should be addressed.


The Blocks Database contains multiple alignments of conserved regions in protein families which can be searched by e-mail (blocks@blocks.fhcrc.org) and World Wide Web (http://blocks.fhcrc.org/) servers to classify protein and nucleotide sequences. Recent enhancements to the servers include: improved calculation of position-specific scoring matrices from blocks; availability of the Prints protein fingerprint database for searching in Blocks format; a representative sequence biassed towards the blocks of a protein family for sequence database searching; a tree constructed from the blocks of a protein family to facilitate subfamily classification; links to World Wide Web sites dedicated to a family; and implementation of the Local Alignment of Multiple Alignments (LAMA) method to search a block against a database of blocks.


The Blocks Database (1, 2) represents documented families of protein sequences by ungapped multiple alignments of their conserved regions, called "blocks" (3). A family may be represented by a single block or by several; the average is between three and four (see Figure 1A). The Blocks Database is constructed automatically from groups of proteins known to be related. The spaced triplet algorithm of Smith et al. (4) is used to generate motifs, which are then merged, extended, and assembled into a representative set by the MOTOMAT algorithm (1). Other motif finders can be used with MOTOMAT. For instance, the Block Maker server (5) makes one set of blocks using this procedure and a second set using motifs generated by Gibbs sampling (6). Similarly, while the protein families documented in PROSITE (7) provide the basis for the Blocks Database, only the list of sequences belonging to each family is used, so that other sources of such lists could also be used.

The Blocks Database was designed to classify biological sequences. Protein or nucleotide sequences can be searched against blocks, which are converted to position-specific scoring matrices (PSSMs) for this purpose. Comparison takes into account both local similarity between the query and family, represented by a single block, and global similarity, represented by the entire set of blocks for families with more than one conserved region (8). Intervening non-conserved regions are ignored, so that the comparison concentrates on the regions characteristic of each documented family. Searching the Blocks Database can be more sensitive and selective than searching the sequence databases (3, 8).


When a query sequence is compared with a database of blocks, it is slid along each block and every possible alignment with each block is scored. A PSSM, sometimes called a profile (9), is computed from each block for this purpose. A PSSM has 20 rows, one for each amino acid, and as many columns as there are positions in the block. Each row and column entry in a PSSM contains a numerical score for the alignment of that block column with that amino acid in the query sequence. The total alignment score is the sum of these scores over all block columns.
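The sliding comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the BLIMPS implementation: the PSSM layout (a list of per-column dictionaries), the function names `scan_block` and `best_hit`, the zero default for residues missing from a column, and the toy scores are all assumptions made for the example. Only placements in which the block lies entirely within the query are scored.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: one dict per block column, mapping an amino acid
# letter to its score at that position.
Pssm = List[Dict[str, float]]


def scan_block(query: str, pssm: Pssm, default: float = 0.0) -> List[Tuple[int, float]]:
    """Score every ungapped placement of the block along the query.

    Returns (offset, score) pairs, where offset is the 0-based query
    position aligned with the first block column. The score of a placement
    is the sum of the PSSM entries selected by the aligned query residues.
    """
    width = len(pssm)
    results = []
    for offset in range(len(query) - width + 1):
        score = sum(
            pssm[col].get(query[offset + col], default)
            for col in range(width)
        )
        results.append((offset, score))
    return results


def best_hit(query: str, pssm: Pssm) -> Tuple[int, float]:
    """Return the highest-scoring (offset, score) placement."""
    hits = scan_block(query, pssm)
    if not hits:
        raise ValueError("query is shorter than the block")
    return max(hits, key=lambda hit: hit[1])


# Toy 3-column PSSM with made-up scores (not taken from any real block).
toy_pssm = [
    {"G": 5.0, "A": 1.0},
    {"D": 4.0, "E": 2.0},
    {"S": 3.0, "G": 6.0},
]

if __name__ == "__main__":
    print(best_hit("MAGDGK", toy_pssm))
```

With the toy matrix, the query "MAGDGK" scores best when the block is placed at offset 2, where G, D and G align with the three columns.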

Computing PSSM scores is an area of active research (10-13), and those used by the Blocks Searcher were improved in early 1996 to take advantage of this work. The current scores are log-odds ratios which incorporate position-based pseudo-counts computed from the substitution probabilities underlying the BLOSUM series of scoring matrices (14). The addition of pseudo-counts to the observed counts of amino acids in blocks helps compensate for possible inadequate sampling of family members. We have shown these scores to be an improvement over the former scores, which were based on odds ratios, as well as over other methods in general use (12).
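To make the role of pseudo-counts concrete, the sketch below converts a block into log-odds scores. It is deliberately simplified and is not the method used by the Blocks Searcher: instead of position-based pseudo-counts derived from BLOSUM substitution probabilities, it adds a fixed total of pseudo-counts distributed according to background frequencies, and it assumes a uniform background. The names `column_log_odds`, `block_to_pssm` and `total_pseudo` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background frequencies, for illustration only.
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_log_odds(column: str, total_pseudo: float = 5.0) -> Dict[str, float]:
    """Log-odds scores (base 2) for one block column.

    Each amino acid receives total_pseudo * background pseudo-counts, so
    residues never observed in the column still get a finite score.
    """
    counts = Counter(column)
    n = len(column)
    scores = {}
    for aa in AMINO_ACIDS:
        pseudo = total_pseudo * BACKGROUND[aa]
        freq = (counts.get(aa, 0) + pseudo) / (n + total_pseudo)
        scores[aa] = math.log2(freq / BACKGROUND[aa])
    return scores


def block_to_pssm(segments: List[str], total_pseudo: float = 5.0) -> List[Dict[str, float]]:
    """Build one score column per block position from aligned segments."""
    if len({len(s) for s in segments}) != 1:
        raise ValueError("all block segments must have the same width")
    columns = ["".join(col) for col in zip(*segments)]
    return [column_log_odds(col, total_pseudo) for col in columns]


if __name__ == "__main__":
    pssm = block_to_pssm(["GDS", "GDA", "GES"])
    print(round(pssm[0]["G"], 2), round(pssm[0]["W"], 2))
```

Residues observed more often than the background expects receive positive scores, and unobserved residues receive negative but finite scores, which is the compensation for limited sampling that the text describes.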


The Blocks Database depends on documented groups of protein families. While we continue to depend on PROSITE (7) for this documentation, we have begun to supplement it. The Prints protein fingerprint database (15) is very similar in concept to the Blocks Database, although it is constructed differently (16). Groups of conserved motifs for a protein family are excised semi-manually from sequence alignments and used as fingerprints. Additional family members are sought by scanning the protein databases with these fingerprints.

A copy of the PRINTS Database in Blocks Database format can now be searched by the Blocks WWW server with links to the PRINTS WWW server. While many of the families represented in PRINTS are also in the Blocks Database, the PRINTS blocks may differ. PRINTS also contains families not represented in Blocks, so users are encouraged to search both databases.


Multiple alignment information represented by the blocks for a family provides a simple strategy for improving similarity searches of sequence databases. Consensus residues obtained from the blocks are embedded into a family member to bias it towards the consensus in the family's conserved regions (17). These COBBLER (COnsensus Biasing By Locally Embedding Residues) sequences are provided for each family in the Blocks Database as part of the "Get Block" request and can be used with standard searching programs such as BLAST (18) and FASTA (19) to improve detection of distant family members in the sequence databases. Figure 1B shows the COBBLER sequence for the lipolytic "G-D-X-G" enzyme family (BL01173).
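The embedding idea can be illustrated as follows. This is a sketch under stated assumptions rather than the COBBLER procedure itself: the consensus is taken as the most common residue in each block column, and the caller supplies both the family member and the position of each block within it. The function names and the toy data are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def consensus(segments: List[str]) -> str:
    """Most common residue in each block column (assumed consensus rule)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*segments))


def embed_consensus(sequence: str, blocks: List[Tuple[int, List[str]]]) -> str:
    """Overwrite the block regions of a family member with consensus residues.

    blocks holds (start, segments) pairs: start is the 0-based position of
    the block in the sequence, segments are the aligned block rows.
    Regions between blocks are left unchanged.
    """
    residues = list(sequence)
    for start, segments in blocks:
        cons = consensus(segments)
        end = start + len(cons)
        if start < 0 or end > len(residues):
            raise ValueError("block does not fit inside the sequence")
        residues[start:end] = cons
    return "".join(residues)


if __name__ == "__main__":
    toy_block = ["GDSG", "GDAG", "GESG"]  # made-up block rows
    print(embed_consensus("MKTAYGEAGLLA", [(5, toy_block)]))
```

The resulting sequence keeps the native residues outside the blocks, so it can still be submitted to standard sequence-searching programs as an ordinary protein sequence.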


The "Get Block" request now returns a tree made from the block alignments for the family. The tree is made by the CLUSTAL W program (20) from the alignments in the blocks using the neighbor-joining method (21), and is useful for inferring similar function. It is based on a matrix of distances between all pairs of block sequence segments. For requests from the Blocks e-mail server, the text "treefile" is returned. It can be displayed by programs such as "drawgram" from the PHYLIP (22) package, or by other programs that recognize the tree format used by PHYLIP. For requests from the Blocks WWW server, there are options to display the tree. The tree made from the lipolytic "G-D-X-G" enzyme family blocks is shown in Figure 1C.
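The distance matrix underlying such a tree can be sketched as below. The distance used here, the fraction of mismatched positions between two sequences' block segments, is an assumption chosen for simplicity and is not necessarily the measure computed by CLUSTAL W; the neighbor-joining step itself is not shown. The function names and the toy segments are hypothetical.

```python
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two segments differ."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    if not a:
        raise ValueError("segments must not be empty")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def distance_matrix(segments: Dict[str, str]) -> List[List[float]]:
    """Pairwise distances between all sequences, in dict insertion order.

    Each value is assumed to be the concatenation of one sequence's
    segments from every block of the family.
    """
    names = list(segments)
    return [[p_distance(segments[i], segments[j]) for j in names] for i in names]


if __name__ == "__main__":
    toy = {"seqA": "GDSG", "seqB": "GDAG", "seqC": "GESG"}  # made-up data
    for row in distance_matrix(toy):
        print(row)
```

A matrix of this kind is symmetric with zeros on the diagonal and is the usual input to a neighbor-joining implementation.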


One of the advantages of basing the Blocks Database on PROSITE has been the link to PROSITE's documentation. This documentation can be immediately accessed when a hit is detected in the Blocks Database. With the expansion of the WWW, some researchers are building WWW pages that contain a more comprehensive set of documentation for their family of interest, including graphics, meeting notifications, etc. We have started an effort to promote the development of more WWW resources (23) of this type. The Blocks Searcher now includes links to related WWW pages with the "Get Block" request.


Comparing sequences with blocks and other types of multiple alignments can be more sensitive than sequence-to-sequence comparisons (9, 17). Advancing one step further, we have developed a method for comparing multiple alignments with multiple alignments (24). Each multiple alignment is treated as a sequence of amino acid distributions. Multiple alignments can then be aligned with each other by using an appropriate measure for scoring the similarity between amino acid distributions (analogous to the use of an amino acid substitution matrix in sequence-to-sequence alignment). We termed the method LAMA and tested it by searching blocks against the Blocks and other multiple alignment databases. LAMA identified genuine relations between protein families beyond the range of sequence-to-sequence and sequence-to-block search methods (24). LAMA can be used at our WWW server for searching the Blocks and PRINTS Databases and for comparing individual blocks. The WWW server also provides a tool for converting user-provided multiple alignments to the Blocks format required by LAMA, as well as computing the PSSM of the resulting block and a graphical logo representation of it (25, 5).
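The idea of aligning one multiple alignment with another can be sketched as follows. This is a simplified illustration, not the LAMA program: each column is represented by raw residue frequencies, the Pearson correlation coefficient is assumed as the column similarity measure, and the whole overlap at each ungapped offset is scored by its mean column similarity instead of searching for the best local sub-alignment. All function names, the `min_overlap` parameter and the toy blocks are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distributions(segments: List[str]) -> List[List[float]]:
    """Represent a block as one amino acid frequency vector per column."""
    profile = []
    for col in zip(*segments):
        counts = Counter(col)
        profile.append([counts.get(aa, 0) / len(col) for aa in AMINO_ACIDS])
    return profile


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def best_ungapped_overlap(
    p: List[List[float]], q: List[List[float]], min_overlap: int = 3
) -> Tuple[int, float]:
    """Slide profile p along profile q without gaps.

    Returns (offset, score): offset is the column of q aligned with the
    first column of p (negative if p overhangs the start of q), and score
    is the mean column correlation over the overlapping columns.
    """
    best = None
    for offset in range(-(len(p) - min_overlap), len(q) - min_overlap + 1):
        pairs = [(i, i + offset) for i in range(len(p)) if 0 <= i + offset < len(q)]
        if len(pairs) < min_overlap:
            continue
        score = sum(pearson(p[i], q[j]) for i, j in pairs) / len(pairs)
        if best is None or score > best[1]:
            best = (offset, score)
    if best is None:
        raise ValueError("profiles are shorter than min_overlap")
    return best


if __name__ == "__main__":
    block_a = column_distributions(["GDSG", "GDAG", "GESG"])        # made-up
    block_b = column_distributions(["KLGDSG", "RLGDAG", "KVGESG"])  # made-up
    print(best_ungapped_overlap(block_a, block_b))
```

In the toy data the four columns of the first block match columns 2 to 5 of the second block, so the best placement is at offset 2.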

The following example illustrates the method. LAMA searches of the Blocks Database identified sequence similarity between serine active sites from carboxylesterase, lipase and endopeptidase families (Figure 2). All of these enzymes apparently cleave their substrates using a catalytic triad charge relay system. These relations were very difficult or impossible to detect with sequence-to-sequence searches, and sequence-to-blocks searches only detected the relation between the type-B carboxylesterases and lipolytic "G-D-X-G" enzymes (Table 1).


The Blocks Database and searching program (26, 27) were first made available by anonymous ftp in 1991. Version 9.1 of the Blocks Database contains 3,300 blocks representing 906 different protein families documented in version 12 of PROSITE, and is complemented by version 12.0 of the PRINTS Database (15) with 2,875 blocks from 550 families for a total of more than 1,000 unique families. With a median of 12 (mean 23) sequences per family, about 40% of SWISS-PROT (28) is represented in Blocks 9.1.

The BLIMPS searching program (v. 3.1) is currently used to query these databases. Both BLIMPS and the Blocks Database are freely available by anonymous ftp; however, most biologists prefer to use the Blocks Searcher e-mail service, which was initiated in the summer of 1992, or the WWW service, initiated in 1994. The Blocks Searcher averages about 100 searches per day, with the e-mail server handling slightly more than half of the requests.

ACCESS

Anonymous FTP site: ftp://ncbi.nlm.nih.gov/repository/blocks

E-mail server: blocks@blocks.fhcrc.org (send the word 'help' in the subject line or as the only word in the message body)

WWW server: http://blocks.fhcrc.org/

Additional information or assistance may be obtained by sending an e-mail message to webmaster@blocks.fhcrc.org.


This work is supported by a grant from the NIH (GM29009). SP is a Howard Hughes Medical Institute Fellow of the Life Sciences Research Foundation.


1. Henikoff, S. and Henikoff, J. G. (1991) Nucleic Acids Res., 19, 6565-6572.
2. Henikoff, J. G. and Henikoff, S. (1996) Meth. Enzymol., 266, 88-105.
3. Posfai, J., Bhagwat, A. S., Posfai, G. and Roberts, R. J. (1989) Nucleic Acids Res., 17, 2421-2435.
4. Smith, H. O., Annau, T. M. and Chandrasegaran, S. (1990) Proc. Natl. Acad. Sci. USA, 87, 826-830.
5. Henikoff, S., Henikoff, J. G., Alford, W. J. and Pietrokovski, S. (1995) Gene.
6. Lawrence, C. E., Altschul, S. F., Boguski, M. S., Liu, J. S., Neuwald, A. F. and Wootton, J. C. (1993) Science, 262, 208-214.
7. Bairoch, A. (1992) Nucleic Acids Res., 20, 2013-2018.
8. Henikoff, S. and Henikoff, J. G. (1994) Genomics, 19, 97-107.
9. Gribskov, M., McLachlan, A. D. and Eisenberg, D. (1987) Proc. Natl. Acad. Sci. USA, 84, 4355-4358.
10. Brown, M. P., Hughey, R., Krogh, A., Mian, I. S., Sjolander, K. and Haussler, D. (1993) In Hunter, L., Searls, D. and Shavlik, J. (eds), Proc. First Int. Conf. on Intelligent Systems for Molecular Biology, AAAI Press, Washington D. C., pp. 47.
11. Tatusov, R. L., Altschul, S. F. and Koonin, E. V. (1994) Proc. Natl. Acad. Sci. USA.
12. Henikoff, J. G. and Henikoff, S. (1996) CABIOS, 12, 135-143.
13. Bailey, T. L. and Gribskov, M. (1996) In Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology, AAAI Press, Menlo Park, CA, pp. 15-24.
14. Henikoff, S. and Henikoff, J. G. (1992) Proc. Natl. Acad. Sci. USA, 89, 10915.
15. Attwood, T. K. and Beck, M. E. (1994) Protein Engineering, 7, 841-848.
16. Parry-Smith, D. J. and Attwood, T. K. (1992) CABIOS, 8, 451-459.
17. Henikoff, S. and Henikoff, J. G. (1996), submitted.
18. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. (1990) J. Mol. Biol., 215, 403-410.
19. Pearson, W. R. and Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA, 85, 2444.
20. Thompson, J. D., Higgins, D. G. and Gibson, T. J. (1994) Nucleic Acids Res.
21. Saitou, N. and Nei, M. (1987) Mol. Biol. Evol., 4, 406-425.
22. Felsenstein, J. (1988) Ann. Rev. Genet., 22, 521-565.
23. Henikoff, S., Endow, S. A. and Greene, E. A. (1996) Trends Biochem. Sci., 21, in press.
24. Pietrokovski, S. (1996) Nucleic Acids Res., 24, in press.
25. Schneider, T. D. and Stephens, R. M. (1990) Nucleic Acids Res., 18, 6097-6100.
26. Henikoff, S., Wallace, J. C. and Brown, J. P. (1990) Meth. Enzymol., 183, 111-132.
27. Wallace, J. C. and Henikoff, S. (1992) CABIOS, 8, 249-254.
28. Bairoch, A. and Boeckmann, B. (1992) Nucleic Acids Res., 20, 2019-2022.
29. Henikoff, S. and Henikoff, J. G. (1994) J. Mol. Biol., 243, 574-578.



Figure 1. Blocks from v. 9.1 of the Blocks Database representing the lipolytic "G-D-X-G" enzymes family.
