Two factors dominate current molecular biology: the amount of raw data is increasing very rapidly, and successful applications in biomedical research require carefully curated and annotated databases. The quality of the data - especially nucleic acid sequences - is satisfactory; however, annotations depend on features inferred from the data rather than measured directly, for instance the identification of genes in genome sequences. It is essential that these inferences are as accurate as possible, and this requires human intervention.
Many new sequences are emerging from genomics projects and many new protein structures are now being determined using X-ray crystallography, nuclear magnetic resonance spectroscopy and cryo-electron microscopy. Without direct experimental evidence there is considerable difficulty in assigning function to proteins from their sequences or even from their structures. This applies even to homologues of well-characterised proteins, because of the recruitment of similar proteins for divergent functions. Furthermore, correct classification of sequences, structures and functions often requires sensitivity to very delicate features. Computer programs can help to some extent but cannot do the whole job reliably - again, manual curation is essential. Proteomics studies on spatial and temporal protein expression patterns provide additional streams of data that require human interpretation to resolve fine details.
With the recognition of the importance of accurate database annotation and the requirement for individuals with particular constellations of skills to carry it out, annotators are emerging as specialists within the profession of bioinformatics. This book compiles information about annotation - its current status, what is required to improve it, what skills must be brought to bear on database curation and hence what is the proper training for annotators.
This book should be essential reading for all people working on biological databases, both biologists and computer scientists. It will also be of interest to all users of such databases, including molecular biologists, geneticists, protein chemists, clinicians and drug developers.