XML for Bioinformatics

XML for Bioinformatics. Springer, 2005. ISBN 9780387230283 (ISBN-10 0387230289).

XML for Bioinformatics aims to give biologists, software engineers, and bioinformatics professionals a comprehensive introduction to XML and to current XML applications in bioinformatics. The book assumes no background in XML and takes readers from basic to intermediate XML concepts. Core topics include the fundamentals of XML, creating XML grammars, web services via SOAP, and parsing XML documents in Perl and Java.

The goal of this book is to introduce XML to a bioinformatics audience. It does so by introducing the fundamentals of XML, Document Type Definitions (DTDs), XML Namespaces, XML Schema, and XML parsing, and by illustrating these concepts with specific bioinformatics case studies. The book does not assume any previous knowledge of XML and is geared toward those who want a solid introduction to fundamental XML concepts.

Bioinformatics is a new field of scientific inquiry devoted to answering questions about life by using computational resources. A key goal of bioinformatics is to create database systems and software platforms capable of storing and analyzing large sets of biological data. To that end, hundreds of biological databases are now available, providing access to a diverse range of biological data.

Given the diversity of biological data, the exponential growth of biological data sets, and the desire to share data for open scientific exchange, the bioinformatics community is continually exploring new options for data representation, storage, and exchange. In the past few years, many in the community have turned to XML to address these needs. XML, or Extensible Markup Language, is a technical specification originally created for representing and exchanging data over the Internet. XML is an open standard, officially specified by the World Wide Web Consortium (W3C), and deliberately designed to be independent of any operating system or programming language.

XML is extensible to many application domains and has been used successfully to represent many types of data, including e-commerce transactions, search engine results, Scalable Vector Graphics (SVG), and even voice recognition and voice synthesis. Since its introduction, XML has also been used to represent a growing set of biological data, including nucleotide sequences, genome annotations, protein–protein interactions, and signal transduction pathways. XML also forms the backbone of biological data exchange, enabling researchers to aggregate data from multiple heterogeneous data sources.