Bioinformatics can refer to almost any collaborative effort between biologists or geneticists and computer scientists, and thus covers a wide variety of traditional computer science domains, including data modeling, data retrieval, data mining, data integration, data management, data warehousing, data cleaning, ontologies, simulation, parallel computing, agent-based technology, grid computing, and visualization. However, applying each of these domains to biomolecular and biomedical applications raises specific and unexpectedly challenging research issues.
In this book, we focus on data management, and in particular data integration, as it applies to genomics and microbiology. This is an important topic because data are spread across multiple sources, preventing scientists from efficiently obtaining the information required to perform their research (on average, a pharmaceutical company uses 40 data sources). In this environment, answering a single question may require accessing several data sources and calling on sophisticated analysis tools (e.g., sequence alignment, clustering, and modeling tools). While data integration is a dynamic research area in the database community, the specific needs of biologists have led to the development of numerous middleware systems that provide seamless data access in a results-driven environment (eight middleware systems are described in detail in this book).
The objective of the book is to provide life scientists and computer scientists with a complete view of biological data management by: (1) identifying specific issues in biological data management, (2) presenting existing solutions from both academia and industry, and (3) providing a framework in which to compare these systems.