Converting Life into Numbers

Received Dec 2nd, 2012; Revised Dec 8th, 2012; Accepted Dec 12th, 2012

Biological data exist in several layers, from genome sequence to networks and beyond. Information that comes from the environment passes through layers of viscosity within organisms and is transformed into an output released back into the environment. Given the enormous volume of data being generated, biology is increasingly becoming a computational problem. In this article, various computational needs are abstracted with a view to outlining the future requirements of the community.


INTRODUCTION
Organisms are made of parts. Parts interact to make pathways, and pathways interact to make organisms. In this process of organism construction, information travels from the environment to the parts and back to sustain what we call life. From a hydrogen atom to the whole-cell level, information travels across at least six orders of magnitude in size. From a pairwise interaction to a large network interaction, information navigates its path through various possibilities. This search for an optimal path is further governed by probabilities, gradients, spatiotemporal complexities and emergent phenomena.
To understand how the whole-cell system works, the approach of ontological reductionism seems reasonable. First, a cell is reduced to a set of parts (genes, RNA, proteins). This has been well achieved by the classic cell biology approach. Second, by artificially converting parts into junk (knockdowns, mutations) and garbage (knockouts), one learns how organisms use them to enable specific functions. The human genome sequencing project was a big first step toward knowing the parts at the DNA level. Third, it is important to know how these parts copy themselves to make other parts, such as RNA and proteins, through transcription and translation. Fourth, what does the three-dimensional structure of these parts look like? Fifth, how do cell parts express themselves under various environmental contexts? Sixth, how do parts interact (e.g., protein-protein, protein-DNA, protein-RNA, RNA-RNA, RNA-DNA) to transfer information? Seventh, how are pathways dynamically constructed from a set of interactions? Eighth, how are networks made from a set of interacting pathways? Ninth, how do cells interact to maintain a certain community behavior?
Thus, the information that came from the environment and moved through layers of viscosity (from parts to networks) is now released back into the environment, either at the single-cell level (prokaryotes) or at the multi-cell level (eukaryotes).
As one can imagine, every layer is composed of enormous amounts of data, from its structure to its dynamics. To capture the information at various levels and produce an integrated view of the whole, one needs to make use of strong logic, mathematics and computers. Traditionally, theoretical and experimental approaches have been the main pillars of studying biology. However, computational biology and bioinformatics have emerged as a third viable approach to help understand existing systems and design new ones.
Computational biology is largely about developing the right kind of tools for collecting, managing and studying data. Bioinformatics is about using existing computational tools to find patterns and answer specific questions in biology. The two approaches are complementary, and both are necessary to understand an organism in terms of the community behavior of its constituent parts.
Converting biological data into numbers is reasonably straightforward at the genome, RNA and protein sequence levels, owing to the availability of data. However, as we move into structural, expression and interaction space, the uncertainty in the data keeps increasing, owing to the probabilistic nature of events that have not been completely understood. Thus, in addition to effectively capturing what is known, one needs to predict the unknown in the form of parts, their structure, expression and interactions. Here the right set of algorithms and tools is absolutely necessary to mimic biology as closely as possible.
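At the sequence level, the conversion into numbers can be made concrete with a minimal sketch. The mapping `BASE_TO_INT` and the function names `encode_dna` and `one_hot` are illustrative assumptions, not a standard taken from this article; any consistent assignment of integers to bases would serve equally well, and ambiguous bases such as N are deliberately not handled.

```python
# Hypothetical base-to-integer mapping (an assumption for illustration).
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_dna(seq):
    """Return one integer per base; raises KeyError on symbols outside A, C, G, T."""
    return [BASE_TO_INT[base] for base in seq.upper()]


def one_hot(seq):
    """Return one 4-element indicator vector per base (columns ordered A, C, G, T)."""
    return [[1 if i == code else 0 for i in range(4)] for code in encode_dna(seq)]


if __name__ == "__main__":
    print(encode_dna("acgt"))
    print(one_hot("GA"))
```

The integer form is compact, while the one-hot form avoids implying an ordering among bases. Structural, expression and interaction data have no equally simple encoding, which is where the uncertainty described above enters.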
Given the enormous amount of omics data generated since the early 1990s, a new and unexpected requirement has recently emerged: how to store and analyze sequence, structure, expression and interaction data across all the layers of molecular features. To address this, compression algorithms are being designed specifically for biological data. The standard data compression algorithms were mostly written for computer-based text, image, audio or video data. In future, the community needs to develop novel algorithms specific to the compression of biological data and to run analyses without uncompressing the data. For the past two decades, the bioinformatics community has provided a number of invaluable tools and data patterns, leading to an enhanced understanding of biological systems. Now, it would be useful to ask what kinds of developments and publications are most likely to appear through 2020.
In my view, first, the standardization of methods to generate, deposit and publish data in a novel format that allows automated model building from publications will increasingly appear in good bioinformatics journals. Second, we will see the development of algorithms to assess and extract quality data from the published literature. Third, papers describing methods to compress biological data and run analysis pipelines on compressed data will increasingly emerge. This will be particularly useful for huge metagenomics and whole-genome datasets. Fourth, open-source platforms that integrate molecular sequence, expression, structure and pathway data will be released. This will accelerate genomic medicine applications. Fifth, experimentally validated network models will find increasing space in the journals, leading to computer-aided design of organisms. Clearly, this is only a tiny and partial snapshot of what to expect in future. It is evident that an enormous unexplored data space, from genome sequence to cell-cell interactions, is waiting to be captured, understood and used towards designing novel applications.
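The third point, running analysis on compressed data, can be illustrated with a toy sketch. It assumes a simple 2-bit packing of DNA (four bases per byte) and counts G and C bases by reading the packed bytes directly, without rebuilding the original string. The names `CODE`, `pack` and `gc_count` are hypothetical, and this is not one of the specialized biological compression algorithms the text refers to; it only shows the general idea under those assumptions.

```python
# Hypothetical 2-bit code per base (an assumption for illustration).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(seq):
    """Pack a DNA string into bytes, four bases per byte, and return (bytes, length)."""
    out = bytearray()
    for start in range(0, len(seq), 4):
        byte = 0
        for offset, base in enumerate(seq[start:start + 4]):
            byte |= CODE[base] << (2 * offset)  # base at offset j occupies bits 2j..2j+1
        out.append(byte)
    return bytes(out), len(seq)


def gc_count(packed, length):
    """Count G and C bases by reading 2-bit codes straight from the packed bytes."""
    count = 0
    for idx in range(length):
        code = (packed[idx // 4] >> (2 * (idx % 4))) & 0b11
        if code in (CODE["C"], CODE["G"]):
            count += 1
    return count


if __name__ == "__main__":
    packed, length = pack("ACGTGGC")
    print(len(packed), gc_count(packed, length))
```

The original length is stored alongside the bytes so that the padding bits in the final byte are never mistaken for bases. Even this naive scheme stores a sequence in a quarter of the space of one-byte-per-base text, and statistics such as GC content remain computable on the packed form.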
To conclude, computational and bioinformatics approaches are based on good science. However, in my view, converting life into numbers is more an art than a science... for life itself is a piece of art.