Journal : Biological Theory, vol. 4, p. 29–43, 2009
Publisher : MIT Press
International Standard Numbers
Printed : 1555-5542
Electronic : 1555-5550
Publication type : Academic article
Issue : 1
Bioscientists generate far more data than their minds can handle, and this trend is likely to continue. With the aid of a small set of versatile tools for mathematical modeling and statistical assessment, bioscientists can explore their real-world systems without experiencing data overflow. This article outlines an approach for combining modern high-throughput, low-cost, but non-selective biospectroscopy measurements with soft, multivariate biochemometrics data modeling to overview complex systems, test hypotheses, and make new discoveries. From preliminary, broad hypotheses and goals, many relevant samples are selected and measured with respect to many informative variables. The resulting tables represent a “cacophony” of data. From these, the most relevant and reliable “underlying harmonies and rhythms” are extracted and tested statistically, displayed for interpretation, and used for prediction. Outliers are detected automatically. Interesting subsets of samples can then be chosen for in-depth analyses in subsequent research cycles. This pragmatic, top-down approach takes advantage of developments in both “soft,” data-driven modeling and “hard,” knowledge-driven cultures. Data analytical examples show how information-rich biospectroscopy can be used for characterizing and quantifying known and unknown chemical constituents and physical phenomena in intact biosamples. This is based on a combination of deductive “hard” and inductive “soft” modeling. The examples represent NIR spectra of biochemical mixtures, FTIR spectra of a microbiological fermentation process, and FTIR analysis of fatty acids in milk for functional genomics.
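The workflow summarized in the abstract (a samples-by-variables table, extraction of a few underlying patterns, automatic outlier detection) can be illustrated with a minimal sketch. The abstract does not name a specific algorithm, so everything below is an assumption made for illustration: the bilinear model is a plain principal component analysis computed by SVD, the "spectra" are simulated from two hypothetical Gaussian bands, and the function names, the residual-based outlier rule, and its threshold factor are not taken from the article.

```python
import numpy as np


def simulate_spectra(n_samples=40, n_channels=200, noise=0.01, seed=0):
    """Simulate a samples-by-channels table of mixture spectra (hypothetical data)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_channels)
    # Two hypothetical pure-constituent spectra, each a single Gaussian band.
    pure = np.vstack([
        np.exp(-0.5 * ((x - 0.3) / 0.05) ** 2),
        np.exp(-0.5 * ((x - 0.7) / 0.08) ** 2),
    ])
    conc = rng.uniform(0.0, 1.0, size=(n_samples, 2))
    spectra = conc @ pure + noise * rng.standard_normal((n_samples, n_channels))
    # Give the last sample an extra band that the two constituents cannot explain.
    spectra[-1] += 0.5 * np.exp(-0.5 * ((x - 0.5) / 0.02) ** 2)
    return spectra


def fit_bilinear(X, n_components):
    """Mean-center X and approximate it as scores @ loadings (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]
    residuals = Xc - scores @ loadings
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return scores, loadings, residuals, explained


def flag_outliers(residuals, factor=5.0):
    """Flag samples whose residual sum of squares far exceeds the median.

    The median-times-factor rule is an illustrative assumption, not a
    statistical test from the article.
    """
    rss = (residuals ** 2).sum(axis=1)
    return np.flatnonzero(rss > factor * np.median(rss))


if __name__ == "__main__":
    X = simulate_spectra()
    scores, loadings, residuals, explained = fit_bilinear(X, n_components=2)
    print("fraction of variance explained:", round(float(explained), 3))
    print("flagged sample indices:", flag_outliers(residuals).tolist())
```

In this sketch the loadings play the role of the extracted patterns, the scores show how strongly each sample expresses them, and a sample with an unusually large residual is a candidate for closer inspection in a later research cycle.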