Human Genomics and Proteomics
Volume 2009 (2009), Article ID 869093, 13 pages
doi:10.4061/2009/869093
Review Article

Data Integration in Genetics and Genomics: Methods and Challenges

Jemila S. Hamid,1 Pingzhao Hu,2 Nicole M. Roslin,2 Vicki Ling,1,3 Celia M. T. Greenwood,4 and Joseph Beyene1,2,4

1Biostatistics Methodology Unit, The Hospital for Sick Children Research Institute, 555 University Avenue, Toronto, ON, M5G 1X8, Canada
2The Center for Applied Genomics, The Hospital for Sick Children Research Institute, 555 University Avenue, Toronto, ON, M5G 1X8, Canada
3Program in Developmental and Stem Cell Biology, The Hospital for Sick Children Research Institute, 555 University Avenue, Toronto, ON, M5G 1X8, Canada
4Dalla Lana School of Public Health, University of Toronto, 555 University Avenue, Toronto, ON, M5G 1X8, Canada

Received 25 September 2008; Accepted 1 December 2008

Academic Editor: Heikki Lehvaslaiho

Copyright © 2009 Jemila S. Hamid et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. M. Schena, D. Shalon, R. W. Davis, and P. O. Brown, “Quantitative monitoring of gene expression patterns with a complementary DNA microarray,” Science, vol. 270, no. 5235, pp. 467–470, 1995.
  2. A. E. Oostlander, G. A. Meijer, and B. Ylstra, “Microarray-based comparative genomic hybridization and its applications in human genetics,” Clinical Genetics, vol. 66, no. 6, pp. 488–495, 2004.
  3. R. Aebersold and M. Mann, “Mass spectrometry-based proteomics,” Nature, vol. 422, no. 6928, pp. 198–207, 2003.
  4. A. Daemen, O. Gevaert, T. De Bie, et al., “Integrating microarray and proteomics data to predict the response on cetuximab in patients with rectal cancer,” Pacific Symposium on Biocomputing, vol. 13, pp. 166–177, 2008.
  5. D. M. Reif, B. C. White, and J. H. Moore, “Integrated analysis of genetic, genomic and proteomic data,” Expert Review of Proteomics, vol. 1, no. 1, pp. 67–75, 2004.
  6. T. C. Prevost, K. R. Abrams, and D. R. Jones, “Hierarchical models in generalized synthesis of evidence: an example based on studies of breast cancer screening,” Statistics in Medicine, vol. 19, no. 24, pp. 3359–3376, 2000.
  7. O. G. Troyanskaya, K. Dolinski, A. B. Owen, R. B. Altman, and D. Botstein, “A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 14, pp. 8348–8353, 2003.
  8. R. Jansen, N. Lan, J. Qian, and M. Gerstein, “Integration of genomic datasets to predict protein complexes in yeast,” Journal of Structural and Functional Genomics, vol. 2, no. 2, pp. 71–81, 2002.
  9. G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble, “A statistical framework for genomic data fusion,” Bioinformatics, vol. 20, no. 16, pp. 2626–2635, 2004.
  10. A. Daemen, O. Gevaert, and B. De Moor, “Integration of clinical and microarray data with kernel methods,” in Proceedings of the 29th Annual International Conference of IEEE Engineering in Medicine and Biology Society (EMBC '07), pp. 5411–5415, Lyon, France, August 2007.
  11. S. Patel, D. Parmar, Y. K. Gupta, and M. P. Singh, “Contribution of genomics, proteomics, and single-nucleotide polymorphism in toxicology research and Indian scenario,” Indian Journal of Human Genetics, vol. 11, no. 2, pp. 61–75, 2005.
  12. D. R. Rhodes, J. Yu, K. Shanker, et al., “Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 25, pp. 9309–9314, 2004.
  13. “Affymetrix Microarray Suite User Guide,” Version 5, 2001.
  14. J. Listgarten and A. Emili, “Statistical and computational methods for comparative proteomic profiling using liquid chromatography-tandem mass spectrometry,” Molecular & Cellular Proteomics, vol. 4, no. 4, pp. 419–434, 2005.
  15. J. K. Choi, U. Yu, S. Kim, and O. J. Yoo, “Combining multiple microarray studies and modeling interstudy variation,” Bioinformatics, vol. 19, pp. i84–i90, 2003.
  16. L. J. Lu, Y. Xia, A. Paccanaro, H. Yu, and M. Gerstein, “Assessing the limits of genomic data integration for predicting protein networks,” Genome Research, vol. 15, no. 7, pp. 945–953, 2005.
  17. H. Jiang, Y. Deng, H.-S. Chen, et al., “Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes,” BMC Bioinformatics, vol. 5, article 81, pp. 1–12, 2004.
  18. Y. H. Yang, S. Dudoit, P. Luu, et al., “Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation,” Nucleic Acids Research, vol. 30, no. 4, p. e15, 2002.
  19. R. A. Irizarry, B. M. Bolstad, F. Collin, L. M. Cope, B. Hobbs, and T. P. Speed, “Summaries of Affymetrix GeneChip probe level data,” Nucleic Acids Research, vol. 31, no. 4, p. e15, 2003.
  20. Z. Wu and R. A. Irizarry, “Preprocessing of oligonucleotide array data,” Nature Biotechnology, vol. 22, no. 6, pp. 656–658, 2004.
  21. B. H. Mecham, G. T. Klus, J. Strovel, et al., “Sequence-matched probes produce increased cross-platform consistency and more reproducible biological results in microarray-based gene expression measurements,” Nucleic Acids Research, vol. 32, no. 9, p. e74, 2004.
  22. P. I. W. de Bakker, M. A. R. Ferreira, X. Jia, B. M. Neale, S. Raychaudhuri, and B. F. Voight, “Practical aspects of imputation-driven meta-analysis of genome-wide association studies,” Human Molecular Genetics, vol. 17, no. R2, pp. R122–R128, 2008.
  23. Y. Li and G. R. Abecasis, “Mach 1.0: rapid haplotype reconstruction and missing genotype inference,” American Journal of Human Genetics, vol. S79, p. 2290, 2006.
  24. J. Marchini, B. Howie, S. Myers, G. McVean, and P. Donnelly, “A new multipoint method for genome-wide association studies by imputation of genotypes,” Nature Genetics, vol. 39, no. 7, pp. 906–913, 2007.
  25. P. Scheet and M. Stephens, “A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase,” American Journal of Human Genetics, vol. 78, no. 4, pp. 629–644, 2006.
  26. J. P. A. Ioannidis, N. A. Patsopoulos, and E. Evangelou, “Heterogeneity in meta-analyses of genome-wide association investigations,” PLoS ONE, vol. 2, no. 9, p. e841, 2007.
  27. A. S. Adler, M. Lin, H. Horlings, D. S. A. Nuyten, M. J. van de Vijver, and H. Y. Chang, “Genetic regulators of large-scale transcriptional signatures in cancer,” Nature Genetics, vol. 38, no. 4, pp. 421–430, 2006.
  28. S. A. McCarroll and D. Altshuler, “Copy-number variation and association studies of human disease,” Nature Genetics, vol. 39, pp. 37–42, 2007.
  29. F. Al-Shahrour, R. Díaz-Uriarte, and J. Dopazo, “Discovering molecular functions significantly related to phenotypes by combining gene expression data and biological information,” Bioinformatics, vol. 21, no. 13, pp. 2988–2993, 2005.
  30. J. Ott, Analysis of Human Genetic Linkage, Johns Hopkins University Press, Baltimore, Md, USA, 3rd edition, 1999.
  31. E. Lander and L. Kruglyak, “Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results,” Nature Genetics, vol. 11, no. 3, pp. 241–247, 1995.
  32. N. E. Morton, “Sequential tests for the detection of linkage,” American Journal of Human Genetics, vol. 7, no. 3, pp. 277–318, 1955.
  33. E. S. Lander and P. Green, “Construction of multilocus genetic linkage maps in humans,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 8, pp. 2363–2367, 1987.
  34. D. W. Fulker, S. S. Cherny, and L. R. Cardon, “Multipoint interval mapping of quantitative trait loci, using sib pairs,” American Journal of Human Genetics, vol. 56, no. 5, pp. 1224–1233, 1995.
  35. V. J. Vieland, “Bayesian linkage analysis, or: how I learned to stop worrying and love the posterior probability of linkage,” American Journal of Human Genetics, vol. 63, no. 4, pp. 947–954, 1998.
  36. R. A. Fisher, Statistical Methods for Research Workers, Hafner, New York, NY, USA, 12th edition, 1954.
  37. D. B. Allison and M. Heo, “Meta-analysis of linkage data under worst-case conditions: a demonstration using the human OB region,” Genetics, vol. 148, no. 2, pp. 859–865, 1998.
  38. J. A. Badner and E. S. Gershon, “Regional meta-analysis of published data supports linkage of autism with markers on chromosome 7,” Molecular Psychiatry, vol. 7, no. 1, pp. 56–66, 2002.
  39. W. R. Rice, “A consensus combined P-value test and the family-wide significance of component tests,” Biometrics, vol. 46, pp. 303–308, 1990.
  40. L. H. Wise, J. S. Lanchbury, and C. M. Lewis, “Meta-analysis of genome searches,” Annals of Human Genetics, vol. 63, no. 3, pp. 263–272, 1999.
  41. Z. Li and D. C. Rao, “Random effects model for meta-analysis of multiple quantitative sibpair linkage studies,” Genetic Epidemiology, vol. 13, no. 4, pp. 377–383, 1996.
  42. R. DerSimonian and N. Laird, “Meta-analysis in clinical trials,” Controlled Clinical Trials, vol. 7, no. 3, pp. 177–188, 1986.
  43. C. Gu, M. Province, A. Todorov, and D. C. Rao, “Meta-analysis methodology for combining non-parametric sibpair linkage results: genetic homogeneity and identical markers,” Genetic Epidemiology, vol. 15, no. 6, pp. 609–626, 1998.
  44. K. E. Lohmueller, C. L. Pearce, M. Pike, E. S. Lander, and J. N. Hirschhorn, “Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease,” Nature Genetics, vol. 33, no. 2, pp. 177–182, 2003.
  45. E. Zeggini, M. N. Weedon, C. M. Lindgren, et al., “Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes,” Science, vol. 316, no. 5829, pp. 1336–1341, 2007.
  46. L. J. Scott, K. L. Mohlke, L. L. Bonnycastle, et al., “A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants,” Science, vol. 316, no. 5829, pp. 1341–1345, 2007.
  47. R. Saxena, B. F. Voight, V. Lyssenko, et al., “Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels,” Science, vol. 316, no. 5829, pp. 1331–1336, 2007.
  48. V. G. Tusher, R. Tibshirani, and G. Chu, “Significance analysis of microarrays applied to the ionizing radiation response,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 9, pp. 5116–5121, 2001.
  49. G. K. Smyth, “Linear models and empirical Bayes methods for assessing differential expression in microarray experiments,” Statistical Applications in Genetics and Molecular Biology, vol. 3, no. 1, article 3, 2004.
  50. T. R. Golub, D. K. Slonim, P. Tamayo, et al., “Molecular classification of cancer: class discovery and class prediction by gene expression monitoring,” Science, vol. 286, no. 5439, pp. 531–537, 1999.
  51. L. J. van't Veer, H. Dai, M. J. van de Vijver, et al., “Gene expression profiling predicts clinical outcome of breast cancer,” Nature, vol. 415, no. 6871, pp. 530–536, 2002.
  52. S. Dudoit, J. Fridlyand, and T. P. Speed, “Comparison of discrimination methods for the classification of tumors using gene expression data,” Journal of the American Statistical Association, vol. 97, no. 457, pp. 77–87, 2002.
  53. L. Xu, A. C. Tan, D. Q. Naiman, D. Geman, and R. L. Winslow, “Robust prostate cancer marker genes emerge from direct integration of inter-study microarray data,” Bioinformatics, vol. 21, no. 20, pp. 3905–3911, 2005.
  54. Y. Tan, L. Shi, W. Tong, and C. Wang, “Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data,” Nucleic Acids Research, vol. 33, no. 1, pp. 56–65, 2005.
  55. G. Bloom, I. V. Yang, D. Boulware, et al., “Multi-platform, multi-site, microarray-based human tumor classification,” American Journal of Pathology, vol. 164, no. 1, pp. 9–16, 2004.
  56. P. Warnat, R. Eils, and B. Brors, “Cross-platform analysis of cancer microarray data improves gene expression based classification of phenotypes,” BMC Bioinformatics, vol. 6, article 265, pp. 1–15, 2005.
  57. J. A. Cruz and D. S. Wishart, “Applications of machine learning in cancer prediction and prognosis,” Cancer Informatics, vol. 2, pp. 59–78, 2006.
  58. J. R. Stevens and R. W. Doerge, “Combining Affymetrix microarray results,” BMC Bioinformatics, vol. 6, article 57, pp. 1–19, 2005.
  59. J. Wang, K. A. Do, S. Wen, et al., “Merging microarray data, robust feature selection, and predicting prognosis in prostate cancer,” Cancer Informatics, vol. 2, pp. 87–97, 2006.
  60. P. Hu, C. M. T. Greenwood, and J. Beyene, “Integrative analysis of multiple gene expression profiles with quality-adjusted effect size models,” BMC Bioinformatics, vol. 6, article 128, pp. 1–11, 2005.
  61. R. P. DeConde, S. Hawley, S. Falcon, N. Clegg, B. Knudsen, and R. Etzioni, “Combining results of microarray experiments: a rank aggregation approach,” Statistical Applications in Genetics and Molecular Biology, vol. 5, no. 1, article 15, 2006.
  62. A. A. Shabalin, H. Tjelmeland, C. Fan, C. M. Perou, and A. B. Nobel, “Merging two gene-expression studies via cross-platform normalization,” Bioinformatics, vol. 24, no. 9, pp. 1154–1160, 2008.
  63. H. Cooper and L. V. Hedges, The Handbook of Research Synthesis, Russell Sage, New York, NY, USA, 1994.
  64. L. V. Hedges and I. Olkin, Statistical Methods for Meta-Analysis, Academic Press, Orlando, Fla, USA, 1995.
  65. G. Parmigiani, E. S. Garrett-Mayer, R. Anbazhagan, and E. Gabrielson, “A cross-study comparison of gene expression studies for the molecular classification of lung cancer,” Clinical Cancer Research, vol. 10, no. 9, pp. 2922–2927, 2004.
  66. D. R. Rhodes, T. R. Barrette, M. A. Rubin, D. Ghosh, and A. M. Chinnaiyan, “Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer,” Cancer Research, vol. 62, no. 15, pp. 4427–4433, 2002.
  67. P. Hu, C. M. T. Greenwood, and J. Beyene, “Statistical methods for meta-analysis of microarray data: a comparative study,” Information Systems Frontiers, vol. 8, no. 1, pp. 9–20, 2006.
  68. X. Yang and X. Sun, “Meta-analysis of several gene lists for distinct types of cancer: a simple way to reveal common prognostic markers,” BMC Bioinformatics, vol. 8, article 118, pp. 1–17, 2007.
  69. D. Tritchler, “Modelling study quality in meta-analysis,” Statistics in Medicine, vol. 18, no. 16, pp. 2135–2145, 1999.
  70. P. Hu, J. Beyene, and C. M. T. Greenwood, “Tests for differential gene expression using weights in oligonucleotide microarray experiments,” BMC Genomics, vol. 7, article 33, pp. 1–15, 2006.
  71. S. Heber and B. Sick, “Quality assessment of Affymetrix GeneChip data,” OMICS: A Journal of Integrative Biology, vol. 10, no. 3, pp. 358–368, 2006.
  72. M. Morley, C. M. Molony, T. M. Weber, et al., “Genetic analysis of genome-wide variation in human gene expression,” Nature, vol. 430, no. 7001, pp. 743–747, 2004.
  73. V. G. Cheung, R. S. Spielman, K. G. Ewens, T. M. Weber, M. Morley, and J. T. Burdick, “Mapping determinants of human gene expression by regional and genome-wide association,” Nature, vol. 437, no. 7063, pp. 1365–1369, 2005.
  74. B. E. Stranger, M. S. Forrest, A. G. Clark, et al., “Genome-wide associations of gene expression variation in humans,” PLoS Genetics, vol. 1, no. 6, p. e78, 2005.
  75. P. Hu, H. Lan, W. Xu, J. Beyene, and C. M. T. Greenwood, “Identifying cis- and trans-acting single-nucleotide polymorphisms controlling lymphocyte gene expression in humans,” BMC Proceedings, vol. 1, pp. 1–5, 2007.
  76. E. Parkhomenko, D. Tritchler, and J. Beyene, “Genome-wide sparse canonical correlation of gene expression with genotypes,” BMC Proceedings, vol. 1, pp. 1–5, 2007.
  77. J. R. Pollack, T. Sørlie, C. M. Perou, et al., “Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 20, pp. 12963–12968, 2002.
  78. A. M. Levin, D. Ghosh, K. R. Cho, and S. L. R. Kardia, “A model-based scan statistic for identifying extreme chromosomal regions of gene expression in human tumors,” Bioinformatics, vol. 21, no. 12, pp. 2867–2874, 2005.
  79. J. Huang, H.-H. Sheng, T. Shen, et al., “Correlation between genomic DNA copy number alterations and transcriptional expression in hepatitis B virus-associated hepatocellular carcinoma,” FEBS Letters, vol. 580, no. 15, pp. 3571–3581, 2006.
  80. P. Platzer, M. B. Upender, K. Wilson, et al., “Silence of chromosomal amplifications in colon cancer,” Cancer Research, vol. 62, no. 4, pp. 1134–1138, 2002.
  81. B. A. Cohen, R. D. Mitra, J. D. Hughes, and G. M. Church, “A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression,” Nature Genetics, vol. 26, no. 2, pp. 183–186, 2000.
  82. A. Ben-Hur and W. S. Noble, “Kernel methods for predicting protein-protein interactions,” Bioinformatics, vol. 21, pp. i38–i46, 2005.
  83. A. Subramanian, P. Tamayo, V. K. Mootha, et al., “Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles,” Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 43, pp. 15545–15550, 2005.
  84. B. Efron and R. Tibshirani, “On testing the significance of sets of genes,” Annals of Applied Statistics, vol. 1, no. 1, pp. 107–129, 2007.