Fortifying the analysis of Illumina data
Abstract
Microarrays are devices that allow for the high-throughput analysis of biological samples. "High-throughput" means that, with some technologies, millions of points in the genome of interest can be investigated simultaneously. Conventional microarrays use DNA sequences (probes) attached to the array surface, which hybridize to fluorescently labeled sample material. By contrast, microarrays manufactured by Illumina (BeadArrays) use miniature beads with probes attached, and around 30 beads on the same array carry the same probe. BeadArrays are now widely used, but the default analysis pipeline relies on Illumina's own software (BeadStudio), which provides a summarized view of the data rather than exploiting the roughly 30 replicates available for each probe. Little information exists on the initial processing of Illumina data, whereas for other microarray platforms such processing is a subject of intense research (Alison et al. (2006)). This processing is crucial to keep the analysis from being influenced by the many sources of error inherent in microarray technologies. Although the analyses eventually performed on data from a microarray experiment depend on what is being investigated (e.g. gene expression, single nucleotide polymorphisms, copy number variation), all of them need a quality assessment (QA) step to screen for defective hybridizations and imperfections on the array surface. We show how our beadarray software (Dunning et al. (2006)), used in conjunction with bead-level data, offers improved QA. These data provide not only the intensity of each bead but also its randomized position on the array surface. Thus, we can look for regions of the surface with unusual intensities arising from manufacturing errors. Beads whose intensities are sufficiently far from the average of beads with the same sequence can be flagged as outliers.
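The outlier rule and the use of bead replicates can be illustrated with a short sketch. This is a minimal Python example, not beadarray's implementation: it assumes each bead is given as a probe identifier plus an intensity, and it interprets "sufficiently far from the average" as more than an assumed 3 scaled median absolute deviations (MADs) from the median of beads sharing the same probe. The function names `flag_outliers` and `summarize_probes` are hypothetical. Non-outlier beads are then averaged into one summarized value per probe.

```python
from collections import defaultdict
from statistics import median

def flag_outliers(probe_ids, intensities, n_mads=3.0):
    """Return one boolean per bead: True if the bead is an outlier for its probe.

    Assumption (not from the source): "far from the average" means more than
    n_mads scaled MADs from the median of beads sharing the same probe.
    """
    beads_by_probe = defaultdict(list)
    for i, probe in enumerate(probe_ids):
        beads_by_probe[probe].append(i)

    flags = [False] * len(intensities)
    for indices in beads_by_probe.values():
        values = [intensities[i] for i in indices]
        centre = median(values)
        # 1.4826 scales the MAD so it estimates the SD for normally distributed data.
        mad = 1.4826 * median([abs(v - centre) for v in values])
        if mad == 0:
            continue  # no spread among replicates: leave all beads unflagged
        for i in indices:
            flags[i] = abs(intensities[i] - centre) > n_mads * mad
    return flags

def summarize_probes(probe_ids, intensities, flags):
    """Average the non-outlier beads of each probe into one summary value."""
    kept = defaultdict(list)
    for probe, value, is_outlier in zip(probe_ids, intensities, flags):
        if not is_outlier:
            kept[probe].append(value)
    return {probe: sum(vals) / len(vals) for probe, vals in kept.items()}

# Usage: probe "A" has one aberrant bead; probe "B" has none.
probes = ["A"] * 6 + ["B"] * 3
values = [100, 102, 98, 101, 99, 500, 50, 52, 51]
flags = flag_outliers(probes, values)
print(flags)  # [False, False, False, False, False, True, False, False, False]
print(summarize_probes(probes, values, flags))  # {'A': 100.0, 'B': 51.0}
```

The median and MAD are used here because, unlike the mean and SD, they are barely moved by the very outliers being sought; with about 30 replicates per probe they are stable enough to be useful.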
The locations and numbers of outliers on an array are useful indicators of array quality. We demonstrate how beadarray can be used to generate diagnostic plots, visualize such spatial artifacts and summarize QA measurements, and we discuss the benefits of excluding low-quality data from the analysis. These benefits apply to all users of Illumina technology, regardless of the biological question being addressed.
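Because bead-level data record each bead's position, outlier counts can also be tallied by region of the array surface. The sketch below is hypothetical and is not the spatial method used by beadarray: it divides the surface into an assumed grid of equal cells, counts outliers per cell, and reports cells holding more than an assumed multiple of the count expected if outliers were spread uniformly. The helper names, grid size and threshold are all illustrative assumptions.

```python
from collections import Counter

def outliers_by_region(xs, ys, flags, width, height, n_rows=4, n_cols=4):
    """Count outlier beads in each (row, col) cell of a regular grid.

    Hypothetical helper: bead coordinates are assumed to lie in
    [0, width] x [0, height]; beads on the far edge go in the last cell.
    """
    counts = Counter()
    for x, y, is_outlier in zip(xs, ys, flags):
        if is_outlier:
            col = min(int(x / width * n_cols), n_cols - 1)
            row = min(int(y / height * n_rows), n_rows - 1)
            counts[(row, col)] += 1
    return counts

def suspicious_cells(counts, n_cells, factor=3.0):
    """Return cells holding more than `factor` times the uniform expectation.

    The threshold is an illustrative assumption, not a published rule.
    """
    expected = sum(counts.values()) / n_cells
    return sorted(cell for cell, c in counts.items() if c > factor * expected)

# Usage: four of five outliers sit in one corner of a 100 x 100 surface.
xs = [10, 20, 30, 12, 80, 60, 70]
ys = [10, 15, 40, 22, 80, 20, 90]
flags = [True, True, True, True, True, False, False]
counts = outliers_by_region(xs, ys, flags, width=100, height=100, n_rows=2, n_cols=2)
print(suspicious_cells(counts, n_cells=4))  # [(0, 0)]
```

A concentration of outliers in a few neighbouring cells points to a localized artifact on the surface rather than random noise, which is the kind of pattern a spatial diagnostic plot is meant to reveal.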
Description
IASC2008: Joint Meeting of the 4th World Conference of the International Association for Statistical Computing and the 6th Conference of the Asian Regional Section of the IASC on Computational Statistics & Data Analysis
Pacifico Yokohama, Japan, December 5 (Fri.) - 8 (Mon.), 2008