A High-Throughput Computational Framework for Identifying Significant Copy Number Aberrations from Array Comparative Genomic Hybridisation Data

Research Article

Ian Roberts (1) ian.roberts@cantab.net
Stephanie A. Carter (1) stephanie.carter@cancer.org.uk
Cinzia G. Scarpini (1) cgs1001@cam.ac.uk
Konstantina Karagavriilidou (1) k_konstantina@yahoo.gr
Jenny C. J. Barna (2) jcjb@hermes.cam.ac.uk
Mark Calleja (3) mc321@cam.ac.uk
Nicholas Coleman (1) nc109@cam.ac.uk

(1) Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
(2) Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1QW, UK
(3) The Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, UK

Academic Editor: Yves Van de Peer

Advances in Bioinformatics (ABI), Hindawi Publishing Corporation. ISSN 1687-8035, 1687-8027. Volume 2012, Article ID 876976. doi:10.1155/2012/876976

Received 14 March 2012; Revised 22 June 2012; Accepted 26 June 2012; Published 13 September 2012

Copyright © 2012 Ian Roberts et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reliable identification of copy number aberrations (CNAs) from comparative genomic hybridisation data would be improved by the availability of a generalised method for processing large datasets. To this end, we developed swatCGH, a data analysis framework and region detection heuristic for computational grids. swatCGH analyses sequentially displaced (sliding) windows of neighbouring probes and applies adaptive thresholds of varying stringency to identify the 10% of each chromosome that contains the most frequently occurring CNAs. We used the method to analyse a published dataset, comparing data preprocessed using four different DNA segmentation algorithms and two methods for prioritising the detected CNAs. The consolidated list of the most commonly detected aberrations confirmed the value of swatCGH as a simplified high-throughput method for identifying biologically significant CNA regions of interest.
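
The sliding-window idea described above can be illustrated with a minimal sketch. This is not the swatCGH implementation: the function names, the use of a mean per-window aberration frequency as the score, and the rule of lowering the threshold one distinct score at a time are all assumptions made for illustration. The input `freq` is assumed to hold, for each probe along one chromosome, the fraction of samples showing an aberration at that probe.

```python
from typing import List, Tuple


def window_scores(freq: List[float], width: int) -> List[float]:
    """Mean aberration frequency of every window of `width` adjacent probes."""
    if width < 1 or width > len(freq):
        raise ValueError("width must be between 1 and the number of probes")
    return [sum(freq[i:i + width]) / width
            for i in range(len(freq) - width + 1)]


def select_top_fraction(freq: List[float], width: int = 3,
                        target: float = 0.10) -> Tuple[float, List[bool]]:
    """Relax the threshold until about `target` of the probes are selected.

    Hypothetical stand-in for an adaptive threshold: start at the highest
    window score and step down through the distinct scores, marking every
    probe inside a window that meets the current threshold.
    """
    scores = window_scores(freq, width)
    selected = [False] * len(freq)
    threshold = max(scores)
    for threshold in sorted(set(scores), reverse=True):
        for start, score in enumerate(scores):
            if score >= threshold:
                for i in range(start, start + width):
                    selected[i] = True
        # Stop as soon as the selected probes cover the target fraction.
        if sum(selected) >= target * len(freq):
            break
    return threshold, selected


if __name__ == "__main__":
    # Toy chromosome: 20 probes with one frequently aberrant stretch.
    toy = [0.1] * 20
    toy[8] = toy[9] = toy[10] = 0.9
    thr, mask = select_top_fraction(toy, width=3, target=0.10)
    print(thr, [i for i, hit in enumerate(mask) if hit])
```

Because whole windows are added at each step, the selected fraction can overshoot the target slightly; a real implementation would need its own rule for handling ties and for working in genomic coordinates rather than probe indices.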