CLUSTERnGO: a user-defined modelling platform for two-stage clustering of time-series data.

Published version
Peer-reviewed

Type

Article

Authors

Fidaner, Işık Barış 
Cankorur-Cetinkaya, Ayca 
Kirdar, Betul 
Cemgil, Ali Taylan 

Abstract

MOTIVATION: Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, which limits the meaningful information that can be extracted from them. This situation calls for tailor-made, easy-to-use and flexible tools designed specifically for the analysis of time-series datasets.

RESULTS: We present a novel statistical application called CLUSTERnGO, which uses a model-based clustering algorithm that fulfils this need. The algorithm has two components. Component 1 constructs a Bayesian non-parametric model (Infinite Mixture of Piecewise Linear Sequences), and Component 2 applies a novel clustering methodology (Two-Stage Clustering). The software can also assign biological meaning to the identified clusters using an appropriate ontology, and it applies multiple hypothesis testing to report the significance of these enrichments. The algorithm has a four-phase pipeline. The application can be executed using either command-line tools or a user-friendly Graphical User Interface; the latter has been developed to address the needs of both specialist and non-specialist users. We use three diverse test cases to demonstrate the flexibility of the proposed strategy. In all cases, CLUSTERnGO not only outperformed existing algorithms in assigning unique GO term enrichments to the identified clusters, but also revealed novel insights regarding the biological systems examined, which were not uncovered in the original publications.

AVAILABILITY AND IMPLEMENTATION: The C++ and Qt source code, the GUI applications for Windows, OS X and Linux operating systems, and the user manual are freely available for download under the GNU GPL v3 license at http://www.cmpe.boun.edu.tr/content/CnG.

CONTACT: sgo24@cam.ac.uk

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
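The abstract states that enrichments are reported with multiple hypothesis testing but does not name the test or the correction. As a minimal sketch of what such a step can look like, the Python below assumes a one-sided hypergeometric test per ontology term followed by a Benjamini-Hochberg adjustment. Both choices, and the function names `hypergeom_sf` and `benjamini_hochberg`, are assumptions made for illustration and are not taken from the CLUSTERnGO source.

```python
from math import comb


def hypergeom_sf(k, population, annotated, drawn):
    """P(X >= k) for X ~ Hypergeometric: `drawn` genes sampled from
    `population`, of which `annotated` carry the ontology term."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts: (genes in cluster with term, genome size,
    # genes with term in genome, cluster size) for three made-up terms.
    terms = [(8, 6000, 40, 100), (3, 6000, 200, 100), (1, 6000, 15, 100)]
    raw = [hypergeom_sf(*t) for t in terms]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw p = {p:.3g}, adjusted p = {q:.3g}")
```

Under these assumptions, a term would be reported as enriched in a cluster when its adjusted p-value falls below a chosen threshold; the threshold itself is a user decision and is not specified in the abstract.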

Keywords

Algorithms; Bayes Theorem; Cluster Analysis; Gene Expression Profiling; Models, Statistical; Software

Journal Title

Bioinformatics

Journal ISSN

1367-4803
1367-4811

Volume Title

32

Publisher

Oxford University Press (OUP)

Sponsorship

Biotechnology and Biological Sciences Research Council (BB/K011138/1)
European Commission (289126)
This work was supported by the Turkish State Planning Organization [DPT09K120520 to B.K.]; the Bogazici University Research Fund [10A05D4 to B.K., 08A506 to B.K., 6882-12A01D5 to A.T.C.]; TUBITAK [106M444 to B.K., 110E292 to A.T.C.]; the Biotechnology and Biological Sciences Research Council [BRIC2.2 grant BB/K011138/1 to S.G.O.]; and the EU 7th Framework Programme [BIOLEDGE Contract No: 289126 to S.G.O.].