
dc.contributor.author | Benmounah, Z | en
dc.contributor.author | Meshoul, S | en
dc.contributor.author | Batouche, M | en
dc.contributor.author | Lio, Pietro | en
dc.date.accessioned | 2018-09-13T09:18:26Z
dc.date.available | 2018-09-13T09:18:26Z
dc.identifier.issn | 1568-4946
dc.identifier.uri | https://www.repository.cam.ac.uk/handle/1810/280255
dc.description.abstract | Clustering is an important technique for data analysis and knowledge discovery. In the context of big data, it becomes challenging because the huge volume of recently collected data makes conventional clustering algorithms inappropriate. Swarm intelligence algorithms have shown promising results on moderately sized data clustering problems owing to their decentralized and self-organized behavior. However, these algorithms exhibit limited capabilities when large data sets are involved. In this paper, we develop a decentralized, distributed big data clustering solution using three swarm intelligence algorithms within the MapReduce framework. The developed framework allows cooperation between the three algorithms, namely particle swarm optimization, ant colony optimization, and artificial bee colony, to achieve highly scalable data partitioning through a migration strategy. This strategy exploits the combined exploration and exploitation capabilities of these algorithms to foster diversity. The framework is tested on the Amazon Elastic MapReduce (EMR) service, deploying up to 192 computer nodes and 30 gigabytes of data. Parallel metrics such as speed-up, size-up, and scale-up are used to measure the elasticity and scalability of the framework. Our results are compared with those of existing big data clustering approaches and show a significant improvement in terms of time and convergence to good-quality solutions. The developed model has been applied to epigenetics data clustering according to methylation features in CpG islands, gene body, and gene promoter in order to study the impact of epigenetics on aging. Experimental results reveal that DNA methylation changes slightly, and not aberrantly, with aging, corroborating previous studies.
dc.publisher | Elsevier
dc.subject | big data | en
dc.subject | swarm intelligence | en
dc.subject | large-scale clustering | en
dc.subject | map-reduce | en
dc.subject | epigenetics | en
dc.subject | aging | en
dc.title | Parallel swarm intelligence strategies for large-scale clustering based on MapReduce with application to epigenetics of aging | en
dc.type | Article
prism.endingPage | 783
prism.publicationName | Applied Soft Computing | en
prism.startingPage | 771
prism.volume | 69 | en
dc.identifier.doi | 10.17863/CAM.27624
dcterms.dateAccepted | 2018-04-10 | en
rioxxterms.versionofrecord | 10.1016/j.asoc.2018.04.012 | en
rioxxterms.version | AM | *
rioxxterms.licenseref.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | en
rioxxterms.licenseref.startdate | 2018-04-10 | en
dc.contributor.orcid | Lio, Pietro [0000-0002-0540-5053]
dc.identifier.eissn | 1872-9681
rioxxterms.type | Journal Article/Review | en
cam.issuedOnline | 2018-04-27 | en
dc.identifier.url | https://www.sciencedirect.com/science/article/pii/S1568494618302035?via=ihub#! | en
rioxxterms.freetoread.startdate | 2019-04-27

