Parallel swarm intelligence strategies for large-scale clustering based on MapReduce with application to epigenetics of aging
Authors
Benmounah, Z
Meshoul, S
Batouche, M
Lio, P
Publication Date
2018
Journal Title
Applied Soft Computing
ISSN
1568-4946
Publisher
Elsevier
Volume
69
Pages
771-783
Type
Article
This Version
AM
Citation
Benmounah, Z., Meshoul, S., Batouche, M., & Lio, P. (2018). Parallel swarm intelligence strategies for large-scale clustering based on MapReduce with application to epigenetics of aging. Applied Soft Computing, 69, 771-783. https://doi.org/10.1016/j.asoc.2018.04.012
Abstract
Clustering is an important technique for data analysis and knowledge discovery. In the context of big data, it becomes challenging because the huge volume of recently collected data makes conventional clustering algorithms inappropriate. Swarm intelligence algorithms have shown promising results on data sets of moderate size thanks to their decentralized and self-organized behavior. However, these algorithms exhibit limited capabilities when large data sets are involved. In this paper, we develop a decentralized, distributed big data clustering solution that uses three swarm intelligence algorithms within the MapReduce framework. The developed framework allows cooperation between the three algorithms, namely particle swarm optimization, ant colony optimization, and artificial bee colony, to achieve highly scalable data partitioning through a migration strategy. This strategy takes advantage of the combined exploration and exploitation capabilities of these algorithms to foster diversity. The framework is tested on the Amazon Elastic MapReduce (EMR) service, deploying up to 192 computer nodes and 30 gigabytes of data. Parallel metrics such as speed-up, size-up, and scale-up are used to measure the elasticity and scalability of the framework. Our results are compared with those of existing big data clustering approaches and show a significant improvement in running time and in convergence to good-quality solutions. The developed model has been applied to clustering epigenetics data according to methylation features in CpG islands, gene bodies, and gene promoters in order to study the impact of epigenetics on aging. Experimental results reveal that DNA methylation changes slightly and not aberrantly with aging, corroborating previous studies.
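The three parallel metrics named in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in the parallel clustering literature; the abstract does not state the paper's exact formulas, and the function names and timings below are hypothetical, not results from the paper.

```python
# Illustrative sketch only: commonly used definitions of the three metrics.
# The paper's exact formulas are not given in the abstract (assumption).

def speedup(t_base_nodes: float, t_more_nodes: float) -> float:
    """Same data size, more nodes: baseline time / time on the larger cluster.
    Ideal value equals the factor by which the node count grew."""
    return t_base_nodes / t_more_nodes

def sizeup(t_base_data: float, t_more_data: float) -> float:
    """Same node count, more data: time on larger data / time on baseline data.
    A value below the data growth factor indicates good size scalability."""
    return t_more_data / t_base_data

def scaleup(t_base: float, t_scaled: float) -> float:
    """Nodes and data grown by the same factor: baseline time / scaled time.
    Ideal value is 1.0 (run time stays constant)."""
    return t_base / t_scaled

if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, invented for illustration.
    print(speedup(120.0, 40.0))   # 4x nodes, same data
    print(sizeup(40.0, 100.0))    # 4x data, same nodes
    print(scaleup(120.0, 150.0))  # 4x nodes and 4x data
```

Under these definitions, speed-up and scale-up are "higher is better", while a smaller size-up for a given increase in data volume indicates that the framework absorbs extra data efficiently.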
Keywords
big data, swarm intelligence, large-scale clustering, map-reduce, epigenetics, aging
Identifiers
External DOI: https://doi.org/10.1016/j.asoc.2018.04.012
This record's URL: https://www.repository.cam.ac.uk/handle/1810/280255
Rights
Licence:
http://creativecommons.org/licenses/by-nc-nd/4.0/