Challenges in the multivariate analysis of mass cytometry data: the effect of randomization

by G. Papoutsoglou, V. Lagani, A. Schmidt, K. Tsirlis, D. Gomez-Cabrero, J. Tegner, I. Tsamardinos


Cytometry, Volume 95, Issue 11, November 2019, Pages 1178-1190


Cytometry by time‐of‐flight (CyTOF) has emerged as a high‐throughput single‐cell technology able to provide large samples of protein readouts. A large pool of advanced high‐dimensional analysis algorithms already exists that explore the observed heterogeneous distributions and draw intriguing biological inferences. A fact largely overlooked by these methods, however, is the effect of the established data preprocessing pipeline on the distributions of the measured quantities. In this article, we focus on randomization, a transformation used to improve data visualization, which can negatively affect multivariate data analysis methods such as dimensionality reduction, clustering, and network reconstruction algorithms. Our results indicate that randomization should be used only for visualization purposes, and not in conjunction with high‐dimensional analytical tools.



Mass cytometry; Pre‐processing; High dimensional; Data analysis; Randomization; Clustering algorithms