Computational and Applied Math Seminar 3:00 PM MSCS 514
Importance of using appropriate distance metrics for Kernel Distance Covariance tests on compositional data: Demonstration using cell-type abundance in single-cell mass cytometry data
Pratyaydipta Rudra, Oklahoma State University
Host: Lucas Stolerman
Abstract: Compositional data, which are represented by points on a simplex, call for statistical
methods designed specifically for such data. Methods based on standard distance metrics such as
Euclidean distance have been shown to produce problematic results. Nevertheless, inappropriate
distance metrics remain in common use in many biological applications. For example, methods
based on Kernel Distance Covariance (KDC) or Kernel Machine Regression (KMR) for compositional
biomedical data often rely on metrics that are not appropriate. We demonstrate the consequences
of this practice in the context of cell-type abundance data arising from mass cytometry.
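To make the concern about Euclidean distance concrete, the sketch below compares it with the Aitchison distance (Euclidean distance after a centered log-ratio, or clr, transform), a standard metric for compositional data. The abstract does not say which metric the speaker's method uses, so the Aitchison choice here is an assumption, as are the three-cell-type proportions, which are hypothetical. The clr transform needs strictly positive parts; real cell counts with zeros would need a pseudocount or another zero-handling step, which this sketch does not include.

```python
import math

def closure(x):
    """Rescale positive parts (e.g., cell counts) so they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centered log-ratio transform; every part must be strictly positive."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def aitchison(x, y):
    """Aitchison distance: Euclidean distance between clr-transformed compositions."""
    return euclidean(clr(x), clr(y))

# Hypothetical three-cell-type example: two common types and one rare type.
base = closure([490, 490, 20])        # proportions 0.49, 0.49, 0.02
common_shift = [0.48, 0.50, 0.02]     # one percentage point moves between the common types
rare_shift = [0.495, 0.495, 0.01]     # the rare type is halved

print("Euclidean:", euclidean(base, common_shift), euclidean(base, rare_shift))
print("Aitchison:", aitchison(base, common_shift), aitchison(base, rare_shift))
```

In this example the Euclidean distance ranks halving the rare cell type as a smaller change than a one-point shuffle between the common types, while the Aitchison distance ranks it as far larger. Because clr removes the overall scale, the Aitchison distance between raw counts and their proportions is zero.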
Mass cytometry data are often first clustered into cell subpopulations, which can then be used
to answer scientific questions about cell-type abundance and the expression of specific
parameters. It is often of clinical interest to know whether the abundance of these
subpopulations differs across two or more groups or conditions. Testing the global null
hypothesis of no differential cell-type abundance should involve appropriate treatment of the
multivariate cell-type compositions. We developed a method based on the KDC approach that uses
a metric appropriate for such data. Comparison with a similar method that uses an inappropriate
metric demonstrates the risks of ignoring compositionality. Using simulations and real data
analysis, we also demonstrate that our method is robust and powerful.
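As a rough illustration of a distance-based global test of this kind, the sketch below computes the sample distance covariance (the V-statistic with double-centered distance matrices) between cell-type compositions and group labels and assesses it by permuting the labels. This is not the speaker's method: the Aitchison distance on compositions, the 0/1 metric on labels, the permutation scheme, and the hypothetical `simulate_group` helper are all assumptions made for illustration.

```python
import math
import random

def clr(x):
    """Centered log-ratio transform; parts must be strictly positive."""
    logs = [math.log(v) for v in x]
    m = sum(logs) / len(logs)
    return [v - m for v in logs]

def aitchison(x, y):
    """Assumed compositional metric: Euclidean distance in clr coordinates."""
    cx, cy = clr(x), clr(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))

def double_centered(d):
    """A_ij = d_ij - row mean_i - column mean_j + grand mean (d is symmetric)."""
    n = len(d)
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # Column means equal row means because d is symmetric.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

def dcov2(a, b):
    """Squared sample distance covariance from two double-centered matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2

def label_distance(labels):
    """Assumed discrete metric on group labels: 0 if same group, 1 otherwise."""
    return [[0.0 if li == lj else 1.0 for lj in labels] for li in labels]

def dcov_permutation_test(comps, labels, n_perm=199, seed=0):
    """Return (observed dCov^2, permutation p-value) for compositions vs labels."""
    rng = random.Random(seed)
    a = double_centered([[aitchison(p, q) for q in comps] for p in comps])
    observed = dcov2(a, double_centered(label_distance(labels)))
    perm = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # break any link between labels and compositions
        if dcov2(a, double_centered(label_distance(perm))) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

def simulate_group(mean, n, rng, concentration=500.0):
    """Hypothetical data: Dirichlet draws built from normalized gamma variates."""
    out = []
    for _ in range(n):
        g = [rng.gammavariate(concentration * m, 1.0) for m in mean]
        s = sum(g)
        out.append([v / s for v in g])
    return out

if __name__ == "__main__":
    rng = random.Random(1)
    comps = (simulate_group([0.49, 0.49, 0.02], 20, rng)
             + simulate_group([0.495, 0.495, 0.01], 20, rng))
    labels = [0] * 20 + [1] * 20
    stat, p = dcov_permutation_test(comps, labels)
    print(f"dCov^2 = {stat:.4f}, permutation p = {p:.3f}")
```

Swapping `aitchison` for a plain Euclidean distance on the proportions is one way to reproduce, in miniature, the kind of comparison between appropriate and inappropriate metrics that the abstract describes.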