Publications

Controlling taxa abundance improves metatranscriptomics differential analysis

A common task in analyzing metatranscriptomics data is to identify microbial metabolic pathways with differential RNA abundances across multiple sample groups. Using information from paired metagenomics data, some differential analysis methods control for either DNA abundance or taxa abundance to address their strong correlation with RNA abundance. However, it remains unknown whether both factors need to be controlled for simultaneously. We discovered that when either DNA or taxa abundance is controlled for, RNA abundance still has a strong partial correlation with the other factor.
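To make the notion of partial correlation concrete, here is a minimal Python sketch, assuming a simple linear setting: the control variable is regressed out of both quantities and the residuals are correlated. The function name `partial_correlation` and the simulated `dna`, `taxa`, and `rna` vectors are illustrative assumptions only; they are neither the paper's data nor its analysis pipeline.

```python
import numpy as np

def partial_correlation(x, y, z):
    """Pearson correlation of x and y after linearly regressing z out of both."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    # Design matrix with an intercept and the single control variable.
    design = np.column_stack([np.ones_like(z), z])
    # Residuals of the least-squares fits of x and y on the control variable.
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

# Simulated toy data (an assumption for illustration): RNA abundance
# depends on both DNA abundance and taxa abundance.
rng = np.random.default_rng(0)
n = 500
dna = rng.normal(size=n)
taxa = rng.normal(size=n)
rna = dna + taxa + 0.5 * rng.normal(size=n)

# Controlling for one factor leaves a strong association with the other.
pc_taxa_given_dna = partial_correlation(rna, taxa, dna)
pc_dna_given_taxa = partial_correlation(rna, dna, taxa)
print(pc_taxa_given_dna, pc_dna_given_taxa)
```

In this toy setting both partial correlations stay high, which mirrors the qualitative point of the abstract: controlling for only one of the two factors does not remove the association with the other.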

Palo: spatially aware color palette optimization for single-cell and spatial data

In exploratory analysis of single-cell or spatial genomic data, single cells or spatial spots are often visualized in a two-dimensional plot where cell clusters or spot clusters are marked with different colors. With tens of clusters, current visualization methods often assign visually similar colors to spatially neighboring clusters, making it hard to distinguish one cluster from another.

findPC: An R package to automatically select the number of principal components in single-cell analysis

Principal component analysis is widely used in analyzing single-cell genomic data. Selecting the optimal number of principal components (PCs) is a crucial step for downstream analyses. The elbow method is most commonly used for this task, but it requires one to visually inspect the elbow plot and manually choose the elbow point. To address this limitation, we developed six methods to automatically select the optimal number of PCs based on the elbow method. We evaluated the performance of these methods on real single-cell RNA-seq data from multiple human and mouse tissues and cell types.
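As an illustration of what automating the elbow choice can look like, the Python sketch below picks the point on the elbow plot that lies farthest from the straight line joining the first and last points. This is one generic heuristic chosen for illustration; it is not claimed to be one of the six methods in findPC, and the function name `elbow_by_max_distance` and the example values are assumptions.

```python
import numpy as np

def elbow_by_max_distance(sdev):
    """Return the 1-based index of the PC farthest from the first-to-last chord."""
    sdev = np.asarray(sdev, dtype=float)
    n = sdev.size
    if n < 3 or sdev.max() == sdev.min():
        raise ValueError("need at least 3 non-constant values")
    # Rescale both axes to [0, 1] so the distance does not depend on units.
    x = np.arange(n, dtype=float) / (n - 1)
    y = (sdev - sdev.min()) / (sdev.max() - sdev.min())
    # Perpendicular distance of every point to the line through the endpoints.
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    dist = np.abs(dy * (x - x[0]) - dx * (y - y[0])) / np.hypot(dx, dy)
    return int(np.argmax(dist)) + 1

# Hypothetical standard deviations of the top 10 PCs (made-up values).
example_sdev = [10, 8, 6, 4, 1.2, 1.1, 1.0, 0.9, 0.8, 0.7]
print(elbow_by_max_distance(example_sdev))
```

The heuristic replaces visual inspection with a deterministic rule, which is the general motivation stated above; other rules based on slopes or piecewise fits would be equally plausible.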