Rarify rarefy

8/24/2023

Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which do not address heteroscedasticity) or to rarefy the counts, even though both of these approaches are inappropriate for detecting differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model, for instance) are already available in RNA-Seq focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package phyloseq.
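To make the two criticized normalizations concrete, here is a minimal Python sketch of what rarefying and simple proportions do to a count table. It is an illustration only, not the implementation used in phyloseq, edgeR, or DESeq. The function names `rarefy` and `proportions`, the fixed seed, and the toy counts are all assumptions made for this example, and rarefying is taken here to mean subsampling every library without replacement down to the smallest library size.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample per-taxon read counts to `depth` reads, without replacement."""
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the library size")
    rng = random.Random(seed)
    # Expand the counts into one taxon label per read, then draw `depth` reads.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    kept = rng.sample(reads, depth)
    out = [0] * len(counts)
    for taxon in kept:
        out[taxon] += 1
    return out


def proportions(counts):
    """Divide each count by the library size (the sum of the counts)."""
    total = sum(counts)
    return [n / total for n in counts]


# Hypothetical toy data: three libraries, four taxa each (not from the paper).
samples = {
    "A": [50, 30, 15, 5],            # 100 reads
    "B": [500, 300, 150, 50],        # 1,000 reads
    "C": [4000, 3000, 2000, 1000],   # 10,000 reads
}

# Rarefy every library down to the smallest library size.
depth = min(sum(c) for c in samples.values())
rarefied = {name: rarefy(c, depth) for name, c in samples.items()}

# Reads thrown away by rarefying, summed over all libraries.
total_reads = sum(sum(c) for c in samples.values())
discarded = total_reads - depth * len(samples)

# Proportions erase library size: A and B become indistinguishable even
# though B's estimates rest on ten times as many reads.
same_proportions = proportions(samples["A"]) == proportions(samples["B"])
```

In this toy table the smallest library has 100 reads, so rarefying keeps 300 of the 11,100 reads. Proportions keep all the reads but give libraries A and B identical profiles, which hides the fact that their estimates differ in precision.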
