FStitch: A fast and simple algorithm for detecting nascent RNA transcripts
We present a fast and simple algorithm to detect nascent RNA transcription in global nuclear run-on sequencing (GRO-seq). GRO-seq is a relatively new protocol that captures nascent transcripts from actively engaged polymerase, providing a direct read-out of bona fide transcription. By contrast, more traditional assays such as RNA-seq measure steady-state RNA levels, which combine the effects of transcription, post-transcriptional processing, and RNA stability. A detailed study of GRO-seq data has the potential to inform many aspects of the transcriptional process. GRO-seq data, however, present unique analysis challenges that are only beginning to be addressed. Here we describe a new algorithm, Fast Read Stitcher (FStitch), that combines two popular machine-learning techniques, a hidden Markov model (HMM) and logistic regression, to robustly classify which regions of the genome are transcribed. Our algorithm builds on the strengths of previous approaches and is accurate, requires very little training data, is robust to varying read depth, is annotation agnostic, and is fast.
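To illustrate how an HMM and logistic regression can work together to label genomic windows as transcribed or not, here is a minimal sketch. It is not FStitch's implementation. It assumes a two-state HMM (0 = untranscribed, 1 = transcribed) over fixed-size windows of per-window read coverage. A logistic regression on `log(1 + coverage)` supplies the probability that a window is transcribed, and that probability is used directly as the emission term. This is the "hybrid" HMM trick, which is proportional to a likelihood under equal class priors. Viterbi decoding then finds the most likely state path. The function names, the regression weights, and the self-transition probability `p_stay` are hypothetical placeholders; in practice they would be learned from a small amount of labeled training data.

```python
import math

def logistic(z):
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))

def p_transcribed(coverage, weights=(-2.0, 1.5)):
    """Hypothetical logistic regression: P(transcribed | window coverage).
    The intercept/slope values are illustrative placeholders, not trained weights."""
    b0, b1 = weights
    return logistic(b0 + b1 * math.log1p(coverage))

def viterbi(coverages, p_stay=0.95, p_start=0.5):
    """Most likely state path for a 2-state HMM whose emission term is
    the logistic-regression output (assumed equal class priors)."""
    states = (0, 1)
    log_trans = {(i, j): math.log(p_stay if i == j else 1.0 - p_stay)
                 for i in states for j in states}

    def log_emit(state, cov):
        p = p_transcribed(cov)
        return math.log(p if state == 1 else 1.0 - p)

    # Initialize with the start distribution and the first window's emission.
    scores = [{s: math.log(p_start if s == 1 else 1.0 - p_start) + log_emit(s, coverages[0])
               for s in states}]
    backptrs = []
    for cov in coverages[1:]:
        row, ptr = {}, {}
        for j in states:
            # Best previous state leading into state j.
            best_i = max(states, key=lambda i: scores[-1][i] + log_trans[(i, j)])
            row[j] = scores[-1][best_i] + log_trans[(best_i, j)] + log_emit(j, cov)
            ptr[j] = best_i
        scores.append(row)
        backptrs.append(ptr)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]

def to_segments(path):
    """'Stitch' consecutive transcribed windows into half-open (start, end) intervals."""
    segments, start = [], None
    for i, s in enumerate(path + [0]):  # trailing 0 closes an open segment
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            segments.append((start, i))
            start = None
    return segments

coverages = [0, 0, 0, 12, 30, 25, 18, 0, 0, 1]
path = viterbi(coverages)
print(path)               # [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
print(to_segments(path))  # [(3, 7)]
```

The self-transition probability penalizes switching states, so isolated low- or high-coverage windows do not fragment a call. This smoothing is what lets an HMM join ("stitch") sparse reads into contiguous transcribed regions.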