Summary
PolyAMiner-Bulk, a deep-learning-based algorithm to decode alternative polyadenylation (APA) dynamics from bulk RNA sequencing (RNA-seq) data, enables scientists to identify and quantify APA events from processed bulk RNA-seq data. The protocol allows researchers to explore differential APA usage between two conditions and gain a better understanding of post-transcriptional regulatory mechanisms. The major steps involve input data preparation, executing PolyAMiner-Bulk, and interpreting the results. A basic familiarity with pre-processing bulk RNA-seq data and command-line tools is suggested.
For complete details on the use and execution of this protocol, please refer to Jonnakuti et al.1
Subject areas: bioinformatics, genomics, RNA-seq
Highlights
• APA analysis with PolyAMiner-Bulk to decode and compare APA dynamics in RNA-seq
• Step-by-step guide from installation to analysis in high-performance environments
• Generate gene-level visualizations, PCA plots, volcano plots, and APA heatmaps
Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.
Before you begin
Timing: 1 h
To demonstrate the analysis workflow, the protocol below describes the specific steps for downloading and executing PolyAMiner-Bulk on included simulated bulk RNA-seq demo data (in BAM format).1 We recommend that users have a basic familiarity with pre-processing bulk RNA-seq data and command-line tools. Moreover, given the high-throughput nature of the analysis, utilizing a high-performance computing cluster is recommended to execute most analysis steps efficiently.
Download Docker, the PolyAMiner-Bulk tool, and its environment dependencies.
1. Install Docker from the official Docker website at https://docs.docker.com/engine/install/.
2. Launch Docker.
a. Once the installation is finished, Docker should be ready to use. Look for the Docker icon in your system tray or menu bar and click on it to launch Docker.
3. Verify the installation.
a. After Docker has launched, verify the installation by opening a terminal or command prompt and running the following command:
> docker --version
b. If Docker is successfully installed, the command displays the version number and build information.
4. Download the PolyAMiner-Bulk Docker image.
a. In the terminal or command prompt, run the following command:
> docker pull venkatajonnakuti/polyaminer-bulk
5. Verify that the PolyAMiner-Bulk Docker image has been successfully downloaded.
a. In the terminal or command prompt, run the following command:
> docker images
b. If the image was downloaded successfully, “venkatajonnakuti/polyaminer-bulk” appears under the “REPOSITORY” heading.
Note: macOS and Windows users will need to download the Docker application that is compatible with their system.
Note: Users' systems should be able to handle the workload of their input data. As an example, our demo run was executed on a Linux server with 500 GB of memory and 64 processing cores (2× Intel(R) Xeon(R) Silver 4314 CPU @ 2.40 GHz).
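The check in step 5 can also be scripted. The following is a minimal sketch and not part of the published protocol: the helper name `image_listed` is hypothetical, and it assumes the default tabular output of `docker images`, in which the repository name is the first column and the first line is a header.

```shell
# Hypothetical helper: succeeds (exit status 0) if the repository name given
# as $1 appears in the first column of `docker images` output read from stdin.
# Assumes the default table layout with one header line.
image_listed() {
  awk -v repo="$1" 'NR > 1 && $1 == repo { found = 1 } END { exit !found }'
}

# Usage (requires a running Docker daemon):
#   docker images | image_listed venkatajonnakuti/polyaminer-bulk \
#     && echo "PolyAMiner-Bulk image found"
```

Reading the listing from standard input keeps the helper independent of Docker itself, so it can be tried on saved output before running it on a real system.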
Key resources table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Software and algorithms | | |
| Docker | Docker | https://www.docker.com/ |
| PolyAMiner-Bulk | Jonnakuti et al.1 | https://github.com/YalamanchiliLab/PolyAMiner-Bulk.git; https://hub.docker.com/repository/docker/venkatajonnakuti/polyaminer-bulk/general |
Step-by-step method details
Timing: 6 h
Process bulk RNA-seq data (in BAM format) for alternative polyadenylation analysis using PolyAMiner-Bulk.
In this major step, we describe how you can successfully execute PolyAMiner-Bulk, and we also explain each associated parameter.
1. Before running the Docker command, create an empty directory on the host system, such as ‘<host_path>/Test/output’. This directory will serve as the output folder that will contain the results from the PolyAMiner-Bulk analysis.
a. One way to do this is with the following command, which creates the Test directory and the output directory inside it if they do not already exist:
> mkdir -p <host_path>/Test/output
Then open a terminal or command prompt and run the following command to initiate bulk RNA-seq-based alternative polyadenylation analysis with PolyAMiner-Bulk:
> docker run -it -v <host_path>/Test/output:/root/output venkatajonnakuti/polyaminer-bulk -fasta /root/PolyAMiner-Bulk/ReferenceFiles/GRCh38.primary_assembly.genome.fa -gtf /root/PolyAMiner-Bulk/ReferenceFiles/gencode.v33.primary_assembly.annotation.gtf -o /root/output -c1 /root/PolyAMiner-Bulk/Demo/control1_.subset.sorted.bam,/root/PolyAMiner-Bulk/Demo/control2_.subset.sorted.bam,/root/PolyAMiner-Bulk/Demo/control3_.subset.sorted.bam -c2 /root/PolyAMiner-Bulk/Demo/treatment1_.subset.sorted.bam,/root/PolyAMiner-Bulk/Demo/treatment2_.subset.sorted.bam,/root/PolyAMiner-Bulk/Demo/treatment3_.subset.sorted.bam -ignore UTR5,CDS,Intron,UN -modelOrganism human -visualizeTopNum 10
b. -it: A combination of two separate flags, `-i` and `-t`, which together control how you interact with a Docker container started by the `docker run` command. `-i` stands for “interactive”: it keeps the standard input (STDIN) of the container open so that you can provide input to the running container. `-t` stands for “pseudo-tty”: it allocates a pseudo-terminal (TTY) inside the container, which provides an interface for input and output. Taken together, `-it` starts a new container based on the polyaminer-bulk Docker image with an interactive terminal attached to it.
c. -v: Used in the `docker run` command to mount volumes inside a Docker container. Volumes provide a way to persist data and share files between the host machine (where Docker is running) and the container. When using the `-v` parameter, specify the volume to mount in the following format:
> docker run -v <host_path>:<container_path> …
i. `<host_path>`: This is the path on the host machine where the data or files reside. An absolute path is the safest choice, because Docker can interpret a bare name as a named volume rather than as a host directory. In our demo run, the host side of the mount is <host_path>/Test/output, the same folder created in step 1.
ii. `<container_path>`: This is the corresponding path inside the container where the volume will be mounted. It must be an absolute path. In our demo run, <container_path> is /root/output.
Note: By specifying the `-v` parameter with a host directory, you instruct Docker to create a bind mount, which links the specified directory or file on the host machine to the specified location inside the container. Any changes made to the files in the volume from either the host or the container are reflected in both places. Volumes are useful when you want to persist data after a container exits or give the container access to data on the host machine. The `-v` parameter can be used multiple times to mount multiple volumes simultaneously. In this example, we create a bind mount between the host directory <host_path>/Test/output and the container directory /root/output so that our results persist upon successful execution of PolyAMiner-Bulk.
CRITICAL: If you want to analyze your own processed bulk RNA-seq data (in BAM format), add another ‘-v’ parameter to the run command in the previously described format to mount the host directory containing the input data onto the PolyAMiner-Bulk Docker container.
d. -fasta: This is the container path to the FASTA file. In our run command, we include a container path to a pre-included FASTA file that contains the assembled sequences of the Homo sapiens genome version GRCh38.
e. -gtf: This is the container path to the GTF file. GTF stands for Gene Transfer Format, a file format commonly used in bioinformatics to store gene annotation information and provide a structured representation of genomic features such as genes, exons, transcripts, and other functional elements. In our run command, we include a container path to a pre-included GTF file that contains gene annotation information for the Homo sapiens genome version GRCh38.
CRITICAL: Ensure that the inputs for the ‘-fasta’ and ‘-gtf’ parameters are the appropriate versions for accurate and consistent analysis. They should be the same FASTA and GTF files that were used to align the bulk RNA-seq data and generate the processed BAM files. Using incorrect or mismatched genome or organism-specific versions of GTF or FASTA files can lead to errors in analysis and interpretation of the data.
f. -o: This is the container directory path (/root/output in our demo run) that corresponds to the host directory path (<host_path>/Test/output) that will contain the results from the successful execution of PolyAMiner-Bulk.
g. -c1: This is the comma-separated string of file paths to the binary alignment files (BAMs) of bulk RNA-seq data corresponding to condition 1.
h. -c2: This is the comma-separated string of file paths to the BAMs of bulk RNA-seq data corresponding to condition 2.
Note: The inputs for the ‘-c1’ and ‘-c2’ parameters are comma-separated strings. For example, for the ‘-c1’ parameter, the input should be as follows: ‘<file1_path>,<file2_path>’. There must be no spaces before or after the commas.
i. -modelOrganism: C/PAS-BERT, the deep learning model that underlies PolyAMiner-Bulk, is trained on human and mouse genomes. For that reason, PolyAMiner-Bulk only works on bulk RNA-seq data derived from either humans or mice. Specify the organism of the bulk RNA-seq data with either the ‘human’ or ‘mouse’ value.
j. -ignore: This is a comma-separated string of regions that the user would like to exclude from APA analysis. Typically, for 3′ UTR-only analysis, one would use the following string: “UTR5,CDS,Intron,UN”.
k. -visualizeTopNum: This is a string containing the number of top differential APA genes (ranked by PolyAIndex magnitude) that the user wants to visualize. For example, a value of 10 would visualize the top 10 differential APA genes undergoing shortening (negative PolyAIndex) and the top 10 undergoing lengthening (positive PolyAIndex).
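Because a stray space in the ‘-c1’ or ‘-c2’ string splits it into separate arguments, it can help to build the string programmatically. The sketch below is illustrative only; the helper name `join_csv` is hypothetical and not part of PolyAMiner-Bulk.

```shell
# Hypothetical helper: join all arguments with commas and no spaces,
# the format expected by the -c1 and -c2 parameters.
# The function body runs in a subshell so the IFS change does not leak.
join_csv() (
  IFS=,
  printf '%s\n' "$*"
)

# Usage with the demo BAM files inside the container:
#   demo=/root/PolyAMiner-Bulk/Demo
#   c1=$(join_csv "$demo/control1_.subset.sorted.bam" \
#                 "$demo/control2_.subset.sorted.bam" \
#                 "$demo/control3_.subset.sorted.bam")
#   ... -c1 "$c1" ...
```

Quoting the result as `"$c1"` when passing it to `docker run` keeps the whole list as a single argument.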
Optional parameters
l. -s: This is a string containing information regarding the strandedness of the input data. Use 0 for un-stranded, 1 for forward-stranded, and 2 for reverse-stranded data.
m. -softclippedNumReads: Because cleavage and polyadenylation can occur within a range of a few nucleotides from a C/PAS, we equipped PolyAMiner-Bulk with two C/PAS deconvolution modes: (i) softclipped and a priori clustering, and (ii) softclipped-assisted clustering. In both modes, de novo and a priori C/PASs are clustered based on a user-defined cluster distance parameter (default = 30 bp), and PolyAMiner-Bulk selects the most distal C/PAS within a cluster. In the softclipped-assisted clustering mode, PolyAMiner-Bulk only keeps softclipped-supported clusters, which adds specificity by selecting C/PASs supported by the dataset. To activate the softclipped-assisted clustering mode, use this parameter to specify the minimum number of softclipped reads required for a cluster to be kept.
n. -softclippedNumSamples: You can refine the softclipped-assisted clustering mode further by specifying the minimum number of unique samples that must meet the criterion set by ‘-softclippedNumReads’.
o. -pa_p: To mitigate the impact of sequencing noise, PolyAMiner-Bulk incorporates a p-over-a_m function to eliminate noisy C/PASs. This function ensures that a minimum proportion “p” of samples have “a” or more reads, with an overall minimum threshold of “m” reads. For instance, if p, a, and m are set to 0.6, 5, and 3, respectively, a polyadenylation site is retained only if at least 60% of the samples have 5 or more reads and its overall minimum count is at least 3; sites that do not meet these criteria are filtered out. This parameter sets the proportion “p” of samples for the p-over-a_m function.
p. -pa_a: This parameter sets the number of reads “a” for the p-over-a_m function.
q. -pa_m: This parameter sets the overall minimum of “m” reads for the p-over-a_m function.
Note: For optimal and consistent results, it is advisable to use higher values for the p-over-a_m parameters. However, if there is a significant level of heterogeneity within the samples, it is recommended to use lower values.
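To make the p-over-a_m criterion concrete, the sketch below applies one reading of it to a small count table. This is an illustration under stated assumptions, not PolyAMiner-Bulk's own implementation: the function name `pa_filter` and the input layout (tab-separated, site ID followed by one read count per sample, no header line) are hypothetical, and “overall minimum” is assumed to mean the smallest count across samples.

```shell
# Hypothetical filter illustrating one reading of p-over-a_m.
# Arguments: p (proportion of samples), a (reads per sample), m (overall minimum).
# Input on stdin: tab-separated lines of "site_id<TAB>count1<TAB>count2...".
# A site is printed (kept) when at least a fraction p of samples have >= a
# reads AND the smallest count across samples is >= m (assumed meaning of "m").
pa_filter() {
  awk -F'\t' -v p="$1" -v a="$2" -v m="$3" '{
    n = NF - 1; hits = 0; min = $2 + 0
    for (i = 2; i <= NF; i++) {
      if ($i + 0 >= a) hits++
      if ($i + 0 < min) min = $i + 0
    }
    if (n > 0 && hits / n >= p && min >= m) print $1
  }'
}

# Usage: pa_filter 0.6 5 3 < site_counts.tsv
```

With p = 0.6, a = 5, and m = 3 and six samples, a site with counts 6, 7, 5, 8, 4, 3 is kept (four of six samples reach 5 reads and the minimum is 3), whereas a site with counts 9, 9, 9, 4, 4, 4 is dropped because only half of the samples reach 5 reads.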
Expected outcomes
The PolyAMiner-Bulk protocol is expected to generate several comprehensive output files that provide key insights into alternative polyadenylation (APA) events. Researchers can anticipate a series of summary files that encapsulate crucial metrics related to differential APA across genes, including gene-level statistics such as PolyAIndex, number of cleavage and polyadenylation sites (C/PASs), and significant changes in APA events. For example, the file PolyAminer_Out_PolyA-miner.Results.txt will include detailed gene-specific information, such as the number and location of C/PASs and associated DeltaU values, while PolyAminer_Out_DEG-results.txt will provide data on differentially expressed genes, including log2FoldChanges and adjusted p-values.
In addition to summary files, visualizations will be produced to aid in interpreting the results. For instance, t-SNE and PCA plots will be provided (Figures 1A–1D), illustrating gene expression patterns (e.g., PolyAminer_Out_Gene.t-SNE.tiff and PolyAminer_Out_Gene.PCA.tiff) as well as APA-specific patterns (e.g., PolyAminer_Out_PA.t-SNE.tiff and PolyAminer_Out_PA.PCA.tiff). These visualizations will help distinguish between control and treatment conditions, with additional plots such as volcano plots and heatmaps (Figures 1E–1H, e.g., PolyAminer_Out_overallDegVolcanoPlot.tiff) highlighting significant changes in gene expression and APA events.
Figure 1.
Summary visualizations from PolyAMiner-Bulk
(A) Gene t-SNE Visualization.
(B) C/PAS t-SNE Visualization.
(C) Gene PCA.
(D) C/PAS PCA.
(E) Overall DEG Volcano Plot.
(F) DAG Volcano Plot.
(G) APA Factor DEG Heatmap.
(H) Overall DEG Heatmap.
On a more granular level, the protocol will generate gene-specific visualizations for the top differential APA genes, providing insights into read density patterns along both the 3′ UTR and the entire gene (Figures 2A–2D). Files such as PolyAminer_Out_"GeneName".DAG_Track_3UTRView.svg will provide detailed views of APA events along the 3′ UTR for specific genes, while PolyAminer_Out_"GeneName".PseudoPAC_DAG_Track_WholeGeneView.svg will offer whole-gene read density visualizations. These visualizations will be crucial in identifying the biological relevance of APA shifts for key genes under study.
Figure 2.
Gene-level data visualization outputs from PolyAMiner-Bulk
(A) Read Density of DAG 3′ UTR View.
(B) Read Density Heatmap.
(C) Pseudo-PAC Read Density of DAG 3′ UTR View.
(D) Pseudo-PAC Read Density of DAG Whole Gene View.
Intermediate files generated during the analysis, including stranded BigWig files (e.g., Treatment or control (#1–3).subset.sortedpseudoPAC_reverse.bw and Treatment or control (#1–3).subset.sorted_forward.bw), will capture strand-specific information on APA events, providing further context on the directionality of polyadenylation. BED files (e.g., PolyAminer_Out “GeneName”._CPASdb.bed) will offer additional genomic feature data to complement the visualizations and summary outputs.
In summary, the PolyAMiner-Bulk protocol is expected to yield a robust set of summary metrics, visualizations, and intermediate files that together provide a comprehensive view of APA events. These outputs will support both exploratory and confirmatory APA research, enabling detailed insights into APA’s role in gene regulation.
Limitations
As with any method, a possibility of false positives exists. Bulk RNA-seq data has notoriously poor and noisy 3′ UTR coverage. While this issue can be mitigated to an extent by using a bulk RNA-seq dataset with a high sequencing depth, orthogonal approaches are still required to validate the computational predictions of PolyAMiner-Bulk.
Troubleshooting
Problem 0
How do I determine my <host_path> and <container_path>?
Potential solution
• Your <host_path> is the location on the host machine where your data or files reside. Use an absolute path where possible. Begin by creating the required output directories. In our example, the output directory is <host_path>/Test/output.
• Your <container_path> is the corresponding path within the container where the volume will be mounted; it must be an absolute path. In our demonstration, <container_path> is /root/output. If you use the input files from the demo, the <container_path> can remain the same. If you want to use different input files, mount the host directory that contains them at an additional container path and point the input parameters to that path.
• To clarify, this is an example of the command being executed on our local machine:
○ <host_path> is /Users/sriyajonnakuti/Desktop/PolyAMiner-Bulk_analysis.
○ <container_path> is /root/output.
○ The input files are read from the container directory /root/PolyAMiner-Bulk/Demo/, which holds the following BAM files:
- control1_.subset.sorted.bam
- control2_.subset.sorted.bam
- control3_.subset.sorted.bam
- treatment1_.subset.sorted.bam
- treatment2_.subset.sorted.bam
- treatment3_.subset.sorted.bam
> docker run -it -v /Users/sriyajonnakuti/Desktop/PolyAMiner-Bulk_analysis/Test/output:/root/output venkatajonnakuti/polyaminer-bulk -fasta /root/PolyAMiner-Bulk/ReferenceFiles/GRCh38.primary_assembly.genome.fa -gtf /root/PolyAMiner-Bulk/ReferenceFiles/gencode.v33.primary_assembly.annotation.gtf -o /root/output -c1 /root/PolyAMiner-Bulk/Demo/control1_.subset.sorted.bam,/root/PolyAMiner-Bulk/Demo/control2_.subset.sorted.bam,/root/PolyAMiner-Bulk/Demo/control3_.subset.sorted.bam -c2 /root/PolyAMiner-Bulk/Demo/treatment1_.subset.sorted.bam,/root/PolyAMiner-Bulk/Demo/treatment2_.subset.sorted.bam,/root/PolyAMiner-Bulk/Demo/treatment3_.subset.sorted.bam -ignore UTR5,CDS,Intron,UN -modelOrganism human -visualizeTopNum 10
Problem 1
How do you analyze locally hosted input files (i.e., BAM, FASTA, and GTF files) with PolyAMiner-Bulk (related to step 2)?
Potential solution
• Mount additional volumes containing these locally hosted input files using the ‘-v’ parameter in your PolyAMiner-Bulk run command.
• An example run command, in which the reference files and a copy of the demo BAM files are stored under <host_path>/inputData on the host and mounted at /root/input, is as follows:
> docker run -it -v <host_path>/inputData:/root/input -v <host_path>/Test/output:/root/output venkatajonnakuti/polyaminer-bulk -fasta /root/input/GRCh38.primary_assembly.genome.fa -gtf /root/input/gencode.v33.primary_assembly.annotation.gtf -o /root/output -c1 /root/input/PolyAMiner-Bulk/Demo/control1_.subset.sorted.bam,/root/input/PolyAMiner-Bulk/Demo/control2_.subset.sorted.bam,/root/input/PolyAMiner-Bulk/Demo/control3_.subset.sorted.bam -c2 /root/input/PolyAMiner-Bulk/Demo/treatment1_.subset.sorted.bam,/root/input/PolyAMiner-Bulk/Demo/treatment2_.subset.sorted.bam,/root/input/PolyAMiner-Bulk/Demo/treatment3_.subset.sorted.bam -modelOrganism human -visualizeTopNum 10
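When the input BAM files live on the host, the paths given to ‘-c1’ and ‘-c2’ must be written as the container sees them, not as the host sees them. The following sketch translates one into the other; the helper name `container_bams` is hypothetical, and it assumes that the BAM files of each condition sit in their own host directory that is mounted at the given container path.

```shell
# Hypothetical helper: list the *.bam files in a host directory ($1) and print
# them as one comma-separated string of paths under the container mount
# point ($2), with no spaces, ready to pass to -c1 or -c2.
container_bams() {
  host_dir=$1
  mount_point=$2
  out=
  for bam in "$host_dir"/*.bam; do
    [ -e "$bam" ] || continue            # skip the unexpanded pattern if no BAMs exist
    out="${out:+$out,}$mount_point/$(basename "$bam")"
  done
  printf '%s\n' "$out"
}

# Usage, assuming a host layout of inputData/control and inputData/treatment:
#   c1=$(container_bams /data/inputData/control   /root/input/control)
#   c2=$(container_bams /data/inputData/treatment /root/input/treatment)
#   ... -v /data/inputData:/root/input ... -c1 "$c1" -c2 "$c2" ...
```

The directory names in the usage lines are placeholders; only the mapping from host file names to container paths is the point of the example.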
Problem 2
How do you analyze all genic C/PASs (rather than just 3′ UTR C/PASs) with PolyAMiner-Bulk (related to step 2)?
Potential solution
• Remove the ‘-ignore’ parameter from your PolyAMiner-Bulk run command.
• An example run command for our pre-included simulated bulk RNA-seq demo data (in BAM format) is as follows:
> docker run -it -v <host_path>/Test/output:/root/output venkatajonnakuti/polyaminer-bulk -fasta /root/PolyAMiner-Bulk/ReferenceFiles/GRCh38.primary_assembly.genome.fa -gtf /root/PolyAMiner-Bulk/ReferenceFiles/gencode.v33.primary_assembly.annotation.gtf -o /root/output -c1 /root/PolyAMiner-Bulk/Demo/control1_.subset.sorted.bam,/root/PolyAMiner-Bulk/Demo/control2_.subset.sorted.bam,/root/PolyAMiner-Bulk/Demo/control3_.subset.sorted.bam -c2 /root/PolyAMiner-Bulk/Demo/treatment1_.subset.sorted.bam,/root/PolyAMiner-Bulk/Demo/treatment2_.subset.sorted.bam,/root/PolyAMiner-Bulk/Demo/treatment3_.subset.sorted.bam -modelOrganism human -visualizeTopNum 10
Problem 3
How do you process raw bulk RNA-seq data (in FastQ format) into aligned data (in BAM format) (related to step 2)?
Potential solution
• To process raw bulk RNA-seq data in FastQ format and convert it into aligned data in BAM format, you typically follow these steps:
○ Quality Control (QC): Perform quality control checks on the raw FastQ files using tools like FastQC.2 This step assesses data quality and identifies potential issues such as adapter contamination or low-quality reads.
○ Read Alignment: Align the FastQ reads to a reference genome using an alignment tool such as STAR or Bowtie.3,4 These tools map the reads to their respective genomic locations, taking into account factors like splicing and allowing for multi-mapping if necessary. The output of this step is a SAM (Sequence Alignment/Map) file.
○ Conversion to BAM format: Convert the SAM file into BAM format, which is a compressed binary version of the SAM file, using a tool like Samtools. This step improves storage efficiency and facilitates faster data processing.
○ Sorting and Indexing: Sort the BAM file by genomic coordinates to optimize subsequent analysis steps. Additionally, create an index file (.BAI) using tools like Samtools.5 The index file allows for quick access to specific regions of the BAM file.
• After completing these steps, you will have processed bulk RNA-seq data in BAM format, which can be used for downstream analysis, including differential gene expression, alternative splicing, or alternative polyadenylation analysis.
• The specific tools and parameters used in each step vary depending on the analysis pipeline and the requirements of your study, so the specific steps are outside the scope of this protocol. However, workflow managers such as Nextflow or Snakemake can help automate and streamline these processing steps.
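The four steps above can be sketched as a short shell function. This is one possible arrangement rather than a prescribed pipeline: it assumes gzipped paired-end reads named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, an existing STAR genome index, a thread count of 8, and that `fastqc`, `STAR`, and `samtools` are installed. The function names `run` and `fastq_to_bam` are hypothetical. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# Print each command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: $1 = sample name, $2 = STAR genome index directory.
# Assumes paired-end, gzipped reads; adjust file names and options to your data.
fastq_to_bam() {
  sample=$1
  star_index=$2
  # 1. Quality control of the raw reads
  run fastqc "${sample}_R1.fastq.gz" "${sample}_R2.fastq.gz"
  # 2. Alignment; by default STAR writes <prefix>Aligned.out.sam
  run STAR --runThreadN 8 --genomeDir "$star_index" \
    --readFilesIn "${sample}_R1.fastq.gz" "${sample}_R2.fastq.gz" \
    --readFilesCommand zcat --outFileNamePrefix "${sample}."
  # 3. SAM to BAM conversion
  run samtools view -b -o "${sample}.bam" "${sample}.Aligned.out.sam"
  # 4. Coordinate sorting and indexing (writes <sample>.sorted.bam.bai)
  run samtools sort -o "${sample}.sorted.bam" "${sample}.bam"
  run samtools index "${sample}.sorted.bam"
}

# Usage: DRY_RUN=1 fastq_to_bam control1 /path/to/star_index
```

The genome index passed to STAR should be built from the same FASTA and GTF files that are later given to the ‘-fasta’ and ‘-gtf’ parameters of PolyAMiner-Bulk.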
Problem 4
How do you resolve this error message: “[E::idx_find_and_load] Could not retrieve index file for 'XXX.bam'” (related to step 2)?
Potential solution
• This error indicates that the index (.BAI) file for the named BAM file could not be found. BAM (Binary Alignment/Map) files store the aligned sequencing reads and related information, typically generated from high-throughput sequencing technologies like bulk RNA sequencing. .BAI files serve as indexes for BAM files: they contain the information needed to access specific genomic regions within a large BAM file without reading the entire file, which enables faster analysis and data processing. The .BAI format is widely supported by genomics tools. .BAI files are expected to be in the same directory as their .BAM file counterparts. To resolve the error, make sure that each input BAM file is sorted by genomic coordinates and has its .BAI file in the same directory, creating the index with a tool like Samtools5 if it is missing.
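Before launching a run, it can be useful to check which BAM files in an input directory lack an index. The sketch below is a hypothetical helper, not part of PolyAMiner-Bulk; it assumes the two common index naming patterns, `sample.bam.bai` and `sample.bai`.

```shell
# Hypothetical helper: print every BAM file in directory $1 that has no
# index file next to it (neither <name>.bam.bai nor <name>.bai).
missing_bai() {
  for bam in "$1"/*.bam; do
    [ -e "$bam" ] || continue            # directory contains no BAM files
    if [ ! -e "$bam.bai" ] && [ ! -e "${bam%.bam}.bai" ]; then
      printf '%s\n' "$bam"
    fi
  done
}

# Usage: index whatever is missing (requires coordinate-sorted BAM files):
#   missing_bai /path/to/inputData | while IFS= read -r bam; do
#     samtools index "$bam"
#   done
```

If the BAM files are mounted into the container with ‘-v’, the index files must be inside the mounted directory as well.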
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Hari Krishna Yalamanchili, Ph.D. (Hari.Yalamanchili@bcm.edu).
Technical contact
Further information and requests for information on technical specifics of performing the protocol will be fulfilled by the technical contact, Venkata Jonnakuti, Ph.D. (Venkata.Jonnakuti@bcm.edu).
Materials availability
This study did not generate new unique reagents.
Data and code availability
This paper analyzes demo data. All original code has been deposited at GitHub and is publicly available as of the date of publication. PolyAMiner-Bulk is an open-source algorithm that has been packaged as an end-to-end Python application. It is freely available at https://github.com/YalamanchiliLab/PolyAMiner-Bulk. Given the complexity of the python and R environments that are needed for the proper execution of PolyAMiner-Bulk, we have also created a docker image of this tool. It is freely available at https://hub.docker.com/repository/docker/venkatajonnakuti/polyaminer-bulk/general. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
Acknowledgments
We are thankful to our colleagues at the Baylor College of Medicine, Texas Children’s Hospital, and the Jan and Dan Duncan Neurological Research Institute who provided expertise that greatly assisted this research. This work has been supported by the United States Department of Agriculture (USDA/ARS) under cooperative agreement no. 58-3092-0-001 and Duncan NRI Zoghbi Scholar Award to H.K.Y. V.J. is supported by the Gulf Coast Consortia and the National Library of Medicine Training Program in Biomedical Informatics and Data Science (T15 LM0070943).
Author contributions
Conceptualization, V.J. and H.K.Y.; methodology, V.J. and H.K.Y.; software, V.J.; validation, V.J. and S.J.; formal analysis, V.J. and S.J.; investigation, V.J. and S.J.; resources, V.J. and H.K.Y.; data curation, V.J. and H.K.Y.; writing – original draft, V.J. and S.J.; writing – review and editing, V.J., S.J., and H.K.Y.; visualization, V.J. and S.J.; supervision, H.K.Y.; project administration, H.K.Y.; funding acquisition, V.J. and H.K.Y.
Declaration of interests
The authors declare no competing interests.
Contributor Information
Venkata Jonnakuti, Email: venkata.jonnakuti@bcm.edu.
Hari Krishna Yalamanchili, Email: hari.yalamanchili@bcm.edu.
References
- 1. Jonnakuti V.S., Wagner E.J., Maletić-Savatić M., Liu Z., Yalamanchili H.K. PolyAMiner-Bulk is a deep learning-based algorithm that decodes alternative polyadenylation dynamics from bulk RNA-seq data. Cell Rep. Methods. 2024;4 doi: 10.1016/J.CRMETH.2024.100707.
- 2. Babraham Bioinformatics. (2024). FastQC: A Quality Control tool for High Throughput Sequence Data. Accessed June 7, 2024. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
- 3. Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 4. Langmead B., Trapnell C., Pop M., Salzberg S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25.
- 5. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R., 1000 Genome Project Data Processing Subgroup The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.