Table 2: Availability of data generated in this paper. The data are available on Caltech Data under the DOIs 10.22002/krqmp-5hy81 and 10.22002/k7xqw-88d74.
| File | Description | Category |
|---|---|---|
| viral sequences in laboratory reagents.h5ad | Count matrix containing virus-like sequences found in sequencing libraries consisting only of sterile water and laboratory reagents | Alignment of ‘blank’ sequencing libraries to the PalmDB |
| host alignment results.zip | Raw alignment results obtained by kallisto after alignment to the macaque and dog transcriptomes (the dog transcriptome accounts for the MDCK spike-in) | Alignment of the macaque PBMC data [37] to the host transcriptome(s) |
| host QC.h5ad | Filtered count matrix containing all host cells | |
| canis QC norm leiden.h5ad | Filtered and clustered count matrix containing MDCK cells | |
| macaque QC norm leiden.h5ad | Filtered and clustered count matrix containing macaque cells | |
| macaque QC norm leiden celltypes.h5ad | Filtered and clustered count matrix containing macaque cells with cell type assignments | |
| virus no mask alignment results.zip | Raw alignment results obtained by kallisto translated search after alignment to the PalmDB without masking host sequences | Alignment of the macaque PBMC data [37] to the PalmDB for the detection of viral RNA, using different workflows for masking the host genome(s) and transcriptome(s) |
| virus no mask.h5ad | Count matrix obtained through the alignment above with added metadata | |
| virus dlist cdna alignment results.zip | Raw alignment results obtained by kallisto translated search after alignment to the PalmDB while masking host transcriptome(s) using the D-list | |
| virus dlist cdna.h5ad | Count matrix obtained through the alignment above with added metadata | |
| virus dlist dna alignment results.zip | Raw alignment results obtained by kallisto translated search after alignment to the PalmDB while masking host genome(s) using the D-list | |
| virus dlist dna.h5ad | Count matrix obtained through the alignment above with added metadata | |
| virus dlist cdna dna alignment results.zip | Raw alignment results obtained by kallisto translated search after alignment to the PalmDB while masking host genome(s) and transcriptome(s) using the D-list | |
| virus dlist cdna dna.h5ad | Count matrix obtained through the alignment above with added metadata | |
| virus dlist cdna dna amb alignment results.zip | Raw alignment results obtained by kallisto translated search after alignment to the PalmDB while masking host genome(s) and transcriptome(s) using the D-list and forcing ambiguous sequences to be discarded | |
| virus dlist cdna dna ambiguous.h5ad | Count matrix obtained through the alignment above with added metadata | |
| virus host capture alignment results.tar.gz | Raw alignment results obtained by kallisto translated search after alignment to the PalmDB, with reads that align to the host transcriptome(s) captured | |
| virus host-captured.h5ad | Count matrix obtained through the alignment above with added metadata | |
| virus host capture dlist cdna dna alignment results.tar.gz | Raw alignment results obtained by kallisto translated search after alignment to the PalmDB while masking host genome(s) and transcriptome(s) using the D-list, with reads that align to the host transcriptome(s) captured | |
| virus host-captured dlist cdna dna.h5ad | Count matrix obtained through the alignment above with added metadata | |
| bwa unmapped reads.tar.gz | Raw sequencing files obtained after removal of host sequences based on alignment with bwa | |
| virus bwa alignment results.zip | Raw alignment results obtained by kallisto translated search after alignment to the PalmDB, following removal of reads that aligned to the host genome(s) with bwa | |
| virus bwa.h5ad | Count matrix obtained through the alignment above with added metadata | |
| models.zip | Logistic regression models to predict viral presence based on host gene expression | Logistic regression models |
| palmdb human dlist cdna dna.idx | Pre-computed PalmDB reference index with human genomic and transcriptomic sequences masked using the D-list | Pre-computed references for future use with kallisto translated search |
| palmdb mouse dlist cdna dna.idx | Pre-computed PalmDB reference index with mouse genomic and transcriptomic sequences masked using the D-list | |