Sci Data. 2017 Dec 5;4:170178. doi: 10.1038/sdata.2017.178

Table 1. Open access tools and commands used to perform data analyses (analytical steps correspond to those in Fig. 2).

| Analytical Step | Description | Software/Version | Command |
| --- | --- | --- | --- |
| Read QC | Read quality control | FastQC v0.11.5 | fastqc /path_to/raw.fq.gz (Data Citation 1 and Data Citation 2) |
| Clean Reads | Adaptor and low-quality trimming | Trimmomatic v0.36 | java -jar /path_to/trimmomatic-0.36.jar PE -phred33 -threads 8 raw_R1.fq.gz raw_R2.fq.gz clean_FP.fq.gz clean_FU.fq.gz clean_RP.fq.gz clean_RU.fq.gz HEADCROP:10 ILLUMINACLIP:/path_to/adapters_list.fa:2:30:10 TRAILING:10 SLIDINGWINDOW:4:10 MINLEN:25 |
| De novo assembly | Transcriptome assembly | Trinity v2.2.0 | Trinity --seqType fq --left clean_FP.fq.gz --right clean_RP.fq.gz --CPU 20 --max_memory 150G --SS_lib_type RF --output trinity_assembly |
| Assembly curation | Filtering out contigs with low read support | Transrate v1.0.3 | transrate --assembly Ltimidus_Trinity.fasta --left clean_FP.fq.gz --right clean_RP.fq.gz --threads 10 --reference Oryctolagus_cuniculus.OryCun2.0.81.pep.all.fa --output transrate_Ltimidus_Trinity |
| Remove redundancy | Clustering of highly homologous sequences | CD-HIT-EST v4.6.4 | cd-hit-est -i good.Ltimidus_Trinity.fasta -c 0.95 -o AlpsIrel.fasta |
| ORF prediction | Filtering based on candidate coding regions and Pfam annotation | TransDecoder v3.0.0 | TransDecoder.LongOrfs -t AlpsIrel.fasta |
| | | HMMER v3.1b2 | hmmscan --cpu 8 --domtblout pfam.domtblout /path_to/Pfam-A.hmm transdecoder_dir/longest_orfs.pep |
| | | TransDecoder v3.0.0 | TransDecoder.Predict -t AlpsIrel.fasta --cpu 2 --retain_pfam_hits pfam.domtblout |
| Annotation | Annotation assessment | Trinotate v3.0.1 | wget "https://data.broadinstitute.org/Trinity/Trinotate_v3_RESOURCES/Trinotate_v3.sqlite.gz" -O Trinotate.sqlite.gz |
| | | Gunzip | gunzip Trinotate.sqlite.gz |
| | Conditional reciprocal best BLAST annotation | crb-blast v0.6.6 | crb-blast --query AlpsIrel.cds --target database(SP and Ocun) --threads 4 --split 4 --output blastx.outfmt6 |
| | | crb-blast v0.6.6 | crb-blast --query AlpsIrel.pep --target database(SP and Ocun) --threads 4 --split 4 --output blastp.outfmt6 |
| | SignalP annotation | SignalP v4.1 | signalp -f short -n signalp.out AlpsIrel.pep |
| | Pfam annotation | HMMER v3.1b2 | hmmscan --cpu 2 --domtblout TrinotatePFAM.out Pfam-A.hmm AlpsIrel.pep |
| | TMHMM annotation | TMHMM v2.0 | tmhmm --short < AlpsIrel.pep > tmhmm.out |
| | Combine annotations | Trinity utilities v2.2.0 | /path_to/trinityrnaseq-2.2.0/util/support_scripts/get_Trinity_gene_to_trans_map.pl AlpsIrel.fasta >AlpsIrel.gene_trans_map |
| | | Trinotate v3.0.1 | Trinotate Trinotate.sqlite init --gene_trans_map AlpsIrel.gene_trans_map --transcript_fasta AlpsIrel.fasta --transdecoder_pep AlpsIrel.pep |
| | SwissProt annotation load | Trinotate v3.0.1 | 1. Trinotate Trinotate.sqlite LOAD_swissprot_blastp SP.blastp.outfmt6; 2. Trinotate Trinotate.sqlite LOAD_swissprot_blastx SP.blastx.outfmt6 |
| | O. cuniculus annotation load | Trinotate v3.0.1 | 1. Trinotate Trinotate.sqlite LOAD_custom_blast --outfmt6 Ocun.blastp.outfmt6 --prog blastp --dbtype Ocun; 2. Trinotate Trinotate.sqlite LOAD_custom_blast --outfmt6 Ocun.blastx.outfmt6 --prog blastx --dbtype Ocun |
| | Pfam annotation load | Trinotate v3.0.1 | Trinotate Trinotate.sqlite LOAD_pfam TrinotatePFAM.out |
| | TMHMM annotation load | Trinotate v3.0.1 | Trinotate Trinotate.sqlite LOAD_tmhmm tmhmm.out |
| | SignalP annotation load | Trinotate v3.0.1 | Trinotate Trinotate.sqlite LOAD_signalp signalp.out |
| | Joint annotation file | Trinotate v3.0.1 | Trinotate Trinotate.sqlite report > LtimidusTranscriptome.xls |
| Mapping | Read mapping onto the curated reference | bwa-mem v0.7.15 | bwa index AlpsIrel.cds |
| | | bwa-mem v0.7.15 | bwa mem -t 10 -R '@RG\tID:pop_sample_lane\tSM:popsample\tLB:LIBsample' AlpsIrel.cds Sample_L*_FP.fq.gz Sample_L*_RP.fq.gz > Sample_lane.sam |
| BAM conversion, sort and fixmate | Fixmate and BAM conversion | SAMtools v1.3.1 | samtools fixmate --output-fmt BAM sample_lane.sam sample_lane_fixmate.bam |
| | BAM sort | SAMtools v1.3.1 | samtools sort -O bam -o sample_lane_sorted.bam -T /path_to/temp/ sample_lane_fixmate.bam |
| Remove duplicates | Mark and remove duplicates | Picard v1.140 | java -jar /path_to/picard.jar MarkDuplicates REMOVE_DUPLICATES=True MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=950 ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT I=sample_lane_sorted.bam I=sample_lane_sorted.bam I=sample_lane_sorted.bam O=sample_rmdup.bam M=duplic_stats_sample TMP_DIR=/path_to/temp |
| Realignment and recalibration | Realignment | GATK v3.6-0 | java -jar /path_to/GenomeAnalysisTK.jar -T RealignerTargetCreator -R AlpsIrel.cds -I sample_rmdup.bam -o sample_int.list |
| | Recalibration | GATK v3.6-0 | java -jar /path_to/GenomeAnalysisTK.jar -T IndelRealigner -R AlpsIrel.cds -I sample_rmdup.bam -targetIntervals sample_int.list -o sample_realign.bam |
| SNP call | SNP call | Reads2snp v2.0.64 | reads2snp_2.0.64.bin -bamlist LtimLeur_list.txt -bamref AlpsIrel.cds -out LtimVsLeur -min 10 -nbth 12 -th1 0.95 -par 1 -th2 0.01 -opt bfgs -fis 0.0 -pre 0.001 -rqt 20 |
| Differentiation analysis | Remove indels and missing data | VCFtools v0.1.14 | vcftools --vcf LtimVsLeur.vcf --recode --recode-INFO-all --remove-indels --max-missing-count 0 --out LtimVsLeur_noindels |
| | Extract 1 SNP per contig | VCFtools v0.1.14 | vcftools --vcf LtimVsLeur_noindels.recode.vcf --recode --recode-INFO-all --thin 10000 --min-alleles 2 --out LtimVsLeur_1SNPperContig |
| | VCF to STRUCTURE conversion | PGDSpider v2.1.1.0 | java -Xmx1024m -Xms512m -jar /path_to/PGDSpider2-cli.jar -inputfile LtimVsLeur_1SNPperContig.recode.vcf -inputformat VCF -outputfile LtimVsLeur_SNPs -outputformat STRUCTURE -spid VCF_to_STRUCTURE.spid |
| | Structure analysis | STRUCTURE v2.3.4 | structure -m mainparams (standard parameters except 1 million steps after a burn-in period of 200 000, K=2 and admixture model) |
| | | CLUMPAK v42089 | The web version was used: http://clumpak.tau.ac.il/ |
| | PCA analysis | PLINK v1.90b3.45 | plink --file LtimVsLeur_1SNPperContig --pca 3 |
| | | ggplot2 R package v2.2.1 | 1. R; 2. library(ggfortify); 3. pca <- read.table('plink.eigenvec', header=TRUE); 4. df <- pca[c(3, 4)]; 5. autoplot(prcomp(df), data=pca, colour='Species.Pop', size=5) |
| GO enrichment | Gene Ontology enrichment analysis | g:Profiler | Available at http://biit.cs.ut.ee/gprofiler/; "Best per parent group" hierarchical filtering; background set supplied manually; g:SCS significance threshold. |
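
The mapping, fixmate, sort and duplicate-removal rows of Table 1 use placeholder names (pop, sample, lane), so it may help to see how they could chain together for one sample sequenced on several lanes. The following is a minimal dry-run sketch, not the authors' script: it only prints the commands. The function names, the `pop1`/`s01`/`L1`/`L2` identifiers, and the `<sample>_<lane>_FP.fq.gz` file naming are hypothetical, and reading the repeated `I=` arguments of MarkDuplicates as one sorted BAM per lane is an assumption.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the per-lane mapping commands and the per-sample
# duplicate-removal command from Table 1. Nothing is executed.
# All sample, population and lane names are hypothetical placeholders.

# Build the read-group string for `bwa mem -R`, following the table's
# pattern ID:pop_sample_lane, SM:popsample, LB:LIBsample.
# The "\t" sequences are kept literal, as bwa expects them.
read_group() {
    local pop="$1" sample="$2" lane="$3"
    printf '@RG\\tID:%s_%s_%s\\tSM:%s%s\\tLB:LIB%s' \
        "$pop" "$sample" "$lane" "$pop" "$sample" "$sample"
}

# Print the three per-lane commands: bwa mem, samtools fixmate, samtools sort.
# Assumption: input reads are named <sample>_<lane>_FP.fq.gz / _RP.fq.gz.
mapping_commands() {
    local pop="$1" sample="$2" lane="$3"
    local prefix="${sample}_${lane}"
    local rg
    rg="$(read_group "$pop" "$sample" "$lane")"
    printf '%s\n' "bwa mem -t 10 -R '${rg}' AlpsIrel.cds ${prefix}_FP.fq.gz ${prefix}_RP.fq.gz > ${prefix}.sam"
    printf '%s\n' "samtools fixmate --output-fmt BAM ${prefix}.sam ${prefix}_fixmate.bam"
    printf '%s\n' "samtools sort -O bam -o ${prefix}_sorted.bam -T /path_to/temp/ ${prefix}_fixmate.bam"
}

# Print one MarkDuplicates command that takes every lane of a sample as a
# separate I= input and writes a single duplicate-free BAM for the sample.
markdup_command() {
    local sample="$1"
    shift
    local inputs="" lane
    for lane in "$@"; do
        inputs="${inputs} I=${sample}_${lane}_sorted.bam"
    done
    printf '%s\n' "java -jar /path_to/picard.jar MarkDuplicates REMOVE_DUPLICATES=True MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=950 ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT${inputs} O=${sample}_rmdup.bam M=duplic_stats_${sample} TMP_DIR=/path_to/temp"
}

# Usage: two hypothetical lanes of one hypothetical sample.
for lane in L1 L2; do
    mapping_commands pop1 s01 "$lane"
done
markdup_command s01 L1 L2
```

Printing the commands rather than running them keeps the sketch free of tool dependencies; the output can be inspected and, once the names match the real files, piped to a shell or a job scheduler.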