Read QC |
Read quality control |
FastQC v0.11.5 |
fastqc /path_to/raw.fq.gz (Data Citation 1 and Data Citation 2) |
Clean Reads |
Adaptor and low quality trimming |
Trimmomatic v0.36 |
java -jar /path_to/trimmomatic-0.36.jar PE -phred33 -threads 8 raw_R1.fq.gz raw_R2.fq.gz clean_FP.fq.gz clean_FU.fq.gz clean_RP.fq.gz clean_RU.fq.gz HEADCROP:10 ILLUMINACLIP:/path_to/adapters_list.fa:2:30:10 TRAILING:10 SLIDINGWINDOW:4:10 MINLEN:25 |
De novo assembly |
Transcriptome assembly |
Trinity v2.2.0 |
Trinity --seqType fq --left clean_FP.fq.gz --right clean_RP.fq.gz --CPU 20 --max_memory 150G --SS_lib_type RF --output trinity_assembly |
Assembly curation |
Filtering out contigs with low read support |
Transrate v1.0.3 |
transrate --assembly Ltimidus_Trinity.fasta --left clean_FP.fq.gz --right clean_RP.fq.gz --threads 10 --reference Oryctolagus_cuniculus.OryCun2.0.81.pep.all.fa --output transrate_Ltimidus_Trinity |
Remove redundancy |
Clustering of highly homologous sequences |
CD-HIT-EST v4.6.4 |
cd-hit-est -i good.Ltimidus_Trinity.fasta -c 0.95 -o AlpsIrel.fasta |
ORF prediction |
Filtering based on candidate coding regions and Pfam annotation |
TransDecoder v3.0.0 |
TransDecoder.LongOrfs -t AlpsIrel.fasta |
HMMER v3.1b2 |
hmmscan --cpu 8 --domtblout pfam.domtblout /path_to/Pfam-A.hmm transdecoder_dir/longest_orfs.pep |
TransDecoder v3.0.0 |
TransDecoder.Predict -t AlpsIrel.fasta --cpu 2 --retain_pfam_hits pfam.domtblout |
Annotation |
Annotation assessment |
Trinotate v3.0.1 |
wget "https://data.broadinstitute.org/Trinity/Trinotate_v3_RESOURCES/Trinotate_v3.sqlite.gz" -O Trinotate.sqlite.gz |
Gunzip |
gunzip Trinotate.sqlite.gz |
Conditional reciprocal best BLAST annotation |
crb-blast v0.6.6 |
crb-blast --query AlpsIrel.cds --target database(SP and Ocun) --threads 4 --split 4 --output blastx.outfmt6 |
crb-blast v0.6.6 |
crb-blast --query AlpsIrel.pep --target database(SP and Ocun) --threads 4 --split 4 --output blastp.outfmt6 |
SignalP annotation |
SignalP v4.1 |
signalp -f short -n signalp.out AlpsIrel.pep |
Pfam annotation |
HMMER v3.1b2 |
hmmscan --cpu 2 --domtblout TrinotatePFAM.out Pfam-A.hmm AlpsIrel.pep |
TMHMM annotation |
TMHMM v2.0 |
tmhmm --short < AlpsIrel.pep > tmhmm.out |
Combine annotations |
Trinity utilities v2.2.0 |
/path_to/trinityrnaseq-2.2.0/util/support_scripts/get_Trinity_gene_to_trans_map.pl AlpsIrel.fasta >AlpsIrel.gene_trans_map |
Trinotate v3.0.1 |
Trinotate Trinotate.sqlite init --gene_trans_map AlpsIrel.gene_trans_map --transcript_fasta AlpsIrel.fasta --transdecoder_pep AlpsIrel.pep |
SwissProt annotation load |
Trinotate v3.0.1 |
1. Trinotate Trinotate.sqlite LOAD_swissprot_blastp SP.blastp.outfmt6; 2. Trinotate Trinotate.sqlite LOAD_swissprot_blastx SP.blastx.outfmt6 |
O.cuniculus annotation load |
Trinotate v3.0.1 |
1. Trinotate Trinotate.sqlite LOAD_custom_blast --outfmt6 Ocun.blastp.outfmt6 --prog blastp --dbtype Ocun; 2. Trinotate Trinotate.sqlite LOAD_custom_blast --outfmt6 Ocun.blastx.outfmt6 --prog blastx --dbtype Ocun |
Pfam annotation load |
Trinotate v3.0.1 |
Trinotate Trinotate.sqlite LOAD_pfam TrinotatePFAM.out |
TMHMM annotation load |
Trinotate v3.0.1 |
Trinotate Trinotate.sqlite LOAD_tmhmm tmhmm.out |
SignalP annotation load |
Trinotate v3.0.1 |
Trinotate Trinotate.sqlite LOAD_signalp signalp.out |
Joint annotation file |
Trinotate v3.0.1 |
Trinotate Trinotate.sqlite report > LtimidusTranscriptome.xls |
Mapping |
Read mapping onto the curated reference |
bwa-mem v0.7.15 |
bwa index AlpsIrel.cds |
bwa-mem v0.7.15 |
bwa mem -t 10 -R '@RG\tID:pop_sample_lane\tSM:popsample\tLB:LIBsample' AlpsIrel.cds Sample_L*_FP.fq.gz Sample_L*_RP.fq.gz > sample_lane.sam |
BAM conversion, sort and fixmate |
Fixmate and BAM conversion |
SAMtools v1.3.1 |
samtools fixmate --output-fmt BAM sample_lane.sam sample_lane_fixmate.bam |
BAM sort |
SAMtools v1.3.1 |
samtools sort -O bam -o sample_lane_sorted.bam -T /path_to/temp/ sample_lane_fixmate.bam |
Remove duplicates |
Mark and remove duplicates |
Picard v1.140 |
java -jar /path_to/picard.jar MarkDuplicates REMOVE_DUPLICATES=True MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=950 ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT I=sample_lane_sorted.bam I=sample_lane_sorted.bam I=sample_lane_sorted.bam O=sample_rmdup.bam M=duplic_stats_sample TMP_DIR=/path_to/temp |
Realignment and recalibration |
Realignment |
GATK v3.6-0 |
java -jar /path_to/GenomeAnalysisTK.jar -T RealignerTargetCreator -R AlpsIrel.cds -I sample_rmdup.bam -o sample_int.list |
Indel realignment |
GATK v3.6-0 |
java -jar /path_to/GenomeAnalysisTK.jar -T IndelRealigner -R AlpsIrel.cds -I sample_rmdup.bam -targetIntervals sample_int.list -o sample_realign.bam |
SNP call |
SNP call |
Reads2snp v2.0.64 |
reads2snp_2.0.64.bin -bamlist LtimLeur_list.txt -bamref AlpsIrel.cds -out LtimVsLeur -min 10 -nbth 12 -th1 0.95 -par 1 -th2 0.01 -opt bfgs -fis 0.0 -pre 0.001 -rqt 20 |
Differentiation analysis |
Remove indels and missing data |
VCFtools v0.1.14 |
vcftools --vcf LtimVsLeur.vcf --recode --recode-INFO-all --remove-indels --max-missing-count 0 --out LtimVsLeur_noindels |
Extract 1 SNP per contig |
VCFtools v0.1.14 |
vcftools --vcf LtimVsLeur_noindels.recode.vcf --recode --recode-INFO-all --thin 10000 --min-alleles 2 --out LtimVsLeur_1SNPperContig |
VCF to STRUCTURE conversion |
PGDSpider v2.1.1.0 |
java -Xmx1024m -Xms512m -jar /path_to/PGDSpider2-cli.jar -inputfile LtimVsLeur_1SNPperContig.recode.vcf -inputformat VCF -outputfile LtimVsLeur_SNPs -outputformat STRUCTURE -spid VCF_to_STRUCTURE.spid |
Structure analysis |
STRUCTURE v2.3.4 |
structure -m mainparams (standard parameters except 1 million steps after a burn-in period of 200 000, K=2 and admixture model) |
CLUMPAK v42089 |
The web version was used (http://clumpak.tau.ac.il/) |
PCA analysis |
PLINK v1.90b3.45 |
plink --file LtimVsLeur_1SNPperContig --pca 3 |
ggplot2 R package v2.2.1 |
1. R; 2. library(ggfortify); 3. pca <- read.table('plink.eigenvec', header=TRUE); 4. df <- pca[c(3, 4)]; 5. autoplot(prcomp(df), data=pca, colour='Species.Pop', size=5) |
GO enrichment |
Gene Ontology enrichment analysis |
g:Profiler |
Available at http://biit.cs.ut.ee/gprofiler/; hierarchical filtering set to "Best per parent group"; background gene set supplied manually; g:SCS significance threshold. |
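
The mapping rows above (bwa mem, samtools fixmate, samtools sort) pass files from one command to the next, so the file names must match exactly, including case. The following is a minimal dry-run sketch, not the authors' script: it only prints the three commands for one sample and lane so the names can be checked before anything is run. The function names read_group and mapping_commands, the population/sample/lane arguments, and the "sample_lane" file prefix are assumptions made for illustration; the tools, flags, and reference name are taken from the table.

```shell
# Dry-run sketch: prints the mapping commands for one sample/lane
# instead of executing them. Nothing here calls bwa or samtools.

REF="AlpsIrel.cds"   # curated reference named in the table
THREADS=10           # thread count used in the bwa mem row

# Build the read-group string in the layout shown in the table.
# The tabs are emitted as a literal backslash followed by "t",
# which is the form passed to bwa mem -R in the table.
# pop, sample and lane are hypothetical placeholder arguments.
read_group() {
    local pop="$1" sample="$2" lane="$3"
    printf '@RG\\tID:%s_%s_%s\\tSM:%s%s\\tLB:LIB%s' \
        "$pop" "$sample" "$lane" "$pop" "$sample" "$sample"
}

# Print (do not run) the mapping, fixmate and sort commands for one lane.
# The "<sample>_<lane>" prefix is an assumed naming scheme.
mapping_commands() {
    local pop="$1" sample="$2" lane="$3"
    local prefix="${sample}_${lane}"
    local rg
    rg="$(read_group "$pop" "$sample" "$lane")"
    printf '%s\n' "bwa mem -t ${THREADS} -R '${rg}' ${REF} ${prefix}_FP.fq.gz ${prefix}_RP.fq.gz > ${prefix}.sam"
    printf '%s\n' "samtools fixmate --output-fmt BAM ${prefix}.sam ${prefix}_fixmate.bam"
    printf '%s\n' "samtools sort -O bam -o ${prefix}_sorted.bam -T /path_to/temp/ ${prefix}_fixmate.bam"
}

# Example call with made-up identifiers.
mapping_commands Alps S01 L1
```

Because every file name is derived from a single prefix, the output of each printed command is the input of the next one. The printed lines can be reviewed and then piped to a shell once the paths are confirmed.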