Table 1.
After filtering the raw data and checking the sequencing error rate and GC content distribution, we obtained clean reads for subsequent analysis. The data are summarized in the table below.
| Sample | Library | Raw_reads | Clean_reads | Clean_bases | Error_rate | Q20 | Q30 | GC_pct |
|---|---|---|---|---|---|---|---|---|
| tVaS1 | FRAS202156226-1r | 43614846 | 42016796 | 6.3G | 0.03 | 97.94 | 93.92 | 37.1 |
| tVaS2 | FRAS202156227-1r | 45933166 | 44134864 | 6.62G | 0.03 | 97.58 | 93.11 | 37.7 |
| tVaS3 | FRAS202156228-1r | 45061154 | 43428666 | 6.51G | 0.02 | 98.07 | 94.26 | 37.71 |
| tVaR1 | FRAS202156223-1r | 43504158 | 42181696 | 6.33G | 0.03 | 97.97 | 93.96 | 35.83 |
| tVaR2 | FRAS202156224-1r | 44238968 | 42270438 | 6.34G | 0.03 | 97.72 | 93.37 | 37.54 |
| tVaR3 | FRAS202156225-1r | 46739216 | 43630738 | 6.54G | 0.03 | 97.79 | 93.57 | 37.48 |
| tCon1 | FRAS202156229-1r | 44446884 | 42744414 | 6.41G | 0.02 | 98.04 | 94.16 | 38.45 |
| tCon2 | FRAS202156230-1r | 45099750 | 43562032 | 6.53G | 0.03 | 97.95 | 94 | 37.95 |
| tCon3 | FRAS202156231-1r | 47643512 | 45841434 | 6.88G | 0.03 | 98.01 | 94.12 | 38.14 |
Sample: Sample name.
Library: Library number.
Raw_reads: The number of reads in the raw data.
Clean_reads: The number of reads remaining after filtering the raw data.
Clean_bases: The number of bases remaining after filtering the raw data (Clean_bases = Clean_reads × 150 bp; G = 10^9 bases).
Error_rate: The overall sequencing error rate of the data (%).
Q20: The percentage of bases with a Phred quality score of at least 20 (base-call accuracy of 99% or better) among all bases.
Q30: The percentage of bases with a Phred quality score of at least 30 (base-call accuracy of 99.9% or better) among all bases.
GC_pct: The percentage of G and C bases among all bases in the clean reads.
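To make the column definitions concrete, the sketch below computes Q20, Q30, GC_pct and an error-rate estimate from FASTQ-style (sequence, quality) pairs, and reproduces the Clean_bases formula. It rests on several assumptions not stated in this report: qualities use Phred+33 encoding, Q20/Q30 count bases with a score of at least 20/30, and Error_rate is estimated as the mean per-base error probability derived from the Phred definition Q = -10·log10(P). The function and class names are hypothetical and do not come from the pipeline used to produce the table.

```python
"""Hypothetical sketch of per-sample QC metrics. Assumes Phred+33 qualities
and a mean-Phred-error-probability estimate of Error_rate."""

from dataclasses import dataclass

READ_LENGTH_BP = 150  # read length used in the Clean_bases formula


@dataclass
class QCSummary:
    q20_pct: float
    q30_pct: float
    gc_pct: float
    error_rate_pct: float


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def summarize_reads(reads: list[tuple[str, str]]) -> QCSummary:
    """Compute QC percentages over all bases of (sequence, quality) pairs."""
    total = q20 = q30 = gc = 0
    error_sum = 0.0
    for seq, qual in reads:
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        for base, q in zip(seq.upper(), phred_scores(qual)):
            total += 1
            q20 += int(q >= 20)        # base call accuracy >= 99%
            q30 += int(q >= 30)        # base call accuracy >= 99.9%
            gc += int(base in "GC")
            # Phred definition: Q = -10 * log10(P_error)
            error_sum += 10 ** (-q / 10)
    if total == 0:
        raise ValueError("no bases to summarize")
    return QCSummary(
        q20_pct=100 * q20 / total,
        q30_pct=100 * q30 / total,
        gc_pct=100 * gc / total,
        error_rate_pct=100 * error_sum / total,
    )


def clean_bases(clean_reads: int, read_length: int = READ_LENGTH_BP) -> int:
    """Clean_bases = Clean_reads x read length (150 bp in this report)."""
    return clean_reads * read_length
```

For example, `clean_bases(42016796)` returns 6302519400, i.e. the 6.3G reported for tVaS1. A uniform Q40 read (quality character `I` under Phred+33) has an estimated error probability of 0.0001 per base, which this sketch reports as an error rate of 0.01%.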