2017 Oct 11;7:12961. doi: 10.1038/s41598-017-13210-9

Table 1.

Characteristics of training and test data sets used in the cross-validation.

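| Set | Sequence length | Status | Sequences | Hexapeptides |
|---|---|---|---|---|
| Training | 6 | Non-amyloid | 841 | 841 |
| Training | 6 | Amyloid | 247 | 247 |
| Training | 6–10 | Non-amyloid | 964 | 1412 |
| Training | 6–10 | Amyloid | 312 | 475 |
| Training | 6–15 | Non-amyloid | 992 | 1653 |
| Training | 6–15 | Amyloid | 342 | 720 |
| Test | 6 | Non-amyloid | 841 | 841 |
| Test | 6 | Amyloid | 247 | 247 |
| Test | 7–10 | Non-amyloid | 123 | 571 |
| Test | 7–10 | Amyloid | 65 | 228 |
| Test | 11–15 | Non-amyloid | 28 | 241 |
| Test | 11–15 | Amyloid | 30 | 245 |
| Test | 16–25 | Non-amyloid | 41 | 571 |
| Test | 16–25 | Amyloid | 55 | 778 |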

We derived sequences of different lengths from the AmyLoad database (column ‘Sequences’) and extracted from them all possible overlapping hexapeptides (column ‘Hexapeptides’). The training data sets partially overlap (e.g. the set 6–10 also contains the sequences from the set 6). The test data sets never overlap with one another.
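
The hexapeptide extraction described above can be sketched as a sliding window of width 6 moved one residue at a time, so a sequence of length n yields n − 5 hexapeptides. This is a minimal illustration and not the authors' code: the function name `extract_hexapeptides`, the parameter `k`, and the example peptide are assumptions made here, and the sketch does not remove duplicate hexapeptides.

```python
from typing import List


def extract_hexapeptides(sequence: str, k: int = 6) -> List[str]:
    """Return every overlapping window of length k, in sequence order.

    Hypothetical helper for illustration; sequences shorter than k
    yield an empty list.
    """
    if len(sequence) < k:
        return []
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical 9-residue peptide (not taken from AmyLoad).
example = "ACDEFGHIK"
print(extract_hexapeptides(example))
# ['ACDEFG', 'CDEFGH', 'DEFGHI', 'EFGHIK']
```

A sequence of exactly six residues yields a single hexapeptide, which agrees with the rows for sequence length 6, where the ‘Sequences’ and ‘Hexapeptides’ counts are equal.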