Data: N training enzymes, grid size l, homothetic transformation ratio λ, p interpolations, probability pflip to flip with respect to each axis.
Input: Raw coordinates contained in PDB files
Output: Volumes of binary voxels representing backbone atom occupancy
foreach of the N enzymes of the training set do
    Step 1: structural information extraction
        Extract coordinates of backbone atoms from its PDB file
    Step 2: hole completion
        Interpolate consecutive backbone atoms by p new points
    Step 3: size adjustment
        Center barycenter S of the coordinates on (0, 0, 0)
        Homothetic transformation of each point with center S and ratio λ
    Step 4: enzyme orientation
        Principal component analysis (PCA) transformation
    Step 5: random augmentation
        if True with probability pflip then
            Flip coordinates with respect to the origin along x-axis
        if True with probability pflip then
            Flip coordinates with respect to the origin along y-axis
        if True with probability pflip then
            Flip coordinates with respect to the origin along z-axis
    Step 6: voxelization
        Center barycenter S of the coordinates on
        Transform coordinate points into binary voxels
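
The six steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: backbone atoms are taken to be those named N, CA and C, interpolation is assumed to be linear, a flip along an axis is read as negating that coordinate, and the voxelization step is assumed to place the barycenter at the middle of the l × l × l grid (the target point is not given above). All function names are hypothetical.

```python
import numpy as np

BACKBONE = {"N", "CA", "C"}  # assumption: which atom names count as backbone


def extract_backbone(pdb_lines):
    """Step 1: read x, y, z of backbone atoms from fixed-width PDB ATOM records."""
    coords = [
        [float(line[30:38]), float(line[38:46]), float(line[46:54])]
        for line in pdb_lines
        if line.startswith("ATOM") and line[12:16].strip() in BACKBONE
    ]
    return np.array(coords, dtype=float)


def interpolate(coords, p):
    """Step 2: insert p evenly spaced points between consecutive atoms (linear)."""
    t = np.arange(p + 1) / (p + 1)  # 0, 1/(p+1), ..., p/(p+1)
    step = coords[1:] - coords[:-1]
    seg = coords[:-1, None, :] + t[None, :, None] * step[:, None, :]
    return np.vstack([seg.reshape(-1, 3), coords[-1:]])


def center_and_scale(coords, lam):
    """Step 3: move barycenter S to the origin, then apply a homothety of ratio lam."""
    return lam * (coords - coords.mean(axis=0))


def pca_orient(coords):
    """Step 4: rotate onto principal axes, largest variance first."""
    centered = coords - coords.mean(axis=0)
    _, vecs = np.linalg.eigh(centered.T @ centered)  # eigenvalues ascending
    return centered @ vecs[:, ::-1]


def random_flips(coords, p_flip, rng):
    """Step 5: negate each axis independently with probability p_flip."""
    out = coords.copy()
    for axis in range(3):
        if rng.random() < p_flip:
            out[:, axis] *= -1
    return out


def voxelize(coords, l):
    """Step 6: center on the grid middle (assumption) and mark occupied voxels."""
    shifted = coords - coords.mean(axis=0) + l // 2
    idx = np.floor(shifted).astype(int)
    inside = np.all((idx >= 0) & (idx < l), axis=1)  # drop points outside the grid
    grid = np.zeros((l, l, l), dtype=np.uint8)
    kept = idx[inside]
    grid[kept[:, 0], kept[:, 1], kept[:, 2]] = 1
    return grid


def preprocess(pdb_lines, l, lam, p, p_flip, rng):
    """Run steps 1 to 6 for one enzyme and return its binary voxel volume."""
    coords = extract_backbone(pdb_lines)
    coords = interpolate(coords, p)
    coords = center_and_scale(coords, lam)
    coords = pca_orient(coords)
    coords = random_flips(coords, p_flip, rng)
    return voxelize(coords, l)
```

For a training set, `preprocess` would be called once per enzyme with a shared `np.random.default_rng()` generator, so that each call draws independent flips. Points that fall outside the grid after scaling are silently dropped in this sketch; whether λ is chosen so that this never happens is not specified above.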