2011 Jan 13;2011:baq037. doi: 10.1093/database/baq037

Table 3.

Data cleaning: selected classes of errors, with examples, found in BIND

Error type | Examples
No unified representation for missing information of type character/String | Missing information may be represented as 'Unknown', 'NULL', 'unknown', 'WP:NULL', 'unknown.', '–', etc. (in addition, the enclosing XML element is sometimes omitted altogether)
No unified representation for missing information of type integer | Missing information may be represented as '0', '-1', etc.
Erroneous representation of references to external databases (x-refs) for some interactors |
  <BIND-other-db>
    <BIND-other-db_dbname>LocusLink</BIND-other-db_dbname>
    <BIND-other-db_intp>0</BIND-other-db_intp>
    <BIND-other-db_strp>0</BIND-other-db_strp>
  </BIND-other-db>
  …
  <BIND-other-db>
    <BIND-other-db_dbname>SGD</BIND-other-db_dbname>
    <BIND-other-db_intp>0</BIND-other-db_intp>
    <BIND-other-db_strp/>
  </BIND-other-db>
Erroneous internal cross-reference: complexes referencing non-existent (negative) BIND interaction IDs | <BIND-mol-object-source_a>
Erroneous external cross-reference: negative PubMed identifier | PubMed ID '-2' is repeated 68 times in the S. cerevisiae file
Inconsistent pattern for representing the IDs of some interactor x-refs | SGD identifiers appear both as 'SGD: S000003663' and as 'S000003663'; MGD identifiers appear both as 'MGI:1890695' and as '1890695'.
Wrong x-ref type: some IDs are listed as RefSeq identifiers when they are in fact GIs | GIs '15643805' and '15644490' are listed as RefSeq IDs.
Outdated external cross-references | 13 070 interactor GIs used in BIND are no longer in use in Entrez.
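
Several of the error classes in Table 3 can be caught with small, mechanical rules. The Python sketch below is illustrative only and is not the cleaning procedure used by the authors. The function names, the set of "missing" spellings (taken from the examples in the table), the choice of canonical ID pattern (bare SGD IDs, prefixed MGI IDs) and the reading of an x-ref with a zero or empty ID as unusable are all assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed "missing" spellings, copied from the examples in Table 3; extend as needed.
MISSING_STRINGS = {"", "unknown", "unknown.", "null", "wp:null", "-", "–"}


def clean_string(value: Optional[str]) -> Optional[str]:
    """Map every known 'missing' spelling (or an absent element) to None."""
    if value is None:
        return None
    stripped = value.strip()
    return None if stripped.lower() in MISSING_STRINGS else stripped


def clean_positive_int(value: Optional[str]) -> Optional[int]:
    """Return a positive integer ID, or None for '0', negatives and non-numbers."""
    if value is None:
        return None
    try:
        # Tolerate a typographic en dash used as a minus sign.
        number = int(value.strip().replace("–", "-"))
    except ValueError:
        return None
    return number if number > 0 else None


def normalize_xref_id(db: str, raw: str) -> str:
    """Bring SGD and MGI identifiers to one pattern (the target pattern is an assumption)."""
    raw = raw.strip()
    if db.upper() == "SGD":
        return re.sub(r"^SGD:\s*", "", raw)  # 'SGD: S000003663' -> 'S000003663'
    if db.upper() in {"MGD", "MGI"}:
        return raw if raw.startswith("MGI:") else f"MGI:{raw}"
    return raw


def looks_like_gi(raw: str) -> bool:
    """A purely numeric ID is not a RefSeq accession, which carries a prefix such as 'NP_'."""
    return raw.strip().isdigit()


def xref_is_empty(other_db: ET.Element) -> bool:
    """True when a <BIND-other-db> element names a database but carries no usable ID."""
    int_id = clean_positive_int(other_db.findtext("BIND-other-db_intp"))
    str_id = clean_string(other_db.findtext("BIND-other-db_strp"))
    return int_id is None and (str_id is None or str_id == "0")


# Usage: the SGD x-ref shown in Table 3 has no usable identifier.
snippet = """<BIND-other-db>
<BIND-other-db_dbname>SGD</BIND-other-db_dbname>
<BIND-other-db_intp>0</BIND-other-db_intp>
<BIND-other-db_strp/>
</BIND-other-db>"""
print(xref_is_empty(ET.fromstring(snippet)))  # True
```

Mapping every variant to a single `None` value keeps downstream checks uniform: a negative PubMed ID, a '0' integer x-ref and an 'unknown.' string all end up in the same "missing" state rather than being mistaken for real values. Detecting outdated GIs is not covered here, since it requires a lookup against Entrez.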