2013 Apr 15;4(Suppl 1):S3. doi: 10.1186/2041-1480-4-S1-S3

Table 4.

Training corpora for patent categorization. Compared with C1205, C73 has more patents per class, requires longer texts, and is restricted to the primary classification.

| Training corpus | Patents per class | Minimum text length (characters) | Restricted to primary classification | Number of classes | Total number of patents |
|---|---|---|---|---|---|
| C73 | 200 | 8000 | yes | 73 | 14600 |
| C1205 | 100 | 2000 | no | 1205 | 120500 |
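The totals in Table 4 are consistent with each class contributing exactly the stated number of patents (patents per class times number of classes). The short check below is only a sketch of that arithmetic: the numbers are copied from the table, while the dictionary layout and the `expected_total` helper are illustrative assumptions, not part of the original work.

```python
# Values copied from Table 4; the data layout is an illustrative assumption.
corpora = {
    "C73": {"patents_per_class": 200, "classes": 73, "total": 14600},
    "C1205": {"patents_per_class": 100, "classes": 1205, "total": 120500},
}


def expected_total(corpus):
    """Total patents if every class holds exactly `patents_per_class` patents."""
    return corpus["patents_per_class"] * corpus["classes"]


for name, corpus in corpora.items():
    # Compare the product against the total reported in the table.
    matches = expected_total(corpus) == corpus["total"]
    print(f"{name}: expected {expected_total(corpus)}, reported {corpus['total']}, match={matches}")
```

Both products match the reported totals, which suggests that the two corpora are balanced across classes.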