Appendix Table A3:
| Stereotype | Complete Wikipedia: Word | Complete Wikipedia: CS Score | Spider-crawl Wikipedia: Word | Spider-crawl Wikipedia: CS Score | Strict string match: Word |
|---|---|---|---|---|---|
| Attractive | unattractive | 0.7458 | desirable | 0.6811 | attractive |
| | appealing | 0.6966 | advantageous | 0.6603 | |
| | desirable | 0.6878 | unattractive | 0.6250 | |
| | unappealing | 0.6699 | appealing | 0.6150 | |
| | alluring | 0.6612 | hospitable | 0.5845 | |
| | enticing | 0.6471 | palatable | 0.5710 | |
| | ideal | 0.6290 | favorable | 0.5578 | |
| | advantageous | 0.6268 | apt | 0.5564 | |
| | agreeable | 0.6207 | adaptable | 0.5564 | |
| | interesting | 0.6175 | comfortable | 0.5479 | |
| Hearing | hearings | 0.5888 | judgment | 0.4814 | hearing |
| | testifying | 0.5629 | deaf | 0.4631 | |
| | testimony | 0.5403 | hearings | 0.4614 | |
| | arraignment | 0.5272 | tinnitus | 0.4585 | |
| | pleading | 0.5148 | earplugs | 0.4549 | |
| | sentencing | 0.5078 | testifying | 0.4473 | |
| | testified | 0.5051 | cochlear | 0.4447 | |
| | committal | 0.4971 | proceedings | 0.4426 | |
| | questioning | 0.4929 | auditory | 0.4397 | |
| | complaint | 0.4853 | trial | 0.4374 | |
| Memory | memories | 0.6628 | memories | 0.7152 | memory |
| | brain | 0.5274 | amnesia | 0.5725 | |
| | recollection | 0.5185 | hippocampus | 0.5631 | |
| | cpu | 0.5138 | cognition | 0.5625 | |
| | remembering | 0.5051 | cognitive | 0.5617 | |
| | eidetic | 0.4975 | retrieval | 0.5589 | |
| | scratchpad | 0.4933 | recollection | 0.5514 | |
| | cache | 0.4873 | episodic | 0.5478 | |
| | rom | 0.4862 | perceptual | 0.5418 | |
| | consciousness | 0.4770 | reconsolidation | 0.5093 | |
| Physically able | unable | 0.6887 | unable | 0.6910 | physically able |
| | enough | 0.6050 | willing | 0.5993 | |
| | willing | 0.5912 | enough | 0.5800 | |
| | trying | 0.5904 | psychologically | 0.5680 | |
| | ability | 0.5865 | trying | 0.5513 | |
| | needed | 0.5696 | unwilling | 0.5365 | |
| | attempting | 0.5556 | expected | 0.5363 | |
| | allowed | 0.5549 | anxious | 0.5294 | |
| | psychologically | 0.5535 | attempting | 0.5281 | |
| | unwilling | 0.5505 | eager | 0.5271 | |
| Number of articles | approx. 5,500,000 | | 65,532 | | |
| Number of words | 885,424 | | 260,073 | | |
Note: The table presents examples of results from training models on different corpora. Complete Wikipedia includes all articles and has 885,424 words. Spider-crawl (scrapy spider) Wikipedia starts with articles referring to stereotypes, ageism, and labor markets, as explained in the text. Both models use 200-dimensional vectors. The “Word” column lists the words with the highest similarity scores; the scores are reported in the “CS Score” column.
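
As an illustration of how a ranked list like those in the table could be produced, the sketch below ranks vocabulary words by their similarity to a query word. It assumes that "CS Score" denotes cosine similarity. The function name `most_similar` and the toy 3-dimensional vectors are hypothetical and are not taken from the study, whose models use 200-dimensional vectors.

```python
import numpy as np


def most_similar(query, vectors, top_n=10):
    """Rank every other word by cosine similarity to `query`."""
    q = vectors[query]
    q = q / np.linalg.norm(q)  # normalize the query vector once
    scores = []
    for word, vec in vectors.items():
        if word == query:
            continue  # skip the query word itself
        # Cosine similarity: dot product of the two unit-length vectors.
        scores.append((word, float(q @ vec / np.linalg.norm(vec))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical toy embeddings, for illustration only.
toy_vectors = {
    "memory": np.array([1.0, 0.0, 0.0]),
    "memories": np.array([0.9, 0.1, 0.0]),
    "cache": np.array([0.5, 0.5, 0.0]),
    "trial": np.array([0.0, 0.0, 1.0]),
}

ranked = most_similar("memory", toy_vectors, top_n=3)
for word, score in ranked:
    print(f"{word}\t{score:.4f}")
```

With trained embeddings in place of the toy dictionary, calling the function with `top_n=10` would yield a ten-word list per stereotype term, matching the layout of the table.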