2024 Aug 27;11:928. doi: 10.1038/s41597-024-03611-7

Table 4.

Mapping between the Silent Cities tags and the labels from the AudioSet ontology.

Final tag name | Corresponding labels in AudioSet Ontology | Category
Wind | Wind | Geophony
Rain | Rain | Geophony
River | Stream/Waterfall | Geophony
Wave | Ocean | Geophony
Thunder | Thunderstorm | Geophony
Bird | Bird vocalization, bird call, bird song/Pigeon, dove/Crow/Owl/Gull, seagull | Biophony
Amphibian | Frog | Biophony
Insect | Insect | Biophony
Mammal | Rodents, rats, mice/Canidae, dogs, wolves | Biophony
Reptile | Snake | Biophony
Walking | Run/Walk, footsteps | Anthropophony
Cycling | Bicycle/Bicycle bell | Anthropophony
Beep | Reversing beeps | Anthropophony
Car | Car passing by/Tire squeal | Anthropophony
Car honk | Vehicle horn, car horn, honking | Anthropophony
Motorbike | Motorcycle | Anthropophony
Plane | Aircraft engine/Fixed-wing aircraft, airplane | Anthropophony
Helicopter | Helicopter | Anthropophony
Boat | Motorboat, speedboat/Ship/Sailboat, sailing ship | Anthropophony
Other motors | Traffic noise, roadway noise | Anthropophony
Shoot | Gunshot, gunfire | Anthropophony
Bell | Chime/Jingle bell/Cowbell/Church bell/Change ringing (campanology) | Anthropophony
Talking | Speech/Hubbub, speech noise, speech babble | Anthropophony
Music | Music | Anthropophony
Dog bark | Dog | Anthropophony
Rolling shutter | Power windows, electric windows | Anthropophony
Kitchen sounds | Door/Cupboard open or close/Drawer open or close/Dishes, pots, and pans/Cutlery, silverware/Chopping (food)/Sink (filling or washing)/Water tap, faucet/Kettle whistle/Microwave oven/Blender | Anthropophony

Each tag is computed as the maximum probability output by the pretrained network among the corresponding AudioSet labels. Finally, the three category tags Anthropophony, Geophony and Biophony are each computed as the maximum tag probability within that category.
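The two-stage maximum described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the names `TAG_MAP`, `tag_probabilities` and `category_probabilities` are hypothetical, the mapping is a small subset of Table 4, the input probabilities are made up, and treating a label that is absent from the network output as probability 0.0 is an assumption.

```python
# Hypothetical subset of the Table 4 mapping: tag -> (category, AudioSet labels).
TAG_MAP = {
    "Wind": ("Geophony", ["Wind"]),
    "Rain": ("Geophony", ["Rain"]),
    "Bird": ("Biophony", [
        "Bird vocalization, bird call, bird song",
        "Pigeon, dove", "Crow", "Owl", "Gull, seagull",
    ]),
    "Insect": ("Biophony", ["Insect"]),
    "Car": ("Anthropophony", ["Car passing by", "Tire squeal"]),
    "Music": ("Anthropophony", ["Music"]),
}


def tag_probabilities(label_probs, tag_map=TAG_MAP):
    """Tag probability = max probability among its mapped AudioSet labels."""
    tags = {}
    for tag, (_category, labels) in tag_map.items():
        # Assumption: a label missing from the network output counts as 0.0.
        tags[tag] = max(label_probs.get(label, 0.0) for label in labels)
    return tags


def category_probabilities(tag_probs, tag_map=TAG_MAP):
    """Category probability = max tag probability within the category."""
    categories = {}
    for tag, (category, _labels) in tag_map.items():
        categories[category] = max(categories.get(category, 0.0), tag_probs[tag])
    return categories


# Made-up per-label probabilities for a single audio segment.
label_probs = {
    "Wind": 0.10, "Rain": 0.40,
    "Crow": 0.70, "Owl": 0.20,
    "Car passing by": 0.05, "Music": 0.30,
}
tags = tag_probabilities(label_probs)
cats = category_probabilities(tags)
print(tags)
print(cats)
```

With these illustrative inputs, the Bird tag takes the Crow probability because it is the largest among the mapped bird labels, and the Biophony category in turn takes the Bird probability.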