The Musilanguage 2.0 model. Three evolutionary stages are shown. Stage 1 (left) is a stage of group emotional vocalizations based on innate calls and driven by the mechanisms of affective prosody. Stage 2 (middle) is the musilanguage stage of intonational prosody. It possesses the various features outlined in Table 1, including vocal learning and the levels-and-contours pitch system. Two of its key features are highlighted here: first, a system of phonemic combinatoriality (orange signifies combinatorial); and second, a system of holistic intonational formulas (dark red signifies prosodic and holistic). Stage 3 (right) is the bifurcation into music and language as separate, though homologous, functions. The road to music involves digitizing the pitch properties of the levels-and-contours precursor, yielding a tonality based on scale structure. This is accompanied by a system of emotional-valence coding that I call "scale/emotion" associations. Music's performance arrangement is integrated, owing to evolutionary changes that permit entrainment to metric rhythms. A domain-specific combinatorial feature of music is pitch combinatoriality. The road to language, in turn, involves the capacity to generate words through acoustic symbols. The externalization of language through speech is proposed to retain the levels-and-contours (LandC) system used by the musilanguage precursor. I propose that speech evolved as a lexical-tonal system from its inception, one that operated on combinatorial principles. Language develops a performance arrangement based on alternation. Shown in orange is a "combinatorial triad" of phonemic combinatoriality (shared between music and language), pitch combinatoriality (specific to music), and tone combinatoriality (specific to language).