2023 May 29;15:55. doi: 10.1186/s13321-023-00725-9

Table 3.

Degenerate repetition observed in various tasks from state-of-the-art large-scale language models

| Task [model] | Context | [Repetition pattern] output |
| --- | --- | --- |
| Sentence completion [GPT-2] ref. [10] | Prefix: Lyrically the song has excerpts of different languages including French, Spanish, | [Single-word repetition] German, Italian, Portuguese, Spanish, Portuguese, Portuguese, Portuguese, Portuguese, Portuguese, Portuguese, Portuguese, Portuguese, Portuguese, Portuguese, Portuguese, . |
| Summarization [Transformer] ref. [52] | Article | [Phrase-level repetition] ...cristiano ronaldo’s best footballer whereas barcelona talisman lionel messi’s best footballer whereas barcelona’s best footballer whereas barcelona talisman lionel lee whereas’s best footballer’s. (Truncated) |
| Generation from an initial tag line [GPT-2] ref. [9] | Tag: So what’s new in my life? 09/11/18 - Just got back from vacation | [Sentence-level repetition] Just got back from vacation. 09/10/18 - Just got back from vacation. Just got back from vacation. 09/09/18 - Just got back from vacation. Just got back from vacation. 09/08/18 - Just got back from vacation. Just got back from vacation. |
| Product review generation [GPT-2] ref. [53] | Initial context | [Structural repetition] Great movie, although took a while to see at first it held my interest and kept me interested, plus i thought it was extremely good. also it was very good. |
| Protein sequence generation [ProtGPT-2] ref. [54] | No context | [Subsequential repetition] MSNDTPTHDPTPPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPEPAPAPAPE. |
| Molecule captioning [Transformer] ref. [55] | SMILES: CC[N+](CC)=C1C=CC2=NC3=C(OC2=C1)C=C(N)C(C)=C3 | [Single-word repetition] the molecule is a deuterated compound that is is is is is an isotopologue of chloroform in which the four hydrogen atoms have been replaced by deuterium. (Truncated) |

The examples contain single-word repetitions, phrase-level repetitions, sentence-level repetitions, structural repetitions where tokens may vary within a repeating phrase, and subsequential repetitions. The first repeated unit in each example is emphasized in bold.
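
As a rough illustration of how the exact, back-to-back repetitions in the table could be flagged automatically, the following sketch scans a token sequence for the unit whose consecutive repeats cover the most tokens. It is not taken from the article: the function name `longest_consecutive_repeat` and the `max_unit` parameter are hypothetical, and the method only catches verbatim consecutive repeats (single-word and subsequential cases), not structural repetition in which tokens vary inside the repeating phrase.

```python
def longest_consecutive_repeat(tokens, max_unit=10):
    """Find the unit (up to max_unit tokens long) whose back-to-back
    repeats cover the most tokens.

    `tokens` may be a list of words or a plain string (treated as a
    sequence of characters). Returns (unit, count), where `unit` is a
    tuple of tokens and `count` is how many times it occurs in a row.
    Returns ((), 0) if nothing repeats consecutively.
    Hypothetical helper for illustration only.
    """
    best_unit, best_count = (), 0
    n = len(tokens)
    for size in range(1, max_unit + 1):
        # A repeat needs at least two full units, hence the 2 * size bound.
        for start in range(n - 2 * size + 1):
            unit = tuple(tokens[start:start + size])
            count = 1
            pos = start + size
            # Count how many times the unit recurs immediately afterwards.
            while tuple(tokens[pos:pos + size]) == unit:
                count += 1
                pos += size
            # Keep the candidate covering strictly more tokens, so shorter
            # units and earlier positions win ties.
            if count > 1 and count * size > best_count * len(best_unit):
                best_unit, best_count = unit, count
    return best_unit, best_count


# Word-level check on the molecule-captioning output from the table.
caption = "the molecule is a deuterated compound that is is is is is an isotopologue"
print(longest_consecutive_repeat(caption.split()))

# Character-level check on a shortened stand-in for the protein example.
protein = "MSNDTPTHDPTP" + "PAPAPAPE" * 4
unit, count = longest_consecutive_repeat(protein)
print("".join(unit), count)
```

Scoring candidates by covered length rather than raw repeat count is a design choice made here so that a long repeated motif such as `PAPAPAPE` is preferred over the short `PA` repeats nested inside it.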