Basic statistics for single-document summarization (SDS) and multi-document summarization (MDS) datasets. For MDS, # Source words is aggregated across documents. Compression ratio is the average ratio of source words to summary words. The extractiveness metrics (coverage and density) come from Grusky et al. (2018) and, for consistency, are calculated with the official code on the validation set of each dataset. spaCy tokenization is performed before extracting fragments. Other corpus statistics are taken from either the corresponding paper or Table 1 in Sharma et al. (2019). Entries are marked N/A where the dataset is private (Krishna et al., 2020) or too expensive to generate (Liu et al., 2018a). The Gigaword SDS dataset comes from the annotated Gigaword dataset (Graff et al., 2003; Napoles et al., 2012).
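As a rough illustration of the coverage, density, and compression statistics described in this caption, the sketch below follows the greedy extractive-fragment procedure of Grusky et al. (2018). It is not the official code used for the table: the function names `extractive_fragments` and `extractiveness` are hypothetical, and plain whitespace splitting stands in for spaCy tokenization.

```python
from typing import List, Tuple


def extractive_fragments(article: List[str], summary: List[str]) -> List[List[str]]:
    """Greedily collect the longest token spans shared by summary and article.

    Illustrative re-implementation of the fragment procedure described in
    Grusky et al. (2018); not the official code.
    """
    fragments = []
    i = 0
    while i < len(summary):
        best = 0  # length of the longest match starting at summary[i]
        j = 0
        while j < len(article):
            if summary[i] == article[j]:
                # Extend the match as far as both sequences agree.
                k = 0
                while (i + k < len(summary) and j + k < len(article)
                       and summary[i + k] == article[j + k]):
                    k += 1
                best = max(best, k)
                j += k
            else:
                j += 1
        if best > 0:
            fragments.append(summary[i:i + best])
        # Skip past the matched span, or a single unmatched token.
        i += max(best, 1)
    return fragments


def extractiveness(article: List[str], summary: List[str]) -> Tuple[float, float, float]:
    """Return (coverage, density, compression) for one article-summary pair."""
    if not summary:
        raise ValueError("summary must contain at least one token")
    frags = extractive_fragments(article, summary)
    n = len(summary)
    coverage = sum(len(f) for f in frags) / n       # share of summary tokens copied
    density = sum(len(f) ** 2 for f in frags) / n   # average squared fragment length
    compression = len(article) / n                  # source words per summary word
    return coverage, density, compression


# Assumption: whitespace tokenization here; the caption specifies spaCy.
article_tokens = "the cat sat on the mat today".split()
summary_tokens = "the cat sat quietly".split()
print(extractiveness(article_tokens, summary_tokens))
```

For an MDS example, the source documents would first be concatenated into a single token list, matching the caption's note that source words are aggregated across documents. Dataset-level figures would then be averages of these per-example values over the validation set.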