Table 2.
Numerical breakdown of the events used to construct the given number of synthetic SM samples (in the SR) for the four machine learning methods considered in this report. The CURTAINs method uses a slightly narrower SB region of [2.7, 4.5] TeV to avoid transforming events across the turn-on region border. The SALAD samples are generated by applying the learned weights to an additional, much larger set of Herwig++ simulated SM events not contained in the LHC Olympics dataset. Note that CATHODE and CURTAINs are data-exclusive (i.e. fully data-driven), using only the “detected” (DAT) Pythia set, while SALAD and FETA require an auxiliary “simulated” (SIM) Herwig++ set.
| Method | Training data | Validation data | # Samples | Oversampling |
|---|---|---|---|---|
| SALAD | 793k SIM, 696k DAT | 198k SIM, 174k DAT | 1,045k | N/A |
| CATHODE | 696k DAT | 174k DAT | 400k | 3 |
| CURTAINs | 373k DAT | 93k DAT | 1,887k | 4 |
| FETA | 793k SIM, 696k DAT | 198k SIM, 174k DAT | 732k | 6 |