The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics

Ann M Mc Cartney; Giulio Formenti; Alice Mouton; Diego De Panis; Luísa S Marins; Henrique G Leitão; Genevieve Diedericks; Joseph Kirangwa; Marco Morselli; Judit Salces-Ortiz; Nuria Escudero; Alessio Iannucci; Chiara Natali; Hannes Svardal; Rosa Fernández; Tim De Pooter; Geert Joris; Mojca Strazisar; Jonathan M D Wood; Katie E Herron; Ole Seehausen; Phillip C Watts; Felix Shaw; Robert P Davey; Alice Minotto; José M Fernández; Astrid Böhne; Carla Alegria; Tyler Alioto; Paulo C Alves; Isabel R Amorim; Jean-Marc Aury; Niclas Backstrom; Petr Baldrian; Laima Baltrunaite; Endre Barta; Bertrand BedHom; Caroline Belser; Johannes Bergsten; Laurie Bertrand; Helena Bilandija; Mahesh Binzer-Panchal; Iliana Bista; Mark Blaxter; Paulo A V Borges; Guilherme Borges Dias; Mirte Bosse; Tom Brown; Rémy Bruggmann; Elena Buena-Atienza; Josephine Burgin; Elena Buzan; Alessia Cariani; Nicolas Casadei; Matteo Chiara; Sergio Chozas; Fedor Čiampor, Jr; Angelica Crottini; Corinne Cruaud; Fernando Cruz; Love Dalen; Alessio De Biase; Javier del Campo; Teo Delic; Alice B Dennis; Martijn F L Derks; Maria Angela Diroma; Mihajla Djan; Simone Duprat; Klara Eleftheriadi; Philine G D Feulner; Jean-François Flot; Giobbe Forni; Bruno Fosso; Pascal Fournier; Christine Fournier-Chambrillon; Toni Gabaldon; Shilpa Garg; Carmela Gissi; Luca Giupponi; Jessica Gomez-Garrido; Josefa González; Miguel L Grilo; Björn Grüning; Thomas Guerin; Nadege Guiglielmoni; Marta Gut; Marcel P Haesler; Christoph Hahn; Balint Halpern; Peter W Harrison; Julia Heintz; Maris Hindrikson; Jacob Höglund; Kerstin Howe; Graham M Hughes; Benjamin Istace; Mark J Cock; Franc Janžekovič; Zophonias O Jonsson

doi:10.1038/s44185-024-00054-6

2024 Sep 17;3:28. doi: 10.1038/s44185-024-00054-6

Ann M Mc Cartney ^1,^✉,^#, Giulio Formenti ^2,^3,^#, Alice Mouton ^3,^4,^#, Diego De Panis ^5,⁶, Luísa S Marins ^5,⁶, Henrique G Leitão ⁷, Genevieve Diedericks ⁷, Joseph Kirangwa ⁸, Marco Morselli ⁹, Judit Salces-Ortiz ¹⁰, Nuria Escudero ¹⁰, Alessio Iannucci ³, Chiara Natali ³, Hannes Svardal ^7,¹¹, Rosa Fernández ¹⁰, Tim De Pooter ^12,¹³, Geert Joris ^12,¹³, Mojca Strazisar ^12,¹³, Jonathan M D Wood ¹⁴, Katie E Herron ¹⁵, Ole Seehausen ^16,¹⁷, Phillip C Watts ¹⁸, Felix Shaw ¹⁹, Robert P Davey ¹⁹, Alice Minotto ²⁰, José M Fernández ²¹, Astrid Böhne ²², Carla Alegria ²³, Tyler Alioto ^24,²⁵, Paulo C Alves ^26,^27,²⁸, Isabel R Amorim ²⁹, Jean-Marc Aury ³⁰, Niclas Backstrom ³¹, Petr Baldrian ³², Laima Baltrunaite ³³, Endre Barta ³⁴, Bertrand BedHom ³⁵, Caroline Belser ³⁰, Johannes Bergsten ^36,³⁷, Laurie Bertrand ³⁸, Helena Bilandija ³⁹, Mahesh Binzer-Panchal ^40,^41,⁴², Iliana Bista ^43,^44,⁴⁵, Mark Blaxter ¹⁴, Paulo A V Borges ²⁹, Guilherme Borges Dias ^40,^41,⁴², Mirte Bosse ^46,^47,⁴⁸, Tom Brown ^5,^6,^49,⁵⁰, Rémy Bruggmann ⁵¹, Elena Buena-Atienza ^52,⁵³, Josephine Burgin ⁵⁴, Elena Buzan ^55,⁵⁶, Alessia Cariani ⁵⁷, Nicolas Casadei ^52,⁵³, Matteo Chiara ^58,⁵⁹, Sergio Chozas ^23,⁶⁰, Fedor Čiampor Jr ⁶¹, Angelica Crottini ^26,^27,²⁸, Corinne Cruaud ³⁸, Fernando Cruz ^24,²⁵, Love Dalen ^62,^63,⁶⁴, Alessio De Biase ⁶⁵, Javier del Campo ¹⁰, Teo Delic ⁶⁶, Alice B Dennis ⁶⁷, Martijn F L Derks ⁴⁷, Maria Angela Diroma ³, Mihajla Djan ⁶⁸, Simone Duprat ³⁰, Klara Eleftheriadi ¹⁰, Philine G D Feulner ⁶⁹, Jean-François Flot ⁷⁰, Giobbe Forni ⁵⁷, Bruno Fosso ⁷¹, Pascal Fournier ⁷², Christine Fournier-Chambrillon ⁷², Toni Gabaldon ^73,^74,^75,⁷⁶, Shilpa Garg ⁷⁷, Carmela Gissi ^59,^71,⁷⁸, Luca Giupponi ^79,⁸⁰, Jessica Gomez-Garrido ^24,²⁵, Josefa González ¹⁰, Miguel L Grilo ^81,⁸², Björn Grüning ⁸³, Thomas Guerin ³⁰, Nadege Guiglielmoni ⁸⁴, Marta Gut ^24,²⁵, Marcel P Haesler ^16,¹⁷, Christoph Hahn ⁸⁵, Balint Halpern ^86,^87,⁸⁸, Peter W Harrison ⁵⁴, Julia Heintz ^40,^41,⁴², Maris 
Hindrikson ⁸⁹, Jacob Höglund ⁹⁰, Kerstin Howe ¹⁴, Graham M Hughes ^15,⁹¹, Benjamin Istace ³⁰, Mark J Cock ^92,⁹³, Franc Janžekovič ⁹⁴, Zophonias O Jonsson ⁹⁰, Sagane Joye-Dind ^95,⁹⁶, Janne J Koskimäki ⁹⁷, Boris Krystufek ^98,⁹⁹, Justyna Kubacka ¹⁰⁰, Heiner Kuhl ¹⁰¹, Szilvia Kusza ¹⁰², Karine Labadie ³⁸, Meri Lähteenaro ^36,³⁷, Henrik Lantz ^40,^41,⁴², Anton Lavrinienko ¹⁰³, Lucas Leclère ¹⁰⁴, Ricardo Jorge Lopes ^23,¹⁰⁵, Ole Madsen ⁴⁷, Ghislaine Magdelenat ³⁸, Giulia Magoga ¹⁰⁶, Tereza Manousaki ¹⁰⁷, Tapio Mappes ¹⁸, Joao Pedro Marques ^26,²⁸, Gemma I Martinez Redondo ¹⁰, Florian Maumus ¹⁰⁸, Shane A McCarthy ^109,¹¹⁰, Hendrik-Jan Megens ⁴⁷, Jose Melo-Ferreira ^26,^28,¹¹¹, Sofia L Mendes ²³, Matteo Montagna ^106,¹¹², Joao Moreno ^23,¹¹³, Mai-Britt Mosbech ^40,^41,⁴², Mónica Moura ^114,¹¹⁵, Zuzana Musilova ¹¹⁶, Eugene Myers ^49,⁵⁰, Will J Nash ¹⁹, Alexander Nater ⁵¹, Pamela Nicholson ¹¹⁷, Manuel Niell ¹¹⁸, Reindert Nijland ¹¹⁹, Benjamin Noel ²⁹, Karin Noren ⁶², Pedro H Oliveira ³⁰, Remi-Andre Olsen ¹²⁰, Lino Ometto ^121,¹²², Rebekah A Oomen ^123,¹²⁴, Stephan Ossowski ^125,^126,¹²⁷, Vaidas Palinauskas ¹²⁸, Snaebjorn Palsson ⁹⁰, Jerome P Panibe ¹²⁹, Joana Pauperio ⁵⁴, Martina Pavlek ³⁹, Emilie Payen ³⁸, Julia Pawlowska ¹³⁰, Jaume Pellicer ¹³¹, Graziano Pesole ¹³², Joao Pimenta ^26,¹¹⁰, Martin Pippel ^40,^41,⁴², Anna Maria Pirttilä ⁹⁷, Nikos Poulakakis ^133,¹³⁴, Jeena Rajan ⁵⁴, Rúben MC Rego ^114,¹¹⁵, Roberto Resendes ¹³⁵, Philipp Resl ⁸⁵, Ana Riesgo ¹³⁶, Patrik Rodin-Morch ¹³⁷, Andre E R Soares ^40,^41,⁴², Carlos Rodriguez Fernandes ^23,¹³⁸, Maria M Romeiras ^23,^139,¹⁴⁰, Guilherme Roxo ^114,¹¹⁵, Lukas Rüber ^16,¹⁴¹, Maria Jose Ruiz-Lopez ^142,¹⁴³, Urmas Saarma ⁸⁹, Luis P da Silva ^26,²⁸, Manuela Sim-Sim ^23,^144,¹⁴⁵, Lucile Soler ^40,^41,⁴², Vitor C Sousa ^23,¹⁴⁶, Carla Sousa Santos ¹¹³, Alberto Spada ¹⁴⁷, Milomir Stefanovic ⁶⁸, Viktor Steger ¹⁴⁸, Josefin Stiller ¹⁴⁹, Matthias Stöck ¹⁰¹, Torsten H Struck ¹⁵⁰, Hiranya Sudasinghe ^141,¹⁵¹, Riikka Tapanainen ¹⁵², 
Christian Tellgren-Roth ^40,^41,⁴², Helena Trindade ^23,¹⁴⁵, Yevhen Tukalenko ¹⁵³, Ilenia Urso ⁵⁹, Benoit Vacherie ³⁸, Steven M Van Belleghem ¹⁵⁴, Kees Van Oers ¹⁵⁵, Carlos Vargas-Chavez ¹⁰, Nevena Velickovic ⁶⁸, Noel Vella ¹⁵⁶, Adriana Vella ¹⁵⁶, Cristiano Vernesi ¹⁵⁷, Sara Vicente ^23,¹⁵⁸, Sara Villa ^159,¹⁶⁰, Olga Vinnere Pettersson ^40,^41,⁴², Filip A M Volckaert ¹⁶¹, Judit Voros ¹⁶², Patrick Wincker ³⁰, Sylke Winkler ¹¹⁶, Claudio Ciofi ³, Robert M Waterhouse ^95,⁹⁶, Camila J Mazzoni ^5,⁶

¹Genomics Institute, University of California, Santa Cruz, CA USA

²The Vertebrate Genome Laboratory, The Rockefeller University, New York, NY USA

³Department of Biology, University of Florence, Sesto Fiorentino, Italy

⁴InBios-Conservation Genetics Laboratory, University of Liege, Liege, Belgium

⁵Leibniz Institut für Zoo und Wildtierforschung, Berlin, Germany

⁶Berlin Center for Genomics in Biodiversity Research, Berlin, Germany

⁷Department of Biology, University of Antwerp, Antwerp, Belgium

⁸Institute of Zoology, University of Cologne, Cologne, Germany

⁹Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy

¹⁰Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Barcelona, Spain

¹¹Naturalis Biodiversity Center, Leiden, The Netherlands

¹²Neuromics Support Facility, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium

¹³Neuromics Support Facility, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium

¹⁴Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK

¹⁵School of Biology and Environmental Science, University College Dublin, Belfield, Ireland

¹⁶Aquatic Ecology & Evolution, Institute of Ecology & Evolution, University of Bern, Bern, Switzerland

¹⁷Department of Fish Ecology & Evolution, Eawag, Kastanienbaum, Switzerland

¹⁸Department of Biological and Environmental Science, University of Jyvaskyla, Jyvaskyla, Finland

¹⁹The Earlham Institute, Norwich Research Park, Norwich, UK

²⁰Digital Science, London, UK

²¹Barcelona Supercomputing Center; Spanish National Bioinformatics Institute, ELIXIR Spain, Getafe, Spain

²²Leibniz Institute for the Analysis of Biodiversity Change, Museum Koenig Bonn, Bonn, Germany

²³CE3C—Centre for Ecology, Evolution and Environmental Changes & CHANGE—Global Change and Sustainability Institute, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Lisboa Portugal

²⁴Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain

²⁵Universitat de Barcelona (UB), Barcelona, Spain

²⁶CIBIO, Centro de Investigacao em Biodiversidade e Recursos Geneticos, InBIO Laboratorio Associado, Universidade do Porto, Vairao, Portugal

²⁷Departamento de Biologia, Faculdade de Ciencias, Universidade do Porto, Porto, Portugal

²⁸BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairao, Vairao, Portugal

²⁹University of the Azores, cE3c—Centre for Ecology, Evolution and Environmental Changes, Azorean Biodiversity Group, CHANGE—Global Change and Sustainability Institute, Rua Capitão João d´Ávila, Pico da Urze, Angra do Heroísmo, Portugal

³⁰Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, France

³¹Evolutionary Biology Program, Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden

³²Institute of Microbiology of the Czech Academy of Sciences, Praha, Czech Republic

³³Nature Research Centre, Debrecen, Hungary

³⁴Institute of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary

³⁵Institut de Systematique, Evolution, Biodiversite, Museum National d Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France

³⁶Department of Zoology, Swedish Museum of Natural History, Stockholm, Sweden

³⁷Department of Zoology, Faculty of Science, Stockholm University, Stockholm, Sweden

³⁸Genoscope, Institut François Jacob, CEA, Université Paris-Saclay, Evry, France

³⁹Ruder Boskovic Institute, Zagreb, Croatia

⁴⁰SciLifeLab, Solna, Sweden

⁴¹Uppsala University, Uppsala, Sweden

⁴²National Bioinformatics Infrastructure Sweden, Uppsala, Sweden

⁴³Senckenberg Research Institute, Frankfurt, Germany

⁴⁴LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany

⁴⁵Wellcome CRUK Gurdon Institute, University of Cambridge, Cambridge, UK

⁴⁶VU University Amsterdam, Amsterdam, The Netherlands

⁴⁷Animal Breeding & Genomics, Wageningen University & Research, Wageningen, The Netherlands

⁴⁸Wageningen University & Research, Wageningen, The Netherlands

⁴⁹Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany

⁵⁰DRESDEN concept Genome Center, Dresden, Germany

⁵¹Interfaculty Bioinformatics Unit and Swiss Institute of Bioinformatics, University of Bern, Bern, Switzerland

⁵²Institute of Medical Genetics and Applied Genomics, University of Tubingen, Tubingen, Germany

⁵³NGS Competence Center Tubingen, Tubingen, Germany

⁵⁴European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK

⁵⁵University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, Koper, Slovenia

⁵⁶Faculty of Environmental Protection, Velenje, Slovenia

⁵⁷Department of Biological, Geological and Environmental Sciences, Alma Mater Studiorum Universitá di Bologna, Bologna, Italy

⁵⁸Department of Biosciences, Università degli Studi di Milano, Milan, Italy

⁵⁹Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Consiglio Nazionale delle Ricerche, Bari, Italy

⁶⁰Sociedade Portuguesa de Botânica, Lisbon, Portugal

⁶¹Department of Biodiversity and Ecology, Plant Science and Biodiversity Centre Slovak Academy of Sciences, Bratislava, Slovakia

⁶²Department of Zoology, Stockholm University, Stockholm, Sweden

⁶³Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden

⁶⁴Centre for Palaeogenetics, Stockholm, Sweden

⁶⁵Department of Biology and Biotechnologies, Sapienza University of Rome, Rome, Italy

⁶⁶University of Ljubljana, Biotechnical Faculty, Department of Biology, Ljubljana, Slovenia

⁶⁷University of Namur, Department of Biology, URBE, ILEE, Namur, Belgium

⁶⁸Department of Biology and Ecology, University of Novi Sad, Novi Sad, Serbia

⁶⁹Eawag Swiss Federal Institute of Aquatic Science and Technology, Department of Fish Ecology & Evolution, Kastanienbaum, Switzerland

⁷⁰Department of Organismal Biology, Universite libre de Bruxelles, Brussels, Belgium

⁷¹Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Bari, Italy

⁷²Groupe de Recherche et d Etude pour la Gestion de l Environnement, Villandraut, France

⁷³Barcelona Supercomputing Centre (BSC), Barcelona, Spain

⁷⁴Institute for Research in Biomedicine (IRB), Barcelona, Spain

⁷⁵Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain

⁷⁶CIBERINFEC, Instituto Carlos III, Barcelona, Spain

⁷⁷NNF Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark

⁷⁸CoNISMa, Consorzio Nazionale Interuniversitario per le Scienze del Mare, Roma, Italy

⁷⁹Centre of Applied Studies for the Sustainable Management and Protection of Mountain Areas CRC Ge.S.Di.Mont., University of Milan, Milan, Italy

⁸⁰Department of Agricultural and Environmental Sciences-Production, Landscape and Agroenergy DiSAA, University of Milan, Milan, Italy

⁸¹Marine and Environmental Sciences Centre, Aquatic Research Network, Instituto Universitário de Ciências Psicológicas, Sociais e da Vida, Lisboa, Portugal

⁸²Egas Moniz Center for Interdisciplinary Research (CiiEM), Egas Moniz School of Health & Science, Caparica, Portugal

⁸³Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany

⁸⁴University of Cologne, Cologne, Germany

⁸⁵Department of Biology, University of Graz, Graz, Austria

⁸⁶MME BirdLife Hungary, Budapest, Hungary

⁸⁷Doctoral School of Biology, Department of Systematic Zoology and Ecology, Institute of Biology, ELTE Eotvos Lorand University, Budapest, Hungary

⁸⁸HUN-REN-ELTE-MTM Integrative Ecology Research Group, Budapest, Hungary

⁸⁹Department of Zoology, Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia

⁹⁰Institute of Life and Environmental Sciences, University of Iceland, Reykjavik, Iceland

⁹¹UCD Conway Institute, University College Dublin, Belfield, Ireland

⁹²Algal Genetics Group, UMR 8227, CNRS, Sorbonne Universite, UPMC University Paris 06, Paris, France

⁹³France Integrative Biology of Marine Models, Station Biologique de Roscoff, Roscoff, France

⁹⁴University of Maribor, Faculty of Natural Sciences and Mathematics, Maribor, Slovenia

⁹⁵Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland

⁹⁶Swiss Institute of Bioinformatics, Lausanne, Switzerland

⁹⁷Ecology and Genetics Research Unit, University of Oulu, Oulu, Finland

⁹⁸Slovenian Museum of Natural History, Ljubljana, Slovenia

⁹⁹Science and Research Centre Koper, Koper, Slovenia

¹⁰⁰Museum and Institute of Zoology, Polish Academy of Sciences, Warsaw, Poland

¹⁰¹Department IV Fish Biology, Fisheries and Aquaculture, Leibniz Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany

¹⁰²University of Debrecen, Centre for Agricultural Genomics and Biotechnology, Debrecen, Hungary

¹⁰³Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zurich, Zurich, Switzerland

¹⁰⁴Sorbonne Université, CNRS, Biologie Intégrative des Organismes Marins (BIOM), Banyuls-sur-Mer, France

¹⁰⁵MHNC-UP, Natural History and Science Museum of the University of Porto, Porto, Portugal

¹⁰⁶Department of Agricultural Sciences, University of Naples Federico II, Portici, Italy

¹⁰⁷Hellenic Centre for Marine Research (HCMR), Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Heraklion, Crete, Greece

¹⁰⁸Universite Paris Saclay, INRAE, URGI, Versailles, France

¹⁰⁹Department of Genetics, University of Cambridge, Cambridge, UK

¹¹⁰Wellcome Sanger Institute, Cambridge, UK

¹¹¹Departamento de Biologia, Faculdade de Ciencias da Universidade do Porto, Porto, Portugal

¹¹²Interuniversity Center for Studies on Bioinspired Agro Environmental Technology, University of Naples Federico II, Naples, Italy

¹¹³MARE Marine and Environmental Sciences Centre, ARNET Aquatic Research Network, Lisboa, Portugal

¹¹⁴CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Pólo dos Açores; Faculdade de Ciências e Tecnologia, Universidade dos Açores, Ponta Delgada, Portugal

¹¹⁵UNESCO, Chair Land Within Sea Biodiversity & Sustainability in Atlantic Islands, Portugal

¹¹⁶Department of Zoology, Faculty of Science, Charles University, Prague, Czech Republic

¹¹⁷Next Generation Sequencing Platform, University of Bern, Bern, Switzerland

¹¹⁸Andorra Research and Innovation, Sant Julià de Lòria, Andorra

¹¹⁹Marine Animal Ecology Group, Wageningen University and Research, Wageningen, The Netherlands

¹²⁰Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden

¹²¹Department of Biology and Biotechnology, University of Pavia, Pavia, Italy

¹²²National Biodiversity Future Center, Palermo, Italy

¹²³Centre for Ecological and Evolutionary Synthesis, University of Oslo, Oslo, Norway

¹²⁴University of New Brunswick Saint John, Saint John, New Brunswick, Canada

¹²⁵Institute for Medical Genetics and Applied Genomics, University of Tubingen, Tubingen, Germany

¹²⁶NGS Competence Center Tubingen (NCCT), University of Tubingen, Tubingen, Germany

¹²⁷Institute for Bioinformatics and Medical Informatics (IBMI), University of Tubingen, Tubingen, Germany

¹²⁸Nature Research Centre, Vilnius, Lithuania

¹²⁹Biodiversity Research Center, Academia Sinica, Taipei, Taiwan

¹³⁰Faculty of Biology, University of Warsaw, Warsaw, Poland

¹³¹Institut Botànic de Barcelona, IBB (CSIC-CMCNB), Passeig del Migdia s.n., Parc de Montjüic, Barcelona, Spain

¹³²University of Bari Aldo Moro, Department of Biosciences, Biotechnology and Environment; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Consiglio Nazionale delle Ricerche, Bari, Italy

¹³³Department of Biology, School of Sciences and Engineering, University of Crete, Voutes University Campus, Irakleio, Greece

¹³⁴Natural History Museum of Crete, School of Sciences and Engineering, University of Crete, Irakleio, Greece

¹³⁵Universidade dos Acores, Departamento de Biologia, Ponta Delgada, Portugal

¹³⁶Department of Biodiversity and Evolutionary Biology, Museo Nacional de Ciencias Naturales, Madrid, Spain

¹³⁷Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden

¹³⁸Faculdade de Psicologia, Universidade de Lisboa, Lisboa, Portugal

¹³⁹Linking Landscape, Environment, Agriculture and Food, Associated Laboratory TERRA, Instituto Superior de Agronomia, Universidade de Lisboa, Lisboa, Portugal

¹⁴⁰Portugal Centre for Ecology, Evolution and Environmental Changes, Lisbon, Portugal

¹⁴¹Naturhistorisches Museum Bern, Bern, Switzerland

¹⁴²Departamento de Biología de la Conservación y Cambio Global, Estación Biológica de Doñana (EBD), CSIC, Sevilla, Spain

¹⁴³CIBER of Epidemiology and Public Health, Granada, Spain

¹⁴⁴Museu Nacional de História Natural e da Ciência, Lisboa, Portugal

¹⁴⁵Departamento de Biologia Vegetal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal

¹⁴⁶Departamento de Biologia Animal, Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal

¹⁴⁷Department of Agricultural and Environmental Sciences Production, Landscape, Agroenergy, University of Milan, Milan, Italy

¹⁴⁸Department of Genetics and Genomics, Institute of Genetics and Biotechnology, Hungarian University of Agriculture and Life Sciences, Godollo, Hungary

¹⁴⁹Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark

¹⁵⁰Natural History Museum, University of Oslo, Blindern, Oslo Norway

¹⁵¹Division of Evolutionary Ecology, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland

¹⁵²University of Eastern Finland, Kuopio, Finland

¹⁵³Institute for Nuclear Research of the NAS of Ukraine, Kyiv, Ukraine

¹⁵⁴Ecology, Evolution and Conservation Biology, Department of Biology, KU Leuven, Leuven, Belgium

¹⁵⁵Department of Animal Ecology, Netherlands Institute of Ecology, Wageningen, The Netherlands

¹⁵⁶Conservation Biology Research Group, Department of Biology, University of Malta, Msida, Malta

¹⁵⁷Forest Ecology Unit, Research and Innovation Centre-Fondazione Edmund Mach, San Michele All’Adige, Italy

¹⁵⁸ERISA Escola Superior de Saúde Ribeiro Sanches, IPLUSO, Lisboa, Portugal

¹⁵⁹Institute for Sustainable Plant Protection, National Research Council, Sesto Fiorentino, Italy

¹⁶⁰Department of Agricultural and Environmental Sciences, University of Milan via Giovanni Celoria 2, Milan, Italy

¹⁶¹Laboratory of Biodiversity and Evolutionary Genomics, KU Leuven, Leuven, Belgium

¹⁶²Department of Zoology, Hungarian Natural History Museum, Budapest, Hungary

^✉ Corresponding author.

^# Contributed equally.

PMCID: PMC11408602 PMID: 39289538

Abstract

A genomic database of all Earth’s eukaryotic species could contribute to many scientific discoveries; however, only a tiny fraction of species have genomic information available. In 2018, scientists across the world united under the Earth BioGenome Project (EBP), aiming to produce a database of high-quality reference genomes containing all ~1.5 million recognized eukaryotic species. As the European node of the EBP, the European Reference Genome Atlas (ERGA) sought to implement a new decentralised, equitable and inclusive model for producing reference genomes. For this, ERGA launched a Pilot Project establishing the first distributed reference genome production infrastructure and testing it on 98 eukaryotic species from 33 European countries. Here we outline the infrastructure and explore its effectiveness for scaling high-quality reference genome production, whilst considering equity and inclusion. The outcomes and lessons learned provide a solid foundation for ERGA while offering key learnings to other transnational, national genomic resource projects and the EBP.

Subject terms: Scientific community, Eukaryote, Genome, Genomics, Sequencing

Background

Reference genomes as a key biodiversity genomics tool

In the midst of the Earth’s sixth mass extinction, species worldwide are declining at an unprecedented rate¹, directly impacting ecosystem functioning and services², human health³ and our resilience to climate disturbances⁴. Biodiversity and ecosystem decline^5,6, loss and degradation raise the prospect that much, if not most, of the Earth’s biodiversity will be lost forever before it can be genomically explored, analogous to the ‘dark extinctions’ of the pre-taxonomic period⁷. Our ability to genomically characterise and investigate the species that span the tree of life, and their ecosystems, can not only scientifically inform decision-making processes to flatten the biodiversity extinction curve⁸, but also unlock diverse genetic-, species- and ecosystem-level⁹ discoveries that can be used for human health, bioeconomy stimulation, food sovereignty and biosecurity, among many other applications.

As genomic sequencing has become increasingly cost-effective and sequencing platforms and computational algorithms have become more technically efficient, many biodiversity genomics tools have become available to expedite the investigation of both known and unknown species, e.g. DNA barcoding, genome skimming, reduced representation sequencing, transcriptome sequencing, and whole genome sequencing for reference genome production¹⁰. Reference genomes (Supplementary glossary) are one such tool, offering unparalleled, scalable, and increasingly cost-effective high-resolution insight into species, and their accessibility has made the construction of a planetary-wide genomic database of all eukaryotic life a more realistic endeavour¹¹.

To date, reference genomes do not exist for most of eukaryotic life. For instance, the largest genomics data repository, the International Nucleotide Sequence Database Collaboration (INSDC), has genome-wide DNA sequence information for just 6480 eukaryotic species (about 0.43% of described species), of which about 63% (4082) are short-read based (draft quality)¹¹, and most are variable in terms of sequence quality, data type, data volume, associated voucher samples, completeness of metadata and protocol reproducibility^12–14. Building from this, the biodiversity research community is pushing to expand beyond reference genome production alone and toward the production of a complete reference resource for each species. A complete reference resource includes a reference genome, an annotation, all metadata, and associated ex-situ samples (voucher(s) and cryopreserved specimen(s)). Complete reference resources are necessary to unlock the plurality of possible scientific enquiries beyond the scope of any singular research project⁹. However, the scientific enquiries that can be realised from reference resources¹⁵ are limited in scope, due in large part to a current lack of standardisation across the multitude of actors involved throughout the production of complete reference resources.
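The proportions quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the ~1.5 million described eukaryotic species given in the Abstract as the denominator, and the variable names are ours rather than part of any ERGA or INSDC tooling.

```python
# Illustrative check of the INSDC coverage figures quoted in the text.
# Assumption: ~1.5 million described eukaryotic species (from the Abstract).
described_species = 1_500_000
species_with_genome_data = 6480   # eukaryotic species with genome-wide data in INSDC
short_read_based = 4082           # of those, draft-quality short-read assemblies

coverage_pct = species_with_genome_data / described_species * 100
short_read_pct = short_read_based / species_with_genome_data * 100

print(f"Species with genome-wide data: {coverage_pct:.2f}% of described species")
print(f"Short-read (draft) share:      {short_read_pct:.1f}% of sequenced species")
```

Under these assumptions the coverage works out to roughly 0.43% and the short-read share to roughly 63%, matching the figures in the text.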

The state of reference genome production today

After two decades of uncoordinated and unstandardised biodiversity genomics sequencing data production (e.g. with little coordination among individual research laboratories or projects), the Earth BioGenome Project (EBP)¹¹ was established. The goal of the EBP is to create a global network of biodiversity genomics researchers that share a mission to produce a database of openly accessible, standardised, and complete reference resources that span the whole eukaryotic phylogenetic tree. The project has a three-phase approach and to date (Phase I) has produced ~1213 reference genomes for species across ~1010 genera¹⁶. However, the rate of production is increasing fast: in 2022, for instance, more than 316 reference genomes were produced, and in the coming years the rate is estimated to increase at least 10-fold. It is important to acknowledge that during this initial phase, 910 reference genomes were produced by a single affiliated project, the Darwin Tree of Life¹⁷, and a further 120 by the Wellcome Sanger Tree of Life Programme (https://www.sanger.ac.uk/programme/tree-of-life/). As the EBP approaches Phase II, in which reference resources for 150,000 species are planned, the status quo centralised approach poses significant challenges for scaling up reference genome production. Additionally, it raises important concerns regarding inclusion, accessibility, equity, and fairness.
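To make the degree of centralisation concrete, the short calculation below works through the Phase I figures quoted above. It is a rough sketch: it assumes the Darwin Tree of Life and Wellcome Sanger Tree of Life counts do not overlap, treats the approximate totals as exact, and uses variable names chosen purely for illustration.

```python
# Rough arithmetic on the Phase I production figures quoted in the text.
# Assumption: the two programme counts are disjoint and the totals are exact.
phase1_total = 1213        # ~reference genomes produced in Phase I
darwin_tree_of_life = 910  # produced by the Darwin Tree of Life
sanger_tree_of_life = 120  # produced by the Wellcome Sanger Tree of Life Programme
phase2_target = 150_000    # species planned for Phase II

centralised = darwin_tree_of_life + sanger_tree_of_life
centralised_share = centralised / phase1_total
scale_up_factor = phase2_target / phase1_total

print(f"{centralised} of ~{phase1_total} genomes ({centralised_share:.1%}) "
      "came from the two named programmes")
print(f"Phase II target is ~{scale_up_factor:.0f}x the Phase I output")
```

Read this way, roughly 85% of Phase I genomes came from these two programmes, and the Phase II target is more than a hundred times the Phase I output, which is the scaling gap the text refers to.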

The goal of building a decentralised model embracing all of Europe and beyond

Given these limitations, the European node of the EBP, the European Reference Genome Atlas (ERGA) (Box 1), set out to develop and implement a pilot decentralised infrastructure to test the effectiveness of the approach in creating and scaling reference genomic resources for Europe’s eukaryotes.

A decentralised approach for the production of genomic reference resources for ERGA supports: 1) an expansion in the diversity of expertise, processes and innovative ideas that can act synergistically to accelerate scientific outcomes, 2) a platform for accessible, equitable, and standardised production of high-quality, ethically and legally compliant reference genomic resources, 3) streamlined communication and opportunities for new collaborations to be fostered, 4) an expansion of funding opportunities, 5) mitigation of hierarchical power imbalances, 6) increased access to up-to-date and reproducible tools and workflows, and 7) increased downstream analysis applications.

The ambition of the pilot test was to identify the challenges in constructing and implementing a decentralised infrastructure, but also to understand and find solutions on how best to support the inclusion of ERGA members who face a multitude of different realities whilst participating, e.g. resource availability and geographic and political positioning. The lessons learned from this initial pilot can be used by ERGA to inform future developments, but can also be used to inform the broader EBP strategy as to whether decentralised approaches are effective in the production of reference genomic resources that meet EBP minimum standards.

The first step towards decentralisation was to create a pan-European network of existing sequencing centres, biobanks, and museum collections that were willing to participate and provide diverse support options for sample storage, wet lab preparation, sequencing, and data handling and storage. The second step was to obtain adequate funding to support the development and implementation of the infrastructure. No central source of funding was available, so the majority of funds were acquired through the grassroots efforts of individual ERGA members contributing to the pilot test, as well as a plethora of partnering institutions (Supplementary Table 1). In many cases, researchers completely or partially financed their own participation in the pilot test. In other cases, sequencing partners contributed their own grant funds to completely cover, or heavily discount, the cost of library preparation, sequence data production and/or assembly services, whilst also covering the costs of the scientific personnel within their facilities who participated in the pilot test. In addition, collaborations were fostered with commercial sequencing companies to obtain in-kind contributions that could be used to support those researchers who wished to participate but needed financial support. All in-kind contributions were shipped to three established ERGA Hubs: two ERGA Library Preparation Hubs (University of Antwerp, Belgium, and the Metazoa Phylogenomics Lab at the Institute of Evolutionary Biology (CSIC-UPF) in Barcelona, Spain) and one ERGA Sequencing Hub (University of Florence, Italy).

Box 1 The European Reference Genome Atlas.

As the European node of the Earth BioGenome Project (EBP; https://www.earthbiogenome.org/)¹¹, the mission of the European Reference Genome Atlas is to coordinate the generation of high-quality reference genomes for all eukaryotic life across Europe⁶⁰. At the core of this mission is ensuring the implementation of an inclusive, accessible, and distributed genomic infrastructure that supports the inclusion of all who wish to participate, advances scientific excellence and data-sharing best practices, and increases taxonomic, geographic, and habitat representation of sequenced species in a balanced manner. Embracing diversity in this way brings opportunities for ERGA to build a genomic infrastructure that can be used by the large network of biodiversity researchers and also foster new international and transdisciplinary collaborations.

The organisational structure of ERGA currently comprises the governing body of the Council of Country/Regional Representatives, with actions developed and implemented by the Executive Board and nine expert Committees, with participation from the large network of members (https://www.erga-biodiversity.eu/). With over 750 members spanning 38 countries, one regional ERGA-affiliated project, and 234 institutions, ERGA is currently the largest initiative of its kind in the world. ERGA membership is open to all who wish to engage in the sequencing of European eukaryotes, foster new collaborations in and beyond Europe, and learn about the most up-to-date technologies for generating reference genomes for species (individuals interested in becoming a member can register through the ERGA website).

Development of a decentralised infrastructure

Overall, 98 species from 33 countries and regions (17 of them Widening countries (Supplementary glossary)) were included in the pilot test (Fig. 2a). However, despite efforts made during the prioritisation process, the distribution of selected species was not equal across countries, predominantly due to the acceptance of additional species after nomination closure (Fig. 2b). Nine iterative steps were developed to support the production of a complete reference genomic resource for each of the species included in the pilot project (Fig. 1).

Fig. 1 — Establishing an inclusive, accessible, distributed and pan-European genomic infrastructure that could support the streamlined and scalable production of genomic resources for all European species.

Genome team establishment

After a successful nomination, including a species in the ERGA infrastructure was reliant on the creation of a ‘genome team’. A genome team is a transdisciplinary group of researchers who have a shared interest in a particular species and assume the shared responsibility of shepherding this species through each of the infrastructure’s steps. Each team member has an assigned role (Fig. 1, Supplementary Table 2). Further, all teams were strongly encouraged to include both national and international members, and all teams were overseen by the ‘Principal Investigator’ and a ‘Sample Ambassador’ who was ideally from the country of origin of the focal species. The role of the sample ambassador was to coordinate the species project and to ensure continuous communication across the team members. In total, 98 genome teams were established and each had at least one international team member, with 23% having three members and 26% having >five members (n = 93) (Fig. 2b). A total of 76 genome team sample ambassadors were comfortable sharing their self-declared sex (only ‘male’ and ‘female’ were proposed as choices); from this subset, 63 (16%) self-identified as male and 36 (84%) as female (Fig. 2b). To ensure compliance with GDPR regulations, no other data were collected to assess representation by other critically important dimensions of diversity, e.g. race, ethnicity, religion, sexual orientation or their intersections. Hence, ERGA does not currently have any means to evaluate its inclusiveness beyond sex, and it is likely that it suffers the same lack of racial representation and inclusion that characterises European science at large¹⁸.

Building a representative species list

Prior to developing and testing the decentralised infrastructure (Fig. 1), we first needed to consider the species that would test it. For this, a nomination form was issued for completion by all ERGA members who were willing to contribute samples for a species. The form collected information on genome properties, vouchering, habitat and sampling, conservation status, permit prerequisites, sample properties, species identification, and sex (https://treeofsex.sanger.ac.uk/)¹⁹ for each suggested species²⁰. To prioritise nominations, a scoring system was applied based on several feasibility criteria: small genome size (<1 Gb), ease of availability, the possibility of fresh collection and flash freezing, >1 g of tissue available, a well-established nucleic acid extraction protocol, an existing specimen voucher, no species identification ambiguity, all necessary permits in place, and no restrictions on export²⁰. ERGA council representatives were given the prioritised species list and asked to select three species per predefined ERGA target category (pollinators, freshwater species and endangered/iconic) from the nominations made by members within their country. After nomination form closure, many additional species were nominated by ERGA members. However, only nominations that fulfilled all of the selection criteria, had funding available, and/or were from a country not yet represented were accepted for inclusion in the test.
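As an illustration of how such a feasibility-based prioritisation could be computed, the following is a minimal sketch in Python. The field names, the equal weighting of one point per criterion, and the example nominations are assumptions made for illustration only; they are not taken from the ERGA nomination form or its actual scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Nomination:
    """Hypothetical record of one nominated species (field names are illustrative)."""
    species: str
    genome_size_gb: float
    easily_available: bool
    fresh_flash_frozen: bool
    tissue_mass_g: float
    extraction_protocol_established: bool
    voucher_present: bool
    identification_unambiguous: bool
    permits_in_place: bool
    export_unrestricted: bool


def feasibility_score(n: Nomination) -> int:
    """Count how many of the nine feasibility criteria are met (assumed equal weights)."""
    criteria = [
        n.genome_size_gb < 1.0,   # small genome size (<1 Gb)
        n.easily_available,
        n.fresh_flash_frozen,
        n.tissue_mass_g > 1.0,    # >1 g of tissue
        n.extraction_protocol_established,
        n.voucher_present,
        n.identification_unambiguous,
        n.permits_in_place,
        n.export_unrestricted,
    ]
    return sum(criteria)


def prioritise(nominations: list) -> list:
    """Return nominations ordered from most to least feasible."""
    return sorted(nominations, key=feasibility_score, reverse=True)


# Two made-up nominations: one meets every criterion, one misses two of them.
ideal = Nomination("Species A", 0.6, True, True, 2.0, True, True, True, True, True)
harder = Nomination("Species B", 2.5, True, True, 0.5, True, True, True, True, True)
ranked = prioritise([harder, ideal])
```

In this sketch the nomination meeting all nine criteria is ranked first; any real scheme could weight criteria differently or treat some as mandatory.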

Developing a communication and coordination strategy

The nature of the infrastructure constructed required streamlined communication between ERGA genome teams and partnering sequencing facilities spanning large geographic distances. To facilitate this, we created avenues to maximise continuous communication both in and outside of the ERGA community. In partnership with Ensembl at EMBL’s European Bioinformatics Institute (EMBL-EBI; https://www.ebi.ac.uk/)²¹, we built an ERGA Data Portal (https://portal.erga-biodiversity.eu/) to provide a comprehensive overview of all ERGA data. The portal provides a powerful and intuitive search across all ERGA metadata, genomic datasets, assemblies and annotations, with filters for component project, sequencing status and taxonomy. Additionally, an interactive phylogeny provides another route to exploring available species and can display ERGA species sequenced at any taxonomic level. We developed the current portal rapidly to support the goals of the pilot test, but it will be continually and iteratively improved to enhance usability, for example by potentially adding species imagery and distribution ranges, Ensembl²² and community annotations, interactive geographic map searches, and cross-referencing to key resources such as the Global Biodiversity Information Facility (https://www.gbif.org/) and climate data. Progress data are continuously shared through the portal's public tracking pages (https://portal.erga-biodiversity.eu/status_tracking) and the GoAT database¹⁶ (https://goat.genomehubs.org/projects/ERGA-PIL).

Developing a training and knowledge transfer strategy

Investing in building competency is important if ERGA is to provide scientists across disciplines, experience levels, demographic sectors of society, and geographies with equitable opportunities to leverage and benefit from the enormous volume of data expected to be generated through ERGA and through other large biodiversity genomics initiatives, especially those in parts of the world where economic opportunities are much more limited. However, a significant gap in expertise remains between countries due to differences in resource availability, genomic research capacity and capability, and access to state-of-the-art training (Box 2). To increase the accessibility and stimulate the use of existing infrastructure within ERGA across all the infrastructure steps, efforts were made to share expertise through conference participation, webinars and hands-on training workshops. For instance, many ERGA members participated in a BioHackathon to integrate new genome assembly methods into an openly accessible Galaxy pipeline and worked on the development of robust user guidance²³. In addition, we organised a virtual workshop entitled ‘Building high-quality reference genome assemblies of eukaryotes’ as part of the European Conference on Computational Biology 2022²⁴, which is now freely available online to further educate researchers in best practices for genome assembly. We also organised a webinar on ‘Access and Benefit-Sharing’ with the National Focal Points across Europe to help genome team sample ambassadors understand their Nagoya permitting obligations during the sample collection stage of the project, and organised an online workshop on structural genome annotation with BRAKER & TSEBRA.

An online workshop was also organised to train pilot genome teams to identify the external actors (at international, national, and local levels) involved in their reference genome project. During this training, we conducted a mapping exercise of stakeholders and rightsholders (herein, interested parties) and examined sample ambassador perceptions of how to interact with interested parties across high and low GBARD (government budget allocations for R&D) countries. The results indicated that researchers did not categorise their project’s interested parties differently (χ²(3, (153 − 130)) = 5.66, p = 0.12) (Supplementary Fig. 6) depending on whether they were situated in a low or high GBARD country. However, there does appear to be a tendency in the ‘Consult’ category (df = 1, p = 0.08), suggesting that researchers located in low GBARD countries may place a higher value on the involvement and collaboration of interested parties than those located in high GBARD countries (Supplementary Fig. 6).

Box 2 opportunities for training & knowledge transfer.

During an EMBO Practical Course ‘Hands-on course in genome sequencing, assembly and downstream analyses’ held at the Université libre de Bruxelles (ULB), Belgium (https://meetings.embo.org/event/22-gen-seq-analysis), the organisers chose to use the endophytic yeast Debaryomyces sp. RF-E1 (13 Mb) for sequencing during the course. Microorganisms are excellent subjects for genome sequencing and bioinformatics teaching due to their small genome size (making it possible to try many workflows and sets of parameters). The genus Debaryomyces comprises species of extremophilic yeasts, some of which support plant health by modulating pathogen invasion^61,62. A high-quality reference genome will help study the impacts of radiation on this genome and elucidate the adaptive potential of host-microbe interactions. The yeast was isolated from a silver birch tree in the Red Forest, one of the most radioactive areas in the Chernobyl Exclusion Zone (CEZ) in Ukraine⁶³. Anthropogenic stresses caused by radionuclide contamination can adversely affect organism health through genotoxicity^24,64. Although symbiotic interactions with endophytic microorganisms can facilitate a host’s capacity to adapt and persist under such environmental stress⁶⁵, little is known about radiation exposure’s impact on these endophytic interactions. ONT genomic and cDNA sequencing was performed during the course, then the data were assembled with Flye⁶⁶ and annotated with BRAKER^67,68 by the course participants. The pedagogy of the EMBO course effectively combined hands-on research training with the necessary theoretical framing to support active learning of participants. Feedback by course participants was extremely positive, and as a result, a second EMBO-funded Practical Course will be organised by the same team in 2024 (this time in Valencia, Spain).
In addition to providing participants with a realistic insight into the research process, the training also created a suite of high-quality publicly available genomic resources for the yeast species sequenced that will be directly useful to the sample provider’s ongoing research, and potentially to many more researchers. This successful teaching-through-research model will inform future ERGA training and capacity-building activities at locations across Europe and beyond.

Technical workflows

Pre-sampling requirements

Supporting genome team compliance with all relevant ethical and legal customary, local, regional, national, and international obligations was a priority during the infrastructure development process. Through ERGA expert committees, namely the Ethics, Legal and Social Issues (ELSI) Committee and the Sampling and Sample Processing (SSP) Committee, comprehensive documentation was developed including a ‘Sampling Code of Best Practice’ and ‘Guidelines on implementing the Traditional Knowledge and Biocultural Labels and Notices when partnering with Indigenous Peoples and Local Communities (IPLC)’^20,25. The Traditional Knowledge (TK) and Biocultural (BC) Label and Notice implementation and guideline documentation was developed through a funded partnership (European Open Science Cloud Grant) with representatives of the Global Indigenous Data Alliance (https://www.gida-global.org/), Local Contexts Hub (https://localcontexts.org/) and the Research Data Alliance (https://www.rd-alliance.org/node/77186). Complying with this documentation was mandatory as it codifies the official ERGA standards for how to ethically and legally collect samples, as well as how to responsibly engage all interested parties (Supplementary glossary). In addition, educational webinars were used as a researcher capacity-building tool, providing more general information on pertinent topics such as the Nagoya Protocol on Access and Benefit Sharing, and Digital Sequence Information (https://www.youtube.com/@erga-consortium1001).

Sampling and metadata acquisition

During sample collection, important metadata concerning the species collection event were expected to be documented by the sample collector. To standardise this process, a robust metadata schema was developed, using the DToL metadata schema as a foundation²⁶. The tailored ERGA schema, including unique ERGA specimen identifiers as well as ToLID (https://id.tol.sanger.ac.uk/), was codified into a .csv formatted ‘manifest’ and made publicly available (https://github.com/ERGA-consortium/ERGA-sample-manifest). In tandem, a standard operating procedure document²⁷ was developed to provide details on how to complete all of the 81 validatable manifest fields. Inspired by the Genomic Observatories Metadatabase²⁸, ERGA also developed fields to mandate important information disclosure, e.g. permanent unique identifiers (PUID) associated with ex-situ specimens, permits, and Indigenous rights and interests (TK and BC Labels and Notices)^25,29–31. Overall, samples were collected for 98 species spanning 92 genera, 81 families, 61 orders, 26 classes, and 13 phyla (https://goat.genomehubs.org/projects/ERGA-PIL, Fig. 2a). The geographic distribution of samples collected was relatively even, although some countries contributed more species than others (Fig. 2b). Altogether, 89% of genome teams (n = 93) reported a confidence level of >90% that they had obtained all required permits; ten Nagoya permits and three CITES permits (Supplementary Note 1) were obtained.

Sample manifest submission, validation, ex-situ storage

An accessible and streamlined metadata manifest submission system was implemented to ensure that all of ERGA’s sample metadata was accurately validated and promptly submitted into the public archive. To achieve this, a user-friendly and highly customised data and metadata brokering system called Collaborative OPen Omics (COPO) (https://wellcomeopenresearch.org/articles/7-279/v1) was used³². The COPO submission system validated each manifest submitted against an ERGA-provided checklist to standardise and automate entry into the BioSamples public archive. Automating this process ensured that all species samples collected had a permanent unique identifier (PUID) from BioSamples that can be automatically linked to the associated genomic sequencing data submitted to the European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena). Additionally, the submission system had the capability to upload permit documentation and supported its immediate transfer to a private and secure location on an internal ERGA data repository (built for the purposes of the pilot test) to avoid privacy concerns and data leakages. All documents were subsequently deleted from COPO’s internal servers. The internal data repository itself was constructed in partnership with the Barcelona Supercomputing Centre (BSC; https://www.bsc.es/) and was a Nextcloud instance containing a group folder with a tiered storage system, or HSM (see Supplementary glossary). All ERGA members could request access to the ERGA data repository and, upon approval, members were assigned appropriate access privileges depending on their needs (read, write, or full file control access). To support repository utilisation, guidelines were developed detailing protocols for data upload/download as well as directory structure, to ensure standardisation, reusability, and interoperability³³.
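The kind of checklist validation described above can be sketched as follows. This is only an illustration of the general idea of validating a .csv manifest against a list of mandatory fields; it is not COPO's implementation, and the column names and example rows are hypothetical placeholders rather than fields confirmed from the ERGA manifest.

```python
import csv
import io

# Hypothetical subset of mandatory columns (assumption: the real ERGA
# checklist defines its own, much longer, set of validatable fields).
REQUIRED_FIELDS = [
    "SPECIMEN_ID",
    "SCIENTIFIC_NAME",
    "COLLECTION_LOCATION",
    "DATE_OF_COLLECTION",
]


def validate_manifest(csv_text: str) -> list:
    """Return a list of human-readable problems found in a CSV manifest."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    # 1) Every mandatory column must exist in the header.
    for field in REQUIRED_FIELDS:
        if field not in header:
            errors.append(f"missing column: {field}")

    # 2) Every mandatory column that exists must be filled in on each row.
    #    Row numbers count the header as line 1.
    for row_number, row in enumerate(reader, start=2):
        for field in REQUIRED_FIELDS:
            if field in header and not (row[field] or "").strip():
                errors.append(f"row {row_number}: empty {field}")
    return errors


# Made-up manifest in which the second sample lacks a scientific name.
example = (
    "SPECIMEN_ID,SCIENTIFIC_NAME,COLLECTION_LOCATION,DATE_OF_COLLECTION\n"
    "ERGA_X_001,Solea solea,Spain,2022-05-01\n"
    "ERGA_X_002,,Italy,2022-06-10\n"
)
problems = validate_manifest(example)
```

A production validator would additionally check value formats (dates, coordinates, controlled vocabularies), which this sketch deliberately leaves out.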

We highly recommended that both voucher specimen(s) and cryopreserved specimen(s) be associated with all genomic resources produced during the pilot test. To support this, we issued guidance for biobanking and vouchering. The vouchering best practices recommended the deposition of both a physical voucher and digital e-voucher(s) (high-quality, informative photographs). Through ERGA’s SSP Committee, we also supported genome teams in search of a permanent collection for voucher deposition, and a partnership with the LIB Biobank at Museum Koenig (Bonn) (https://bonn.leibniz-lib.de/en/biobank) was established to support the deposition of cryopreserved samples for those without access to a local biobank. Samples biobanked in LIB were made publicly visible via the international biodiversity biobanking portal (GGBN.org). Although not a mandatory requirement, voucher specimens were provided for 67% of the species (19% digital, 40% physical, and 40% both physical and digital) and deposited in museum collections across 23 countries (Fig. 2c). Of the specimens, 45% had an associated cryopreserved sample; these were stored in 34 biobanks in 22 countries (Fig. 2c). All 98 genome teams successfully completed, validated and uploaded their metadata publicly to BioSamples through the COPO system, and manifest submissions are publicly available through the ERGA Data Portal (https://portal.erga-biodiversity.eu/), which provides intuitive search and direct links to all of the data held in the public archives ('Communication and Coordination' Section).

Sample preparation

Sample quality and shipment requirements were formalised for each data-type across ERGA sequencing facility partners, including sample requirements for long-read (Oxford Nanopore Technologies (ONT)/Pacific BioSciences (PacBio)), scaffolding (Omni-C/Hi-C), and annotation (RNA-Seq/IsoSeq) data. Sample collectors were expected to adhere to the requirements of the ERGA sequencing facility specified and to ensure that samples shipped were: 1) of a quality suitable for HMW DNA extraction, and 2) of an appropriate quantity for long-read, proximity ligation and annotation sequence data production. Two ERGA Library Preparation Hubs were established to support genome teams that required resource support for the library preparation of samples prior to sequencing. To increase the likelihood that HMW DNA of sufficient quantity was obtained for effective sequencing, most library preparation was conducted by partnering sequencing facilities. However, the ERGA Library Preparation Hubs facilitated the production of 99 libraries: 15 libraries for proximity ligation data (Hi-C/Omni-C® kit) that were provided by 27 countries; eight libraries for PacBio data provided by eight countries; and the remainder for RNA-Seq data (Supplementary Tables 3, 4, Supplementary Figs. 1, 2).

Sequencing strategy

A key component and strength of the decentralised infrastructure was the intentional distribution of sequence data production across partnering European sequencing facilities. To initialise these partnerships, a sequencing platform landscape assessment was conducted across all of the countries that had ERGA council representation. This effort assessed the quantity, distribution, and diversity of the sequencing platforms available across Europe and specifically examined their capability to produce long read (PacBio HiFi reads/ONT reads /IsoSeq reads), and short read (Hi-C/Omni-C/RNA-Seq/PCR-free Illumina) sequencing data. This mapping indicated an uneven distribution of sequencing platforms across Europe, and so we decided that any sequencing facility with a platform to produce long read sequencing data could be an ERGA partner. We took this long read data-type agnostic approach to maximise geographic breadth and increase accessibility but also to reduce shipping costs and the likelihood of customs issues. An additional strength was that it could facilitate the development of more standardised and automated approaches for long read technologies that are currently underrepresented in generating genomic references for biodiversity genomics. Supporting a variety of technologies is important as it takes advantage of their individual characteristics (e.g. portability or lower priced solutions) to increase sequencing capability and accessibility in under-resourced countries, regions and institutions in ERGA. In the end, we partnered with a total of 26 sequencing facilities, 17 with PacBio and 9 with ONT sequencing platforms available (Fig. 2c), and documented the minimum sample collection and quality requirements for each partner. 
Here, we recommended the following data-type volumes for assembly generation: 30X HiFi or 60X ONT, 25X Hi-C (per haplotype) and 25X (per haplotype) Illumina (in cases where ONT data was used), and the following data-type volumes for annotation: total of 100 million reads if >five tissue types are available, or 30 million reads if tissue samples are pooled³⁴. IsoSeq production was not a mandatory requirement but was promoted, where feasible. The pilot test’s 98 species were sequenced across 25 main partnering sequencing facilities (Supplementary Table 1), and additional data was generated by Novogene for four species from the Netherlands and Hungary. 27 species were sequenced using an ONT platform, 75 using the PacBio Sequel II platform, and four by both platforms. For scaffolding and curation purposes, proximity ligation sequencing was highly recommended. A total of 76 species had some form of proximity ligation sequencing conducted, 47 species with Arima-Hi-C (Arima Genomics), 24 species with Dovetail Omni-C® (Dovetail Genomics), and five with Proximo (Phase Genomics) (Fig. 3a). Regardless of the partnering sequencing facility utilised or species being sequenced, the facilities were expected to produce sufficient data to reach at minimum EBP recommendations³⁵. An ERGA Sequencing Hub was also established at the University of Florence (Italy) Genomics Core to support the sequencing of the 99 libraries prepared by the ERGA Library Preparation Hubs (Supplementary Tables 3, 4). Upon sequencing data generation, both genomic and transcriptomic data were shared with the genome teams through the internal ERGA data repository.
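To make the recommended data volumes concrete, the following sketch converts coverage targets into approximate sequencing yields for a given genome size. The coverage values are those stated above; the function names, the assumption that the genome size refers to the haploid genome, and the interpretation of ‘per haplotype’ as a multiplication by ploidy are illustrative assumptions.

```python
# Recommended long-read coverage from the text (X = fold coverage).
LONG_READ_COVERAGE = {"HiFi": 30, "ONT": 60}
HIC_COVERAGE_PER_HAPLOTYPE = 25
ILLUMINA_COVERAGE_PER_HAPLOTYPE = 25  # only recommended alongside ONT data


def required_yield_gb(genome_size_gb: float, coverage: float, haplotypes: int = 1) -> float:
    """Approximate yield in Gb needed for a given fold coverage.

    Assumes genome_size_gb is the haploid genome size (an assumption of this sketch).
    """
    return genome_size_gb * coverage * haplotypes


def assembly_data_plan(genome_size_gb: float, platform: str, ploidy: int = 2) -> dict:
    """Return the approximate yield (Gb) per data type for one species."""
    plan = {platform: required_yield_gb(genome_size_gb, LONG_READ_COVERAGE[platform])}
    plan["Hi-C"] = required_yield_gb(genome_size_gb, HIC_COVERAGE_PER_HAPLOTYPE, ploidy)
    if platform == "ONT":
        # Short reads accompany ONT-based assemblies in the recommendation above.
        plan["Illumina"] = required_yield_gb(
            genome_size_gb, ILLUMINA_COVERAGE_PER_HAPLOTYPE, ploidy
        )
    return plan


hifi_plan = assembly_data_plan(1.0, "HiFi")  # diploid species, 1 Gb haploid genome
ont_plan = assembly_data_plan(0.5, "ONT")    # diploid species, 0.5 Gb haploid genome
```

Under these assumptions, a diploid species with a 1 Gb genome would need roughly 30 Gb of HiFi data and 50 Gb of Hi-C data; actual requirements depend on read length distribution, heterozygosity and facility-specific yields.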

Fig. 3 — a Total data production progress across all 98 species included, noting that data were not planned/required for 12 species for proximity ligation and for 15 species for annotation. b Distribution of species with genome assemblies available; both draft and curated assemblies are shown. The data-type distribution for these species is also supplied. See Supplementary Fig. 3 for the complete species tree.

Genome assembly and annotation

A requirement for becoming an ERGA reference genome was that the genome assembly reached, at minimum, the EBP standard for assembly quality³⁵. To ensure the infrastructure supported the production of genomic references to this standard, we developed assembly guidelines with workflows tailored for both ONT and HiFi-based genome assemblies³⁶. The use of these workflows was not mandatory, and any assembly workflow would be accepted if the resulting assembly met the appropriate assembly quality³⁵. To streamline the assessment and validate all ERGA genomic references, we established a stepwise procedure of 1) QC metrics assessment, 2) internal peer review, and 3) manual curation. On completion of a draft assembly, each genome team reported a set of standard QC metrics³⁷ that include a contaminant assessment, K-mer metrics, Hi-C map and graph production, gene prediction analyses, and a set of summary statistics. After this, the assembly and the associated metrics underwent an internal round of peer review from assembly experts (ERGA Sequencing and Assembly Committee). After feedback integration, each genome team uploaded the pre-curation assembly to the internal ERGA data repository along with details of the assembly construction (https://gitlab.com/wtsi-grit/documentation/-/blob/main/yaml_format.md) and each team was provided with the opportunity to submit their reference genome to an internal panel of expert curators who conducted a final manual curation³⁸.

Due to the decentralised nature of the infrastructure, all 98 species progressed through the steps at different rates, depending on the number and complexity of permits (Supplementary Note 1), difficulty of sample collection (Supplementary Note 2), need for sample specific protocol development (Supplementary Note 3), partnering sequencing facility capacity, and assembly complexity. Figure 3a highlights the current status of each species that has an assembly generated and shows that 13 complete and curated reference genomes have been generated (11 of which can be found in the INSDC), a further 17 are complete but require curation, and 8 are in non-final draft stage.

From the 30 reference genome assemblies with a ‘Curated’ or ‘Pre-curation’ status, we found 14 cases where the assemblies do not meet the EBP 6.C.Q40 quality standard (see Supplementary glossary and Fig. 4a). For instance, Argentina silus (fArgSil1) and Knipowitschia panizzae (fKniPan1) have scaffold N50 values that meet the minimum requirement, indicating successful Hi-C scaffolding; however, both fall short in terms of contig contiguity (N50 < 1 Mbp). In addition, these two pre-curation assemblies contained many small scaffolds, which increased the total scaffold count and translated to higher Scaffold L95 values. Notably, Phaeosaccion multiseriatum (uoPhaMult1) meets the contig N50 requirement but does not meet the scaffold N50 requirement (N50 > 10 Mbp). Spongipellis delectans (gfSpoDele1) and Phakellia ventilabrum (odPhaVent1) reached chromosome-scale scaffold N50 (6.C.Q40), but not the N50 threshold used as a proxy in Fig. 4a (6.7.Q40), a minimum criterion set for vertebrates that cannot be applied to taxa with chromosome-length N50 below 10 Mbp.
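For readers less familiar with the contiguity metrics used here, the sketch below computes N50 and L95 from a list of sequence lengths and compares them with the 1 Mbp contig N50 and 10 Mbp scaffold N50 thresholds mentioned above. The function names and the toy lengths are illustrative assumptions; this is not the QC code used by the genome teams.

```python
def nx_lx(lengths: list, percent: int) -> tuple:
    """Return (Nx, Lx) for a list of sequence lengths in bp.

    Nx is the length of the shortest sequence in the smallest set of longest
    sequences that together cover at least `percent` % of the assembly;
    Lx is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer arithmetic avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count
    raise ValueError("lengths must contain at least one sequence")


def meets_contiguity_thresholds(contig_lengths: list, scaffold_lengths: list) -> dict:
    """Check the 1 Mbp contig N50 and 10 Mbp scaffold N50 thresholds."""
    contig_n50, _ = nx_lx(contig_lengths, 50)
    scaffold_n50, _ = nx_lx(scaffold_lengths, 50)
    return {
        "contig_n50_ok": contig_n50 >= 1_000_000,
        "scaffold_n50_ok": scaffold_n50 >= 10_000_000,
    }


# Toy assembly (lengths in bp): well scaffolded but with fragmented contigs.
toy_scaffolds = [50_000_000, 30_000_000, 10_000_000, 5_000_000, 5_000_000]
toy_contigs = [800_000] * 125
scaffold_n50, scaffold_l50 = nx_lx(toy_scaffolds, 50)
scaffold_n95, scaffold_l95 = nx_lx(toy_scaffolds, 95)
status = meets_contiguity_thresholds(toy_contigs, toy_scaffolds)
```

The toy example mirrors the situation described for fArgSil1 and fKniPan1: the scaffold N50 threshold is met while the contig N50 falls below 1 Mbp. Adding many small scaffolds to such an assembly would leave N50 largely unchanged but increase L95.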

Fig. 4 — a Genome assemblies are represented according to their Scaffold N50 (y-axis, log₁₀) and number of the longest scaffolds that comprise at least 95% of the assembly (x-axis, log₂). Bubble size is proportional to assembly span. Empty bubbles depict HiFi-based genomes, while full bubbles are ONT-based. Colours are according to assembly status (Curated, Pre-curation, Non-final draft). Higher Scaffold N50 values (y-axis) and lower scaffold counts (x-axis) indicate better assembly contiguity. Assemblies not reaching the EBP-recommended One Megabase Contig N50 (log₁₀1,000,000 = 6) or 10 Megabase Scaffold N50 (log₁₀10,000,000 = 7), used here as a proxy for chromosome-level scaffolds, are labelled with their ToLIDs* (https://id.tol.sanger.ac.uk/). b Completed HiFi- and ONT-based genome assemblies are represented according to their Quality value (QV, y-axis) and number of gaps per Gbp (log₁₀, x-axis). The bubble size is proportional to assembly size. Colour grade of the bubbles is according to the K-mer completeness score. ToLIDs are reported for the assemblies that are below the recommended EBP metric for QV (40), Gaps/Gbp (log₁₀1000 = 3) or K-mer completeness (90%). Quality values are calculated differently for HiFi-based assemblies than for ONT-based assemblies and should not be compared directly. c BUSCO completeness scores for genome assemblies with ‘Curated’ and ‘Pre-curation’ status. Using two ortholog databases, one for a more recent last common ancestor encompassing related species (blue), and one for all eukaryotes (grey), we seek a more comprehensive estimation of the assembly completeness. The number of single-copy orthologs present in each database is reported. *Briefly, a ToLID is a unique identifier for an individual organism within a species sampled for genome sequencing, consisting of one or two lowercase letters for high-level taxonomic rank and clade, respectively, followed by three letters for genus and species each.
Thus, within insects (i), the Hymenoptera (y) includes Andrena humilis (iyAndHumi1) and Osmia cornuta (iyOsmCorn1). The Coleoptera (c) contains Carabus granulatus (icCarGran1), C. intricatus (icCarIntr1), and Leptodirus hochenwarti (icLepHoch2). Ephemeroptera (e) features Epeorus assimilis (ieEpeAssi1), and Strepsiptera (v) includes Stylops ater (ivStyAter1). Lepidoptera (l) includes Coenonympha glycerion (ilCoeGlyc1), Helleia helle (ilHelHell1), and Parnassius mnemosyne (ilParMnem1). Within the fungi (g), Agaricomycetes (f) are represented by Spongipellis delectans (gfSpoDele1). For sponges (o), Demospongiae (d) includes Phakellia ventilabrum (odPhaVent1), and among algae (u), Heterokontophyta (o) are represented by Phaeosaccion multiseriatum (uoPhaMult1). The fishes (f) include Alburnus alburnus (fAlbAlb2), Ammodytes marinus (fAmmMar1), Anaecypris hispanica (fAnaHis1), Argentina silus (fArgSil1), Knipowitschia panizzae (fKniPan1), Perca sp.‘yellow fin Alpine’ (fPerYfa1), Salvelinus alpinus (fSalAlp1), Silurus aristotelis (fSilAri1), Solea solea (fSolSol8), Tripterygion tripteronotum (fTriTrp1), and Zingel asper (fZinAsp1). Birds (b) are represented by Haliaeetus albicilla (bHalAlb1), Oenanthe leucura (bOenLec1), and Tetrao urogallus (bTetUro2). Mammals (m) include Canis aureus (mCanAur2), Chionomys nivalis (mChiNiv1), Lepus granatensis (mLepGra1), Lepus europaeus (mLepEur2), and Mustela lutreola (mMusLut1). Among reptiles (r) is Vipera ursinii (rVipUrs1). Within dicotyledons (d), the Ericales (d) include Hottonia palustris (ddHotPalu1), and Rosales and Fabales (r) feature Prunus brigantina (drPruBrig1) and Trifolium dubium (drTriDubi1), respectively. Finally, among ‘other chordates’ (k), Ascidiacea (a) includes Botryllus schlosseri (kaBotSchl2), while in the category ‘other animal phyla’ (t), Nematomorpha (f) is exemplified by Gordionus montsenyensis (tfGorSpeb1).
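The ToLID structure described in this caption can be illustrated with a small parser. The regular expression below is inferred solely from the identifiers listed in the caption (a one- or two-letter lowercase prefix, a three-letter genus abbreviation, a three- or four-letter species abbreviation, and an individual number); it is an assumption for illustration and not the official ToLID specification.

```python
import re
from typing import NamedTuple


class ToLID(NamedTuple):
    prefix: str      # lowercase letter(s) for high-level rank and clade
    genus: str       # capitalised genus abbreviation
    species: str     # capitalised species abbreviation
    individual: int  # running number of the sampled individual


# Pattern inferred from the caption's examples only (assumption).
_TOLID_PATTERN = re.compile(r"^([a-z]{1,2})([A-Z][a-z]{2})([A-Z][a-z]{2,3})(\d+)$")


def parse_tolid(tolid: str) -> ToLID:
    """Split a ToLID string into its components, or raise ValueError."""
    match = _TOLID_PATTERN.match(tolid)
    if match is None:
        raise ValueError(f"not a recognised ToLID: {tolid!r}")
    prefix, genus, species, number = match.groups()
    return ToLID(prefix, genus, species, int(number))


fish = parse_tolid("fArgSil1")    # Argentina silus
bee = parse_tolid("iyAndHumi1")   # Andrena humilis
```

Because the genus and species abbreviations each start with an uppercase letter, the components can be separated unambiguously even though the prefix and species parts vary in length.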

We found differences between HiFi- and ONT-based assemblies in the K-mer-based analyses; for example, the average quality value (QV) for HiFi-based assemblies was 61, while for ONT-based assemblies it was 38. Among the ONT-based assemblies, five species showed values below the recommended 40, which corresponds to an error rate > 0.01% (Fig. 4b). It should be noted that in the case of ONT-based assemblies K-mers were derived from orthogonal Illumina reads from the same individual, whereas in the case of HiFi assemblies the K-mers were derived from the same data used to generate the genome assembly, likely inflating QV estimation due to data interdependence. Further research is warranted on how to mitigate this issue. Recent unpublished results from within ERGA suggest that assembly of newer ONT data (Kit14, Q20+) consistently generates assemblies with QV > 40, perhaps side-stepping this issue. Eleven species showed K-mer completeness below 90%, with four being below 80% and one of these below 70%. Out of these, six belonged to ONT-based assemblies while eight had curated status (Fig. 4b). A caveat to K-mer completeness is that pseudohaploid assemblies (the typical output of ONT-based assemblies) of heterozygous genomes tend to have lower K-mer completeness. This highlights the need for continued development of diploid assembly strategies to ensure high K-mer completeness.
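The relationship between QV and error rate referred to above follows the standard Phred scale, QV = −10 · log₁₀(error rate). The short sketch below applies this conversion to the values mentioned in the text; the function names are illustrative.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value into a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability into a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


threshold_error = qv_to_error_rate(40)  # recommended minimum QV
ont_mean_error = qv_to_error_rate(38)   # average QV reported for ONT-based assemblies
hifi_mean_error = qv_to_error_rate(61)  # average QV reported for HiFi-based assemblies
```

A QV of 40 corresponds to one error per 10,000 bases (0.01%), a QV of 38 to roughly 0.016%, and a QV of 61 to roughly 8 × 10⁻⁷, keeping in mind that the HiFi and ONT values were estimated from different K-mer sources and are not directly comparable.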

Five genomes exceeded the recommended metric Gaps/Gbp²⁶ as they all had >1000 remaining (Argentina silus (fArgSil1), Knipowitschia panizzae (fKniPan1), Ammodytes marinus (fAmmMar1), Salvelinus alpinus (fSalAlp1) and Vipera ursinii rakosiensis (rVipUrs1)). Despite this, for all the completed assemblies, Ns accounted for less than 0.05% of the genome, with the exception of Mustela lutreola (mMusLut1). For this genome assembly, which has yet to undergo final curation (the only large ONT-based assembly evaluated >2 Gbp), 0.55% of its sequence was composed of Ns (Fig. 4b).

Besides EBP metrics, when estimating completeness using single-copy orthologs, the Phakellia ventilabrum (odPhaVent1) and Gordionus montsenyensis (tfGorSpeb1) assemblies had lower values than recommended. tfGorSpeb1 is one of the first of its phylum to be sequenced³⁹, and the phylum is therefore underrepresented in the BUSCO database (Fig. 4c)⁴⁰. Two species, Trifolium dubium (drTriDubi1) and Salvelinus alpinus (fSalAlp1), have higher ploidy levels (tetraploid and partial tetraploid, respectively) and had much higher BUSCO duplicate values than the recommended 5% (Supplementary Table 1).

For the pilot test, the sample collection process for the included species was ideally conducted to facilitate simultaneous genomic and transcriptomic data production. We designed the infrastructure so that, after data deposition to the ERGA data repository, each genome team had the flexibility to decide whether the annotation would be conducted i) by the genome team or sequencing facility, ii) with supporting expertise from the internal ERGA community, or iii) after the assembly and annotation data were uploaded to ENA, where a gold standard annotation would be generated by Ensembl⁴¹. Although annotation was not mandatory, we produced sequencing data to support annotation for 81 species (66 with RNA-Seq data and 15 with IsoSeq data). Of the species with IsoSeq data generated, 13 also obtained RNA-Seq data. For the 30 genome teams spanning 16 countries that lacked the resources necessary to generate annotation data, we ensured that samples were shipped to a dedicated ERGA Library Preparation Hub. Here, 76 libraries were prepared and shipped to the ERGA Sequencing Hub for data production. In some groups annotation is still underway, but seven genome teams reported that they have a finalised annotation.

Data analysis

Reference genomes can support many downstream analyses, including population genomics, phylogenomics, functional genomics and comparative genomics⁹. Following the assembly and annotation of the newly built reference genomes, we offered assistance through the ERGA Data Analysis Committee to genome teams by suggesting and supporting avenues of downstream data analyses that could be followed to answer their biological questions of interest. In addition, we connected genome teams with relevant ERGA members who may be able to assist or mentor downstream biological exploration, sparking new collaborations and working groups. As many of the 98 species participating had not yet reached the point of data analysis, we conducted a brief survey to better understand what downstream analysis was planned across the participating genome teams (Supplementary Fig. 5). For 59.8% of genome teams, the downstream analyses planned would not have been possible without the reference genome, and 70.7% reported that their planned analyses will be significantly improved by the availability of the reference genome, reinforcing that the biodiversity genomics community is in great need of genomic resources of this kind and quality. Results across the genome teams indicate that the most common type of downstream genomic analysis planned was population genomics (37.7%), for assessments of population history, structure and status of endangered and endemic species (e.g. demography, inbreeding, hybridization, and association with morphological or environmental factors). Comparative genomics was also a common analysis type (27%) among genome teams seeking to examine relevant evolutionary processes across species (e.g. trait-associated gene family evolution analysis, repeat content evolution, synteny, inversions, tRNA evolution). Overall, the results of this survey show that the availability of reference genomes is considered a key tool for downstream applications.

Upload to public archives

To follow the principles of Open Access to Scientific Publications and Research Data Guidelines of the European Research Council under Horizon 2020, ERGA adopted the data policy of ‘as open as possible but as closed as necessary’. To support this policy, we developed an ERGA Pilot Project Data Sharing and Management Policy²¹ specifically seeking to balance data openness with respecting the needs of diverse ERGA genome teams. The policy itself codified that all reference genome, annotation and raw sequence data were expected to be uploaded upon generation to the internal ERGA data repository, ensuring their immediate accessibility to the ERGA community. The policy also grants each genome team the ability to place an embargo on public upload of ERGA data into the public archives until the first publication but no longer than two years after data release. The policy also lays out provisions for fair and rightful attribution in all associated publications.
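The embargo rule described above (until first publication, but no longer than two years after data release) can be expressed as a small date calculation. This is a minimal sketch under stated assumptions: the function name is hypothetical, "two years" is read as the same calendar day two years later, and the handling of a 29 February release date is a choice made here rather than something the policy specifies.

```python
from datetime import date
from typing import Optional


def embargo_end(release: date, first_publication: Optional[date] = None) -> date:
    """Latest date on which data may remain embargoed from public archives."""
    try:
        cap = release.replace(year=release.year + 2)
    except ValueError:
        # Release on 29 Feb and target year is not a leap year (assumption:
        # fall back to 28 Feb).
        cap = release.replace(year=release.year + 2, day=28)
    if first_publication is None:
        return cap
    # The embargo lifts at whichever comes first: publication or the cap.
    return min(first_publication, cap)


# Hypothetical dates for illustration.
no_paper_yet = embargo_end(date(2022, 3, 1))
early_paper = embargo_end(date(2022, 3, 1), date(2023, 6, 15))
late_paper = embargo_end(date(2022, 3, 1), date(2025, 1, 1))
```

In this reading a publication that appears after the two-year cap does not extend the embargo, which matches the "no longer than two years" wording.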

Decentralisation challenges

From the outset of the pilot test, we realised that the decentralised infrastructure built would have huge implications on who was included, had access to, and benefited from the production of genomic resources into the future. Collecting, identifying, storing, and cold-chain shipping of specimens as well as producing, analysing, and storing sequencing data is expensive, requiring ex-situ long-term storage facilities, sequencing equipment, laboratory access, a skilled workforce, and significant computational resources. The resources to create genomic resources are neither evenly distributed across the globe, nor across Europe. A key goal of the pilot test was to identify how the existing inequitable structures and systems would manifest whilst building a distributed genomic infrastructure. Intertwining and embedding justice, equity, diversity and inclusion into the scientific mission was considered essential if a decentralised, accessible, and scalable infrastructure was to be achieved that truly supported the production of complete reference genomics resources for all species, and was accessible to all researchers. Overall, the main objectives we set out for the decentralised infrastructure were achieved as it: i) supported the ethical and legal production of high quality genomic resources; ii) created a network of the researchers and institutions engaged in the field of biodiversity genomics; iii) leveraged the network’s existing institutional capacities and capabilities; and iv) harnessed the diverse expertise of the ERGA memberbase and streamlined, as much as possible, equitable participation. However, the decentralised approach also revealed a number of challenges that need to be addressed by ERGA moving forward.

Technical

Phylogenetic representativeness and sampling bias

Bias was found in the representation of countries (Fig. 2), the distribution of species sampled per country (even when population size is considered; Supplementary Fig. 7) and species distribution across the phylogenetic tree. Generally, non-Widening countries were more strongly represented than Widening countries and certain branches of the tree of life were overrepresented (Mammalia, Aves, Actinopterygii and Magnoliopsida), whilst others (Insecta, Amphibia, Mollusca, Annelida, Fungi and most protist groups) were underrepresented. Feasibility was another obstacle. First, the production of long-read and long-range sequencing data from a species sample requires a significant amount of HMW DNA per 1 Gb of genome size, and so small-sized species or species with very large genomes remain an unsolved challenge (Supplementary Note 2). Second, for some taxa and species, co-purification of secondary compounds interfered with the sequencing chemistry. Finally, ideal tissue preservation was not always possible due to sampling at remote destinations or from scientific collections where samples were preserved a long time ago (Supplementary Note 6).

Moving forward, a more robust species prioritisation process could ensure that all species are assessed using clearly specified criteria with a scoring system that is responsive to the needs of both equity-deserving countries (see Supplementary glossary) and underrepresented taxa. For example, species from higher taxonomic groups without reference genomes could be prioritised over those from more resource-abundant groups, or Widening countries could be prioritised over non-Widening countries. A more robust species prioritisation process could also facilitate knowledge transfer and serve as a seed for national investments in biodiversity genomics. Tackling these challenges will require a greater investment in research and development as well as highly skilled personnel. Additionally, researchers may need incentives to prioritise the interest of species or taxa that remain underrepresented in public databases.
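To make the idea of a scoring system concrete, the sketch below ranks candidate species using the two example criteria mentioned above. It is purely illustrative: the class, field names, weights and placeholder species are hypothetical, and ERGA's actual prioritisation criteria are not specified in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    species: str
    taxon_has_reference: bool  # higher taxonomic group already has a reference genome
    widening_country: bool     # nominated from a Widening country


def priority_score(c: Candidate, w_taxon: float = 2.0, w_widening: float = 1.0) -> float:
    """Additive score; the weights are arbitrary placeholders, not ERGA values."""
    score = 0.0
    if not c.taxon_has_reference:
        score += w_taxon
    if c.widening_country:
        score += w_widening
    return score


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Highest score first; ties broken alphabetically for reproducibility."""
    return sorted(candidates, key=lambda c: (-priority_score(c), c.species))


# Placeholder candidates, not species from the pilot.
pool = [
    Candidate("species A", taxon_has_reference=True, widening_country=False),
    Candidate("species B", taxon_has_reference=False, widening_country=True),
    Candidate("species C", taxon_has_reference=False, widening_country=False),
]
ordered = [c.species for c in rank(pool)]
```

An explicit, published weighting of this kind would let genome teams see in advance how equity and taxonomic representation enter the selection, although the choice of criteria and weights would itself need community agreement.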

Enhancing end-use through genome annotation

The first hurdle in annotation is the availability of sufficient evidence (transcriptomic and protein sequence data) from focal species, databases and predictive models of repeats. Secondly, even with appropriate data, the most accurate genome annotation pipelines require advanced skills to both install and run, which reduces their accessibility and ultimately their utility. Finally, robust annotation quality assessment tools are lacking, particularly for species with underrepresented genomic resources. For instance, gene content assessment tools such as BUSCO⁴² remain unable to account for species within taxonomic groups that have incomplete gene sets available, leading to unreliable quality assessments.

Obtaining, and equitably distributing, financial resources will be required to equip researchers, labs, and regions for annotation in a manner that responds to their varying resource realities. Additionally, the development of more easily installable and reproducible pipelines is needed, and thankfully some new tools are now emerging with this in mind⁴³. Standardised and streamlined annotation pipelines are needed for consistency, which is crucial for many analyses such as comparative genomics because it facilitates more confident comparisons. Finally, sequencing more underrepresented genomes will help improve quality assessment tools. Filling in phylogenetic gaps will provide more opportunities for comparisons among taxa and for developing better models for gene prediction. Despite these challenges, it is important that genomes are annotated. Many downstream analyses are based solely on the predicted genes from the annotation, and incomplete or incorrect results will negatively impact studies of both short-term and broad evolutionary processes.

Decentralising reference production and reproducibility

During the pilot test the reference genomics resources were produced across diverse and transdisciplinary research groups, institutes and countries. This diversity resulted in variances in accessibility, capacity and capability in sequencing technologies, computation, and software but also across different taxa. The overrepresentation of pilot sequencing facility partners located in Western Europe compared to Eastern Europe demonstrates such disparity. Furthermore, the data agnostic approach taken led to challenges in standardising assembly, annotation and curation protocols, workflows and procedures across the project. For instance, a blanket adoption of the VGP pipeline for diploid genomes based on PacBio HiFi and Hi-C sequencing (https://gxy.io/GTN:T00039; https://workflowhub.eu/workflows/325?version=1) was not appropriate as this approach would not cater for polyploid genomes nor those assemblies produced that were ONT-based. A further challenge was the provision of a centralised system for the storage and transfer of raw and final genomic and transcriptomic data. This was particularly challenging in cases where data production spanned two or more locations (e.g. PacBio sequenced at one site, Hi-C at a second, and RNA at a third) and was subsequently assembled at another site. While the Nextcloud instance created by BSC was an elegant solution for transferring vast quantities of data between parties, it required a vast amount of personnel hours to manage, in addition to its baseline system-wide maintenance requirements.

Moving forward, a key goal for ERGA is the production of standardised and reusable pipelines that are: responsive to all sequencing ‘recipes’ (PacBio, ONT, or other future technologies); written for Galaxy, Snakemake, and/or Nextflow workflow managers; made publicly available (https://github.com/ERGA-consortium/pipelines); and actively maintained by the ERGA community with regularly scheduled and versioned updates. It would also be beneficial to diversify the availability of sequencing instruments to allow for more instances where sequencing and assembly can be carried out at the same location, reducing the need for transferring files that can reach up to 1 TB in size.

Ethical and legal

ERGA is an international initiative and so safeguarding production of only ethical and legal reference genomes was a complex endeavour. Decentralisation of the infrastructure resulted in many species samples being transported across national and regional jurisdictions as well as in and out of the European Union, creating an ethical and legal compliance challenge. Additionally, depending on the species in question, the legal landscape may differ drastically, e.g. CBD⁴⁴, CITES⁴⁵, ITPGRFA⁴⁶, UNCLOS⁴⁷, etc. Understanding legislation can be complex and difficult, especially for researchers who do not have formal legal training, usually lack legal support within their institution, and often do not have the time or resources to acquire either. This created uncertainty amongst many researchers, especially those navigating this for the first time (Supplementary Note 1). To add to this uncertainty, the pilot test coincided with international discussions on the fair and equitable sharing of benefits from the access and use of digital sequence information (i.e. genomic sequences) under the Nagoya Protocol, which further complicated the legal compliance landscape⁴⁸. Additionally, although researchers were supplied with documentation and infrastructural support to aid ethical and legal compliance, the pilot test had no means to monitor compliance.

Moving forward, addressing the ethical, legal and social implications of ERGA will require professionalisation through a dedicated funding stream. Funded positions will attract trained personnel with the experience needed to navigate complex permitting issues and compliance monitoring. Additionally, a greater effort needs to be made to train ERGA members on the importance of ethical and legal compliance in biodiversity genomics research.

Social justice

Building a more socially just infrastructure

Building a truly inclusive, diverse and equitable infrastructure for biodiversity genomics faces structural constraints. They are mainly twofold: first, lack of equity for and inclusion of minorities in science within the countries of Europe^18,49; second, extreme economic and political inequity between Europe and countries in the Global South⁵⁰. For the pilot test, no data was collected on race, ethnicity, religion, sexual orientation, disability, career level or the intersections of these with gender and with each other. This data deficiency made it impossible to critically evaluate the consortium in terms of inclusiveness. Some preliminary data generated in regard to sex suggested that allowing genome teams to form organically resulted in sex imbalances. Hence, there is a high likelihood that this also resulted in an underrepresentation of many other minoritised groups¹⁸, a known trend across European science¹⁸. The second constraint arises from a pressure to confine a biodiversity genomics consortium to the political boundary of Europe and the nation-states within. Europe, and the nations within it, are not naturally occurring units of biodiversity. In fact, Europe is part of a much wider biogeographical realm (the Palaearctic) that includes large parts of Africa and Asia⁵¹ (https://www.britannica.com/science/biogeographic-region).

As ERGA progresses, the consortium should prioritise the collection of applicable demographic data. Moreover, outreach activities should be conducted to explicitly recruit researchers from sectors of the population that are underrepresented in science. To really address the biodiversity crisis in a meaningful way, it will be important for ERGA to expand its reach globally. After all, most biodiversity by far resides not in Europe but in the Global South. Much commitment and ingenuity will be required to overcome the effects on biodiversity genomics of the equity gap that separates Europe as a block from many countries in the Global South. It will be a challenge to overcome the boundaries and constraints often dictated by scientific funding, but it is a challenge that must be overcome on the road towards a sustainable future. Through harnessing the power of its positioning in the EBP, ERGA should make efforts to become more integrated with other ongoing and related initiatives in neighbouring regions, e.g. Africa BioGenome Project⁵².

Prioritising engagement and outreach

Effective engagement is commonly seen as a constraint rather than an opportunity due to resource and time limitations, and a lack of training and awareness. Although a virtual workshop was provided during the pilot test to train researchers on 1) the significance of interested party engagement and 2) the skills to identify, map, and comprehend the needs of potential interested parties, it remained a challenge to transition researcher focus from reference genomes to the practical applications of genomics more broadly. Additionally, although the infrastructure was designed to recognise and include the rights and interests of Indigenous Peoples and Local Communities (TK and BC Labels and Notices, supporting guidelines for researcher implementation, and an ‘Open to Collaborate’ Notice on the ERGA website), researchers require more training on why and how to proactively engage and establish sustainable partnerships with Indigenous Peoples and Local Communities.

Overall, more training is needed for interested party identification, mapping, tailored engagement (varying interests and cultural perspectives), and communication. To address this, a comprehensive framework that encompasses targeted communication strategies, tailored dissemination channels, and proactive exploitation of research findings would be useful. This plan, if developed, could ensure that all interested parties receive timely and relevant information, fostering broader awareness, understanding, and utilisation of the results generated by biodiversity genomics research. Supporting ERGA members in this way could empower researchers to get more involved at the interface between biodiversity genomics research and biodiversity policy (Supplementary Note 4).

Scaling training and knowledge transfer

Financial resources are not equally distributed among countries, institutions or researchers, leading to limited access to crucial state-of-the-art training, resulting in significant disparities in terms of the expertise required to access and utilise these resources. Given the economic privilege that even the least wealthy EU countries have when compared to countries in the Global South, it is clear that access to funding for mobility is a huge barrier globally. Throughout the pilot test, several trainings were held and guidelines developed to enhance the user-friendliness of the infrastructure as well as to streamline its use; however, there was no clear long-term strategy for training and knowledge transfer.

To develop a genomics curriculum that is responsive to the needs of researchers and trainees, and promote the long-term building of capacity within these countries, an investment into a long-term strategy will be required. For instance, a publicly available knowledge transfer platform could be created to provide ERGA members with resources and training relating to each step of reference genome production, but could also provide links to complementary initiative resources e.g. EBP, Elixir (https://elixir-europe.org/), Galaxy (https://usegalaxy.org/), DSI Network (https://www.dsiscientificnetwork.org/), gBIKE (https://g-bikegenetics.eu/en), CETAF (https://cetaf.org/), etc. Such a platform could also provide a space for the sharing of relevant biodiversity genomics educational materials that could further aid collaborations between researchers who are shaping the future of biodiversity genomics curricula development globally.

Future directions

The decentralised approach taken by ERGA through the pilot test illustrates the huge potential of the consortium to become a model for equitable and inclusive biodiversity genomics in the future. The power of such an approach was evident through the momentum it built across its participants. Not only did the pilot test successfully unite an international community of biodiversity researchers, but it also stimulated communities of researchers within the same country to combine and consolidate efforts under the ERGA umbrella e.g. DeERGA and Portugal BioGenome⁵³. Additionally, it allowed participating researchers to apply the lessons learned from the test to build localised infrastructures that would remain interoperable with partners across Europe, e.g. ATLASea^40,54–59.

A key aim for testing the approach was making visible the challenges and issues that would manifest whilst working at an international level and at scale, and working to improve and build upon these learnings as the consortium moves forward. Some key challenges highlighted by the pilot test concerned: species selection processes (criteria, prioritisation) and sampling procedures (permitting, collection, preservation, metadata); modes of engagement across interested parties (citizen scientists, policymakers, Indigenous Peoples, Local Communities, etc); the diversity and inclusion of the researchers participating; defining the scope of ERGA and how that aligns with global efforts, particularly those containing the majority of the planet's remaining biodiversity; disparities in resources and capacity (personnel, financial, and infrastructural); balancing decentralisation and innovation with standardisation, reproducibility and consistency; a need for more long-term and consistent training opportunities; and disproportionate interest, protocols, research and investment in species that are underrepresented in public data repositories.

As ERGA progresses, now with a dedicated funding stream through Biodiversity Genomics Europe, it can build upon and learn from the pilot test and make the intentional investments needed to address at least some of these challenges. Although a centralised source of funding to support these endeavours is overall a positive, it will also present many challenges concerning diversity and equity. However, efforts are underway to safeguard at least some level of the decentralised process, e.g. community sampling and hotspot sequencing.

Supplementary information

PilotAbstracts (35.9 KB, docx)

SupplementaryInformation (1.6 MB, docx)

Acknowledgements

This paper was subject to open peer review by PCI Genomics (see https://genomics.peercommunityin.org/articles/rec?id=298), and we would like to acknowledge the PCI reviewers Dr. Justin Ideozu and Dr. Eric Crandall for the time they took to review this manuscript and provide such wonderful feedback. ERGA Infrastructure: We acknowledge access to the storage resources at Barcelona Supercomputing Center. We would like to thank Alisha Ahamed, Josephine Burgin, Joana Paupério, Jeena Rajan and Guy Cochrane from the European Nucleotide Archive (ENA) for their support regarding data coordination and submission. ERGA Hubs: We thank the Antwerp University Hospital Center of Medical Genetics and Jarl Bastianen for access to sequencing library quality control equipment. Commercial Partners: We would like to acknowledge and thank all supplier partners that have kindly donated kits and reagents to the ERGA pilot Library Preparation Hubs to support the generation of high-quality genomes and annotations for species without funding. This support has been key to embedding a culture of diversity, equity, inclusion, and justice in the Pilot Project. Specifically we want to thank Dovetail Genomics, Part of Cantata Bio LLC, especially Mark Daly, Thomas Swale and Lily Shuie; Arima Genomics; PacBio; Integrated DNA Technologies (IDT); MagBio Genomics Europe GmbH; Zymo Research; Agilent Technologies; Fisher Scientific Spain; Illumina Inc.
ERGA sequencing partners: We would like to thank all the sequencing facilities involved in the project: i) the Integrative Omics platform of the Italian Node of ELIXIR, the European Research Infrastructure for Life Science data, Giovanna Longo for support in library preparation and Apollonia Tullo for the management of the Integrative Omics platform (Italy); ii) the France Génomique network; iii) the DFG-funded NGS Competence Center Tübingen; iv) SciLifeLab Genomics/National Genomics Infrastructure/Uppsala Genome Center and UPPMAX for aiding in High Molecular Weight DNA/RNA extraction, massive parallel sequencing and computational infrastructure; v) we would like to thank all the teams at Wellcome Sanger Institute Tree of Life programme, namely the ToL Samples management team, the ToL Core Laboratory, the support of Sanger Scientific Operations, especially the Long Read teams, the Tree of Life Assembly team and the Genome Reference Informatics Team, and the Tree of Life ERGA-Pilot project management team, especially Theodora Anderson; vi) The Earlham Institute by members of the Genomics Pipelines and Core Bioinformatics Groups. This work was also supported by the Scientific Computing group, as well as support for the physical HPC infrastructure and data center delivered via the NBI Research Computing group; vii) the Functional Genomics Center Zurich (FGCZ) for library preparation and sequencing; viii) the NGSP team at the University of Bern, from DNA and RNA extraction to data generation on both long read and short-read platforms; ix) The Lausanne University Genomic Technologies Facility (GTF, Switzerland, https://wp.unil.ch/gtf/) for library preparation and sequencing; x) the DRESDEN concept genome Center, part of the MPI-CBG and the technology platform of the CMCB at the TU Dresden; xi) the DFG Research Infrastructure West German Genome Center. NGS analyses were carried out at the production sites Cologne, Bonn and Düsseldorf; xii) We would like to acknowledge the material support through the Oxford Nanopore Technologies ORG.one project in the execution of sequencing the Acipenser sturio genome at the University of Wageningen (Netherlands); xiii) the Centro Nacional de Análisis Genómico CNAG; the Catalan Initiative of the Catalan Biogenome Project; xiv) Computational resources were provided by the HPC core facility CalcUA of the University of Antwerp and VSC (Flemish Supercomputer Center), and the Biomina network. ERGA Training and Knowledge Transfer partners: The 2022 EMBO Practical Course 'Hands-on course in genome sequencing, assembly and downstream analyses' at the Université libre de Bruxelles. Co-authors: L.R would like to thank the Naturhistorisches Museum Bern; D.D.P is funded by Biodiversity Genomics Europe (BGE); The tissue sample for the European mink has been taken from a male mink in Dordogne (France) in 2006 by Pascal Fournier and Christine Fournier-Chambrillon (GREGE, sample collectors), in the framework of the first National Action Plan for the conservation of the European mink, and sent to a collection of cryopreserved cells (managed by Vitaly Volobouev) at MNHN (National Museum of Natural History). This cryopreserved sample was used by Bertrand Bed'Hom (sample ambassador) to generate fresh cell cultures for the production of high-quality DNA by Genoscope, and the construction of a reference genome. The French institutional partners for the European mink conservation programme in France are DREAL (Direction régionale de l’environnement, de l’aménagement et du logement Nouvelle-Aquitaine) and OFB (Office Français de la Biodiversité).
ERGA Community: We would like to thank: Svein-Ole Mikalsen and Sunnvør í Kongsstovu (University of Faroe Islands) for providing Argentina silus and Ammodytes marinus to the pilot project as part of the project Genome Atlas of Faroese Ecology; Michel Sartori (Musée cantonal des sciences naturelles, Lausanne); Karim Gharbi, Suzanne Henderson, Kendall Baker, Tom Barker, Naomi Irish, Jamie McGowan, Will Nash, James Lipscombe, Angela Man, Alex Durrant, Mariano Olivera, Chris Watkins, Jonathan Wright, David Swarbreck, Neil Shearer, Sacha Lucchini, Thomas Brabbs, Vanda Knitlhoffer, Leah Catchpole, Fiona Fraser, Seanna McTaggart (Earlham Institute, Norwich, UK), W.N extracted high molecular weight DNA from Xylocopa violacea individuals, conducted assembly of the resulting PacBio HiFi data, coordinated generation of Hi-C data for this species, and manually curated the scaffolded assembly. LIMS support using SapioLIMS; Lada Jovović (Ruđer Bošković Institute, Croatia); Federica Montesanto and Francesco Mastrototaro (Università degli Studi di Bari 'A. Moro' Dipartimento di Bioscienze, Biotecnologie e Ambiente), for contributing to the design of the Pilot project on two Botryllus species; Jana Bedek (Ruđer Bošković Institute, Croatia); João Jacinto, Helena Trindade, Manuela Sim-Sim (cE3c—Centre for Ecology, Evolution and Environmental Changes, Faculdade de Ciências da Universidade de Lisboa & CHANGE—Global Change and Sustainability Institute, Portugal), M.S.S is Sample Ambassador for Corema album; Christian de Guttry, Julien Marquis (University of Lausanne, Switzerland); Native Flora Centre of Lombardy (Centro Flora Autoctona della Lombardia, CFA), c/o Parco Monte Barro, Galbiate (LC), Italy; Reichlin Pascal; Yannick Chittaro (info fauna); Ferran Palero (Catalan Biogenome Project); Sara Vicente and Cristina Máguas (Centre for Ecology, Evolution and Environmental Changes (cE3c) & CHANGE—Global Change and Sustainability Institute, Portugal; Escola Superior de Saúde Ribeiro Sanches (ERISA), IPLUSO—Instituto Politécnico da Lusofonia, Portugal); Maria Judite Alves (Museu Nacional de História Natural e da Ciência, Universidade de Lisboa); Christian Harrison, Dana R. MacGregor (Rothamsted Research) are working with the Earlham Institute to sequence, curate, and annotate an Alopecurus aequalis genome; Patrik Rödin Mörch (Department of Ecology and Genetics—Animal Ecology, Uppsala University); Björn Marcus von Reumont (Goethe University Frankfurt, Institute of Cell Biology and Neuroscience, Applied Bioinformatics Group); Veronique Decroocq (University of Bordeaux, INRAE), Provider of data and samples of Prunus brigantina, Managing sequencing effort on European Prunus species; Lucia Manni, Università di Padova, Dipartimento di Biologia, for contributing to the design of the Pilot project on two Botryllus species; Evan G. Williams (Luxembourg Centre for Systems Biomedicine, University of Luxembourg); Julia Pawłowska (Institute of Evolutionary Biology, Faculty of Biology, University of Warsaw, Poland); Craig R Primmer (University of Helsinki); Alicja Okrasińska (Institute of Evolutionary Biology, Faculty of Biology, University of Warsaw); Annamaria Giorgi (Centre of Applied Studies for the Sustainable Management and Protection of Mountain Areas-CRC Ge.S.Di.Mont, University of Milan, 25048 Edolo, Italy; Department of Agricultural and Environmental Sciences-Production, Landscape and Agroenergy-DiSAA, University of Milan, 20133 Milan, Italy); Simon Pierce (Department of Agricultural and Environmental Sciences, University of Milan); Shai Meiri (The Steinhardt Museum of Natural History, School of Zoology, Tel Aviv University); Virginia Vanni (Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, OX3 0BP, United Kingdom); M.M.R. was funded by FCT and AKDN through the project CVAgrobiodiversity/333111699 and (LEAF/ISA) UID/AGR/04129/2020; Akira Peters (Bezhin Rosko, Santec, France) isolated from nature and donated the strain of Phaeosaccion multiseriatum; We would finally like to thank Loriano Ballarin and Fabio Gasparini, Università di Padova, Dipartimento di Biologia, for contributing to the design of the Pilot project of two Botryllus species. We would also like to acknowledge and thank Shane Whelan for his efforts in abstract translation.

Author contributions

H.S., E.B.-A., K.N., S.D., S.O., N.C., and P.B. played a role in either conceiving or co-ordinating the Pilot only. A.E.R.S., S.W., P.N., T.M., Te.M., L.B., F.C., M.C., and C.G. contributed to the initial draft of the manuscript only. M.M., L.S., G.B.D., H.L., Z.M., L.L., M.L.G., S.C., J.P., M.G., M.N., M.L., B.K., J.P.M., B.F., G.Forn., N.P., J.B., F.J., I.R.A., J.G.-G. provided comments on the draft manuscript only. J.S.-O., C.V., K.L., C.C., L.B., G.M., B.V., T.G., E.P., M.Mon., S.V., L.G., F.Mo., M.P., G.M., F.Ma., B.A.G., A.R., N.B., R.N., L.P.S., J.Bu., J.K., A.S., C.N., N.E., A.I., M.-B., F.A.M.V., C.R.F., R.R.R., S.L.M., J.M.M., C.A., A.D.B., I.U., C.S.-S., R.T., V.C.S., R.M.C.R., S.D., M.M.R., G.R. and M.Mo. reviewed the draft manuscript only. K.E.H. designed figures only. G.V., J.R., M.H., E.M., M.B., O.M., J.G., J.P.P., J.V., M.F.L.D., S.G., S.M.V.B. and Z.O.J. contributed to the stakeholder engagement workshop only. M.A.D. drafted a supplementary figure/box and reviewed the draft manuscript. R.B., H.B., and A.C. both contributed to the stakeholder engagement workshop and reviewed the first draft of the manuscript only. H.K., L.O., F.C., and T.S.A. provided comments on the draft manuscript and contributed to the stakeholder engagement workshop. C.C., J.W., J.Pa., P.A.V.B., F.M., T.G., K.V.O., P.H.O. and M.Pa. provided comments on and reviewed the draft manuscript. A.N., M.B.-P., M.J.R.-L., and S.P. contributed to the initial draft of the manuscript and reviewed it afterward. B.I., C.B., B.N., J.-M.A., P.A.W. and P.V. conceived/coordinated the pilot and reviewed the draft manuscript. S.K., C.V.-C. and C.H. contributed to the initial draft and provided comments. M.St. conceived/coordinated the pilot project and contributed to the stakeholder engagement workshop. E.T. contributed to the initial draft of the manuscript and helped to draft a supplementary box/table.
R.F., C.T.-R., N.G., N.V., O.S., P.R., U.S., K.H., J.Pi., T.H.S., J.M.-F., F.M., P.W.H., A.B.D., and A.Bo. contributed to the initial draft of the manuscript, provided comments and reviewed the final draft. contributed to the initial draft and provided comments. L.S.M. provided comments on the draft, designed figures and contributed to the stakeholder engagement workshop. P.C.W. provided comments on the draft, reviewed the final draft, drafted a supplementary figure/box and contributed to the stakeholder engagement workshop. A.L. contributed to the initial draft, provided comments on the draft, drafted a supplementary figure/box and contributed to the stakeholder engagement workshop. J.M.F. contributed to the initial draft, provided comments, reviewed the final draft and drafted a supplementary figure/box. E.Bu. and A.M.P. contributed to the initial draft, provided comments on the draft, reviewed the draft, drafted a supplementary figure/box and contributed to the stakeholder engagement workshop. G.M.H. contributed to the initial draft, provided comments on the draft, reviewed the draft and helped to design figures. D.D.P. and H.G.L. contributed to the initial draft, provided comments on the draft, reviewed the draft, and drafted a supplementary figure/box. J.J.K. helped conceive/coordinate the pilot, drafted a supplementary figure/box and contributed to the stakeholder engagement workshop. P.G.D.F. and J.H. helped to coordinate/conceive the project, reviewed the draft and contributed to the stakeholder engagement workshop. J.D.C. helped to coordinate/conceive the project, provided comments on the draft and contributed to the stakeholder engagement workshop. C.J.M. conceived/coordinated the project, provided comments on the manuscript and drafted a supplementary figure/box. M.P.H. conceived/coordinated the project, provided comments on and reviewed the initial draft. A.V. conceived/coordinated the project, contributed to the initial draft and reviewed the final draft. B.H. conceived/coordinated the project, contributed to the initial draft and provided comments on the manuscript. L.R. conceived/coordinated the project, contributed to the initial draft, provided comments on and reviewed the final draft. R.M.W. conceived/coordinated the project, contributed to the initial draft, provided comments on and reviewed the final draft and drafted a supplementary figure/box. G.Form. and A.Mo. led the project's coordination, conceived/coordinated the project, contributed to the initial draft, provided comments on and reviewed the final draft, and contributed to the stakeholder engagement workshop. A.M.C. led the project's coordination, conceived/coordinated the project, contributed to the initial draft, provided comments on and reviewed the final draft, contributed to the stakeholder engagement workshop, drafted a supplementary figure/box and designed figures. All authors reviewed the manuscript prior to submission.

Data availability

Sequence data that support the findings of this study have been deposited in the European Nucleotide Archive under the primary accession code PRJEB47820. Sequence data are also stored in an ERGA-Pilot Nextcloud instance hosted by the Barcelona Supercomputing Center; to request access, please email the corresponding author. The Supplementary Materials and the scripts/code related to our manuscript are both available under the DOI 10.5281/zenodo.10789421.

Code availability

All code used to conduct analysis is openly available with unrestricted access through Zenodo (10.5281/zenodo.10789421).

Competing interests

Jean-François Flot, Rosa Fernández, Javier del Campo, Josefa González, Olga Vinnere Pettersson, Robert M Waterhouse, Patrick Wincker and Sylke Winkler are recommenders for PCI Genomics. The authors declare they have no further conflict of interest relating to the content of this article.

Footnotes

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Ann M. Mc Cartney, Giulio Formenti, Alice Mouton.

Change history

10/15/2024

A Correction to this paper has been published: 10.1038/s44185-024-00065-3

Supplementary information

The online version contains supplementary material available at 10.1038/s44185-024-00054-6.

References

1. UNEP. Facts about the nature crisis. UNEP - UN Environment Programme https://www.unep.org/facts-about-nature-crisis (2022).
2. Zhang, Y., Wang, Z., Lu, Y. & Zuo, L. Editorial: biodiversity, ecosystem functions and services: Interrelationship with environmental and human health. Front. Ecol. Evol. 10, 10.3389/fevo.2022.1086408 (2022).
3. Urban, L. et al. Real-time genomics for One Health. Mol. Syst. Biol. 19, e11686 (2023).
4. Kumar, S. et al. Changes in land use enhance the sensitivity of tropical ecosystems to fire-climate extremes. Sci. Rep. 12, 964 (2022).
5. IUCN. The IUCN Red List of Threatened Species Version 2022-2. The IUCN Red List of Threatened Species https://www.iucnredlist.org.
6. IPBES. Summary for policymakers of the global assessment report on biodiversity and ecosystem services. 10.5281/zenodo.3553579 (2019).
7. Boehm, M. M. A. & Cronk, Q. C. B. Dark extinction: the problem of unknown historical extinctions. Biol. Lett. 17, 2021 (2021).
8. Supple, M. A. & Shapiro, B. Conservation of biodiversity in the genomics era. Genome Biol. 19, 131 (2018).
9. Formenti, G. et al. The era of reference genomes in conservation genomics. Trends Ecol. Evol. 37, 197–202 (2022).
10. Theissinger, K. et al. How genomics can help biodiversity conservation. Trends Genet. 39, 545–559 (2023).
11. Lewin, H. A. et al. Earth BioGenome Project: Sequencing life for the future of life. Proc. Natl Acad. Sci. USA 115, 4325–4333 (2018).
12. Crandall, E. D. et al. Importance of timely metadata curation to the global surveillance of genetic diversity. Conserv. Biol. 37, e14061 (2023).
13. Samuel, S. & König-Ries, B. Understanding experiments and research practices for reproducibility: an exploratory study. PeerJ 9, e11140 (2021).
14. Buckner, J. C., Sanders, R. C., Faircloth, B. C. & Chakrabarty, P. The critical importance of vouchers in genomics. Elife 10, e68264 (2021).
15. Sabot, F. On the importance of metadata when sharing and opening data. BMC Genom. Data 23, 79 (2022).
16. Challis, R., Kumar, S., Sotero-Caio, C., Brown, M. & Blaxter, M. Genomes on a Tree (GoaT): A versatile, scalable search engine for genomic and sequencing project metadata across the eukaryotic tree of life. Wellcome Open Res. 8, 24 (2023).
17. The Darwin Tree of Life Project Consortium. Sequence locally, think globally: The Darwin Tree of Life Project. Proc. Natl Acad. Sci. 119, e2115642118 (2022).
18. Boytchev, H. Diversity in German science: researchers push for missing ethnicity data. Nature 616, 22–24 (2023).
19. Stöck, M. et al. A brief review of vertebrate sex evolution with a pledge for integrative research: towards ‘sexomics’. Philos. Trans. R. Soc. Lond. B Biol. Sci. 376, 20200426 (2021).
20. Böhne, A. et al. Contextualising samples: Supporting reference genomes for European biodiversity through sample and associated metadata collection. NPJ Biodivers. 10.1038/s44185-024-00053-7 (2024).
21. Mc Cartney, A. M. et al. ERGA pilot project data sharing policy. 10.5281/ZENODO.8091290 (2021).
22. Martin, F. J. et al. Ensembl 2023. Nucleic Acids Res. 51, D933–D941 (2023).
23. Larivière, D. et al. Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy. Nat. Biotechnol. 42, 367–370 (2024).
24. Mousseau, T. A. The biology of Chernobyl. Annu. Rev. Ecol. Evol. Syst. 52, 87–109 (2021).
25. Mc Cartney, A. M. et al. Guidelines on the implementation of the Traditional Knowledge and Biocultural Labels and Notices in the European Reference Genome Atlas for biodiversity researchers. 10.5281/ZENODO.8088227 (2022).
26. Lawniczak, M. K. N. et al. Specimen and sample metadata standards for biodiversity genomics: a proposal from the Darwin Tree of Life project. Wellcome Open Res. 7, 187 (2022).
27. Leonard, J. A. et al. ERGA Sample Manifest Standard of Practice. https://github.com/ERGA-consortium/ERGA-sample-manifest.
28. Riginos, C. et al. Building a global genomics observatory: Using GEOME (the Genomic Observatories Metadatabase) to expedite and improve deposition and retrieval of genetic data and metadata for biodiversity research. Mol. Ecol. Resour. 20, 1458–1469 (2020).
29. Liggins, L., Hudson, M. & Anderson, J. Creating space for Indigenous perspectives on access and benefit-sharing: encouraging researcher use of the Local Contexts Notices. Mol. Ecol. 30, 2477–2482 (2021).
30. Mc Cartney, A. M. et al. Indigenous peoples and local communities as partners in the sequencing of global eukaryotic biodiversity. NPJ Biodivers. 2, 1–12 (2023).
31. Mc Cartney, A. M. et al. Balancing openness with Indigenous data sovereignty: an opportunity to leave no one behind in the journey to sequence all of life. Proc. Natl Acad. Sci. USA 119, e2115860119 (2022).
32. Shaw, F. et al. COPO: a metadata platform for brokering FAIR data in the life sciences. F1000Res. 9, 495 (2020).
33. Formenti, G., Fernández, J. M. & Mc Cartney, A. M. Data download from the ERGA Pilot repository. 10.5281/ZENODO.8091687 (2021).
34. Mc Cartney, A. M., Formenti, G. & Mouton, A. ERGA Pilot Project Official Guidelines. 10.5281/zenodo.8319754 (2023).
35. Lawniczak, M. K. N. et al. Standards recommendations for the Earth BioGenome Project. Proc. Natl Acad. Sci. USA 119, e2115639118 (2022).
36. Mc Cartney, A. M. et al. ERGA Pilot Project assembly recommendations. 10.5281/ZENODO.8088368 (2023).
37. Mc Cartney, A. M., Wood, J., Howe, K. & Formenti, G. ERGA Pilot Project post assembly quality control standards. 10.5281/ZENODO.8088393 (2022).
38. Howe, K. et al. Significantly improving the quality of genome assemblies through curation. Gigascience 10, giaa153 (2021).
39. Cunha, T. J., de Medeiros, B. A. S., Lord, A., Sørensen, M. V. & Giribet, G. Rampant loss of universal metazoan genes revealed by a chromosome-level genome assembly of the parasitic Nematomorpha. Curr. Biol. 33, 3514–3521.e4 (2023).
40. Eleftheriadi, K. et al. The genome sequence of the Montseny horsehair worm, Gordionus montsenyensis sp. nov., a key resource to investigate Ecdysozoa evolution. Peer Community Journal 4, e32, 10.24072/pcjournal.381 (2024).
41. Cunningham, F. et al. Ensembl 2022. Nucleic Acids Res. 50, D988–D995 (2022).
42. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
43. Gabriel, L. et al. BRAKER3: fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS and TSEBRA. bioRxiv 2023.06.10.544449, 10.1101/2023.06.10.544449 (2023).
44. United Nations Environment Programme. Convention on Biological Diversity. (Environmental Law and Institutions Programme Activity Centre, 1992).
45. CITES. Text of the Convention on International Trade in Endangered Species of Wild Fauna and Flora: signed March 3, 1973, entered into force July 1, 1975. (U.S. Fish and Wildlife Service, Office of Management Authority, 1993).
46. International treaty on plant genetic resources for food and agriculture. Food and Agriculture Organisation (2004).
47. Bassiouni, M. C. Convention on the Law of the Sea, UN Doc. A/Conf. 62-122 & Corr. 1–8; 1833 UNTS 397 (10 Dec. 1982). in International Terrorism: Multilateral Conventions (1937–2001) 101–103 (Brill Nijhoff, 2001).
48. Scholz, A. H. et al. Multilateral benefit-sharing from digital sequence information will support both science and biodiversity conservation. Nat. Commun. 13, 1086 (2022).
49. Tseng, M. et al. Strategies and support for Black, Indigenous, and people of colour in ecology and evolutionary biology. Nat. Ecol. Evol. 4, 1288–1290 (2020).
50. Hickel, J., Dorninger, C., Wieland, H. & Suwandi, I. Imperialist appropriation in the world economy: Drain from the global South through unequal exchange, 1990–2015. Glob. Environ. Change 73, 102467 (2022).
51. Holt, B. G. et al. An update of Wallace’s zoogeographic regions of the world. Science 339, 74–78 (2013).
52. Ebenezer, T. E. et al. Africa: sequence 100,000 species to safeguard biodiversity. Nature 603, 388–392 (2022).
53. Marques, J. P. et al. Building a Portuguese Coalition for Biodiversity Genomics. (2023).
54. Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).
55. Carroll, S. R. et al. The CARE principles for indigenous data governance. Data Sci. J. 19, (2020).
56. Clarke, J. et al. Continuous base identification for single-molecule nanopore DNA sequencing. Nat. Nanotechnol. 4, 265–270 (2009).
57. Loman, N. J., Quick, J. & Simpson, J. T. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat. Methods 12, 733–735 (2015).
58. Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162 (2019).
59. Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63 (2009).
60. Mazzoni, C. J., Ciofi, C. & Waterhouse, R. M. Biodiversity: an atlas of European reference genomes. Nature 619, 252 (2023).
61. Capella-Gutierrez, S. et al. ECCB2022: the 21st European Conference on Computational Biology. Bioinformatics 38, ii1–ii4 (2022).
62. Boekhout, T. et al. Trends in yeast diversity discovery. Fungal Divers. 114, 491–537 (2022).
63. Medina-Córdova, N. et al. Biocontrol activity of the marine yeast Debaryomyces hansenii against phytopathogenic fungi and its ability to inhibit mycotoxins production in maize grain (Zea mays L.). Biol. Control 97, 70–79 (2016).
64. Lourenço, J., Mendo, S. & Pereira, R. Radioactively contaminated areas: Bioindicator species and biomarkers of effect in an early warning scheme for a preliminary risk assessment. J. Hazard. Mater. 317, 503–542 (2016).
65. Kesäniemi, J. et al. Exposure to environmental radionuclides associates with tissue-specific impacts on telomerase expression and telomere length. Sci. Rep. 9, 850 (2019).
66. Hardoim, P. R. et al. The hidden world within plants: ecological and evolutionary considerations for defining functioning of microbial endophytes. Microbiol. Mol. Biol. Rev. 79, 293–320 (2015).
67. Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
68. Hoff, K. J., Lomsadze, A., Borodovsky, M. & Stanke, M. Whole-genome annotation with BRAKER. Methods Mol. Biol. 1962, 65–95 (2019).
