Abstract
We describe a new national organisation in scientific research that supports life scientists with technologies and technological expertise in an era where new projects are often data-intensive, multi-disciplinary, and multi-site. The Dutch Techcentre for Life Sciences (DTL, www.dtls.nl) is run as a lean not-for-profit organisation of which research organisations (both academic and industrial) are paying members. The small staff of the organisation undertakes a variety of tasks that are necessary to perform or support modern academic research, but that are not easily undertaken in a purely academic setting. DTL also represents the Netherlands in the ELIXIR ESFRI, and the office supports this task. The organisation is still being fine-tuned, and this will probably continue over time, as it is crucial for this kind of organisation to adapt to a constantly changing environment. However, having been on the path to professionalisation for several years, we believe our experiences can benefit researchers in other fields or other countries who are setting up similar initiatives.
Keywords: Data, technologies, initiative
Introduction
In this introduction we will explain the origin of DTL, the change in Dutch funding of life science technology that led to the start of the DTL foundation, and how the efforts of the DTL Data programme fit in the parallel development of professional data stewardship and knowledge structuring initiatives in science overall.
Timeline
During the preparatory phase of the ELIXIR ESFRI (elixir-europe.org), in 2012, several high-profile bioinformatics and systems biology representatives started an initiative called DISC, the Data Integration and Stewardship Centre. They met several times to discuss the implementation of ELIXIR in the Netherlands. In parallel to this, the initiative to establish the Dutch Techcentre for Life Sciences was launched on the 31st of October 2012. The DTL organisation was started as a platform of leading universities, research institutes, university medical centres, science funders, government funding sectors (‘topsectoren’ in the Netherlands) and private companies from the health, nutrition, agrigenomics, industrial microbiology and information engineering sectors. We soon discovered that there was a significant overlap in the goals of the two initiatives, and it was decided to merge DISC into DTL as its Data programme. Starting from the 1st of January 2014, organisations have been signing up for formal membership of DTL.
Why was DTL started?
The initiative for DTL was based on the growing data challenge as well as the changing funding landscape in the Netherlands. From 2003 to 2013 significant funding in the Netherlands went to institutes developing technical services and techniques for life sciences. For example, the Netherlands Bioinformatics Centre (NBIC) operated between 2004 and 2014 as a nation-wide initiative of bioinformatics experts in academia and industry. These institutes were expected to foster technology research, drive the exchange of methodology among labs and translate these into technical services that other scientists could use. Around 2012 it became clear that in the future the development and use of these technologies would no longer receive similar direct funding, and that research projects that apply a technology would need to budget for that. Rather than letting the previous investment in the technological institutes go to waste and letting life scientists each replicate the experience at their own institutes, the technology institutes decided to develop a way of working in which they would continue to exchange expertise across technology disciplines, build up a collective and well-accessible research infrastructure (RI) and deliver the services required. This has led to the formation of DTL.
Next to the historical perspective, there are also forward-looking reasons for the start of the DTL organisation:
Technology programmes working together in a single organisation enable the application of what we call integrated life science research requiring the use of multiple technologies in a single research project, and the integration of generated and already available data.
Members of DTL can collaboratively draw attention to the fact that the fundamental developments in the technology fields require more attention from both the collaborating research organisations and the national funding agencies. Together we can look for solutions to tackle these challenges.
Establishing a collective technology platform of the major research organisations in the Netherlands provides further chances to establish international partnerships for individual member organisations or as a collective.
The organisational structure of DTL comprises three main programmes, as described below.
Governance and organisational programmes
DTL is governed by a board that is advised by a scientific advisory committee, and operations are monitored by a board of representatives from the partner institutes. DTL has organised its actions in three areas, Data, Technologies and Learning, which run as individual but cross-connected programmes within the organisation.
DTL Learning, to start with the third area, manages an inventory of all training needs and offerings in life science technologies. It forms the bridge to the national Research School on Bioinformatics and Systems Biology (BioSB, biosb.nl) and other related research schools, and maintains contacts with all academic institutes that offer bioinformatics bachelor's and master's programmes or postdoctoral training. DTL Learning also bundles expertise available in the DTL network, and organises both ad hoc and recurring training and courses on diverse subjects related to developments in the Data and Technologies programmes.
DTL Technologies bundles more than 100 research labs that offer support to life scientists with different technologies (so-called technology hotels). These technology hotels cover a wide variety of experimental technologies (e.g. next generation sequencing, proteomics, metabolomics, bioimaging) as well as bioinformatics and systems biology expertise. DTL Technologies facilitates the contact between the technology hotels and external researchers as potential customers, e.g. through the organisation of funding calls that encourage new collaborative projects. In the DTL Technologies programme we will also work on harmonising and optimising access to hotels to make it easier for life scientists to use the latest technological opportunities and access multiple facilities in parallel.
DTL Data brings together experts on every aspect of data stewardship, tools and databases, and e-Infrastructure. DTL Data builds relations with the people involved in the other DTL programmes and partner organisations and connects to international initiatives such as ELIXIR, the pan-European life science research data infrastructure. The setup of DTL Data has gone hand in hand with more generic developments related to data and knowledge handling in the life sciences, which we will address first.
Parallel developments: data stewardship and knowledge structuring
The rise and wide application of modern data-intensive technological approaches in the life sciences has led to pressure on funders to provide support to keep acquired data around for longer than a project lasts. As such, initially in the US, and later in Europe, funding agencies have started demanding data stewardship to be an integral part of all scientific research projects. This is important because present-day research projects collect much data that intrinsically has more value than the first project will extract. Acquiring such data a second time is unnecessarily expensive, and this makes data stewardship a good investment. Furthermore, good data stewardship is required to make the work reproducible. In addition, proper structuring of knowledge sources that represent the aggregated and possibly curated findings of the body of previous research is of equal importance to fully enable integrative research.
DTL facilitates data stewardship and knowledge structuring in all associated projects through participation in the development and deployment of the FAIR initiative. The FAIR acronym stands for Findable, Accessible, Interoperable, and Reusable (datafairport.org). Making data and knowledge sources findable and accessible by both humans and computer systems requires a standardised description of metadata and study capturing as well as long-term storage and proper licensing. Interoperability and reusability require the representation of data and knowledge in such a way that they can be easily combined and used for further analytical processing 1.
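As an illustration of what a standardised metadata description might look like in practice, the following minimal Python sketch checks a dataset record for gaps per FAIR aspect. It is not part of any FAIR specification or DTL tool: the record fields, the mapping of fields to the four aspects, and the example values are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str = ""      # persistent identifier (Findable)
    title: str = ""           # human-readable description (Findable)
    access_url: str = ""      # where the data can be retrieved (Accessible)
    data_format: str = ""     # standard or community format (Interoperable)
    vocabulary_terms: list = field(default_factory=list)  # shared terms (Interoperable)
    licence: str = ""         # terms of reuse (Reusable)


# Assumed mapping of record fields to the four FAIR aspects.
REQUIRED = {
    "Findable": ["identifier", "title"],
    "Accessible": ["access_url"],
    "Interoperable": ["data_format", "vocabulary_terms"],
    "Reusable": ["licence"],
}


def missing_fields(record):
    """Return {aspect: [empty field names]} for every incomplete aspect."""
    gaps = {}
    for aspect, names in REQUIRED.items():
        empty = [name for name in names if not getattr(record, name)]
        if empty:
            gaps[aspect] = empty
    return gaps


# Made-up example record without vocabulary terms or a licence.
record = DatasetRecord(
    identifier="doi:10.0000/example",
    title="Example proteomics run",
    access_url="https://example.org/data/1",
    data_format="mzML",
)
print(missing_fields(record))
# {'Interoperable': ['vocabulary_terms'], 'Reusable': ['licence']}
```

A check of this kind only verifies that metadata fields are filled in; whether the values are meaningful still requires the judgement of a data steward.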
To support practical implementation of good data stewardship, DTL and its Data programme are on a mission to bring together all experts that can help life scientists with different aspects of their data management, and to show life scientists that it is not efficient to do everything in house using local solutions.
The remainder of this paper describes the organisational structure and approaches of the DTL Data programme in more detail.
Content of the DTL Data programme
DTL-associated scientists and engineers are responsible for data integration and stewardship in various life science initiatives in different life science sectors. They bring expertise, reusable tools and databases that have been developed in the Netherlands or elsewhere, and have access to a shared e-infrastructure.
Bioinformatics and medical informatics expertise
DTL brings together experts with a very diverse professional expertise in life science data management. This expertise is classified along four independent dimensions:
The life science sector: current activities are in health, agri/food, nutrition, and industrial biotechnology.
Location: even though the Netherlands is a relatively small country, a local expert is sometimes preferred for advice or in a collaboration.
Phase in the data lifecycle: we distinguish expertise in planning an experiment, collecting data, data processing, data analysis, data and knowledge integration, and modelling. There is also underlying expertise in biostatistics, systems biology, instrumentation, data security, computing infrastructure, and computer science approaches.
Technical discipline and type of data: e.g. genomics, proteomics, metabolomics, bioimaging, biobanking, knowledge representation.
All expertise can be classified along those four dimensions. To make all of this available to life scientists everywhere, we are working on setting up a network of local expert centres at different sites. Such expert centres can function as help desks: places where information can be obtained about the expertise available locally as well as elsewhere. Representatives of the expert centres are in frequent contact with each other to learn about new developments and to learn from each other’s experience (both in techniques and in organisation). Over time, DTL will also extend its own help desk that can guide people to the right expert centres.
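To make the four-dimensional classification concrete, the sketch below shows how a help desk could, in principle, look up experts by any combination of dimensions. This is a minimal illustration under our own assumptions, not a description of an existing DTL system: the `Expertise` type, the `find_experts` function and all register entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expertise:
    """One expert entry, described along the four dimensions from the text."""
    expert: str       # hypothetical name
    sector: str       # e.g. "health", "agri/food", "nutrition"
    location: str     # site of the local expert centre
    phase: str        # phase in the data lifecycle, e.g. "data analysis"
    discipline: str   # e.g. "proteomics", "bioimaging"


def find_experts(register, **criteria):
    """Return entries matching every given dimension.

    Dimensions that are not passed as a keyword argument match anything.
    """
    return [
        entry for entry in register
        if all(getattr(entry, key) == value for key, value in criteria.items())
    ]


# Entirely made-up entries, for illustration only.
register = [
    Expertise("Expert A", "health", "Utrecht", "data analysis", "proteomics"),
    Expertise("Expert B", "health", "Leiden", "data processing", "metabolomics"),
    Expertise("Expert C", "agri/food", "Wageningen", "data analysis", "genomics"),
]

matches = find_experts(register, sector="health", phase="data analysis")
print([entry.expert for entry in matches])  # ['Expert A']
```

Because the dimensions are independent, a query can fix any subset of them, for example only the location when a nearby expert is preferred.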
A very important mission of DTL Data is to prevent projects from running into problems because of unconscious incompetence; we want to facilitate early interaction between life scientists with a specific plan and experts in all the technical fields that they need to engage, to avoid underestimating technological tasks or risks.
Tools & databases
Many of the experts collaborating in the DTL Data programme have (co)developed reusable tools and databases. For such tools there is ample experience in implementing their use in different projects. Such tools can often be reused by a new project in an existing shared deployment with dedicated help for users. In other cases, specialised installations of the software can be made, tailored to the project. DTL has a strong preference for reuse of existing tools that have proven their value in earlier national or international projects. Advantages of such tools are that they have overcome their teething problems, that their continued development benefits multiple projects, and that the reuse increases interoperability with other tools and existing data.
e-Infrastructure
In the past, many life science labs have each been taking care of their own needs for computing. Increasingly, however, the need for data processing is becoming too large for an individual lab to handle. Furthermore, server system maintenance is not a core competence of a life scientist, and keeping a local cluster running should not be the task of a PhD candidate. Computing and data storage are becoming an infrastructure: equipment that nobody can do without, and which is inefficient to duplicate for every project. Many groups are therefore no longer willing to maintain the needed infrastructures themselves, and set up institutional services together, employing specialised people to maintain the computing equipment. Additional benefits of such centralisation efforts are the flattening of peak demand and the possibility of running individual projects at relatively short notice. Centralisation also reduces the need for synchronising new equipment purchases with the start of new projects, which without central facilities results in waste for short projects and the use of outdated computing resources for longer projects. DTL brings experience from centralisation efforts together, and ensures alignment with the national centres for computing. Together, these people work on harmonising the computer centres so that migration of computing work and federation of resources become easier. When a new data-intensive life science project is started with new demands for computing or storage, the best solution for the location of such computing is found in collaboration.
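The flattening of peak demand can be illustrated with a small calculation. The Python sketch below compares the capacity needed when every project sizes its own cluster for its own peak with the capacity needed by a shared facility. The weekly demand figures are invented purely for illustration and do not describe any real project.

```python
# Hypothetical weekly computing demand (in core-hours) for three projects.
# The numbers are invented purely to illustrate the argument.
demand = {
    "project_a": [100, 900, 100, 100],
    "project_b": [100, 100, 800, 100],
    "project_c": [700, 100, 100, 100],
}

# Separate clusters: each project must size its own cluster for its own peak.
separate_capacity = sum(max(weeks) for weeks in demand.values())

# Shared facility: capacity must cover the largest combined demand in any week.
combined = [sum(week) for week in zip(*demand.values())]
shared_capacity = max(combined)

print(separate_capacity)  # 2400
print(shared_capacity)    # 1100
```

In this made-up scenario the peaks fall in different weeks, so the shared facility needs less than half the total capacity of the separate clusters. The saving shrinks when the peaks of different projects coincide.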
The e-Infrastructure that can be shared is not limited to the computer racks (Infrastructure as a Service, IaaS). We also investigate possibilities for sharing higher-level platforms (Platform as a Service, PaaS), for example the workflow-supporting software Galaxy 2, which has been supported by the Netherlands Bioinformatics Centre in the past, and potentially other shared infrastructures for systems biology. We are also working together on a shared data publishing infrastructure based on experience from the Open PHACTS project 3.
Organisation of the DTL Data programme
Organisational structure and facilitation
The DTL Data programme is coordinated by a programme manager from the DTL Office. All projects are executed by DTL partners, outside of the office. The primary organisation of DTL Data is per sector of life science research (Figure 1). We organise several kinds of meetings for different target groups, each of which we have identified as fulfilling an urgent need: project leader meetings, programmer meetings and so-called focus meetings. We also identify people with similar interests and facilitate interest groups and working groups with their own meetings. Each of these types of events is described in more detail below.
Project leader meetings
Within each of the life science sectors, DTL Data brings together project leaders who are each functionally coordinating the progress in a particular project.
For the healthcare sector, this is a continuation of a weekly project leader meeting that has been running since 2009, and involves 10 project leaders meeting for 60 minutes every week. These meetings are conducted as teleconferences where the participants collaboratively edit the meeting notes. This style of focused reporting of what has been accomplished and what is planned builds trust between the project leaders and leads to many accidental discoveries of potential synergy between their projects. This results in cost savings for the projects and does not stand in the way of healthy competition. These meetings also provide a direct connection to TraIT, the IT project for the Dutch translational medicine project CTMM.
The other sectors (agrigenomics, nutrigenomics and industrial biotechnology) are now in the process of setting up similar meetings. The principal project leaders who will be leading these meetings have been identified. The four principal project leaders, one for each sector, will meet on a monthly basis to discuss progress and to identify synergies between the sectors.
Programmer meetings
Many of the programmers involved in the bioinformatics projects in the different sectors of DTL Data are so-called embedded programmers, often the only bioinformatician in a biology or medical setting. Others work together in groups. In DTL Data, we call programmers from both settings together every two months for lectures and workshops on topics ranging from programming techniques to biological applications. Sometimes we invite external speakers, but most topics are presented by members of the group. This way they keep each other informed. At these meetings we also encourage interactions between programmers in smaller groups.
Focus meetings
During our work we regularly recognise similar problems or solutions being raised in more than one context. For such topics we organise focus meetings. A focus meeting brings together a group of people that preferably have never met in that composition, to discuss a subject that is either crossing borders between technologies or between sectors. Focus meetings are not only organised by DTL Data, but also by the DTL Technologies and DTL Learning programmes. A focus meeting often contains a few short lectures, followed by a well-prepared discussion that engages the whole audience. After the meeting, a white paper is written by the organisers of the meeting that is published on the DTL website.
Interest and working groups
If a group of people, e.g. after a focus meeting, feels the need to exchange experience more often, they can form a so-called interest group within DTL. DTL facilitates these interest groups with meeting rooms, and tries to find a young researcher as a champion of the group to keep it going. This is modelled after “Project and Area Liaisons” (PALs) from earlier EU and UK projects 4. PALs are rewarded for introducing new ways of working: they are provided with extra support for their work and direct influence on the development of the new working methods.
An interest group that has identified an issue they want to work on together can form a working group. A working group needs to be supported by a part-time project leader to take the practical work out of the hands of the principal investigators. Each working group must deliver a practical result (deliverable) after a limited time. DTL is looking for ways to support the working groups by providing resources for the project leaders.
Both interest groups and working groups can be supported with a good software development environment, mailing lists, a website and a wiki to exchange information.
Relations with other DTL programmes
The Data programme interacts with many organisations, both internal to DTL (other programmes and partners) and external (for instance IMI projects and RIs under the EC ESFRI scheme).
Help desk, training and education
In the day-to-day operations of the Data programme, we frequently come across needs for training: both training for data scientists to broaden their knowledge of newly developed technologies, and training for life scientists to make them aware of solutions that are being developed in DTL Data projects and to teach them how to use these. This is expected to become even more important once local data desks have been set up in different institutions. The setup of these data desks will bring together experienced data scientists from different institutes, and they will find out that others have complementary expertise that they sometimes need to replicate. Also, life scientists with less experience will have a low barrier to approaching their local data desk for advice, bringing in more demand for basic data awareness training. Training to meet all of these needs will be developed with the DTL Learning programme, which is very well connected to people and organisations that can support this effort.
Data-related technology hotels
Many of the people involved in DTL Data offer their services to life scientists as a data hotel in the DTL Technologies programme. DTL Data works with DTL Technologies to define the needs of and requirements for these data-specific hotels. An overview of current DTL hotels is available at www.dtls.nl/expertise-services/hotels.
Relations with external programmes
ELIXIR
In parallel with the development of the DTL organisation, bioinformatics institutes and laboratories all over Europe have set up the European research infrastructure for life science data and bioinformatics, ELIXIR. ELIXIR is organised as a hub hosted at the EBI in Hinxton, UK, with nodes in each of the member countries. In the Netherlands, DTL hosts the ELIXIR node (ELIXIR-NL). Association with ELIXIR gives us the possibility to reach out to experts and tools all over Europe.
DTL and ELIXIR have developed the concept of so-called Bring Your Own Data (BYOD) parties as a platform to bring together data owners and data experts. Biological domain experts are also invited where relevant. The main goal of these meetings is to acquaint data owners with the possibilities to connect and functionally interlink their data with other datasets and knowledge resources by applying FAIR principles. Researchers can suggest a BYOD party, and DTL will assist with the logistics and invite data experts.
Other ESFRI programmes and national projects
Europe has many other Research Infrastructures in the life sciences, each with their own special focus. In the Netherlands, too, several larger project organisations are active in life science research. All of these have their own research data and associated challenges. In the Netherlands we make sure that the people working with that data are co-developing and steering the DTL Data programme. This ensures that the methods and tools they use are compatible with the ELIXIR choices and avoids unnecessary duplication of development efforts.
Conclusion
Life science research is becoming more and more data-intensive and cross-disciplinary, at unprecedented scales. Individual research groups have neither the resources nor the interest to keep in contact with all expert providers and to stay informed of the progress of other related projects at such scales. In the Netherlands we have developed a networked approach to meet the challenges posed by modern data-intensive life science research. The establishment of DTL as a collective platform that brings together experts in various technological disciplines across life science domains, facilitated by a small core team, allows projects to run efficiently. Already in the preparatory period and in the first year of operations we have identified synergies between research projects running in parallel and found common interests among researchers with surprisingly different focuses. The growing community of experts involved in DTL Data makes sure that the required data-related expertise can be located for any researcher in the life sciences starting on any new project. At the publication date of this article DTL had over twenty confirmed member organisations. The current partner list can be found at www.dtls.nl/about/partnership/.
Contact
To find out how DTL Data can support your challenges or for further inquiries about the setup of the organisation, contact Rob Hooft (programme leader) at rob.hooft@dtls.nl.
For further information on the other programmes of DTL, contact Ruben Kok (director DTL) at ruben.kok@dtls.nl.
Website: www.dtls.nl
Acknowledgements
Next to the authors, the following people have played instrumental roles in setting up DTL Data, and all share a DTL affiliation:
Jan-Willem Boiten, CTMM-TraIT, Eindhoven
Luiz Olavo Bonino, Dutch Techcentre for Life Sciences (Foundation office), Utrecht
Jildau Bouwman, TNO, Zeist and Netherlands Metabolomics Centre, Leiden
Richard Finkers, WUR, Wageningen and Plant Research International, Wageningen
Femke Francissen, Dutch Techcentre for Life Sciences (foundation office) and BioSB research school
Celia van Gelder, Dutch Techcentre for Life Sciences, Utrecht and Radboudumc, Nijmegen
Martien Groenen, WUR, Wageningen
Jaap Heringa, VU University Amsterdam and BioSB research school (Deputy Head of Node ELIXIR-NL)
Irene Nooren, SURFsara, Amsterdam
Merlijn van Rijswijk, Dutch Techcentre for Life Sciences (foundation office), Utrecht and Netherlands Metabolomics Centre, Leiden
Marco Roos, LUMC, Leiden
Morris Swertz, UMCG, Groningen
Funding Statement
This work has been funded by the authors’ home institutes.
[version 1; referees: 3 approved with reservations]
References
- 1. Mons B, van Haagen H, Chichester C, et al.: The value of data. Nat Genet. 2011;43(4):281-3. doi:10.1038/ng0411-281
- 2. Goecks J, Nekrutenko A, Taylor J, et al.: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010;11(8):R86. doi:10.1186/gb-2010-11-8-r86
- 3. Williams AJ, Harland L, Groth P, et al.: Open PHACTS: semantic interoperability for drug discovery. Drug Discov Today. 2012;17(21–22):1188-98. doi:10.1016/j.drudis.2012.05.016
- 4. Goble C, Wolstencroft K, Owen S, et al.: SysMO-DB: A pragmatic approach to sharing information amongst Systems Biology projects in Europe. In: Proceedings of the UK e-Science All Hands Meeting. 2009.