PLOS Digit Health. 2024 May 2;3(5):e0000492. doi: 10.1371/journal.pdig.0000492

Table 2. Roadmap domain and corresponding quotes.

Conception and planning

P19 (clinician) ‘If the NHS is your target customer, what do they expect when it comes to buying the services? Are they buying something based on outputs or outcomes? They want to see much like you would with an ad tech or a Facebook campaign. Are they looking to see impressions or reach or are they looking to see behaviour change? They want to see uptake of counselling, screening services. If that’s what they’re looking for, that’s what you wanted to design your digital solution towards. [...] I think it probably needs to be given at the moment the way that services are commissioned, it needs to be given to each trust, and particularly because those local trusts will be more familiar with the local needs assessments of their communities’

P12 (developer) ‘We did some preliminary work, try to understand why people are anti-vaccine. So one thing we did is we use social media data. So we collect, you know, like people’s opinions... analysis, you know from social media. And then we try to align that to health belief models and some other models and to analyse why…You know we’re developing conversational agents for vaccine promotion’

Diversity and collaboration

P18 (developer) ‘So I think …how we build our company, it’s been specifically around diversity as well and not as a sort of corporate buzzword or anything … we have a fairly equal mix of male and female engineers, which is quite hard to do in this field, right? But then we have ethnicity diversity in terms of background… we’ve got a lot of everything from Asian to European cultural. We’re ...don’t see everything represented, but we are as we’re growing specifically looking for these things, because we believe what we build is for everybody. And the only way to deal with unconscious bias is by having people from different backgrounds different education, different ethnicities, different everything. Because otherwise you will be blind with the best intentions. I’m the stereotypical white male in it. It doesn’t get much more stereotypical, right?’

P18 (community member) ‘…targeting my own group, like Hispanic people like I wouldn’t like that, to be honest…kind of phrasing it like we’re giving you tailored advice according to your ethnic background, I wouldn’t like that. But it’s completely different if you’re just like ticking boxes. So, if at the beginning of the conversation with the chatbot, he asks me….asks for kind of demographics and it includes my ethnic background. I wouldn’t care. Does that difference make sense to you?’

P39 (community member) ‘So I’m Muslim. In Islam it’s … There’s a difference of opinion because in Islam, I don’t … I do think I have like decent knowledge. God has given us the brains we must use the brain when necessary and I don’t see any harm in that. No one’s being hurt. In fact, people are being helped. So in that case, I believe the religion would not condone this. It would in fact encourage it. But there’s other people who are bit less educated, who twist their religion. They think God is God is the creator, he’s the one. Why should we replace humans with artificial intelligence? But God has created the brains for people to create the artificial intelligence. You should be able to use your brain to the best of the capacity. If you have the resources, use it’

Preliminary research

P21 (clinician) ‘I like the preliminary research section. That’s really nice because you’re building on existing evidence which is something that people sometimes use as almost like a get out clause for addressing health disparity issues. So in our project, for example, we say you need to do your own scoping around where health inequalities may already exist in your use case, and then use that information to then inform your decisions for algorithm development or data selection’

P22 (clinician) ‘you see this a lot generally from tech designers, chatbot designers where they don’t, they …often forget to put in the preliminary research is what is the research for the medical condition you are trying to solve. And is there any research that technology maybe not chatbot technology, but technology has been used to solve that issue. So, for example, we use chatbot technology to surface mental health support. The chatbot technology is new but online CBT has been around for 10-15 years…. So we can talk about the evidence for online CBT to say going online is plausible and commutable.... Let’s try adding a chatbot to that. I think a lot of people tend to miss that - they focus so much on proving the chatbot’

Co-production

P6 (developer) ‘I get worried with nuanced health information and using that for Google for translation purposes. We actually are publishing an article totally unrelated about Spanish language vaccine misinformation and how Google Translate actually can promote misinformation because they translate it incorrectly and there’s a lot of real nuances. I think, especially if you’re making a chat bot for a specific group of people that uses the language that they use and that the slang terms that they use, I personally would not trust Google Translate. I would have someone in the specific community that I’m working in, translate them’

P36 (community member) ‘the pictures they put in, the colour as well. I know it’s something little, but like if I see a chatbot with brown skin I’ll be like OK. I don’t if that if that makes any sense, you know? Like on chats on WhatsApp, there’s the emoji they made it of different colour. Now if I see one of that, it’s more appealing to find something of my skin and I’ll be like, OK or maybe one that has someone you know, the Muslim burka. It’s kind of appealing. I think it’s it shows inclusiveness from my point of view’

Safety measures

P21 (clinician) ‘I would love to see something like somewhere… somewhere in this list around public transparency. I don’t know if you have that but staying accountable to the public around like safety incidents. Or you know just bad things that happen essentially... So, for medical devices, um, in the US for example, if you’re registered on the FDA, you have an adverse event, it’s actually captured on a publicly available uh public facing database called Maude …I think is really good practice is to have some public facing accountability around things that may have gone wrong, and what you’ve done about it. I think there’s something around public transparency which would be really good here’

P2 (clinician) ‘People putting in, you know, for example, details about sexual partners. There needs to be no patient identifiable data. Any kind of any interactions in the chat bot that have been evaluated need to have any kind of patient ID taken out and unless they work in the department then you know you can obviously kind of link it up, but I think it’s more information governance than the use of the that the use of the data once patients have put in, you know what they’re there for. And if you were to go down the route of asking about risk for example, so you’re asking about sexual behaviours. Even things like their IP address need to be masked, because obviously they’re telling you they have sex with guys and they’re married’

Preliminary testing

P17 (developer) ‘but there is definitely some back and forth between the different steps as well and just... Yeah. I think that would just help to illustrate the overall process, because it’s not, you know, it’s not just a linear process like that. There’s a lot of steps that, you know, maybe involve going back to speak to the user groups or the professionals again. And one of the other things, that was useful for us earlier on… specifically usability testing with users, with the initial prototype, because one of the most frustrating things that we found when talking to people was technical errors that were coming up and immediately putting people off using the chatbot... Altogether... so the sooner you can iron out those problems, the better’

P23 (community member) ‘I also think that you know nowadays we can … book restaurants just saying okay Google, you know, book me a restaurant somewhere. Why can’t we do that with sexual health... and it’s used like, obviously there are things to consider that are very much related to confidentiality, but why can’t we have a chatbot that can actually give me the service I want? Instead of frustrating me by telling me that I have to access this link and then press these buttons, you know, put it, put it there, you know, make it easy for the, for, for the population for everyone’

Healthcare integration

P22 (clinician) ‘So, one of the big challenges of being adopted in the NHS is we did an accessibility and adoption report as part of our last study and the majority of staff when we asked them what do you think AI is? They either thought it was like the chatbot that you talked to when you’re trying to do your online banking and you spell out the password and it spells you back a completely different word. Or they thought it was like the Terminator, like there was no in between… no middle ground, and so there’s a lot of hearts and mind stuff to be done with clinicians because ultimately if you’ve got a patient facing technology or a clinician controlled technology, you have to get the frontline clinicians to believe in your product or they won’t implement it’

Evaluation and auditing

P10 (clinician) ‘Evaluation is also really challenging to be honest with you. How one goes about determining whether a chatbot is good or bad is not straightforward. There is, first of all, like clinicians aren’t themselves accurate. Tests that clinicians take are like a really artificial environment... patient outcomes are probably the best measure of success, but they require a lot of longitudinal data. And it has a lot of selection bias attached to it. So yeah, evaluation is really challenging, and I would say probably the best approach is like a multi-pronged approach over a long period of time. It’s something that, you know, I think is really, really hard because the level of investment that is required is very high, and the ROI on it is questionable, interestingly enough, so I think it’s something, you know, to really think carefully about in terms of your strategy there.’

P3 (academic) ‘The way they would be evaluated is very different. So, it depends on what the chatbot is doing, if it’s, for example, diagnosing a disease. You would do a, you know, like an accuracy type study, if it was delivering therapy like there’s lots of mental health chatbots that are delivering cognitive behavioural therapy, talking therapies. You’ll be doing an effectiveness study.’

Maintenance

P19 (clinician) ‘Because I think, um, actually to implement a tool that is, uh, safe, efficient, and beneficial costs probably more than many people anticipate, and so does the design, development and implementation process of it. And I don’t think necessarily that you will see any efficiencies, outback end of implementing something like a chatbot for some time’

P20 (clinician) ‘if the chatbot needs changing, then has the... if it’s interoperable with other systems that have the risk of all these other systems falling down. So, any change needs to be done within a test system that can integrate with the test system of the other health informatics systems that are being used. So, we, we’ve recently stopped using a digital provider for our electronic or our digitally sent out letters because they can’t provide a test system’

Termination

P20 (clinician) ‘Removing the chatbot from the site. If something critical were to happen, we would find that it was causing more harm than good. Removing it would be trivial. Yeah. Also, I think it is within the scope of their work because essentially, how we would deliver it would be to give them instructions on how to add it to their website, which would be quite a simple process. So I think it would be on them to reverse those instructions.’

P1 (technical expert) ‘You need a long leading time for doing switched off. It needs communication with the department and the patient base that are using it about why it’s being turned off and when it’s being turned off. You need to leave a similar landing page or similar area or website around signposting for different things so that people can still access information online. And there needs to work within the department around trying to figure out the unmet need once it goes offline and trying to kind of compensate by either more staff or changing pathways or changing the websites or changing the other access points into the service for advice or so I think it needs a long ...well three month plus warnings being switched off and then people need to know that switched off because people may rely on it more than once or twice for advice. And it needs to be replaced with something obviously, either a human or another chatbot to replace it.’