AMIA Annual Symposium Proceedings. 2025 May 22;2024:561–570.

Using think-aloud protocol to identify cognitive events while generating data-driven scientific hypotheses by inexperienced clinical researchers

Xia Jing 1, Brooke N Draghi 1,*, Mytchell A Ernst 1,*, Vimla L Patel 2, James J Cimino 3, Jay H Shubrook 4, Yuchun Zhou 5, Chang Liu 6, Sonsoles De Lacalle 7
PMCID: PMC12099392  PMID: 40417518

Abstract

We conducted a data-driven hypothesis generation study with clinical researchers using VIADS (a visual interactive analysis tool for filtering and summarizing large data sets coded with hierarchical terminologies) or other analytical tools (as controls, e.g., SPSS, SAS, R). The participants analyzed the same datasets and developed hypotheses while following a think-aloud verbal protocol. Their screen activities and audio were recorded, transcribed, and coded for cognitive events. We analyzed the recordings to identify the cognitive events (e.g., “Analyze data”) used during hypothesis generation. The VIADS group exhibited the lowest mean number of cognitive events per hypothesis, with the smallest standard deviation. The cognitive events used most frequently during hypothesis generation were “Using analysis results” (30%) and “Seeking connections” (23%). The results suggest that VIADS may have guided participants more effectively than the tools used by the control group. Our framework for scientific hypothesis generation in clinical research contexts guides the elaboration of the cognitive mechanisms underlying the process.

Introduction

A research hypothesis is an educated guess regarding relationships between variables1. A research question typically comprises one or more scientific hypotheses that drive the direction of research1, 2. Hypothesis generation constitutes the starting point in the life cycle of a research project. Without a significant, insightful, and novel hypothesis, research can be unfocused and is less likely to have an impact within the field, regardless of the study design, experiment implementation, or results. Therefore, hypothesis generation plays a critical role in research. Several studies have investigated the mechanisms underlying the generation of scientific hypotheses by researchers, both in science (e.g., Dunbar and Klahr3, 4) and in clinical medicine (e.g., Joseph and Patel5, 6). Over the decades, researchers have explored ways to facilitate scientific hypothesis generation for research purposes: Spangler et al. developed an automatic hypothesis generation tool based on scientific literature mining7; Kehrer et al. used an interactive visual data tool to facilitate hypothesis generation in climate research8; and Lodestar is an interactive computational notebook that enables novice data scientists to explore datasets and prototype workflows rapidly9. The majority of such studies validated their tools by applying existing datasets to generate known conclusions (i.e., retrospective studies) without controlled human subject studies to demonstrate how end users interact with the tools while generating new hypotheses (i.e., prospective studies)7, 8. A granular understanding of how end users use such tools to generate hypotheses not only reveals an important but poorly understood hypothesis-generation process but also provides irreplaceable direct evidence to guide the further development and improvement of similar tools.

In scientific research, many hypotheses are generated in two types of settings. In the first, hypotheses originate from experimental observations, e.g., unusual phenomena observed in the context of a biological “wet lab.” In the second, hypotheses originate from data analysis, e.g., studies in epidemiology, genomics, and informatics10. Observations of unique or unusual phenomena in the first type and observations of trends in the second are critical to the development of scientific hypotheses4, 11. Herein, we focus on the process involved in hypothesis generation in the second type of setting.

Over the past few decades, there has been considerable progress in our understanding of scientific and medical reasoning, problem-solving, working memory, and the use of analogies4, 12. The reasoning process has been explored in educational settings and through careful examination of the processes used to solve math problems13, 14. There are a couple of important differences between scientific reasoning and hypothesis generation. First, they have different starting points: scientific reasoning can begin with an existing problem or a given puzzle14-17, most often in knowledge-lean domains16, whereas data-driven scientific hypothesis generation searches for a problem or focus area, often in knowledge-rich domains such as physics and medicine18. The latter approach has been named ‘open discovery’19. Second, they rely on different problem-solving mechanisms, with convergent thinking predominant in scientific reasoning4 and divergent thinking predominant in data-driven scientific hypothesis generation. Meanwhile, hypothesis generation for the purpose of medical diagnosis resembles scientific reasoning, as it begins with an existing problem, i.e., a medical case and its signs and/or symptoms16, 20.

We have previously developed a conceptual framework for scientific hypothesis generation and its contributing factors21. Although researchers have explored the possibility of automatically generating scientific hypotheses in the past7, 10, 22, 23, they recognized the challenges of completely automating such an advanced cognitive process7, 24.

In this study, we aim to learn more about the scientific hypothesis generation process in clinical research contexts. Because scientific hypotheses can directly affect and guide the direction of a research project, our findings may potentially impact the clinical research enterprise. We have previously described our research protocol and the underlying algorithms25, as well as VIADS26-30, a tool developed by our team for secondary data analysis (a visual interactive analytic tool for filtering and summarizing large health data sets coded with hierarchical terminologies). The primary features of VIADS include visualization of hierarchical structures, filtering of datasets, summary views of datasets, and routine visualization features such as zooming in/out, exporting graphs/datasets, showing expanded information, and changing graph layouts26. We have conducted a usability study31 and a quality evaluation of the hypotheses generated by participants21 and have published an overall introduction to the research project32. A paper describing the development of our hypothesis evaluation metrics and instrument is available as a preprint33. In this manuscript, we focus on understanding the cognitive processes of clinical researchers generating data-driven scientific hypotheses and identifying the cognitive events in their workflow. We anticipate that our results will inform us about the process itself and how to improve it with informatics tools in the future.

Methods

Study flow and data sets

This 2 × 2 study compared the hypothesis generation process of clinical researchers using VIADS with that of clinical researchers not using VIADS. The VIADS and non-VIADS participants used the same datasets and study scripts, were overseen by the same study facilitator, and generated their hypotheses within the same timeframe (2-hour study session) and setting. All followed the think-aloud protocol. The participants were divided into experienced and inexperienced clinical researchers based on pre-established criteria25: years of study design experience, years of data analysis experience, and the number of publications as a significant contributor. Within each experience group, participants were assigned to either the VIADS group or the non-VIADS (i.e., control) group using block randomization. VIADS is a data analysis tool that can visualize, filter, summarize, and compare data sets coded with hierarchical terminologies, such as ICD codes. The VIADS group participants received an additional one-hour training session on using VIADS before their study session. The control group participants were free to use any analytical tools they wished, such as SPSS, SAS, R, or Excel.
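To illustrate the assignment step, the following is a minimal, hypothetical sketch of block randomization within one experience stratum, written in Python. The block size, seed, and function names are illustrative assumptions, not the study's actual assignment procedure.

    # Minimal sketch (assumed, not the study's actual script): block randomization
    # assigns participants to the VIADS or control arm in shuffled blocks so the
    # two arms stay balanced as recruitment proceeds.
    import random

    def block_randomize(participant_ids, block_size=4, arms=("VIADS", "control"), seed=2020):
        """Assign participants to arms in shuffled blocks of `block_size`."""
        rng = random.Random(seed)
        assignments = {}
        block_template = [arms[i % len(arms)] for i in range(block_size)]
        for start in range(0, len(participant_ids), block_size):
            block = block_template.copy()
            rng.shuffle(block)  # randomize arm order within this block
            for pid, arm in zip(participant_ids[start:start + block_size], block):
                assignments[pid] = arm
        return assignments

    # Example: randomize the inexperienced stratum separately from the experienced one.
    inexperienced = [f"P{i:02d}" for i in range(1, 14)]  # 13 hypothetical participant IDs
    print(block_randomize(inexperienced))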

The data sets used during the study sessions were extracted from the 2005 and 2015 versions of the National Ambulatory Medical Care Survey (NAMCS) by the Centers for Disease Control and Prevention. We preprocessed the NAMCS data sets by calculating and aggregating the ICD-9-CM diagnostic and procedural codes and their frequencies, and we provided the full names of these ICD-9-CM codes for participants to use during the study sessions. During the 2-hour study sessions, the participants (i.e., the clinical researchers) were asked to analyze the data sets and develop hypotheses, articulating their thought processes and decision-making throughout in accordance with the think-aloud verbal protocol. The screen activity and conversations between participants and the study facilitator were recorded using BB Flashback for Windows (Blueberry Software, UK). Each participant consented to being recorded before the study sessions. The recordings were transcribed by a professional transcription service and checked for accuracy by a content expert.
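As an illustration of this preprocessing step, the sketch below (Python with pandas) aggregates ICD-9-CM code frequencies from a hypothetical visit-level NAMCS extract and attaches code names. The file names and column names are assumptions for illustration; the actual NAMCS layouts differ by survey year, and this is not the study's actual preprocessing script.

    # Minimal sketch, assuming a visit-level CSV extract with diagnosis columns
    # DIAG1-DIAG3 and a separate lookup table of ICD-9-CM code names.
    import pandas as pd

    visits = pd.read_csv("namcs_2015_visits.csv", dtype=str)   # one row per visit (assumed file)
    diag_cols = ["DIAG1", "DIAG2", "DIAG3"]                     # assumed diagnosis code columns

    # Stack the diagnosis columns into one series and count each code.
    codes = visits[diag_cols].stack().str.strip()
    codes = codes[codes.ne("") & codes.ne("-9")]                # drop blanks and missing-value flags
    freq = codes.value_counts().rename_axis("icd9_code").reset_index(name="frequency")

    # Attach full code names so participants can read them during the sessions.
    names = pd.read_csv("icd9_code_names.csv", dtype=str)       # assumed lookup: icd9_code, icd9_name
    freq = freq.merge(names, on="icd9_code", how="left")
    freq.to_csv("namcs_2015_icd9_frequencies.csv", index=False)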

Cognitive event coding of the hypothesis generation recordings

Based on the study sessions, initial data analysis, feedback from the internal investigation team, and a literature review1, 11, 34-36, a preliminary conceptual framework of the hypothesis generation process was developed before coding (Figure 1). The framework served as a foundation for formulating the initial codes and code groups, which were used to code the transcriptions of the recordings for cognitive events (e.g., seek connections, analogy) identified in the hypothesis generation process. For example, “Analogy” was used when a participant compared one of their prior studies with the analysis results in front of them. “Use PICOT” was used when a participant used PICOT (i.e., patient, intervention, comparison, outcome, type of study) to formulate an idea into a formal hypothesis statement.

Figure 1. The initial version of the framework of cognitive events during data-driven hypothesis generation

The transcription of one study session was used as a pilot coding case to set the initial coding principles; the pilot coding sessions also served as training sessions for the two coders. The rest of the transcriptions were coded by the two coders independently. The two coders compared their coding results, discussed any discrepancies, and reached consensus by involving the study facilitator and revising the coding principles as needed. More codes and code groups were added as the coding progressed. After coding all the transcripts, the two coders treated each hypothesis generation as an independent process and labeled the cognitive events within it. This is one of the main reasons that a hypothesis, rather than a participant, serves as one of our analysis units. We investigated the possible hypothesis generation process based on the coded cognitive events.
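As a minimal illustration of the consensus step, the sketch below flags transcript segments where the two coders' code sets differ so they can be discussed. The data structure (segment identifier mapped to a set of codes) and the segment identifiers are hypothetical and stand in for whatever coding software was actually used.

    # Minimal sketch, assuming each coder's output is a dict: segment id -> set of event codes.
    coder_a = {"seg_01": {"Analyze data"}, "seg_02": {"Seek connections", "Analogy"}}
    coder_b = {"seg_01": {"Analyze data"}, "seg_02": {"Seek connections"}}

    def flag_discrepancies(a, b):
        """Return the segments where the two coders' code sets differ."""
        discrepancies = {}
        for seg in sorted(set(a) | set(b)):
            codes_a, codes_b = a.get(seg, set()), b.get(seg, set())
            if codes_a != codes_b:
                discrepancies[seg] = {"coder_a": codes_a, "coder_b": codes_b}
        return discrepancies

    print(flag_discrepancies(coder_a, coder_b))  # seg_02 would go to the consensus discussion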

Data analysis strategy

This study used the cognitive events and their aggregated frequencies to characterize the possible scientific hypothesis generation process. While analyzing the cognitive events, we considered the results at four levels: (1) each hypothesis generation as a unit, examining all hypotheses (n = 199) and the cognitive events for every hypothesis; (2) each participant as a unit and all participants (n = 16) together, examining all cognitive events per participant and across all participants; (3) the group of participants who used VIADS (n = 9) as a unit, examining all cognitive events they used; and (4) the group of participants who did not use VIADS (n = 7) as a unit, examining all cognitive events they used. Correspondingly, the results were organized at these four levels. We performed independent t-tests to compare the cognitive events between participants (a) in the VIADS and control groups and (b) between the experienced (3 participants, 36 hypotheses) and inexperienced clinical researchers (13 participants, 163 hypotheses). The study sessions of two additional participants (both inexperienced clinical researchers in the control group) were only partially recorded because of technical failures, so their data were excluded from the coding and analysis.
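As an illustration of this comparison, the sketch below runs an independent-samples t-test on per-hypothesis cognitive event counts using SciPy. The counts shown are placeholders, not the study data, and the same call would apply analogously to the experienced vs. inexperienced comparison.

    # Minimal sketch of the group comparison, assuming one count of cognitive events
    # per hypothesis in each group (placeholder values, not the study data).
    from scipy import stats

    events_per_hypothesis_viads = [4, 3, 6, 5, 4, 2, 7, 5, 3]     # placeholder counts
    events_per_hypothesis_control = [9, 7, 12, 6, 8, 5, 11]        # placeholder counts

    t_stat, p_value = stats.ttest_ind(events_per_hypothesis_viads,
                                      events_per_hypothesis_control)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")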

All hypotheses were rated by a panel of seven experts with extensive clinical research experience, using the same metrics for quality evaluation21. The brief version of the assessment instrument, which considers the significance, validity, and feasibility of each hypothesis33, was used. Significance was defined as the potential to have an impact on medical needs, reduce medical costs, improve effectiveness, or benefit the target population. Validity referred to the clinical and scientific soundness of a hypothesis. Feasibility was judged on whether testing the hypothesis was possible given the available resources and the scope of the work. Each dimension was assessed on a five-point Likert scale, with 5 as the highest rating (i.e., most significant, most valid, most feasible). We deemed a hypothesis invalid if three or more experts rated its validity as 1 (the lowest rating). We analyzed and report results both for all hypotheses and for valid hypotheses only.
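The validity rule described above can be expressed as a short sketch. The hypothesis identifiers and ratings below are placeholders, and the function name is an assumption; the rule itself (invalid when three or more of the seven experts give the lowest validity rating) follows the description above.

    # Minimal sketch of the panel validity rule, with placeholder ratings.
    def is_valid(validity_ratings, threshold=3, lowest=1):
        """Invalid when at least `threshold` experts give the lowest validity rating."""
        return sum(1 for r in validity_ratings if r == lowest) < threshold

    panel_ratings = {
        "H001": [4, 5, 3, 4, 2, 5, 4],   # valid
        "H002": [1, 2, 1, 1, 3, 2, 1],   # invalid: four experts rated validity 1
    }
    for hyp, ratings in panel_ratings.items():
        print(hyp, "valid" if is_valid(ratings) else "invalid")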

Ethics statement

The study was approved by the Institutional Review Board of Clemson University, South Carolina (IRB2020-056) and the Ohio University Institutional Review Board (18-X-192).

Results

Hypothesis generation framework

Our preliminary understanding of hypothesis generation is shown in Figure 1. Figure 2 is a refined, evolving version of this initial framework and guided the coding of cognitive events. As shown in Figure 2, the predominant cognitive events within the processing-evidence category were “Using analysis results” (30%), “Seeking connections” (23%), and “Analyze data” (20.81%). Table 1 presents the individual cognitive events used for all hypotheses and for valid hypotheses only. Nine female and seven male clinical researchers participated in the study, the majority of whom had < 2 years of study design and data analysis experience and < 5 publications as significant contributors. The participants’ full demographic data have been provided in a prior publication21. A power analysis for sample-size determination25, the actual power of the study, and the statistical analysis of our hypothesis quality assessment21 have also been provided in prior publications.

Figure 2. Cognitive process framework for scientific hypothesis generation in clinical research; the cognitive events with the highest percentages of use by clinical researchers are highlighted.

Table 1. Cognitive events used while generating data-driven hypotheses by clinical researchers. (The table is presented as an image in the original publication.)

Overall usage of cognitive event types during hypothesis generation

During the 2-hour study sessions, the 16 participants generated 199 hypotheses, with 163 originating from the inexperienced groups (Table 2). We used 20 distinct codes to represent 20 cognitive events and six code groups (Figure 2). In total, 1216 cognitive events were applied during the generation of the 199 hypotheses. On average, inexperienced clinical researchers in the control group applied 7.38 cognitive events per hypothesis (total number of cognitive events divided by total number of hypotheses). The inexperienced clinical researchers in the VIADS group used an average of 4.48 cognitive events per hypothesis (p < 0.001 versus inexperienced control), with the lowest standard deviation (SD, 2.43). Experienced clinical researchers used an average of 6.15 cognitive events per hypothesis (p < 0.01 versus inexperienced VIADS). Notably, the inexperienced clinical researchers in the control group used the highest average number of cognitive events (7.38), with the largest SD (5.02). These patterns held regardless of whether we considered all hypotheses or only those deemed valid (Table 2). The experienced participants generated approximately 10% more valid hypotheses than the inexperienced participants (72.22% vs. 63.19%). However, we were able to recruit only three experienced clinical researchers, so the results from the experienced groups must be interpreted cautiously. The three experienced participants used 246 cognitive events to generate 36 hypotheses. As these numbers were sufficient for analysis, we included the experienced groups only in evaluations that used hypotheses and cognitive events as analysis units.

Table 2.

Group-wise comparison of cognitive events used while generating hypotheses

                                                         All hypotheses   Valid #(%)     Invalid #(%)
# Hypotheses by different groups
  All participants                                       199              129 (64.82)    70 (35.18)
  Junior clinical researchers (n = 13)                   163              103 (63.19)    60 (36.81)
  Experienced clinical researchers (n = 3)               36               26 (72.22)     10 (27.78)
Aggregated total cognitive event counts by different groups
  All participants                                       1216             840 (69.08)    376 (30.92)
  Junior clinical researchers                            970              664 (68.45)    306 (31.55)
  Junior clinical researchers - Control (C)              450              315 (70.00)    135 (30.00)
  Junior clinical researchers - VIADS (V)                520              349 (67.12)    171 (32.89)
  Experienced clinical researchers                       246              176 (71.54)    70 (28.46)
Average cognitive events per participant per hypothesis
  Junior clinical researcher C/hypothesis (SD)           7.38 (5.02)      7.68 (5.21)    6.75 (4.66)
  Junior clinical researcher V/hypothesis (SD)           4.48 (2.43)#     4.59 (2.69)    4.28 (1.84)
  Experienced clinical researcher/hypothesis (SD)        6.15 (3.03)*     5.87 (3.03)    7.00 (3.06)

Note: SD, standard deviation; # p < 0.001 between junior C and junior V; *p < 0.01 between junior V and experienced.

Cognitive events comparison between VIADS and non-VIADS participants

Furthermore, we compared the percentages of cognitive event counts between the VIADS and non-VIADS groups among inexperienced clinical researchers (Figure 3). “Use analysis results” (31.3% vs. 27.1%, p < 0.001), “Seek connections” (25.4% vs. 17.8%, p < 0.001), and “Analyze data” (22.1% vs. 21.1%) were the events with the highest percentages. “Seek connections”, “Use analysis results”, and “Pause/think” (3.8% vs. 9.3%, p < 0.05) all showed statistically significant differences between the VIADS and control groups by t-tests. Our results indicate that the participants in the VIADS group registered higher event counts for “Preparation”, “Analyzing results”, and “Seeking connections”. Conversely, the control group exhibited greater event counts in categories such as “Needing further study”, “Inferring”, “Pausing”, “Using checklists”, and “Using PICOT.”

Figure 3. Comparison of cognitive events between the VIADS and control groups among inexperienced clinical researchers while generating hypotheses

Cognitive events comparison between experienced and inexperienced clinical researchers

We also examined the differences between experienced and inexperienced clinical researchers in the percentages of cognitive events they used (Figure 4). “Use analysis results” (31.7% vs. 29.4%, p < 0.01), “Seek connections” (27.6% vs. 21.9%, p < 0.01), and “Analyze data” (17.5% vs. 21.6%, p < 0.01) were the events with the highest percentages of use. The data suggest that experienced clinical researchers exhibited higher percentages for “Using analysis results”, “Seeking connections”, “Inferring”, and “Pausing”. Conversely, inexperienced clinical researchers demonstrated higher percentages for cognitive events such as “Preparation”, “Data analysis”, “Utilizing suggestions”, “Utilizing checklists”, and “Utilizing PICOT”.

Figure 4. Comparison of cognitive events between experienced and inexperienced clinical researchers while generating hypotheses

Summary of results

On average, the inexperienced VIADS group used fewer cognitive events to generate each hypothesis than the inexperienced control group (p < 0.001) and the experienced clinical researchers (p < 0.01, Table 2). The cognitive events used most frequently during hypothesis generation were “Use analysis results” (29.85%), “Seek connections” (23.03%), and “Analyze data” (20.81%) (Figure 2). The inexperienced VIADS group demonstrated a trend similar to that of the experienced clinical researchers (Figures 3 and 4).

Discussion

Results interpretation

Several findings of this study were notable. The experienced clinical researchers had 10% more valid hypotheses than the inexperienced clinical researchers (72.22% vs. 63.19%; Table 2), consistent with expectation and experience. To the best of our knowledge, this is the first quantifiable comparison between the two groups, although we acknowledge that the experienced group included very few participants, which limits the generalizability of the results. We emphasize, however, that in this analysis each hypothesis, not each participant, is the unit of analysis. Another interesting phenomenon concerns the average number of cognitive events used by the different groups: the inexperienced VIADS group used far fewer events per hypothesis than the control or experienced groups (4.48 vs. 7.38 vs. 6.15; Table 2) and exhibited the lowest SD. This is notable because it indicates that the VIADS group, despite comprising inexperienced clinical researchers, used fewer cognitive events to generate each hypothesis on average. This result supports our hypothesis that VIADS facilitates hypothesis generation, and it is consistent with our earlier finding that the VIADS group needed less time to generate each hypothesis on average21.

Our results show that clinical researchers devoted ≥ 70% of cognitive events to processing evidence during hypothesis generation. The top three cognitive events used by clinical researchers during hypothesis generation were “Using analysis results” (29.85%), “Seeking connections” (23.03%), and “Analyzing data” (20.81%; Figure 2).

Figure 3 presents the cognitive events and their distributions in the VIADS and control groups of inexperienced clinical researchers. The participants in the VIADS group showed a higher number of cognitive events for interpreting the results, and the participants in the control group showed a higher number of cognitive events for external help, such as checklists and PICOT, during hypothesis generation. Figures 3 and 4 show that the VIADS group exhibits cognitive event trends similar to those of the experienced group in terms of “Using analysis results” and “Seeking connections”:

  • Using analysis results:
    • VIADS vs. control: 31.35% vs. 27.11% (p < 0.001);
    • experienced vs. inexperienced: 31.71% vs. 29.38% (p < 0.01).
  • Seeking connections:
    • VIADS vs. control: 25.38% vs. 17.78% (p < 0.001);
    • experienced vs. inexperienced: 27.64% vs. 21.86% (p < 0.01).
  • Pause/think:
    • VIADS vs. control: 3.8% vs. 9.3% (p < 0.05).

The results indicate that VIADS may help inexperienced clinical researchers move in a direction that aligns more closely with that of experienced clinical researchers. A more carefully designed study is needed to support or refute such a statement; however, the current quantitative evidence on cognitive events and their distributions supports such a trend. Our between-group comparison of cognitive events found that the VIADS group spent a greater proportion of their time using the analysis results and seeking connections. These are both more advanced cognitive activities than spending time understanding the results, as indicated by “Pause/think”.

It should be emphasized that the control group used analytical tools with which they were very familiar, whereas the VIADS group had spent only an hour learning how to use VIADS before the study sessions. Therefore, the differences in cognitive events between the VIADS and control groups observed in this study could be greater than they appear; the difference could be larger in practice among researchers who have become more familiar with VIADS. Hypothesis generation is a complex cognitive process, and, while we obtained interesting insights into the cognitive processes involved, we are a long way from establishing causality based on cognitive events. Nevertheless, our results establish a baseline for understanding the hypothesis generation process within clinical research contexts. Despite the small sample size, the randomization of participants allowed us to identify some contributions of VIADS to the process. The ideal goal of research such as this is to reveal the causality of the hypothesis generation process, but this is not achievable without further work, as we must first be able to measure accurately the variables that contribute to the process.

Significance of the work

Our study established a fundamental understanding of the data-driven hypothesis generation process among clinical researchers, despite the complexity of this cognitive process. The cognitive framework and the quantitative evidence on the cognitive events used while clinical researchers generate hypotheses provide a critical foundation for understanding the cognitive mechanisms underlying the process. Our exploration provides a feasibility demonstration and baseline measurements of cognitive events to guide future studies examining scientific hypothesis generation, a critical step in all research projects. Although VIADS is a brand-new tool, our results show that it can help inexperienced clinical researchers generate hypotheses faster, more consistently, and with fewer cognitive events. VIADS is a data analytical tool, not a hypothesis generation tool. With a specifically designed hypothesis-facilitating tool in a more natural use environment, inexperienced clinical researchers would likely generate hypotheses not only faster but also with higher quality.

Comparison with other studies

Patel et al. have explored medical reasoning through diagnosis, and this work significantly influenced the design of the current study4, 5, 17, 20. From their studies, we know that there were differences in the reasoning processes and thinking steps between experienced and inexperienced clinicians in medical diagnosis6, 16, 20, 37, 38. Therefore, we separated the participants into experienced and inexperienced groups before assigning them randomly to the VIADS or control groups. The findings of this study mostly align with those of Patel et al. despite the different settings: medical diagnosis versus scientific hypothesis generation in clinical research. The experienced participants used fewer cognitive events than inexperienced participants on average, and the VIADS group used the lowest average number of cognitive events despite comprising inexperienced clinical researchers, which suggests a role for VIADS in the process.

Our study was informed by a study by Klahr and Dunbar published in 19883. In their study, the participants were taught to use an electronic device and then had to figure out a previously unencountered function of the device; this task was used to study hypothesis generation and reasoning with iterative testing. The authors concluded that searching memory and using results from prior experiments are critical for hypothesis generation. The present study differed from that of Klahr and Dunbar in two primary respects: (1) the tasks performed by the participants and (2) the types of hypotheses generated. Klahr and Dunbar’s study had one or more correct answers during hypothesis generation, so the participants most likely used convergent thinking4, and their study used a simulated lab environment to assess scientific thinking. Conversely, the scientific hypothesis generation in our study was an open-discovery task without correct answers, so our participants likely used more divergent thinking during the process4. The hypothesis generation process in our study was substantially messier and more unpredictable, making it more challenging to evaluate consistently compared with the well-defined problem-solving tasks in the Klahr and Dunbar study.

Limitations and challenges

One of the main limitations of the study was that only three experienced clinical researchers, who generated 36 hypotheses, participated. We compared the inexperienced and experienced groups on all hypotheses and the cognitive events used when each hypothesis was the analysis unit, i.e., the cognitive events used to generate a hypothesis, rather than each participant. However, we could not compare the cognitive events between the VIADS and control groups among the experienced clinical researchers. We made similar efforts to recruit inexperienced and experienced clinical researchers via comparable platforms; however, the recruitment results were considerably worse in the experienced group. This indicates that experienced clinical researchers may need different engagement strategies, or that scientific hypothesis generation is not a high-priority area for them. From our experience of conducting the study sessions and from interactions with and feedback from non-participants, it seems that a better hypothesis-generation process could still benefit at least some experienced clinical researchers. We acknowledge that the small sample sizes limit the generalizability of the study results, and different recruitment strategies will be needed to recruit experienced clinical researchers successfully in future studies.

Another limitation of the study was the type of information that could be captured via the think-aloud protocol. We acknowledge that we captured only the verbalized events during the study sessions, which represent a subset of the conscious process and a small portion of the real cognitive process during scientific hypothesis generation; the nonverbal thinking process was not captured. Our coding, aggregation, and analysis are based on the captured events. In addition, the audio recordings of two participants were partial because of a technical failure. One mitigation strategy would be to conduct a test recording for every participant, which can be particularly critical if a new device is introduced in the middle of the study. Moreover, log file analysis may provide additional and valuable insights into how the analysis tools (VIADS and others) were used during the hypothesis generation process.

Future work

This study suggests several directions for future research. First, we could explore the sequential patterns of cognitive events to gain additional insights into the scientific hypothesis generation process. Second, juxtaposing the frequencies of cognitive events with the quality evaluation results of the generated hypotheses might illuminate potential patterns, further enriching our understanding of the process. Third, a larger-scale study with a larger participant sample, situated in a more natural environment, would enhance the robustness and generalizability of our findings. Finally, we would like to explore how to improve the quality of hypotheses generated within a more natural use environment, now that we have obtained a clear demonstration that VIADS enhances the efficiency of hypothesis generation among inexperienced clinical researchers within a 2-hour window.

Conclusions

Experienced clinical researchers generated a higher percentage of valid hypotheses than inexperienced clinical researchers. The inexperienced clinical researchers who used VIADS required the fewest cognitive events per hypothesis, with the lowest standard deviation, compared with experienced clinical researchers and with inexperienced clinical researchers not using VIADS. This efficiency is further underscored by the VIADS group taking the least average time to generate a hypothesis. Notably, the inexperienced VIADS cohort mirrored the trend observed in experienced clinical researchers in terms of cognitive event distribution. These findings indicate that VIADS may provide better guidance than other analytical tools during hypothesis generation. Further studies, ideally on a larger scale and in a more natural environment, could offer a deeper understanding of the process and provide evidence to improve it. Our research provides foundational metrics on cognitive event measures during hypothesis generation in clinical research, demonstrating the viability of conducting such experiments in a simulated setting and unraveling the intricacies of the hypothesis generation process through them.

Acknowledgments

We sincerely thank all participants and the expert panel for their precious time, courage, and expertise in helping us better understand this critical but little-known hypothesis generation process. This project received support from the National Library of Medicine (R15LM012941) and was partially funded by the National Institute of General Medical Sciences of the National Institutes of Health (P20 GM121342). The intellectual environment and research training resources provided by the NIH/NLM T15 SC BIDS4Health (T15LM013977) enriched this work.


References

  • 1. Supino P, Borer J. Principles of research methodology: A guide for clinical investigators. 2012.
  • 2. Gallin JI, Ognibene FP. Principles and Practice of Clinical Research. Burlington: Elsevier Science & Technology; 2007.
  • 3. Klahr D, Dunbar K. Dual space search during scientific reasoning. Cognitive Science. 1988;12(1):1–48.
  • 4. Holyoak KJ, Morrison RG, editors. The Oxford Handbook of Thinking and Reasoning. New York, NY: Oxford University Press; 2012.
  • 5. Joseph G-M, Patel VL. Domain knowledge and hypothesis generation in diagnostic reasoning. Medical Decision Making. 1990;10:31–46. doi: 10.1177/0272989X9001000107.
  • 6. Arocha J, Patel V, Patel Y. Hypothesis generation and the coordination of theory and evidence in novice diagnostic reasoning. Medical Decision Making. 1993;13:198–211. doi: 10.1177/0272989X9301300305.
  • 7. Spangler S, Wilkins AD, Bachman BJ, Nagarajan M, Dayaram T, Haas P, et al. Automated hypothesis generation based on mining scientific literature. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY: Association for Computing Machinery; 2014. p. 1877–86.
  • 8. Kehrer J, Ladstädter F, Muigg P, Doleisch H, Steiner A, Hauser H. Hypothesis generation in climate research with interactive visual data exploration. IEEE Transactions on Visualization and Computer Graphics. 2008;14(6):1579–86. doi: 10.1109/TVCG.2008.139.
  • 9. Raghunandan D, Cui Z, Krishnan K, Tirfe S, Shi S, Shrestha TD, et al. Lodestar: Supporting rapid prototyping of data science workflows through data-driven analysis recommendations. Information Visualization. 2024;23(1):21–39.
  • 10. Spangler S. Accelerating discovery: Mining unstructured information for hypothesis generation. 2016.
  • 11. Kitano H. Nobel Turing Challenge: creating the engine for scientific discovery. npj Systems Biology and Applications. 2021;7(1):29. doi: 10.1038/s41540-021-00189-3.
  • 12. Holyoak KJ, Morrison RG, editors. The Cambridge Handbook of Thinking and Reasoning. New York: Cambridge University Press; 2005.
  • 13. Sprenger AM, Dougherty MR, Atkins SM, Franco-Watkins AM, Thomas RP, Lange N, et al. Implications of cognitive load for hypothesis generation and probability judgment. Front Psychol. 2011;2:129. doi: 10.3389/fpsyg.2011.00129.
  • 14. Klauer KC, Stegmaier R, Meiser T. Working memory involvement in propositional and spatial reasoning. Thinking & Reasoning. 1997;3(1):9–47.
  • 15. Dunbar K, Fugelsang J. Causal thinking in science: How scientists and students interpret the unexpected. In: Gorman M, Kincannon A, Gooding D, Tweney R, editors. New Directions in Scientific and Technical Thinking. Mahwah, NJ: Erlbaum; 2004. p. 57–9.
  • 16. Patel VL, Groen GJ, Arocha JF. Medical expertise as a function of task difficulty. Memory & Cognition. 1990;18(4):394–406. doi: 10.3758/bf03197128.
  • 17. Patel VL, Arocha JF, Zhang J. Thinking and reasoning in medicine. In: Holyoak KJ, Morrison RG, editors. The Cambridge Handbook of Thinking and Reasoning. New York: Cambridge University Press; 2005. p. 727–50.
  • 18. Patel VL, Groen GJ. The general and specific nature of medical expertise: A critical look. In: Ericsson A, Smith J, editors. Towards a General Theory of Expertise: Prospects and Limits. Cambridge, UK: Cambridge University Press; 1991. p. 93–125.
  • 19. Henry S, McInnes BT. Literature based discovery: Models, methods, and trends. J Biomed Inform. 2017;74:20–32. doi: 10.1016/j.jbi.2017.08.011.
  • 20. Patel V, Groen G. Knowledge based solution strategies in medical reasoning. Cognitive Science. 1986;10:91–116.
  • 21. Jing X, Cimino JJ, Patel VL, Zhou YC, Shubrook JH, De Lacalle S, et al. Data-driven hypothesis generation among inexperienced clinical researchers: A comparison of secondary data analyses with visualization (VIADS) and other tools. Journal of Clinical and Translational Science. 2023;8(1):e13. doi: 10.1017/cts.2023.708.
  • 22. Sybrandt J, Shtutman M, Safro I. Large-scale validation of hypothesis generation systems via candidate ranking. Proc IEEE Int Conf Big Data. 2018;2018:1494–503. doi: 10.1109/bigdata.2018.8622637.
  • 23. Sybrandt J, Shtutman M, Safro I. Moliere: Automatic biomedical hypothesis generation system. ACM; 2017.
  • 24. Callahan A, Dumontier M, Shah NH. HyQue: evaluating hypotheses using Semantic Web technologies. Journal of Biomedical Semantics. 2011;2(Suppl 2):S3. doi: 10.1186/2041-1480-2-S2-S3.
  • 25. Jing X, Patel VL, Cimino JJ, Shubrook JH, Zhou Y, Liu C, et al. The roles of a secondary data analytics tool and experience in scientific hypothesis generation in clinical research: Protocol for a mixed methods study. JMIR Res Protoc. 2022;11(7):e39414. doi: 10.2196/39414.
  • 26. Jing X, Emerson M, Masters D, Brooks M, Buskirk J, Abukamail N, et al. A visual interactive analysis tool for filtering and summarizing large data sets coded with hierarchical terminologies (VIADS). BMC Med Inform Decis Mak. 2019;19(31). doi: 10.1186/s12911-019-0750-y.
  • 27. Jing X, Cimino JJ. A complementary graphical method for reducing and analyzing large data sets: Case studies demonstrating thresholds setting and selection. Methods of Information in Medicine. 2014;53:173–85. doi: 10.3414/ME13-01-0075.
  • 28. Emerson M, Brooks M, Masters D, Buskirk J, Abukamail N, Liu C, et al. Improved visualization of hierarchical datasets with VIADS. In: AMIA Annual Symposium; Nov 3–7, 2018; San Francisco, CA. p. 1956.
  • 29. Jing X, Cimino JJ. A complementary graphical method for reducing and analyzing large data sets: Case studies demonstrating thresholds setting and selection. Methods Inf Med. 2014;53:173–85. doi: 10.3414/ME13-01-0075.
  • 30. Jing X, Cimino JJ. Graphical methods for reducing, visualizing and analyzing large data sets using hierarchical terminologies. In: AMIA Annual Symposium; 2011; Washington, DC. p. 635–43.
  • 31. Jing X, Patel VL, Cimino JJ, Shubrook JH, Zhou YC, Draghi B, et al. A visual analytic tool (VIADS) to assist the hypothesis generation process in clinical research: Mixed methods usability study. JMIR Human Factors. 2023;10:e44644. doi: 10.2196/44644.
  • 32. Jing X, Cimino JJ, Patel VL, Zhou YC, Shubrook JH, Liu C, et al. Data-driven hypothesis generation in clinical research: What we learned from a human subject study? Medical Research Archives. 2024;12(2). doi: 10.18103/mra.v12i2.5132.
  • 33. Jing X, Zhou Y, Cimino J, Shubrook J, Patel V, De Lacalle S, et al. Development, validation, and usage of metrics to evaluate clinical research hypothesis quality. BMC Medical Research Methodology. doi: 10.1186/s12874-025-02460-1.
  • 34. Pruzan P. Research Methodology: The Aims, Practices and Ethics of Science. Springer International Publishing; 2016.
  • 35. Hicks CM. Research Methods for Clinical Therapists: Applied Project Design and Analysis. 1999.
  • 36. Foster JG, Rzhetsky A, Evans JA. Tradition and innovation in scientists’ research strategies. American Sociological Review. 2015;80(5):875–908.
  • 37. Patel V, Groen G, Patel Y. Cognitive aspects of clinical performance during patient workup: The role of medical expertise. Advances in Health Sciences Education. 1997;2:95–114. doi: 10.1023/A:1009788531273.
  • 38. Kushniruk A, Patel V, Marley A. Small worlds and medical expertise: implications for medical cognition and knowledge engineering. Int J Med Inform. 1998;49:255–71. doi: 10.1016/s1386-5056(98)00044-6.
