Author manuscript; available in PMC: 2011 Oct 1.
Published in final edited form as: Appl Ergon. 2010 Feb 26;41(6):812–821. doi: 10.1016/j.apergo.2010.01.009

Validation of a Trust in Medical Technology Instrument

Enid Montague
PMCID: PMC2893257  NIHMSID: NIHMS177038  PMID: 20189163

Abstract

A patient’s trusting attitude towards technology used in their medical care may be a predictor of acceptance or rejection of the technology and, by extension, the physician. The aim of this study was to rigorously determine the validity of an instrument for measuring patients’ trust in medical technology. Instrument validity was established using a framework that draws on both test based and data based evidence and evaluates the instrument on content, substantive, structural, generalizability, external, and consequential aspects of validity. The results of the current study show that the instrument is reliable and valid for assessing a patient’s trust in obstetric medical technology.

Keywords: Trust, Health care, Patients, Technology, Instrument Validation, Sociotechnical Systems

1. Introduction

In the medical domain, understanding the work system effects of patient-physician interpersonal trust relationships (Pearson and Raeke 2000) and how to design medical technologies for lay users (Martin, Norris et al. 2008) have been identified as important research questions. Patient safety researchers have also advocated a sociotechnical systems approach to solving health system problems, one that includes patients as members of the work system (Buckle, Clarkson et al. 2006). Developing tools to collect empirical data about patients’ perceptions of their health care system is the first step towards understanding the relationship between patients’ perceptions and the functioning of the work system. Valid measurement tools are important as we begin to think about how to design better, safer, and more efficient work systems that include multiple types of users (patients and care providers). This study describes the validation of a tool to measure patients’ trust in medical technology, with the future goal that this tool will be used to characterize patient-physician relationships and patients’ request, acceptance, and rejection of technologies used in their care.

The development of the trust in medical technology instrument involved defining variables, modeling observations, and evaluating the measures. Work with variables involved defining constructs within a theoretical framework, identifying indicators of the construct, and mapping those indicators onto the theoretical framework. Modeling the observations involved specifying rules for converting observations into numbers, creating a mathematical model of how those numbers could be combined to create measures, and specifying a frame of reference for interpreting measures. The evaluation of measures was completed by reliability and validity analyses that ensured measures were stable across multiple contexts and consistent with the framework within which they were created.

1.1. Trust

Trust, a person’s level of belief in a person or thing, is a fundamental attribute of all relationships. A variety of trust relationships exist: trust between two or more people (interpersonal trust) (Larzelere and Huston 1980), a person’s trust in a system or institution (social trust) (Castelfranchi and Falcone 2002), and a person’s trust in a technology or device (technological trust) (Muir 1994). Humans use metrics of trust to determine which people they form relationships with, which institutions they want to be part of, and which technologies they use.

Interpersonal trust has been operationalized as comprising the factors predictability, reliability, and dependability (Larzelere and Huston 1980), as stages of predictability, dependability, and faith (Rempel, Holmes et al. 1985), and as a behavioral model (DeFuria 1996). DeFuria (1996) developed the interpersonal trust surveys from a behavioral model of interpersonal trust, which defines the behaviors that lead to increased or decreased trust between individuals (see Table 1). DeFuria (1996) argues that trust situations always involve vulnerability, risk, and expectations of the other person’s trustworthy motivation and competence. The concept of interpersonal trust has been explored across domains, though differences exist in how the construct is measured and defined in work systems such as education (Tschannen-Moran and Hoy 2000) and health care (Pearson and Raeke 2000).

Table 1.

Trust enhancing and reducing behaviors

Trust Enhancing Behaviors | Trust Reducing Behaviors
Sharing relevant information | Distorting, withholding, or concealing real motives
Reducing controls | Falsifying relevant information
Allowing for mutual influence | Attempting to control or dominate
Clarifying mutual expectations | Obscuring, distorting, or avoiding the discussion of mutual expectations
Meeting expectations | Not meeting the trusting individual’s expectations of performance or behavior
 | Attempting to evade responsibility for behavior

Table from DeFuria, G. L. (1996). Interpersonal trust surveys. San Francisco: Jossey-Bass.

1.2. Patient trust in physician

Patient trust in physician is a multidimensional construct that has different definitions between and within disciplines. Thom, Kravitz et al. (2002) define patient trust in physician as a set of beliefs or expectations that a care provider will perform in a certain way, while Pearson and Raeke (2000) define trust as an emotional characteristic, where patients have a comforting feeling of faith in or dependence on a care provider’s intentions. Several scales have been developed to measure patients’ trust in their care provider, but none have explored the construct as a sociotechnical construct (Kao, Green et al. 1998; Thom, Ribisl et al. 1999; Pearson and Raeke 2000; Thom, Kravitz et al. 2002; Zheng, Hall et al. 2002; Boehm 2003). Pearson and Raeke’s (2000) review found competence, compassion, privacy, confidentiality, reliability, dependability, and communication to be the most common factors of patient trust in physician. Satisfaction is sometimes conflated with trust, as it is also considered an important measure of quality in health systems; however, Thom, Ribisl et al. (1999) found patient trust in physician to be distinct from patient satisfaction. Patient trust in physician has been explored in relation to patient outcomes and behaviors such as adhering to medical advice, malpractice litigation, and seeking health care services. While patient trust in physician has been found to be an important predictor of patient health and quality outcomes, some researchers hypothesize that changes in health care services and practices may be undermining the trust relationship between patients and physicians (Pearson and Raeke 2000). Other researchers argue that enhancing patient trust in physicians may be a solution to existing problems in the provision of health care (Boehm 2003). In health care systems the concept of interpersonal trust cannot be separated from organizational, social, and technological trust (Montague 2009). To understand how changes in health care service (such as technology implementation) might affect trust, and how patient trust in physician may be an effective solution for health system problems, a theoretical understanding of trust in health systems is needed. Therefore, measures of trust are needed at the interpersonal, social, and technological levels.

1.3 Trust in technology

Researchers in consumer studies, psychology, engineering, and information systems have looked at trust relationships between users and technologies. Trust and technology scholarship has largely focused on trust between users and automation technologies (Lee and Moray 1994; Muir 1994; Muir and Moray 1996; Parasuraman and Riley 1997; Bisantz and Seong 2001; Hoc 2000; Lewandowsky, Mundy et al. 2000; Moray 2000; Sheridan 2002; Dzindolet, Peterson et al. 2003; Lee and See 2004) and trust between users and internet technologies (Basso, Goldberg et al. 2001; Corbitt, Thanasankit et al. 2003; Grabner-Krauter and Kaluscha 2003; McKnight, Choudhury et al. 2003; Kong and Hung 2006; McKnight and Choudhury 2006; Awad and Ragowsky 2008). This paper explores patients’ trust in medical technology and the validation of an instrument to measure patients’ trust in medical technology.

1.4 Trust in medical technology

Patient trust in medical technology may be an important factor in functional work systems, particularly as health care work systems move towards higher reliance on and use of medical technologies. Understanding patients’ trust in medical technologies may also provide insight into optimized patient-physician relationships. Trusting attitudes towards medical technologies have been explored from both provider and patient perspectives. Moffa and Stokes (1997) explored health worker trust in medical expert systems to gain insight into the importance of work system domain in users’ formation of trust in technologies. Kjerulff, Pillar et al. (1992) also looked at health workers’ trust in medical technologies in their study of nurses’ technology anxiety, or fear of working with medical technology. They found trust in medical technologies to be related to the department nurses worked in and to attitudes about their work, such as satisfaction, stress, and interpersonal relationships.

Trusting attitudes towards medical technologies have also been researched from the patient’s perspective. Timmons, Harrison et al. (2008) conducted a study of trust in medical technologies in which the primary research question was ‘how do lay users come to trust automatic external defibrillators?’. They found that when confronted with an unfamiliar medical technology, users’ trust in the technology is constructed through a combination of trust in the technology, people, and institutions. Montague, Kleiner et al. (2009) developed an empirical model of the construct trust in medical technology for patients. They found trust in technologies in medical domains to be distinct from trust in technologies in other domains.

Trust in medical technology has also been explored as a social measure. Calnan, Montaner et al. (2005) conducted a national survey to assess public attitudes about a variety of innovative health care technologies. They found general public ambivalence about new medical technologies. However, respondents who reported consistently negative responses regarding new medical technologies were more likely to report distrusting attitudes about science, health care, and care providers. Calnan, Montaner et al.’s (2005) findings provide some evidence for a sociotechnical systems understanding of the construct. Individual differences in attitudes about medical technologies have been explored as well; Groeneveld, Sonnad et al. (2006) found that people of differing ethnic groups may have differing attitudes about using innovative medical technologies. Specifically, they found that Whites reported higher degrees of acceptance of medical technology innovations than Blacks. Similar sociocultural differences have been reported in patient trust in care provider and institution, which supports the belief that patient trust in medical technology may be related to other types of patient trust (Rose, Peters et al. 2004).

1.5 Patient trust

Studies of patient interpersonal relationships have alluded to a missing dimension of the construct patient trust in care provider. An observational study examined patients’ perceptions of their physician’s interpersonal manner through ratings such as satisfaction, trust, knowledge of the patient, and autonomy support (Franks, Fiscella et al. 2005). The researchers found a relationship between patients’ perceptions of their physician and health status decline. Through multilevel analyses, the researchers concluded that this relationship is not a physician effect but may be the result of another confounding variable (Franks, Fiscella et al. 2005). Patients’ trust in technology may be a variable that provides additional insight into the factors that predict patient trust or distrust in their physician. Other studies have looked at patients’ assessments of their care provider and linked these ratings with changes in health status, patient satisfaction, and adherence to advice (Franks, Fiscella et al. 2005). Franks, Fiscella et al. (2005) argue that patients are likely to report being more satisfied with their care provider if they have better health status and conclude that these results may not be related to care provider traits, indicating a need for more dynamic measures of physician traits. While Franks, Fiscella et al. (2005) recommend assessment of psychological and personality traits to understand patient trust in care provider, attitudes towards technology would be an insightful addition.

In a study of patient trust in care providers, Fung et al. (2005) found that patients report a preference for physicians with technical qualities as opposed to interpersonal qualities, part of an array of research on the trust relationship between physician and patient (Pearson and Raeke 2000; Fung, Elliott et al. 2005; Tarn, Meredith et al. 2005). It is not known whether characteristics of technologies used in patient care affect a patient’s trust in medical technologies. The effects of a patient trusting or not trusting medical technologies on patient health outcomes and assessments of quality are also unknown. The validated trust in medical technology instrument presented in this study provides a means to explore the role of patient trust in medical technologies in patient-physician relationships.

In a comprehensive review (Pearson and Raeke 2000), researchers describe a need for more research and better ways of measuring the relationship between patients and physicians. Insights into the relationship between interpersonal trust and trust in technology in health care work systems can be explored with the instrument described in this study and will provide an important contribution to health systems research.

2. Methods

The trust in medical technology instrument (TMT) was developed using the Rasch instrument development framework and the Crocker and Algina (1986) methods as described by Wolfe and Smith (2007). Items for the instrument were developed by the authors in a series of pilot research studies on patients’ trusting attitudes towards medical technology (Montague, Wolfe et al. 2008; Montague 2009; Montague, Kleiner et al. 2009). The steps were as follows:

  • 1. Identified that the primary purpose of this measure would be for research: to assess how trust in medical technology may relate to other patient variables in health care work systems.

  • 2. Identified behaviors that represented the construct and defined the domain (Montague, Kleiner et al. 2009).

  • 3. Organized a set of test specifications (Montague, Wolfe et al. 2008).

  • 4. Assembled an initial pool of items (Montague, Kleiner et al. 2009).

  • 5. Pilot testing. Items were reviewed (and revised when necessary) (Montague and Wolfe unpublished manuscript).

  • 6. Field testing. Held preliminary item tryouts.

  • 7. Determined statistical properties of item scores and, when appropriate, eliminated items that did not meet pre-established criteria.

  • 8. Designed and conducted reliability and validity studies for the final form of the test.

  • 9. Developed guidelines for administration, scoring, and interpretation of the test scores.

The validation study presented in this manuscript describes steps one and six through nine. Steps two through five have already been described in previous publications (Montague, Wolfe et al. 2008; Montague 2009; Montague, Kleiner et al. 2009). The Crocker and Algina (1986) approach to instrument development was followed and items were developed using methods identified by Jian, Bisantz et al. (2000). The instrument validation process is based on the Messick (1995) validation framework and Rasch validation methods (Wolfe and Smith 2007).

During the validation phase of instrument development, data were collected from participants with experiences with electronic fetal monitors. To determine the validity of the TMT instrument, analyses were conducted to determine the measure’s dimensionality, reliability, fit, and rating scale.

2.1. Sample

The TMT instrument was designed in a checklist format, which required the respondent to have had a specific experience with the specified medical technology. Participants were all women who had given birth and used the electronic fetal monitor. Twenty mothers at each age from 18 to 35 (i.e., twenty 18-year-olds, twenty 19-year-olds, twenty 20-year-olds, etc.) were randomly invited to complete the instrument through an email database of women with children, for a total of 360 invited participants. Additionally, an advertisement was placed in an online community for new mothers. One hundred and one participants completed the instrument and some provided additional qualitative responses for each item; data collection was closed after 101 responses were obtained. The average number of children per participant was 1.78. Eighty-four women reported having health insurance, 16 did not, and 1 declined to respond to the question. The average age of the participants was 25.28 years, and the average years of education was 14.53. Participants identified as white or Caucasian (n=82), African, black, African American or Caribbean (n=8), Asian (n=3), Hispanic (n=1), or declined to select a racial or ethnic group (n=7). To obtain a representative range of childbirth experiences and levels of trust with technology, participants were recruited who had used the electronic fetal monitor in a variety of contexts, including home births, in hospitals, intermittently, and continuously. Participants also used a variety of primary care providers for their births, including physicians, registered nurses, nurse midwives, doulas, and rotating physicians.

2.2 Measurement model

Item Response Theory (IRT), sometimes referred to as latent trait theory, is a measurement model that is an alternative to true score test theory. IRT assumes a link between a participant’s response to a test item and the construct being measured (trust in medical technology). IRT makes stronger assumptions than classical test theory and in many cases provides correspondingly stronger findings (Kline 2005). IRT provides several improvements in scaling items and people (Hambleton, Swaminathan et al. 1991) and is better at detecting item bias among different groups, such as gender and ethnicity (Hogan 2003). One drawback of this method is that IRT requires complicated estimation when models other than the basic Rasch model are used (Hambleton, Swaminathan et al. 1991). IRT was used with the Rasch Rating Scale model in this study.

The Rasch Rating Scale model was used in this study because the TMT collects Likert-type responses and the model requires smaller sample sizes for validation (Wolfe and Smith 2007). As illustrated in the equation below, the model specifies the probability (π_nix) that a specific respondent n will rate a particular item i in a specific rating scale category x as a logistic function of the respondent’s level of the attitude (θ_n) and the item’s representation of the construct (δ_i). The model also includes threshold parameters (τ_k) that depict the difficulty of moving from one scoring category to the next on a polytomous (multiple response option) item. It is assumed that the distance between category thresholds is constant across items that share the same rating scale. The WINSTEPS (Linacre 2002) software package was used to estimate the parameters of the model, which are reported on a single linear continuum in log-odds units (logits) (Smith 2000).

\pi_{nix} = \frac{\exp\left[\sum_{j=0}^{x} D(\theta_n - \delta_i - \tau_j)\right]}{\sum_{k=0}^{m} \exp\left[\sum_{j=0}^{k} D(\theta_n - \delta_i - \tau_j)\right]}
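For illustration, the following minimal Python sketch computes the category probabilities implied by this model for a single person-item encounter. The function name, the example values, and the scaling constant D = 1.0 are assumptions for the sake of the example; this is not the WINSTEPS estimation routine.

```python
import numpy as np

def rating_scale_probabilities(theta, delta, tau, D=1.0):
    """Category probabilities under the Rasch Rating Scale model.

    theta : person measure (logits)
    delta : item difficulty (logits)
    tau   : sequence of threshold parameters tau_1..tau_m (tau_0 is fixed at 0)
    D     : scaling constant (assumed 1.0 for the logistic metric)
    """
    tau = np.concatenate(([0.0], np.asarray(tau, dtype=float)))   # tau_0 = 0
    # Cumulative sums of D*(theta - delta - tau_j) for x = 0..m
    kernels = np.cumsum(D * (theta - delta - tau))
    numerators = np.exp(kernels)
    return numerators / numerators.sum()

# Hypothetical example: a trusting respondent (theta = 2.0) on an easy-to-endorse
# item (delta = -0.5) with thresholds at -1.7 and +1.7 logits.
probs = rating_scale_probabilities(theta=2.0, delta=-0.5, tau=[-1.7, 1.7])
print(dict(zip(["disagree", "neutral", "agree"], probs.round(3))))
```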

2.3. Dimensionality

A principal component analysis was performed to evaluate the dimensionality of the instrument. The residual components’ eigenvalues and the percentage of variance accounted for by each component were calculated. In accordance with Kaiser’s criterion, only components with an eigenvalue greater than 1.00 were retained, and components with four or more loadings with absolute values greater than .60 were selected as reliable components for interpretation (Kaiser 1970; Kline 2005). Items with absolute loadings greater than .30 were considered part of the construct. Each dimension was then interpreted based on apparent similarities in the content of the items loading on that dimension. Principal components analysis and scree plots were examined to ensure that the model’s assumption of a unidimensional latent trait was not violated.
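As a rough illustration of this screening procedure, the sketch below applies Kaiser's criterion and the loading thresholds described above to a respondents-by-items response matrix. The function and variable names are hypothetical, and this is a sketch of the general technique rather than the analysis script used in the study.

```python
import numpy as np

def kaiser_screen(responses, retain_loading=0.60, member_loading=0.30):
    """Principal components screen with Kaiser's criterion.

    responses : 2-D array, rows = respondents, columns = items
    Returns eigenvalues, percent variance, loadings, and a mask of
    components that look interpretable under the stated thresholds.
    """
    R = np.corrcoef(responses, rowvar=False)           # item correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]                  # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 0.0, None)              # guard tiny negative values
    pct_variance = 100 * eigvals / eigvals.sum()
    loadings = eigvecs * np.sqrt(eigvals)              # component loadings

    retained = eigvals > 1.0                           # Kaiser's criterion
    # A component is treated as interpretable if it has 4+ loadings >= .60;
    # items loading >= .30 would then be counted as part of that component.
    interpretable = (np.abs(loadings) >= retain_loading).sum(axis=0) >= 4
    return eigvals, pct_variance, loadings, retained & interpretable
```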

2.4. Reliability

Person and item reliability analyses were conducted in WINSTEPS; in WINSTEPS, person reliability is equivalent to test reliability. Low person reliability indicates a narrow range of person measures or too few items; low item reliability indicates a narrow range of item measures or a small sample size (Linacre 2002). High reliability (of persons or items) means that there is a high probability that persons (or items) estimated with high measures actually do have higher measures than persons (or items) estimated with low measures (Linacre 2002).
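The sketch below shows the standard Rasch separation-reliability computation that underlies these indices, assuming person (or item) measures and their standard errors are available from the calibration. It is offered for illustration only, not as WINSTEPS output; the "real" versus "model" distinction reported later in Table 12 reflects different treatments of the standard errors.

```python
import numpy as np

def separation_reliability(measures, standard_errors):
    """Rasch-style reliability for a set of person (or item) measures.

    Reliability is the proportion of observed measure variance that is not
    attributable to measurement error:
        R = (SD^2_observed - mean(SE^2)) / SD^2_observed
    """
    measures = np.asarray(measures, dtype=float)
    se = np.asarray(standard_errors, dtype=float)
    observed_var = measures.var(ddof=1)     # variance of estimated measures
    error_var = np.mean(se ** 2)            # average squared standard error
    return max(0.0, (observed_var - error_var) / observed_var)
```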

2.5. Fit

Item fit indices were calculated to confirm whether the items identify the variable targeted for measurement (trust in medical technology). Person fit indices verified whether participants responded to the items in the manner predicted by the model. Item and person fit indices were calculated using the WINSTEPS software (Linacre 2002).

First, point-measure correlations (PTMEA CORR., rpm) between the observations on an item and the persons’ estimated levels of trust were examined (Linacre 2002). Point-measure correlations over .5 were considered good, while values of .3 or .4 required scrutiny of the item. For the weighted infit mean-square statistic (INFIT MNSQ), values less than two but close to one were acceptable, and values above 1.5 were considered too high. Infit MNSQ statistics are chi-square based fit statistics that are more sensitive to patterns in observations, whereas outfit z-standardized (OUTFIT ZSTD) statistics, also chi-square based, are more sensitive to unexpected observations by persons on items (Linacre 2002). Items whose OUTFIT ZSTD absolute value exceeded two were flagged for further examination (Smith 2000). Items with unweighted MNSQ values under one were not flagged, because such scores indicate overfit, meaning the model was predicting better than expected. Values around 1.4 were accepted, while values greater than 1.4 were scrutinized. Scrutinized items were reassessed qualitatively for discrepancies, such as wording, format, or item position, that might cause the item to function differently from other items.
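For readers unfamiliar with these indices, the sketch below illustrates how a point-measure correlation and infit/outfit mean-squares are conventionally computed for one item, assuming model-expected scores and variances are available from the fitted Rasch model. It mirrors the standard definitions rather than the WINSTEPS implementation, and the argument names are illustrative.

```python
import numpy as np

def item_fit_statistics(observed, expected, variance, person_measures):
    """Illustrative Rasch fit statistics for one item.

    observed        : responses x_n to the item (one per person)
    expected        : model-expected scores E_n
    variance        : model variances W_n of the responses
    person_measures : estimated person measures (for the point-measure corr.)
    """
    observed = np.asarray(observed, float)
    expected = np.asarray(expected, float)
    variance = np.asarray(variance, float)

    residuals = observed - expected
    z2 = residuals ** 2 / variance                           # squared standardized residuals

    outfit_mnsq = z2.mean()                                  # unweighted mean-square
    infit_mnsq = (residuals ** 2).sum() / variance.sum()     # information-weighted mean-square

    # Point-measure correlation: observations vs. person measures
    ptmea = np.corrcoef(observed, person_measures)[0, 1]
    return ptmea, infit_mnsq, outfit_mnsq
```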

2.6. Rating scale analysis

A rating scale analysis was conducted to examine the use of the rating scale categories for polytomous items. The analysis provided additional information about whether respondents used the response structure of the TMT in the manner intended (Wolfe and Smith 2007). Generalized p-values indicated the degree to which the answers for the item are difficult relative to the total available points (Wolfe and Smith 2007). Point-polyserial, polyserial, and point-measure correlations indicated the degree to which the item scores were consistent with the total test scores, while fit indices indicated the degree to which the scored responses of individual respondents were consistent with the expectations of the Rasch model (Wolfe and Smith 2007). For methodological purposes, the goal was to verify that the rating scale observations conform closely to the specified model (Linacre 2002).
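A minimal sketch of the kind of category-level check involved is shown below, assuming pooled category responses and person measures in logits. The thresholds correspond to the first and third of Linacre's guidelines discussed in the results; the function is illustrative rather than the study's analysis code.

```python
import numpy as np

def category_diagnostics(responses, person_measures, categories=(1, 2, 3)):
    """Basic rating scale category checks in the spirit of Linacre's guidelines.

    responses       : 1-D array of category codes (pooled across items sharing
                      the rating scale)
    person_measures : matching person measures in logits
    """
    responses = np.asarray(responses)
    person_measures = np.asarray(person_measures, float)

    counts = {c: int((responses == c).sum()) for c in categories}
    avg_measures = {c: person_measures[responses == c].mean() for c in categories}

    enough_observations = all(n >= 10 for n in counts.values())       # guideline 1
    ordered = [avg_measures[c] for c in categories]
    advances_monotonically = all(a < b for a, b in zip(ordered, ordered[1:]))  # guideline 3
    return counts, avg_measures, enough_observations, advances_monotonically
```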

3. Results

3.1. Dimensionality

The principal components analysis provided evidence for multidimensionality, which led to the creation of three subscales. Kaiser’s rule indicated 13 possible subscales, but only three possessed enough items and logic to be included. Eigenvalues and the percentage of variance accounted for were calculated, and Kaiser’s criterion was used to determine the number of components retained in the instrument (see Table 2). The first factor’s eigenvalue is 31.17 and accounts for 39% of the variance. The eigenvalues of two or more components are well above Kaiser’s criterion, suggesting retention of three or more components and thus multidimensionality. The three largest components represent the three domain areas in the instrument design model: technology, provider, and how the provider uses the technology.

Table 2.

Eigenvalues of all Components

Factor Eigenvalue Percent Cum Percent
1 31.1732 38.967 38.967
2 12.2743 15.343 54.309
3 5.5092 6.886 61.196
4 2.6888 3.361 64.557
5 2.4134 3.017 67.574
6 2.1051 2.631 70.205
7 1.8676 2.334 72.539
8 1.6205 2.026 74.565
9 1.4563 1.82 76.385
10 1.3461 1.683 78.068
11 1.1506 1.438 79.506
12 1.0942 1.368 80.874
13 1.0269 1.284 82.158
14 0.9483 1.185 83.343
15 0.8488 1.061 84.404

To ensure satisfaction of the unidimensional assumption of the Rasch rating scale model, item analyses were performed on each subscale separately. An investigation of the item loadings on the first factor shows that all of the items on the factor are positively related to trust in technology (see Table 3).

Table 3.

Factor One Eigenvalues, Item Number and Text

Loading Item # Item text
0.812036 Q1 The technology was accurate.
0.883677 Q2 The technology was trustworthy.
0.876487 Q3 The technology was reliable.
0.669783 Q4 The technology was safe.
0.785337 Q5 The technology had reliability.
0.828666 Q6 The technology was precise.
0.790656 Q8 The technology was honest.
0.820144 Q9 I trust the technology.
0.867855 Q10 I had positive feeling about the technology.
0.819439 Q11 The technology was responsible.
0.647163 Q12 The technology behaved responsibly.
0.86061 Q13 The technology was successful.
0.82544 Q14 The technology performed accurately.
0.834301 Q15 I felt secure because of the technology.
0.792787 Q16 The technology was secure.
0.418035 Q17 I was well informed about the technology.
0.74693 Q18 The technology was well informed.
0.821281 Q19 The technology gave real results.
0.849689 Q20 I felt confident that the technology was working.
0.809519 Q21 The technology gave correct results.
0.504104 Q23 I believe the technology was well researched.
0.487355 Q24 The technology was well researched.
0.836129 Q25 I had confidence in the technology
0.809389 Q26 The technology was helpful.
0.80656 Q28 Using the technology gave me security.
0.850788 Q29 The technology was effective.
0.638151 Q30 The technology made/ kept me healthy.
0.580229 Q31 I relied on the technology.

Factor two has positive loadings on items related to the health care provider (see Table 4).

Table 4.

Factor Two Eigenvalues, Item Number and Text

Loading Item # Item text
0.640448 Q32 My health care provider was accurate.
0.785753 Q33 My health care provider was trustworthy.
0.862042 Q34 My health care provider was reliable.
0.632924 Q35 My health care provider behaved in a safe manner.
0.626033 Q36 My health care provider was precise.
0.37382 Q37 I knew a lot about my health care provider.
0.820321 Q38 My health care provider was honest.
0.817634 Q39 I trust my health care provider.
0.874118 Q40 I feel good about the health care provider I had.
0.837499 Q41 I have positive feelings about my health care provider.
0.824046 Q42 My health care provider was honest.
0.807036 Q43 My health care provider was responsible.
0.801116 Q44 My health care provider behaved responsibly.
0.359269 Q45 My health care provider was successful with regards to my treatment.
0.728892 Q46 My health care provider was well informed.
0.574001 Q47 My health care provider gave real results.
0.841816 Q48 I had confidence in my health care provider.
0.618576 Q49 My health care provider gave me the correct information about my health situation.
0.258403 Q50 I researched my health care provider.
0.74181 Q51 I had confidence in my health care provider.
0.750931 Q52 My health care provider was helpful.
0.777136 Q53 My health care provider had integrity.
0.570084 Q54 My health care provider cared about me.
0.124698 Q55 My health care provider affected my life.
0.694556 Q56 My health care provider was effective.
0.324146 Q57 My health care provider made or kept me healthy.

Factor three’s positively loading items are related to how the provider uses the technology (see Table 5).

Table 5.

Factor Three Eigenvalues, Item Number and Text

Loading Item # Item text
0.770999 Q58 My health care provider used the medical technology accurately.
0.763641 Q59 My health care provider used the technology in a trustworthy manner.
0.608639 Q60 My health care provider made sure the technology was reliable.
0.835192 Q61 My health care provider used the technology in a safe way.
0.806665 Q62 My health care provider was knowledgeable about the technology.
0.78116 Q63 I trust how my health care provider used the technology.
0.74261 Q64 I feel good about the way my health care provider used the technology.
0.664132 Q65 I have positive feelings about how my health care provider used the technology.
0.89806 Q66 My health care provider used the technology in a responsible manner.
0.853382 Q67 My health care provider used the technology responsibly.
0.767558 Q68 My health care provider was successful in their use of the technology.
0.859124 Q69 My health care provider used the technology accurately.
0.770122 Q70 My health care provider was well informed about the medical technology.
0.539829 Q71 My health care provider used the technology to give real results.
0.880509 Q72 My health care provider was confident using the medical technology.
0.526948 Q73 The results my health care provider obtained from the technology were correct results.
0.425647 Q74 My health care provider used the medical technology to give correct results.
0.817842 Q75 My health care provider used the technology with confidence.
0.173468 Q76 My health care provider believed the technology was helpful.
0.784565 Q77 My health care provider used the technology in a caring manner.
0.670522 Q78 My health care provider used the technology effectively.
0.236036 Q79 My health care provider used the technology to make me or keep me healthy.
−0.077502 Q80 My health care provider had high reliance on the medical technology.

3.2. Fit

Point-measure correlations generally remained in the expected range. Thirty-nine items were flagged for further evaluation because of mean-square weighted (MSw), z-weighted (Zw), mean-square unweighted (MSu), or z-unweighted (Zu) scores outside the acceptable ranges (see Table 6). Criteria for evaluation were based on Rasch validation guidelines. Point-measure correlations over .5 are considered good, while values of .3 or .4 require scrutiny of the item (Wolfe and Smith 2007). Mean-square weighted and unweighted statistics (MSw and MSu) outside the 0.5–1.5 range productive for measurement were also flagged for further evaluation (Linacre 2004). Weighted and unweighted z-standardized fit statistics between −2 and 2 were considered to indicate proper functioning (Smith 2000). Items with three or more flags should be excluded from use, as indicated in Table 6.
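The flagging logic just described can be summarized in a short sketch. The function below is illustrative, not the analysis code used in the study; the thresholds follow the criteria stated above, and the example values are taken from the first row of Table 6.

```python
def flag_item(rpm, ms_weighted, z_weighted, ms_unweighted, z_unweighted):
    """Count the fit flags described above for a single item.

    rpm           : point-measure correlation
    ms_weighted   : infit mean-square  (MSw)
    z_weighted    : infit z-standardized statistic (Zw)
    ms_unweighted : outfit mean-square (MSu)
    z_unweighted  : outfit z-standardized statistic (Zu)
    """
    flags = 0
    if rpm < 0.5:                               # correlations below .5 need scrutiny
        flags += 1
    for ms in (ms_weighted, ms_unweighted):     # productive range is 0.5 - 1.5
        if not 0.5 <= ms <= 1.5:
            flags += 1
    for z in (z_weighted, z_unweighted):        # |z| > 2 is flagged
        if abs(z) > 2:
            flags += 1
    return flags

# Example from Table 6, item 1 ("The technology was accurate."): returns 5 flags.
print(flag_item(rpm=0.05, ms_weighted=2.82, z_weighted=5.30,
                ms_unweighted=5.57, z_unweighted=7.60))
```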

Table 6.

Items Flagged for Further Evaluation

# Target Item rpm MSw Zw Msu Zu Decision
1 Tech The technology was accurate. 0.05 2.82 5.30 5.57 7.60 Drop/revise
3 Tech The technology was reliable. 0.80 0.50 −3.30 0.37 −2.60 Drop/ revise
8 Tech The technology was honest. 0.29 2.10 4.00 2.94 2.90 Drop
11 Tech The technology was responsible. 0.80 0.66 −2.20 0.48 −2.30 Drop
12 Tech The technology behaved responsibly. 0.80 0.59 −2.80 0.53 −2.10 Keep
14 Tech The technology performed accurately. 0.80 0.46 −3.50 0.34 −2.50 Drop/ revise
18 Tech The technology was well informed. 0.45 1.78 3.30 1.99 2.20 Drop/ revise
21 Tech The technology gave correct results. 0.81 0.53 −3.10 0.38 −2.60 Drop/ revise
23 Tech I believe the technology was well researched. 0.15 3.62 9.90 4.31 9.40 Drop/ revise
26 Tech The technology was helpful. 0.79 0.59 −2.60 0.45 −2.10 Drop/ revise
28 Tech Using the technology gave me security. 0.42 2.09 5.40 3.05 6.80 Drop/ revise
30 Tech The technology made/ kept me healthy. 0.82 0.42 −3.90 0.32 −2.70 Drop/ revise
32 Prov My health care provider was accurate. 0.54 1.81 4.30 3.52 6.70 Drop/ revise
34 Prov My health care provider was reliable. 0.58 0.56 −1.40 0.25 −1.10 Keep
35 Prov My health care provider behaved in a safe manner. 0.63 0.66 −1.10 0.19 −1.70 Keep
36 Prov My health care provider was precise. 0.57 0.77 −0.60 0.23 −1.10 Keep
38 Prov My health care provider was honest. 0.51 1.85 3.30 1.89 2.50 Drop/ revise
39 Prov I trust my health care provider. 0.58 0.46 −1.70 0.11 −1.40 Keep
40 Prov I feel good about the health care provider I had. 0.69 0.46 −2.20 0.18 −2.10 Drop/ revise
41 Prov I have positive feelings about my health care provider. 0.67 0.47 −2.10 0.18 −1.90 Drop/ revise
42 Prov My health care provider was honest. 0.66 0.62 −1.40 0.23 −1.80 Keep
45 Prov My health care provider was successful with regards to my treatment. 0.56 0.51 −1.40 0.21 −0.90 Keep
49 Prov My health care provider gave me the correct information about my health situation. 0.67 0.44 −2.20 0.14 −1.90 Drop/ revise
51 Prov I had confidence in my health care provider. 0.59 1.73 3.60 2.08 4.00 Drop/ revise
52 Prov My health care provider was helpful. 0.66 0.60 −1.40 0.23 −1.70 Keep
53 Prov My health care provider had integrity. 0.64 0.69 −1.00 0.26 −1.50 Keep
54 Prov My health care provider cared about me. 0.65 0.33 −2.40 0.15 −1.30 Drop/ revise
56 Prov My health care provider was effective. 0.40 2.20 3.60 1.87 1.80 Drop/ revise
57 Prov My health care provider made or kept me healthy. 0.65 0.71 −1.00 0.38 −1.20 Keep
58 P & T My health care provider used the medical technology accurately. 0.63 1.80 3.50 1.62 2.30 Drop/ revise
60 P & T My health care provider made sure the technology was reliable. 0.75 0.41 −2.30 0.33 −1.20 Drop/ revise
62 P & T My health care provider was knowledgeable about the technology. 0.75 0.41 −2.30 0.19 −1.70 Drop/ revise
63 P & T I trust how my health care provider used the technology. 0.74 0.33 −2.60 0.14 −1.60 Drop/ revise
64 P & T I feel good about the way my health care provider used the technology 0.77 0.57 −1.70 0.37 −1.50 Keep
65 P & T I have positive feelings about how my health care provider used the technology. 0.77 0.58 −1.80 0.33 −1.80 Keep
68 P & T My health care provider was successful in their use of the technology. 0.79 0.43 −2.40 0.26 −1.80 Drop/ revise
73 P & T The results my health care provider obtained from the technology were correct results. 0.70 0.89 −0.30 0.42 −0.90 Keep
77 P & T My health care provider used the technology in a caring manner. 0.38 2.78 4.90 3.37 3.80 Drop/ revise
80 P & T My health care provider had high reliance on the medical technology. 0.60 2.08 4.60 1.99 3.50 Drop/ revise
* Bold indicates the fit statistic that prompted further investigation.

+ Item should be kept.

− Item should be dropped from the subscale or revised.

3.3 Rating scale analysis

Items in all four analyses (the total instrument, scale 1, scale 2, and scale 3) appear to be functioning well for the three scale points (agree, neutral, and disagree), based on plots of category probabilities and Linacre’s rating scale guidelines (see Table 7).

Table 7.

Linacre’s Rating Scale Guidelines

Guideline Decision
At least 10 observations of each category Yes
Regular observation distribution Yes
Average measures advance monotonically with category Yes
OUTFIT mean-squares less than 2.0 Yes
Step calibrations advance Yes
Ratings imply measures and measure imply ratings Yes
Step difficulties advance by at least 1.4 logits Yes
Step difficulties advance by less than 5.0 logits Yes

The preliminary guideline is that all items are oriented with the latent variable (Linacre 2004). All items employed the same rating scale and therefore contributed to a shared latent variable. Rating scale categories for negatively-oriented items often function differently (Yamaguchi 1997). A separate study generated factors of the construct trust in medical technology and found trust and distrust to be theoretical opposites, which provides further evidence that all items are oriented with the latent variable (Montague, Kleiner et al. 2009).

Guideline #1 is that at least 10 observations must occur in each category (Linacre 2004). The lowest number of observations for the total instrument and separate subscales is 55, which is well above the 10-observation guideline, thereby meeting guideline #1 for the total instrument and subscales (see Tables 8–11).

Table 8.

Rating scale total instrument 1= Disagree, 2= Neutral, 3= Agree

Total
Category Count Count% Measure average Outfit Unweighted MNSQ Structure measure (t) Coherence M->Ca Coherence C-> Mb
1 441 6 −2.25 1.35 none 64% 24%
2 1165 15 11.16 0.71 −5.59 45% 44%
3 5574 71 30.63 1.32 5.59 88% 93%

Table 11.

Rating scale for scale 3 Technology 1= Disagree, 2= Neutral, 3= Agree

Scale 3
Category Count Count% Measure average Outfit Unweighted MNSQ Structure measure (t) Coherence M->Ca Coherence C-> Mb
1 55 5 −12.8 1.5 NONE 75% 43%
2 277 23 13.2 0.69 −16.66 66% 55%
3 826 69 38.66 1.15 16.66 86% 93%

Guideline #2 is that there must be a regular observation distribution (Linacre 2004). It was expected that the population would be mostly trusting (Anderson and Dedrick 1990; Thom, Ribisl et al. 1999). Therefore, the distribution should have the highest count for category 3 (agree), which indicates higher trust, and descend towards category 1 (disagree). This distribution pattern is consistent across the total instrument and all three subscales (see Tables 8–11).

Guideline #3 is that average measures must advance monotonically with category (Linacre 2004). Smith (2000) says, “observations in higher categories must be produced by higher measures; which means that the average measures by category, for each empirical set [of] observations, must advance monotonically up the rating scale” (p. 22). This guideline was met by the measure averages across the four analyses (see Tables 8–11). This means that, empirically, category 3 (trust) represents a higher level of the construct trust than categories 2 (no trust) and 1 (distrust).

Guideline #4 is that OUTFIT mean-squares must be less than 2.0 (Linacre 2004). All unweighted mean-squares are less than 2.0 in both the total instrument and subscales, meeting the requirement for guideline 4.

Guideline #5 is that step calibrations must advance, which is an indication of the rating scale’s ability to increase proportionately with the level of the construct (Linacre 2004). The “scale structure measures”, also called “step calibrations”, advance in all cases, meeting the requirements for guideline 5 (see Tables 8–11).

Guideline #6 is that ratings imply measures and measures imply ratings (Linacre 2004). This guideline says that the rating should imply the measure and vice versa; this is determined with the coherence statistic (see Tables 8–11). Coherence is reported in Tables 8–11 as M->C (Measure implies Category %), which indicates the percentage of ratings that are expected to be observed in a category and are actually observed in that category (Linacre 2004). This guideline was met, as indicated by moderate coherence statistics.

Guidelines #7 and #8 state that step difficulties must advance by at least 1.4 logits and by less than 5.0 logits (Linacre 2004). Scale structure measures (t) represent step difficulties and advance a distance of 1.7 logits for all categories in the rating scale, which is between 1.4 and 5.0 logits (see Tables 8–11).

3.4. Reliability

The TMT produced high reliability across the various subscales for person (test) and item reliability ranging from .71 to .92 (see Table 12).

Table 12.

KR-20, Real and Modeled Person Reliability, and Real and Modeled Item Reliability for Scales 1–3 and Total Instrument

Scale Items WINSTEPS (KR-20) Person reliability (real) Person reliability (modeled) Item reliability (real) Item reliability (modeled)
total 80 1* 0.86 0.87 0.92 0.92
1 31 1* 0.81 0.84 0.82 0.86
2 26 .96* 0.53 0.56 0.86 0.87
3 23 1 0.58 0.58 0.71 0.75

4. Discussion

Our results indicate that the trust in medical technology instrument is sufficiently valid and ready for use. Future studies will involve validation in a variety of medical domains, including consumer health products. The validation studies found that trust in medical technology is in fact a multidimensional construct involving subscales of technology characteristics, provider characteristics, and characteristics of how the provider uses the technology. When disseminating this instrument, subscales can be used individually or collectively, as each subscale was validated both individually and as part of the complete instrument.

The purpose of the fit indices was to flag potentially problematic response patterns. Once a flag has been raised, it is the responsibility of the data analyst or instrument developer to seek plausible explanations for it. It is unwise to delete an item from an instrument based on a fit flag alone; these indices provide relative measures and are subject to Type I statistical errors. By scrutinizing the misfitting items or persons, or the patterns of responses associated with them, valuable knowledge may be gained. Person misfit might occur because of guessing, familiarity with the test, carelessness, or because the respondent has specialized knowledge or deficiencies. Item misfit may occur when there is multidimensionality, when item quality is poor, when an item alludes to a correct answer, or because of miskeying. Therefore, items flagged as misfitting should be evaluated and redesigned for future validation studies (Wolfe and Smith 2007). The flagged items should be used in future validation studies and reassessed for fit. In general use, items with fewer than two flags should be reworded and reassessed for fit, as indicated in Table 6.

All of Linacre’s (2004) guidelines were met, which provided evidence of measure stability, measure accuracy (fit), description of the sample, and inference for the next sample (see Table 7). Additionally, the instrument produced high reliability across the various subscales for person (test) and item reliability, using both Cronbach’s alpha and the more conservative KR-20. Reliability scores indicate high internal consistency of the total instrument and subscales. A rating scale analysis was conducted to assess the appropriate rating scale for the instrument using Linacre’s framework for rating scale evaluation (Linacre 2004). Each of the guidelines was met for rating scale analysis and therefore the rating scale is appropriate for the instrument.
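For reference, the internal-consistency coefficient reported alongside the Rasch reliabilities can be computed as in the sketch below (Cronbach's alpha, which reduces to KR-20 when items are scored dichotomously). This is a standard textbook computation offered for illustration, not the WINSTEPS routine, and the function name is an assumption.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    With dichotomously scored items this reduces to KR-20.
    """
    X = np.asarray(item_scores, dtype=float)
    k = X.shape[1]                                   # number of items
    item_variances = X.var(axis=0, ddof=1)           # per-item score variance
    total_variance = X.sum(axis=1).var(ddof=1)       # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
```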

4.1 Validity evidence

Messick (1995) proposed a six-component framework for validity assessment involving content, substantive, structural, generalizability, external, and consequential aspects of validity. Validity evidence was provided through test based (Table 15) and data based (Table 14) evidence.

Table 14.

Data Based Validity Evidence

Type of Evidence Aspect of Validity Validity Evidence
Data based evidence Structural
  • ✓ Demonstrating internal relationships among items (structural fidelity) or subsets of items consistent with the underlying theory (factor analysis, item difficulty hierarchies that are consistent with the construct map)

  • ✓ Demonstrating that conditional difficulties of items are equal or consistent with known influences on test performance (differential item functioning analysis)

  • ✓ Utilizing a measurement model that combines (weights) information across observations (scales) and takes into account (controls for) undesirable influences on the scores in a manner that is consistent with intended score interpretation (e.g., criterion-referenced versus norm-referenced).

Generalizability
  • ✓ Demonstrating reliability of measures from the instrument across a variety of contexts

    • ✓ Internal consistency

    • ☓ Test-retest

    • ☓ Interrater

    • ☓ Alternate form

    • ☓ Application of meta-analytic procedures to validity coefficients across a variety of measurement contexts and samples (validity generalization)

External
  • ☓ Demonstrating that predicted group differences are realized empirically.

  • ☓ Experimentation: Any theory-based comparison that yields outcomes consistent with theory.

  • ☓ Comparison of Groups: Differences between groups that theory predicts will be different on the underlying construct

  • ☓ Changes Over Time: Individuals are typically expected to change over time as a result of maturation. Observed differences such as these can also be used as evidence for construct validity.

Consequential
  • ✓ Detecting positive or negative impact on individuals (bias) (e.g., identifying substantively explainable DIF versus sources of item or test bias, determining whether the labels generated in test use result in stigmatization of some groups, expert judgment about the suitability of test content for making decisions about individuals and groups)

Table 15.

Test Based Validity Evidence

Type of Evidence Aspect of Validity Validity Evidence
Test based evidence Content
  • ✓ Documentation of purpose & uses of the instrument

  • ✓ Documentation of use of domain analysis

  • ✓ Development of test blueprint & item templates

  • ✓ Documentation of test development process

  • ✓ Expert Review (Readability, Clarity, Clarity of instructions, Fairness, Sensitivity)

Substantive
  • ✓ Development of operationalized definition & theoretical framework of the construct (including internal, external, & processing models)

  • ✓ Documentation of use of expertise

  • ☓ Verifying of use of the proposed processes by respondents (think aloud)

Structural
  • ✓ Development of internal model

  • ✓ Rationale for adoption of reference framework (criterion versus normative)

  • ✓ Rationale for developing response format (rating scale and/or distracters)

  • ✓ Rationale for adoption of measurement model

Generalizability
  • ✓ Specification of target population

  • ✓ Selection of norming population

External
  • ✓ Demonstrating that the construct theory is sufficiently developed to support investigations of external aspect evidence

Consequential
  • ☓ Detecting positive or negative impact on systems (systemic validity)

The content aspect refers to evidence of content relevance, representativeness, and technical quality of the instrument. Documentation of the purpose and uses of the instrument, the domain analysis, the development of the test blueprint, documentation of the test development process, and expert review of the instrument address the content aspect of validity. These steps were conducted in studies prior to this validation study (Montague, Wolfe et al. 2008; Montague 2009; Montague, Kleiner et al. 2009).

The substantive aspect refers to the theoretical rationale for observed consistencies in responses, including process models of task performance and empirical evidence that the theoretical processes are actually engaged by respondents in the assessment tasks. Structural aspects of validity refer to the fidelity of the scoring structure to the structure of the construct domain. Activities towards assessing structural validity include developing an internal model of the construct; the model was developed from a review of the literature and evaluated with the dimensionality analysis. Evidence of structural validity was also based on a rationale for the reference and response framework and measurement model, which were evaluated by the rating scale analysis.

The generalizability aspect refers to the degree to which test score properties and interpretations generalize to and across population groups, settings, and tasks including validity generalization of test criterion relationships. This was established by specifying the target population for the instrument (patients in health care work systems) and a norming population (a diverse set of obstetric patients).

External aspects of validity include convergent and discriminant evidence from multitrait-multimethod comparisons as well as evidence of criterion relevance and applied utility. Consequential aspects of validity refer to the value implications of score interpretation as a basis for action as well as the actual and potential consequences of test use, especially regarding bias, fairness, and distributive justice. Analyzing the suitability of the test content for individuals and groups that could potentially be stigmatized by scores assessed the consequential aspect of validity.

Tables 14 and 15 illustrate the efforts that were made to contribute to the various aspects of validity with test based and data based evidence; checks (✓) represent evidence of validity found in the current study, and crosses (☓) represent aspects that should be evaluated in future validation efforts (Wolfe and Smith 2007).

4.2 Limitations of the study

Instrument validation often occurs over time, across several validation studies. This instrument was tested with only one patient population, as is appropriate for the assessment of validity (Kline 2005). Future studies should validate the instrument with a larger sample of participants and in different health settings. The response rate for the instrument was under 30%, which may be a sign of response bias; future research should explore the characteristics of people who choose not to participate in trust-related studies. Some aspects of validity were not assessed in this study and should be addressed in future studies to strengthen the validity assessment of the instrument. Substantive validity could be further explored by verifying respondents’ use of the proposed processes with think-aloud protocols. Consequential validity could be assessed by detecting the positive or negative impact on the system; this would answer the question of how understanding and/or measuring trust in medical technology affects how systems function over time. Generalizability could be improved by testing the instrument in other contexts using test-retest, inter-rater, and alternate-form approaches, and by applying meta-analytic procedures to validity coefficients across a variety of measurement contexts and samples, such as other health domains and a more diverse sample of users (i.e., other cultures, backgrounds, and levels of illness). External validity could also be further explored with future studies that examine hypothesized differences across groups. For example, studies have shown that minorities have a higher degree of distrust in health systems and physicians; future research can explore racial and ethnic disparities in the context of medical technology (Rose, Peters et al. 2004). Individual differences in trust in medical technology may exist in relation to positive or negative medical outcomes and the amount of time between the medical event and measurement of the construct. Future studies should also explore trust in medical technology through experimentation, that is, any theory-based comparison that yields outcomes consistent with theory. Another aspect of validity that was not explored in this study was user change over time. Individuals are typically expected to change over time as a result of maturation; these observed differences can be used as evidence of construct validity. The present validation will help in understanding patient trust in medical technologies, which is a gap in the present literature. The authors hope that this work will inspire interest in the topic and that the validated instrument will support future research on the subject.

5. Conclusion

An instrument was developed to assess patients’ trusting attitudes towards medical technologies. Validity was assessed using Messick’s (1995) framework; support for instrument validity was presented in the results in the form of both test and data based evidence for all but the consequential and external aspects of validity. The instrument measures patient trust in medical technology using three subscales, in accordance with findings of multidimensionality of the construct. High reliability scores across the various subscales for person (test) and item reliability are evidence of the instrument’s internal consistency.

Table 9.

Rating scale for scale 1 Technology 1= Disagree, 2= Neutral, 3= Agree

Scale 1
Category Count Count% Measure average Outfit Unweighted MNSQ Structure measure (t) Coherence M->Ca Coherence C-> Mb
1 289 10 −8.65 1.15 none 69% 39%
2 653 24 7.48 0.8 −8.81 53% 60%
3 1546 56 26.34 1.46 8.81 84% 86%

Table 10.

Rating scale for scale 2 Technology 1= Disagree, 2= Neutral, 3= Agree

Scale 2
Category Count Count% Measure average Outfit Unweighted MNSQ Structure measure (t) Coherence M->Ca Coherence C-> Mb
1 96 5 −6.6 1.15 NONE 51% 19%
2 235 13 12.94 0.56 −6.22 48% 51%
3 1514 81 35.26 1.83 6.22 92% 95%

Table 13.

Summary of Guideline Pertinence

Guideline Measure Stability Measure accuracy Description of sample Inference for next sample
Pre. Scale oriented with latent variable Essential Essential Essential Essential
At least 10 observations of each category Essential Helpful Helpful
Regular observation distribution Helpful Helpful
Average measures advance monotonically with category Helpful Essential Essential Essential
OUTFIT mean-squares less than 2.0 Helpful Essential Helpful Helpful
Step calibrations advance Helpful
Ratings imply measures and measure imply ratings Helpful
Step difficulties advance by at least 1.4 logits Helpful Helpful
Step difficulties advance by less than 5.0 logits Helpful

Acknowledgments

This work was partially supported by the Francis Research Fellowship and by grant 1UL1RR025011 from the Clinical and Translational Science Award (CTSA) program of the National Center for Research Resources, National Institutes of Health. The authors are grateful for comments about the development of the instrument provided by Dr. Brian Kleiner, Dr. Woodrow Winchester III, Dr. Tonya Smith-Jackson and Dr. Bernice Hausman. The authors also appreciate comments and suggestions from the editor Dr. Ann Bisantz and two anonymous reviewers.


References

1. Anderson LA, Dedrick RF. Development of the Trust in Physician scale: a measure to assess interpersonal trust in patient-physician relationships. Psychol Rep. 1990;67(3.2):1091–1100. doi: 10.2466/pr0.1990.67.3f.1091.
2. Awad NF, Ragowsky A. Establishing trust in electronic commerce through online word of mouth: An examination across genders. Journal of Management Information Systems. 2008;24(4):101–121.
3. Basso A, Goldberg D, Greenspan S, Welmer D. First impressions: Emotional and cognitive factors underlying judgments of trust in e-commerce. Proceedings of the 3rd ACM Conference on Electronic Commerce; 2001. pp. 137–143.
4. Bisantz AM, Seong Y. Assessment of operator trust in and utilization of automated decision-aids under different framing conditions. International Journal of Industrial Ergonomics. 2001;28(2):85–97.
5. Boehm FH. Building trust. Family Practice News. 2003;3312(15)(11).
6. Buckle P, Clarkson PJ, Coleman R, Ward J, Anderson J. Patient safety, systems design and ergonomics. Applied Ergonomics. 2006;37(4):491–499. doi: 10.1016/j.apergo.2006.04.016.
7. Calnan M, Montaner D, Horner R. How acceptable are innovative health-care technologies? A survey of public beliefs and attitudes in England and Wales. Social Science & Medicine. 2005;60(9):1937–1948. doi: 10.1016/j.socscimed.2004.08.058.
8. Castelfranchi C, Falcone R. Social trust: A cognitive approach. Trust and deception in virtual societies. 2002:55–90.
9. Corbitt BJ, Thanasankit T, Yi H. Trust and e-commerce: A study of consumer perceptions. Electronic Commerce Research and Applications. 2003;2(3):203–215.
10. Crocker L, Algina J. Introduction to classical and modern test theory. Holt, Rinehart and Winston; 1986.
11. DeFuria GL. Interpersonal trust surveys. San Francisco: Jossey-Bass; 1996.
12. Dzindolet MT, Peterson SA, Pomranky RA, Pierce LG, Beck HP. The role of trust in automation reliance. Int J Human-Computer Studies. 2003;58:697–718.
13. Franks P, Fiscella K, Shields CG, Meldrum SC, Duberstein P, Jerant AF, Tancredi DJ, Epstein RM. Are patients' ratings of their physicians related to health outcomes? Annals of Family Medicine. 2005;3(3):229–234. doi: 10.1370/afm.267.
14. Fung CH, Elliott MN, Hays RD, Kahn KL, Kanouse DE, McGlynn EA, et al. Patients' preferences for technical versus interpersonal quality when selecting a primary care physician. Health Serv Res. 2005;40(4):957–977. doi: 10.1111/j.1475-6773.2005.00395.x.
15. Grabner-Krauter S, Kaluscha EA. Empirical research in on-line trust: A review and critical assessment. Int J Human-Computer Studies. 2003;58:783–812.
16. Groeneveld PW, Sonnad SS, Lee AK, Asch DA, Shea JE. Racial differences in attitudes toward innovative medical technology. Journal of General Internal Medicine. 2006;21(6):559–559. doi: 10.1111/j.1525-1497.2006.00453.x.
17. Hambleton RK, Swaminathan H, Rogers HJ. Fundamentals of item response theory. Sage; Newbury Park: 1991.
18. Hoc J. From human-machine interaction to human-machine cooperation. Ergonomics. 2000;43(7):833–844. doi: 10.1080/001401300409044.
19. Hogan TP. Psychological testing: A practical introduction. John Wiley and Sons; New York: 2003.
20. Jian JY, Bisantz AM, Drury CG. Foundations for an empirically determined scale of trust in automated systems. International Journal of Cognitive Ergonomics. 2000;4(1):53–71.
21. Kjerulff KH, Pillar B, Mills ME, Lanigan J. Technology anxiety as a potential mediating factor in response to medical technology. Journal of Medical Systems. 1992;16(1):7–13. doi: 10.1007/BF01674093.
22. Kaiser HF. A second generation little jiffy. Psychometrika. 1970;35:401–415.
23. Kao AC, Green DC, Davis NA, Koplan JP, Cleary PD. Patients' trust in their physician: Effects of choice, continuity, and payment method. Journal of General Internal Medicine. 1998;13(10):681–686. doi: 10.1046/j.1525-1497.1998.00204.x.
24. Kline TJB. Psychological testing: A practical approach to design and evaluation. Sage; Thousand Oaks: 2005.
25. Kong W, Hung Y. Modeling initial and repeat online trust in b2c e-commerce. Hawaii International Conference on System Sciences: IEEE; 2006. pp. 120–120.
26. Larzelere RE, Huston TL. The dyadic trust scale: Toward understanding interpersonal trust in close relationships. Journal of Marriage and the Family. 1980;42(3):595–604.
27. Lee JD, Moray N. Trust, self-confidence, and operators' adaptation to automation. International Journal of Human-Computer Studies. 1994;40:153–184.
28. Lee JD, See KA. Trust in automation: Designing for appropriate reliance. Human Factors: The Journal of the Human Factors and Ergonomics Society. 2004;46:50–80. doi: 10.1518/hfes.46.1.50_30392.
29. Lewandowsky S, Mundy M, Tan GPA. The dynamics of trust: Comparing humans to automation. Journal of Experimental Psychology: Applied. 2000;6(2):104–123. doi: 10.1037//1076-898x.6.2.104.
30. Linacre JM. Winsteps (version 3.63). Chicago; 2002.
31. Linacre JM. Optimizing rating scale category effectiveness. Introduction to Rasch measurement. Maple Grove, Minnesota; 2004.
32. Martin JL, Norris BJ, Murphy E, Crowe JA. Medical device development: The challenge for ergonomics. Applied Ergonomics. 2008;39(3):271–283. doi: 10.1016/j.apergo.2007.10.002.
33. McKnight DH, Choudhury V. Distrust and trust in b2c e-commerce: Do they differ? Proceedings of the 8th International Conference on Electronic Commerce: The new e-commerce: innovations for conquering current barriers, obstacles and limitations to conducting successful business on the Internet; ACM, New York, NY, USA. 2006. pp. 482–491.
34. McKnight DH, Choudhury V, Kacmar C. Developing and validating trust measures for e-commerce: An integrative typology. Information Systems Research. 2003;13(3):334–359.
35. Messick S. Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist. 1995;50(9):741–749.
36. Moffa AJ, Stokes AF. Trust in a medical expert system: Can we generalize between domains? In: Mouloua M, Koonce JM, editors. Human-automation interaction: Research and practice. Mahwah, New Jersey: Lawrence Erlbaum Associates; 1997.
37. Montague E. Patient user experience of medical technology. In: Ergonomics and Health Aspects of Work with Computers, Lecture Notes in Computer Science 5624. Springer; 2009. pp. 70–77.
38. Montague E, Kleiner BM, Winchester WW. Empirical evaluation of the construct trust in medical technology. International Journal of Industrial Ergonomics. 2009;39(4):628–634.
39. Montague E, Wolfe EW. Preliminary development of a trust in medical technology instrument. Unpublished manuscript.
40. Montague E, Wolfe EW, Kleiner BM, Winchester WW. Using Rasch analysis to construct a trust in medical technology instrument. Institute of Objective Measurement Workshop; New York, NY: 2008.
41. Moray N. Adaptive automation, trust, and self-confidence in fault management of time-critical tasks. Journal of Experimental Psychology: Applied. 2000;6(1):44–58. doi: 10.1037//1076-898x.6.1.44.
42. Muir B. Trust in automation: Part I. Theoretical issues in the study of trust and human intervention in automated systems. Ergonomics. 1994;37:1905–1923.
43. Muir B, Moray N. Trust in automation. Part II. Experimental studies of trust and human intervention in a process control simulation. Ergonomics. 1996;39(3):429–460. doi: 10.1080/00140139608964474.
44. Parasuraman R, Riley V. Humans and automation: Use, misuse, disuse, abuse. Human Factors: The Journal of the Human Factors and Ergonomics Society. 1997;39:230–253.
45. Pearson SD, Raeke LH. Patients' trust in physicians: Many theories, few measures, and little data. Journal of General Internal Medicine. 2000;15:509–513. doi: 10.1046/j.1525-1497.2000.11002.x.
46. Rempel JK, Holmes JG, Zanna MP. Trust in close relationships. Journal of Personality and Social Psychology. 1985;49(1):95–112.
47. Rose A, Peters N, Shea JA, Armstrong K. Development and testing of the Health Care System Distrust Scale. J Gen Intern Med. 2004;19:57–63. doi: 10.1111/j.1525-1497.2004.21146.x.
48. Sheridan TB. Humans and automation. John Wiley and Sons; Santa Monica: 2002.
49. Smith RM. Fit analysis in latent trait measurement models. Journal of Applied Measurement. 2000;1(2):199–218.
50. Tarn DM, Meredith LS, Kagawa-Singer M, Matsumura S, Bito S, Oye RK, Liu H, Kahn KL, Fukuhara S, Wenger NS. Trust in one's physician: The role of ethnic match, autonomy, acculturation, and religiosity among Japanese Americans. Annals of Family Medicine. 2005;3(4):339–347. doi: 10.1370/afm.289.
51. Thom DH, Kravitz RL, Bell RA, Krupat E, Azari R. Patient trust in physician: Relationship to patient requests. Family Practice. 2002;19(5). doi: 10.1093/fampra/19.5.476.
52. Thom DH, Ribisl KM, Stewart AL, Luke DA. Further validation and reliability testing of the trust in physician scale. Med Care. 1999;37(5):510–517. doi: 10.1097/00005650-199905000-00010.
53. Timmons S, Harrison-Paul R, Crosbie B. How do lay people come to trust the Automatic External Defibrillator? Health, Risk & Society. 2008;10(3):207–220.
54. Tschannen-Moran M, Hoy WK. A multidisciplinary analysis of the nature, meaning, and measurement of trust. Review of Educational Research. 2000;70(4):547–593.
55. Wolfe EW, Smith EV. Instrument development tools and activities for measure validation using Rasch models: Part II, validation activities. In: Smith EV, Smith RM, editors. Rasch measurement: Advanced and specialized applications. JAM Press; Maple Grove, MN: 2007.
56. Yamaguchi J. Positive vs. negative wording. Rasch Measurement Transactions. 1997;11:567.
57. Zheng B, Hall MA, Dugan E, Kidd KE, Levine D. Development of a scale to measure patients' trust in health insurers. Health Serv Res. 2002;37(1):187–202.
