[Preprint]. 2023 Oct 11:arXiv:2310.07543v1. [Version 1]

Ordinal Characterization of Similarity Judgments

Jonathan D Victor 1, Guillermo Aguilar 1, Suniyya A Waraich 1
PMCID: PMC10593068  PMID: 37873008

Abstract

Characterizing judgments of similarity within a perceptual or semantic domain, and making inferences about the underlying structure of this domain from these judgments, has an increasingly important role in cognitive and systems neuroscience. We present a new framework for this purpose that makes very limited assumptions about how perceptual distances are converted into similarity judgments. The approach starts from a dataset of empirical judgments of relative similarities: the fraction of times that a subject chooses one of two comparison stimuli to be more similar to a reference stimulus. These empirical judgments provide Bayesian estimates of underlying choice probabilities. From these estimates, we derive three indices that characterize the set of judgments, measuring consistency with a symmetric dis-similarity, consistency with an ultrametric space, and consistency with an additive tree. We illustrate this approach with example psychophysical datasets of dis-similarity judgments in several visual domains and provide code that implements the analyses.

Keywords: perceptual spaces, maximum-likelihood estimation, ultrametric space, additive trees, triads, multidimensional scaling

Introduction

Characterization of the similarities between elements of a domain of sensory or semantic information is important for many reasons. First, these similarities, and the relationships between them (Edelman, 1998; Kemp & Tenenbaum, 2008; Tversky, 1977), reveal the cognitive structure of the domain. Similarities are functionally important as they are the substrate for learning, generalization, and categorization (Kemp & Tenenbaum, 2008; Saxe, McClelland, & Ganguli, 2019; Shepard, 1958; Zaidi et al., 2013). At a mechanistic level, the quantification of similarities provides a way to test hypotheses concerning their neural substrates (Kriegeskorte & Kievit, 2013). Thus, measuring perceptual similarities, and using these judgments to make inferences about the geometry of the underlying perceptual spaces, plays an important role in cognitive and systems neuroscience.

The goal of this work is to present a novel approach that complements the standard strategies used for this purpose. The starting point for the present approach, in common with standard strategies, is a set of triadic similarity judgments: is stimulus x or stimulus y more similar to a reference stimulus r? To make geometric inferences from such data, one standard approach is to make use of a variant of multidimensional scaling (de Leeuw & Heiser, 1982; Knoblauch & Maloney, 2008; Maloney & Yang, 2003; Tsogo, Masson, & Bardot, 2000; J. D. Victor, Rizvi, & Conte, 2017; Waraich & Victor, 2022), i.e., to associate the stimuli with points in a space, so that the distances between the points account for the perceptual similarities. Once these points are determined, inferences can be made about the dimensionality of the space, its curvature, and its topology. A second approach, topological data analysis, makes use of the distances directly, and then invokes graph-theoretic procedures (Dabaghian, Memoli, Frank, & Carlsson, 2012; Giusti, Pastalkova, Curto, & Itskov, 2015; Guidolin, Desroches, Victor, Purpura, & Rodrigues, 2022; Singh et al., 2008; Zhou, Smith, & Sharpee, 2018) to infer these geometric features.

In applying these approaches to experimental data, one must deal with the fact that even if a forced-choice response is required, the response likely represents an underlying choice probability – and this choice probability may depend on sensory noise, noise in how distances are mentally computed and transformed into dis-similarities, and noise in the decision process in which dis-similarities are compared. As a consequence, analysis of an experimental dataset requires, at least implicitly, substantial assumptions. Such assumptions are not always benign: a monotonic transformation of distances – which preserves binary similarity judgments – can alter the dimensionality and curvature of a multidimensional scaling model (Kruskal & Wish, 1978). Topological data analysis via persistent homologies, which only makes use of rank orders of distances, is invariant to a global monotonic transformation of distances, but makes other assumptions (for example, that this transformation is the same across the domain), and does not typically take into account a noise model.

With these considerations in mind, here we pursue an approach to make inferences from the choice probabilities themselves, as estimated from repeated triadic judgments. Our main assumption is that if, for any particular triad, stimulus x is chosen more often than stimulus y as closer to a reference r, then the distance between x and r is less than the distance between y and r. Note that we do not make any assumptions about how relative or absolute distances are transformed into choice probabilities within an individual triad, or whether this transformation is the same across the domain.

While the limited nature of these assumptions necessarily limits the inferences that can be made, the approach nevertheless can characterize a set of similarity judgments in three important ways. First, we ask whether the similarity judgments satisfy a relationship that is required for a symmetric notion of distance. Then, assuming that distances are symmetric, we derive an index of whether the judgments are consistent with an ultrametric space (Semmes, 2007), i.e., a set of distances that derives from a hierarchical representation. Finally, we index the extent to which the judgments are consistent with an additive tree (Sattath & Tversky, 1977), a generalization of an ultrametric space. Each of these indices is graded, and can be viewed as quantifying the global characteristics of a set of similarity judgments, without the need to model how choice probabilities, distances, noise, and decision processes are related.

We illustrate the approach with several sample psychophysical datasets, and provide code to carry out these computations.

Theory

Overview and Preliminaries

Our goal is to develop indices that characterize a dataset of triadic similarity judgments, in a way that provides insight into the structure of the underlying perceptual space. Our central assumption is that, within a triadic judgment, the probability that a participant chooses one pair of stimuli as more similar than an alternative is monotonically related to the similarity. Typical datasets include large numbers of similarity judgments of overlapping triads, and the relationship between these judgments contains information about the underlying perceptual space. We show how this information can be accessed without making further assumptions about the specifics of the monotonic relationship between choice probability and (dis)-similarity, whether that relationship is constant throughout the space, or the nature of the decision process itself.

We focus on paradigms in which the basic unit of data collection and analysis is a triad (r;x,y), consisting of a stimulus r, designated as the reference, and two other stimuli, x and y designated as the comparison stimuli. The participant is asked to decide, in a forced-choice response, which of the two comparison stimuli is more similar to the reference. We consider this to be a probabilistic judgment, and denote the underlying probability that a participant judges x as more similar to r than y is to r as R(r;x,y).

The underlying probability R(r;x,y) is unknown, and must be estimated from the trials in which the triad (r;x,y) is presented. We denote the number of such trials by N(r;x,y) and use C(r;x,y) to denote the number of trials in which the participant judges x as more similar to r than y is to r. These provide a naïve estimate Robs(r;x,y) of the choice probability:

$$R_{\text{obs}}(r;x,y)=\frac{C(r;x,y)}{N(r;x,y)}.\tag{1.1}$$

We assume that the experimental procedure guarantees that the two comparison stimuli of a triad are treated identically. (One way to guarantee this is to randomly swap or balance the positions of x and y across trials.) With this assumption,

$$C(r;x,y)+C(r;y,x)=N(r;x,y)=N(r;y,x),\tag{1.2}$$
$$R_{\text{obs}}(r;x,y)+R_{\text{obs}}(r;y,x)=1,\tag{1.3}$$

and

$$R(r;x,y)+R(r;y,x)=1.\tag{1.4}$$

A triadic judgment is the result of a two-step process: first, estimation of the dis-similarity between the reference and each of the comparison stimuli, and second, comparison of these dis-similarities. We denote the dis-similarity of a comparison stimulus z to a reference stimulus r by D(r,z). Our central assumption is that a participant is more likely to judge that x is more similar to r than y is to r if, and only if, D(r,x)<D(r,y). That is,

$$R(r;x,y)>1/2 \iff D(r,x)<D(r,y).\tag{1.5}$$

An immediate consequence is that a choice probability of exactly 1/2 only occurs when dis-similarities are exactly equal: combining (1.4) with (1.5) yields

$$R(r;x,y)=1/2 \iff D(r,x)=D(r,y).\tag{1.6}$$

We also assume that dis-similarities are non-negative, and that a dis-similarity of zero only occurs for stimuli that are identical:

$$D(x,y)\ge 0 \quad\text{and}\quad D(x,y)=0 \iff x=y.\tag{1.7}$$

The central assumption embodied by eq. (1.5) – that choice probabilities reflect the rank order of dis-similarities – stops short of making a more quantitative assumption about the relationship between the choice probability and perceived dis-similarities. Specifically, we only make use of the sign of the comparison – for example, that an alligator and toothpaste are more dis-similar than an alligator and a panda. We do not attempt to infer the size of this difference, in absolute terms, relative to an internal noise, or relative to the dis-similarity of another pair of stimuli that was not explicitly compared within a triad.

The present analysis, which applies most directly to a paradigm in which each trial is devoted to judgment of a single triad, is applicable to other paradigms in which individual trials yield judgments about more than one triad, provided that each judgment can be considered to be independent of the context in which it is made. For example, in the “odd one out” paradigm (also known as the “oddity task” (Kingdom & Prins, 2016)), three stimuli are presented and the participant is asked to choose which one is the outlier. Here, a selection of a stimulus x_j out of a triplet {x_j, x_k, x_l} can be interpreted as a judgment that D(x_k, x_j) > D(x_k, x_l) and also that D(x_l, x_j) > D(x_l, x_k), and thus contributes to estimates of choice probabilities for two triads, (x_k; x_j, x_l) and (x_l; x_j, x_k). The analysis is also applicable to paradigms in which the participant is asked to rank m comparison stimuli x_1, …, x_m in order of similarity to a reference stimulus r (Waraich & Victor, 2022). The ranking obtained on each trial then contributes to estimation of choice probabilities for $\binom{m}{2}$ triads (r; x_k, x_l), one for each pair of comparison stimuli.

We do not require that the experiment explore all triads (in an experiment with M stimuli, there are $M\binom{M-1}{2}=\tfrac12 M(M-1)(M-2)$ distinct triads; for example, 6,900 triads for M = 25), but the incisiveness of the approach will naturally improve as more triads are explored and as each triad is presented more often, so that Robs(r;x,y) is a better estimator of R(r;x,y).

Assessing symmetry

The first index we develop tests the extent to which the dis-similarities underlying the perceptual judgments are consistent with a symmetric distance, i.e., a distance in which the reference and comparison stimuli are treated alike. To do this, we first identify necessary conditions on the dis-similarities D(x,y) required by this symmetry. This yields a set of inequalities that the underlying choice probabilities R(r;x,y) must satisfy. Then, we take a Bayesian approach: given the observed data Robs(r;x,y), which are only estimates of the choice probabilities R(r;x,y), we determine the likelihood that these choice probabilities R(r;x,y) are consistent with the requisite inequalities.

A necessary condition for symmetry

To derive conditions on the choice probabilities that are necessary for symmetric dis-similarities, we first note that if the dis-similarity is symmetric, then at least one of the three inequalities

$$\left.\begin{aligned} D(x,y)&<D(x,z)\\ D(z,x)&<D(z,y)\\ D(y,z)&<D(y,x)\end{aligned}\right\}\tag{2.1}$$

must be false. For if all of the inequalities (2.1) held and the dis-similarity were symmetric, alternating application of one of the inequalities (2.1) and the assumption of symmetry would lead to a contradiction

$$D(x,y)<D(x,z)=D(z,x)<D(z,y)=D(y,z)<D(y,x),\tag{2.2}$$

as a quantity cannot be less than itself.

To translate the condition that at least one of (2.1) must be false into a condition on the choice probabilities, we use the central hypothesis, that dis-similarities and choice probabilities are monotonically related within a triad (eq. (1.5)). If at least one of (2.1) must be false, then at least one of the inequalities

$$\left.\begin{aligned} R(x;y,z)&>1/2\\ R(z;x,y)&>1/2\\ R(y;z,x)&>1/2\end{aligned}\right\}\tag{2.3}$$

must consequently be false. Similarly (reversing all the inequalities in (2.1)), at least one of the inequalities

$$\left.\begin{aligned} R(x;y,z)&<1/2\\ R(z;x,y)&<1/2\\ R(y;z,x)&<1/2\end{aligned}\right\}\tag{2.4}$$

must be false as well. A slightly stronger statement is that at least one of (2.3) and at least one of (2.4) must be false, even when two of the three inequalities in either set are replaced by an inclusive inequality. (These borderline cases, while not crucial here, will be important for testing the ultrametric property below.)

These conditions can be summarized as follows: if at least one of the triplet of choice probabilities R(x;y,z), R(z;x,y), and R(y;z,x) is strictly greater than 1/2, then at least one of them must also be strictly less than 1/2, and vice-versa. Put another way: the triplet of choice probabilities R(x;y,z), R(z;x,y), and R(y;z,x) can only reflect a symmetric dis-similarity if the triplet of values lies in a specific subset Ssym of [0,1]^3. Other than boundary points, Ssym consists of the cube [0,1]^3 from which two smaller cubes, [1/2,1]^3 (eq. (2.3)) and [0,1/2]^3 (eq. (2.4)), are removed.

Likelihood ratio calculation and an index

We now use the observed data to determine the likelihood that, given observed responses for a triplet of triads (x;y,z), (z;x,y), and (y;z,x), the corresponding triplet of choice probabilities lies in Ssym, vs. its complement in [0,1]^3. We use a Bayesian approach: the a posteriori likelihood of a triplet of choice probabilities R(x;y,z), R(z;x,y), and R(y;z,x) is proportional to the product of their prior probabilities, and the probability that they will lead to the observed responses. We then integrate this product over the space of choice probabilities in which, for each triplet, at least one of (2.3) and at least one of (2.4) is false. This space is a product of the domains Ssym for each triplet, which we denote by Ωsym. We compare this integral with an integral of the a posteriori likelihood over all possible choice probabilities, which we denote Ω, to assess how much of this total likelihood is consistent with symmetry. To carry out these integrations, we exploit the fact that the constraints that define Ωsym are grouped into triplets, and the sets of choice probabilities involved in distinct triplets are disjoint. Because of this disjointness, the integral over Ωsym factors into a product of integrals over multiple copies of the domain Ssym, with one copy for each triplet.

To begin the Bayesian approach, we choose a product of Dirichlet distributions (Ferguson, 1973) of identical shape as the prior for the set of choice probabilities. This choice is both analytically convenient and practical for real data (see Results), but it is not essential to the approach. As each probability is univariate, the Dirichlet prior reduces to a beta distribution, so our prior is that each choice probability is independently distributed according to

$$p_{a,b}(R)=\frac{1}{B(a,b)}R^{a-1}(1-R)^{b-1}.\tag{2.5}$$

Here, B(a,b) is the beta-function, defined in the standard fashion by

$$B(a,b)\equiv\int_0^1 u^{a-1}(1-u)^{b-1}\,du=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}.\tag{2.6}$$

Since R(r;x,y)=1-R(r;y,x), the prior must be symmetric, so we can take a=b in (2.5):

$$p_a(R)=\frac{1}{B(a,a)}R^{a-1}(1-R)^{a-1}.\tag{2.7}$$

We determine the Dirichlet parameter a by maximizing the likelihood of the observed set of choice probabilities, assuming that the individual responses for the i-th triad (r;x,y) are independently drawn from a Bernoulli distribution with parameter Ri = R(r;x,y). Given Ri, the probability that the subject reports D(r,x) < D(r,y) in Ci of Ni presentations is

$$p(C_i\mid R_i,N_i)=\binom{N_i}{C_i}R_i^{C_i}(1-R_i)^{N_i-C_i}.\tag{2.8}$$

Integrating over the prior (2.7) for Ri yields the probability of observing Ci reports of D(r,x) < D(r,y) in Ni presentations, given the Dirichlet parameter a:

$$p(C_i\mid N_i,a)=\binom{N_i}{C_i}\int_0^1\left(\frac{1}{B(a,a)}R_i^{a-1}(1-R_i)^{a-1}\right)R_i^{C_i}(1-R_i)^{N_i-C_i}\,dR_i=\binom{N_i}{C_i}\frac{B(a+C_i,\,a+N_i-C_i)}{B(a,a)}.\tag{2.9}$$

Making use of the independence of each triad yields the overall log-likelihood:

$$LL(a)=\log\left(\prod_i p(C_i\mid N_i,a)\right)=\log K+\sum_i\log\frac{B(a+C_i,\,a+N_i-C_i)}{B(a,a)},\tag{2.10}$$

where K is a combinatorial factor independent of a, and the sum ranges over all triads. The value of a that maximizes (2.10), along with (2.7), defines the prior for the set of choice probabilities:

$$P(R)=\prod_i\left(\frac{1}{B(a,a)}R_i^{a-1}(1-R_i)^{a-1}\right),\tag{2.11}$$

where R denotes the set of choice probabilities for all triads.
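
Since eq. (2.10) depends on the single scalar a, the maximization can be done with any one-dimensional optimizer. The following is a minimal sketch in Python/SciPy (the authors' publicly available MATLAB code, linked in the Methods below, is the reference implementation; the array contents here are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import betaln  # log of the beta function, numerically stable

def neg_log_lik(a, C, N):
    """Negative of eq. (2.10), dropping the a-independent constant log K."""
    return -np.sum(betaln(a + C, a + N - C) - betaln(a, a))

def fit_dirichlet_a(C, N):
    """Maximum-likelihood beta (univariate Dirichlet) parameter a of eq. (2.7)."""
    res = minimize_scalar(neg_log_lik, bounds=(1e-4, 1e2), args=(C, N),
                          method='bounded')
    return res.x

# Hypothetical tallies: C[i] of N[i] responses favored the first-listed comparison.
C = np.array([8, 1, 5, 9, 2])
N = np.full(5, 10)
a_hat = fit_dirichlet_a(C, N)
```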

Via Bayes' rule, this prior determines the posterior likelihood of a set of choice probabilities:

$$p(R\mid C,N)=\frac{p(C\mid R,N)\,P(R)}{p(C,N)},\tag{2.12}$$

where C denotes the responses Ci to each of the triads, N denotes the number of times that each was presented, and R denotes the underlying choice probabilities.

The key step is to compare the likelihood that the observations C result from underlying choice probabilities within the subset Ωsym in which the requisite inequalities hold, to the likelihood that the observations result from choice probabilities within the entire space of choice probabilities, Ω. We denote these quantities by

$$L_{\text{sym}}(C,N)=\int_{\Omega_{\text{sym}}}p(R\mid C,N)\,dR\tag{2.13}$$

and

$$L(C,N)=\int_{\Omega}p(R\mid C,N)\,dR,\tag{2.14}$$

and their ratio by LRsym(C,N).

To calculate these integrals, we make use of the fact that Ωsym is defined by disjoint sets of inequalities ((2.3) and (2.4)), one set for each triplet. (Note that a “triplet” refers equally to a triple of stimuli and to the three triadic judgments among them, independent of order: for each triplet of stimuli {x,y,z}, there are three triads (x;y,z), (y;z,x), and (z;x,y), and conversely.) Lsym(C,N) is therefore a product of integrals, each over a triplet of choice probabilities:

$$LR_{\text{sym}}(C,N)=\frac{L_{\text{sym}}(C,N)}{L(C,N)}=\frac{\int_{\Omega_{\text{sym}}}p(R\mid C,N)\,dR}{\int_{\Omega}p(R\mid C,N)\,dR}=\frac{\int_{\Omega_{\text{sym}}}p(C\mid R,N)\,P(R)\,dR}{\int_{\Omega}p(C\mid R,N)\,P(R)\,dR}=\frac{\prod_{T\in\text{trip}}\int_{S_{\text{sym}}(T)}\prod_{i\in T}\left(R_i^{C_i}(1-R_i)^{N_i-C_i}\,p_a(R_i)\right)dR_T}{\prod_{k}\int_0^1 R_k^{C_k}(1-R_k)^{N_k-C_k}\,p_a(R_k)\,dR_k}\tag{2.15}$$

where T ranges over the set of triplets trip, i ∈ T denotes the three triads in the triplet T, R_T denotes the vector of choice probabilities for these triads, and Ssym(T) is the domain in which at least one of (2.3) and at least one of (2.4) is false for the three components of T:

$$S_{\text{sym}}=[0,1]^3\setminus\left([1/2,1]^3\cup[0,1/2]^3\right).\tag{2.16}$$

Because of (2.16), each factor in the numerator of (2.15) is an integral over the cube [0,1]^3, from which [1/2,1]^3 and [0,1/2]^3 have been excluded:

$$\int_{S_{\text{sym}}(T)}\prod_{i\in T}\left(R_i^{C_i}(1-R_i)^{N_i-C_i}p_a(R_i)\right)dR_T=\int_{[0,1]^3}\prod_{i\in T}\left(R_i^{C_i}(1-R_i)^{N_i-C_i}p_a(R_i)\right)dR_T-\int_{[1/2,1]^3}\prod_{i\in T}\left(R_i^{C_i}(1-R_i)^{N_i-C_i}p_a(R_i)\right)dR_T-\int_{[0,1/2]^3}\prod_{i\in T}\left(R_i^{C_i}(1-R_i)^{N_i-C_i}p_a(R_i)\right)dR_T.\tag{2.17}$$

With the Dirichlet prior (2.7), each factor can be written in terms of incomplete beta functions:

$$\int_v^w R^{C}(1-R)^{N-C}p_a(R)\,dR=\frac{1}{B(a,a)}\int_v^w R^{C+a-1}(1-R)^{N-C+a-1}\,dR=\frac{B(w;\,a+C,\,a+N-C)-B(v;\,a+C,\,a+N-C)}{B(a,a)},\tag{2.18}$$

where

$$B(w;a,b)=\int_0^w u^{a-1}(1-u)^{b-1}\,du.\tag{2.19}$$

Each factor in the denominator of (2.15) can be expressed in terms of beta functions:

$$\int_0^1 R_k^{C_k}(1-R_k)^{N_k-C_k}p_a(R_k)\,dR_k=\frac{1}{B(a,a)}\int_0^1 R_k^{C_k+a-1}(1-R_k)^{N_k-C_k+a-1}\,dR_k=\frac{B(a+C_k,\,a+N_k-C_k)}{B(a,a)}.\tag{2.20}$$

Thus, the likelihood ratio in (2.15) reduces to

$$LR_{\text{sym}}(C,N)=\prod_{T\in\text{trip}}\left(1-\prod_{i\in T}\left(1-\frac{B\left(\tfrac12;\,a+C_i,\,a+N_i-C_i\right)}{B(a+C_i,\,a+N_i-C_i)}\right)-\prod_{i\in T}\frac{B\left(\tfrac12;\,a+C_i,\,a+N_i-C_i\right)}{B(a+C_i,\,a+N_i-C_i)}\right).\tag{2.21}$$

For numerical reasons, many software packages (e.g., MATLAB) provide the normalized beta-function

$$B_{\text{norm}}(w;a,b)\equiv\frac{B(w;a,b)}{B(a,b)}=\frac{1}{B(a,b)}\int_0^w u^{a-1}(1-u)^{b-1}\,du,\tag{2.22}$$

which recasts the result as

$$\log LR_{\text{sym}}(C,N)=\sum_{T\in\text{trip}}\log\left(1-\prod_{i\in T}\left(1-B_{\text{norm}}\left(\tfrac12;\,a+C_i,\,a+N_i-C_i\right)\right)-\prod_{i\in T}B_{\text{norm}}\left(\tfrac12;\,a+C_i,\,a+N_i-C_i\right)\right).\tag{2.23}$$

In sum, (2.23) tests consistency of an experimental dataset with a symmetric dis-similarity by comparing the mass of the posterior distribution of choice probabilities contained within the region Ωsym consistent with the conditions (2.3) and (2.4), to the total mass. Each triplet of triadic judgments contributes an independent additive term to the log of this ratio. We therefore normalize by the number of triplets, leading to an index that quantifies the consistency of the observations with a symmetric dissimilarity:

$$I_{\text{sym}}(C,N)=\frac{1}{\#(\text{trip})}\log LR_{\text{sym}}(C,N).\tag{2.24}$$

Values close to zero indicate that nearly all of the posterior distribution of choice probabilities lies within Ωsym; progressively more negative values indicate that the posterior shifts into the complement of Ωsym in Ω, in which symmetry is violated.

A useful benchmark is that in the absence of any data (i.e., C = N = 0), each of the normalized beta-functions has a value of Bnorm(1/2; a, a) = 1/2, so

$$I_{\text{sym}}(0,0)=\frac{1}{\#(\text{trip})}\log\prod_{T\in\text{trip}}\left(1-\left(1-\tfrac12\right)^3-\left(\tfrac12\right)^3\right)=\log\tfrac34\approx-0.2877,\tag{2.25}$$

independent of the Dirichlet parameter a. Thus, values of Isym(C,N) greater than log(3/4) are more consistent with symmetry than an index derived from a large number of choice probabilities drawn randomly from the prior. Also note that deviations from this a priori value can only be driven by triplets in which there are observations for at least two of the triads. This follows from the fact that Bnorm(1/2; a, a) = 1/2, so that if only one triad k ∈ T has a nonzero number of observations,

$$1-\prod_{i\in T}\left(1-B_{\text{norm}}\left(\tfrac12;\,a+C_i,\,a+N_i-C_i\right)\right)-\prod_{i\in T}B_{\text{norm}}\left(\tfrac12;\,a+C_i,\,a+N_i-C_i\right)=1-\tfrac12\cdot\tfrac12\left(1-B_{\text{norm}}\left(\tfrac12;\,a+C_k,\,a+N_k-C_k\right)\right)-\tfrac12\cdot\tfrac12\,B_{\text{norm}}\left(\tfrac12;\,a+C_k,\,a+N_k-C_k\right)=\tfrac34.\tag{2.26}$$

This makes intuitive sense: we can only make inferences about the structure of the dis-similarity judgments if there is experimental data about more than one triad within a triplet. With data about only one triad, knowing the sign of the comparison is useless since this sign is arbitrarily determined by how the triad is labeled, i.e., (r;x,y) vs. (r;y,x).
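
For concreteness, the computations in eqs. (2.23)–(2.25) require only the normalized beta-function Bnorm of eq. (2.22), which SciPy exposes as scipy.special.betainc (MATLAB's betainc plays the same role). A minimal sketch, assuming the triad tallies have already been grouped into triplets (that bookkeeping is not shown):

```python
import numpy as np
from scipy.special import betainc  # regularized incomplete beta: B_norm of eq. (2.22)

def I_sym(C_trip, N_trip, a):
    """Symmetry index of eq. (2.24). C_trip, N_trip: arrays of shape
    (n_triplets, 3), holding C_i and N_i for each triplet's three triads."""
    b = betainc(a + C_trip, a + N_trip - C_trip, 0.5)  # B_norm(1/2; a+C_i, a+N_i-C_i)
    terms = 1.0 - np.prod(1.0 - b, axis=1) - np.prod(b, axis=1)  # summand of (2.23)
    return np.mean(np.log(terms))

# With no data, each B_norm is 1/2 and the index is log(3/4), as in eq. (2.25):
z = np.zeros((5, 3))
assert np.isclose(I_sym(z, z, a=0.5), np.log(0.75))
```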

Finally, we note that this analysis focuses on a condition that is necessary for symmetry, but is not sufficient. The chain of inequalities of eq. (2.1) is the simplest of a series of necessary conditions: more generally, for any n-cycle a1, a2, …, an, there is a set of inequalities

$$\left.\begin{aligned} D(a_1,a_2)&<D(a_1,a_n)\\ D(a_n,a_1)&<D(a_n,a_{n-1})\\ &\ \ \vdots\\ D(a_2,a_3)&<D(a_2,a_1)\end{aligned}\right\}\tag{2.27}$$

for which at least one must be false. Otherwise (generalizing (2.2)), there would be a contradiction:

$$D(a_1,a_2)<D(a_1,a_n)=D(a_n,a_1)<D(a_n,a_{n-1})=D(a_{n-1},a_n)<\cdots=D(a_2,a_3)<D(a_2,a_1).\tag{2.28}$$

Hence, at least one of

$$\left.\begin{aligned} R(a_1;a_2,a_n)&>1/2\\ R(a_n;a_1,a_{n-1})&>1/2\\ &\ \ \vdots\\ R(a_2;a_3,a_1)&>1/2\end{aligned}\right\}\tag{2.29}$$

must be false for a symmetric dis-similarity. It is possible to construct scenarios in which this criterion is violated, but the triplet criteria (i.e., (2.27) with n = 3) hold.

These conditions can be analyzed in a manner analogous to the triplet conditions above, but we do not pursue this analysis here: these more elaborate conditions exclude progressively smaller portions of the choice probability space, as they rely on a conjunction of progressively more inequalities. These additional conditions are also not independent of each other, since (for n ≥ 4) the triads in (2.29) that correspond to different n-cycles may be partially overlapping with each other. Such overlaps prevent the factorization that facilitated exact evaluation of the likelihood ratio integrals.

Assessing ultrametric structure

The motivation for the next index begins with the observation that consistency with a symmetric dis-similarity guarantees consistency with a metric-space structure (Appendix 1). It is therefore natural to ask whether the dis-similarities have further properties consistent with specific kinds of metric spaces.

Ultrametric spaces (Semmes, 2007) are one important such kind, as they abstract the notion of a hierarchical organization. Points in an ultrametric space correspond to the terminal nodes of a tree, and the distance between two points corresponds to the height of their first common ancestor. Formally, a distance d is said to satisfy the ultrametric inequality if, for any three points x,y, and z,

$$d(x,y)\le\max\left(d(x,z),\,d(y,z)\right),\tag{3.1}$$

a condition that implies the triangle inequality (Appendix 1). Essentially, (3.1) states that any triangle is isosceles, with the two equal sides no shorter than the third.

Necessary and sufficient conditions

To determine the extent to which a set of responses is consistent with the ultrametric inequality, we first restate (3.1) in terms of dis-similarities and distances:

$$D(x,y)\le\max\left(D(x,z),\,D(y,z)\right).\tag{3.2}$$

Since (Appendix 1, eq. (6.6)) dis-similarities can always be transformed into distances via a monotonic transformation, consistency of the dis-similarity structure with (3.2) is equivalent to consistency with distances that satisfy the ultrametric property (3.1).

We now recast (3.2) in terms of choice probabilities. Since the conditions need to apply to the three points x, y, and z taken in any order, (3.2) means that among D(x,y), D(y,z), and D(z,x), two must be equal and no less than the third. Writing R1 ≡ R(x;y,z), R2 ≡ R(y;z,x), and R3 ≡ R(z;x,y), this means that at least one of the following must hold:

$$\begin{aligned} D(x,y)\le D(z,x)=D(y,z)&\iff R_1\ge\tfrac12,\ R_2\le\tfrac12,\ R_3=\tfrac12\\ D(y,z)\le D(x,y)=D(z,x)&\iff R_1=\tfrac12,\ R_2\ge\tfrac12,\ R_3\le\tfrac12\\ D(z,x)\le D(y,z)=D(x,y)&\iff R_1\le\tfrac12,\ R_2=\tfrac12,\ R_3\ge\tfrac12.\end{aligned}\tag{3.3}$$

We denote the region of (R1,R2,R3)-space defined by (3.3) as Sumi. Just as we had measured consistency with symmetry by determining the fraction of the posterior distribution of choice probabilities that lies in Ssym, here we seek to measure consistency with the ultrametric condition by determining to what extent the posterior distribution of choice probabilities lies in Sumi.

Likelihood ratio calculation and an index

A direct application of the machinery developed in the previous section runs into an immediate difficulty: the conditions (3.3) are only satisfied on a set of measure zero, when at least one of the Ri is exactly equal to 1/2. So a Bayesian analysis that begins with a Dirichlet prior will always lead to a likelihood ratio of zero, independent of the data: because the Dirichlet prior is continuous, any posterior derived from it via Bayes' rule will never have a discrete mass at 1/2, as would be required to satisfy (3.3).

It is nevertheless possible to capture the spirit of ultrametric behavior in a rigorous way, and at the same time, address a way in which the Dirichlet prior may be unrealistic. To do this, we recognize that for some triads, it may be appropriate to model the underlying choice probability as exactly 1/2, corresponding to stimuli for which there is no basis for comparison: is a toothbrush or a mountain more similar to an orange? But we don’t know, a priori, how many of the triads have this property. To take this into account, we generalize the prior for each choice probability to be a sum of two components: one component is the Dirichlet prior used above (2.7), normalized to 1 − h; the second component is a point mass at 1/2, normalized to h:

$$p_{a,h}(R)=(1-h)\,p_a(R)+h\,\delta\!\left(R-\tfrac12\right)=\frac{1-h}{B(a,a)}R^{a-1}(1-R)^{a-1}+h\,\delta\!\left(R-\tfrac12\right).\tag{3.4}$$

With this prior, we can then determine the likelihood ratio as a function of h. For small values of h, the likelihood ratio will be proportional to h, since the mass in the prior at 1/2 is proportional to h. The proportionality constant as h → 0 thus serves as an index of consistency with the ultrametric property. An alternative approach (not taken here) is that if the experimental dataset suggests that a prior pa,h(R) with h > 0 is a substantially better fit to the distribution of choice probabilities than pa,0(R), this prior can be used directly to calculate a likelihood ratio, and the best-fitting value of h then provides an additional descriptor of the dataset.

To implement this strategy, we write the likelihood ratio as

$$LR_{\text{umi}}(C,N;h)=\frac{U_{\text{umi}}(C,N;h)}{U_{\text{sym}}(C,N;h)},\tag{3.5}$$

where the numerator is a likelihood equal to the integral over choice probabilities consistent with the ultrametric inequality (and a symmetric dis-similarity), and the denominator is a likelihood equal to the integral over choice probabilities consistent with a symmetric dis-similarity but without the ultrametric constraint. As above, because the triads form independent triplets, both numerator and denominator can be factored into a product of terms Uumi(C_T, N_T; h) or Usym(C_T, N_T; h), one for each triplet T ∈ trip. These terms are of the form

$$U(C_T,N_T;h)=\int_{[0,1]^k}V\!\left(\operatorname{sgn}\!\left(R_1-\tfrac12\right),\ldots,\operatorname{sgn}\!\left(R_k-\tfrac12\right)\right)R_1^{C_1}(1-R_1)^{N_1-C_1}\cdots R_k^{C_k}(1-R_k)^{N_k-C_k}\,p_{a,h}(R_1)\cdots p_{a,h}(R_k)\,dR_1\cdots dR_k,\tag{3.6}$$

where V(σ1, …, σk), which is 0 or 1, defines the space of choice probabilities that contributes to the integral for the triads in T. Since consistency of choice probabilities with the ultrametric condition or symmetry depends only on their rank order, V only depends on whether the choice probabilities are less than, equal to, or greater than 1/2. In (3.6), k = 3 (for the three triads within a triplet); the analysis of additive trees will require k = 6.

For the numerator of (3.5), we incorporate the ultrametric conditions (3.3) into V = Vumi:

$$\begin{aligned} V_{\text{umi}}(+1,-1,0)&=1, &&\text{corresponding to } R_1>\tfrac12,\ R_2<\tfrac12,\ R_3=\tfrac12\\ V_{\text{umi}}(0,+1,-1)&=1, &&\text{corresponding to } R_1=\tfrac12,\ R_2>\tfrac12,\ R_3<\tfrac12\\ V_{\text{umi}}(-1,0,+1)&=1, &&\text{corresponding to } R_1<\tfrac12,\ R_2=\tfrac12,\ R_3>\tfrac12\\ V_{\text{umi}}(0,0,0)&=1, &&\text{corresponding to } R_1=R_2=R_3=\tfrac12.\end{aligned}\tag{3.7}$$

All other values of Vumi(σ) are zero, since they would correspond either to none of the conditions (3.3), or to exactly two of those conditions. The latter is impossible, as it would require two equalities and one strict inequality among the three dis-similarities.

For the denominator of (3.5), we incorporate the symmetry conditions of the previous section into V = Vsym, adding an explicit consideration of the borderline cases. Thus, Vsym = 1 for the following arguments: (i) when no arguments are zero, both signs must be represented among the arguments (this corresponds to the requirement that at least one of (2.3) and at least one of (2.4) is false); (ii) when one argument is zero, Vsym = 1 for all arguments at which Vumi(σ) = 1 in (3.7) (corresponding to an isosceles triangle, with the equal sides larger than the third), and also for all arguments at which Vumi(−σ) = 1 in (3.7) (corresponding to an isosceles triangle, with the equal sides smaller than the third); (iii) Vsym(0,0,0) = 1 (corresponding to an equilateral triangle):

$$\begin{aligned} \text{(i)}\quad & V_{\text{sym}}(\pm1,\pm1,\mp1)=V_{\text{sym}}(\pm1,\mp1,\pm1)=V_{\text{sym}}(\mp1,\pm1,\pm1)=1\\ \text{(ii)}\quad & V_{\text{sym}}(\pm1,\mp1,0)=V_{\text{sym}}(0,\pm1,\mp1)=V_{\text{sym}}(\mp1,0,\pm1)=1\\ \text{(iii)}\quad & V_{\text{sym}}(0,0,0)=1.\end{aligned}\tag{3.8}$$

Equivalently, Vsym is nonzero when, and only when, either (a) arguments include both positive and negative signs, or (b) all arguments are zero.

To establish the behavior of the likelihood ratio (3.5) as h → 0, we use (3.4) to isolate the dependence of the integrals (3.6) on h, which is polynomial:

$$U(C_T,N_T;h)=\sum_{\sigma}h^{Z(\sigma)}(1-h)^{k-Z(\sigma)}\,V(\sigma_1,\ldots,\sigma_k)\,W(\sigma_1;C_1,N_1)\cdots W(\sigma_k;C_k,N_k),\tag{3.9}$$

where the sum is over all 3^k assignments of the elements of σ = (σ1, …, σk) to {−1, 0, +1}, Z(σ) is the number of entries in σ that are equal to zero (each such entry incurring a factor of h), and W(σ;C,N) is the integral of the prior (3.4), weighted by the experimental data, over one segment of the domain:

$$W(\sigma;C,N)=\begin{cases}\displaystyle\int_0^{1/2}R^{C}(1-R)^{N-C}\,p_a(R)\,dR, & \sigma=-1\\[2ex]\displaystyle\int_{\frac12-\varepsilon}^{\frac12+\varepsilon}R^{C}(1-R)^{N-C}\,\delta\!\left(R-\tfrac12\right)dR, & \sigma=0\\[2ex]\displaystyle\int_{1/2}^{1}R^{C}(1-R)^{N-C}\,p_a(R)\,dR, & \sigma=+1.\end{cases}\tag{3.10}$$

These evaluate to

$$W(\sigma;C,N)=\begin{cases}\dfrac{B\left(\tfrac12;\,a+C,\,a+N-C\right)}{B(a,a)}, & \sigma=-1\\[2ex]\dfrac{1}{2^{N}}, & \sigma=0\\[2ex]\dfrac{B(a+C,\,a+N-C)-B\left(\tfrac12;\,a+C,\,a+N-C\right)}{B(a,a)}, & \sigma=+1.\end{cases}\tag{3.11}$$

Consequently,

$$U(C_T,N_T;0)=\sum_{Z(\sigma)=0}V(\sigma_1,\ldots,\sigma_k)\,W(\sigma_1;C_1,N_1)\cdots W(\sigma_k;C_k,N_k),\tag{3.12}$$
$$\frac{d}{dh}U(C_T,N_T;h)=\sum_{\sigma}\left(Z(\sigma)h^{Z(\sigma)-1}(1-h)^{k-Z(\sigma)}-(k-Z(\sigma))h^{Z(\sigma)}(1-h)^{k-Z(\sigma)-1}\right)V(\sigma_1,\ldots,\sigma_k)\,W(\sigma_1;C_1,N_1)\cdots W(\sigma_k;C_k,N_k),\tag{3.13}$$

and

$$\left.\frac{d}{dh}U(C_T,N_T;h)\right|_{h=0}=\sum_{Z(\sigma)=1}V(\sigma_1,\ldots,\sigma_k)\,W(\sigma_1;C_1,N_1)\cdots W(\sigma_k;C_k,N_k)-k\sum_{Z(\sigma)=0}V(\sigma_1,\ldots,\sigma_k)\,W(\sigma_1;C_1,N_1)\cdots W(\sigma_k;C_k,N_k).\tag{3.14}$$

For the numerator of (3.5), Uumi(C_T, N_T; 0) = 0, since the nonzero values of Vumi(σ) all have Z(σ) ≥ 1; thus, the small-h behavior is proportional to (3.14). For the denominator of (3.5), Usym(C_T, N_T; 0) is nonzero, since Vsym(σ) = 1 for several triplets of nonzero arguments (those that include both positive and negative signs). Thus, for small h, the likelihood ratio (3.5) is proportional to h – and this proportionality tells us to what extent adding a small amount of mass to the prior at R = 1/2 captures triplets of choice probabilities that are consistent with the ultrametric property.

This analysis motivates an index of the extent to which a set of observations supports dis-similarities that are consistent with an ultrametric model:

$$I_{\text{umi}}(C,N)=\lim_{h\to0}\left(\frac{1}{\#(\text{trip})}\log LR_{\text{umi}}(C,N;h)-\log h\right).\tag{3.15}$$

As is the case with the index of symmetry Isym (eq. (2.24)), a useful benchmark is the a priori value, Iumi(0,0). From (3.11),

$$W(\sigma;0,0)=\begin{cases}\tfrac12, & \sigma=\pm1\\ 1, & \sigma=0.\end{cases}\tag{3.16}$$

There are three nonzero contributors to Vumi with Z(σ)=1 (see (3.7)), so this yields

$$U_{\text{umi}}(0,0;h)=\frac{3}{2^2}h+O(h^2)\tag{3.17}$$

from (3.14). There are six nonzero contributors to Vsym with Z(σ)=0 (any set of arguments that includes both signs), so (3.12) and (3.16) yield

$$U_{\text{sym}}(0,0;0)=\frac{6}{2^3}.\tag{3.18}$$

Since each of the triplets contributes an equal factor to the likelihood ratio,

$$I_{\text{umi}}(0,0)=\lim_{h\to0}\left(\log\frac{\tfrac34 h+O(h^2)}{\tfrac34}-\log h\right)=0.\tag{3.19}$$

In sum, the index Iumi(C,N) (eq. (3.15)) evaluates whether an experimental set of dis-similarity responses is consistent with an ultrametric model, and does so in a way that recognizes the intrinsic limitation that experimental data can never show that a choice probability is exactly 1/2. If the index is greater than 0, the observed data are more consistent with an ultrametric model than a set of unstructured choice probabilities; values less than 0 indicate progressively greater deviations from an ultrametric model.
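
The quantities in eqs. (3.7)–(3.15) can be implemented compactly by enumerating the 3^k sign assignments in (3.9), with the limit in (3.15) approximated at a small fixed h (as is done for the datasets below). A minimal sketch, again assuming that the tallies have been grouped into triplets:

```python
import numpy as np
from itertools import product
from scipy.special import betainc, betaln

def W(sigma, C, N, a):
    """Segment integrals of eq. (3.11)."""
    if sigma == 0:
        return 0.5 ** N                     # point mass at 1/2: (1/2)^C (1/2)^(N-C)
    scale = np.exp(betaln(a + C, a + N - C) - betaln(a, a))  # B(a+C, a+N-C)/B(a,a)
    lower = betainc(a + C, a + N - C, 0.5)  # B_norm(1/2; a+C, a+N-C), eq. (2.22)
    return scale * lower if sigma == -1 else scale * (1.0 - lower)

def V_sym(sig):
    """Eq. (3.8): 1 iff both signs occur among the arguments, or all are zero."""
    return int((+1 in sig and -1 in sig) or all(s == 0 for s in sig))

def V_umi(sig):
    """Eq. (3.7): sign patterns consistent with the ultrametric conditions (3.3)."""
    return int(tuple(sig) in {(1, -1, 0), (0, 1, -1), (-1, 0, 1), (0, 0, 0)})

def U(C, N, a, h, V):
    """Eq. (3.9): sum over the 3^k sign assignments sigma."""
    k, total = len(C), 0.0
    for sig in product((-1, 0, 1), repeat=k):
        if V(sig):
            z = sig.count(0)                # Z(sigma): each zero incurs a factor of h
            w = np.prod([W(s, c, n, a) for s, c, n in zip(sig, C, N)])
            total += h**z * (1.0 - h)**(k - z) * w
    return total

def I_umi(C_trip, N_trip, a, h=1e-3):
    """Eq. (3.15), with the h -> 0 limit approximated at a small fixed h."""
    logLR = sum(np.log(U(C, N, a, h, V_umi) / U(C, N, a, h, V_sym))
                for C, N in zip(C_trip, N_trip))
    return logLR / len(C_trip) - np.log(h)

# A priori benchmark, eq. (3.19): with no data the index is 0, up to terms of order h.
assert abs(I_umi(np.zeros((2, 3)), np.zeros((2, 3)), a=0.5)) < 0.01
```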

Assessing additive tree structure

Like the ultrametric model, the additive similarity tree model (Sattath & Tversky, 1977) is a metric-space model that places constraints on the properties of the distance, but these constraints are less restrictive than those of the ultrametric model (Appendix 2). In this model, here referred to as “addtree,” the distance between two points is determined by a graph that has a tree structure, in which each link has a specified nonzero weight. The distance between two points is given by the total weight of the path that connects the points. Because the graph is required to be a tree, there are no loops – and this places constraints on the inter-relationships of the distances.

Here, we determine the extent to which the dis-similarities implied by a set of triadic judgments can be monotonically transformed into the distances in an addtree model.

The starting point is a necessary and sufficient condition for distances in a metric space to be consistent with an addtree structure (Buneman, 1974; Dobson, 1974; Sattath & Tversky, 1977). This condition, known as the “four-point condition,” is that given any four points u,v,w, and x,

$$\text{None of the three quantities}\quad\left\{\begin{aligned}&d(u,v)+d(w,x)\\&d(u,w)+d(v,x)\\&d(u,x)+d(v,w)\end{aligned}\right\}\quad\text{is strictly greater than the other two.}\tag{4.1}$$

Put another way, of the three pairwise sums in eq. (4.1), two must be equal and no smaller than the third. Appendix 2 shows that this condition is weaker than the ultrametric inequality and stronger than the triangle inequality, and that a one-dimensional arrangement of points is always consistent with an addtree model.

Since the four-point condition is based on adding distances, we cannot apply it directly to dis-similarities – as distances are linked to dis-similarity via an unknown monotonic function. Instead, we frame a condition on the dis-similarities that is necessary for the four-point inequality to hold.

Necessary conditions

Since the four-point condition asserts that, among the three ways of pairing four points, none can result in a total distance that is strictly greater than the other two, it is forced to fail if the three pairwise-summed distances are distinct. If this is the case, we can relabel the points so that

$$d(u,v)+d(w,x)>d(u,w)+d(v,x)>d(u,x)+d(v,w).\tag{4.2}$$

If (4.2) holds, (4.1) cannot. So failure of at least one of the inequalities in (4.2) is necessary for the addtree model, and consequently, inequalities among dis-similarities that guarantee the inequalities among the distances in (4.2) rule out the addtree model.

For example, the inequalities in (4.2) are forced to hold if the first terms and the second terms of its three sums are each in descending order:

$$d(u,v)>d(u,w)>d(u,x)\quad\text{and}\quad d(w,x)>d(v,x)>d(v,w).\tag{4.3}$$

Since the distances are monotonically related to the dis-similarities, (4.3) is equivalent to

$$D(u,v)>D(u,w)>D(u,x)\quad\text{and}\quad D(w,x)>D(v,x)>D(v,w).\tag{4.4}$$

Thus, (4.4), which does not rely on adding dis-similarities or distances, suffices to rule out the addtree model.

It is useful to think of the two parts of (4.4) geometrically, with the four points u, v, w, x arranged in a tetrahedron (Figure 1). The first part of (4.4) compares dis-similarities that pair one vertex u with the other three vertices; we call this set of dis-similarities a “tripod”. The second part of (4.4) compares the dis-similarities of the edges of the triangle on the face opposite to u; we call these dis-similarities the “base”. The combination of a tripod and a base is a “tent” – a quadruple of points in which one point is distinguished.

Figure 1. A tent of dis-similarities, consisting of a tripod, the three dis-similarities involving the vertex u (solid lines), and a base, the three opposite dis-similarities spanning the vertices v, w, and x (dashed lines).

Assuming all dis-similarities are distinct, we can relabel the points v,w, and x so that the dis-similarities involving u are in descending order. Thus, (4.4) can be restated simply: the rank order of dis-similarities of the tripod, and the rank order of dis-similarities of the opposite edges in the base, cannot be identical.

This viewpoint reveals other ways that rank-orders of dis-similarities can force (4.2): whenever the longest side of the tripod and the longest side of the base are opposite each other. When this happens, the leftmost pair of (4.2) is necessarily larger than each of the other two pairs, and the four-point condition (4.1) cannot hold. That is, for a tent with vertex z and base {a,b,c}, the following conjunction rules out the addtree model:

$$D(z,c)>D(z,b)\ \text{ and }\ D(z,c)>D(z,a)\ \text{ and }\ D(a,b)>D(c,a)\ \text{ and }\ D(a,b)>D(b,c).\tag{4.5}$$

The addtree model is also ruled out if some of the inequalities in (4.5) are non-strict: D(z,c) > D(z,a) can be replaced by D(z,c) ≥ D(z,a) provided that D(a,b) > D(b,c) remains strict (and vice-versa), and similarly for D(z,c) > D(z,b) and D(a,b) > D(c,a). Note that this conjunction only involves choice probabilities within the tripod, or within the base.

We show in Appendix 3 that if the conjunction (4.5) is false for the dis-similarities among four points and all of their relabelings, then there is a monotonic transformation d = F(D) of the dis-similarities into distances, for which the distances satisfy the four-point condition. That is, falsity of the conjunction (4.5) is necessary for an addtree model, and is sufficient for an addtree model among the four points. This stops short of showing that falsification of the conjunction (4.5) for all quadruplets is sufficient for a global addtree model, as the argument in Appendix 3 only identifies a monotonic transformation of the dis-similarities among individual four-point subsets. Appendix 3 does not show that the monotonic transformations needed for each quadruple of points can be made in a globally consistent way, though we do not have examples to the contrary.

Likelihood ratio calculation and an index

We now formulate an index that measures the extent to which the choice probabilities underlying a set of observations correspond to dis-similarities that falsify the conjunction (4.5). This index therefore reflects the likelihood that the choice probabilities fulfill a condition that is necessary for an addtree model.

At first, we consider an index closely analogous to (2.24): a log-likelihood ratio, averaged over tents, comparing the mass of choice probabilities within the region in which the conjunction (4.5) is falsified, to the mass that is merely consistent with a symmetric dis-similarity, for the choice probabilities that make up the tent:

$$I_{\text{addtree}}(C,N;h)=\frac{1}{\#(\text{tent})}\log\left(\frac{U_{\text{addtree}}(C,N;h)}{U_{\text{sym}}^{\text{tent}}(C,N;h)}\right),\tag{5.1}$$

where tent is the set of tents.

This formulation, however, is problematic: in contrast to the indices for symmetry and ultrametric structure, which were averages over triplets, this is an average over tents. Distinct triplets have non-overlapping triads, but for distinct tents, triads may overlap. For example, a tent with vertex z and base {a,b,c} has a triad (z;a,b), and this triad is shared with the tent with vertex z and base {a,b,q}, and with the tent with vertex y and base {z,a,b}. Thus, the previous factorization of the likelihood integrals into low-dimensional pieces does not apply to the numerator of (5.1).

We therefore replace (5.1) by a heuristic approximation, in which we ignore this overlap:

$$I'_{\text{addtree}}(C,N;h)=\frac{1}{\#(\text{tent})}\sum_{T\in\text{tent}}\log\left(\frac{U_{\text{addtree}}(C_T,N_T;h)}{U_{\text{sym}}^{\text{tent}}(C_T,N_T;h)}\right).\tag{5.2}$$

Iaddtree' is no longer a log likelihood ratio, but will still express the extent to which the data are consistent with a necessary condition for the addtree model. The approximation amounts to considering each tent as providing independent evidence concerning this consistency.

The numerator and denominator of (5.2) have the same form as (3.6), so it suffices to specify the values of V(sgn(R1 − 1/2), …, sgn(R6 − 1/2)) for the numerator Vaddtree and the denominator Vsymtent. For definiteness, given a tent with z at the vertex and {a,b,c} at the base, we specify the six choice probabilities needed to compute V as follows: for the tripod component, R1 ≡ R(z;b,c), R2 ≡ R(z;c,a), R3 ≡ R(z;a,b); for the base, R4 ≡ R(a;b,c), R5 ≡ R(b;c,a), R6 ≡ R(c;a,b). All of these (and no others) enter into the condition that (4.5) is falsified for this tent: the choice probabilities R1, R2, R4, and R5 are explicit in (4.5), and all of the Ri are used equally as the base elements {a,b,c} are permuted. Since V has six arguments, each of which can take on any of three values {−1, 0, +1}, there are 3^6 = 729 values to specify.

For Vaddtree, it is simplest to specify these values algorithmically. For a set of choice probabilities to be consistent with the addtree model (i.e., for Vaddtree = 1), the conjunction (4.5) cannot hold for any of the permutations of {a,b,c}. Since (4.5) is symmetric under interchange of a and b, it suffices to consider the cyclic permutations. So the region of [0,1]^6 in which Vaddtree = 1 is the intersection of the region that falsifies the conjunction (4.5), which we denote Vaddtree[c], and the regions that falsify it after cyclic permutation of (a,b,c), which we denote Vaddtree[a] and Vaddtree[b]. Additionally, Vaddtree = 0 for sets of choice probabilities that are inconsistent with a symmetric dis-similarity. Thus,

$$V_{\text{addtree}}=V_{\text{addtree}}^{[a]}\,V_{\text{addtree}}^{[b]}\,V_{\text{addtree}}^{[c]}\,V_{\text{sym}}^{\text{tent}}.\tag{5.3}$$

Vaddtree[c] = 1 except when all of the inequalities of (4.5) hold, or, as noted following that equation, when D(z,c) > D(z,a) or D(a,b) > D(b,c) (but not both) is replaced by equality, and D(z,c) > D(z,b) or D(a,b) > D(c,a) (but not both) is replaced by equality:

$$V_{\text{addtree}}^{[c]}(+\tau_1,\,-\tau_2,\,\sigma_3,\,-\tau_4,\,+\tau_5,\,\sigma_6)=0\quad\text{for }(\tau_1,\tau_4)\text{ and }(\tau_2,\tau_5)\in\{(1,1),(1,0),(0,1)\};\ \sigma_3,\sigma_6\in\{-1,0,+1\}.\tag{5.4}$$

Here, the paired τi's – not both of which can be zero – handle the allowed equalities, and the σ's handle the lack of dependence on the third and sixth arguments. Vaddtree[a] and Vaddtree[b] are then determined by cyclic permutation:

$$\begin{aligned} V_{\text{addtree}}^{[a]}(\sigma_1,\sigma_2,\sigma_3,\sigma_4,\sigma_5,\sigma_6)&=V_{\text{addtree}}^{[c]}(\sigma_2,\sigma_3,\sigma_1,\sigma_5,\sigma_6,\sigma_4)\\ V_{\text{addtree}}^{[b]}(\sigma_1,\sigma_2,\sigma_3,\sigma_4,\sigma_5,\sigma_6)&=V_{\text{addtree}}^{[c]}(\sigma_3,\sigma_1,\sigma_2,\sigma_6,\sigma_4,\sigma_5).\end{aligned}\tag{5.5}$$

Vsymtent occurs both as a factor in the numerator (5.3) and alone in the denominator. The three triads in the base only depend on dis-similarities between the elements of the triplet {a,b,c}, so the choice probabilities consistent with symmetry are determined by Vsym(σ4, σ5, σ6) (eq. (3.8)). The three triads in the tripod are comparisons between D(z,a), D(z,b), and D(z,c). While these are unconstrained by symmetry, they must be self-consistent. That is, the inequalities

$$\left.\begin{aligned} D(z,a)&<D(z,b)\\ D(z,b)&<D(z,c)\\ D(z,c)&<D(z,a)\end{aligned}\right\}\tag{5.6}$$

cannot all hold, nor can they all hold if all signs of comparison are inverted. These are precisely the constraints of (2.3) and (2.4), so they are captured by Vsym(σ1, σ2, σ3) (eq. (3.8)). Thus,

$$V_{\text{sym}}^{\text{tent}}(\sigma_1,\sigma_2,\sigma_3,\sigma_4,\sigma_5,\sigma_6)=V_{\text{sym}}(\sigma_1,\sigma_2,\sigma_3)\,V_{\text{sym}}(\sigma_4,\sigma_5,\sigma_6).\tag{5.7}$$

In sum, the index (5.2) is specified by

$$U_{\text{addtree}}(C_T,N_T;h)=\int_{[0,1]^6}V_{\text{addtree}}\!\left(\operatorname{sgn}\!\left(R_1-\tfrac12\right),\ldots,\operatorname{sgn}\!\left(R_6-\tfrac12\right)\right)R_1^{C_1}(1-R_1)^{N_1-C_1}\cdots R_6^{C_6}(1-R_6)^{N_6-C_6}\,p_{a,h}(R_1)\cdots p_{a,h}(R_6)\,dR_1\cdots dR_6\tag{5.8}$$

and

$$U_{\text{sym}}^{\text{tent}}(C_T,N_T;h)=\int_{[0,1]^6}V_{\text{sym}}^{\text{tent}}\!\left(\operatorname{sgn}\!\left(R_1-\tfrac12\right),\ldots,\operatorname{sgn}\!\left(R_6-\tfrac12\right)\right)R_1^{C_1}(1-R_1)^{N_1-C_1}\cdots R_6^{C_6}(1-R_6)^{N_6-C_6}\,p_{a,h}(R_1)\cdots p_{a,h}(R_6)\,dR_1\cdots dR_6,\tag{5.9}$$

where Vaddtree and Vsymtent are given by eqs. (5.3) and (5.7).

As a benchmark, we calculate the value of Iaddtree' based on the prior alone, for h = 0. The two instances of Vsym in its denominator (eq. (5.7)) each contribute a factor of 3/4 (for each, six of the 2^3 sign combinations are included, as in the calculation of (2.25)). In the numerator, by direct enumeration, 24 of the 2^6 sign combinations are nonzero. So

$$I'_{\text{addtree}}(0,0;0)=\log\left(\frac{24/64}{\tfrac34\cdot\tfrac34}\right)=\log\tfrac23\approx-0.4055.\tag{5.10}$$
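
Because the bookkeeping in eqs. (5.3)–(5.5) is intricate, the 24-of-64 count used in (5.10) is worth verifying by direct enumeration. A minimal sketch (with V_sym as in the previous snippet):

```python
import numpy as np
from itertools import product

def V_sym(sig):
    return int((+1 in sig and -1 in sig) or all(s == 0 for s in sig))

def V_addtree_c(sig):
    """Eq. (5.4): zero iff the conjunction (4.5) holds, allowing at most one
    equality within each of the two paired comparisons."""
    s1, s2, s3, s4, s5, s6 = sig
    pair_14 = (s1, s4) in {(1, -1), (1, 0), (0, -1)}  # D(z,c) >= D(z,b), D(a,b) >= D(c,a)
    pair_25 = (s2, s5) in {(-1, 1), (-1, 0), (0, 1)}  # D(z,c) >= D(z,a), D(a,b) >= D(b,c)
    return 0 if (pair_14 and pair_25) else 1

def V_addtree(sig):
    """Eq. (5.3), using the cyclic permutations of eq. (5.5)."""
    s = sig
    sig_a = (s[1], s[2], s[0], s[4], s[5], s[3])
    sig_b = (s[2], s[0], s[1], s[5], s[3], s[4])
    return (V_addtree_c(s) * V_addtree_c(sig_a) * V_addtree_c(sig_b)
            * V_sym(s[:3]) * V_sym(s[3:]))

# Of the 2^6 = 64 all-nonzero sign assignments, 24 are nonzero, as stated above:
count = sum(V_addtree(sig) for sig in product((-1, 1), repeat=6))
assert count == 24

# A priori value of the index, eq. (5.10): each W(+/-1; 0, 0) contributes 1/2.
I0 = np.log((count / 2 ** 6) / ((6 / 8) * (6 / 8)))  # log(2/3) ~ -0.4055
```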

Example Applications

Here we demonstrate the present approach via application to sample datasets from three psychophysical experiments, encompassing two methods for acquiring similarity judgments and spanning low- and high-level visual domains.

Methods

The first two experiments (“textures” and “faces”) make use of the method of Waraich et al. (Waraich & Victor, 2022): on each trial, participants rank the eight comparison stimuli c_1, …, c_8 in order of similarity to a central reference r. These rank-orderings are then interpreted as a set of similarity judgments: ranking c_i as more similar than c_j to the reference r is interpreted as a triadic judgment that D(c_j, r) > D(c_i, r). Data are accumulated across all trials in which c_i and c_j are presented along with the reference r, leading to an estimate of Robs(r; c_i, c_j). Stimulus sets consisted of 24 or 25 items (described in detail with Results below), and 10 sessions of 100 trials each are presented. On each trial, stimuli are randomly chosen to be the reference or the comparison stimuli. As there were 10 sessions of 100 self-paced trials each and each trial yielded $\binom{8}{2}=28$ triadic judgments, each participant’s dataset contained 28,000 triadic judgments.
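
A minimal sketch of this tallying (hypothetical stimulus labels; the canonical key (r, lo, hi) pools the counts for (r;x,y) and (r;y,x), consistent with eq. (1.2)):

```python
from collections import defaultdict
from itertools import combinations

C = defaultdict(int)  # C(r;x,y): choices of x as more similar to r
N = defaultdict(int)  # N(r;x,y): presentations of the triad

def tally_ranking(r, ranked, C, N):
    """One trial: `ranked` lists the comparison stimuli from most to least
    similar to reference r; each ordered pair yields one triadic judgment."""
    for x, y in combinations(ranked, 2):  # x was ranked as more similar than y
        lo, hi = sorted((x, y))
        N[(r, lo, hi)] += 1
        if x == lo:                       # the judgment favored the first-listed comparison
            C[(r, lo, hi)] += 1

tally_ranking('r0', ['c3', 'c1', 'c7', 'c2'], C, N)  # a 4-item ranking -> 6 judgments
R_obs = {t: C[t] / N[t] for t in N}                  # eq. (1.1)
```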

For the “textures” and “faces” datasets (described in detail below), stimuli were generated in MATLAB, and were displayed and sequenced using open-source PsychoPy software on a 22-inch LCD screen (Dell P2210, resolution 1680×1050, color profile D65, mean luminance 52 cd/m2). The display was viewed binocularly from a distance of 57 cm. The visual angle of the stimulus array was 24 degrees; each stimulus (a texture patch or a face) subtended 4 degrees. Tallying of responses and multidimensional scaling as described in (Waraich & Victor, 2022) was carried out via Python scripts. Computation of the indices and visualization was carried out in MATLAB using code that is publicly available at https://github.com/jvlab/simrank.

The third experiment (“brightness”) uses an odd-one-out paradigm. On each trial, three stimuli are presented, each consisting of a central disk, drawn from one of eight luminances, and an annular surround. The surround was either of minimal or maximal luminance, and was perceived as black or white, respectively. The participant is asked to judge the brightness of the central disk, and to choose which of the three is the outlier. We interpret selection of a stimulus x_j out of a triplet {x_j, x_k, x_l} as a judgment that the pairwise dis-similarities involving this stimulus are larger than the dis-similarity of the two non-outliers, i.e., that D(x_k, x_j) > D(x_k, x_l) and also that D(x_l, x_j) > D(x_l, x_k). Each trial thus contributes to estimates of choice probabilities for two triads, (x_k; x_j, x_l) and (x_l; x_j, x_k), and these judgments are tallied across the experiment. Note though that, in contrast to the “textures” and “faces” datasets, here the specific triadic comparisons that enter into the tallies depend on the participant’s responses.
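
The corresponding tally for an odd-one-out trial, with C and N as in the previous sketch:

```python
def tally_odd_one_out(odd, u, v, C, N):
    """Choosing `odd` as the outlier among {odd, u, v} is interpreted as
    D(u,odd) > D(u,v) and D(v,odd) > D(v,u): two triadic judgments."""
    for ref, near in ((u, v), (v, u)):  # `near` is the non-outlier judged closer to ref
        lo, hi = sorted((odd, near))
        N[(ref, lo, hi)] += 1
        if near == lo:
            C[(ref, lo, hi)] += 1
```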

For the “brightness” dataset, stimuli were generated in Python 3.10 and the NumPy library. Stimuli were displayed on a calibrated 24-inch ViewPixx monitor (1920×1080 pixel resolution, mean luminance 70 cd/m², Vpixx Technologies, Inc.), running custom Python libraries that handle high bit-depth grayscale images (https://github.com/computational-psychology/hrl). Monitor calibration was accomplished using a Minolta LS-100 photometer (Konica Minolta, Tokyo, Japan). The display was viewed binocularly from a distance of 76 cm. The visual angle of the display was 39 degrees; each stimulus subtended 5 degrees, with the central disk subtending 1.67 degrees. The three stimuli were arranged in a triangular manner, 4 degrees equidistant from the center (Fig. 5A). There were 16 unique stimuli, consisting of all pairings of 8 values for the luminance of the center disk (14, 33, 55, 78, 104, 131, 163, and 197 cd/m²) and 2 values of luminance for the surrounding annulus (0.77 and 226 cd/m²). The 16 stimuli generated $\binom{16}{3}=560$ possible triplet combinations, which were presented in randomized order and position, constituting one block. Each session consisted of two blocks, and each participant ran four sessions. In total, we collected 4480 trials per participant. As each trial gives information for two triadic judgments (as mentioned above), there were 8960 triadic judgments per participant.

Figure 5. Panel A: Stimuli for the brightness experiment. Each stimulus had a disk-and-annulus configuration, in which the disk had one of 8 luminances (columns) and either a black (upper row) or a white (lower row) surround annulus. The colored lines encircle three of the stimulus subsets used in Panel C. Panel B: A sample trial. Panel C: Indices Isym, Iumi, and Iaddtree' for similarity judgments in the brightness experiment for the full stimulus set (black symbols), 8-element subsets with only one of the two kinds of surround (blue symbols), and 8-element subsets with both surrounds (green symbols). Other graphical conventions as in Figure 3.

The “texture” and “faces” experiments were performed at Weill Cornell Medical College, in four participants (3F, 1M), ranging in age from 23 to 62. Participants MC and SAW (an author) were experienced observers and familiar with the “texture” stimulus set from previous studies; participants BL and ZK were naïve observers. All participated in the “textures” experiment; 2F (SAW and MC) participated in the “faces” experiment and neither had prior familiarity with those stimuli. The “brightness” experiment was performed at Technische Universität Berlin in three participants (1F, 2M), ranging in age from 31 to 39. Participant JP was a naïve observer; participants GA (an author) and JXV were experienced observers. All participants had normal or corrected-to-normal vision. They provided informed consent following institutional guidelines and the Declaration of Helsinki, according to a protocol that was approved by the relevant institution.

In addition to the calculations described above, we also calculated the indices Isym, Iumi, and Iaddtree' for surrogate datasets. Surrogate datasets were constructed in two ways. The “flip all” surrogate was created by randomly selecting triplets and flipping all choice probabilities (replacing Robs(r;x,y) by 1 − Robs(r;x,y)) for the three triads within the selected triplets. The “flip any” surrogate was created by randomly selecting individual triads, and flipping the choice probabilities for the selected triads. Since the indices are sums of values that are independently computed either from triplets or tents, the exact means and standard deviations of the surrogate indices could be computed efficiently by exhaustive sampling of each triplet or tent separately, rather than approximated via random sampling. Note also that the “flip all” surrogate leaves Isym unchanged, since the two criteria for symmetric dis-similarities (eqs. (2.3) and (2.4)) simply swap when all choice probabilities are flipped. The maximum-likelihood parameters a and h (eq. (3.4)) for the surrogates are also identical to those for the original datasets, since the prior is unchanged by flipping the choice probabilities.
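
A sketch of the two surrogate constructions, shown here as random sampling over tally arrays (as noted above, the reported surrogate means and standard deviations were instead computed exactly, by exhaustive enumeration of each triplet or tent):

```python
import numpy as np

rng = np.random.default_rng(1)

def flip_all(C_trip, N_trip):
    """'Flip all': for a random half of the triplets, replace C by N - C
    (i.e., R_obs -> 1 - R_obs) for all three triads at once."""
    flip = rng.random(len(C_trip)) < 0.5
    out = C_trip.copy()
    out[flip] = N_trip[flip] - C_trip[flip]
    return out

def flip_any(C_trip, N_trip):
    """'Flip any': flip each triad independently with probability 1/2."""
    flip = rng.random(C_trip.shape) < 0.5
    return np.where(flip, N_trip - C_trip, C_trip)
```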

Finally, we estimated the standard errors for the indices calculated from the original datasets via a jackknife on triplets (for Isym, Iumi) or tents (for Iaddtree'). Maximum-likelihood parameters a and h were not re-calculated for the jackknife subsets, as pilot analyses confirmed that removal of one triplet or tent made very little change in the maximum-likelihood value.

Results

Textures

The “textures” experiment made use of the stimulus space described in (Victor & Conte, 2012), a 10-dimensional space of binary textures with well-characterized discrimination thresholds (Victor, Thengone, Rizvi, & Conte, 2015). We chose a two-parameter component of this domain (Figure 2A) that allowed a focus on testing for compatibility with addtree structure. The two parameters chosen, β− and β|, determine the nearest-neighbor correlations in the horizontal or vertical direction: the probability that a pair of adjacent checks have the same luminance (either both black or both white) is (1+β)/2, and the probability that a pair of adjacent checks have the opposite luminance (one black, the other white) is (1−β)/2. Other than these constraints, the textures are maximum-entropy (see (Victor & Conte, 2012) for details). For these experiments, we chose values of β− or β| from −0.9 to 0.9 in steps of 0.15. That is, six stimuli had positive values of β− (0.15, 0.30, 0.45, 0.60, 0.75, 0.90) with β| = 0, six had the corresponding negative values of β−, six had positive values of β| with β− = 0, six had negative values of β| with β− = 0, and one had β− = β| = 0. In the experiment, each stimulus example was unique – that is, a stimulus is specified by a particular (β−, β|) pair, but the texture example used on each trial was a different random sample from that texture.

Figure 2. Panel A: Stimuli used in the texture experiment. Each stimulus is an array of 16 × 16 black or white checks. For stimuli enclosed in dark blue, checks are correlated (or anticorrelated) along rows. Correlation strength is parameterized by β−, where β− > 0 indicates positive correlation and β− < 0 indicates negative correlation. For stimuli enclosed in light blue, checks are correlated (or anticorrelated) along columns, similarly parameterized by β|. The full stimulus set consists of 6 equally-spaced positive and negative values of β− and β|, and an uncorrelated stimulus (center), where β− = β| = 0. Panel B: Multidimensional scaling of similarity judgments for the stimuli in Panel A for four participants. The data from each participant have been rotated into a consensus alignment via the Procrustes procedure (without rescaling). Lines connect stimuli along each of the rays in Panel A. One unit indicates one just-noticeable difference in an additive noise model (Waraich & Victor, 2022).

The rationale for this stimulus set is that we anticipated that certain subsets of stimuli would be more compatible with the addtree model than others. The basis for these expectations is shown in Figure 2B, which presents non-metric multidimensional scaling of the similarity data. This analysis, carried out with the procedure detailed in (Waraich & Victor, 2022), uses a maximum-likelihood criterion to place the 25 stimulus points in a space, so that the Euclidean distances between them best account for the choice probabilities (assuming a uniform, additive decision noise). Consistently across participants, the points along each stimulus axis (β− or β|) map to a gradually curving trajectory. For this reason, we anticipate that comparison data from the stimuli on one of these trajectories (the 13 points with either β− or β| equal to zero, here called an “axis”), when analyzed in isolation, will be close to an addtree model. However, the two trajectories are not perpendicular: rays with the same signs of β meet at an acute angle of 45° or less. That is, stimuli with strong positive correlations (β− > 0 compared to β| > 0) are seen as relatively similar to each other. This is anticipated to make the subset consisting of the 13 points with either β− or β| positive (a “vee”) inconsistent with the addtree model, as the shortest perceptual path between two points at the ends of the positive β− and β| rays is much shorter than a path that traverses each ray back to the origin. Similar reasoning indicates that the vee formed by the two negative rays should also be inconsistent with an addtree model. Note, though, that this intuition assumes that the Euclidean distances in Figure 2B are an accurate account of the perceptual dis-similarities; the analysis via Iaddtree' does not make this assumption.

Figure 3 shows the indices Isym, Iumi, and Iaddtree' computed from the full datasets for each participant, and for the axis and vee subsets. As expected from the above analysis, the addtree index Iaddtree' is substantially higher for the “axis” subsets than for the “vee” subsets, and has an intermediate value for the full dataset. Note that the “axis” and “vee” subsets are matched in terms of the number of stimuli, and were collected simultaneously within a single experiment. This finding supports the efficacy of Iaddtree' in characterizing the consistency of a dataset with the addtree model. In all cases, the index is higher than the a priori value, and substantially higher than values computed from surrogate datasets in which choice probabilities are randomly flipped. This latter point indicates (not surprisingly) that for all of these subsets, there are portions of the data that are more consistent with an addtree model than chance.

Figure 3.

Indices Isym, Iumi, and Iaddtree' for the similarity judgments in the texture experiment. Stimulus subsets are indicated by symbol color; participants by symbol shape. The vertical extent of the wide boxes indicates ±1 standard deviation for the “flip any” surrogates; the vertical extent of the narrow boxes indicates ±1 standard deviation for the “flip all” surrogates (not plotted for Iumi, as this index is unchanged by the “flip all” operation). The thin vertical lines are to aid visualization, and do not represent ranges. Standard errors for the experimental datasets are smaller than the symbol sizes. Null-hypothesis values of the indices are indicated by the horizontal dashed line.
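For readers implementing the surrogates named in the caption, a sketch of the two operations follows. This is our reading of the construction (the exact procedure is given in the methods); in particular, the per-triad flipping rule in flip_any is an assumption:

```python
import numpy as np

def flip_all(p):
    # "flip all": every choice probability p is replaced by 1 - p at once.
    # Iumi is unchanged by this operation (see caption).
    return 1.0 - np.asarray(p)

def flip_any(p, rng=None):
    # "flip any" (assumed form): each triad's choice probability is
    # independently replaced by 1 - p with probability 1/2.
    rng = np.random.default_rng(rng)
    p = np.asarray(p)
    return np.where(rng.random(p.shape) < 0.5, 1.0 - p, p)

probs = np.array([0.9, 0.7, 0.55, 0.2])   # toy table of choice probabilities
surrogate = flip_any(probs, rng=1)
```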

For this dataset, values of Isym were quite close to zero (usually > −0.1), indicating that nearly all (a fraction of at least e^−0.1 ≈ 0.9) of the posterior distribution of choice probabilities was consistent with a symmetric dis-similarity. Iumi, which measures consistency with the ultrametric model, was typically −0.25 or less, substantially below the a priori value of zero. But interestingly, the highest values of Iumi were seen in the “vee” subsets, suggesting a partially hierarchical structure – e.g., that the two directions of correlation formed categories. As was the case for Iaddtree', all indices were higher than for surrogates constructed by randomly flipping the choice probabilities. For Isym, this is unsurprising, as randomly flipping choice probabilities would be unlikely to lead to a set of symmetric judgments. For Iumi, this finding indicates that, even though the ultrametric model is excluded, the data have islands of consistency with the ultrametric structure.

The above results were insensitive to the parameters a and h of the prior for the distribution of choice probabilities (eqs. (2.7) and (3.4)). The Dirichlet parameter a obtained by maximum likelihood (eq. (2.10)) ranged from 0.25 to 1.25 (with the lowest values for the full texture dataset), but results very similar to those of Figure 3 were obtained by setting a = 0.5 for all datasets. For Iumi, the limit in (3.15) was estimated by setting h = 0.001, but similar values were obtained for h = 0.01. The findings for Isym and Iaddtree', here shown for h = 0, were not substantially changed when h was determined by maximum likelihood. These values of h were typically quite small (median, 7 × 10^−5).
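As a concrete illustration of how a might be chosen by maximum likelihood: in the two-alternative case, the Dirichlet prior reduces to a symmetric Beta(a, a) prior on each triad's choice probability, and one can maximize the summed marginal log-likelihood of the observed counts over a. This is a generic empirical-Bayes sketch, not a transcription of eqs. (2.7)–(2.10):

```python
import numpy as np
from scipy.special import betaln
from scipy.optimize import minimize_scalar

def neg_marginal_loglik(a, k, n):
    # Marginal log-likelihood of k "choose-x" responses in n trials per triad,
    # under a Beta(a, a) prior on each choice probability (combinatorial
    # factors are constant in a and can be dropped), summed over triads.
    return -np.sum(betaln(k + a, n - k + a) - betaln(a, a))

k = np.array([9, 7, 6, 2])              # toy counts, one triad per entry
n = np.full_like(k, 10)                 # trials per triad
a_hat = minimize_scalar(neg_marginal_loglik, bounds=(1e-3, 10.0),
                        args=(k, n), method='bounded').x
# Each triad's posterior is then Beta(k + a_hat, n - k + a_hat).
```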

Faces

The “faces” experiment used stimuli drawn from the public-domain library of faces at https://faces.mpdl.mpg.de/imejil, which contains color photographs of 171 individuals, stratified into three age ranges (“young”, “middle”, “old”). We randomly selected two males and two females from each age range, and for each of these individuals, used the two example photographs with neutral expressions, for a total of 24 unique images (2 genders × 3 age ranges × 2 individuals × 2 photographs of each).

The rationale for this choice of stimuli was that the above hierarchical organization might lead to a similarity structure close to ultrametric behavior. As shown in Figure 4, upper row, while this was not the case for analysis of the full dataset (Iumi < 0, the a priori level), it was the case for the 8-stimulus subsets within each age bracket (Iumi > 0). Values of Iumi > 0 were also seen for some subsets subdivided by gender (restricted to two age ranges, to equate the number of stimuli), as shown in Figure 4, lower row. Values of Isym were again quite close to zero (usually > −0.1), indicating strong consistency with a symmetric dis-similarity. Values of Iaddtree' were similar to the a priori value, but much larger than for the surrogates. As was the case for the texture experiment, these results were insensitive to the parameters a and h of the prior for the distribution of choice probabilities. Here, values of the Dirichlet parameter a obtained by maximum likelihood ranged from approximately 0.1 to 0.5; results similar to those of Figure 4 were obtained by setting a = 0.3 for all datasets. Also as was the case for the texture experiment, findings for Isym and Iaddtree' were not substantially changed when h was determined by maximum likelihood – even though the typical values of h were larger (median, 6 × 10^−2), supporting the idea that many underlying choice probabilities were close to 0.5.

Figure 4.

Indices Isym, Iumi, and Iaddtree' for the similarity judgments in the faces experiment. Stimulus subsets are indicated by symbol color; participants by symbol shape. Upper row: full stimulus set (black symbols), and subsets partitioned by age. Lower row: full stimulus set (black symbols, repeated), and subsets partitioned by gender, with two age ranges each. Other graphical conventions as in Figure 3.

Brightness

The “brightness” experiment consisted of judgments of brightness dis-similarity for the set of disk-and-annulus stimuli shown in Figure 5A. This disk-and-annulus configuration has been extensively used to study the effect of the surrounding context on the appearance of the inner disk (Gilchrist, 2006; Wallach, 1948). A light surround is expected to make the inner disk appear darker and, conversely, a dark surround is expected to make the inner disk appear lighter. While it is generally assumed that this shift in appearance is along a one-dimensional brightness continuum, the evidence is ambiguous (Murray, 2021). For example, Madigan and Brainard (Madigan & Brainard, 2014) found that one dimension suffices to explain brightness similarity ratings, while Logvinenko and Maloney (Logvinenko & Maloney, 2006) found that dissimilarity ratings under different illuminations required a 2-dimensional perceptual space.

This open question motivated the stimuli used in the present experiment: a gamut of 8 disk luminances, presented with either of 2 surround contexts (Figure 5A). Participants judged the brightness of the inner disk for triads constructed from all possible combinations of disk luminance and surround (Figure 5B). If brightness is one-dimensional, then dis-similarity judgments for the full set of stimuli should be consistent with a one-dimensional model, which is a special case of an addtree model. If, on the other hand, the surround produces differences in appearance that are not one-dimensional, then the full set of judgments should be inconsistent with addtree. Under this hypothesis, restricting the judgments to stimuli with the same surround (the subsets encircled by the dark and light blue lines in Figure 5A) should recover a one-dimensional structure and consistency with an addtree model, while restriction to a same-sized set but with two kinds of surrounds (green lines in Figure 5A) should remain inconsistent.

Figure 5C shows the results. For the full stimulus set (black symbols), Iaddtree' is close to zero (> −0.17), and substantially higher than the a priori value, for all three participants. Even higher indices are found for the 8-element stimulus subsets with only black (Iaddtree' > 0.09) or only white (Iaddtree' > 0.05) surrounds (blue symbols in Figure 5C). This is consistent with the notion that, when context is held constant, dis-similarity judgments are highly consistent with a one-dimensional space. However, when Iaddtree' was computed for 8-element subsets of the stimuli in which judgments were made across two surrounds (green symbols in Figure 5C), Iaddtree' was lower, and varied substantially across participants. GA always had the lowest values (Iaddtree' = −0.26 to −0.17) and JP the highest, close to zero (Iaddtree' = −0.08 to −0.03).

These findings show that when the surround context is constant, judgments are highly consistent with an addtree model, but there is inter-observer variability when judgments are made across two surround contexts. The variability is not surprising, as previous research has shown that individual idiosyncrasies can play a substantial role when disk-in-context stimuli are used to study brightness or color (Radonjic, Cottaris, & Brainard, 2015). Our method seems to be capturing these inter-individual differences, but – as we are focusing on a demonstration of the analysis methods – we do not attempt to probe the basis for this difference here.

Similar to the texture and faces experiments, values of Isym are all close to zero for the brightness dataset, indicating consistency with symmetric dis-similarity judgments. Ultrametric indices Iumi are below the a priori value for all cases, indicating inconsistency with an ultrametric model, as expected for a one-dimensional continuum.

Also as in the texture and faces experiments, results were robust to changes of analysis details. Values of the Dirichlet parameter a obtained by maximum likelihood ranged from approximately 0.07 to 0.22; results similar to those of Figure 5C were obtained by setting a = 0.1 for all datasets. Findings for Isym and Iaddtree' were not substantially changed when h was determined by maximum likelihood, yielding values of h with a median of 5 × 10^−2, comparable to the faces dataset.

Discussion

The main contribution of the paper is to advance a strategy for connecting similarity judgments of a collection of stimuli to inferences about the structure of the domain from which the stimuli are drawn. The starting point is an experimental dataset in which the judgments are assumed to be independently drawn binary samplings distributed according to the underlying choice probabilities. We assume that for each triad (a reference stimulus and two comparison stimuli), the comparison stimulus that is more often judged to be more similar to the reference is at a shorter distance from it, but we do not assume, or attempt to infer, a relationship between choice probabilities and the distances. This approach also takes into account the possibility that each triad may have its own “local” transformation that links choice probability and distance. While we recognize that judgments may be uncertain, we refrain from postulating a noise model or a decision model – or even that sensory or decision noise is uniform throughout the space.
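The following toy simulation illustrates the point: the sign of (choice probability − 1/2) depends only on the rank order of the two distances, whatever (possibly triad-specific) monotonic link is used. The logistic link and all names here are illustrative assumptions, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((6, 2))                        # toy stimulus coordinates

def dist(i, j):
    return np.linalg.norm(points[i] - points[j])

def choice_prob(r, x, y, slope):
    # Any monotonic link with p > 1/2 iff d(r,x) < d(r,y) would serve; the
    # slope may differ from triad to triad (a "local" transformation).
    return 1.0 / (1.0 + np.exp(slope * (dist(r, x) - dist(r, y))))

for slope in (0.5, 5.0):                           # two very different links
    p = choice_prob(0, 1, 2, slope)
    assert (p > 0.5) == (dist(0, 1) < dist(0, 2))  # same ordinal content
```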

Despite the paucity of assumptions, we show that it is possible to characterize dis-similarity judgments along three lines: consistency with symmetry, consistency with an ultrametric model, and consistency with an additive tree model. These characteristics are functionally significant aspects of a domain’s organization. Symmetry (or its absence) has implications for the mechanism(s) by which comparisons are made (Tversky, 1977; Tversky & Gati, 1982). For symmetric similarity judgments, addtree models, but not ultrametric models, are consistent with the Tversky contrast model (Sattath & Tversky, 1977). More broadly, semantic domains are anticipated to be consistent with a hierarchical model of similarity judgments (ultrametric or addtree), while domains of features are not (Kemp & Tenenbaum, 2008; Saxe et al., 2019). It is also worth noting that one-dimensional domains are a special case of the addtree model, so the present approach can address whether the apparent “curvature” in a one-dimensional perceptual space can be eliminated by alternative choices of the linkage between distance and decision – a limitation of the analysis in (Bujack, Teti, Miller, Caffrey, & Turton, 2022). Furthermore, our method is sensitive enough to reveal inter-individual differences: for some participants the data are consistent with the addtree model, and for others they are not (or less so) – consistent with other studies of the influence of context (Radonjic et al., 2015), and an interesting area for further investigation.

Comparisons to other methods

The present strategy, whose overriding consideration is to keep assumptions at a minimum, is complementary to other ways of analyzing similarity judgments. A classical and commonly-used approach, non-metric multidimensional scaling (de Leeuw & Heiser, 1982; Tsogo et al., 2000), explicitly postulates that the original data (here, the choice probabilities) reflect a monotonic transformation of a metric distance. The distance is typically taken to be the Euclidean distance, but distances in a hyperbolic or spherical geometry can also be used. An important related approach for one-dimensional models is maximum-likelihood difference scaling (Knoblauch & Maloney, 2008; Maloney & Yang, 2003), which – via a decision model – takes into account the noisy nature of psychophysical judgments. This approach can also be extended to multidimensional models, but the need for a decision model remains (Victor et al., 2017; Waraich & Victor, 2022).
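For comparison, a generic non-metric MDS embedding of a precomputed dis-similarity matrix takes only a few lines. This is a stand-in illustration of the classical approach, not the maximum-likelihood variant used here (which additionally models decision noise):

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
X = rng.random((10, 3))                                     # toy configuration
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # toy dis-similarities

# Non-metric MDS seeks coordinates whose inter-point distances are a
# monotonic transformation of the supplied dis-similarities.
coords = MDS(n_components=2, metric=False, dissimilarity='precomputed',
             random_state=0).fit_transform(D)
```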

Our approach also contrasts with that of topological data analysis (TDA) via persistent homology (Dabaghian et al., 2012; Giusti et al., 2015; Singh et al., 2008; Zhou et al., 2018). Like our approach, TDA avoids the need to postulate a specific relationship between dis-similarities and distances, as the Betti numbers are calculated from a sequence of graphs that depend only on the rank order of similarity judgments. However, construction of this sequence of graphs does require a globally uniform linkage between triadic judgments and relative distance, and also that every pairwise distance is included in the measured triads. The characterizations yielded by TDA are also complementary: they focus on dimensionality and homology class, rather than the characterizations considered here.

Finally, we note another approach that directly seeks to identify features of ultrametric behavior in neural data. Treves (Treves, 1997) developed an information-theoretic measure of dis-similarity of neural responses to faces. The strategy for seeking evidence of ultrametric behavior was to examine the ratio, within each triplet, of the middle distance to the largest. This ratio, which would be 1 for an ultrametric, was found in that study to be closer to 1 than expected by chance. Nonparametric generalizations of this approach may permit a relaxation of the assumed linkage between the information-theoretic measure and dis-similarity, and even an evaluation of addtree models – but in contrast to our approach, it begins with a set of responses to each stimulus, rather than a sampling of triadic comparisons.

Caveats

Keeping assumptions to a minimum necessarily leads to certain limitations. The indices for symmetry and addtree structure reflect necessary, but not sufficient, conditions. Moreover, these indices do not measure the goodness of fit of a model, and are not directly applicable to model comparisons: the indices merely measure the extent to which the data act to concentrate the a priori distribution of choice probabilities into the subset of choice probabilities that have a particular characteristic. Thus, it is important to recognize that the extent of concentration will depend on the typical coverage of each triad: a greater number of trials of each triad provides better estimates of the underlying choice probabilities, and thus can move the indices further from their a priori values. For this reason, the examples above (axes vs. vees in Figure 3; subdivision by age vs. gender in Figure 4; same context vs. different contexts in Figure 5C) focus on comparisons of indices between datasets with a similar number of stimuli, and a similar coverage of each.
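A toy calculation makes the dependence on per-triad trial counts explicit: with a Beta(a, a) prior and a fixed underlying choice probability, the posterior standard deviation shrinks as trials accumulate, concentrating the posterior and pushing indices away from their a priori values. The numbers here are illustrative only:

```python
import numpy as np

a, p_true = 0.5, 0.7                  # prior parameter and true choice probability
for n in (5, 20, 80):                 # trials per triad
    k = round(p_true * n)             # idealized observed counts
    alpha, beta = k + a, n - k + a    # Beta posterior parameters
    sd = np.sqrt(alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))
    print(f"n = {n:2d}: posterior sd = {sd:.3f}")   # shrinks roughly as 1/sqrt(n)
```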

Extensions and open issues

There are open questions that are raised by the present approach, as well as some directions in which it might be extended.

At a practical level, if the present analysis indicates that a dataset is consistent with an ultrametric or addtree model, to what extent can the ultrametric or addtree structure be determined? Existing methods for taking this step require a complete set of dis-similarity measures (Abraham, Bartal, & Neiman, 2015; Barthélemy & Guénoche, 1991; Hornik, 2005; Sattath & Tversky, 1977), along with the assumption that the transformation from triadic choice probabilities into dis-similarities is uniform across the space. Choice probabilities provide constraints even without this assumption – for example, certain relationships among the triadic judgments involving four points are sufficient for a local addtree model – but it is unclear whether these constraints are sufficient for a global model, or how such a global model can be determined.

The observation that a metric that obeys the four-point condition can always be realized by the path metric on a weighted acyclic graph (Buneman, 1974) suggests the possibility of a succession of further characterizations of a set of choice probabilities. By definition, acyclic graphs have no three-cycles. An isolated 3-cycle with nodes a1, a2, and a3 can always be removed by adding a node c, with distances d(c,a1) = (d(a1,a2) + d(a1,a3) − d(a2,a3))/2, etc.; this quantity is guaranteed to be non-negative via the triangle inequality, and d(c,a1) + d(c,a2) = d(a1,a2). Thus, ruling out the four-point condition on distances via the condition (4.5) implies that the dis-similarity structure cannot be realized on a weighted acyclic graph, or on a weighted graph with only isolated three-cycles. Consequently, a graph with two non-disjoint three-cycles, or a four-cycle, is required. Similarly, more elaborate conditions analogous to (4.5) then rule out realizing the dis-similarities on graphs with less than some specified level of cycle structure. For example, if the conditions (4.5) hold for two quadruplets with three points in common, then the four-cycles required by each configuration must have three points in common, so a five-cycle, or two linked four-cycles, or multiple edge-sharing three-cycles, are required to be present in a graph that accounts for the dis-similarities among the five points.
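The 3-cycle removal step can be verified directly: the hub distances are the Gromov products of the triangle's sides, and each original distance is recovered as a two-edge path through the added node c. A minimal check (our own sketch):

```python
def star_distances(d12, d13, d23):
    # Distances from the added hub c to a1, a2, a3; each is non-negative
    # by the triangle inequality.
    dc1 = (d12 + d13 - d23) / 2
    dc2 = (d12 + d23 - d13) / 2
    dc3 = (d13 + d23 - d12) / 2
    return dc1, dc2, dc3

dc1, dc2, dc3 = star_distances(3.0, 4.0, 5.0)
# Each original side is the two-edge path through c.
assert dc1 + dc2 == 3.0 and dc1 + dc3 == 4.0 and dc2 + dc3 == 5.0
```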

In this regard, it is interesting to note that the ultrametric condition and the four-point condition have a similar structure: both state that of three numbers (three single distances for the ultrametric, three pairwise sums for the four-point condition), the largest two must be identical. This intriguing similarity raises the further possibility of a sequence of analogous conditions, each specifying a progressively less-restrictive aspect of a set of dis-similarity judgments – such as compatibility with planar graphs, or statements about the Betti numbers.

Acknowledgements

This work was supported by NIH NEI EY7977 (JV) and the Fred Plum Fellowship in Systems Neurology and Neuroscience (SAW). GA would like to thank Marianne Maertens for all her support. We thank Laurence T. Maloney for many helpful discussions, especially concerning the addtree model.

Appendix 1: Metric spaces

This appendix demonstrates that consistency of a set of choice probabilities with a symmetric dis-similarity implies consistency with a metric-space structure.

A metric space is defined to be a set of points {x, y, z, …}, along with a distance d(x,y) associated with each pair of points, that satisfies three properties:

non-negativity: d(x,y) ≥ 0 and d(x,y) = 0 ⇔ x = y, (6.1)
symmetry: d(x,y) = d(y,x), (6.2)

and

the triangle inequality: d(x,y) ≤ d(x,z) + d(y,z). (6.3)

As we have generically assumed (in eq. (1.7)) that dis-similarity D(x,y) satisfies (6.1) and now add the assumption that it is symmetric, we need to show that we can replace D(x,y) – which need not satisfy the triangle inequality – by a distance d(x,y) which both satisfies the triangle inequality and accounts for the choice probabilities via

R(r; x,y) > 1/2 ⇔ d(r,x) < d(r,y). (6.4)

Since our central assumption is that the rank order of dis-similarities accounts for whether a choice probability is less than or greater than 1/2 (eq. (1.5), which is (6.4) with D in place of d), eq. (6.4) will hold when the distance d is any monotonic transformation of the dis-similarity D:

d(x,y)=F(D(x,y)). (6.5)

So it suffices to find a monotonic transformation for which d(x,y) satisfies the triangle inequality.

A suitable transformation for this purpose is

F(u) = { 0, u = 0; 1 + u/(1+u), u > 0 }. (6.6)

For distinct elements a and b, D(a,b) > 0, and consequently d(a,b) = F(D(a,b)) is between 1 and 2.

So the left-hand side of (6.3) is always < 2, while each term on the right-hand side is > 1, so the right-hand side is always > 2; thus the triangle inequality holds.
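A brute-force numerical check of this argument (our own sketch): apply (6.6) to an arbitrary symmetric array of non-negative dis-similarities and confirm that every triangle inequality holds.

```python
import numpy as np
from itertools import permutations

def F(u):
    # Eq. (6.6): squeezes all nonzero dis-similarities into the interval (1, 2).
    return 0.0 if u == 0 else 1.0 + u / (1.0 + u)

rng = np.random.default_rng(0)
n = 6
D = rng.random((n, n)) * 10.0      # arbitrary positive values
D = (D + D.T) / 2.0                # symmetrize
np.fill_diagonal(D, 0.0)           # D(x, x) = 0
d = np.vectorize(F)(D)
for x, y, z in permutations(range(n), 3):
    # LHS < 2 and RHS > 2, so the inequality can never fail.
    assert d[x, y] <= d[x, z] + d[y, z]
```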

Appendix 2: Addtree Models and the Four-Point Condition

This Appendix demonstrates some well-known and basic (Buneman, 1974; Dobson, 1974; Sattath & Tversky, 1977) relationships between the four-point condition, the triangle inequality, and the ultrametric inequality.

To see that the four-point condition (4.1) implies the triangle inequality, we set w=x. Then (4.1) becomes

None of the three quantities {d(u,v), d(u,x) + d(v,x), d(u,x) + d(v,x)} is strictly greater than the other two. (7.1)

That is, d(u,v) ≤ d(u,x) + d(v,x), which is the triangle inequality.

To see that the ultrametric inequality (3.1) implies the four-point condition, we first relabel the points u,v,w, and x so that d(u,v) is the smallest distance, and

d(u,v) ≤ d(u,w) ≤ d(u,x). (7.2)

Applied to the triangle with vertices u, v, and w, the ultrametric inequality – which says that all triangles are isosceles, with the two equal sides no shorter than the third – yields

d(u,v) ≤ d(u,w) ⇒ d(v,w) = d(u,w). (7.3)

Similarly, applied to the triangle with vertices u,v, and x, the ultrametric inequality yields,

d(u,v) ≤ d(u,x) ⇒ d(v,x) = d(u,x). (7.4)

Combining these two yields

d(u,w) + d(v,x) = d(v,w) + d(u,x). (7.5)

Applied to the triangle with vertices u,w, and x, the ultrametric inequality yields,

d(u,w) ≤ d(u,x) ⇒ d(w,x) = d(u,x). (7.6)

Combining this with (7.2) and (7.3) yields

d(u,v) + d(w,x) = d(u,v) + d(u,x) ≤ d(v,w) + d(u,x). (7.7)

Together, (7.5) and (7.7) constitute the four-point condition.
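This implication is easy to confirm numerically. Cophenetic distances from a hierarchical clustering are ultrametric, so every quadruple should satisfy the four-point condition (the two largest pairsums equal). A sketch:

```python
import numpy as np
from itertools import combinations
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
Z = linkage(rng.random((8, 2)), method='single')  # hierarchical clustering
d = squareform(cophenet(Z))                       # ultrametric distance matrix
for u, v, w, x in combinations(range(8), 4):
    sums = sorted([d[u, v] + d[w, x], d[u, w] + d[v, x], d[u, x] + d[v, w]])
    assert np.isclose(sums[1], sums[2])           # two largest pairsums agree
```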

Examples

For four distinct points on a line, with distances given by the ordinary Euclidean distance d(y,z) = |z − y|, the addtree conditions hold. Taking the points in the order u < v < w < x,

d(u,w) + d(v,x) = (w − u) + (x − v) = (w − v) + (x − u) = d(u,x) + d(v,w), (7.8)

while

d(u,v) + d(w,x) = (v − u) + (x − w) = (v + x) − (u + w) < (w + x) − (u + v) = (x − u) + (w − v) = d(u,x) + d(v,w). (7.9)

However, the ultrametric inequality does not in general hold: in the triangle with vertices u, v, and w, the largest distance d(u,w) = d(u,v) + d(v,w) strictly exceeds the other two, so the two largest distances of the triangle are not equal.

For the standard distance between four distinct points in general position in a plane, the four-point condition does not hold, since the three pairsums are typically distinct.
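Both examples can be checked numerically with a few lines (our sketch of eqs. (7.8)–(7.9); the coordinates are arbitrary):

```python
import numpy as np

def pairsums(pts):
    d = lambda i, j: np.linalg.norm(pts[i] - pts[j])
    return sorted([d(0, 1) + d(2, 3), d(0, 2) + d(1, 3), d(0, 3) + d(1, 2)])

line = np.array([[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [4.0, 0.0]])    # u < v < w < x
plane = np.array([[0.0, 0.0], [1.0, 0.3], [2.5, -1.0], [0.7, 2.0]])  # general position
print(pairsums(line))    # the two largest pairsums are equal (four-point holds)
print(pairsums(plane))   # generically all three pairsums are distinct
```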

Appendix 3: Local Sufficiency for the Four-Point Condition

Here, we show that falsification of the conjunction (4.5) suffices to ensure that the dis-similarities are consistent with a local addtree model. More precisely, we show that if, for each relabeling of a set of four points, (4.5) does not hold, then there is a monotonic transformation d(x,y)=F(D(x,y)) of the dis-similarities into distances, for which the four-point condition (4.1) holds. We will always choose F to be of the form

F(D) = { D + k, D ≥ D0; D, D < D0 }, (8.1)

so the demonstration rests on finding an appropriate choice of D0 and k. We note that we can ignore whether the quantity that results from the monotonic transformation satisfies the triangle inequality, since Appendix 2 shows that the four-point condition implies the triangle inequality.

We refer to the three quantities in the four-point condition (4.1) and the corresponding sums of pairs of dis-similarities as “pairsums.” We at first assume that all dis-similarities are unequal, and consider the case of equality below.

Given that the dis-similarities are unequal, we relabel the points so that D(z,c) is the greatest dis-similarity. If D(z,c) + D(a,b) is not the largest pairsum, we choose D0 = D(z,c) in (8.1). With this choice, d(z,c) + d(a,b) = D(z,c) + D(a,b) + k, but the other pairsums are unchanged. Setting k equal to the difference between the largest pairsum and D(z,c) + D(a,b) yields a transformation to distances that satisfies the four-point condition.
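This first case can be illustrated with toy numbers (our sketch; the values are arbitrary but satisfy the stated premises):

```python
D = {('z', 'c'): 10.0, ('a', 'b'): 1.0,   # pairsum 11 - not the largest
     ('z', 'a'): 8.0,  ('b', 'c'): 6.0,   # pairsum 14 - the largest
     ('z', 'b'): 7.0,  ('a', 'c'): 5.0}   # pairsum 12

D0 = D['z', 'c']                          # only D(z,c) will be inflated
k = (D['z', 'a'] + D['b', 'c']) - (D['z', 'c'] + D['a', 'b'])   # k = 3
d = {pair: v + k if v >= D0 else v for pair, v in D.items()}    # eq. (8.1)

sums = sorted([d['z', 'c'] + d['a', 'b'],
               d['z', 'a'] + d['b', 'c'],
               d['z', 'b'] + d['a', 'c']])
assert sums[1] == sums[2]                 # the two largest pairsums now agree
```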

So we only need to consider cases where D(z,c) is the largest dis-similarity and D(z,c) + D(a,b) is the largest pairsum. In each case, we take D0 > D(a,b) but smaller than the next-largest dis-similarity. This increases D(z,c) + D(a,b) by k. We show that this transformation increases at least one of the other two pairsums by at least 2k, enabling one of the other two pairsums to “catch up” to D(z,c) + D(a,b). To do this, we need to show that for at least one of the other two pairsums, both terms are greater than D(a,b).

Given that D(z,c) is the largest dis-similarity, the second-largest dis-similarity cannot be D(a,b), since then (4.5) would hold. So the second-largest dis-similarity must share a point with the largest dis-similarity, and we can label the points so that this second-largest dis-similarity is D(z,a). D(a,b) also cannot be the third-largest dis-similarity, since then all of (4.5) would hold. Thus, we may assume that D(z,c) is the largest dis-similarity, D(z,a) is the second-largest, and we consider the three possibilities for the third-largest dis-similarity – D(z,b), D(a,c), or D(b,c) – in turn (Figure 6).

Figure 6.

Three rank-orderings of the dis-similarities among four points consistent with falsification of the conjunction (4.5). In all cases, D(z,c) is largest (heavy line) and D(z,a) is second-largest (intermediate line). The third-largest dis-similarity can be either D(z,b), D(a,c), or D(b,c) (thin solid lines). See Appendix 3 for details.

If the third-largest dis-similarity is D(z,b): either D(a,c) > D(a,b) or D(b,c) > D(a,b), since otherwise (4.5) would hold. Thus, both terms of at least one of D(z,b) + D(a,c) or D(z,a) + D(b,c) are greater than D(a,b).

If the third-largest dis-similarity is D(a,c): D(a,b) cannot be larger than both D(z,b) and D(b,c), because then a conjunction like (4.5) would hold with b at the vertex (we would have D(a,b) the largest of the tent at b, and D(z,c) the largest of the base). So at least one of D(z,b) > D(a,b) or D(b,c) > D(a,b) holds. Thus, both terms of either the pairsum D(z,b) + D(a,c) or the pairsum D(z,a) + D(b,c) are greater than D(a,b).

If the third-largest dis-similarity is D(b,c): D(a,b) is then smaller than both D(z,a) and D(b,c), so both terms of the pairsum D(z,a) + D(b,c) are greater than D(a,b).

We now extend the above argument to the case in which the conjunction (4.5) is false after allowing some of the dis-similarities to be equal (but these equalities involve at most one of the two components of a pairsum, as specified following eq. (4.5)). We need to show that under these circumstances, there is still a monotonic distortion of the dis-similarities into distances for which the four-point condition holds. Assume otherwise – i.e., that there is a “rogue” configuration of dis-similarities Q that does not allow for such a monotonic distortion. If there is such a rogue configuration Q, we can find, for sufficiently small ε, another configuration Qε with unique dis-similarities, whose dis-similarities are within ε of those of Q at the ties, and equal to those of Q at the non-ties. The above construction yields a monotonic distortion dε(u,v) = Fε(Dε(u,v)) that converts the dis-similarities in Qε to distances that satisfy the four-point condition.

Now apply the limit of these monotonic transformations, F = lim(ε→0) Fε, to the dis-similarities in Q. (This limit must exist, and the convergence is uniform: if two configurations Q1 and Q2 have dis-similarities that match to within an amount q, then the monotonic functions needed for Q1 and Q2 will differ by at most q.) In this limit, the four-point conditions must hold, since they are equalities and non-strict inequalities among simple linear functions of the distances, and they hold for all sufficiently small ε > 0. That F is monotonic follows from the monotonicity of each Fε, and the uniform convergence of Fε to F.

References

  1. Abraham I., Bartal Y., & Neiman O. (2015). Embedding metrics into ultrametrics and graphs into spanning trees with constant average distortion. SIAM J. Computing, 44(1), 160–192. doi: 10.1137/120884390
  2. Barthélemy J.-P., & Guénoche A. (1991). Trees and Proximity Representations. Chichester, England: John Wiley and Sons.
  3. Bujack R., Teti E., Miller J., Caffrey E., & Turton T. L. (2022). The non-Riemannian nature of perceptual color space. Proc Natl Acad Sci U S A, 119(18), e2119753119. doi: 10.1073/pnas.2119753119
  4. Buneman P. (1974). A note on the metric properties of trees. Journal of Combinatorial Theory, 17, 48–50.
  5. Dabaghian Y., Memoli F., Frank L., & Carlsson G. (2012). A topological paradigm for hippocampal spatial map formation using persistent homology. PLoS Comput Biol, 8(8), e1002581. doi: 10.1371/journal.pcbi.1002581
  6. de Leeuw J., & Heiser W. (1982). Theory of multidimensional scaling. In Krishnaiah P. R. & Kanal L. N. (Eds.), Handbook of Statistics (Vol. 2, pp. 285–316). North-Holland.
  7. Dobson A. (1974). Unrooted trees for numerical taxonomy. J. Appl. Prob., 11, 32–42.
  8. Edelman S. (1998). Representation is the representation of similarities. Behav Brain Sci, 21(4), 449–467; discussion 467–498.
  9. Ferguson T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist., 1(2), 209–230. doi: 10.1214/aos/1176342360
  10. Gilchrist A. (2006). Seeing Black and White. New York, NY: Oxford University Press.
  11. Giusti C., Pastalkova E., Curto C., & Itskov V. (2015). Clique topology reveals intrinsic geometric structure in neural correlations. Proc Natl Acad Sci U S A, 112(44), 13455–13460. doi: 10.1073/pnas.1506407112
  12. Guidolin A., Desroches M., Victor J. D., Purpura K. P., & Rodrigues S. (2022). Geometry of spiking patterns in early visual cortex: a topological data analytic approach. J R Soc Interface, 19(196), 20220677. doi: 10.1098/rsif.2022.0677
  13. Hornik K. (2005). A CLUE for CLUster ensembles. J. Statistical Software, 14(12), 1–25.
  14. Kemp C., & Tenenbaum J. B. (2008). The discovery of structural form. Proc Natl Acad Sci U S A, 105(31), 10687–10692. doi: 10.1073/pnas.0802631105
  15. Kingdom F. A. A., & Prins N. (2016). Psychophysics: A Practical Introduction (2nd ed.). Cambridge, MA: Academic Press.
  16. Knoblauch K., & Maloney L. T. (2008). MLDS: Maximum Likelihood Difference Scaling in R. J. Statistical Software, 25(2), 1–26. doi: 10.18637/jss.v025.i02
  17. Kriegeskorte N., & Kievit R. A. (2013). Representational geometry: integrating cognition, computation, and the brain. Trends Cogn Sci, 17(8), 401–412. doi: 10.1016/j.tics.2013.06.007
  18. Kruskal J. B., & Wish M. (1978). Multidimensional Scaling. Beverly Hills: Sage.
  19. Logvinenko A. D., & Maloney L. T. (2006). The proximity structure of achromatic surface colors and the impossibility of asymmetric lightness matching. Percept Psychophys, 68(1), 76–83. doi: 10.3758/bf03193657
  20. Madigan S. C., & Brainard D. H. (2014). Scaling measurements of the effect of surface slant on perceived lightness. Iperception, 5(1), 53–72. doi: 10.1068/i0608
  21. Maloney L. T., & Yang J. N. (2003). Maximum likelihood difference scaling. J Vis, 3(8), 573–585. doi: 10.1167/3.8.5
  22. Murray R. F. (2021). Lightness perception in complex scenes. Annu Rev Vis Sci, 7, 417–436. doi: 10.1146/annurev-vision-093019-115159
  23. Radonjic A., Cottaris N. P., & Brainard D. H. (2015). Color constancy supports cross-illumination color selection. J Vis, 15(6), 13. doi: 10.1167/15.6.13
  24. Sattath S., & Tversky A. (1977). Additive similarity trees. Psychometrika, 42(3), 319–345.
  25. Saxe A. M., McClelland J. L., & Ganguli S. (2019). A mathematical theory of semantic development in deep neural networks. Proc Natl Acad Sci U S A, 116(23), 11537–11546. doi: 10.1073/pnas.1820226116
  26. Semmes S. (2007). An introduction to the geometry of ultrametric spaces. arXiv:0711.0709. doi: 10.48550/arXiv.0711.0709
  27. Shepard R. N. (1958). Stimulus and response generalization: tests of a model relating generalization to distance in psychological space. J Exp Psychol, 55(6), 509–523. doi: 10.1037/h0042354
  28. Singh G., Memoli F., Ishkhanov T., Sapiro G., Carlsson G., & Ringach D. L. (2008). Topological analysis of population activity in visual cortex. J Vis, 8(8), 11, 1–18. doi: 10.1167/8.8.11
  29. Treves A. (1997). On the perceptual structure of face space. Biosystems, 40(1–2), 189–196. doi: 10.1016/0303-2647(96)01645-0
  30. Tsogo L., Masson M. H., & Bardot A. (2000). Multidimensional scaling methods for many-object sets: A review. Multivariate Behav Res, 35(3), 307–319. doi: 10.1207/S15327906MBR3503_02
  31. Tversky A. (1977). Features of similarity. Psychol Rev, 84(4), 327–352.
  32. Tversky A., & Gati I. (1982). Similarity, separability, and the triangle inequality. Psychol Rev, 89(2), 123–154.
  33. Victor J. D., & Conte M. M. (2012). Local image statistics: maximum-entropy constructions and perceptual salience. J Opt Soc Am A Opt Image Sci Vis, 29(7), 1313–1345. doi: 10.1364/JOSAA.29.001313
  34. Victor J. D., Rizvi S. M., & Conte M. M. (2017). Two representations of a high-dimensional perceptual space. Vision Res, 137, 1–23. doi: 10.1016/j.visres.2017.05.003
  35. Victor J. D., Thengone D. J., Rizvi S. M., & Conte M. M. (2015). A perceptual space of local image statistics. Vision Res, 117, 117–135. doi: 10.1016/j.visres.2015.05.018
  36. Wallach H. (1948). Brightness constancy and the nature of achromatic colors. J Exp Psychol, 38(3), 310–324. doi: 10.1037/h0053804
  37. Waraich S. A., & Victor J. D. (2022). A psychophysics paradigm for the collection and analysis of similarity judgments. J Vis Exp(181). doi: 10.3791/63461
  38. Zaidi Q., Victor J., McDermott J., Geffen M., Bensmaia S., & Cleland T. A. (2013). Perceptual spaces: mathematical structures to neural mechanisms. J Neurosci, 33(45), 17597–17602. doi: 10.1523/jneurosci.3343-13.2013
  39. Zhou Y., Smith B. H., & Sharpee T. O. (2018). Hyperbolic geometry of the olfactory space. Sci Adv, 4(8), eaaq1458. doi: 10.1126/sciadv.aaq1458
