Proceedings of the National Academy of Sciences of the United States of America. 2002 Apr 2;99(7):4178–4184. doi: 10.1073/pnas.032677199

The Pythagorean Theorem: I. The finite case

Richard V. Kadison
PMCID: PMC123622  PMID: 11929992

Abstract

The Pythagorean Theorem and variants of it are studied. The variations evolve to a formulation in terms of noncommutative, conditional expectations on von Neumann algebras that displays the theorem as the basic result of noncommutative, metric, Euclidean Geometry. The emphasis in the present article is finite dimensionality, both “discrete” and “continuous.”

1. Introduction and Theme

Most of us carry away from our earliest contact with elementary mathematics memories of two basic formulae from Euclidean Geometry: πr², the “area” of a circle with radius r, and a² + b² = c², the formula relating the lengths, a and b, of the two sides of a right triangle to the length, c, of the hypotenuse of that triangle. That last formula, the Pythagorean Theorem, is the most basic result of “metric” Euclidean Geometry.

In this article, we study that theorem and variants of it. Our study falls into two large parts: the case of “discrete dimensionality” and the case of “continuous dimensionality.” Each of these parts, in turn, falls into two parts: finite dimensionality and infinite dimensionality. The primary focus of this article is discrete dimensionality in the finite case, although we discuss the continuous case in the last section (where the meaning of the discrete-continuous division will become clearer). At the same time, in that section, we formulate the Pythagorean Theorem in terms of (noncommutative) conditional expectations and note its “semicommutative” nature. In this context (noncommutative, finite-continuous-dimensional, metric Euclidean Geometry), we prove a fully noncommutative version of the theorem. The next article in this series deals with discrete dimensionality in the infinite case. Arguments become more involved in that case.

Elementary, mostly finite-dimensional, variants of the Pythagorean Theorem are examined in the next section, some of them new. A converse, which we refer to as the Carpenter's Theorem, is introduced. The proof of this converse is carried out by operator–matrix methods in the third section. In this same section, we view the Pythagorean Theorem in terms of traces, in terms of indices, and in terms of stochastic matrices.

The fourth section contains a discussion of the finite-continuous case. The Carpenter's Theorem is left open in that case as a subject for later elucidation.

2. Elementary Variations

To begin with, the Pythagorean Theorem refers to “plane geometry.” Are there three-dimensional, n-dimensional, or even infinite-dimensional analogues of that theorem? Of course there are, and they are familiar—but first we must recast the theorem mildly. If we replace the two sides of the triangle by “orthogonal” axes and the hypotenuse by a vector x of length c, the “orthogonal projections” of that vector on the axes have lengths a and b satisfying a² + b² = c², by virtue of the Pythagorean Theorem. This is our first variation.

By choosing vectors e1 and e2 of length 1 (unit vectors) along the positive (orthogonal) axes, the projections of x on these axes allow us to “expand” x in terms of the orthonormal basis {e1, e2} (for the plane). That is, we express x as the linear combination c1e1 + c2e2 of e1 and e2. In this case, |c1| = a, |c2| = b, and the length ∥x∥ of x is c, where a² + b² = c². This is our second variation of the Pythagorean Theorem.

In this form, we can take the leap (our third variation) to Hilbert space ℋ of any dimension. With {ea}a∈𝔸 an orthonormal basis for ℋ, and x in ℋ, there is an expansion, x = ∑a∈𝔸 caea, where the equality refers to convergence of finite subsums to x in the “metric” of the Hilbert space. The inner product of vectors x and y in ℋ is denoted by 〈x, y〉, and the length (or norm) ∥x∥ of x is 〈x, x〉^{1/2}. Convergence of ∑a∈𝔸 caea is over the “net” of finite subsets of 𝔸 (directed by inclusion). The Parseval equality tells us that ∥x∥² = ∑a∈𝔸 |ca|², which is a direct extension of the Pythagorean Theorem to (Hilbert) space of any dimension.
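As a concrete numerical check of the Parseval equality (our own illustration; the basis and the vector are arbitrary choices, not taken from the text), the sketch below expands a vector of ℝ³ in a hand-picked orthonormal basis and compares the sum of the squared coefficients with ∥x∥².

```python
import math

# Expand x in an orthonormal basis of R^3 and verify Parseval's equality.
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

basis = [
    (1 / math.sqrt(2), 1 / math.sqrt(2), 0.0),
    (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0),
    (0.0, 0.0, 1.0),
]
x = (3.0, -1.0, 2.0)

coeffs = [dot(x, e) for e in basis]        # c_a = <x, e_a>
# sum of |c_a|^2 equals ||x||^2
assert abs(sum(c * c for c in coeffs) - dot(x, x)) < 1e-12
```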

In the context of “infinite-dimensional” Hilbert space, there is more to be said. Given a potential set of coefficients {ca}a∈𝔸, there is a (unique) vector x in ℋ with expansion ∑a∈𝔸 caea if and only if ∑a∈𝔸 |ca|² converges (in which case ∑a∈𝔸 |ca|² converges to ∥x∥²). Some aspect of this added information is present in the Pythagorean Theorem when that theorem is suitably formulated (our fourth variation): the positive numbers a and b are the lengths of the sides of a right triangle with hypotenuse of length c if and only if a² + b² = c². Carpenters use this aspect to check that their work is “true.” We shall refer to this “converse” to the usual statement of the Pythagorean Theorem as the Carpenter's Theorem.

The “expansion” formulation of the Pythagorean Theorem involves projecting a vector onto orthogonal axes. We can reverse that and formulate the theorem (our fifth variation) in terms of the projections of vectors of equal length along the axes onto the line determined by a vector.

In this case, the projections of the axis vectors of length c onto the line have lengths a and b such that a² + b² = c², again as a result of the Pythagorean Theorem. It is not an essential restriction in this formulation to insist that c be 1. We are, then, projecting orthonormal basis vectors onto the line. Can something of this nature be said for orthonormal bases in higher-dimensional spaces? Our sixth variation follows.

Proposition 1.

If {ea}a∈𝔸 is an orthonormal basis for the Hilbert space ℋ, then the sum of the squares of the lengths of the orthogonal projections of each ea on every one-dimensional subspace of ℋ is 1. If a real non-negative ta is specified for each a and ∑a∈𝔸 ta² = 1, then ∑a∈𝔸 taea is a unit vector x in ℋ that generates a one-dimensional subspace of ℋ on which each ea has projection of length ta.

Proof:

If x is a unit vector and 𝒱 is the one-dimensional subspace of ℋ spanned by x, then the orthogonal projection of ea on 𝒱 is 〈ea, x〉x and ∥〈ea, x〉x∥² = |〈ea, x〉|²∥x∥² = |〈ea, x〉|². From Parseval's equality,

∑a∈𝔸 |〈ea, x〉|² = ∥x∥² = 1.

Because 〈ea, ∑a∈𝔸 taea〉 = ta, each ea has a projection on the one-dimensional space generated by ∑a∈𝔸 taea of length ta. ▪
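Proposition 1's construction can be checked numerically. A minimal sketch (ours; the particular ta are an arbitrary admissible choice) in ℝ⁴:

```python
# Non-negative t_a with sum t_a^2 = 1 give a unit vector x = sum_a t_a e_a,
# and each standard basis vector e_a projects onto the line through x with
# length |<e_a, x>| = t_a.
t = [0.6, 0.8, 0.0, 0.0]                   # sum of squares is 1
assert abs(sum(s * s for s in t) - 1.0) < 1e-12

x = t[:]                                   # coordinates of x in the basis {e_a}
proj_lengths = [abs(x[a]) for a in range(len(t))]
assert all(abs(p - s) < 1e-12 for p, s in zip(proj_lengths, t))
```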

Equivalently, from the Pythagorean Theorem, we can specify the distances sa from ea to the one-dimensional space subject to the condition that ∑a∈𝔸 (1 − sa²) = 1. Of course, the question of orthogonal projections of basis elements may be asked when the projections are made onto a subspace of ℋ of dimension other than 1. What is the situation if, for example, 𝒱 is an m-dimensional subspace of ℋ? In this case, choosing an orthonormal basis {f1, … , fm} for 𝒱, we have that the projection of ea on 𝒱 is ∑_{j=1}^{m} 〈ea, fj〉fj, of length whose square is ∑_{j=1}^{m} |〈ea, fj〉|². Now, ∑a∈𝔸 ∑_{j=1}^{m} |〈ea, fj〉|² converges, because all terms are real and non-negative, and

∑a∈𝔸 ∑_{j=1}^{m} |〈ea, fj〉|² = ∑_{j=1}^{m} ∑a∈𝔸 |〈fj, ea〉|² = ∑_{j=1}^{m} ∥fj∥² = m,

from Parseval's equality. We have proved our seventh variation.

Proposition 2.

The sum of the squares of the lengths of the projections of the elements of an orthonormal basis for a Hilbert space ℋ onto an m-dimensional subspace of ℋ is m.
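A hypothetical numerical instance of Proposition 2 (ours; the subspace is a hand-picked choice), with n = 4 and m = 2:

```python
# f1, f2: an orthonormal basis for a 2-dimensional subspace of R^4; since the
# e_j are the standard basis vectors, <e_j, f_k> is the jth coordinate of f_k.
f1 = (0.5, 0.5, 0.5, 0.5)
f2 = (0.5, -0.5, 0.5, -0.5)
n, m = 4, 2

# squared length of the projection of e_j is sum_k |<e_j, f_k>|^2
total = sum(f1[j] ** 2 + f2[j] ** 2 for j in range(n))
assert abs(total - m) < 1e-12              # the sum over the whole basis is m
```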

Our eighth variation is an interesting, although small, alteration of Proposition 2. We emphasize it as our definitive (geometric) formulation of the finite-dimensional Pythagorean Theorem because it puts in evidence a property that will play an important role in our extension of the Carpenter's Theorem to infinite dimensions.

Proposition 3.

If a is the sum of the squares of the lengths of the projections of r elements of an orthonormal basis {e1, … , en} for an n-dimensional Hilbert space ℋ onto an m-dimensional subspace ℋ0, and b is the sum of the squares of the lengths of the projections of the remaining n − r basis elements on the orthogonal complement ℋ′0, then

a − b = m − n + r.

Proof:

If aj is the square of the length of the projection of ej on ℋ0, then 1 − aj is the square of the length of its projection on ℋ′0. Thus a = a1 + ⋯ + ar, b = (1 − ar+1) + ⋯ + (1 − an), and m = a1 + ⋯ + an from Proposition 2. It follows that

a − b = a1 + ⋯ + ar − [(1 − ar+1) + ⋯ + (1 − an)] = a1 + ⋯ + an − (n − r) = m − n + r.

Another proof, one that does not make use of Proposition 2, which is not available, of course, in the infinite-dimensional case, follows. Let {e1, … , en} be an orthonormal basis for ℋ. Let {f1, … , fm} and {fm+1, … , fn} be orthonormal bases for ℋ0 and ℋ′0, respectively. The projection y of ej on ℋ0 is ∑_{k=1}^{m} 〈ej, fk〉fk, and ∥y∥² = ∑_{k=1}^{m} |〈ej, fk〉|². Thus a = ∑_{j=1}^{r} ∑_{k=1}^{m} |〈ej, fk〉|². The projection of ej on ℋ′0 is ∑_{k=m+1}^{n} 〈ej, fk〉fk, and the square of its length is ∑_{k=m+1}^{n} |〈ej, fk〉|², which is 1 − ∑_{k=1}^{m} |〈ej, fk〉|² because 1 = ∥ej∥² = ∑_{k=1}^{n} |〈ej, fk〉|², from Parseval's equality. Thus

b = ∑_{j=r+1}^{n} (1 − ∑_{k=1}^{m} |〈ej, fk〉|²)
= (n − r) − ∑_{j=r+1}^{n} ∑_{k=1}^{m} |〈ej, fk〉|²,

and

a − b = ∑_{j=1}^{r} ∑_{k=1}^{m} |〈ej, fk〉|² + ∑_{j=r+1}^{n} ∑_{k=1}^{m} |〈ej, fk〉|² − (n − r)
= ∑_{j=1}^{n} ∑_{k=1}^{m} |〈ej, fk〉|² − (n − r)
= ∑_{k=1}^{m} ∥fk∥² − (n − r)
= m − n + r. ▪

We note, especially, that the difference a − b is an integer however we split the basis for projection onto ℋ0 and ℋ′0. If we move a basis element from those projected onto ℋ′0 to those projected onto ℋ0, we increase the difference by 1; if we move a basis element in the opposite sense, we decrease the difference by 1, clearly not affecting the integrality of the difference. In the next section, we introduce matrix methods and give another proof.
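The integer identity a − b = m − n + r of Proposition 3 can be checked numerically for every split r at once. The sketch below (ours; the subspace is a hand-picked example) does so in ℝ⁴ with m = 2:

```python
# H0 is the 2-dimensional subspace of R^4 spanned by the orthonormal f1, f2.
f1 = (0.5, 0.5, 0.5, 0.5)
f2 = (0.5, -0.5, 0.5, -0.5)
n, m = 4, 2
# a_j = squared length of the projection of e_j on H0
diag = [f1[j] ** 2 + f2[j] ** 2 for j in range(n)]

for r in range(n + 1):
    a = sum(diag[:r])                    # first r elements, projected on H0
    b = sum(1 - d for d in diag[r:])     # the rest, projected on the complement
    assert abs((a - b) - (m - n + r)) < 1e-12
```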

Once again, we can ask whether the lengths of the projections of the basis elements can be specified subject to the condition that the sum of their squares is m (the Carpenter's Theorem for this case). That is, given such a specification, is there an m-dimensional subspace of ℋ on which the projections of the basis elements have those lengths? Equivalently, from the Pythagorean Theorem, can we find an m-dimensional subspace of ℋ from which the basis elements have specified distances not greater than 1, subject to the condition that subtracting their squares from 1 produces numbers that sum to m? The affirmative answer to these questions provides our ninth and tenth variations. Their proof requires more involved arguments.

3. Operator–Matrix Methods

We assume, first, that ℋ has finite dimension n, and that {e1, … , en} is an orthonormal basis for ℋ. Let ℋ0 be an m-dimensional subspace of ℋ and E the orthogonal projection of ℋ onto ℋ0. If (ajk) is the matrix of E relative to {ej}, then ajk = 〈Eek, ej〉 for all k and j. Since E = E* = E²,

ajj = 〈Eej, ej〉 = 〈E²ej, ej〉 = 〈Eej, Eej〉 = ∥Eej∥²,

and ∑_{j=1}^{n} ajj = ∑_{j=1}^{n} ∥Eej∥². It follows that the sum of the squares of the lengths of the projections of the basis elements e1, … , en onto ℋ0 is the trace of E. Of course, there is a unitary transformation U of ℋ onto itself that maps ℋ0 onto the m-dimensional space generated by {e1, … , em}. The projection Em that has that space as its range has matrix (bjk) relative to {ej} such that b11, … , bmm are 1 and bm+1 m+1, … , bnn are 0. Since UEU⁻¹ = Em, E and Em have the same trace m. This proves again our sixth variation, tr(E) = ∑_{j=1}^{n} ∥Eej∥² = m, where “tr” is the functional that assigns to a matrix its usual (non-normalized) trace, the sum of its diagonal entries. It tells us, too, that another (our eleventh) variation of the Pythagorean Theorem is the assertion:

tr(E) = dim(E(ℋ)), for each projection E on ℋ.
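As a numerical illustration of tr(E) = ∑_{j=1}^{n} ∥Eej∥² = m (our own example; the subspace is an arbitrary choice), the matrix of the projection E onto span{f1, f2} in ℝ⁴ is f1f1ᵀ + f2f2ᵀ:

```python
# Build the matrix of the rank-2 projection E = f1 f1^T + f2 f2^T and compare
# its trace with the rank and with sum_j ||E e_j||^2.
f1 = (0.5, 0.5, 0.5, 0.5)
f2 = (0.5, -0.5, 0.5, -0.5)
n = 4
E = [[f1[i] * f1[j] + f2[i] * f2[j] for j in range(n)] for i in range(n)]

trace = sum(E[j][j] for j in range(n))
# sum_j ||E e_j||^2 is the sum of the squares of all entries of E
col_sq = sum(E[i][j] ** 2 for i in range(n) for j in range(n))
assert abs(trace - 2) < 1e-12 and abs(col_sq - 2) < 1e-12
```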

From these same considerations, we see that prescribing the squares of the lengths of the projections of the basis elements on an m-dimensional subspace of ℋ amounts to prescribing the diagonal of the matrix, relative to that basis, of the projection with that subspace as range. Our Carpenter's Theorem question, in this case, becomes:

Is an ordered n-tuple, 〈a1, … , an〉, of numbers in [0, 1] with sum m the diagonal of some idempotent self-adjoint n × n matrix?

This has an affirmative answer. (Together with the ninth variation, it provides an extension, our twelfth variation, of the fourth variation.) For its proof, we make use of a variant of a combinatorial–geometric lemma used in ref. 1.

Definition 4:

With (a1, … , an) (= ã) a point in ℝn and Π the group of permutations of {1, … , n}, we let 𝒦ã be the (closed) convex hull of {(aπ(1), … , aπ(n)) (= π(ã)): π ∈ Π} (= Π(ã)). We refer to 𝒦ã as the permutation polytope generated by ã.

Lemma 5.

If a1 ≥ a2 ≥ ⋯ ≥ an, b1 ≥ b2 ≥ ⋯ ≥ bn, and a1 + ⋯ + an = b1 + ⋯ + bn, then the following are equivalent:

(i)

(b1, … , bn) (= b̃) ∈ 𝒦ã;

(ii)

b1 ≤ a1, b1 + b2 ≤ a1 + a2, … , b1 + ⋯ + bn−1 ≤ a1 + ⋯ + an−1;

(iii)

There are points ã1, … , ãn in 𝒦ã, where ãk = (a_1^{(k)}, … , a_n^{(k)}), such that ã1 = ã, ãn = b̃, and ãk+1 = tãk + (1 − t)τ(ãk) for each k in {1, … , n − 1}, some transposition τ in Π, depending on k, and some t in [0, 1], depending on k.

Proof:

(i)→(ii). From the assumption that a1 ≥ ⋯ ≥ an, we conclude that a1 + ⋯ + aj ≥ aπ(1) + ⋯ + aπ(j), for each j in {1, … , n} and π in Π. Thus for each convex combination b̃ of points in Π(ã) and j in {1, … , n}, b1 + ⋯ + bj ≤ a1 + ⋯ + aj.

(iii)→(i). As π(d̃) ∈ Π(ã) when d̃ ∈ Π(ã), π(c̃) ∈ 𝒦ã when c̃ ∈ 𝒦ã. Thus ã1 = ã ∈ 𝒦ã, ã2 = tã1 + (1 − t)τ(ã1) ∈ 𝒦ã, … , b̃ = ãn = t′ãn−1 + (1 − t′)τ′(ãn−1) ∈ 𝒦ã.

(ii)→(iii). If b1 < aj for all j in {2, … , n}, then bj ≤ b1 < aj for all such j, and b1 + ⋯ + bn < a1 + ⋯ + an, contrary to assumption. Let m be the smallest number in {2, … , n} such that am ≤ b1. Since am ≤ b1 ≤ a1, there is a t in [0, 1] such that b1 = ta1 + (1 − t)am. Let τ be the transposition that interchanges 1 and m. Let ã1 be ã and ã2 be tã1 + (1 − t)τ(ã1). Then

a_1^{(2)} = ta1 + (1 − t)am = b1,  a_m^{(2)} = tam + (1 − t)a1 = a1 + am − b1,
a_j^{(2)} = aj  (j ≠ 1, m).

As bm−1 ≤ bm−2 ≤ ⋯ ≤ b1 < am−1 ≤ ⋯ ≤ a2, by choice of m,

a_1^{(2)} + ⋯ + a_j^{(2)} = b1 + a2 + ⋯ + aj
≥ b1 + b2 + ⋯ + bj  (2 ≤ j ≤ m − 1).

If m ≤ j ≤ n − 1, then a_1^{(2)} + ⋯ + a_j^{(2)} = a1 + ⋯ + aj ≥ b1 + ⋯ + bj.

Suppose now that we have constructed ã1, … , ãj such that ãk+1 = tãk + (1 − t)τ(ãk) for each k in {1, … , j − 1} (t ∈ [0, 1] and τ a transposition in Π, both depending on k), such that b1 = a_1^{(k)}, … , bk−1 = a_{k−1}^{(k)} for each k in {2, … , j} and

b1 + ⋯ + bi ≤ a_1^{(k)} + ⋯ + a_i^{(k)}  (1 ≤ i ≤ n − 1),
a_1^{(k)} + ⋯ + a_n^{(k)} = a1 + ⋯ + an,

for each k in {1, … , j}. Then

b1 + ⋯ + bj ≤ a_1^{(j)} + ⋯ + a_j^{(j)}
= b1 + ⋯ + bj−1 + a_j^{(j)}.

Hence bj ≤ a_j^{(j)}. In addition, for k in {1, … , j − 1},

a_k^{(j)} = bk,
so that a_1^{(j)} + ⋯ + a_n^{(j)} = a1 + ⋯ + an = b1 + ⋯ + bn.

Thus a_n^{(j)} ≤ bn ≤ bj, because b1 + ⋯ + bn−1 ≤ a_1^{(j)} + ⋯ + a_{n−1}^{(j)}. Let m be the smallest number in {j + 1, … , n} such that a_m^{(j)} ≤ bj. Then

bk ≤ bj < a_k^{(j)}  (j + 1 ≤ k ≤ m − 1).

(∗)

Because a_m^{(j)} ≤ bj ≤ a_j^{(j)}, there is a t in [0, 1] such that bj = ta_j^{(j)} + (1 − t)a_m^{(j)}. Let τ be the transposition that interchanges j and m, and let ãj+1 be tãj + (1 − t)τ(ãj). Then

a_j^{(j+1)} = ta_j^{(j)} + (1 − t)a_m^{(j)} = bj,  a_m^{(j+1)} = ta_m^{(j)} + (1 − t)a_j^{(j)} = a_j^{(j)} + a_m^{(j)} − bj,
a_i^{(j+1)} = a_i^{(j)}  (i ≠ j, m).

If j + 1 = n, we are through. If not, we must show that b1 + ⋯ + bk ≤ a_1^{(j+1)} + ⋯ + a_k^{(j+1)}, for each k in {1, … , n − 1}, to carry the construction forward. If 1 ≤ k ≤ j, then

b1 + ⋯ + bk = a_1^{(j+1)} + ⋯ + a_k^{(j+1)}.

If j + 1 ≤ k ≤ m − 1, then from (∗),

b1 + ⋯ + bk = (b1 + ⋯ + bj) + (bj+1 + ⋯ + bk)
≤ (b1 + ⋯ + bj) + (a_{j+1}^{(j)} + ⋯ + a_k^{(j)})
= a_1^{(j+1)} + ⋯ + a_k^{(j+1)}.

Finally, if m ≤ k ≤ n − 1, then

a_1^{(j+1)} + ⋯ + a_k^{(j+1)} = a_1^{(j)} + ⋯ + a_k^{(j)} ≥ b1 + ⋯ + bk. ▪
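The (ii)→(iii) construction above is, in effect, an algorithm. A minimal Python sketch of it follows (ours; the paper contains no code, and the function name and sample data are assumptions): at each step one more coordinate of b̃ is fixed by mixing two coordinates of the current point under a transposition.

```python
def splice_chain(a, b, eps=1e-12):
    """Given decreasing sequences a, b with equal sums and
    b1 + ... + bj <= a1 + ... + aj for each j, return the chain of points of
    Lemma 5(iii); each step is t*v + (1 - t)*tau(v) for a transposition tau."""
    v = list(a)
    chain = [list(v)]
    for j in range(len(a) - 1):
        if abs(v[j] - b[j]) > eps:
            # smallest m > j with v[m] <= b[j]; it exists by the hypotheses
            m = next(i for i in range(j + 1, len(a)) if v[i] <= b[j] + eps)
            # choose t with b[j] = t*v[j] + (1 - t)*v[m]; the mate coordinate
            # receives the complementary combination v[j] + v[m] - b[j]
            v[j], v[m] = b[j], v[j] + v[m] - b[j]
        chain.append(list(v))
    return chain

chain = splice_chain([0.9, 0.8, 0.3], [0.7, 0.7, 0.6])
assert all(abs(x - y) < 1e-9 for x, y in zip(chain[-1], [0.7, 0.7, 0.6]))
assert all(abs(sum(p) - 2.0) < 1e-9 for p in chain)  # each step preserves the sum
```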

Theorem 6.

Let ϕ be the mapping that assigns to each self-adjoint n × n matrix (ajk) the point (a11, … , ann) (= ã) in ℝn, 𝒦m be the range of ϕ restricted to the set 𝒫m of projections of rank m, where m ∈ {0, … , n}, and 𝒦 be the range of ϕ restricted to the set 𝒫 of projections. Then ã ∈ 𝒦m if and only if 0 ≤ ajj ≤ 1, for each j, and ∑_{j=1}^{n} ajj = m, and ã ∈ 𝒦 if and only if 0 ≤ ajj ≤ 1, for each j, and ∑_{j=1}^{n} ajj ∈ {0, … , n}.

Proof:

Let (ajk) (= A) be a self-adjoint matrix and U be the unitary matrix with ξ sin θ and sin θ at the j, j and k, k entries, respectively, −cos θ and ξ cos θ at the j, k and k, j entries, respectively, 1 at all diagonal entries other than j, j and k, k, and 0 at all other entries, where ξ is a complex number of modulus 1 such that ξ̄ājk = −ξajk (that is, ξajk is purely imaginary). Then UAU⁻¹ has ajj sin²θ + akk cos²θ at the j, j entry, ajj cos²θ + akk sin²θ at the k, k entry, and ahh at the h, h entry when h ≠ j, k. Letting t be sin²θ, τ be the transposition of {1, … , n} that interchanges j and k, and ãτ be (aτ(1)τ(1), … , aτ(n)τ(n)), we see that

ϕ(UAU⁻¹) = tã + (1 − t)ãτ.

Because VEV⁻¹ ∈ 𝒫m for each unitary V, when E ∈ 𝒫m, we see that, when ã ∈ 𝒦m, so is tã + (1 − t)ãτ, for each t in [0, 1] and each transposition τ of {1, … , n}.

As noted, tã + (1 − t)ãτ ∈ 𝒦m when ã ∈ 𝒦m, for each t in [0, 1] and each transposition τ of {1, … , n}. From Lemma 5, 𝒦m contains the permutation polytope 𝒦ã of each ã in 𝒦m. Now the point ã whose first m coordinates are 1 and whose last n − m coordinates are 0 is in 𝒦m. If b̃ = (b1, … , bn), 0 ≤ bj ≤ 1 for each j in {1, … , n}, and ∑_{j=1}^{n} bj = m, then it follows that b1 ≤ 1, b1 + b2 ≤ 1 + 1, … , b1 + ⋯ + bm ≤ m, b1 + ⋯ + bm+1 ≤ m + 0, … , b1 + ⋯ + bn−1 ≤ m. Again, from our lemma, b̃ ∈ 𝒦ã ⊆ 𝒦m. Thus 𝒦m is as described in the statement. In particular, 𝒦m is convex.

Since 𝒦 = ∪_{m=0}^{n} 𝒦m, 𝒦 is as described in the statement. ▪

We present another proof of our twelfth variation (Theorem 6) and extend the information contained there slightly, to yield our thirteenth variation. Specifically, we prove the following result.

Theorem 7.

If a1, … , an is an ordered n-tuple of numbers in [0, 1] with sum a positive integer, then there is an idempotent self-adjoint n × n matrix with diagonal entries a1, … , an and all entries real.

Proof:

Our proof proceeds by induction on m, the sum of a1, … , an. In the case where m is 1, we let E1 be the projection matrix (acting on ℂn in the standard manner) that has range spanned by the vector (x =) (√a1, … , √an). Let {ej} be the orthonormal basis for ℂn where ej is the n-tuple with 1 at the jth coordinate and 0 at all others. The matrix for E1 relative to this basis has 〈E1ek, ej〉 as its j, kth entry. Since

〈E1ek, ej〉 = 〈ek, x〉〈x, ej〉 = √ak√aj,

each entry of the matrix is a non-negative real number (positive, when no aj is 0, and perforce none is 1 in this case, unless n = 1). The jth diagonal entry is 〈E1ej, ej〉 (= aj ≥ 0), as desired.

We take the inductive step. Suppose our assertion has been established when a1, … , an has sum m − 1 (where m is an integer 2 or greater). Assume that a1 + ⋯ + an = m. Let k be the smallest integer j for which a1 + ⋯ + aj ≥ m − 1 and a be m − 1 − ∑_{r=1}^{k−1} ar. By inductive hypothesis, there is a self-adjoint idempotent E2 with matrix (ajr) relative to the basis {ej}, such that each ajr is real, with diagonal a1, … , ak−1, a, 0, … , 0. Let F2 be E2 with the k + 1, k + 1 entry replaced by 1. Each ajr with j or r greater than k is 0 (since E2 ≥ 0). Hence F2 is a projection. Let Wk(θ) be the unitary operator whose matrix relative to the basis {ej} has sin θ at the k, k and k + 1, k + 1 entries, −cos θ and cos θ at the k, k + 1 and k + 1, k entries, respectively, 1 at all other diagonal entries, and 0 at all other off-diagonal entries. Let p(k, θ, F2) be Wk(θ)F2Wk(θ)*. Relative to the basis {ej}, the matrix of p(k, θ, F2) has diagonal entries a1, … , ak−1, a sin²θ + cos²θ, a cos²θ + sin²θ, 0, … , 0. The j, r entry is ajr when both j and r do not exceed k − 1 and 0 when either j or r is greater than k + 1. The entries in the kth row of the matrix for p(k, θ, F2) are ak1 sin θ, … , ak k−1 sin θ, a sin²θ + cos²θ, (a − 1) sin θ cos θ, 0, … , 0. The entries in the k + 1st row are ak1 cos θ, … , ak k−1 cos θ, (a − 1) sin θ cos θ, a cos²θ + sin²θ, 0, … , 0. The entries in the kth column are a1k sin θ, … , ak−1 k sin θ, a sin²θ + cos²θ, (a − 1) sin θ cos θ, 0, … , 0 and in the k + 1st column are a1k cos θ, … , ak−1 k cos θ, (a − 1) sin θ cos θ, a cos²θ + sin²θ, 0, … , 0.

By choice of k, m − 1 ≤ ∑_{r=1}^{k−1} ar + ak, whence a = m − 1 − ∑_{r=1}^{k−1} ar ≤ ak ≤ 1. For an appropriate choice θ2 of θ, a sin²θ2 + cos²θ2 = ak, and

a cos²θ2 + sin²θ2 = a + 1 − ak = ∑_{r=k+1}^{n} ar.

Let p(k, θ2, F2) be F3. Each entry in the matrix for F3 is real.

The projection p(k + 1, θ, F3) has as its diagonal entries a1, … , ak, (∑_{r=k+1}^{n} ar) sin²θ, (∑_{r=k+1}^{n} ar) cos²θ, 0, … , 0. Again, for an appropriate choice θ3 of θ, (∑_{r=k+1}^{n} ar) sin²θ3 = ak+1. Thus the projection p(k + 1, θ3, F3) (= F4) has as its diagonal entries a1, … , ak+1, ∑_{r=k+2}^{n} ar, 0, … , 0. We continue with this construction, forming p(k + 2, θ, F4) next and so forth, until we consider p(n − 1, θ, Fn−k+1). Choosing θn−k+1 appropriately, we let Fn−k+2 be the self-adjoint idempotent matrix p(n − 1, θn−k+1, Fn−k+1). The diagonal entries of the matrix for Fn−k+2 are a1, … , an−1, an, and all entries are real. ▪
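The induction in this proof is constructive and can be carried out numerically. The sketch below is our own rendering under stated assumptions (the function names are ours; real Givens-type rotations play the role of Wk(θ)). Because each 2 × 2 block being rotated is diagonal, the diagonal entries mix exactly as in the proof.

```python
import math

def rot_conj(F, p, q, s, c):
    """Conjugate F by the rotation W with s at (p,p) and (q,q), -c at (p,q),
    c at (q,p), and the identity elsewhere (the role of Wk(theta))."""
    n = len(F)
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    W[p][p], W[q][q], W[p][q], W[q][p] = s, s, -c, c
    WF = [[sum(W[i][k] * F[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(WF[i][k] * W[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def projection_with_diagonal(a, eps=1e-12):
    """Real self-adjoint idempotent n x n matrix with diagonal a
    (entries in [0, 1] summing to a positive integer m)."""
    n, m = len(a), round(sum(a))
    if m == 1:
        x = [math.sqrt(max(t, 0.0)) for t in a]   # base case of the induction
        return [[x[i] * x[j] for j in range(n)] for i in range(n)]
    # smallest k with a_1 + ... + a_k >= m - 1, and the leftover mass a'
    k = next(j for j in range(1, n + 1) if sum(a[:j]) >= m - 1 - eps)
    aa = (m - 1) - sum(a[:k - 1])
    F = projection_with_diagonal(a[:k - 1] + [aa] + [0.0] * (n - k))
    F = [row[:] for row in F]
    F[k][k] = 1.0                                  # the proof's F2
    # first splice: sin^2(theta) chosen so the (k, k) entry becomes a_k
    t = 1.0 if abs(1.0 - aa) < eps else (1.0 - a[k - 1]) / (1.0 - aa)
    t = min(1.0, max(0.0, t))
    F = rot_conj(F, k - 1, k, math.sqrt(t), math.sqrt(1.0 - t))
    # remaining splices: peel a_p off the accumulated tail mass at position p
    for p in range(k, n - 1):
        tail = sum(a[p:])
        if tail < eps:
            break
        t = min(1.0, max(0.0, a[p] / tail))
        F = rot_conj(F, p, p + 1, math.sqrt(t), math.sqrt(1.0 - t))
    return F
```

For instance, `projection_with_diagonal([0.5, 0.5, 0.5, 0.5])` gives a rank-2 projection with constant diagonal 1/2; symmetry, idempotence, and the diagonal can all be checked directly.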

Remark 8.

When we constructed p(k, θ, F2) in the preceding argument, if F2 is replaced by A with the same matrix except that the k + 1, k + 1 entry is b rather than 1, then the matrix of p(k, θ, A) has diagonal entries a1, … , ak−1, a sin²θ + b cos²θ, a cos²θ + b sin²θ, 0, … , 0. The entries of the kth and k + 1st rows and columns remain the same except that “(a − 1)” becomes “(a − b),” and “a sin²θ + cos²θ” and “a cos²θ + sin²θ” become “a sin²θ + b cos²θ” and “a cos²θ + b sin²θ.” All other entries remain the same. This general process of transforming a matrix by our unitary matrix so that two segments of the diagonal are altered by replacing their terminal and initial elements by convex combinations of the two in such a way that the sum of the original elements is the same as the sum of the replacements will be referred to as splicing.

We have applied this general construction once, when b is 1 and for the rest with 0 for b. With 1 for b, (a − 1) sin θ2 cos θ2 appears at the k + 1, k and k, k + 1 entries of F3. Because a < 1 and θ2 ∈ (0, π/2), in general these entries are negative, even though E1 has a matrix of non-negative real entries, and F2 may have all its entries real and non-negative. At a lecture on this topic, Frank Hansen raised the possibility of constructing our projection with specified diagonal so that all its entries are real and non-negative. This is accomplished in the case of a one-dimensional projection by the construction given in Proposition 1. It would be interesting to know whether this is possible in general, and whether the construction can be altered to produce such a projection.

Remark 9.

As noted at the end of Section 2, the question of whether there is an m-dimensional subspace of our n-dimensional Hilbert space from which the elements of a given orthonormal basis {e1, … , en} have distances r1, … , rn, respectively, is equivalent, by the Pythagorean Theorem, to the existence of such a subspace on which e1, … , en have orthogonal projections of lengths t1, … , tn, respectively, where rj² + tj² = 1. Because this latter question is answered affirmatively by Theorem 6 if and only if t1² + ⋯ + tn² = m, and ∑_{j=1}^{n} (rj² + tj²) = n, the former question is answered affirmatively if and only if 0 ≤ rj ≤ 1 and r1² + ⋯ + rn² = n − m. This last variation, our fourteenth, is equivalent to the assertion that there is an “m-plane” through the origin tangent to each of the spheres S1, … , Sn with centers at e1, … , en and radii r1, … , rn, respectively, if and only if 0 ≤ rj ≤ 1 and r1² + ⋯ + rn² = n − m.

Remark 10.

Another proof of the formula of Proposition 3 was promised earlier. With the notation established in that proposition and its proof, let F be the projection of ℋ onto ℋ0 and E the projection with range spanned by {e1, … , er}. Then a = ∑_{j=1}^{r} 〈Fej, ej〉 = tr(EFE) and b = ∑_{j=r+1}^{n} 〈(I − F)ej, ej〉 = tr((I − E)(I − F)(I − E)). Thus

a − b = tr(EFE) − tr((I − E)(I − F)(I − E))
= tr(EF) − [tr(I) − tr(E) − tr(F) + tr(EF)]
= m − n + r.

Formulated in matrix terms, this equality takes on the following form: If a is the sum of any r elements of the diagonal of an n × n matrix of a projection of rank m, and b is the sum of the result of subtracting each of the remaining n − r diagonal elements from 1, then a − b = m − n + r. In these same matrix terms, a (= tr((FE)*FE)) is the trace of the principal upper r × r block of the matrix for F (relative to {ej}) and also the sum of the squares of the absolute values of the entries in the matrix for FE, that is, the sum of those squares for the entries in the first r columns of the matrix for F. This sum of squares is the square of the Hilbert–Schmidt norm of FE (and of EF). We write “∥FE∥₂²” for that sum. (More will be said about this in the infinite-dimensional case.) In this notation, our formula is ∥FE∥₂² − ∥(I − F)(I − E)∥₂² = m − n + r. Surprisingly (at first sight), ∥EFE∥₂² − ∥(I − E)(I − F)(I − E)∥₂² is also m − n + r. To prove this, note that ∥EFE∥₂² is the sum of the squares of the absolute values of the matrix entries of the principal upper r × r block of the matrix for F, and ∥(I − E)(I − F)(I − E)∥₂² is the same sum for the principal lower (n − r) × (n − r) block of I − F; their difference is ∥FE∥₂² − ∥(I − E)(I − F)∥₂² (at the same time, ∥(I − E)(I − F)∥₂² is ∥(I − F)(I − E)∥₂²). In addition,

∥FE∥₂² − ∥(I − E)(I − F)∥₂² = m − n + r,

by a straightforward trace computation of the type we used in proving the formula tr(EFE) − tr((I − E)(I − F)(I − E)) = m − n + r.

With the notation established in this remark, if we assume that the intersection of the ranges of E and F and the intersection of the ranges of their complements are both (0), then we may view a − b (= m − n + r) as the index of E(I − F). To see this, note that the null space of E(I − F) is F(ℋ) ∨ ((I − F)(ℋ) ∧ (I − E)(ℋ)), which is F(ℋ), by assumption. (See ref. 3, proposition 2.5.14.) The null space of (I − F)E (= [E(I − F)]*) is (I − E)(ℋ) ∨ (E(ℋ) ∧ F(ℋ)), which is (I − E)(ℋ), by assumption. Thus the index of the operator E(I − F) is m − (n − r) (= m − n + r).

Remark 11.

Another approach to proving our formula “a − b = m − n + r” results from stochastic–matrix methods. We describe the stochastic matrices, introducing some terminology and establishing some basic facts that will be useful to us. Although our present interest is the finite discrete case, these stochastic–matrix considerations will reappear in the infinite case.

For later use, we develop the basics in the infinite case as well as the finite. We deal with matrices having complex entries and the property that each row and each column sums to r. If r is 1 and all entries are non-negative real numbers, the matrices are the well-studied doubly stochastic matrices (the entries representing stationary transition probabilities from one state of a discrete Markov process to another). If an “r-sum” matrix has n rows and m columns (with n and m finite and r non-zero), then n = m, for summing each row and then adding those sums yields nr as the sum of all the entries, while summing each column and then adding those sums yields mr as the sum of all the entries; hence nr = mr and n = m.

We say that the submatrix A0 of a matrix A, whose rows are indexed by a set 𝔸 and whose columns are indexed by a set 𝔹, consisting of those entries in the rows corresponding to a given subset 𝔸0 of 𝔸 and, at the same time, in the columns corresponding to a subset 𝔹0 of 𝔹, is a block (in A, the 𝔸0, 𝔹0 block). The complementary block A′0 to A0 is the 𝔸′0, 𝔹′0 block in A, where 𝔸′0 = 𝔸∖𝔸0 and 𝔹′0 = 𝔹∖𝔹0. The weight w(A0) of the block A0 is the sum of its entries. In the case where A0 and hence A have an infinite number of entries, this sum is taken over the net of finite subsums, directed by inclusion, provided that net converges. If the entries of A are non-negative real numbers, r is positive, and A is infinite, then w(A) is ∞, for each row sums to r and there are an infinite number of rows. Of course, w(A0) is finite when A0 is a finite block. In this case, the sum of the entries in the (finite number of) rows and columns corresponding to A0 is finite, whence w(A′0), the sum of the remaining entries in A, is ∞ (still under the assumption that A is an infinite matrix). Despite these observations, there are infinite blocks A0, with infinite complements A′0, such that w(A0) and w(A′0) are both finite. The article on the infinite discrete case to follow this article will contain a description of a method for generating such blocks.

The differences of the weights of complementary blocks of doubly stochastic matrices are intimately related to the Pythagorean Theorem. To describe that relation, we note first that each pair of orthonormal bases {ej}j∈ℤ0 and {fj}j∈ℤ0 of a Hilbert space ℋ, where ℤ0 = ℤ+ ∪ ℤ−, ℤ+ are the positive integers, and ℤ− are their negatives, gives rise to a doubly stochastic matrix. If ajk = |〈ej, fk〉|², then ∑k∈ℤ0 ajk = ∥ej∥² = 1 for each j in ℤ0, from Parseval's equality, because ej = ∑k∈ℤ0 〈ej, fk〉fk. Symmetrically, ∑j∈ℤ0 ajk = ∥fk∥² = 1 for each k in ℤ0. Thus (ajk) is a doubly stochastic infinite matrix. If U is the unitary operator on ℋ such that Ufj = ej for each j in ℤ0, then 〈Ufj, fk〉 = 〈ej, fk〉 = ukj, the k, j entry of the matrix for U corresponding to the basis {fj}. Thus |ukj|² = ajk.

In the case of finite doubly stochastic matrices, we derive a formula relating the weights of complementary blocks (a “Pythagorean Theorem” for doubly stochastic matrices) that provides us with another proof of our formula, a − b = m − n + r.

Proposition 12.

If A is an n × n doubly stochastic matrix and A0 is a block in A with p rows and q columns, then

w(A0) − w(A′0) = p − n + q.

Proof:

The sum of the entries in the p rows of A corresponding to the p rows of A0 is p, and the sum of the entries in the n − q columns of A corresponding to the columns of A′0 is n − q. The difference of these sums, p − n + q, is w(A0) − w(A′0). ▪

Given an orthonormal basis {e1, … , en} for the n-dimensional Hilbert space ℋ and an m-dimensional subspace ℋ0 with orthogonal complement ℋ′0, choose orthonormal bases {f1, … , fm} and {fm+1, … , fn} for ℋ0 and ℋ′0, respectively, and let ajk be |〈ej, fk〉|². As noted in the infinite-dimensional case, (ajk) is a doubly stochastic matrix A, an n × n matrix, in this case. If A0 is the r × m block whose entries are ajk with j in {1, … , r} and k in {1, … , m}, and F is the projection of ℋ onto ℋ0, then Fej is ∑_{k=1}^{m} 〈ej, fk〉fk and (I − F)ej is ∑_{k=m+1}^{n} 〈ej, fk〉fk. Thus ∥Fej∥² is the sum of the jth row of A0, when 1 ≤ j ≤ r, and ∥(I − F)er+j∥² is the sum of the jth row of A′0, when 1 ≤ j ≤ n − r. Thus these sums are aj and 1 − ar+j, respectively, where ap is the pth diagonal entry of the matrix for F relative to the basis {e1, … , en}. It follows that ∑_{j=1}^{r} aj − ∑_{j=r+1}^{n} (1 − aj) = w(A0) − w(A′0) = m − n + r from Proposition 12. Again, with a the sum of the squares of the lengths of the projections of the r elements e1, … , er of {e1, … , en} onto ℋ0 and b the sum of the squares of the lengths of the projections of the remaining n − r basis elements onto ℋ′0, a − b = m − n + r.
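A small finite instance (ours; the basis is a hand-picked choice) of the doubly stochastic matrix (ajk) = (|〈ej, fk〉|²) and of Proposition 12, in ℝ⁴, where every entry happens to be 1/4:

```python
# An orthonormal basis f1, ..., f4 of R^4; since the e_j are the standard
# basis, <e_j, f_k> is the jth coordinate of f_k.
fs = [
    (0.5, 0.5, 0.5, 0.5),
    (0.5, -0.5, 0.5, -0.5),
    (0.5, 0.5, -0.5, -0.5),
    (0.5, -0.5, -0.5, 0.5),
]
n = 4
A = [[fs[k][j] ** 2 for k in range(n)] for j in range(n)]
assert all(abs(sum(row) - 1) < 1e-12 for row in A)                  # rows sum to 1
assert all(abs(sum(A[j][k] for j in range(n)) - 1) < 1e-12 for k in range(n))

# block A0: rows {1}, columns {1, 2}; complementary block: the rest
p, q = 1, 2
w_block = sum(A[j][k] for j in range(p) for k in range(q))
w_comp = sum(A[j][k] for j in range(p, n) for k in range(q, n))
assert abs((w_block - w_comp) - (p - n + q)) < 1e-12                # Proposition 12
```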

4. Finite Continuous Dimensionality

The Pythagorean and Carpenter's Theorems, in the form of Proposition 2 and its operator–matrix variant (referring to the trace of a rank m projection), deal with projections on an n-dimensional Hilbert space ℋ and the diagonals of their matrices with respect to a given orthonormal basis. Denoting by “ℬ(ℋ)” the algebra of all (bounded) operators on ℋ (also when ℋ is infinite dimensional) and by “𝒜” the algebra of all operators in ℬ(ℋ) with diagonal matrices relative to the given basis, we have that 𝒜 is a maximal abelian self-adjoint subalgebra of ℬ(ℋ) (a “masa,” that is, if TA = AT for each A in 𝒜, then T ∈ 𝒜, and A* ∈ 𝒜 when A ∈ 𝒜). The masas are precisely the subalgebras of ℬ(ℋ) whose matrices are diagonal relative to some fixed orthonormal basis for ℋ. For our purposes, orthonormal bases and masas are interchangeable. The mapping Φ that assigns to T in ℬ(ℋ) the element Φ(T) in the masa 𝒜 corresponding to the diagonal of the matrix for T (relative to the orthonormal basis associated with 𝒜) has special properties. It is linear [from ℬ(ℋ) onto 𝒜], maps positive operators to positive operators, and maps the identity operator I in ℬ(ℋ) to I. With A and B in 𝒜, we have that Φ(ATB) = AΦ(T)B. A mapping such as Φ is said to be a conditional expectation [of ℬ(ℋ) onto 𝒜]. For this Φ, tr(TA) = tr(Φ(T)A) for each A in 𝒜. In particular, tr(T) = tr(Φ(T)), for each T in ℬ(ℋ). Conversely, if tr(T) = tr(Φ′(T)) when T ∈ ℬ(ℋ), for a conditional expectation Φ′ of ℬ(ℋ) onto 𝒜, then, again, tr(TA) = tr(Φ′(TA)) = tr(Φ′(T)A), for each A in 𝒜. Thus tr([Φ(T) − Φ′(T)]A) = 0, for each A in the algebra 𝒜, and tr([Φ(T) − Φ′(T)][Φ(T) − Φ′(T)]*) = 0. It follows that Φ(T) − Φ′(T) = 0 for each T in ℬ(ℋ) and Φ = Φ′. When the conditional expectation Φ has the property that tr(T) = tr(Φ(T)), for each T in ℬ(ℋ), we say that Φ lifts the trace [from 𝒜 to ℬ(ℋ)]. We have just proved that there is a unique conditional expectation of ℬ(ℋ) onto a masa that lifts the trace.

In these terms, with E a projection in ℬ(ℋ) and tr the unique linear functional on ℬ(ℋ) such that tr(I) = n and tr(AB) = tr(BA), for all A and B in ℬ(ℋ), tr(Φ(E)) is the sum of the squares of the lengths of the projections of the basis vectors corresponding to 𝒜 onto the range of E, while tr(E) is the rank m of E. Thus the equality

tr(E) = tr(Φ(E))   (∗)

is the Pythagorean Theorem as expressed in Proposition 2. In these same terms, the Carpenter's Theorem states that if A ∈ 𝒜, 0 ≤ AI, and tr(A) = m, then there is a projection E in ℬ(ℋ) (necessarily of rank m), such that Φ(E) = A.
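The equality tr(E) = tr(Φ(E)) of Proposition 2 can be illustrated numerically (a sketch; the dimensions n = 6, m = 2 are arbitrary illustrative choices): the diagonal entries of a rank-m projection lie in [0, 1] and sum to m.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 2

# A rank-m projection E: QR gives Q with m orthonormal columns, and
# E = Q Q* satisfies E* = E = E^2 with tr(E) = m.
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
E = Q @ Q.T

d = np.diag(E)   # diagonal of E relative to the standard basis
# Each diagonal entry is the squared length of the projection of a
# basis vector onto the range of E, so it lies in [0, 1] ...
assert np.all(d >= -1e-12) and np.all(d <= 1 + 1e-12)
# ... and the entries sum to tr(E) = rank(E) = m.
assert np.isclose(d.sum(), m)
print(d, d.sum())
```

The Carpenter's Theorem runs in the converse direction: any such diagonal, with entries in [0, 1] summing to m, arises from some rank-m projection.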

Trace considerations will play a role when we discuss the Pythagorean Theorem for the case of an infinite-dimensional projection in the next article, although there is no trace functional defined on all of ℬ(ℋ) when ℋ is infinite dimensional. There are, however, subalgebras of ℬ(ℋ), the factors of type II1, that serve as an infinite-dimensional generalization of the finite-dimensional ℬ(ℋ) and that are, in many ways, a more appropriate replacement for it than the infinite-dimensional ℬ(ℋ) itself. For one thing, these factors have a (unique) trace functional defined on them. For another, they are simple algebras, whereas the infinite-dimensional ℬ(ℋ) is not. They can be characterized as the simple algebras consisting of all operators commuting with a self-adjoint operator algebra and admitting a trace.

Examples of factors ℳ of type II1 are provided by (countably) infinite (discrete) groups G, each of whose conjugacy classes, other than that of the unit e of G, is infinite (i.c.c. groups). Let ℋ be l2(G), the Hilbert space of complex-valued functions ϕ on G such that ∑g∈G |ϕ(g)|² < ∞, provided with the inner product 〈ϕ, ψ〉 = ∑g∈G ϕ(g)ψ̄(g), where ψ̄(g) is the complex conjugate of ψ(g). If (Rhϕ)(g) = ϕ(gh), for each ϕ in ℋ and g in G, then Rh is a unitary operator on ℋ (right translation by h). The family {T : TRh = RhT, h ∈ G} [those operators in ℬ(ℋ) commuting with all Rh], denoted by “ℒG,” is a factor of type II1 (the “left von Neumann group algebra” of G).
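The commutation pattern behind ℒG can be exhibited in finite dimensions (only as an illustration: a finite group such as S3 is not i.c.c., so its group algebra is finite dimensional and not a II1 factor). Left translations Lh, (Lhϕ)(g) = ϕ(h⁻¹g), commute with every right translation Rh:

```python
import numpy as np
from itertools import permutations

# The group G = S3, its elements indexed so functions on G become vectors.
G = list(permutations(range(3)))
idx = {g: i for i, g in enumerate(G)}

def mul(g, h):                 # composition of permutations, (g h)(k) = g(h(k))
    return tuple(g[h[k]] for k in range(3))

def inv(h):                    # inverse permutation
    r = [0] * 3
    for k in range(3):
        r[h[k]] = k
    return tuple(r)

def R(h):                      # right translation: (R_h phi)(g) = phi(g h)
    M = np.zeros((len(G), len(G)))
    for g in G:
        M[idx[g], idx[mul(g, h)]] = 1.0
    return M

def L(h):                      # left translation: (L_h phi)(g) = phi(h^{-1} g)
    M = np.zeros((len(G), len(G)))
    for g in G:
        M[idx[g], idx[mul(inv(h), g)]] = 1.0
    return M

I = np.eye(len(G))
for h in G:
    assert np.allclose(R(h) @ R(h).T, I)              # each R_h is unitary
for g in G:
    for h in G:
        assert np.allclose(L(g) @ R(h), R(h) @ L(g))  # L_g commutes with all R_h
print("left translations commute with all right translations")
```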

Let ℳ be a factor of type II1 and τ the unique “tracial state” on ℳ [characterized as a linear functional on ℳ such that τ(I) = 1 and τ(AB) = τ(BA), for each A and B in ℳ, but possessing many other properties]. Each spectral projection E for a self-adjoint A in ℳ is a limit on vectors (“strong-operator” limit) of a sequence pn(A), where pn is a polynomial function on the reals. Thus TE = ET when TA = AT, and E ∈ ℳ. It follows that ℳ is generated by (the “norm closure” of the linear span of) the projections in ℳ. Restricted to these projections, τ is a “dimension function”—τ(E) being the dimension of the range of E “relative to ℳ.” In the case of ℬ(ℋ), where ℋ has finite dimension n, we used “tr” in place of τ, and tr(I) is n as is appropriate, because there are minimal projections in ℬ(ℋ). There are no minimal projections in a factor of type II1 and no “natural” projection to which to assign trace (“dimension”) 1 other than I. The structural properties of factors ℳ of type II1 allow us to conclude that for each real number a in [0, 1], there are projections E in ℳ such that τ(E) = a; that is, the range of the dimension function on ℳ is the entire closed unit interval [0, 1]. Thus the factors of type II1 provide us with a natural extension of the finite-dimensional ℬ(ℋ) to a central simple algebra in which each of the projections has finite “rank” and the dimensions of the projections form a “continuous” range of values.
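The contrast drawn above can be made concrete (a numerical sketch; the dimension n = 8 is an arbitrary choice): on the n × n matrices the normalized trace τ = tr/n is a tracial state, but its range on projections is only the discrete set {0, 1/n, … , 1}, whereas in a factor of type II1 that range is all of [0, 1].

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8

# The normalized trace tau = tr/n: tau(I) = 1 and tau(AB) = tau(BA).
tau = lambda T: np.trace(T) / n

A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
assert np.isclose(tau(A @ B), tau(B @ A))
assert np.isclose(tau(np.eye(n)), 1.0)

# On projections, tau is a dimension function with the discrete range
# {0, 1/n, ..., 1}: a rank-k projection has tau(E) = k/n.
for k in range(n + 1):
    Q, _ = np.linalg.qr(rng.standard_normal((n, max(k, 1))))
    E = Q[:, :k] @ Q[:, :k].T
    assert np.isclose(tau(E), k / n)
print("range of tau on projections:", [k / n for k in range(n + 1)])
```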

An orthonormal basis relative to ℳ is precisely what we arrived at in the case of ℬ(ℋ), with ℋ finite dimensional, that is, a masa 𝒜 in ℳ. In this case, there is a conditional expectation Φ of ℳ onto 𝒜 that lifts the trace, although it is more complicated to construct than passing to the diagonal of a matrix (see p. 403 of ref. 2). The paraphrased version of (∗),

τ(E) = τ(Φ(E))   (∗∗)

is the Pythagorean Theorem for the case of finite continuous dimensionality (that is, in a factor of type II1). The Carpenter's Theorem for this case asserts that each A in 𝒜 such that 0 ≤ AI is Φ(E) for some projection E in ℳ. This will be proved in a later article.

A factor ℳ of type II1 may be thought of and studied as a noncommutative algebra of (bounded) measurable functions on a (noncommutative) measure space, the projections in ℳ serving as the “characteristic” (or “indicator”) functions on the measure space. In the case of a classical measure space, the algebra of bounded measurable functions on the space is (isomorphic to) a masa in some ℬ(ℋ). Our Pythagorean Theorem describes a property (“lifting the trace”) of a mapping (conditional expectation) from the projections in ℳ to a masa 𝒜 in ℳ. So that theorem describes a certain (trace, that is, integral) property of a mapping from a noncommutative, finite, continuous measure algebra (ℳ) to a commutative measure algebra (𝒜). In that sense, it is a semicommutative result in the metric Euclidean geometry of spaces with finite continuous dimensionality.

Let 𝒩 be a von Neumann subalgebra of a factor ℳ of type II1 (that is, 𝒩 is a self-adjoint subalgebra of ℳ consisting of all operators that commute with some other self-adjoint algebra). By techniques akin to those used to prove classical Radon–Nikodým results (suitably modified to apply to the case of noncommutative measure spaces), it was shown (in 1950) that there is a (unique) conditional expectation Φ of ℳ onto 𝒩 that lifts the trace. Thus τ(E) = τ(Φ(E)) for each projection E in ℳ. If 𝒩 is noncommutative, for example, if it is a subfactor of ℳ, then the domain and range of Φ are noncommutative. In that case, the equality, τ(E) = τ(Φ(E)), is a (fully) noncommutative version of the Pythagorean Theorem. Again, the Carpenter's Theorem would describe the range of Φ restricted to the projections in ℳ. There is even a version of Proposition 3 that is valid in a factor ℳ of type II1. If 𝒜 is a masa in ℳ, E is a projection in 𝒜, and F is a projection in ℳ, then

τ(EFE) = τ(EΦ(F)).

The computation of Remark 10 applies to prove this.
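A matrix model of the fully noncommutative situation (an illustration only; the trace-lifting Φ onto a von Neumann subalgebra exists in a II1 factor by the Radon–Nikodým-type argument cited above): take ℳ = M4 = M2 ⊗ M2 and the noncommutative subalgebra 𝒩 = M2 ⊗ I, with Φ realized as a normalized partial trace over the second tensor factor.

```python
import numpy as np

rng = np.random.default_rng(3)

tau = lambda T: np.trace(T) / T.shape[0]   # normalized trace (tracial state)

# Phi: conditional expectation of M_4 = M_2 (x) M_2 onto N = M_2 (x) I,
# given by the normalized partial trace over the second tensor factor.
def Phi(T):
    Tr2 = np.einsum('abcb->ac', T.reshape(2, 2, 2, 2)) / 2
    return np.kron(Tr2, np.eye(2))

T = rng.standard_normal((4, 4))
A = np.kron(rng.standard_normal((2, 2)), np.eye(2))   # elements of N
B = np.kron(rng.standard_normal((2, 2)), np.eye(2))

assert np.allclose(Phi(A @ T @ B), A @ Phi(T) @ B)    # N-bimodule map
assert np.isclose(tau(T), tau(Phi(T)))                # Phi lifts the trace

# For a projection E: tau(E) = tau(Phi(E)), the noncommutative
# Pythagorean equality, now with a noncommutative range algebra.
Q, _ = np.linalg.qr(rng.standard_normal((4, 2)))
E = Q @ Q.T
assert np.isclose(tau(E), tau(Phi(E)))
print("tau(E) =", tau(E), " tau(Phi(E)) =", tau(Phi(E)))
```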

The Pythagorean investigation can be extended to include C*-algebras with faithful tracial states and their C*-subalgebras. Under what conditions are there trace-lifting conditional expectations, and what are their ranges when restricted to the projections in the algebra?

Footnotes

Hansen, F., May 3, 2000, Copenhagen.

References

1. Kadison, R. & Pedersen, G. K. (1985) Math. Scand. 57, 249–266.
2. Kadison, R. & Ringrose, J. (1998) Fundamentals of the Theory of Operator Algebras (Am. Math. Soc., Providence, RI), Vol. 4.
3. Kadison, R. & Ringrose, J. (1997) Fundamentals of the Theory of Operator Algebras (Am. Math. Soc., Providence, RI), Vol. 1.
