Author manuscript; available in PMC: 2026 Apr 15.
Published in final edited form as: Proc Mach Learn Res. 2025 Jul;267:55511–55532.

Poly2Vec: Polymorphic Fourier-Based Encoding of Geospatial Objects for GeoAI Applications

Maria Despoina Siampou 1,*, Jialiang Li 2,*, John Krumm 1, Cyrus Shahabi 1, Hua Lu 3,§
PMCID: PMC13078377  NIHMSID: NIHMS2157897  PMID: 41988512

Abstract

Encoding geospatial objects is fundamental for geospatial artificial intelligence (GeoAI) applications, which leverage machine learning (ML) models to analyze spatial information. Common approaches transform each object into known formats, like image and text, for compatibility with ML models. However, this process often discards crucial spatial information, such as the object’s position relative to the entire space, reducing downstream task effectiveness. Alternative encoding methods that preserve some spatial properties are often devised for specific data objects (e.g., point encoders), making them unsuitable for tasks that involve different data types (i.e., points, polylines, and polygons). To this end, we propose Poly2Vec, a polymorphic Fourier-based encoding approach that unifies the representation of geospatial objects, while preserving the essential spatial properties. Poly2Vec incorporates a learned fusion module that adaptively integrates the magnitude and phase of the Fourier transform for different tasks and geometries. We evaluate Poly2Vec on five diverse tasks, organized into two categories. The first empirically demonstrates that Poly2Vec consistently outperforms object-specific baselines in preserving three key spatial relationships: topology, direction, and distance. The second shows that integrating Poly2Vec into a state-of-the-art GeoAI workflow improves the performance in two popular tasks: population prediction and land use inference.

1. Introduction

The increasing availability of geospatial data from sources such as satellites, ground-based sensors, and crowdsourced platforms like OpenStreetMap (OSM)1 (Lee & Kang, 2015; Jokar Arsanjani et al., 2015; Basiri et al., 2019), combined with recent advancements in machine learning (ML) (Vaswani, 2017; Bommasani et al., 2021), has fueled significant progress in geospatial artificial intelligence (GeoAI) (Smith, 1984; Couclelis, 1986; Janowicz et al., 2020; Gao et al., 2023). GeoAI leverages ML models to analyze geospatial objects, such as points of interest (POIs), building footprints, and vehicle trajectories, thereby extracting insights that support a variety of decision-making applications, including transportation network optimization (Li et al., 2017; Mirowski et al., 2018), urban planning (Zhang et al., 2021; Wu et al., 2022), energy management (Sun et al., 2020), and improved emergency response strategies (Kyrkou et al., 2022).

A fundamental step in GeoAI pipelines is the transformation of geospatial data into latent representations that can be easily processed by ML models, a step formally known as encoding. A common approach to encoding converts coordinate-based geospatial data into formats compatible with established feature extraction models. Although effective for specific tasks, this conversion often discards crucial spatial information, significantly limiting the generalizability of these models. For example, building footprints are frequently rasterized into images and processed with vision-based models for urban prediction tasks (Li et al., 2023; Balsebre et al., 2024). While this approach captures object shapes, it neglects important spatial relationships, such as the relative positioning and alignment of objects within the area. Similarly, representing POIs as text, with attributes such as category fed to language-based models, captures semantic relationships but omits precise spatial locations (Huang et al., 2022). As a result, these approaches are application-specific and struggle to generalize across tasks that require a deeper understanding of spatial relationships.

To address these limitations, spatially explicit encoding techniques have been proposed. These methods preserve crucial spatial properties while remaining compatible with downstream ML models. For instance, Theory (Mai et al., 2020) encodes the absolute positions of POIs using sinusoidal functions with varying frequencies. Xu et al. (2018) directly encode the coordinates of trajectories using multi-layer perceptrons and feed them to a GRU, capturing their sequential nature. For polygons, NUFTspec (Mai et al., 2023b) maps geometries into the spectral domain, effectively preserving key polygon properties such as topology awareness. However, the design of these methods inherently limits their applicability, as each captures only the properties of the specific geospatial object type it was devised for. This restricts their generalizability in tasks involving mixed geospatial object types, such as land use classification, where integrating points (e.g., POIs) and polygons (e.g., buildings) requires simultaneously preserving their spatial properties as well as the relationships between them.

In this work, we introduce Poly2Vec, a polymorphic encoding framework that unifies the representation of 2D geospatial objects, including points, polylines, and polygons. At its core, Poly2Vec leverages the Fourier transform to encode essential spatial properties, transforming the input geometries 2 into the frequency domain. Given that this transformation results in complex-valued features, the magnitude and phase components are extracted. As shown in Figure 1, these components complement each other: the magnitude reflects spatial extent, being uniform for points with no shape and varying for polygons and polylines, while the phase highlights directionality, such as the alignment of a polyline. To combine these components into a single representation, Poly2Vec incorporates a learned fusion module that adaptively balances their contributions based on the task and geometry type, producing a real-valued geometry embedding that ensures compatibility with ML models.

Figure 1:

Visualization of the Fourier transform magnitude and phase of (a) road segment, (b) building, and (c) POI.

We formally define four key properties (shape preservation, direction preservation, distance preservation, and task flexibility) as essential criteria for evaluating the effectiveness of geometry encoding. These properties ensure the produced embeddings accurately capture the essential geometry characteristics while remaining versatile across different tasks. To demonstrate that Poly2Vec preserves these properties, we conduct a two-part evaluation. First, we evaluate Poly2Vec on spatial reasoning tasks, showing that it outperforms the state-of-the-art specialized baselines by up to 17% for topological classification, 26% for directional classification, and 75% for distance estimation. Second, we show that integrating Poly2Vec into a state-of-the-art GeoAI workflow reduces prediction error by 14% in population prediction and 5% in land use inference.

In summary, our contributions are:

• We introduce Poly2Vec, the first encoding framework that unifies the representation of various 2D geometries.

• We propose a 2D continuous Fourier transform-based encoding approach to capture crucial spatial properties, including shape, distance, and direction.

• We design a learned fusion strategy to adaptively combine Fourier magnitude and phase for diverse objects and tasks.

• Our experiments show that Poly2Vec preserves crucial geometry encoding properties, demonstrating its versatility in handling diverse geospatial objects, and task-flexibility when integrated into state-of-the-art GeoAI pipelines.

2. Preliminaries

2.1. Problem Formulation

Definition 1 (Geospatial Object).

A geospatial object $g \subset \mathbb{R}^2$ is represented by an array $P \in \mathbb{R}^{N \times 2}$, where each row is a point $(x, y)$, and $N$ is the total number of points. The type of geometry (e.g., point, polyline, or polygon) is determined by the organization and relationships among these points.

Polymorphic Encoding of Geospatial Objects.

Given a dataset of geospatial objects $G = \{g\} \subset \mathbb{R}^{N \times 2}$, the goal is to define an encoding function $e_\theta(g): \mathbb{R}^{N \times 2} \to \mathbb{R}^d$, parameterized by $\theta$, that maps each geometry $g$ to a $d$-dimensional vector $\mathbf{v}$, termed the geometry embedding. The embedding dimension $d$ remains constant across different geometry types, making $e_\theta$ polymorphic. The encoding is intended to capture the following key properties.

Property 1 (Shape Preservation).

For any geometry $g \in G$, its embedding $\mathbf{v}$ should capture its structural characteristics: shape and boundary for polygons, length for polylines, and the lack of spatial extent for points.

Property 2 (Direction Preservation).

For any geometries $g_i, g_j \in G$, $e_\theta$ should ensure their embeddings $\mathbf{v}_i, \mathbf{v}_j$ reflect their relative orientation.

Property 3 (Distance Preservation).

For any geometries $g_i, g_j \in G$, the similarity of their embeddings $\mathbf{v}_i, \mathbf{v}_j$ should monotonically decrease as their spatial distance $\|g_i - g_j\|$ increases, and vice versa.

Property 4 (Task Flexibility).

The encoder $e_\theta$ should facilitate multiple tasks without requiring modifications.

Properties 1-3 ensure that $\mathbf{v}$ captures all essential spatial information, while Property 4 guarantees flexibility for use across GeoAI models. Section 4 empirically demonstrates that our proposed $e_\theta$ satisfies these properties.

2.2. 2D Continuous Fourier Transform Properties

A key component of our encoding approach is the computation of the 2D continuous Fourier transform (CFT) 3. For a given 2D function $f(x,y)$, its Fourier transform is denoted as $\mathcal{F}\{f(x,y)\} = F(u,v)$ 4 and is formally defined as:

$$F(u,v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\, e^{-j2\pi(ux+vy)}\, dx\, dy \tag{1}$$

where $j = \sqrt{-1}$ and $u, v$ are the frequency samples.

We now summarize Fourier transform properties relevant to our approach following (Gaskill, 1978).

Linearity.

The Fourier transform of a weighted sum of functions $f_i(x,y)$ is the correspondingly weighted sum of their Fourier transforms $F_i(u,v)$:

$$\mathcal{F}\left\{\sum_{i=1}^{n} a_i f_i(x,y)\right\} = \sum_{i=1}^{n} a_i F_i(u,v), \quad a_i \in \mathbb{C} \tag{2}$$

Affine Transformation.

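For an affine-transformed function $f(A\mathbf{x} + \boldsymbol{\tau})$, where $\mathbf{x} = [x, y]^T$, its Fourier transform is:

$$\mathcal{F}\{f(A\mathbf{x} + \boldsymbol{\tau})\} = \frac{1}{|\det(A)|}\, e^{j2\pi \boldsymbol{\tau}^T A^{-T} \mathbf{u}}\, F(A^{-T}\mathbf{u}) \tag{3}$$

where $\mathbf{u} = [u, v]^T$, $A \in \mathbb{R}^{2 \times 2}$ is the affine matrix, and $\boldsymbol{\tau} \in \mathbb{R}^2$ is the translation vector.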
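The affine property follows from a change of variables in Eq. (1). A brief sketch, assuming $A$ is invertible and substituting $\mathbf{y} = A\mathbf{x} + \boldsymbol{\tau}$ (so that $d\mathbf{x} = d\mathbf{y} / |\det(A)|$):

```latex
\begin{aligned}
\mathcal{F}\{f(A\mathbf{x}+\boldsymbol{\tau})\}
&= \int_{\mathbb{R}^2} f(A\mathbf{x}+\boldsymbol{\tau})\, e^{-j2\pi \mathbf{u}^T \mathbf{x}}\, d\mathbf{x} \\
&= \frac{1}{|\det(A)|} \int_{\mathbb{R}^2} f(\mathbf{y})\, e^{-j2\pi \mathbf{u}^T A^{-1}(\mathbf{y}-\boldsymbol{\tau})}\, d\mathbf{y} \\
&= \frac{1}{|\det(A)|}\, e^{j2\pi \boldsymbol{\tau}^T A^{-T}\mathbf{u}} \int_{\mathbb{R}^2} f(\mathbf{y})\, e^{-j2\pi (A^{-T}\mathbf{u})^T \mathbf{y}}\, d\mathbf{y} \\
&= \frac{1}{|\det(A)|}\, e^{j2\pi \boldsymbol{\tau}^T A^{-T}\mathbf{u}}\, F(A^{-T}\mathbf{u})
\end{aligned}
```

The third line uses $\mathbf{u}^T A^{-1}\boldsymbol{\tau} = \boldsymbol{\tau}^T A^{-T}\mathbf{u}$, since a scalar equals its own transpose.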

Hermitian Symmetry.

For real-valued functions $f(x,y)$, $F(u,v)$ satisfies $F(-u,-v) = F^*(u,v)$, where $F^*(u,v)$ denotes the complex conjugate.

Magnitude and Phase.

The Fourier transform $F(u,v)$ is a complex-valued function composed of a real part, $\mathrm{Re}(F(u,v))$, and an imaginary part, $\mathrm{Im}(F(u,v))$. The magnitude $z(u,v)$ and phase $\phi(u,v)$ are defined as:

$$z(u,v) = \sqrt{\mathrm{Re}(F(u,v))^2 + \mathrm{Im}(F(u,v))^2} \tag{4}$$

$$\phi(u,v) = \operatorname{atan2}\big(\mathrm{Im}(F(u,v)),\ \mathrm{Re}(F(u,v))\big) \tag{5}$$
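As a small illustration of Eqs. (4) and (5), the sketch below extracts magnitude and phase from an array of complex Fourier samples with NumPy. The function name `magnitude_phase` and the sample values are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def magnitude_phase(F):
    """Split complex samples into magnitude (Eq. 4) and phase (Eq. 5)."""
    F = np.asarray(F, dtype=complex)
    z = np.sqrt(F.real ** 2 + F.imag ** 2)   # Eq. (4)
    phi = np.arctan2(F.imag, F.real)         # Eq. (5), values in (-pi, pi]
    return z, phi

# Illustrative samples (hypothetical values)
F = np.array([1 + 1j, -2j, -3.0])
z, phi = magnitude_phase(F)

# The pair (z, phi) carries the same information as F
F_back = z * np.exp(1j * phi)
```

The same quantities are available directly as `np.abs(F)` and `np.angle(F)`; the explicit form is shown only to mirror the equations.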

3. Methodology

Figure 2 illustrates our proposed Poly2Vec, which uniformly encodes arbitrary geospatial objects for GeoAI applications. We first describe how the Fourier transform is derived for each geometry type, and then outline the learned fusion module for deriving the final geometry embeddings.

Figure 2:

Overview of Poly2Vec.

3.1. 2D Continuous Fourier Transform of Geometries

3.1.1. Fourier Transform of a Point

A point $p = (x_p, y_p) \in \mathbb{R}^2$ is modeled as a 2D Dirac delta function, which represents the point as a distribution concentrated entirely at $(x_p, y_p)$ and can be expressed as:

$$f_p(x,y) = \delta(x - x_p,\ y - y_p) \tag{6}$$

Accordingly, the Fourier transform of $f_p(x,y)$ is given by:

$$F_p(u,v) = e^{-j2\pi(x_p u + y_p v)} \tag{7}$$

where $(u, v)$ are the frequency components.

The Fourier transform magnitude for any point is constant, $z_p(u,v) = 1$, while the phase $\phi_p(u,v)$ encodes its location.
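The point case can be checked numerically. The following sketch evaluates Eq. (7) at a handful of frequencies; the helper `point_ft` and the chosen frequency values are assumptions made for illustration only.

```python
import numpy as np

def point_ft(p, freqs):
    """Eq. (7): F_p(u, v) = exp(-j 2 pi (x_p u + y_p v)) for each row (u, v) of freqs."""
    p = np.asarray(p, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    return np.exp(-2j * np.pi * (freqs @ p))

# Hypothetical frequency samples, one (u, v) pair per row
freqs = np.array([[0.0, 0.0], [0.25, 0.0], [0.0, 0.5], [0.5, 0.5]])
F = point_ft((0.5, -0.25), freqs)

magnitude = np.abs(F)   # constant 1 for every frequency
phase = np.angle(F)     # depends on the point location
```

Because the magnitude is identically 1, all location information for a point is carried by the phase.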

As shown in Figure 2, deriving the Fourier transform for polylines and polygons involves additional steps. Polylines are divided into line segments, and polygons are triangulated into non-overlapping triangles. The Fourier transform of each component is computed by mapping it to its canonical geometry with an affine transformation (Eq. (3)), and the linearity property of Eq. (2) then yields the Fourier transform of the original geometry 5. Details for polylines and polygons are specified below, with derivation details in Appendix A.2.

3.1.2. Fourier Transform of a Polyline

We begin by deriving the Fourier transform of a canonical line segment and then generalize to arbitrary line segments. Consider the canonical line segment $l_c$, which extends from $a = (-\frac{1}{2}, 0)$ to $b = (\frac{1}{2}, 0)$ in $\mathbb{R}^2$, as shown in Figure 2b. Then, $l_c$ can be represented as:

$$f_{l_c}(x,y) = \mathrm{rect}(x)\,\delta(y) \tag{8}$$

where $\delta(y)$ represents a Dirac delta function ridge along the x-axis, and $\mathrm{rect}(x)$ restricts the ridge to the interval $|x| \le \frac{1}{2}$.

The Fourier transform of $f_{l_c}(x,y)$ is given by:

$$F_{l_c}(u,v) = \mathrm{sinc}(u) \tag{9}$$

Now consider an arbitrary line segment $l$ with endpoints $q = (x_q, y_q)$ and $r = (x_r, y_r)$. To compute the Fourier transform of $l$, we map it to the canonical line segment $l_c$ using the affine transformation property. To construct this mapping, we first add an auxiliary point $c = (\frac{1}{2}, 1)$ to the structure of $l_c$ so that it is not collinear with $ab$. This point maps to another auxiliary point $s$ added to the structure of the arbitrary line segment $l$. The auxiliary point $s$ is defined as $s = r + n$, where $n = (y_q - y_r,\ x_r - x_q)^T$ is a 90° clockwise rotation of the vector from $r$ to $q$, i.e., $q - r$. Note that the line segments $qr$ and $rs$ have the same length.

Given the points $q, r, s$ and $a, b, c$, we then construct the affine transformation matrix $A = [a\ b\ c]\,[q\ r\ s]^{-1}$. By applying Eq. (3), the Fourier transform of an arbitrary line segment $l$, with endpoints $q, r$, is expressed as:

$$F_l(u,v) = \frac{1}{|\det(A)|}\, e^{j2\pi \boldsymbol{\tau}^T A^{-T}\mathbf{u}}\, F_{l_c}(A^{-T}\mathbf{u}) = \|q - r\|^2\, e^{-j2\pi \left(\frac{q+r}{2}\right)^T \mathbf{u}}\, \mathrm{sinc}\big(\mathbf{u}^T (r - q)\big) \tag{10}$$

At $(u,v) = (0,0)$, the Fourier transform is $F_l(0,0) = \|q - r\|^2$, the squared length of the line segment.

Finally, following Eq. (2), the Fourier transform of an arbitrary polyline pl is computed as:

$$F_{pl}(u,v) = \sum_{i=1}^{T_l} F_{l_i}(u,v) \tag{11}$$

where $F_{l_i}(u,v)$ is the Fourier transform of the $i$-th line segment and $T_l$ is the total number of line segments. The coefficients of Eq. (2) are $a_i = 1$, since the line segments are non-overlapping.
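The closed form of Eq. (10) and the sum of Eq. (11) can be sketched in a few lines of NumPy. The function names `segment_ft` and `polyline_ft` are illustrative assumptions; `np.sinc` is the normalized sinc, $\sin(\pi x)/(\pi x)$, which is the convention under which the transform of $\mathrm{rect}(x)$ is $\mathrm{sinc}(u)$.

```python
import numpy as np

def segment_ft(q, r, freqs):
    """Eq. (10) for the segment q-r, evaluated at each row (u, v) of freqs."""
    q, r = np.asarray(q, dtype=float), np.asarray(r, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    scale = np.sum((q - r) ** 2)                           # ||q - r||^2
    shift = np.exp(-2j * np.pi * (freqs @ ((q + r) / 2)))  # phase set by the midpoint
    return scale * shift * np.sinc(freqs @ (r - q))        # np.sinc(x) = sin(pi x)/(pi x)

def polyline_ft(vertices, freqs):
    """Eq. (11): sum the transforms of consecutive segments."""
    vertices = np.asarray(vertices, dtype=float)
    return sum(segment_ft(q, r, freqs) for q, r in zip(vertices[:-1], vertices[1:]))

# Hypothetical frequency samples and geometries
freqs = np.array([[0.0, 0.0], [0.3, -0.2], [1.5, 0.7]])
canonical = segment_ft((-0.5, 0.0), (0.5, 0.0), freqs)   # reduces to sinc(u), Eq. (9)
polyline = polyline_ft([(0, 0), (1, 0), (1, 2)], freqs)  # two segments, lengths 1 and 2
```

For the canonical segment the scale and phase factors are both 1, so the result reduces to Eq. (9); at zero frequency each segment contributes its squared length.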

3.1.3. Fourier Transform of a Polygon

To compute the Fourier transform of a polygon, we decompose it into a set of non-overlapping triangles using standard triangulation techniques 6. We thus begin with the Fourier transform of a canonical isosceles right triangle and then generalize to arbitrary triangles.

Consider the canonical isosceles right triangle $\Delta_c$ with vertices $a = (0,0)$, $b = (1,0)$, and $c = (1,1)$, represented as:

$$f_{\Delta_c}(x,y) = \begin{cases} 1, & \text{if } 0 \le x \le 1 \text{ and } 0 \le y \le x, \\ 0, & \text{otherwise.} \end{cases} \tag{12}$$

The Fourier transform of $f_{\Delta_c}(x,y)$ is then given by 7:

$$F_{\Delta_c}(u,v) = \int_0^1\!\!\int_0^x e^{-j2\pi(ux+vy)}\, dy\, dx = \frac{1}{4\pi^2 uv(u+v)} \Big[ \big((u+v)\cos(2\pi u) - u\cos(2\pi(u+v)) - v\big) - j\big((u+v)\sin(2\pi u) - u\sin(2\pi(u+v))\big) \Big] \tag{13}$$
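The closed form of Eq. (13) can be sanity-checked against direct numerical integration. The sketch below is an assumption-laden check rather than part of the method: the function names are illustrative, the quadrature is a simple midpoint rule after the substitution $y = x t$, and the closed form is only evaluated where $u$, $v$, and $u+v$ are nonzero (the expression has removable singularities there).

```python
import numpy as np

def canonical_triangle_ft(u, v):
    """Closed form of Eq. (13); requires u, v, and u + v to be nonzero."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    w = u + v
    real = w * np.cos(2 * np.pi * u) - u * np.cos(2 * np.pi * w) - v
    imag = -(w * np.sin(2 * np.pi * u) - u * np.sin(2 * np.pi * w))
    return (real + 1j * imag) / (4 * np.pi ** 2 * u * v * w)

def canonical_triangle_ft_numeric(u, v, n=200):
    """Midpoint-rule estimate of the double integral in Eq. (13).

    Substituting y = x * t maps the triangle to the unit square,
    with dy = x dt supplying the extra factor x in the integrand.
    """
    m = (np.arange(n) + 0.5) / n
    x, t = np.meshgrid(m, m, indexing="ij")
    vals = np.exp(-2j * np.pi * (u * x + v * x * t)) * x
    return vals.mean()

closed = canonical_triangle_ft(0.7, -0.3)
numeric = canonical_triangle_ft_numeric(0.7, -0.3)
```

As both frequencies approach zero, the closed form tends to 1/2, the area of the canonical triangle.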

Next, we compute the Fourier transform of an arbitrary triangle $\Delta$, with vertices $q = (x_q, y_q)$, $r = (x_r, y_r)$, and $s = (x_s, y_s)$, by mapping it to the canonical triangle using the affine transformation property (Figure 2b). The affine transformation matrix is defined as $A = [a\ b\ c]\,[q\ r\ s]^{-1}$.

By substituting the vertices of $\Delta$ into $A$ and applying Eq. (3), the Fourier transform of the triangle, $F_\Delta(u,v)$, can be calculated. In this computation, the determinant of $A$ satisfies $|\det(A)| = \frac{1}{2\alpha}$, where $\alpha$ is the area of the triangle $\Delta$.

Finally, the Fourier transform of an arbitrary polygon pg, given the linearity property of Eq. (2), can be computed as:

$$F_{pg}(u,v) = \sum_{i=1}^{T_{pg}} F_{\Delta_i}(u,v) \tag{14}$$

where $F_{\Delta_i}(u,v)$ is the Fourier transform of the $i$-th triangle, and $T_{pg}$ is the total number of extracted triangles. The coefficients of Eq. (2) are $a_i = 1$, since the triangles are non-overlapping.
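The triangle-to-polygon pipeline can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the canonical transform is written in a compact complex form that is algebraically equivalent to Eq. (13), and a simple fan triangulation is used, which is only valid for convex polygons (the text above relies on a standard triangulation routine instead). Writing the inverse map as $\mathbf{x} = q + B\mathbf{y}$ with $B = [r - q \mid s - r]$ gives $A = B^{-1}$, $\boldsymbol{\tau} = -Aq$, and $1/|\det(A)| = |\det(B)| = 2\alpha$.

```python
import numpy as np

def canonical_triangle_ft(u, v):
    """Compact complex form of Eq. (13); requires u, v, u + v nonzero."""
    w = u + v
    num = w * np.exp(-2j * np.pi * u) - u * np.exp(-2j * np.pi * w) - v
    return num / (4 * np.pi ** 2 * u * v * w)

def triangle_ft(q, r, s, freqs):
    """Affine property (Eq. 3) applied to the triangle with vertices q, r, s."""
    q, r, s = (np.asarray(p, dtype=float) for p in (q, r, s))
    B = np.column_stack([r - q, s - r])       # canonical (y1, y2) -> q + B @ y
    mapped = freqs @ B                        # each row is (A^{-T} u)^T = (B^T u)^T
    shift = np.exp(-2j * np.pi * (freqs @ q)) # exp(j 2 pi tau^T A^{-T} u) with tau = -A q
    return abs(np.linalg.det(B)) * shift * canonical_triangle_ft(mapped[:, 0], mapped[:, 1])

def convex_polygon_ft(vertices, freqs):
    """Eq. (14) with a fan triangulation (assumes a convex polygon)."""
    vertices = np.asarray(vertices, dtype=float)
    return sum(triangle_ft(vertices[0], vertices[i], vertices[i + 1], freqs)
               for i in range(1, len(vertices) - 1))

# Hypothetical frequencies, chosen away from the removable singularities
freqs = np.array([[0.3, 0.2], [-0.7, 0.4], [1.1, 0.6]])
square = convex_polygon_ft([(0, 0), (1, 0), (1, 1), (0, 1)], freqs)

# Known transform of the unit square, used as a reference
u, v = freqs[:, 0], freqs[:, 1]
reference = np.exp(-1j * np.pi * (u + v)) * np.sinc(u) * np.sinc(v)
```

The unit square is a convenient check because its transform factorizes into two one-dimensional transforms of a shifted rect function.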

Building on the Fourier transform computation described above, we can now extract the frequency representation of a given geometry $g$, expressed as a spatial function $f_g(x,y)$ over $\mathbb{R}^2$, as $\mathbf{F}_g = [F_1, F_2, \ldots, F_W]^T \in \mathbb{C}^W$, where $W$ is the number of frequency components, and $F_i = F(u_i, v_i)$ is the value of the Fourier transform evaluated at the frequency coordinates $(u_i, v_i)$.

To sample the frequency components, we employ a geometric series sampling strategy (Mai et al., 2020; 2023b), which balances low and high-frequency components to capture both global and local details. We also experimented with learned frequencies in Appendix A.3.2 but found that the two strategies produced nearly identical results.
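One way a geometric-series frequency grid could be built is sketched below. The exact sampling scheme is not specified in this section, so everything here is an assumption: the function name, the bounds `f_min` and `f_max`, the number of frequencies per axis, and the choice to mirror the series to negative frequencies and take a full 2D grid.

```python
import numpy as np

def geometric_frequencies(f_min=0.1, f_max=10.0, n_per_axis=4):
    """Hypothetical geometric-series sampler returning W = (2 * n_per_axis)^2 pairs (u, v)."""
    pos = np.geomspace(f_min, f_max, n_per_axis)   # constant ratio between consecutive values
    axis = np.concatenate([-pos[::-1], pos])       # mirror to negative frequencies
    u, v = np.meshgrid(axis, axis, indexing="ij")
    return np.stack([u.ravel(), v.ravel()], axis=1)

freqs = geometric_frequencies()   # shape (64, 2)
```

A grid like this avoids $u = 0$ and $v = 0$ but does contain pairs with $u + v = 0$, so closed-form expressions with removable singularities, such as Eq. (13), would need their limiting values at those samples. For real-valued geometries, Hermitian symmetry also makes half of a symmetric grid redundant.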

3.2. Learned Fusion of Fourier Transform Features

Given that $\mathbf{F}_g$ consists of complex values, we decompose it into two real-valued vectors, the magnitude $\mathbf{z}$ and the phase $\boldsymbol{\phi}$, computed as in Eqs. (4) and (5), respectively. This transformation ensures the representation is compatible with downstream ML models, which typically operate on real-valued inputs. Furthermore, the magnitude $\mathbf{z}$ captures the intensity of contributions at different frequencies, reflecting the geometry’s size and overall shape, while the phase $\boldsymbol{\phi}$ encodes positional and orientational information of the geometry’s features (Zahn & Roskies, 1972).

While the final geometry embedding can be created by simply concatenating $\mathbf{z}$ and $\boldsymbol{\phi}$, their relative importance should vary with the geometry type and the downstream task. For instance, the magnitude of points is always 1, whereas it encodes the shape and size of polygons. Therefore, when encoding points, the phase should contribute more than the magnitude to the representation. To this end, Poly2Vec adaptively learns the importance of magnitude and phase through two separate transformations $\mathbf{z}' = h_z(\mathbf{z})$ and $\boldsymbol{\phi}' = h_\phi(\boldsymbol{\phi})$, where $h_z: \mathbb{R}^W \to \mathbb{R}^W$ and $h_\phi: \mathbb{R}^W \to \mathbb{R}^W$ are separate MLPs for $\mathbf{z}$ and $\boldsymbol{\phi}$, respectively.

Finally, the transformed vectors $\mathbf{z}' \in \mathbb{R}^W$ and $\boldsymbol{\phi}' \in \mathbb{R}^W$ are concatenated and passed through a final MLP $h$ that maps the concatenation to $\mathbb{R}^d$, producing the final geometry embedding $\mathbf{v} = h([\mathbf{z}'; \boldsymbol{\phi}']) \in \mathbb{R}^d$. This embedding can be fed to any machine learning model $M$, such that $M(\mathbf{v}) \to y$, where $y$ represents task-specific outputs. We empirically verify that $\mathbf{v}$ preserves the key properties in Section 4.
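The data flow of the fusion module can be sketched as a forward pass in plain NumPy. This is only a structural illustration under loud assumptions: the weights are random and untrained, and the hidden width, ReLU activation, two-layer depth, and the values of `W` and `d` are all hypothetical. In practice the three MLPs would be trained end to end with an automatic differentiation framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random (untrained) weights for an MLP with the given layer sizes."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Forward pass with ReLU on hidden layers (activation is an assumption)."""
    for i, (weight, bias) in enumerate(params):
        x = x @ weight + bias
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

W, hidden, d = 8, 16, 4                 # hypothetical sizes
h_z = init_mlp([W, hidden, W])          # transforms the magnitude
h_phi = init_mlp([W, hidden, W])        # transforms the phase
h = init_mlp([2 * W, hidden, d])        # fuses the concatenation into R^d

def fuse(F_g):
    """Map complex Fourier features F_g (length W) to a real embedding of length d."""
    z, phi = np.abs(F_g), np.angle(F_g)
    return mlp(h, np.concatenate([mlp(h_z, z), mlp(h_phi, phi)], axis=-1))

F_g = np.exp(-2j * np.pi * rng.uniform(-1.0, 1.0, size=W))  # spectrum of a point: unit magnitude
v = fuse(F_g)
```

Keeping separate branches for magnitude and phase lets training scale each one independently, which is the mechanism by which the phase can dominate for points, whose magnitude is constant.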

4. Experiments

In this section, we conduct experiments to evaluate the effectiveness of Poly2Vec across four key research questions:

[RQ1] Does Poly2Vec effectively preserve the critical geometric properties of shape, direction, and distance?

[RQ2] How does Poly2Vec perform in comparison to baseline encoding methods tailored for specific object types?

[RQ3] Can integrating Poly2Vec into existing workflows lead to improvements in their performance?

[RQ4] Does learned fusion boost Poly2Vec performance?

4.1. Spatial Reasoning Tasks

This section addresses RQ1 and RQ2, empirically evaluating Poly2Vec’s ability to preserve the properties of Section 2.1, against specialized baselines. We categorize these evaluations as spatial reasoning tasks, which are fundamental to broader applications like geospatial question answering (GeoQA), relying on precise spatial understanding (Punjani et al., 2018; Papamichalopoulos et al., 2024).

Datasets.

We evaluate on two OSM datasets, from Singapore and New York, each containing POIs (points), main roads (polylines), and buildings (polygons).

Baselines.

We include three categories of baselines. Point encoders: (i) Direct, directly utilizing coordinates (Chu et al., 2019), (ii) Tile, a discretization method (Berg et al., 2014), (iii) Wrap, a coordinate wrapping mechanism (Mac Aodha et al., 2019), (iv) Grid, inspired by position encoding (Mai et al., 2020), and (v) Theory, a multi-scale encoder (Mai et al., 2020). All point encoders are extended to other geometries by handling them as sequences of points, following Rao et al. (2020) and Xu et al. (2018). Polyline encoder: (i) T2Vec, a classic trajectory encoder (Li et al., 2018). Polygon encoders: (i) ResNet1D (Mai et al., 2023b) and (ii) NUFTspec (Mai et al., 2023b).

Input geometry coordinates are normalized to [−1, 1] × [−1, 1]. More experimental details are in Appendix A.4.

4.1.1. Topological Relationship Classification

This task classifies topological relationships defined by the DE-9IM model (Clementini et al., 1993) for geospatial object pairs. Supported relationships are in Table 3.

Table 3:

Topological relationships of geometry pairs.

| Geometry Pair | Topological Relationships (a relationship b) |
|---|---|
| point-polyline | disjoint, intersects |
| point-polygon | disjoint, contains |
| polyline-polyline | disjoint, intersects |
| polyline-polygon | disjoint, touches, intersects, within |
| polygon-polygon | disjoint, touches, intersects, contains, within, equals |

Settings.

The geometry embeddings of each pair are concatenated and passed through a 2-layer MLP with $N_C$ output units (the number of relationships). We adopt cross-entropy loss for optimization. Performance is measured by accuracy, precision, recall, and F1-score. Accuracy results are presented in Table 1, with the rest in Appendix A.4.6.

Table 1:

Model accuracy on topological relationship classification. Best and second best are highlighted.

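Singapore

| Methods | point-polyline | point-polygon | polyline-polyline | polyline-polygon | polygon-polygon |
|---|---|---|---|---|---|
| ResNet1D | - | - | - | - | 0.457 ± 0.017 |
| NUFTspec | - | - | - | - | 0.602 ± 0.009 |
| t2vec | - | - | 0.728 ± 0.023 | - | - |
| Direct | 0.823 ± 0.013 | 0.843 ± 0.005 | 0.733 ± 0.007 | 0.368 ± 0.010 | 0.357 ± 0.018 |
| Tile | 0.790 ± 0.021 | 0.700 ± 0.010 | 0.505 ± 0.005 | 0.459 ± 0.013 | 0.411 ± 0.009 |
| Wrap | 0.886 ± 0.003 | 0.880 ± 0.008 | 0.716 ± 0.011 | 0.476 ± 0.010 | 0.349 ± 0.004 |
| Grid | 0.846 ± 0.004 | 0.844 ± 0.004 | 0.697 ± 0.031 | 0.458 ± 0.004 | 0.335 ± 0.012 |
| Theory | 0.892 ± 0.003 | 0.900 ± 0.005 | 0.719 ± 0.008 | 0.450 ± 0.010 | 0.461 ± 0.041 |
| Poly2Vec | 0.955 ± 0.007 | 0.949 ± 0.002 | 0.812 ± 0.010 | 0.509 ± 0.008 | 0.702 ± 0.006 |

New York

| Methods | point-polyline | point-polygon | polyline-polyline | polyline-polygon | polygon-polygon |
|---|---|---|---|---|---|
| ResNet1D | - | - | - | - | 0.452 ± 0.033 |
| NUFTspec | - | - | - | - | 0.585 ± 0.008 |
| t2vec | - | - | 0.807 ± 0.121 | - | - |
| Direct | 0.846 ± 0.011 | 0.909 ± 0.018 | 0.745 ± 0.008 | 0.495 ± 0.009 | 0.446 ± 0.023 |
| Tile | 0.659 ± 0.013 | 0.783 ± 0.007 | 0.502 ± 0.009 | 0.494 ± 0.038 | 0.405 ± 0.005 |
| Wrap | 0.886 ± 0.006 | 0.880 ± 0.017 | 0.733 ± 0.009 | 0.550 ± 0.011 | 0.381 ± 0.007 |
| Grid | 0.822 ± 0.039 | 0.891 ± 0.004 | 0.739 ± 0.009 | 0.516 ± 0.008 | 0.381 ± 0.031 |
| Theory | 0.897 ± 0.008 | 0.909 ± 0.008 | 0.734 ± 0.008 | 0.591 ± 0.006 | 0.455 ± 0.041 |
| Poly2Vec | 0.953 ± 0.003 | 0.980 ± 0.002 | 0.830 ± 0.004 | 0.641 ± 0.062 | 0.684 ± 0.008 |

Results.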

From Table 1, we observe that Poly2Vec consistently outperforms all baselines across all experiments. Unlike specialized encoders that excel only for specific pairs, Poly2Vec’s performance is consistent across all geometries, highlighting its versatility and generalization capabilities. The second-best performing models vary by geometry type, with T2Vec for polylines and NUFTspec for polygons. This shows that simply extending point encoders to handle all geospatial objects is not adequate, as it fails to preserve characteristics like the object’s shape and position, leading to decreased performance. Finally, all models perform better when relationship detection is a binary classification task (e.g., point-polyline in Table 3) than when it is a multi-class classification task (e.g., polygon-polygon in Table 3). This is expected, as the latter requires capturing fine-grained spatial nuances, posing greater difficulty. In summary, these results emphasize the importance of preserving shape (Property 1) and distance (Property 3) in geometry embeddings. Poly2Vec’s ability to do so, along with its unified framework, enables it to consistently outperform baselines.

4.1.2. Directional relationship classification

This task classifies the directional relationships defined by the 16-compass direction model of two geospatial objects.

Settings.

We follow the same setting as in Section 4.1.1, with $N_C = 16$, and report the same metrics. Accuracy results are in Table 2, with the rest in Appendix A.4.6.
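To make the 16-class label space concrete, the sketch below bins the bearing between two reference points into 16 sectors of 22.5°. How reference points are chosen for polylines and polygons is not stated in this section, so using a single representative point per geometry (for example a centroid) is an assumption, as are the function name and the label strings.

```python
import numpy as np

COMPASS_16 = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]

def compass_label(a, b):
    """Direction of point b as seen from point a, as one of 16 compass sectors."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    bearing = np.degrees(np.arctan2(dx, dy)) % 360.0   # 0 = north, increasing clockwise
    sector = int((bearing + 11.25) // 22.5) % 16       # sectors are centered on the labels
    return COMPASS_16[sector]

label = compass_label((0.0, 0.0), (1.0, 0.0))   # b lies due east of a
```

The half-sector offset of 11.25° centers each bin on its compass direction, so that, for instance, bearings from 348.75° to 11.25° all map to north.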

Table 2:

Model accuracy on directional relationship classification. Best and second best are highlighted.

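Singapore

| Methods | point-point | point-polyline | point-polygon | polyline-polyline | polyline-polygon | polygon-polygon |
|---|---|---|---|---|---|---|
| ResNet1D | - | - | - | - | - | 0.819 ± 0.010 |
| NUFTspec | - | - | - | - | - | 0.807 ± 0.008 |
| t2vec | - | - | - | 0.268 ± 0.075 | - | - |
| Direct | 0.880 ± 0.006 | 0.841 ± 0.007 | 0.844 ± 0.006 | 0.820 ± 0.002 | 0.830 ± 0.005 | 0.752 ± 0.017 |
| Tile | 0.253 ± 0.001 | 0.268 ± 0.002 | 0.273 ± 0.008 | 0.326 ± 0.010 | 0.454 ± 0.001 | 0.394 ± 0.003 |
| Wrap | 0.861 ± 0.018 | 0.804 ± 0.009 | 0.803 ± 0.004 | 0.781 ± 0.002 | 0.831 ± 0.002 | 0.778 ± 0.001 |
| Grid | 0.882 ± 0.007 | 0.728 ± 0.007 | 0.771 ± 0.003 | 0.699 ± 0.001 | 0.641 ± 0.016 | 0.534 ± 0.138 |
| Theory | 0.912 ± 0.014 | 0.867 ± 0.009 | 0.858 ± 0.004 | 0.834 ± 0.012 | 0.860 ± 0.006 | 0.735 ± 0.044 |
| Poly2Vec | 0.932 ± 0.006 | 0.935 ± 0.032 | 0.925 ± 0.002 | 0.906 ± 0.010 | 0.907 ± 0.007 | 0.833 ± 0.006 |

New York

| Methods | point-point | point-polyline | point-polygon | polyline-polyline | polyline-polygon | polygon-polygon |
|---|---|---|---|---|---|---|
| ResNet1D | - | - | - | - | - | 0.747 ± 0.010 |
| NUFTspec | - | - | - | - | - | 0.698 ± 0.017 |
| t2vec | - | - | - | 0.249 ± 0.032 | - | - |
| Direct | 0.877 ± 0.004 | 0.766 ± 0.005 | 0.836 ± 0.008 | 0.653 ± 0.007 | 0.784 ± 0.004 | 0.694 ± 0.004 |
| Tile | 0.245 ± 0.009 | 0.258 ± 0.005 | 0.316 ± 0.005 | 0.217 ± 0.001 | 0.466 ± 0.001 | 0.349 ± 0.012 |
| Wrap | 0.809 ± 0.004 | 0.669 ± 0.001 | 0.749 ± 0.018 | 0.596 ± 0.019 | 0.772 ± 0.002 | 0.602 ± 0.006 |
| Grid | 0.868 ± 0.002 | 0.590 ± 0.003 | 0.646 ± 0.049 | 0.438 ± 0.004 | 0.752 ± 0.001 | 0.485 ± 0.079 |
| Theory | 0.892 ± 0.017 | 0.760 ± 0.007 | 0.826 ± 0.008 | 0.684 ± 0.009 | 0.775 ± 0.005 | 0.555 ± 0.012 |
| Poly2Vec | 0.909 ± 0.012 | 0.891 ± 0.004 | 0.883 ± 0.013 | 0.863 ± 0.007 | 0.876 ± 0.009 | 0.785 ± 0.003 |

Results.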

From Table 2, we observe that Poly2Vec consistently outperforms all baselines across all experiments. This demonstrates its ability to effectively preserve direction (Property 2) among diverse geometry types. In this task too, polygon encoders outperform the extended point encoders, whereas T2Vec underperforms. This is due to T2Vec’s strategy of assigning coordinates to grid cells during encoding, which is effective for trajectory-related tasks but introduces discretization artifacts that affect angular relationships. A similar limitation is observed in the performance of Tile, which also relies on discretizing points into grid cells. In contrast, Poly2Vec encodes geometries holistically, preserving their relative orientation and avoiding these pitfalls.

4.1.3. Distance Estimation

This task evaluates whether geometry embeddings preserve pairwise distances (Property 3).

Settings.

The original distance between two geometries is estimated by the Euclidean distance between their embeddings. The mean squared error (MSE) is adopted as the loss function. We compare the predicted and original distances in Figure 3 and report the mean absolute error (MAE) in Appendix A.4.6.
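The objective described above can be written compactly. The sketch below is an illustration with made-up embeddings and distances, and the function name `distance_mse` is an assumption; it only shows how the predicted distance and the MSE loss relate, not how the encoder is trained.

```python
import numpy as np

def distance_mse(v_i, v_j, true_dist):
    """Predict pairwise distance as the Euclidean distance of embeddings; return (MSE, predictions)."""
    pred = np.linalg.norm(v_i - v_j, axis=-1)
    return np.mean((pred - true_dist) ** 2), pred

# Hypothetical embeddings for two geometry pairs and their true spatial distances
v_i = np.array([[0.0, 0.0], [1.0, 1.0]])
v_j = np.array([[3.0, 4.0], [1.0, 1.0]])
true_dist = np.array([5.0, 1.0])

loss, pred = distance_mse(v_i, v_j, true_dist)
mae = np.mean(np.abs(pred - true_dist))
```

Minimizing this loss pushes the embedding space toward an isometry of the original space, which is what Property 3 asks for.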

Figure 3:

Distance scatter plots of point-polygon pairs on Singapore dataset for different encoders.

Results.

Figure 3 shows that the distances predicted by Poly2Vec are closely aligned with the original distances, whereas the distances predicted by the point encoders are more scattered. This highlights Poly2Vec’s superior ability to preserve spatial distance relationships across various geometry types. Methods like Direct are overly simplistic, while approaches such as Tile, Grid, and Wrap introduce location distortions through discretization or periodic transformations, which degrade distance preservation. By leveraging the Fourier transform, Poly2Vec effectively captures both the positions and relative spatial relationships of the geometry pairs, enabling it to implicitly encode distance as a core property of its embeddings. A table reporting the MAE across baselines, together with additional figures, is provided in Appendix A.4.6.

4.2. Integration In an End-to-End GeoAI Pipeline

This section addresses RQ3, demonstrating the benefits of integrating Poly2Vec into an existing GeoAI workflow.

Dataset.

We utilize the same datasets as in Section 4.1. The regions for both cities are extracted using the administrative boundaries of Singapore Subzones and NYC Census Tracts.

Baseline.

We adopt RegionDCL (Li et al., 2023), an unsupervised urban region representation learning framework that uses buildings and POIs from OSM for land use inference (predicting urban functional distributions) and population prediction (estimating region population). RegionDCL encodes buildings by transforming their footprints into images and extracting features using ResNet18 while using categorical features for POIs. To address the loss of location information, RegionDCL employs a distance-biased transformer, which introduces a bias in the self-attention mechanism to prioritize closer objects.

Settings.

We evaluate three variants: (1) RegionDCL, the original framework, (2) RegionDCL w/o distance-bias, which removes the distance-biased term, and (3) RegionDCL w/ Poly2Vec, which removes the distance-biased term and replaces the encodings with Poly2Vec. The training and evaluation strategies remain unchanged across the variants, following the original work. For land use inference, we report L1-distance, KL-divergence, and cosine similarity metrics. For population prediction, we report MAE, root mean squared error (RMSE), and coefficient of determination ($R^2$).
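For reference, the six reported metrics can be sketched with NumPy as follows. The exact evaluation code of the original pipeline is not shown in this section, so the averaging over regions, the direction of the KL divergence (true distribution relative to predicted), and the small `eps` smoothing term are assumptions.

```python
import numpy as np

def land_use_metrics(p_true, p_pred, eps=1e-12):
    """L1 distance, KL divergence, and cosine similarity, averaged over regions (rows)."""
    l1 = np.abs(p_true - p_pred).sum(axis=-1).mean()
    kl = (p_true * np.log((p_true + eps) / (p_pred + eps))).sum(axis=-1).mean()
    cos = ((p_true * p_pred).sum(axis=-1)
           / (np.linalg.norm(p_true, axis=-1) * np.linalg.norm(p_pred, axis=-1))).mean()
    return l1, kl, cos

def population_metrics(y_true, y_pred):
    """MAE, RMSE, and coefficient of determination R^2."""
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, rmse, r2

# Hypothetical values for a single region and three population targets
l1, kl, cos = land_use_metrics(np.array([[0.5, 0.5]]), np.array([[0.25, 0.75]]))
mae, rmse, r2 = population_metrics(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0]))
```

Lower is better for L1, KL, MAE, and RMSE, while higher is better for cosine similarity and $R^2$; note that $R^2$ can be negative when predictions are worse than the mean.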

Results.

The results for both tasks are presented in Table 4. Removing the distance-bias term from RegionDCL leads to a noticeable drop in performance, emphasizing the importance of encoding the spatial location and alignment of objects for accurate land use and population predictions. When Poly2Vec is added, performance improves beyond that of the original RegionDCL on every metric. This shows that Poly2Vec can adequately capture the shape and orientation of objects, similar to the initial image-based features, while also benefiting from the inclusion of the objects’ locations. Overall, Poly2Vec encodes spatial information directly into its embeddings, removing the need for additional mechanisms like the distance-bias term. This improves performance while simplifying the pipeline, showcasing the task flexibility of Poly2Vec and its potential for effective integration into GeoAI workflows.

Table 4:

Comparison of methods for Land Use Classification and Population Prediction. Best values are highlighted.

Land Use Classification

| Methods | Singapore L1 ↓ | Singapore KL ↓ | Singapore Cosine ↑ | New York L1 ↓ | New York KL ↓ | New York Cosine ↑ |
|---|---|---|---|---|---|---|
| RegionDCL | 0.498 ± 0.038 | 0.294 ± 0.047 | 0.879 ± 0.021 | 0.418 ± 0.012 | 0.229 ± 0.013 | 0.912 ± 0.006 |
| RegionDCL w/o distance-bias | 0.558 ± 0.043 | 0.369 ± 0.067 | 0.844 ± 0.023 | 0.439 ± 0.012 | 0.244 ± 0.012 | 0.904 ± 0.005 |
| RegionDCL w/ Poly2Vec | 0.484 ± 0.021 | 0.278 ± 0.025 | 0.881 ± 0.012 | 0.397 ± 0.010 | 0.212 ± 0.011 | 0.923 ± 0.007 |

Population Prediction

| Methods | Singapore MAE ↓ | Singapore RMSE ↓ | Singapore R² ↑ | New York MAE ↓ | New York RMSE ↓ | New York R² ↑ |
|---|---|---|---|---|---|---|
| RegionDCL | 5807.54 ± 522.74 | 7942.74 ± 779.44 | 0.427 ± 0.108 | 5020.20 ± 216.63 | 6960.51 ± 282.35 | 0.575 ± 0.039 |
| RegionDCL w/o distance-bias | 6018.94 ± 641.71 | 8214.58 ± 931.11 | 0.385 ± 0.087 | 5293.04 ± 277.31 | 7348.86 ± 374.62 | 0.532 ± 0.030 |
| RegionDCL w/ Poly2Vec | 4957.58 ± 506.02 | 6874.47 ± 851.73 | 0.561 ± 0.117 | 4602.75 ± 179.66 | 6393.38 ± 279.70 | 0.621 ± 0.037 |

4.3. Ablation Study

This section addresses RQ4, highlighting the benefits of the proposed learned fusion module.

Settings.

We include three variants: (1) w/mag uses only the Fourier transform magnitude, (2) w/phase uses only the phase, and (3) w/concat combines both via concatenation.

Results.

As shown in Figure 4, among the variants, w/ mag performs the worst across all tasks, particularly in directional relationship classification, as the Fourier transform magnitude primarily captures shape, which is insufficient on its own to address these tasks. In contrast, w/ phase, which encodes location information, performs better, since relative location is more crucial here. Combining both through w/ concat yields further improvements, highlighting the importance of integrating both shape and location information. Finally, Poly2Vec outperforms all variants by employing a learned fusion strategy that adaptively balances the contribution of magnitude and phase based on the task and geometry type. This strategy is particularly beneficial in tasks such as point-related distance estimation, where points lack spatial extent, and thus the magnitude should contribute significantly less than the phase, which contains the location information.

Figure 4:

Ablation study for the point-polygon dataset.

5. Related Work

Existing geometry encoding approaches often focus on one shape type, with point encoders receiving the most attention. Direct point encoding methods simply feed raw coordinates into neural networks but fail to capture details of location distributions (Xu et al., 2018; Chu et al., 2019). Discretization methods assign points to predefined grid cells, as seen in approaches leveraging location context for image classification (Tang et al., 2015; Berg et al., 2014), but struggle with fixed resolution and imprecise representations. Sinusoidal methods encode normalized coordinates using sinusoidal functions, such as Wrap, which captures cyclic patterns (Mac Aodha et al., 2019). Extensions such as the multi-scale encoder (Zhong et al., 2019) introduce multiple sinusoidal scales. Theory improves on this by computing the dot product of coordinates with unit vectors separated by 120° (Mai et al., 2020). There are also point encoders that jointly model location and neighborhood features (Qi et al., 2017; Yin et al., 2019; Zhou & Tuzel, 2018).

Unlike points, there are no dedicated approaches for encoding polylines in their generic form. The closest relevant work lies in trajectory encoding, where trajectories are often represented as ordered sequences of points. Most such approaches rely on discretization. For instance, Li et al. (2018) use grid-based encoding, training an RNN on degraded data to infer missing information and embedding grid cells to capture relative spatial positions. Other approaches directly use coordinates, leveraging sequential models (e.g., RNNs) to process the encodings (Feng et al., 2018; Xue et al., 2021; Rao et al., 2020; Xu et al., 2018), but they require strict sequential ordering and may overlook geometric relationships.

Polygon encoding has gained significant attention. Veer et al. (2018) employ elliptic Fourier descriptors to approximate polygon outlines and utilize bidirectional LSTMs and 1D CNNs to encode vertex sequences. Mai et al. (2023b) use a 1D ResNet architecture with circular padding for loop origin invariance. Other approaches use the non-uniform Fourier transform (NUFT) to map polygons to the spectral domain and then convert them back into images via the inverse discrete Fourier transform (IDFT), though this suffers from the limitations of grid-based approaches (Jiang et al., 2019a;b). Mai et al. (2023b) refine this approach by omitting the IDFT. PolygonGNN (Yu et al., 2024) encodes multipolygons, modeling their shape details and inter-polygonal relationships through heterogeneous visibility graphs.

While effective, existing approaches are each devised for a specific type of geospatial object. Encoding heterogeneous coordinate-based data remains a challenge, as current methods either use separate encoders for different object types, thereby adding complexity, or convert geometries into known formats (e.g., image, text), leading to a loss of spatial precision. This limitation is particularly critical for GeoAI models that aim to incorporate coordinate-based geospatial data as an additional modality (Zhang et al., 2024; Mai et al., 2023a). Poly2Vec addresses this gap by uniformly encoding points, polylines, and polygons within the same framework, offering a level of versatility not demonstrated by prior methods.

6. Conclusion and Future Work

We proposed Poly2Vec, a unified encoding framework for geospatial objects that preserves essential spatial properties, including topology, directionality, and distance. By outperforming object-specific baselines and improving downstream tasks like population prediction and land use inference, Poly2Vec demonstrates its versatility and effectiveness in GeoAI pipelines. Future work will explore extending Poly2Vec to higher-dimensional geometries, including 3D shapes, and its integration into Geo-Foundation models as a unified representation for coordinate data modalities.

Impact Statement.

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here. Our improved representation of 2D geometry for deep models could lead to more accurate, versatile GeoAI applications, and in turn to a better understanding of the Earth and improvements in the environment, transportation efficiency, and access equity.

A. Appendix

A.1. Geospatial Objects Definitions

Definition 2 (Point).

A point is a zero-dimensional geometric entity in $\mathbb{R}^2$, defined by a single coordinate $(x, y)$, where $x, y \in \mathbb{R}$. A point represents a specific location in the plane but has no extent, size, or dimension.

Definition 3 (Line Segment).

A line segment is a one-dimensional geometric object in $\mathbb{R}^2$, defined as a straight line segment between two distinct endpoints $p_1 = (x_1, y_1)$ and $p_2 = (x_2, y_2)$.

Definition 4 (Polyline).

A polyline is a one-dimensional object in $\mathbb{R}^2$, represented by an array $P \in \mathbb{R}^{N \times 2}$, where each row is a point $p_i = (x_i, y_i)$. It consists of connected line segments formed by consecutive points $p_i$ and $p_{i+1}$ for $1 \le i < N$, with $p_1 \neq p_N$.

Definition 5 (Polygon).

A polygon is a two-dimensional geometric object in $\mathbb{R}^2$, represented as a closed sequence of points forming its boundary. It is defined by an array $P \in \mathbb{R}^{N \times 2}$, where each row corresponds to a point $(x_i, y_i) \in \mathbb{R}^2$ and $(x_1, y_1) = (x_N, y_N)$.

A.2. Analytical Calculations of Fourier Transform

A.2.1. Fourier Transform of a Point

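By representing a point $p = (x_p, y_p) \in \mathbb{R}^2$ as a 2D Dirac delta function $f_p(x, y) = \delta(x - x_p, y - y_p)$, the Fourier transform of $f_p(x, y)$ can be derived as follows:

$$
\begin{aligned}
F_p(u, v) = \mathcal{F}\{f_p(x, y)\}
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_p(x, y)\, e^{-j2\pi(ux + vy)}\, dx\, dy \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \delta(x - x_p, y - y_p)\, e^{-j2\pi(ux + vy)}\, dx\, dy \\
&= e^{-j2\pi(x_p u + y_p v)}
\end{aligned}
$$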

where (u, v) are the frequency components.
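The closed form above can be checked with a few lines of Python. This is a minimal sketch rather than the authors' implementation; the function name `point_ft` and the sample coordinates are illustrative assumptions. It shows that the magnitude of a point's transform is always 1, so only the phase carries the location.

```python
import cmath
import math


def point_ft(xp, yp, u, v):
    """Closed-form Fourier transform of a 2D Dirac delta located at (xp, yp)."""
    return cmath.exp(-2j * math.pi * (xp * u + yp * v))


# Hypothetical point and frequency, chosen only for illustration.
F = point_ft(0.25, 0.0, 1.0, 0.5)
magnitude = abs(F)        # equals 1 for every point and every frequency
phase = cmath.phase(F)    # equals -2*pi*(xp*u + yp*v), wrapped to (-pi, pi]
```

Because the magnitude is constant, a point can only be distinguished from another point through the phase, which agrees with the ablation discussion of point-related tasks.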

A.2.2. Fourier Transform of a Polyline

Canonical line segment.

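We express the canonical line segment $l_c$, extending from $a = (-\frac{1}{2}, 0)$ to $b = (\frac{1}{2}, 0)$, as $f_{l_c}(x, y) = \mathrm{rect}(x)\,\delta(y)$, where $\mathrm{rect}(x)$ restricts the ridge to $|x| \le \frac{1}{2}$, and $\delta(y)$ represents a Dirac delta function along the x-axis. The Fourier transform of $f_{l_c}(x, y)$ is:

$$
\begin{aligned}
F_{l_c}(u, v) = \mathcal{F}\{f_{l_c}(x, y)\}
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{l_c}(x, y)\, e^{-j2\pi(ux + vy)}\, dx\, dy \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathrm{rect}(x)\,\delta(y)\, e^{-j2\pi(ux + vy)}\, dx\, dy
\end{aligned}
$$

Using the sifting property of the Dirac delta function, the integral over $y$ evaluates to the value of the integrand at $y = 0$:

$$
F_{l_c}(u, v) = \int_{-\infty}^{\infty} \mathrm{rect}(x)\, e^{-j2\pi ux}\, e^{-j2\pi v(0)}\, dx = \int_{-\infty}^{\infty} \mathrm{rect}(x)\, e^{-j2\pi ux}\, dx = \mathrm{sinc}(u)
$$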

where (u, v) are the frequency components. The result does not depend on v, because the sifting property evaluates the integrand at y = 0.
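As a sanity check of the canonical-segment result, the sketch below compares a numerical integral of $e^{-j2\pi ux}$ over $[-\frac{1}{2}, \frac{1}{2}]$ with the normalized sinc, $\sin(\pi u)/(\pi u)$. The function names and the midpoint-rule resolution are assumptions made for this illustration only.

```python
import cmath
import math


def sinc(u):
    """Normalized sinc, sin(pi*u) / (pi*u), with its limit value 1 at u = 0."""
    if u == 0.0:
        return 1.0
    return math.sin(math.pi * u) / (math.pi * u)


def canonical_segment_ft_numeric(u, n=20000):
    """Midpoint-rule approximation of the integral of exp(-j*2*pi*u*x) over [-1/2, 1/2]."""
    h = 1.0 / n
    total = 0j
    for k in range(n):
        x = -0.5 + (k + 0.5) * h
        total += cmath.exp(-2j * math.pi * u * x)
    return total * h


# Largest deviation between the numerical integral and sinc(u) on a few frequencies.
max_error = max(abs(canonical_segment_ft_numeric(u) - sinc(u)) for u in (0.0, 0.3, 1.0, 2.5))
```

The imaginary part of the numerical integral vanishes up to rounding, because the segment is symmetric about the origin.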

Arbitrary line segment.

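We consider an arbitrary line segment $l$ with endpoints $q = (x_q, y_q)$ and $r = (x_r, y_r)$. To compute $F_l(u, v)$, we map $l$ to the canonical line segment $l_c$ using an affine transformation. For this purpose, we introduce an auxiliary point $c = (\frac{1}{2}, 1)$ in the structure of $l_c$ so that it is not collinear with $\overline{ab}$. This point corresponds to another auxiliary point $s$ introduced for the arbitrary line segment $l$. The auxiliary point $s$ is defined as $s = r + n$, where $n = (y_q - y_r,\; x_r - x_q)^T$ represents a 90° clockwise rotation of the vector $\overrightarrow{rq}$. Note that the vectors $\overrightarrow{qr}$ and $\overrightarrow{rs}$ have the same length.

Given all the above, the affine transformation matrix $A$ is defined as:

$$
A = \begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ 0 & 0 & 1 \end{bmatrix}
$$

Then the values of $A$ are computed as follows:

$$
\begin{aligned}
A \begin{bmatrix} q & r & s \end{bmatrix} &= \begin{bmatrix} a & b & c \end{bmatrix} \\
A &= \begin{bmatrix} a & b & c \end{bmatrix} \begin{bmatrix} q & r & s \end{bmatrix}^{-1}
= \begin{bmatrix} -\frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} x_q & x_r & x_r + y_q - y_r \\ y_q & y_r & y_r + x_r - x_q \\ 1 & 1 & 1 \end{bmatrix}^{-1} \\
&= D \begin{bmatrix}
-x_q + x_r & -y_q + y_r & \frac{x_q^2 + y_q^2 - x_r^2 - y_r^2}{2} \\
y_q - y_r & -x_q + x_r & -y_q x_r + x_q y_r \\
0 & 0 & \frac{1}{D}
\end{bmatrix}
\end{aligned}
$$

where

$$
D = \det(A) = \frac{1}{(x_q - x_r)^2 + (y_q - y_r)^2}
$$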

is the determinant of A.
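The matrix above can be verified numerically. The following sketch builds $A$ for a sample segment and checks that it sends $q$, $r$, and the auxiliary point $s$ to $a = (-\frac{1}{2}, 0)$, $b = (\frac{1}{2}, 0)$, and $c = (\frac{1}{2}, 1)$, and that its determinant equals $D$. The helper names `segment_affine` and `apply_affine` and the sample endpoints are hypothetical.

```python
def segment_affine(q, r):
    """Affine matrix A (homogeneous 3x3) mapping segment qr onto the canonical segment, and D."""
    xq, yq = q
    xr, yr = r
    D = 1.0 / ((xq - xr) ** 2 + (yq - yr) ** 2)
    A = [
        [D * (xr - xq), D * (yr - yq), D * (xq ** 2 + yq ** 2 - xr ** 2 - yr ** 2) / 2.0],
        [D * (yq - yr), D * (xr - xq), D * (xq * yr - yq * xr)],
        [0.0, 0.0, 1.0],
    ]
    return A, D


def apply_affine(A, p):
    """Apply the homogeneous matrix A to the 2D point p."""
    x, y = p
    return (A[0][0] * x + A[0][1] * y + A[0][2],
            A[1][0] * x + A[1][1] * y + A[1][2])


# Illustrative endpoints (assumed values).
q, r = (1.0, 2.0), (4.0, 6.0)
s = (r[0] + q[1] - r[1], r[1] + r[0] - q[0])   # s = r + n
A, D = segment_affine(q, r)
images = [apply_affine(A, p) for p in (q, r, s)]
det_linear = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # determinant of A
```

For these endpoints the squared length is 25, so `D` is 0.04, and `images` reproduces the canonical points up to rounding.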

Following the affine Fourier transform property from Eq. (3), the Fourier transform of an arbitrary line segment l with endpoints (xq, yq) and (xr, yr) is:

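$$
F_l(u, v) = \mathcal{F}\{f_l(x, y)\} = \frac{1}{\det(A)}\, e^{j2\pi \mathbf{c}^T A^{-T} \mathbf{u}}\, F(A^{-T}\mathbf{u})
$$

which can be rewritten as:

$$
F_l(u, v) = \frac{1}{D}\, e^{-j2\pi(x_0 u + y_0 v)}\, F\!\left(\frac{b_2 u - a_2 v}{D},\; \frac{-b_1 u + a_1 v}{D}\right) \tag{15}
$$

where $x_0 = \frac{1}{D}(b_1 c_2 - b_2 c_1)$ and $y_0 = \frac{1}{D}(a_2 c_1 - a_1 c_2)$.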

By substituting the specific values into Eq. (15), Fl(u,v) can be simplified to:

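$$
F_l(u, v) = \frac{1}{(x_q - x_r)^2 + (y_q - y_r)^2}\; e^{-j2\pi\left(\frac{x_q + x_r}{2} u + \frac{y_q + y_r}{2} v\right)}\; \mathrm{sinc}\big((x_r - x_q)u + (y_r - y_q)v\big)
$$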

A.2.3. Fourier Transform of a Polygon

Isosceles canonical right triangle.

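The canonical isosceles right triangle $\Delta_c$, with vertices $a = (0, 0)$, $b = (1, 0)$, and $c = (1, 1)$, is represented by the function $f_{\Delta_c}(x, y)$, which equals 1 inside the triangle and 0 otherwise.

The Fourier transform of $f_{\Delta_c}(x, y)$ is computed as:

$$
\begin{aligned}
F_{\Delta_c}(u, v) = \mathcal{F}\{f_{\Delta_c}(x, y)\}
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{\Delta_c}(x, y)\, e^{-j2\pi(ux + vy)}\, dy\, dx \\
&= \int_0^1 \int_0^x e^{-j2\pi(ux + vy)}\, dy\, dx \\
&= \int_0^1 -\frac{1}{j2\pi v}\left(e^{-j2\pi(u + v)x} - e^{-j2\pi ux}\right) dx \\
&= -\frac{1}{j2\pi v}\left[\int_0^1 e^{-j2\pi(u + v)x}\, dx - \int_0^1 e^{-j2\pi ux}\, dx\right] \\
&= \frac{1}{4\pi^2 uv(u + v)}\left[(u + v)\, e^{-j2\pi u} - u\, e^{-j2\pi(u + v)} - v\right]
\end{aligned} \tag{16}
$$

Using Euler's formula ($e^{j\theta} = \cos\theta + j\sin\theta$), we can expand Eq. (16) to:

$$
F_{\Delta_c}(u, v) = \frac{1}{4\pi^2 uv(u + v)}\Big[\big((u + v)\cos(2\pi u) - u\cos(2\pi(u + v)) - v\big) - j\big((u + v)\sin(2\pi u) - u\sin(2\pi(u + v))\big)\Big]
$$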

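This equation is undefined when $u = 0$, $v = 0$, or $u + v = 0$. We present the Fourier transform for each special case:

  • $F_{\Delta_c}(0, 0) = \frac{1}{2}$

  • $F_{\Delta_c}(0, v) = -\frac{1}{4\pi^2 v^2}\big(j2\pi v + \cos(2\pi v) - j\sin(2\pi v) - 1\big)$

  • $F_{\Delta_c}(u, 0) = \frac{1}{4\pi^2 u^2}\big[\big(\cos(2\pi u) + 2\pi u\sin(2\pi u) - 1\big) - j\big(\sin(2\pi u) - 2\pi u\cos(2\pi u)\big)\big]$

  • $F_{\Delta_c}(-v, v) = -\frac{1}{4\pi^2 v^2}\big(-j2\pi v + \cos(2\pi v) + j\sin(2\pi v) - 1\big)$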
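The general formula and its special cases can be collected into one function and compared against a direct numerical integral over the canonical triangle. This is a sketch under stated assumptions, not the authors' code: the tolerance `eps` used to detect the special cases, the function names, and the midpoint-rule check are all illustrative choices.

```python
import cmath
import math


def canonical_triangle_ft(u, v, eps=1e-9):
    """Fourier transform of the triangle (0,0), (1,0), (1,1), with special cases handled."""
    two_pi = 2.0 * math.pi
    if abs(u) < eps and abs(v) < eps:
        return 0.5 + 0j                      # area of the canonical triangle
    if abs(u) < eps:                         # case u = 0
        t = two_pi * v
        return -(1j * t + cmath.exp(-1j * t) - 1.0) / (t * t)
    if abs(v) < eps:                         # case v = 0
        t = two_pi * u
        return (cmath.exp(-1j * t) * (1.0 + 1j * t) - 1.0) / (t * t)
    if abs(u + v) < eps:                     # case u = -v
        t = two_pi * v
        return -(-1j * t + cmath.exp(1j * t) - 1.0) / (t * t)
    num = ((u + v) * cmath.exp(-1j * two_pi * u)
           - u * cmath.exp(-1j * two_pi * (u + v)) - v)
    return num / (4.0 * math.pi ** 2 * u * v * (u + v))


def numeric_triangle_ft(u, v, n=200):
    """Midpoint-rule check using the substitution y = x * t, whose Jacobian is x."""
    h = 1.0 / n
    total = 0j
    for i in range(n):
        x = (i + 0.5) * h
        for k in range(n):
            t = (k + 0.5) * h
            total += x * cmath.exp(-2j * math.pi * (u * x + v * x * t))
    return total * h * h
```

A fixed threshold is the simplest way to route frequencies to the special cases; frequencies that are tiny but above `eps` can still lose precision in the general branch through cancellation, so a production implementation may prefer a series expansion near those lines.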

Arbitrary triangle.

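We calculate the Fourier transform of an arbitrary triangle $\Delta$ with vertices $q$, $r$, $s$ by using the affine transformation property. To that end, the affine transformation matrix $A$ is defined as:

$$
A = \begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ 0 & 0 & 1 \end{bmatrix}
$$

Then the values of $A$ are computed as follows:

$$
\begin{aligned}
A \begin{bmatrix} q & r & s \end{bmatrix} &= \begin{bmatrix} a & b & c \end{bmatrix} \\
A &= \begin{bmatrix} a & b & c \end{bmatrix} \begin{bmatrix} q & r & s \end{bmatrix}^{-1}
= \begin{bmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} x_q & x_r & x_s \\ y_q & y_r & y_s \\ 1 & 1 & 1 \end{bmatrix}^{-1} \\
&= D \begin{bmatrix}
y_s - y_r & x_r - x_s & y_q(x_s - x_r) + x_q(y_r - y_s) \\
y_q - y_r & x_r - x_q & x_q y_r - y_q x_r \\
0 & 0 & \frac{1}{D}
\end{bmatrix}
\end{aligned}
$$

where

$$
D = \frac{1}{x_q(y_r - y_s) + x_r(y_s - y_q) + x_s(y_q - y_r)}
$$

is the determinant of $A$.

If the area of $\Delta$ is $\alpha$, then $D = \frac{1}{2\alpha}$.

Finally, the Fourier transform $F_\Delta(u, v)$ can be calculated by substituting the affine transform parameters into Eq. (3).

For the case of $(0, 0)$ we get:

$$
F_\Delta(0, 0) = \frac{1}{D}\, F_{\Delta_c}(0, 0) = \frac{1}{2D} = \alpha
$$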

which is the area of Δ.
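The zero-frequency identity can be illustrated with a small example. The sketch below computes $D$ from the vertex coordinates and confirms that $\frac{1}{2D}$ matches the triangle's area; the function names and the sample triangle are assumptions for illustration. With this formula a counterclockwise vertex order gives a positive value and a clockwise order a negative one, and a degenerate (collinear) triangle would cause a division by zero.

```python
def triangle_D(q, r, s):
    """Determinant D of the affine map from triangle (q, r, s) to the canonical triangle."""
    (xq, yq), (xr, yr), (xs, ys) = q, r, s
    return 1.0 / (xq * (yr - ys) + xr * (ys - yq) + xs * (yq - yr))


def triangle_dc_term(q, r, s):
    """Zero-frequency value F(0, 0) = (1 / D) * 1/2 of the arbitrary triangle."""
    return 1.0 / (2.0 * triangle_D(q, r, s))


# A 4-by-3 right triangle listed counterclockwise (illustrative values); its area is 6.
q, r, s = (0.0, 0.0), (4.0, 0.0), (4.0, 3.0)
dc = triangle_dc_term(q, r, s)
dc_reversed = triangle_dc_term(s, r, q)   # clockwise order flips the sign
```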

A.3. Frequency Sampling Strategy

A.3.1. Geometric Sampling

We sample frequencies as a geometric series to balance the contribution of low- and high-frequency components. Formally,

$$
f_i = f_{\min}\, \rho^{\,i}, \quad i = 0, 1, \ldots, W - 1
$$

where $f_i$ is the $i$-th frequency, $f_{\min}$ and $f_{\max}$ are the minimum and maximum frequencies, and $W$ is the number of sampled frequencies in each dimension. $\rho$ is the step ratio, defined as $\rho = \left(\frac{f_{\max}}{f_{\min}}\right)^{\frac{1}{W - 1}}$.

Using this sequence, we construct a 2D meshgrid of frequencies, denoted as (U, V), centered around zero. Due to the Hermitian symmetry property of the Fourier transform, we only compute frequencies for half of the plane.

While uniform sampling is an alternative, previous studies suggest geometric sampling is better suited for tasks like ours, as it naturally balances the significance of low- and high-frequency components (Mai et al., 2020; 2023b).
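The geometric series can be generated as follows. The sketch uses the values reported in Appendix A.4.3 ($f_{\min} = 0.1$, $f_{\max} = 1.0$, $W = 10$). The half-plane grid layout at the end is an assumption: it is one arrangement (mirrored frequencies plus zero along U, positive frequencies along V) that happens to yield the reported 210 frequencies, and the actual layout used by the authors may differ.

```python
def geometric_frequencies(f_min, f_max, W):
    """Return W frequencies f_i = f_min * rho**i with rho = (f_max / f_min) ** (1 / (W - 1))."""
    rho = (f_max / f_min) ** (1.0 / (W - 1))
    return [f_min * rho ** i for i in range(W)]


freqs = geometric_frequencies(0.1, 1.0, 10)

# Assumed layout: U covers negative, zero, and positive frequencies (2W + 1 values),
# while V covers only the positive half (W values), exploiting Hermitian symmetry.
u_axis = [-f for f in reversed(freqs)] + [0.0] + freqs
v_axis = list(freqs)
grid = [(u, v) for u in u_axis for v in v_axis]
```

Consecutive frequencies differ by the constant factor `rho`, so low frequencies are sampled more densely than high ones.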

A.3.2. Additional Experiments on learned frequency sampling

Figure 5: The effect of frequency sampling strategy on point-polygon pairs.

To investigate whether learning the frequency values would improve performance, we conducted an experiment where the frequencies were treated as learnable parameters and optimized alongside the model. Our results are reported in Figure 5. We observe that learning the frequencies does not yield significant improvements over fixing the frequencies in any of the tasks. This suggests that the geometric sampling approach is sufficiently effective for balancing low- and high-frequency contributions, and learning the frequencies does not provide additional benefits for the tasks considered.

A.4. Supplementary Experimental Study

A.4.1. Dataset Details

We utilized publicly available OpenStreetMap (OSM) datasets for Singapore and New York, obtained from Geofabrik 8 in .osm.pbf format. Geospatial objects, including POIs, roads, and buildings, were extracted using OSM-specific tags (amenity, shop, tourism, leisure for POIs, motorway, trunk, primary, secondary for roads, and building for buildings). Region partitions were derived from Singapore Subzones 9 and NYC Census Tracts 10. Dataset statistics are presented in Table 5.

Table 5: Statistics of the Singapore and New York datasets.

City | # POIs | # roads | # buildings | # regions
Singapore | 4,347 | 45,634 | 109,877 | 304
New York | 14,943 | 139,512 | 1,153,088 | 2,324

Labels for the land use classification task were sourced from the Singapore Master Plan 2019 11 and NYC MapPLUTO 12. Following previous approaches (Li et al., 2023), we merge the fine-grained land use classes into five major categories: Residential, Industrial, Commercial, Open Space, and Others. Population estimation labels were obtained from WorldPop 13 for both cities.

For the remaining tasks, the labels are generated manually. Specifically, for the topological classification task, the number of relationships depends on the types of objects being compared. Point/polyline, point/polygon, and polyline/polyline pairs can belong to one of two classes: disjoint or not disjoint. Polyline/polygon pairs, however, have four distinct relationship classes, while polygon/polygon pairs include six classes, following the DE-9IM model. To eliminate redundancy, we remove equivalent relationships such as within and contains, keeping only one representative relationship from each pair of equivalents. To create a balanced dataset across all relationship classes, we generate geometry pairs by slightly adjusting the positions of the original geospatial objects and randomly selecting 5,000 pairs for each class within a group.

For the directional relationship classification task, we classify the spatial relationships between two geometries into one of 16 compass directions based on their angular relationship. These 16 classes are derived from the cardinal and intercardinal directions: north, northeast, east, southeast, south, southwest, west, northwest, and their boundary counterparts (e.g., north-northeast, east-northeast). Labels are computed based on the relative orientation of the geometries’ centroids. Similar to the topological classification task, we randomly select 5,000 pairs for each directional class to ensure a balanced dataset.
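One way such labels could be derived from two centroids is sketched below. Several details are assumptions rather than facts from the paper: the label describes where the second centroid lies relative to the first, north is the positive y direction, each class covers a 22.5° sector centered on its compass direction, and the abbreviated class names are illustrative.

```python
import math

COMPASS_16 = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def direction_label(c_ref, c_tgt):
    """Compass class of centroid c_tgt as seen from centroid c_ref (assumed convention)."""
    dx = c_tgt[0] - c_ref[0]
    dy = c_tgt[1] - c_ref[1]
    # Bearing in degrees, measured clockwise from north (+y), in [0, 360).
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    # Shift by half a sector so that each class is centered on its direction.
    index = int((bearing + 11.25) // 22.5) % 16
    return COMPASS_16[index]


labels = [
    direction_label((0.0, 0.0), (0.0, 1.0)),    # target straight up
    direction_label((0.0, 0.0), (1.0, 0.0)),    # target to the right
    direction_label((0.0, 0.0), (-1.0, -1.0)),  # target down and to the left
    direction_label((0.0, 0.0), (1.0, 2.0)),    # target about 26.6 degrees east of north
]
```

Coincident centroids have no defined direction; in this sketch they fall into the first class because `atan2(0, 0)` returns 0, so such pairs would need to be filtered out beforehand.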

For the distance estimation task, labels are computed using the actual spatial distance between the centroids of the two geometries. The spatial distance is calculated using the Euclidean distance for planar geometries, as in the topological and directional relationship classification tasks. We randomly select 10,000 geometry pairs for this task.

A.4.2. Baselines

We now describe the baseline methods used to evaluate Poly2Vec.

1. Point encoders

• Direct: Feeds directly the geometry’s input coordinates to the downstream model, without any encoding mechanism.

• Tile: Partitions the study area into a uniform grid with cells of size c. Each grid cell is assigned an embedding, which serves as the encoding for the points assigned to that cell (Berg et al., 2014; Adams et al., 2015; Tang et al., 2015).

• Wrap: Uses a wrapping mechanism [sin(πp); cos(πp)] to encode a point p (Mac Aodha et al., 2019).

• Grid: Follows the Transformer's position encoding model (Vaswani, 2017), representing spatial positions through multi-scale sine and cosine transformations. At each scale $s$, the encoding is given by $PE_s^{(g)}(p) = \left[\cos\!\left(\frac{p}{\lambda_{\min}\, g^{s/(S-1)}}\right), \sin\!\left(\frac{p}{\lambda_{\min}\, g^{s/(S-1)}}\right)\right]$, where $g = \frac{\lambda_{\max}}{\lambda_{\min}}$ controls the frequency range. The final encoding concatenates these multi-scale representations, capturing spatial structures across different resolutions (Mai et al., 2020).

• Theory: Encodes spatial positions using dot products with unit vectors separated by 120°. At each scale $s$, the encoding is given by $PE_{s,j}^{(t)}(p) = \left[\cos\!\left(\frac{\langle p, a_j \rangle}{\lambda_{\min}\, g^{s/(S-1)}}\right), \sin\!\left(\frac{\langle p, a_j \rangle}{\lambda_{\min}\, g^{s/(S-1)}}\right)\right]$ for all $j \in \{1, 2, 3\}$, where $a_1 = [1, 0]^T$, $a_2 = [-\frac{1}{2}, \frac{\sqrt{3}}{2}]^T$, and $a_3 = [-\frac{1}{2}, -\frac{\sqrt{3}}{2}]^T$ are unit vectors spaced at 120°. The final encoding concatenates these multi-scale representations across all vectors (Mai et al., 2020).

2. Polyline encoders

• T2Vec: First uniformly partitions the whole space into grid cells and maps each trajectory point to its grid cell. Through this tokenization, each trajectory is converted to a sequence of grid cell IDs. It then adopts a GRU encoder to encode the sequence and an end-to-end training paradigm that aims to reconstruct the original trajectories from the distorted or downsampled ones (Li et al., 2018).

3. Polygon encoders

• ResNet1D: Adapts the 1D variant of the Residual Network (ResNet) architecture, incorporating circular padding to effectively encode the exterior vertices of polygons (Mai et al., 2023b).

• NUFTspec: Transforms polygons into the spectral domain using the Non-Uniform Fourier Transformation (NUFT) and j-simplex meshes and then learns polygon embeddings from these spectral features using MLPs (Mai et al., 2023b).
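To make two of the point-encoder baselines above concrete, the sketch below implements the Wrap and Theory encodings for a single 2D point. It is an illustration under assumptions, not the baselines' reference code: the scale index is taken to run over $s = 0, \ldots, S - 1$, Wrap is applied per coordinate, and the function names and the $\lambda$ values in the example are hypothetical.

```python
import math

# Three unit vectors spaced 120 degrees apart, as in the Theory encoder description.
UNIT_VECTORS = [(1.0, 0.0),
                (-0.5, math.sqrt(3.0) / 2.0),
                (-0.5, -math.sqrt(3.0) / 2.0)]


def wrap_encode(p):
    """Wrap encoding [sin(pi * p); cos(pi * p)] applied to each coordinate of p."""
    x, y = p
    return [math.sin(math.pi * x), math.sin(math.pi * y),
            math.cos(math.pi * x), math.cos(math.pi * y)]


def theory_encode(p, lambda_min, lambda_max, S):
    """Theory encoding: cos/sin of projections onto the unit vectors at S scales (S >= 2)."""
    g = lambda_max / lambda_min
    features = []
    for s in range(S):                       # assumed scale indices 0 .. S-1
        scale = lambda_min * g ** (s / (S - 1))
        for ax, ay in UNIT_VECTORS:
            projection = p[0] * ax + p[1] * ay
            features.extend([math.cos(projection / scale),
                             math.sin(projection / scale)])
    return features


point = (0.5, 0.0)                            # illustrative normalized coordinate
wrap_features = wrap_encode(point)            # 4 values
theory_features = theory_encode(point, lambda_min=0.1, lambda_max=1.0, S=4)  # 6 * S values
```

Both encoders see only a single coordinate pair, which is why they apply to points but not directly to polylines or polygons.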

A.4.3. Hyperparameter Configuration

The coordinates of the input geometries are normalized to lie within the range [−1, 1] × [−1, 1], based on the bounding box of the corresponding area of interest. We set the minimum frequency fmin=0.1, the maximum frequency fmax=1.0 and W=10, resulting in 210 frequencies. We set the final size of the geometry embedding v to d=32. All the MLPs consist of two layers with ReLU activation functions.

Hyperparameters of spatial reasoning tasks.

For training on the spatial reasoning tasks, we utilize the AdamW optimizer and set the learning rate $lr = 10^{-4}$ and weight decay $wd = 10^{-8}$. The batch size is set to 128, and the downstream models were trained for 20 epochs. The training, validation, and testing ratios for the datasets corresponding to these tasks are 60:20:20. All experiments were run 5 times, and we report the average performance and standard deviation.

Hyperparameters of GeoAI tasks.

We follow the same hyperparameters as presented by Li et al. (2023), to keep our comparison consistent.

Hyperparameters of other baselines.

The implementation of the baselines follows the corresponding papers, along with each method's specific hyperparameters. The remaining hyperparameters related to downstream tasks are kept consistent with our approach.

A.4.4. Experimental Environment

Our experiments are performed on a cluster node equipped with an 18-core Intel i9-9980XE CPU, 125 GB of memory, and two 11 GB NVIDIA GeForce RTX 2080 Ti GPUs. Furthermore, all neural network models are implemented based on PyTorch version 2.3.0 with CUDA 11.8 using Python version 3.9.19.

A.4.5. Training Details of Evaluation Tasks

We use cross entropy loss to train the downstream model on the topological and directional relationship classification tasks. The loss is defined as:

$$
\mathcal{L}_{CE}(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\, \log(\hat{y}_{i,c}),
$$

where $N$ is the number of samples, $C$ is the number of classes ($C = 2$ for binary classification), $y_{i,c} \in \{0, 1\}$ is the one-hot encoded ground-truth label for class $c$, and $\hat{y}_{i,c} \in [0, 1]$ is the predicted probability for class $c$.

For the distance preservation task, the model is evaluated using the mean squared error (MSE) loss, defined as:

$$
\mathcal{L}_{MSE}(\theta) = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2,
$$

where $y_i$ is the ground-truth distance for the $i$-th sample, and $\hat{y}_i$ is the predicted distance.
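Both losses are simple to write out directly. The sketch below is a plain-Python illustration of the two definitions, not the PyTorch code used in the experiments; the clamp `eps`, which avoids taking the logarithm of zero, is an added assumption, as are the function names and the toy inputs.

```python
import math


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean cross entropy for one-hot labels y_true and predicted probabilities y_prob."""
    total = 0.0
    for labels, probs in zip(y_true, y_prob):
        for y_c, p_c in zip(labels, probs):
            total += y_c * math.log(max(p_c, eps))   # eps guards against log(0)
    return -total / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean squared error between ground-truth and predicted distances."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Toy inputs for illustration only.
ce = cross_entropy([[1, 0], [0, 1]], [[0.5, 0.5], [0.25, 0.75]])
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Only the probability assigned to the true class contributes to the cross entropy, since every other term is multiplied by a zero label.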

We note that for the population prediction and land use classification tasks, Poly2Vec is used as input to the pretrained urban region representation model RegionDCL (Li et al., 2023), and thus we follow the same training and evaluation procedure as was originally presented by the authors.

A.4.6. Supplementary Results of Spatial Reasoning Tasks

We report each model's performance on topological and directional relationship classification in Table 7 and Table 8, respectively, including Precision, Recall, and F1. MAE for the distance preservation task is provided in Table 6. We also present the remaining distance scatter plots in Figures 10, 11, 12, 13, and 14. Overall, we observe trends similar to those in the main evaluation.

Table 6: Overall model performance on distance estimation. Best and second best are highlighted.

Dataset | Model | point-point | point-polyline | point-polygon
Singapore | Direct | 0.088±0.041 | 0.093±0.013 | 0.084±0.021
Singapore | Tile | 0.252±0.002 | 0.177±0.007 | 0.157±0.001
Singapore | Wrap | 0.085±0.009 | 0.106±0.012 | 0.102±0.007
Singapore | Grid | 0.087±0.006 | 0.107±0.003 | 0.108±0.002
Singapore | Theory | 0.065±0.019 | 0.083±0.027 | 0.079±0.028
Singapore | Poly2Vec | 0.016±0.001 | 0.043±0.011 | 0.029±0.009
New York | Direct | 0.075±0.017 | 0.126±0.041 | 0.115±0.033
New York | Tile | 0.271±0.005 | 0.170±0.004 | 0.189±0.004
New York | Wrap | 0.106±0.003 | 0.148±0.001 | 0.146±0.009
New York | Grid | 0.073±0.001 | 0.124±0.004 | 0.118±0.011
New York | Theory | 0.068±0.008 | 0.089±0.074 | 0.102±0.061
New York | Poly2Vec | 0.030±0.007 | 0.049±0.004 | 0.042±0.021

A.4.7. Supplementary Ablation Studies

We showed the effect of learned fusion on point-polygon tasks in Section 4.3. We demonstrate its effect on the remaining spatial reasoning tasks in Figures 6, 7, 8, and 9. We again observe trends similar to those reported in the main evaluation.

Figure 6: Ablation study on point-point pairs.

Figure 7: Ablation study on point-polyline pairs.

Figure 8: Ablation study on polyline-polygon pairs.

Figure 9: Ablation study on polygon-polygon pairs.

Table 7: Overall model performance on topological relationship classification. Best and second best are highlighted.

Singapore

Metric | Method | point-polyline | point-polygon | polyline-polyline | polyline-polygon | polygon-polygon
Precision | ResNet1D | - | - | - | - | 0.398±0.018
Precision | NUFTspec | - | - | - | - | 0.588±0.041
Precision | t2vec | - | - | 0.768±0.021 | - | -
Precision | Direct | 0.859±0.007 | 0.831±0.017 | 0.637±0.032 | 0.415±0.037 | 0.328±0.04
Precision | Tile | 0.735±0.039 | 0.705±0.056 | 0.505±0.007 | 0.490±0.006 | 0.439±0.005
Precision | Wrap | 0.874±0.011 | 0.865±0.015 | 0.645±0.009 | 0.453±0.028 | 0.405±0.010
Precision | Grid | 0.799±0.037 | 0.841±0.010 | 0.626±0.027 | 0.405±0.066 | 0.288±0.013
Precision | Theory | 0.903±0.037 | 0.874±0.004 | 0.651±0.009 | 0.432±0.018 | 0.478±0.023
Precision | Poly2Vec | 0.913±0.007 | 0.924±0.017 | 0.779±0.001 | 0.506±0.013 | 0.694±0.007
Recall | ResNet1D | - | - | - | - | 0.455±0.011
Recall | NUFTspec | - | - | - | - | 0.572±0.032
Recall | t2vec | - | - | 0.732±0.024 | - | -
Recall | Direct | 0.792±0.012 | 0.838±0.027 | 0.997±0.019 | 0.414±0.031 | 0.450±0.014
Recall | Tile | 0.894±0.035 | 0.695±0.074 | 1.0±0.001 | 0.463±0.008 | 0.413±0.004
Recall | Wrap | 0.903±0.005 | 0.901±0.033 | 0.992±0.007 | 0.477±0.012 | 0.380±0.006
Recall | Grid | 0.921±0.035 | 0.848±0.014 | 0.980±0.016 | 0.465±0.007 | 0.339±0.013
Recall | Theory | 0.986±0.028 | 0.933±0.007 | 0.972±0.012 | 0.451±0.012 | 0.467±0.015
Recall | Poly2Vec | 1.0±0.000 | 0.974±0.023 | 1.0±0.000 | 0.498±0.007 | 0.697±0.003
F1 | ResNet1D | - | - | - | - | 0.399±0.017
F1 | NUFTspec | - | - | - | - | 0.574±0.013
F1 | t2vec | - | - | 0.732±0.002 | - | -
F1 | Direct | 0.824±0.006 | 0.834±0.031 | 0.777±0.022 | 0.402±0.027 | 0.314±0.014
F1 | Tile | 0.805±0.013 | 0.694±0.017 | 0.671±0.004 | 0.412±0.009 | 0.384±0.005
F1 | Wrap | 0.888±0.005 | 0.882±0.009 | 0.781±0.008 | 0.450±0.020 | 0.339±0.006
F1 | Grid | 0.855±0.007 | 0.844±0.002 | 0.764±0.015 | 0.411±0.026 | 0.267±0.018
F1 | Theory | 0.938±0.014 | 0.903±0.004 | 0.788±0.007 | 0.438±0.012 | 0.425±0.006
F1 | Poly2Vec | 0.955±0.011 | 0.948±0.008 | 0.831±0.002 | 0.483±0.013 | 0.682±0.003

New York

Metric | Method | point-polyline | point-polygon | polyline-polyline | polyline-polygon | polygon-polygon
Precision | ResNet1D | - | - | - | - | 0.421±0.051
Precision | NUFTspec | - | - | - | - | 0.562±0.032
Precision | t2vec | - | - | 0.745±0.012 | - | -
Precision | Direct | 0.835±0.032 | 0.933±0.007 | 0.661±0.032 | 0.498±0.003 | 0.439±0.024
Precision | Tile | 0.664±0.018 | 0.789±0.005 | 0.502±0.009 | 0.494±0.074 | 0.418±0.005
Precision | Wrap | 0.879±0.015 | 0.915±0.006 | 0.655±0.013 | 0.586±0.005 | 0.405±0.010
Precision | Grid | 0.768±0.034 | 0.904±0.015 | 0.658±0.014 | 0.513±0.012 | 0.355±0.017
Precision | Theory | 0.886±0.044 | 0.893±0.017 | 0.718±0.007 | 0.602±0.008 | 0.431±0.009
Precision | Poly2Vec | 0.921±0.016 | 0.979±0.021 | 0.745±0.002 | 0.631±0.017 | 0.698±0.006
Recall | ResNet1D | - | - | - | - | 0.452±0.035
Recall | NUFTspec | - | - | - | - | 0.592±0.029
Recall | t2vec | - | - | 0.718±0.032 | - | -
Recall | Direct | 0.838±0.035 | 0.888±0.004 | 0.987±0.22 | 0.497±0.003 | 0.431±0.003
Recall | Tile | 0.659±0.009 | 0.769±0.011 | 1.00±0.001 | 0.499±0.039 | 0.405±0.004
Recall | Wrap | 0.894±0.030 | 0.842±0.031 | 0.986±0.005 | 0.551±0.008 | 0.380±0.006
Recall | Grid | 0.933±0.045 | 0.881±0.004 | 0.995±0.002 | 0.514±0.012 | 0.382±0.035
Recall | Theory | 0.923±0.044 | 0.912±0.017 | 0.782±0.007 | 0.615±0.008 | 0.412±0.009
Recall | Poly2Vec | 1.0±0.000 | 0.989±0.032 | 1.0±0.000 | 0.638±0.009 | 0.697±0.007
F1 | ResNet1D | - | - | - | - | 0.399±0.041
F1 | NUFTspec | - | - | - | - | 0.581±0.021
F1 | t2vec | - | - | 0.741±0.007 | - | -
F1 | Direct | 0.836±0.004 | 0.910±0.003 | 0.792±0.027 | 0.463±0.003 | 0.403±0.013
F1 | Tile | 0.661±0.008 | 0.779±0.004 | 0.668±0.008 | 0.453±0.061 | 0.369±0.003
F1 | Wrap | 0.886±0.009 | 0.876±0.019 | 0.787±0.010 | 0.517±0.005 | 0.339±0.006
F1 | Grid | 0.842±0.032 | 0.892±0.006 | 0.792±0.009 | 0.463±0.046 | 0.322±0.038
F1 | Theory | 0.883±0.044 | 0.891±0.017 | 0.726±0.007 | 0.549±0.059 | 0.419±0.009
F1 | Poly2Vec | 0.959±0.008 | 0.984±0.012 | 0.854±0.002 | 0.588±0.012 | 0.679±0.005

Table 8: Overall model performance on directional relationship classification. Best and second best are highlighted.

Singapore

Metric | Method | point-point | point-polyline | point-polygon | polyline-polyline | polyline-polygon | polygon-polygon
Precision | NUFTresnet | - | - | - | - | - | 0.828±0.009
Precision | NUFTspec | - | - | - | - | - | 0.832±0.021
Precision | t2vec | - | - | - | 0.227±0.021 | - | -
Precision | Direct | 0.882±0.006 | 0.846±0.006 | 0.847±0.005 | 0.825±0.002 | 0.813±0.005 | 0.765±0.014
Precision | Tile | 0.259±0.001 | 0.260±0.026 | 0.286±0.038 | 0.370±0.005 | 0.466±0.001 | 0.415±0.010
Precision | Wrap | 0.863±0.003 | 0.810±0.007 | 0.806±0.004 | 0.790±0.002 | 0.835±0.002 | 0.789±0.001
Precision | Grid | 0.884±0.007 | 0.733±0.007 | 0.775±0.002 | 0.708±0.001 | 0.653±0.015 | 0.545±0.144
Precision | Theory | 0.908±0.017 | 0.872±0.012 | 0.863±0.004 | 0.815±0.012 | 0.838±0.006 | 0.729±0.044
Precision | Poly2Vec | 0.928±0.016 | 0.942±0.012 | 0.918±0.004 | 0.911±0.013 | 0.898±0.021 | 0.830±0.007
Recall | NUFTresnet | - | - | - | - | - | 0.819±0.010
Recall | NUFTspec | - | - | - | - | - | 0.792±0.003
Recall | t2vec | - | - | - | 0.216±0.023 | - | -
Recall | Direct | 0.879±0.006 | 0.841±0.006 | 0.845±0.006 | 0.820±0.002 | 0.830±0.005 | 0.752±0.017
Recall | Tile | 0.253±0.001 | 0.269±0.002 | 0.273±0.008 | 0.324±0.001 | 0.454±0.001 | 0.395±0.003
Recall | Wrap | 0.861±0.003 | 0.804±0.009 | 0.803±0.004 | 0.782±0.003 | 0.831±0.002 | 0.779±0.001
Recall | Grid | 0.882±0.002 | 0.729±0.007 | 0.772±0.002 | 0.699±0.001 | 0.641±0.016 | 0.533±0.139
Recall | Theory | 0.883±0.024 | 0.867±0.009 | 0.855±0.004 | 0.863±0.012 | 0.502±0.012 | 0.897±0.014
Recall | Poly2Vec | 0.946±0.017 | 0.947±0.021 | 0.933±0.011 | 0.903±0.008 | 0.838±0.022 | 0.826±0.007
F1 | NUFTresnet | - | - | - | - | - | 0.821±0.010
F1 | NUFTspec | - | - | - | - | - | 0.802±0.028
F1 | t2vec | - | - | - | 0.219±0.007 | - | -
F1 | Direct | 0.880±0.006 | 0.841±0.006 | 0.845±0.006 | 0.821±0.002 | 0.840±0.005 | 0.754±0.016
F1 | Tile | 0.215±0.001 | 0.226±0.005 | 0.247±0.015 | 0.309±0.003 | 0.447±0.001 | 0.388±0.004
F1 | Wrap | 0.861±0.003 | 0.804±0.009 | 0.803±0.004 | 0.782±0.002 | 0.831±0.002 | 0.780±0.001
F1 | Grid | 0.882±0.007 | 0.728±0.007 | 0.772±0.002 | 0.698±0.001 | 0.640±0.017 | 0.530±0.150
F1 | Theory | 0.903±0.015 | 0.852±0.009 | 0.855±0.004 | 0.842±0.012 | 0.845±0.006 | 0.741±0.044
F1 | Poly2Vec | 0.928±0.015 | 0.927±0.032 | 0.918±0.029 | 0.901±0.017 | 0.899±0.016 | 0.827±0.022

New York

Metric | Method | point-point | point-polyline | point-polygon | polyline-polyline | polyline-polygon | polygon-polygon
Precision | NUFTresnet | - | - | - | - | - | 0.783±0.010
Precision | NUFTspec | - | - | - | - | - | 0.715±0.014
Precision | t2vec | - | - | - | 0.232±0.012 | - | -
Precision | Direct | 0.880±0.003 | 0.767±0.004 | 0.843±0.002 | 0.687±0.003 | 0.794±0.003 | 0.774±0.001
Precision | Tile | 0.293±0.001 | 0.279±0.013 | 0.322±0.005 | 0.280±0.005 | 0.496±0.002 | 0.376±0.026
Precision | Wrap | 0.809±0.004 | 0.684±0.002 | 0.759±0.016 | 0.610±0.021 | 0.781±0.001 | 0.667±0.007
Precision | Grid | 0.872±0.002 | 0.605±0.001 | 0.670±0.040 | 0.441±0.003 | 0.766±0.003 | 0.514±0.074
Precision | Theory | 0.881±0.017 | 0.774±0.007 | 0.809±0.008 | 0.692±0.009 | 0.789±0.005 | 0.538±0.012
Precision | Poly2Vec | 0.921±0.006 | 0.889±0.016 | 0.875±0.004 | 0.889±0.013 | 0.853±0.007 | 0.792±0.009
Recall | NUFTresnet | - | - | - | - | - | 0.747±0.010
Recall | NUFTspec | - | - | - | - | - | 0.685±0.004
Recall | t2vec | - | - | - | 0.253±0.032 | - | -
Recall | Direct | 0.877±0.004 | 0.766±0.005 | 0.836±0.002 | 0.653±0.007 | 0.784±0.003 | 0.694±0.003
Recall | Tile | 0.248±0.001 | 0.257±0.004 | 0.316±0.005 | 0.217±0.001 | 0.466±0.001 | 0.348±0.012
Recall | Wrap | 0.810±0.004 | 0.669±0.001 | 0.759±0.016 | 0.598±0.018 | 0.772±0.002 | 0.602±0.006
Recall | Grid | 0.868±0.002 | 0.590±0.002 | 0.647±0.050 | 0.437±0.002 | 0.752±0.003 | 0.483±0.078
Recall | Theory | 0.783±0.021 | 0.791±0.007 | 0.823±0.008 | 0.709±0.009 | 0.803±0.005 | 0.567±0.012
Recall | Poly2Vec | 0.923±0.017 | 0.894±0.012 | 0.886±0.024 | 0.878±0.013 | 0.875±0.011 | 0.793±0.012
F1 | NUFTresnet | - | - | - | - | - | 0.756±0.010
F1 | NUFTspec | - | - | - | - | - | 0.667±0.023
F1 | t2vec | - | - | - | 0.252±0.018 | - | -
F1 | Direct | 0.876±0.004 | 0.769±0.005 | 0.838±0.002 | 0.656±0.009 | 0.784±0.004 | 0.712±0.002
F1 | Tile | 0.236±0.001 | 0.212±0.011 | 0.288±0.012 | 0.193±0.002 | 0.439±0.002 | 0.339±0.018
F1 | Wrap | 0.809±0.004 | 0.668±0.002 | 0.752±0.017 | 0.590±0.021 | 0.769±0.002 | 0.613±0.005
F1 | Grid | 0.868±0.002 | 0.588±0.002 | 0.649±0.049 | 0.409±0.003 | 0.749±0.003 | 0.460±0.077
F1 | Theory | 0.884±0.017 | 0.752±0.007 | 0.812±0.008 | 0.668±0.009 | 0.756±0.025 | 0.537±0.22
F1 | Poly2Vec | 0.892±0.012 | 0.883±0.014 | 0.903±0.013 | 0.877±0.004 | 0.832±0.003 | 0.769±0.019

Figure 10: Distance scatters of point-polygon pairs on the New York dataset for different encoders.

Figure 11: Distance scatters of point-polyline pairs on the Singapore dataset for different encoders.

Figure 12: Distance scatters of point-polyline pairs on the New York dataset for different encoders.

Figure 13: Distance scatters of point-point pairs on the Singapore dataset for different encoders.

Figure 14: Distance scatters of point-point pairs on the New York dataset for different encoders.

Footnotes

2. We refer to geometries and geospatial objects interchangeably.

3. We use Fourier Transform and CFT interchangeably.

4. For compactness, we use F(u,v) to describe the CFT.

5. The same methodology can be adopted to compute the CFT of multi-polygons.

6. We adopt constrained Delaunay triangulation in this paper.

7. Special cases where u, v, and u+v approach zero are handled separately and presented in Appendix A.2.3.

References

  1. Adams B, McKenzie G, and Gahegan M Frankenplace: interactive thematic mapping for ad hoc exploratory search. In Proceedings of the 24th international conference on world wide web, pp. 12–22, 2015. [Google Scholar]
  2. Balsebre P, Huang W, Cong G, and Li Y City foundation models for learning general purpose representations from openstreetmap. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pp. 87–97, 2024. [Google Scholar]
  3. Basiri A, Haklay M, Foody G, and Mooney P Crowdsourced geospatial data quality: Challenges and future directions, 2019. [Google Scholar]
  4. Berg T, Liu J, Woo Lee S, Alexander ML, Jacobs DW, and Belhumeur PN Birdsnap: Large-scale fine-grained visual categorization of birds. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2011–2018, 2014. [Google Scholar]
  5. Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, Bernstein MS, Bohg J, Bosselut A, Brunskill E, et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021. [Google Scholar]
  6. Chu G, Potetz B, Wang W, Howard A, Song Y, Brucher F, Leung T, and Adam H Geo-aware networks for fine-grained recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp. 0–0, 2019. [Google Scholar]
  7. Clementini E, Di Felice P, and Van Oosterom P A small set of formal topological relationships suitable for enduser interaction. In International symposium on spatial databases, pp. 277–295. Springer, 1993. [Google Scholar]
  8. Couclelis H. Artificial intelligence in geography: Conjectures on the shape of things to come. The professional geographer, 38(1):1–11, 1986. [Google Scholar]
  9. Feng J, Li Y, Zhang C, Sun F, Meng F, Guo A, and Jin D Deepmove: Predicting human mobility with attentional recurrent networks. In Proceedings of the 2018 world wide web conference, pp. 1459–1468, 2018. [Google Scholar]
  10. Gao S, Hu Y, and Li W Handbook of geospatial artificial intelligence, 2023. [Google Scholar]
  11. Gaskill JD Linear systems, Fourier transforms, and optics. John Wiley & Sons, 1978. [Google Scholar]
  12. Huang J, Wang H, Sun Y, Shi Y, Huang Z, Zhuo A, and Feng S Ernie-geol: A geography-and-language pre-trained model and its applications in baidu maps. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3029–3039, 2022. [Google Scholar]
  13. Janowicz K, Gao S, McKenzie G, Hu Y, and Bhaduri B Geoai: spatially explicit artificial intelligence techniques for geographic knowledge discovery and beyond, 2020. [Google Scholar]
  14. Jiang C, Lansigan D, Marcus P, Nießner M, et al. Ddsl: Deep differentiable simplex layer for learning geometric signals. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8769–8778, 2019a. [Google Scholar]
  15. Jiang C, Wang D, Huang J, Marcus P, Nießner M, et al. Convolutional neural networks on non-uniform geometrical signals using euclidean spectral transformation. arXiv preprint arXiv:1901.02070, 2019b. [Google Scholar]
  16. Jokar Arsanjani J, Zipf A, Mooney P, and Helbich M An introduction to openstreetmap in geographic information science: Experiences, research, and applications. OpenStreetMap in GIScience: Experiences, research, and applications, pp. 1–15, 2015. [Google Scholar]
  17. Kyrkou C, Kolios P, Theocharides T, and Polycarpou M Machine learning for emergency management: A survey and future outlook. Proceedings of the IEEE, 111 (1):19–41, 2022. [Google Scholar]
  18. Lee J-G and Kang M Geospatial big data: challenges and opportunities. Big Data Research, 2(2):74–81, 2015. [Google Scholar]
  19. Li X, Zhao K, Cong G, Jensen CS, and Wei W Deep representation learning for trajectory similarity computation. In 2018 IEEE 34th international conference on data engineering (ICDE), pp. 617–628. IEEE, 2018. [Google Scholar]
  20. Li Y, Yu R, Shahabi C, and Liu Y Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926, 2017. [Google Scholar]
  21. Li Y, Huang W, Cong G, Wang H, and Wang Z Urban region representation learning with openstreetmap building footprints. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1363–1373, 2023. [Google Scholar]
  22. Mac Aodha O, Cole E, and Perona P Presence-only geographical priors for fine-grained image classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9596–9606, 2019. [Google Scholar]
  23. Mai G, Janowicz K, Yan B, Zhu R, Cai L, and Lao N Multi-scale representation learning for spatial feature distributions using grid cells. arXiv preprint arXiv:2003.00824, 2020. [Google Scholar]
  24. Mai G, Huang W, Sun J, Song S, Mishra D, Liu N, Gao S, Liu T, Cong G, Hu Y, et al. On the opportunities and challenges of foundation models for geospatial artificial intelligence. arXiv preprint arXiv:2304.06798, 2023a. [Google Scholar]
  25. Mai G, Jiang C, Sun W, Zhu R, Xuan Y, Cai L, Janowicz K, Ermon S, and Lao N Towards general-purpose representation learning of polygonal geometries. GeoInformatica, 27(2):289–340, 2023b.
  26. Mirowski P, Grimes M, Malinowski M, Hermann KM, Anderson K, Teplyashin D, Simonyan K, Kavukcuoglu K, Zisserman A, and Hadsell R Learning to navigate in cities without a map. In Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, and Garnett R (eds.), Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. URL https://proceedings.neurips.cc/paper_files/paper/2018/file/e034fb6b66aacc1d48f445ddfb08da98-Paper.pdf.
  27. Papamichalopoulos M, Papadakis G, Mandilaras G, Siampou M, Mamoulis N, and Koubarakis M Three-dimensional geospatial interlinking with JedAI-spatial. Journal of Web Semantics, 81:100817, 2024.
  28. Punjani D, Singh K, Both A, Koubarakis M, Angelidis I, Bereta K, Beris T, Bilidas D, Ioannidis T, Karalis N, et al. Template-based question answering over linked geospatial data. In Proceedings of the 12th workshop on geographic information retrieval, pp. 1–10, 2018.
  29. Qi CR, Su H, Mo K, and Guibas LJ PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 652–660, 2017.
  30. Rao J, Gao S, Kang Y, and Huang Q LSTM-TrajGAN: A deep learning approach to trajectory privacy protection. arXiv preprint arXiv:2006.10521, 2020.
  31. Smith TR Artificial intelligence and its applicability to geographical problem solving. The Professional Geographer, 36(2):147–158, 1984.
  32. Sun J, Zheng Y, Hao J, Meng Z, and Liu Y Continuous multiagent control using collective behavior entropy for large-scale home energy management. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 922–929, 2020.
  33. Tang K, Paluri M, Fei-Fei L, Fergus R, and Bourdev L Improving image classification with location context. In Proceedings of the IEEE international conference on computer vision, pp. 1008–1016, 2015.
  34. Vaswani A. Attention is all you need. Advances in Neural Information Processing Systems, 2017.
  35. van 't Veer R, Bloem P, and Folmer E Deep learning for classification tasks on geospatial vector polygons. arXiv preprint arXiv:1806.03857, 2018.
  36. Wu S, Yan X, Fan X, Pan S, Zhu S, Zheng C, Cheng M, and Wang C Multi-graph fusion networks for urban region embedding. arXiv preprint arXiv:2201.09760, 2022.
  37. Xu Y, Piao Z, and Gao S Encoding crowd interaction with deep neural network for pedestrian trajectory prediction. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5275–5284, 2018.
  38. Xue H, Salim F, Ren Y, and Oliver N MobTCast: Leveraging auxiliary trajectory forecasting for human mobility prediction. Advances in Neural Information Processing Systems, 34:30380–30391, 2021.
  39. Yin Y, Liu Z, Zhang Y, Wang S, Shah RR, and Zimmermann R GPS2Vec: Towards generating worldwide GPS embeddings. In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 416–419, 2019.
  40. Yu D, Hu Y, Li Y, and Zhao L PolygonGNN: Representation learning for polygonal geometries with heterogeneous visibility graph. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 4012–4022, 2024.
  41. Zahn CT and Roskies RZ Fourier descriptors for plane closed curves. IEEE Transactions on computers, 100(3):269–281, 1972.
  42. Zhang M, Li T, Li Y, and Hui P Multi-view joint graph representation learning for urban region embedding. In Proceedings of the twenty-ninth international conference on international joint conferences on artificial intelligence, pp. 4431–4437, 2021.
  43. Zhang W, Han J, Xu Z, Ni H, Liu H, and Xiong H Urban foundation models: A survey. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 6633–6643, 2024.
  44. Zhong ED, Bepler T, Davis JH, and Berger B Reconstructing continuous distributions of 3D protein structure from cryo-EM images. arXiv preprint arXiv:1909.05215, 2019.
  45. Zhou Y and Tuzel O VoxelNet: End-to-end learning for point cloud based 3D object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4490–4499, 2018.
