2026 Mar 3; 2026(1):19. doi: 10.1186/s13636-026-00449-2

The trajectoRIR database: room acoustic recordings along a trajectory of moving microphones

Stefano Damiano 1,✉,#, Kathleen MacWilliam 1,#, Valerio Lorenzoni 1, Thomas Dietzen 2, Toon van Waterschoot 1
PMCID: PMC13061821  PMID: 41970287

Abstract

Data availability is essential in the development of acoustic signal processing algorithms, especially for data-driven approaches that demand large and diverse training datasets. For this reason, an increasing number of databases have been published in recent years, including either room impulse responses (RIRs) or audio recordings during motion. In this paper we introduce the trajectoRIR database, an extensive, multi-array collection of both dynamic and stationary acoustic recordings along a controlled trajectory in a room. Specifically, the database contains moving-microphone recordings and stationary RIRs that spatially sample the room acoustics along an L-shaped trajectory. This combination makes trajectoRIR unique and applicable to a wide range of tasks, including sound source localization and tracking, spatially dynamic sound field reconstruction, auralization, and system identification. The recording room has a reverberation time of 0.5 s, and the three microphone configurations employed comprise a dummy head with additional reference microphones located next to the ears, three first-order Ambisonics microphones, two circular arrays of 16 and 4 channels, and a 12-channel linear array. The motion of the microphones was achieved using a robotic cart traversing a 4.62 m-long rail at three speeds: [0.2, 0.4, 0.8] m/s. Audio signals were reproduced using two stationary loudspeakers. The collected database features 8648 stationary RIRs, as well as perfect sweeps, speech, music, and stationary noise recorded during motion. Python functions are provided to access the recorded audio and retrieve the associated geometric information.

Keywords: Room acoustic database, Room impulse responses, Moving microphone arrays, Sound field reconstruction, Dynamic acoustic scenes

Introduction

Multi-microphone signal processing is becoming essential for an increasing number of room acoustics applications, ranging from telepresence, virtual room navigation, sound zoning, and spatial audio reproduction to assistive hearing for hearing-impaired people and robot audition. These applications inherently involve dynamic acoustic scenes in which both listeners (microphones) and emitters (sound sources) are free to move in the environment, typically a room. In this context, several challenges arise, including room parameter estimation [1], sound source localization and tracking [2], auralization [3], speech enhancement [4], echo cancellation [5], and sound field estimation [6].

Broad data availability is a key requirement for developing and evaluating algorithms, especially given the recent surge in machine and deep learning techniques that achieve state-of-the-art results when tackling the aforementioned challenges [2, 4, 7–9]. Room acoustics research involves both stationary scenes, where sources and microphones are in fixed positions within a room, and dynamic scenes, where they are free to move in the space.

To target applications involving stationary settings, static audio recordings or room impulse responses (RIRs) are required, which can be either real or simulated. High-quality datasets containing music [10], speech [10–13], and babble or cocktail-party noise [12, 14] have been collected, but they fail to represent dynamic acoustic scenes. Additionally, several recorded RIR datasets involving various microphone array configurations exist [10–24] and are widely used to generate synthetic data. Simulations are, in fact, an inviting tool for generating large amounts of data under arbitrary acoustic conditions, including variations in array configurations, room parameters, signal types, and source or microphone trajectories. In room acoustics research, an established way to generate synthetic data is to convolve source signals with RIRs, which enables the creation of complex acoustic scenes involving a spatial distribution of sound sources. In this case, the adopted RIRs are either recorded, resulting in higher realism at the cost of more limited control over room parameters, or simulated, allowing the creation of arbitrary rooms and acoustic scenes at the price of more limited physical accuracy. As a drawback, simulated environments often deviate from real-world conditions, which may result in algorithms that do not robustly generalize to real scenarios.
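The RIR-convolution approach described above can be sketched as follows; the function name and toy signals are illustrative and not part of the database tooling.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_static_scene(sources, rirs):
    """Sum of each dry source convolved with its RIR: the standard way
    to build a synthetic stationary acoustic scene from measured or
    simulated RIRs."""
    length = max(len(s) + len(h) - 1 for s, h in zip(sources, rirs))
    mix = np.zeros(length)
    for s, h in zip(sources, rirs):
        wet = fftconvolve(s, h)  # full linear convolution
        mix[:len(wet)] += wet
    return mix

# Toy example: two random "sources" and two short "RIRs" (illustrative only)
fs = 48000
s1, s2 = np.random.randn(fs // 10), np.random.randn(fs // 10)
h1 = np.array([1.0, 0.0, 0.5])   # direct path plus one "reflection"
h2 = np.array([0.8, 0.2, 0.0])
mix = render_static_scene([s1, s2], [h1, h2])
```

In practice, one RIR per source–microphone pair is used and the per-source convolutions are summed per microphone channel.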

To address use-cases involving dynamic settings, different datasets that capture dynamic acoustic scenes also exist, tailored for either audio [25–30] or multi-modal applications [31–33]. These corpora are designed for the task of sound source localization and tracking and include either speech, noise, or environmental sound events collected using different array geometries. These geometries comprise planar [27–30], circular [30, 31], cubic [26], and spherical [27] arrays, as well as ambisonic microphones [28, 29], dummy heads (DH) [27, 32, 33], and robot heads [25]. Since these datasets are designed for tasks involving motion, they do not contain stationary RIRs along the motion path. Such RIRs are essential for applications like sound field interpolation or room parameter estimation involving dynamic scenes, where synthetic data remains a common alternative [1, 6]. However, while there exist established procedures to fairly accurately synthesize these types of signals in static conditions under mild physical assumptions [34, 35], synthesizing audio under motion is a challenging task as it requires either a precise spatiotemporal room acoustic model, or RIR interpolation from real recorded data.

For several applications, including time-variant RIR estimation [1], spatially dynamic sound field reconstruction [30], auralization [3], and the evaluation of dynamic audio simulations, there is a clear need for recordings of audio under motion with corresponding stationary RIRs along the motion path. While some existing datasets provide either stationary or dynamic audio data, a database containing matching recordings for both remains unavailable.

In an effort to bridge this gap and fuse the potential of RIR databases and audio recordings during motion, in this paper we introduce trajectoRIR: an extensive, multi-array database of stationary and dynamic acoustic recordings performed along a trajectory in a reverberant room. At the core of the trajectoRIR database lies a smooth L-shaped trajectory built using a rail system, on which a robotic cart is used to move microphones in a precise and reproducible manner. Both stationary RIRs, recorded at finely spaced positions along the trajectory, and audio recordings during motion of the cart are included in the database. The recordings are obtained using three microphone array configurations: the first one (MC1) consists of a DH with in-ear omnidirectional microphones, two omnidirectional microphones located next to the ear canals, a 16-channel uniform circular array located around the DH at the same height as the ear canals, and a second, 4-channel uniform circular array located above the head. The second one (MC2) is identical to MC1 but without the DH. The third configuration (MC3) consists of three first-order Ambisonics (FOA) microphones and a 12-channel uniform linear array (ULA). These configurations have been chosen because they include standard microphone arrays that are widely employed in room acoustics and have been used for the collection of several existing RIR databases. This diversity thus enables the potential use of the proposed database in combination with other data collections, which makes it attractive for training data-driven algorithms [7].

The recordings were performed in the Alamire Interactive Laboratory (AIL) [36], located at the Park Abbey in Heverlee, Belgium; the recording room has a reverberation time of 0.5 s [14]. Two static loudspeaker positions were used for the recordings. For stationary recordings, a total of 8648 RIRs were collected at equally spaced positions along the trajectory, with inter-position distances depending on the microphone configuration used. For the moving-microphone recordings, the robotic cart was moved in a single direction at three different constant speeds in the walking-speed range. During this movement, several signals were played from the speakers, including a piano piece, a drum beat, female speech, a white noise signal, and two perfect sweep signals covering different frequency ranges. A total of 108 multi-channel recordings were obtained, 36 per microphone array configuration. In addition, the ego-noise of the cart and rail system was recorded and added to the database in order to allow for the estimation of noise statistics and to promote the development of ego-noise reduction algorithms. Alongside the recorded audio, extensive metadata is provided for all the recordings in accompanying csv files, including geometrical information, speed information for the moving recordings, and temperature data.

To showcase the collected data, a systematic evaluation on the use-case of time-variant RIR estimation is introduced in Sect. 8. In this evaluation, RIRs along the motion trajectory are estimated using (i) the (sparse) stationary RIR recordings collected at fixed positions; (ii) a moving-microphone recording; (iii) a combination of both. Results indicate that the RIRs reconstructed from a combination of stationary and moving audio data agree best with the recorded data overall, confirming the importance of the proposed database featuring matching recordings.

The trajectoRIR database contains a total of 3.4 h of audio recorded at 48 kHz with 24-bit resolution, amounting to 7.47 GB. All material, together with source signals and Python scripts to use the data and retrieve geometrical information, is available at [37].

The rest of the paper is organized as follows. In Sect. 2, we give an overview of the AIL room. In Sect. 3, we describe the recording equipment used. In Sect. 4, we introduce the rail system and the robotic cart used to obtain the measurements. In Sect. 5, we describe the microphone arrays and loudspeaker configurations used to capture the data. In Sect. 6, we detail the input signals used for the recordings and provide information on the recording configurations for the stationary and moving scenarios. In Sect. 7, we give instructions for accessing the data and retrieving geometry information using the Python scripts, and provide examples of recorded signals. In Sect. 8 we present a systematic evaluation of the database on the use-case of time-variant RIR estimation. We finally summarize the database in Sect. 9.

Room description

The AIL [36] room, also used in the MYRiAD database [14] and shown in Fig. 1, is a laboratory located in the Saint Norbert’s gate of the Park Abbey in Heverlee, Belgium. The laboratory is approximately a shoebox-shaped room of dimensions 6.4 m × 6.9 m × 4.7 m, with the exception of a staircase leading to the upper floor of the building, and has an approximate volume of 208 m³. The floor and ceiling are made of wood, and thin lime-plastered brick walls surround the room. The two shortest walls are each interrupted by two windows of around 3.3 m². On the longest sides there are two wide passages to adjacent rooms, which were closed off by curtains during all recordings. The staircase has a plastered housing with a glass railing and wooden stairs. The room has a reverberation time T20 = 0.5 s, estimated according to the ISO 3382-1 standard as detailed in [14]. The room also has permanent audio equipment installed, but this was not used for this database. Details on the hardware used for the recordings are given in Sects. 4 and 5.

Fig. 1. View of the recording setup in the AIL room, with the MC2 array configuration

Recording equipment

A detailed list of the equipment employed for recording the database is provided in Table 1. The recording chains used for the moving and RIR recordings are similar. The input signal was sent to the loudspeakers using Adobe Audition for the RIR recordings, and via a Python script for the recordings during motion. The Python script jointly controlled the audio playback and the motion of the robotic cart, as detailed in [38]. In both cases, the loudspeaker signal was sent via USB to the RME Digiface, routed to the RME M-32 DA using the ADAT protocol, and finally sent to the Genelec 8030 CP loudspeakers. Microphone signals were sent to an RME Micstasy, converted to ADAT, and routed to the RME Digiface, where they were acquired using Adobe Audition for the RIR recordings and the Python script for recordings during motion. In both cases, the acquired signals were stored as audio files in the wav format. The total measured latency of the system was compensated in the RIRs. Both MATLAB and Python scripts were used for post-processing. Details on the cart and rail system used for the recordings during motion are provided in Sect. 4.

Table 1. List of recording equipment used for creating the database

Type | Product | Mic. config.
Hardware – Reproduction | Loudspeakers: Genelec 8030 CP | –
Hardware – Reproduction | DA converters: RME M-32 DA | –
Hardware – Acquisition | Microphones: DPA 4090 | MC1, MC2, MC3
Hardware – Acquisition | Microphones: AKG CK32 | MC1, MC2
Hardware – Acquisition | Microphones: Neumann KU-100 DH | MC1
Hardware – Acquisition | Microphones: Rode NT-SF1 | MC3
Hardware – Acquisition | AD converters/pre-amplifiers: RME Micstasy | –
Hardware – Acquisition | Digital interface: RME Digiface USB audio interface, Apple iMac | –
Software | Reproduction/acquisition: Adobe Audition, Python | –
Software | Post-processing: MATLAB, Python | –

Spatial setup

In this section, we describe the spatial setup of the trajectory and the loudspeakers used to record the database. Throughout the paper, all the provided geometrical information will be relative to the rail system defining the trajectory. The origin of the coordinate system is defined at the start of the trajectory, with the y axis pointing towards the front (i.e., the direction of motion at the start position, see Fig. 2). In Fig. 3, we sketch the positioning of the rail within the room and report the room dimensions, the distances of the origin from the walls of the room, and the orientation of the trajectory. This information has been retrieved using measurements of the distances between two reference points on the rail and four reference points on the back wall of the room (the one opposite to the staircase) with known coordinates, exploiting Euclidean distance matrices [39]. The geometry of the system can be visualized using the accompanying code described in Sect. 7.
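As an illustration of recovering a position from measured distances to known reference points, the sketch below uses plain linearized least squares; this is a simplified stand-in for the Euclidean-distance-matrix approach of [39], and all names and coordinates are hypothetical.

```python
import numpy as np

def locate_from_distances(refs, dists):
    """Least-squares position estimate from distances to known reference
    points, obtained by subtracting the first distance equation to
    linearize the problem (a simplified alternative to the EDM method)."""
    refs = np.asarray(refs, dtype=float)
    d = np.asarray(dists, dtype=float)
    p0, d0 = refs[0], d[0]
    # ||q - p_i||^2 = d_i^2, minus the i = 0 equation, gives a linear system
    A = 2.0 * (refs[1:] - p0)
    b = (np.sum(refs[1:] ** 2, axis=1) - np.sum(p0 ** 2)
         - d[1:] ** 2 + d0 ** 2)
    q, *_ = np.linalg.lstsq(A, b, rcond=None)
    return q

# Hypothetical reference points (e.g., markers on a wall) and a point to recover
refs = [[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
q_true = np.array([1.0, 2.0, 1.0])
dists = [np.linalg.norm(q_true - np.asarray(r)) for r in refs]
q_est = locate_from_distances(refs, dists)
```

With noise-free distances and well-spread references, the linearized solution is exact; with measurement noise, more references improve the least-squares fit.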

Fig. 2. Scheme of the trajectory built using the rail system and used to record the database. The direction of the Cartesian axes is also reported for reference (the actual coordinate system is centered at P1, as indicated by the red arrow). Loudspeaker positions are labeled SL (loudspeaker left) and SR (loudspeaker right). All indicated dimensions are approximate and for illustrative purposes only: accurate geometrical information is provided in the database

Fig. 3. Positioning of the rail system within the AIL room. The absolute coordinates of all positions and of the two loudspeakers can be retrieved using the geometrical information provided in the database

Rail system configuration

The trajectoRIR database relies on the use of a rail system to define a trajectory in the AIL room, along which all (stationary and moving) recordings are performed. The rail system follows the modular design introduced in [38]: all rail components are built using medium-density fiberboard (MDF), cut with a thickness of 6mm using a Trotec Laser Cutter. The track consists of modular blocks that can be assembled to create a custom trajectory shape, along which a robotic cart moves carrying the microphone arrays used for the recordings. The CAD files used to create all the rail and cart blocks are available at [40].

For the trajectoRIR database, we assembled a smooth L-shaped trajectory consisting of two straight segments connected by a curved segment, as depicted in Fig. 2. The trajectory is positioned at the center of the AIL room, with the straight segments oriented such that they are not parallel to any of the walls, as depicted in Fig. 3. The rail system has a width of approximately 16.95 cm, and all distances reported in this paper are measured at the outer margin of the outer rail. The trajectory spans from the right end to the left end, covering a total length of approximately 4.4 m measured at the center of the rail. Note that the total length of the rail is longer, as a margin between the last sampling position and the end of the rail track is required to allow the cart to move. The straight segments each have a length of approximately 1.41 m, whereas the curved segment has a radius of approximately 1 m and covers an angle of 90°.

The rail is mounted on a series of 10 equally spaced stands that ensure the stability of the system. The height of the rail and cart system is set to 1.26 m. The microphone arrays are built using MDF supports, mounted on top of the DH in the MC1 configuration and on a rod in the MC2 and MC3 configurations. To both record RIRs and track the movement of the robotic cart during moving recordings, we manually marked a series of 92 positions along the trajectory, spaced by approximately 5 cm. Details of the microphone array setups are provided in Sect. 5, whereas position and speed labels are provided in Table 6.
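Using the approximate dimensions given above (1.41 m straight segments, 1 m curve radius, 90° arc), the trajectory can be parametrized by arc length. This is only an illustrative sketch that assumes a left-hand turn and a path starting at the origin along +y; the accurate geometry should always be taken from the database metadata.

```python
import numpy as np

L_STRAIGHT = 1.41              # length of each straight segment [m] (approx.)
R_CURVE = 1.0                  # curve radius [m] (approx.)
ARC = np.pi / 2 * R_CURVE      # length of the 90-degree curved segment

def path_position(s):
    """Map arc length s [m] from the trajectory start to (x, y) in a
    local frame (start at origin, initial motion along +y, left turn)."""
    if s <= L_STRAIGHT:                       # first straight segment (+y)
        return np.array([0.0, s])
    if s <= L_STRAIGHT + ARC:                 # curved segment
        theta = (s - L_STRAIGHT) / R_CURVE
        center = np.array([-R_CURVE, L_STRAIGHT])
        return center + R_CURVE * np.array([np.cos(theta), np.sin(theta)])
    t = s - L_STRAIGHT - ARC                  # second straight segment (-x)
    return np.array([-R_CURVE - t, L_STRAIGHT + R_CURVE])

total = 2 * L_STRAIGHT + ARC   # about 4.39 m, matching the reported ~4.4 m
```

With the marked positions spaced by about 5 cm, position P[n] would sit near arc length (n - 1) * 0.05 m under these assumptions.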

Table 6. Position and speed labels

Recording set | Microphone config. | Label | Description
STAT | MC1, MC2 | P[n] | n ∈ {1, 3, 5, …, 91}
STAT | MC3 | P[n] | n ∈ {1, 2, 3, …, 92}
MOV | – | V1 | 0.2 m/s
MOV | – | V2 | 0.4 m/s
MOV | – | V3 | 0.8 m/s

Loudspeaker configurations

Two loudspeakers are placed in the AIL room as sound sources for the recordings. They are located at fixed positions on opposite sides of the trajectory and are named according to their respective sides. The left loudspeaker (SL) and right loudspeaker (SR) are positioned within the inner and outer region of the curve, respectively. This positioning ensures a direct path from SL to the left ear of the DH and from SR to the right ear. Moreover, due to the shape of the curve, the microphones move closer to SR while symmetrically moving away from SL along the first rectilinear segment before the curve, and vice versa afterward. Both loudspeakers are oriented towards the curved part of the rail trajectory. According to the directivity patterns provided by the manufacturer, a maximum drop of 5dB in the direction of the two ends of the trajectory (i.e., P1 and P92) is expected at high frequencies. A scheme of the location of the two loudspeakers with respect to the trajectory is shown in Fig. 2. The position of the two loudspeakers can be extracted using the accompanying code. Both loudspeakers are located at a height of 1.58m, measured from the floor to the top of the lowest cone.

Microphone configurations

In this section, we describe the microphone arrays used for the recordings along the trajectory. All microphone and loudspeaker labels are summarized in Table 2, and the coordinates of all microphone positions along the trajectory can be retrieved using the accompanying code, as will be detailed in Sect. 7. The exact positioning of the microphones on the cart was performed using a crossline laser. All microphone array supports are mounted at a height of 1.782 m. For MC1, this corresponds to the height from the floor to the top of the DH and approximates the height of an average person. A picture of the mounted system, including the MC2 array configuration, is shown in Fig. 1.

Table 2. Microphone and loudspeaker labels

Mic. type | Label | Description
Dummy head | DHL, DHR | Left ear, right ear
Reference microphones | RFL, RFR | Reference microphone left, reference microphone right
Circular microphone array | UCA[n] | [n] ∈ {1, 2, …, 16}, indexed as depicted in Fig. 4a
Crown array | CR[n] | [n] ∈ {1, 2, …, 4}, indexed as depicted in Fig. 4a
Linear microphone array | ULA[n] | [n] ∈ {1, 2, …, 12}, indexed as depicted in Fig. 4c
Ambisonics | A[n]_[a][b] | [n] ∈ {1, 2, 3}, indexed as depicted in Fig. 4c; individual capsules indexed by [a][b], [a] ∈ {L, R}, [b] ∈ {F, B}
Loudspeakers | SL, SR | Left loudspeaker, right loudspeaker

MC1 and MC2

The first two microphone configurations have similar characteristics, with the only difference being that the DH is present in MC1 but not in MC2. The configurations contain:

  • The in-ear microphones of the DH (only for MC1);

  • Two reference (RF) DPA 4090 microphones located in front of the ear canals of the DH and at a horizontal distance of 1cm;

  • A uniform circular array (UCA) of 16 DPA 4090 microphones, with a radius of 20cm at the height of the ear canals;

  • A “crown array” (CR), consisting of 4 AKG CK32 microphones placed uniformly on a circle with radius 10.5cm, and a height corresponding to the top of the DH.

The two configurations are mounted on an MDF support at the same height, corresponding to the top of the DH in MC1. The DPA 4090 microphones face downwards, with their capsules aligned at the same height as the in-ear microphones of the DH, 164.8cm above the floor. The AKG CK32 microphones are mounted facing upwards, with the capsules at a height of 181.6cm.

Figure 4a and b show the photos and schematics of the two configurations, with the DH facing towards 0° (the nose of the DH is marked by a red triangle).

Fig. 4. Pictures (left column) and top view of the polar plots (right column) of the MC1 (a), MC2 (b), and MC3 (c) microphone array configurations

MC3

The third configuration, MC3, consists of a ULA of 12 DPA 4090 microphones with an inter-microphone spacing of 5 cm, and a circular array of three Rode NT-SF1 Ambisonics microphones with a radius of 20 cm, as depicted in Fig. 4c. The configuration is mounted on an MDF support with all microphones facing downwards. The DPA 4090 capsules and the centers of the Rode NT-SF1 microphones are positioned at a height of 164.8 cm above the ground, consistent with MC1 and MC2. The Rode NT-SF1 microphones are oriented radially, with the front microphones facing outwards. Note that the four capsules are identified by their direction (right-front, left-front, right-back, left-back, defined for the microphone oriented upwards and viewed from the front).

Array installation on cart

The configurations are mounted on the cart with the 0° direction pointing in the direction of movement. In other words, the schematics plotted in Fig. 4 represent relative coordinates that are translated and rotated with respect to the absolute Cartesian coordinate system during movement. The absolute coordinates of the microphones are available in the metadata, as discussed in Sect. 7.2. This results in the DH emulating a person walking along the trajectory, with the left (right) ear on the same side as the left (right) loudspeaker. Due to a horizontal offset between the top and the bottom mounting screws of the dummy head, MC1 is centered 1.85 cm in front of the mounting point on the cart, resulting in a small offset of the RIR measurement positions between the MC1 and MC2 configurations.
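The relative-to-absolute mapping described above amounts to a planar rotation by the cart heading followed by a translation. The sketch below is illustrative only (the function name and frame conventions are assumptions); the database metadata should be used for actual coordinates.

```python
import numpy as np

def array_to_room_coords(mic_rel_xy, cart_xy, heading_rad):
    """Map microphone positions from the array's local frame (0 deg =
    direction of motion) to the rail coordinate system via a planar
    rotation by the cart heading followed by a translation."""
    c, s = np.cos(heading_rad), np.sin(heading_rad)
    rot = np.array([[c, -s], [s, c]])      # 2-D rotation matrix
    return np.asarray(mic_rel_xy) @ rot.T + np.asarray(cart_xy)

# A mic 1 m ahead of the cart, cart at (2, 3) heading along +y (pi/2):
# the mic lands at (2, 4) in the room frame
mic_room = array_to_room_coords([[1.0, 0.0]], [2.0, 3.0], np.pi / 2)
```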

Recorded signals

The trajectoRIR database contains 3.4 h of audio recordings, with a total size of 1.25 GB for stationary recordings and 6.22 GB for moving recordings. All recordings were captured at a sampling rate of 48 kHz with a 24-bit resolution per sample. All recordings were made in the AIL room, where two sides of the room were enclosed with heavy curtains to reduce reflections and external noise. Note that the T20 = 0.5 s of the room was estimated from recordings made with the curtains in place; their effect is therefore already included in the estimate [14].

The stationary recordings (STAT) involved capturing RIRs between each loudspeaker (SL and SR) and the various microphone configurations at marked positions along the trajectory. The recordings during motion (MOV) captured six distinct source signals from each loudspeaker while the microphone array configurations moved along the trajectory at three different speeds. To ensure similar signal levels, gains were applied during post-processing to all microphone signals such that the reverberant tails of the measured RIRs were approximately equal across microphones. Furthermore, system latencies between all recorded and source signals were compensated as detailed in Sects. 6.1 and 6.2.2.

A summary of all recorded and computed signals is provided in Table 3. The subsequent sections detail the recording and processing methodologies for both the RIRs and moving microphone signals.

Table 3. Signals recorded and computed in the database

Recording set | Signal | Type | Source | Acquisition | Label
STAT | Sine sweep | Meas. | Generated | Playback + record | –
STAT | RIR | RIR | Sine sweeps | Computed [41] | RIR
MOV | Perfect sweep (up to 1 kHz) | Meas. | Generated | Playback + record | PS1
MOV | Perfect sweep (up to 8 kHz) | Meas. | Generated | Playback + record | PS8
MOV | White noise | Noise | Generated | Playback + record | WN
MOV | Female speaker | Speech | [42] | Playback + record | SP
MOV | Drums | Music | [43] | Playback + record | DR
MOV | Piano | Music | Recorded | Playback + record | PI

Room impulse responses

To obtain the RIRs, two exponential sine sweeps were played sequentially by one speaker at a time and recorded using the various microphone configurations at each marked position along the track. Microphone array configurations MC1 and MC2 sampled 46 positions along the trajectory, spaced approximately 10cm apart, while configuration MC3 sampled 92 positions with a finer spacing of approximately 5cm. In total, 8648 RIRs were captured across all configurations and loudspeaker-microphone combinations. The playback and recording processes were managed using Adobe Audition, with each microphone in the relevant configuration assigned to a separate channel.

The recorded sweeps were processed using cross-correlation to compute the RIRs, following the method outlined in [41]. From the two resulting sets of RIRs computed from the two recorded sweeps, we selected only one for the database according to the following procedure. The sets of RIRs were convolved with the original sweeps, and the set yielding the closest match between the result of this convolution and the recorded sweeps was chosen, with the normalized misalignment serving as the criterion. Additionally, system latency was measured using a loop-back signal and compensated for in the RIRs during post-processing.
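For intuition, the sketch below illustrates RIR estimation by cross-correlation in the simplest case of a white-noise excitation, for which the cross-correlation between excitation and recording directly recovers the impulse response. The actual database RIRs were computed from exponential sweeps with the method of [41]; this is only a conceptual stand-in with made-up signals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative ground truth: a decaying random 64-tap "RIR"
M = 64
h_true = rng.standard_normal(M) * np.exp(-np.arange(M) / 16.0)

# White-noise excitation and the corresponding "recorded" signal
x = rng.standard_normal(48000)
y = np.convolve(x, h_true)

# For unit-variance white noise, E[x(k) y(k + m)] = h(m), so time-averaging
# the cross-correlation recovers the RIR
N = len(x)
h_est = np.array([np.dot(x, y[m:m + N]) for m in range(M)]) / N
```

The estimation error shrinks as the excitation gets longer, which is one reason long sweeps and perfect sweeps are attractive in practice.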

Figure 5 shows a stacked plot of the RIRs computed at the 92 positions along the trajectory using the ULA1 microphone of the MC3 configuration and SL. The varying distance between the microphone and the loudspeaker is clearly visible from the varying time of arrival of the direct component at the different positions.

Fig. 5. Stacked plot of RIRs along the trajectory, computed using the ULA1 microphone of the MC3 configuration and the SL loudspeaker

Recordings during motion

In the recordings captured during motion, the microphone array configurations traveled along the trajectory illustrated in Fig. 2, moving at three constant speeds: 0.2m/s, 0.4m/s, and 0.8m/s. The movement of the cart was controlled via a Python-based web server as described in [38]. For each combination of microphone configuration, loudspeaker, and speed, six different source signals were played and recorded. These source signals comprised a piano piece (PI), a drum track (DR), a female speech signal (SP), a white noise signal (WN), and two perfect sweeps [44] (PS1 and PS8) extending up to 1kHz and 8kHz, respectively.

In total, 108 multichannel recordings were made, with 36 recordings for each microphone configuration. To support noise analysis, additional recordings were made capturing only the mechanical noise from the cart’s movement (labeled ‘Ego-noise’) at the three speeds for the MC3 microphone configuration. Figure 6 shows spectrograms of the recorded ego-noise on microphone ULA1, highlighting the non-stationarity of the background noise across different speeds. Figure 7 displays the power spectral density (PSD) of all recorded signals in MC3, including ego-noise, computed using the Welch method. The results indicate that the majority of the noise energy is concentrated below 250 Hz, and that the recording at 0.4m/s exhibits the lowest noise floor.
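A Welch PSD like the one underlying Fig. 7 can be computed with scipy.signal.welch; the snippet below uses synthetic low-frequency-heavy noise as a stand-in for a recorded ego-noise channel, so the specific signal and parameter choices are illustrative.

```python
import numpy as np
from scipy.signal import welch

fs = 48000
rng = np.random.default_rng(1)

# Stand-in for one second of an ego-noise channel: integrated white noise
# is low-frequency heavy, mimicking noise concentrated below ~250 Hz
noise = np.cumsum(rng.standard_normal(fs))
noise -= noise.mean()

# Welch PSD estimate (averaged periodograms over overlapping segments)
f, psd = welch(noise, fs=fs, nperseg=4096)
low_energy = psd[f <= 250].mean()
high_energy = psd[f > 250].mean()
```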

Fig. 6. Spectrograms of the ‘Ego-noise’ recording on microphone ULA1 for MC3 at (a) 0.2 m/s, (b) 0.4 m/s, and (c) 0.8 m/s

Fig. 7. PSD of the various recorded signals on microphone ULA1, using SL and MC3 moving at (a) 0.2 m/s, (b) 0.4 m/s, and (c) 0.8 m/s

Measurement position timestamps

To track the microphone positions along the trajectory in relation to time within the recorded signal, an iPad was mounted on the moving cart to record slow-motion video of the rail below at 240 frames per second (FPS). Using DaVinci Resolve, timestamps were manually added to the footage whenever a fixed reference point on the cart reached an RIR measurement position along the trajectory. The audio track from the video was extracted, and a reference microphone was chosen for each microphone configuration (UCA1 for MC1 and MC2, and A1_RF for MC3). Cross-correlation was then used to estimate the delay between the audio track from the video and the corresponding reference microphone signal. This delay was used to map the video timestamps, and hence the passages of the RIR measurement positions, onto the timeline of the microphone signals. The recorded signal was then truncated to end at 25 s for v = 0.2 m/s, 15 s for v = 0.4 m/s, and 10 s for v = 0.8 m/s, well after the cart reached its final position. Figure 8 illustrates a truncated recorded signal overlaid with the trajectory position timestamps. The timestamps are stored in the mov_timestamps.csv file.
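The cross-correlation delay estimation between the video audio track and a reference microphone can be sketched as follows, with synthetic signals standing in for the recordings.

```python
import numpy as np

def estimate_delay(ref, sig):
    """Delay of sig relative to ref (in samples), via the peak of the
    full cross-correlation; positive means sig lags ref."""
    xcorr = np.correlate(sig, ref, mode="full")
    return int(np.argmax(xcorr)) - (len(ref) - 1)

# Synthetic stand-ins: a "video audio" excerpt and a delayed mic channel
rng = np.random.default_rng(2)
ref = rng.standard_normal(4800)
mic = np.concatenate([np.zeros(123), ref])   # same content, 123 samples late
delay = estimate_delay(ref, mic)
```

The estimated delay can then be added to every video timestamp to place it on the microphone-signal timeline.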

Fig. 8. Drum signal from SL recorded using microphone UCA1 in MC1 moving at 0.2 m/s. The vertical lines indicate the timestamps corresponding to specific positions along the trajectory

Latency compensation

The system latency during recordings under motion unexpectedly differed from the measured latency in the RIR recordings. This discrepancy arose from the use of the Python script in [38] instead of Adobe Audition and resulted in a delay in the start of the microphone recording relative to the intended start time. As this latency was not measured during the recording session, it had to be estimated from the recorded data to ensure temporal alignment between the recorded signal and the source signal. Because the system latency was found to be consistent across all recordings, we applied a single shift to obtain time-aligned signals, as described in the following paragraphs.

We start by using a variant of the state-space model outlined in [1] to estimate the so-called time-variant RIR h(k, m), which relates the source signal, denoted by x(k), to the corresponding recorded signal, denoted by y(k). Here, k is the discrete-time index and m indexes the time shift of the RIR samples. Since RIRs are linear time-invariant (LTI) for any fixed source–receiver location, a “time-variant” RIR in this work simply reflects that the microphone’s location changes over time. Thus, a time-variant RIR is effectively a location-variant RIR. Let l denote the microphone’s one-dimensional location index along its trajectory. If the microphone occupies one location per time step k, we can set l = k, as in [1].

The model is formulated in state-space form as follows:

h(k) = h(k - 1) + w(k),    (1)
y(k) = x^T(k) h(k) + v(k),    (2)

where v(k) represents a measurement noise term, w(k) represents process noise modeling errors, and

x(k) = [x(k), x(k - 1), …, x(k - M + 1)]^T,    (3)
h(k) = [h(k, 0), h(k, 1), …, h(k, M - 1)]^T.    (4)

An estimate ĥ(k) of the RIR h(k) is then obtained from Eqs. (1)–(2) using a Kalman filter [45]. For alignment purposes, we are primarily interested in the delay between the direct source components in the estimated and measured RIRs, rather than in the detailed structure of later reflections. This method is therefore suitable: the direct path is typically the strongest and most temporally distinct feature in an RIR, and the procedure above reliably captures this peak in the estimated RIR. The estimation was carried out only for the white noise signals, as they provided the best and most consistent RIR estimates.
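A minimal sketch of the Kalman filter applied to the state-space model in Eqs. (1)–(2) is given below. The signal length, RIR length, and noise variances are illustrative assumptions, and the true RIR is kept static here for simplicity (in the actual use-case it varies with the microphone position).

```python
import numpy as np

rng = np.random.default_rng(3)
M = 16             # RIR length in taps (illustrative)
q, r = 1e-6, 1e-4  # assumed process / measurement noise variances

# Static ground-truth RIR and a white-noise source signal
h_true = rng.standard_normal(M) * np.exp(-np.arange(M) / 4.0)
x_sig = rng.standard_normal(4000)

h_hat = np.zeros(M)                          # state estimate
P = np.eye(M)                                # state covariance
for k in range(M - 1, len(x_sig)):
    x = x_sig[k - M + 1:k + 1][::-1]         # x(k) = [x(k) ... x(k-M+1)]^T
    y = x @ h_true + np.sqrt(r) * rng.standard_normal()
    P = P + q * np.eye(M)                    # time update, Eq. (1)
    gain = P @ x / (x @ P @ x + r)           # Kalman gain
    h_hat = h_hat + gain * (y - x @ h_hat)   # measurement update, Eq. (2)
    P = P - np.outer(gain, x @ P)            # covariance update
```

After a few thousand samples the estimate tracks the true RIR closely, including the direct-path peak that the alignment procedure relies on.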

Next, let k_n denote the time indices corresponding to the timestamps of the RIR measurement positions along the trajectory, where n ∈ {1, 3, 5, …, 91} for MC1 and MC2, and n ∈ {1, 2, 3, …, 92} for MC3. The estimated and measured RIRs at time index k_n are given by ĥ(k_n) and h(k_n), respectively. The delay τ_max corresponding to the maximum normalized cross-correlation (NCC) between the estimated and measured RIRs, ĥ(k_n) and h(k_n), is computed as

τ_max = argmax_τ  Σ_m h(k_n, m) ĥ(k_n, m−τ) / ( ‖h(k_n)‖₂ ‖ĥ(k_n)‖₂ ).

This delay was consistently found to be approximately the same number of samples across all RIR measurement positions, microphone channels, source speeds, and microphone configurations. Consequently, all recorded signals were shifted by this amount to obtain time-aligned signals,

ỹ(k) = y(k − τ_max).

The aligned signal ỹ(k) is thus synchronized with both the source signal x(k) and the measured RIRs h(k_n). Note that ỹ(k) is the version of the recorded signal stored in the database. For notational convenience, in the remainder of this paper we refer to this calibrated version simply as y(k).
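The delay search and the subsequent shift can be sketched as below. The helper names are illustrative; in practice a full-length cross-correlation routine would typically be used for efficiency.

```python
import numpy as np

def ncc_delay(h_meas, h_est, max_lag):
    """Delay (in samples) maximizing the normalized cross-correlation
    between a measured and an estimated RIR (the tau_max criterion)."""
    denom = np.linalg.norm(h_meas) * np.linalg.norm(h_est)
    best_tau, best_val = 0, -np.inf
    for tau in range(-max_lag, max_lag + 1):
        shifted = np.roll(h_est, tau)   # realizes h_est(m - tau)
        if tau > 0:                     # zero the samples wrapped around
            shifted[:tau] = 0.0
        elif tau < 0:
            shifted[tau:] = 0.0
        val = float(h_meas @ shifted) / denom
        if val > best_val:
            best_tau, best_val = tau, val
    return best_tau

def align(y, tau):
    """Shift the recorded signal: y_tilde(k) = y(k - tau)."""
    out = np.zeros_like(y)
    if tau >= 0:
        out[tau:] = y[:len(y) - tau]
    else:
        out[:tau] = y[-tau:]
    return out
```

Since the latency was found to be constant, a single tau estimated from one measurement position suffices to align all recordings.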

Temperature recordings

At the time of each signal recording, the ambient room temperature was documented and is provided in the files temperature_STAT.csv and temperature_MOV.csv, corresponding to the RIR recordings and the recordings during motion, respectively. This information is important because temperature directly affects acoustic propagation, and the recordings were captured over several days with no control over room temperature, resulting in observable temperature variations. Although temperature differences across the entire dataset are expected, it is important that the RIR measurements and the recordings during motion associated with the same microphone configuration were captured under similar temperature conditions. In some cases, however, temperature mismatches were observed within these configuration-specific pairs. For example, in the recordings for MC2, the room temperature during the RIR recordings ranged from 17.2 °C to 17.9 °C, whereas during the moving recordings it ranged from 19.4 °C to 20.6 °C. Depending on the required level of accuracy, such inconsistencies may need to be corrected, which can be done using the methods described in [46–48]. Moreover, temperature fluctuations should be taken into account when trying to reproduce the exact same measurements.

Using the database

This section is devoted to discussing how to use the database: specifically, the file path structure is presented in Sect. 7.1 and the code provided to retrieve geometrical information and load the signals is introduced in Sect. 7.2.

File path structure

As illustrated in Table 4, the files in the database are organized in three root folders: /audio/, containing all the audio files in wav format; /meta/, containing all the metadata in csv format; and /tools/, containing the source code. The audio files are arranged in three subfolders: the loudspeaker signals are stored in the SRC/ directory, the RIR recordings are located in the STAT/ directory, and the recordings during motion are contained in the MOV/ directory. The RIR recordings are further divided by loudspeaker (SL or SR), recording configuration (MC1, MC2, and MC3), and position of the array on the trajectory. The recordings during motion, instead, are divided by signal type (according to the labels provided in Table 3), loudspeaker, microphone array configuration, and speed (V1, V2, and V3). In both cases, the file name encodes the specific microphone within the configuration.

Table 5.

Scripts to load and use the database

Function name      Description (more details can be found in the documentation)
load_RIR_data      Load RIRs recorded using a chosen loudspeaker, an arbitrary subset of microphones within a microphone array configuration, and an arbitrary set of positions along the trajectory. Geometry and temperature information are also loaded.
load_mov_data      Load recordings during motion using a chosen loudspeaker, speed, audio signal, and an arbitrary subset of microphones within a microphone array configuration. Timestamps and temperature information are also loaded.
load_coordinates   Load and optionally plot microphone and loudspeaker coordinates.

Table 4.

File path structure of the database

Source signal:            /audio/SRC/[s].wav
Mic signal (stationary):  /audio/STAT/RIR/[l]/[m]/P[n]/[u].wav
Mic signal (in motion):   /audio/MOV/[s]/[l]/[m]/V[a]/[u].wav
Metadata files:           /meta/temperature_STAT.csv, temperature_MOV.csv, mov_timestamps.csv, cart_pose.csv, mic_coordinates.csv
Code files:               /tools/Python/[f].py

The signal label [s] takes the forms defined in Table 3; the speaker label [l] and the microphone label [u] take the forms defined in Table 2; the position label [n] and the speed label [a] take the forms defined in Table 6; the microphone configuration label [m] takes the form defined in Fig. 4; the script or function names [f] take the forms defined in Table 5.
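As a sketch of how the Table 4 patterns translate into concrete file paths, the hypothetical helpers below build path objects from the labels. The database ships its own loaders (listed in Table 5); only the directory pattern here follows the table, while the example microphone label in the test is an assumption.

```python
from pathlib import Path

def rir_path(root, speaker, mic_conf, position, mic):
    """Stationary RIR file, pattern /audio/STAT/RIR/[l]/[m]/P[n]/[u].wav."""
    return Path(root, "audio", "STAT", "RIR", speaker, mic_conf,
                f"P{position}", f"{mic}.wav")

def mov_path(root, signal, speaker, mic_conf, speed, mic):
    """Moving recording, pattern /audio/MOV/[s]/[l]/[m]/V[a]/[u].wav."""
    return Path(root, "audio", "MOV", signal, speaker, mic_conf,
                f"V{speed}", f"{mic}.wav")
```

For instance, a white-noise (WN) recording played by the right loudspeaker (SR) and captured with MC3 at the second speed would live under /audio/MOV/WN/SR/MC3/V2/; the exact file name depends on the microphone label of Table 2.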

The provided metadata is organized in five tables, stored in the /meta/ directory. The temperature_STAT.csv and temperature_MOV.csv tables contain the temperature in degrees Celsius registered at the measurement time for the RIR recordings and the recordings during motion, respectively. The mov_timestamps.csv table contains, for each recording during motion, the timestamps corresponding to the time instants when the moving cart passed each of the 46 (or 92, for MC3) positions marked on the rail. The cart_pose.csv table contains the position of the two loudspeakers, as well as the position of the cart (i.e., the mounting point of the microphone array) and its horizontal rotation angle at each of the 92 marked positions. Finally, the mic_coordinates.csv table contains the Cartesian coordinates of each microphone at each of the marked positions on the rail (Table 6).

The folder /tools/ contains Python scripts for accessing audio data and retrieving geometrical information from the tables described above. Further details are provided in Sect. 7.2, and accompanying examples of code usage are provided in the /tools/ directory as well.

Retrieving geometry information

The overall positions of the central axis of the cart, which served as a mounting point for the different microphone configurations, were first measured relative to the Cartesian coordinate system originating at the start of the trajectory (see Fig. 2). In particular, for each position the pose of the cart was measured, indicating both its position and a rotation angle with respect to the orientation at the first position of the trajectory.

Relative microphone positions were measured with respect to the center of the array for each configuration. By combining the cart pose with the relative microphone positions through shifting and rotation matrix multiplications, all microphone geometry could be retrieved and is provided in the mic_coordinates.csv file. The absolute positions of all microphones for the MC1, MC2, and MC3 configurations can be retrieved through the script load_coordinates provided in /tools/. The script also provides plotting functionality to visualize the (selected) positions on the trajectory.
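The shift-and-rotate operation described above can be sketched as follows, assuming the cart pose provides a horizontal (about the vertical axis) rotation angle. This is an illustration of the geometric transformation, not the provided load_coordinates script.

```python
import numpy as np

def mic_positions(cart_pos, yaw_deg, rel_mics):
    """Absolute microphone coordinates from the cart pose (position and
    horizontal rotation angle) and the array-relative microphone offsets."""
    th = np.deg2rad(yaw_deg)
    R = np.array([[np.cos(th), -np.sin(th), 0.0],   # rotation about the
                  [np.sin(th),  np.cos(th), 0.0],   # vertical (z) axis
                  [0.0,         0.0,        1.0]])
    # rotate each relative offset, then translate by the cart position
    return np.asarray(cart_pos) + np.asarray(rel_mics) @ R.T
```

Applying this per marked position reproduces the content of mic_coordinates.csv from cart_pose.csv and the array-relative offsets.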

Use cases

As previously mentioned, accurate modeling of acoustic propagation is important for a range of tasks, including auralization, speech processing, and soundfield analysis. For a stationary source and microphone, this requires knowledge of the RIR at the microphone location. When the microphone moves, the relevant model becomes a time-variant RIR h(k), as defined in Sect. 6.2.2, where each time index k corresponds to the RIR at the microphone’s instantaneous location along its trajectory.

In practice, measuring an RIR at every time step is infeasible. As a result, existing approaches generally estimate h(k) using one of the following three types of information: (i) a sparse set of stationary RIR measurements, (ii) a moving-microphone recording, or (iii) a hybrid combination of both. This section presents representative use cases for each scenario and illustrates how the trajectoRIR dataset supports their evaluation. Such approaches require calibrated timestamps of the RIR measurement positions to associate the microphone location with the correct sample index in the recorded signal.

All examples in this section use downsampled 16 kHz audio from the MC3 microphone configuration, with the cart moving at a constant velocity of 0.8 m/s. Under this constant-speed condition, each discrete time index corresponds to an equally spaced spatial sample. The time-variant RIR vector h(k) is defined as in Eq. (4), and measured RIRs are available at indices k_n, for n = 1, …, 92, denoted by h(k_n).

For the purpose of this section, we focus exclusively on the early part of each RIR, using truncated RIRs that contain only the early reflections. This is the part most relevant for applications such as speech processing and motion-aware acoustic modeling.

The remainder of this section is organized as follows. The individual estimation approaches are first briefly reviewed in Sects. 8.1 and 8.2, after which they are evaluated using the trajectoRIR database in Sect. 8.3.

Estimation of time-variant RIRs from sparse RIR measurements

The first scenario addresses the estimation of h(k) when only sparsely sampled RIRs along the trajectory are known. A standard approach is linear interpolation between consecutive measured RIRs h(k_n) and h(k_{n+1}). To ensure consistent alignment, we use dynamic-time-warping (DTW) based matching [49] to identify the time indices of the direct component and prominent reflections in each pair of consecutive RIRs. Denote these as m_r(k_n) and m_r(k_{n+1}). The interpolated time indices at any k ∈ [k_n, k_{n+1}] are then computed as

m_r(k) = m_r(k_n) + β(k) ( m_r(k_{n+1}) − m_r(k_n) ),    (5)

where β(k) ∈ [0, 1] satisfies β(k_n) = 0 and β(k_{n+1}) = 1, and is computed from the relative direct source–microphone distances. Reflection amplitudes are interpolated analogously, yielding the estimate ĥ(k) for all intermediate time indices between measured RIR positions.
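A minimal sketch of the interpolation step of Eq. (5), applied to a single reflection's time index and amplitude, could look as follows. The helper names are illustrative, and the DTW-based matching of [49] that pairs reflections across RIRs is not reproduced here.

```python
import numpy as np

def interp_reflection(m_n, m_n1, a_n, a_n1, beta):
    """Linearly interpolate one reflection's time index (Eq. (5)) and
    amplitude between two measured RIRs, for a given beta in [0, 1]."""
    m = m_n + beta * (m_n1 - m_n)
    a = a_n + beta * (a_n1 - a_n)
    return m, a

def build_rir(M, reflections):
    """Place interpolated reflections (time index, amplitude) into a
    length-M RIR estimate, rounding indices to the sample grid."""
    h = np.zeros(M)
    for m, a in reflections:
        h[int(round(m))] += a
    return h
```

Repeating this for the direct path and each matched reflection, and for each k between k_n and k_{n+1}, yields the interpolated estimate ĥ(k).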

Estimation of time-variant RIRs using moving-microphone signals

We next consider approaches formulated in a state-space representation, where the observation equation relates the unknown time-variant RIR to the recorded microphone signal and the state equation models how h(k) evolves over time. Two cases are examined: (i) a purely data-driven method that relies solely on the moving-microphone recording, and (ii) a hybrid method that uses the microphone recording together with a physical model of the evolution of the direct path and early reflections. The parameters of this physical model are extracted from sparsely available RIR measurements.

Purely data-driven Kalman-filter estimation

When no RIR measurements are available, the time-variant RIR can be estimated from the known source and moving-microphone signals (x(k), y(k)). We adopt the state-space model in Eqs. (1)–(2), where the state equation (1) is a first-order Markov process of the form h(k) = α h(k−1) + w(k) with α = 1. All temporal variability is absorbed into the process noise w(k), yielding a fully data-driven formulation without structural priors. A Kalman filter is applied to obtain the adaptive estimate ĥ(k). This approach is closely related to adaptive echo-path tracking [50] and can reliably estimate early RIR components even without any physical modeling.

Hybrid Kalman-filter estimation with sparse RIR measurements

We now incorporate a physical model into the state equation by defining a time-variant transition matrix A(k). This leads to replacing the state equation in (1) with

h(k) = A(k) h(k−1) + w(k).    (6)

To this end, we follow the method proposed in [51] and let the trajectory be partitioned into linear segments s = 1, …, S, each spanning

K_s = {k_{s|st}, …, k_{s|en}},    (7)

where k_{s|st} and k_{s|en} are, respectively, the start and end time indices of the segment and correspond to measured RIR positions. Within each segment, it is assumed that the time-of-arrival (TOA) intervals of individual reflections do not overlap. For k ∈ K_s \ {k_{s|st}}, we aim to approximate a fixed transition matrix A_s for that segment such that

h(k) ≈ A_s h(k−1),    (8)

leading to the piecewise-constant transition model

A(k) = A_s,    k ∈ K_s \ {k_{s|st}}.    (9)

The transition matrices A_s serve as a room acoustic prior, assuming sound propagation according to the image source model (ISM), and are estimated by interpolating the TOAs of the direct and reflected components between the measured RIRs available at the segment boundaries, as detailed in [1, 51]. This method therefore requires both a measured moving-microphone signal and calibrated sparse RIR measurements. As in Sect. 8.2.1, a Kalman filter is used to obtain an estimate ĥ(k).
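A single iteration of the hybrid filter, combining the predict step of Eq. (6) with the observation update of Eq. (2), can be sketched as below. The construction of the segment matrices A_s from the ISM-based TOA interpolation of [1, 51] is omitted, and scalar noise variances q and r are assumed here for simplicity.

```python
import numpy as np

def kalman_step(h, P, A, x, y, q, r):
    """One hybrid Kalman iteration: predict with the segment transition
    matrix A of Eq. (6), then correct with the moving-microphone
    observation of Eq. (2)."""
    h = A @ h                                  # h(k|k-1) = A(k) h(k-1)
    P = A @ P @ A.T + q * np.eye(len(h))       # predicted covariance
    e = y - x @ h                              # innovation
    S = x @ P @ x + r                          # innovation variance
    K = P @ x / S                              # Kalman gain
    h = h + K * e                              # state update
    P = P - np.outer(K, x @ P)                 # covariance update
    return h, P
```

With A set to the identity, the step reduces to the purely data-driven filter of Sect. 8.2.1; the prior enters only through the segment-wise A_s.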

Evaluation of estimated time-variant RIRs

We now evaluate the three methods introduced above: (1) linear interpolation (algorithm LI) as in Sect. 8.1, (2) the purely data-driven Kalman filter (algorithm KF-α) as in Sect. 8.2.1, and (3) the hybrid Kalman filter (algorithm KF-A(l)) as in Sect. 8.2.2. Two criteria are used: (i) the accuracy of synthesized moving-microphone signals, and (ii) the agreement between estimated and measured RIRs at known measurement position indices k_n that were not used during estimation.

The synthesized moving-microphone signal ŷ(k) is obtained using the respective estimated time-variant RIR ĥ(k) as

ŷ(k) = x^T(k) ĥ(k),    (10)

and accuracy is quantified via the maximum normalized cross-correlation between ŷ(k) and y(k), restricted to lags of ±1 samples to compensate for minor timing shifts. The resulting correlation coefficient is defined as

ρ = max_{τ ∈ {−1, 0, 1}}  Σ_k ( ŷ(k+τ) − ŷ̄_τ )( y(k) − ȳ ) / [ √( Σ_k ( ŷ(k+τ) − ŷ̄_τ )² ) √( Σ_k ( y(k) − ȳ )² ) ],    (11)

where ȳ is the mean of the measured signal y(k) and ŷ̄_τ is the mean of the shifted estimate ŷ(k+τ).
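A direct transcription of Eq. (11) into NumPy might read as follows, with the lag search restricted to {−1, 0, 1} as in the text; this is an illustrative sketch rather than the exact evaluation code.

```python
import numpy as np

def corr_coeff(y_hat, y):
    """Maximum normalized cross-correlation over lags {-1, 0, 1}, Eq. (11)."""
    best = -np.inf
    for tau in (-1, 0, 1):
        # overlapping parts of y_hat(k + tau) and y(k)
        a = y_hat[max(tau, 0):len(y_hat) + min(tau, 0)]
        b = y[max(-tau, 0):len(y) + min(-tau, 0)]
        a = a - a.mean()                 # subtract the lag-dependent mean
        b = b - b.mean()
        best = max(best, float(a @ b) / np.sqrt((a @ a) * (b @ b)))
    return best
```

An estimate that reproduces the recording up to a one-sample shift attains ρ ≈ 1, which is the desired behavior given the minor timing shifts mentioned above.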

To assess individual RIR accuracy, local time alignment is first obtained by selecting the lag that maximizes the normalized cross-correlation:

λ_max(k_n) = argmax_{λ ∈ {−1, 0, 1}}  h(k_n) · ĥ(k_n − λ) / ( ‖h(k_n)‖₂ ‖ĥ(k_n − λ)‖₂ ).    (12)

The normalized misalignment (NM) is then computed as

NM_dB(k_n) = 20 log₁₀ ( ‖ĥ(k_n − λ_max(k_n)) − h(k_n)‖₂ / ‖h(k_n)‖₂ ).    (13)
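Eqs. (12)–(13) can be sketched jointly as below. Here the three candidate estimates ĥ(k_n − λ) for λ ∈ {−1, 0, 1} are passed as a list, which is an assumption about how the estimates are stored.

```python
import numpy as np

def normalized_misalignment_db(h, h_est_candidates):
    """Normalized misalignment in dB (Eq. (13)) between a measured RIR
    h(k_n) and an estimated RIR, where h_est_candidates holds the
    estimates h_est(k_n - lambda) for lambda in {-1, 0, 1} and the
    best-aligned candidate is selected as in Eq. (12)."""
    def ncc(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    best = max(h_est_candidates, key=lambda c: ncc(h, c))   # Eq. (12)
    return 20.0 * np.log10(np.linalg.norm(best - h) / np.linalg.norm(h))
```

Lower (more negative) values indicate a closer match; a perfect estimate gives NM → −∞ dB.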

Experiment 1

The first experiment examines how the spacing of the RIR measurement positions affects the accuracy of synthesized moving-microphone signals in the context of the interpolation technique described in Sect. 8.1. Using the linear segment n = 1, …, 31, we apply algorithm LI for six different excitation signals (WN, SP, PI, DR, PS8, PS1) and eight RIR measurement spacings {5, 10, 15, 25, 30, 50, 75, 150} cm.

Figure 9 shows the resulting correlation coefficients for each spacing, obtained according to Eq. (11). Overall, for all excitation signals the correlation coefficient tends to decrease as the spacing between known RIRs increases, and it generally remains relatively low. This indicates that interpolation from sparse RIR measurements alone does not adequately reproduce moving-microphone signals, and that interpolation accuracy decreases as the spacing increases. Physical effects that are not captured by stationary RIR measurements, including mechanical noise, also contribute to the recorded signal.

Fig. 9. Correlation coefficient computed using Eq. (11) between measured and synthesized moving-microphone signals using algorithm LI

Experiment 2

Since interpolation yields limited accuracy in moving-microphone signal synthesis, as seen in Experiment 1, we now additionally consider the two methods that make use of a moving-microphone recording. Specifically, here the recorded WN signal is used as the observation signal in Eq. (2) for the algorithms KF-α and KF-A(l), while the interpolation method LI is still included for comparison. For simplicity, we focus on the linear section of the full curved trajectory from k_1 to k_34, although the same methods can be applied to the entire trajectory, as demonstrated in [51]. Both LI and KF-A(l) require sparsely measured RIRs as input. In this experiment, we use the RIRs at the subset of measurement position indices n̄ = {1, 2, 3, 5, 9, 13, 17, 21, 23, 25, 28, 32, 34}. Algorithm KF-α relies on the moving-microphone recording alone. Both Kalman-filter-based methods are initialized with h(k_1).

For all methods, the normalized misalignment defined in Eq. (13) is computed at measurement indices not used during estimation (i.e., excluding n¯), and the results are shown in Fig. 10. Table 7 presents the correlation coefficient between synthesized and measured moving-microphone signals for each method and excitation signal, computed using Eq. (11). It can be seen in Fig. 10 that the algorithm LI generates time-variant RIRs that most closely match the known stationary measurements, yet as shown in Table 7 it yields the poorest synthesized microphone signals. Conversely, algorithm KF-α provides the most accurate synthesized signals while producing RIR estimates that deviate the most from the stationary measurements. The hybrid model, algorithm KF-A(l), offers a middle ground, achieving synthesized microphone signals that are almost as well correlated as those produced by KF-α, while providing more accurate estimates at the stationary RIR positions. This results in the most consistent performance across both evaluation metrics.

Fig. 10. Normalized misalignment between estimated and measured RIRs computed according to Eq. (13). The vertical dashed lines indicate the measurement positions of the RIRs used in LI and KF-A(l)

Table 7.

Correlation coefficient computed using Eq. (11) between synthesized and measured moving-microphone signals

       KF-α      KF-A(l)   LI
WN     0.9306    0.80724   0.58673
DR     0.86285   0.76470   0.38405
PI     0.59173   0.59080   0.48307
SP     0.73884   0.73793   0.52498
PS1    0.83686   0.78301   0.70912
PS8    0.69060   0.61251   0.51761

Use cases summary

Overall, the results highlight the importance of jointly using moving-microphone recordings and sparse stationary RIRs for reliable time-variant RIR estimation. The dataset thus allows testing of algorithms aimed at dynamic auralization, acoustic modeling, and audio rendering in motion-dependent scenarios.

Conclusion

In this paper, we introduced trajectoRIR, a database of room acoustic recordings obtained with microphones moving along a controlled trajectory in a room. The database contains both RIRs, measured at a set of equally spaced positions along the trajectory, and audio recordings during motion, obtained with the microphones moving along the trajectory at a constant speed. In both the stationary recordings and the recordings during motion, two loudspeakers were used, located on opposite sides of the trajectory and positioned at equal distances from its start and end points. Three microphone array configurations were used, comprising two circular arrays, a linear array, a dummy head with two additional reference microphones positioned next to the ear canals, and three first-order Ambisonics microphones. For the recordings during motion, three speeds were used, as well as six different signals, including sweeps, speech, noise, and music. Python scripts for accessing audio data, coordinates, and temperature are provided and described.

The collection of matched stationary RIRs and recordings of audio during motion along the same controlled trajectory promotes the adoption of trajectoRIR for applications including time-variant RIR estimation, spatially dynamic sound field reconstruction, auralization, and evaluation of dynamic audio simulations. Example use-cases related to time-variant RIR estimation have been discussed in Sect. 8.

Currently, only a single room and geometric setup has been considered. However, the proposed measurement setup is reproducible and, thanks to the modularity of the rail system, additional data with different geometrical configurations (e.g., room, trajectory, microphone and loudspeaker configurations) could be collected in the future, to extend this database. The database is available at [37].

Acknowledgements

The authors would like to thank Dante Van Oeteren and Koen Eelen for their Master's thesis work, which led to the design of the robotic cart and rail system that enabled the collection of the database presented in this paper.

Abbreviations

AIL: Alamire Interactive Laboratory
CR: Crown array
DH: Dummy head
DR: Drum track
FOA: First-Order Ambisonics
FPS: Frames per second
MC1: Microphone configuration 1
MC2: Microphone configuration 2
MC3: Microphone configuration 3
MDF: Medium-density fiberboard
MOV: Recordings during motion
NCC: Normalized cross-correlation
PI: Piano track
PS1: Perfect sweeps up to 1 kHz
PS8: Perfect sweeps up to 8 kHz
PSD: Power spectral density
RF: Reference microphone
RIR: Room impulse response
SL: Left loudspeaker
SP: Speech track
SR: Right loudspeaker
SRC: Source signal
STAT: Stationary recordings
UCA: Uniform circular array
ULA: Uniform linear array
WN: White noise track

Authors’ contributions

SD, KM, VL, TD, and TVW jointly designed the recording setup and methodology. SD, KM, VL, and TD acquired and post-processed the audio data, and compiled the database. SD and KM drafted the manuscript. All authors read and reviewed the final manuscript.

Funding

This research work was carried out at the ESAT Laboratory of KU Leuven, in the frame of KU Leuven internal funds C3/23/056 and C14/21/075, FWO SBO Project S005525N, and FWO Research Project G0A0424N. The research leading to these results has received funding from the Flemish Government (AI Research Program), from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreements No. 956369 and No. 956962, and from the European Research Council under the European Union’s Horizon 2020 research and innovation program/ERC Consolidator Grant: SONORA (no. 773268). This paper reflects only the authors’ views and the Union is not liable for any use that may be made of the contained information.

Data availability

The database, including metadata and accompanying code, is publicly available at [37]. The CAD files used to create all the rail and cart blocks, as well as the Python code to operate the robotic cart, are publicly available at [40].

Declarations

Competing interests

The authors declare that they have no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Stefano Damiano and Kathleen MacWilliam contributed equally to this work.

References

  • 1.K. MacWilliam, T. Dietzen, R. Ali, T. van Waterschoot, State-space estimation of spatially dynamic room impulse responses using a room acoustic model-based prior. Front. Signal Process. (2024). 10.3389/frsip.2024.1426082 [Google Scholar]
  • 2.D. Diaz-Guerra, A. Miguel, J.R. Beltran, Robust sound source tracking using SRP-PHAT and 3D convolutional neural networks. IEEE/ACM Trans. Audio Speech Lang. Process. 29, 300–311 (2021). 10.1109/TASLP.2020.3040031 [Google Scholar]
  • 3.R. Ali, A. Christian, Source-time dominant modeling of the doppler shift for the auralization of moving sources. Acta Acust. (2025). 10.1051/aacus/2024073 [Google Scholar]
  • 4.C. Quan, X. Li, Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers. arXiv:2403.07675 (2024). 10.48550/arXiv.2403.07675
  • 5.M. Nophut, S. Preihs, J. Peissig, Velocity-controlled kalman filter for an improved echo cancellation with continuously moving microphones. J. Audio Eng. Soc. 72, 33–43 (2024). 10.17743/jaes.2022.0116 [Google Scholar]
  • 6.F. Katzberg, R. Mazur, M. Maass, P. Koch, A. Mertins, A compressed sensing framework for dynamic sound-field measurements. IEEE/ACM Trans. Audio Speech Lang. Process. 26(11), 1962–1975 (2018). 10.1109/TASLP.2018.2851144 [Google Scholar]
  • 7.T. van Waterschoot, in Proc. 11th Convent. Europ. Acoust. Assoc. Forum Acusticum/EuroNoise 2025. Deep, data-driven modeling of room acoustics: literature review and research perspectives (Málaga, 2025), pp. 4065–4072. 10.61782/fa.2025.0264
  • 8.Z. Zheng, P. Peng, Z. Ma, X. Chen, E. Choi, D. Harwath, in Proc. 41st Int. Conf. Machine Learn. (ICML). BAT: Learning to Reason about Spatial Sounds with Large Language Models (Vienna, 2024). 10.48550/arXiv.2402.01591
  • 9.Y. He, A. Cherian, G. Wichern, A. Markham, in Proc. 41st Int. Conf. Machine Learn. (ICML). Deep Neural Room Acoustics Primitive (Vienna, 2024), pp. 17842–17857. 
  • 10.J.K. Nielsen, J.R. Jensen, S.H. Jensen, M.G. Christensen, in Proc. 14th Int. Workshop Acoust. Signal Enhancement (IWAENC). The single- and multichannel audio recordings database (SMARD) (Juan-les-Pins, 2014), pp. 40–44. 10.1109/IWAENC.2014.6953334
  • 11.J. Eaton, N.D. Gaubitch, A.H. Moore, P.A. Naylor, in Proc. 2015 IEEE Workshop Appls. Signal Process. Audio Acoust. (WASPAA). The ACE challenge 2014; Corpus description and performance evaluation (New Paltz, 2015), pp. 1–5. 10.1109/WASPAA.2015.7336912
  • 12.W.S. Woods, E. Hadad, I. Merks, B. Xu, S. Gannot, T. Zhang, in Proc. 2015 IEEE Workshop Appls. Signal Process. Audio Acoust. (WASPAA). A real-world recording database for ad hoc microphone arrays (New Paltz, 2015), pp. 1–5. 10.1109/WASPAA.2015.7336915
  • 13.R. Sheelvant, B. Sharma, M. Madhavi, R.K. Das, S.R.M. Prasanna, H. Li, in Proc. 22nd Conf. Oriental COCOSDA Int. Committee Co-ordination Standardisation Speech Databases Assessment Techniques (O-COCOSDA). RSL2019: A Realistic Speech Localization Corpus (IEEE, Cebu, 2019), pp. 1–6. 10.1109/O-COCOSDA46868.2019.9060842
  • 14.T. Dietzen, R. Ali, M. Taseska, T. van Waterschoot, MYRiAD: a multi-array room acoustic database. EURASIP J. Audio Speech Music Process. (2023). 10.1186/s13636-023-00284-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.R. Stewart, M. Sandler, in Proc. 2010 IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP). Database of omnidirectional and B-format room impulse responses (Dallas, 2010), pp. 165–168. 10.1109/ICASSP.2010.5496083
  • 16.E. Hadad, F. Heese, P. Vary, S. Gannot, in Proc. 14th Int. Workshop Acoust. Signal Enhancement (IWAENC). Multichannel audio database in various acoustic environments (Juan-les-Pins, 2014). 10.1109/IWAENC.2014.6954309
  • 17.J. Čmejla, T. Kounovský, S. Gannot, Z. Koldovský, P. Tandeitnik, in Proc. 28th European Signal Process. Conf. (EUSIPCO). Mirage: Multichannel database of room impulse responses measured on high-resolution cube-shaped grid (2021), pp. 56–60. 10.23919/Eusipco47968.2020.9287646
  • 18.A.V. Venkatakrishnan, P. Pertila, M. Parviainen, in Proc. 29th European Signal Process. Conf. (EUSIPCO). Tampere University Rotated Circular Array Dataset (Dublin, 2021), pp. 201–205. 10.23919/EUSIPCO54536.2021.9616072
  • 19.S. Koyama, T. Nishida, K. Kimura, T. Abe, N. Ueno, J. Brunnstrom, in Proc. 2021 IEEE Workshop Appls. Signal Process. Audio Acoust. (WASPAA). MESHRIR: A Dataset of Room Impulse Responses on Meshed Grid Points for Evaluating Sound Field Analysis and Synthesis Methods (New Paltz, 2021), pp. 1–5. 10.1109/WASPAA52581.2021.9632672
  • 20.T. McKenzie, L. McCormack, C. Hold, Dataset of spatial room impulse responses in a variable acoustics room for six degrees-of-freedom rendering and analysis. arXiv:2111.11882 (2021). 10.48550/arXiv.2111.11882
  • 21.S. Zhao, Q. Zhu, E. Cheng, I.S. Burnett, A room impulse response database for multizone sound field reproduction (L). J. Acoust. Soc. Amer. 152(4), 2505–2512 (2022). 10.1121/10.0014958 [DOI] [PubMed] [Google Scholar]
  • 22.F. Miotello, P. Ostan, M. Pezzoli, L. Comanducci, A. Bernardini, F. Antonacci, A. Sarti, in Proc. 2024 IEEE Int. Conf. Acoust. Speech Signal Process. Workshops (ICASSPW). HOMULA-RIR: A room impulse response dataset for teleconferencing and spatial audio applications acquired through higher-order microphones and uniform linear microphone arrays (2024), pp. 795–799. 10.1109/ICASSPW62465.2024.10626753
  • 23.A. Kujawski, A.J.R. Pelling, E. Sarradj, MIRACLE–a microphone array impulse response dataset for acoustic learning. EURASIP J. Audio Speech Music Process. 2024(1), 32 (2024). 10.1186/s13636-024-00352-8 [Google Scholar]
  • 24.L. Treybig, F. Klein, G. Stolz, S. Werner, S.V.A. Gari, in Proc. 50th Jahrestagung für Akustik (DAGA). A High Spatial Resolution Dataset of Spatial Room Impulse Responses for Different Acoustic Room Configurations (Hannover, 2024), pp. 248–250
  • 25.W. He, P. Motlicek, J.M. Odobez, in Proc. 2018 IEEE Int. Conf. Robotics Automation (ICRA). Deep Neural Networks for Multiple Speaker Detection and Localization (IEEE, Brisbane, 2018), pp. 74–79. 10.1109/ICRA.2018.8461267
  • 26.M. Strauss, P. Mordel, V. Miguet, A. Deleforge, in Proc. 2018 IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS). DREGON: Dataset and Methods for UAV-Embedded Sound Source Localization (Madrid, 2018), pp. 1–8. 10.1109/IROS.2018.8593581
  • 27.C. Evers, H.W. Lollmann, H. Mellmann, A. Schmidt, H. Barfuss, P.A. Naylor, W. Kellermann, The LOCATA challenge: acoustic source localization and tracking. IEEE/ACM Trans. Audio Speech Lang. Process. 28, 1620–1643 (2020). 10.1109/TASLP.2020.2990485 [Google Scholar]
  • 28.A. Politis, S. Adavanne, T. Virtanen, A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection. arXiv:2006.01919 (2020). 10.48550/arXiv.2006.01919
  • 29.A. Politis, K. Shimada, P. Sudarsanam, S. Adavanne, D. Krause, Y. Koyama, N. Takahashi, S. Takahashi, Y. Mitsufuji, T. Virtanen, STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events. arXiv:2206.01948 (2022). 10.48550/arXiv.2206.01948
  • 30.J. Brunnström, M.B. Møller, T. van Waterschoot, M. Moonen, J. Østergaard, in Proc. 11th Convent. Europ. Acoust. Assoc. Forum Acusticum/EuroNoise 2025. Experimental validation of sound field estimation methods using moving microphones (Malaga, 2025), pp. 4111–4118. 10.61782/fa.2025.0379
  • 31.G. Lathoud, J.M. Odobez, D. Gatica-Perez, in Machine Learning for Multimodal Interaction. AV16.3: An Audio-Visual Corpus for Speaker Localization and Tracking, vol. 3361 (Springer, Berlin Heidelberg, 2005), pp. 182–195. 10.1007/978-3-540-30568-2_16
  • 32.A. Deleforge, R. Horaud, in Latent Variable Analysis and Signal Separation. A Latently Constrained Mixture Model for Audio Source Separation and Localization, vol. 7191 (Springer, Berlin Heidelberg, 2012), pp. 372–379. 10.1007/978-3-642-28551-6_46
  • 33.X. Alameda-Pineda, J. Sanchez-Riera, J. Wienke, V. Franc, J. Čech, K. Kulkarni, A. Deleforge, R. Horaud, RAVEL: an annotated corpus for training robots with audiovisual abilities. J. Multimodal User Interfaces 7(1–2), 79–91 (2013). 10.1007/s12193-012-0111-y [Google Scholar]
  • 34.J.B. Allen, D.A. Berkley, Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Amer. 65(4), 943–950 (1979). 10.1121/1.382599 [Google Scholar]
  • 35.M. Vorländer, Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality (RWTHedition (Springer, Cham, 2020). 10.1007/978-3-030-51202-6 [Google Scholar]
  • 36.T. van Waterschoot, KU Leuven ESAT-STADIUS Audio Research Labs (2022). https://lirias.kuleuven.be/3940173. Accessed 27 Feb 2026
  • 37.S. Damiano, K. MacWilliam, V. Lorenzoni, T. Dietzen, T. van Waterschoot, Data repository for the trajectoRIR database: room acoustic recordings along a trajectory of moving microphones (2025). 10.5281/zenodo.15564430
  • 38.D. Van Oeteren, K. Eelen, Development of a mechatronic lab system for dynamic acoustic experiments. Master’s thesis, KU Leuven, Faculty of Engineering Technology, Leuven (2023).
  • 39.I. Dokmanic, R. Parhizkar, J. Ranieri, M. Vetterli, Euclidean distance matrices: essential theory, algorithms, and applications. IEEE Signal Process. Mag.e 32(6), 12–30 (2015). 10.1109/MSP.2015.2398954 [Google Scholar]
  • 40.D. Van Oeteren, K. Eelen. thesisstadius. (2023). GitHub repository: https://github.com/DanteVanOeteren/thesisstadius/tree/main. Accessed 30 Sep 2024
  • 41.M. Holters, T. Corbach, U. Zölzer, in Proc. 12th Int. Conf. Digital Audio Effects (DAFx09). Impulse Response Measurement Techniques and their Applicability in the Real World (Como, 2009), pp. 108–112
  • 42.J. Yamagishi, C. Veaux, K. MacDonald, CSTR VCTK Corpus: English multi-speaker corpus for CSTR voice cloning toolkit (version 0.92) (2019). 10.7488/ds/2645
  • 43.Anti-Everything, Federation Day. Children of a Globalised World. Musical album, ISRC: TTA101100005 (2011)
  • 44.C. Antweiler, A. Telle, P. Vary, G. Enzner, in Proc. 2012 IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP ’12). Perfect-sweep NLMS for time-variant acoustic system identification (2012), pp. 517–520. 10.1109/ICASSP.2012.6287930
  • 45.D. Simon, Optimal State Estimation: Kalman, H-Infinity, and Nonlinear Approaches (John Wiley & Sons, Hoboken, 2006)
  • 46.B.N. Postma, B.F. Katz, Correction method for averaging slowly time-variant room impulse response measurements. J. Acoust. Soc. Am. 140(1), EL38–EL43 (2016). 10.1121/1.4955006
  • 47.K. Prawda, S.J. Schlecht, V. Välimäki, in Proc. of Forum Acusticum 2023. Time variance in measured room impulse responses (2023), pp. 1–8. 10.61782/fa.2023.0398
  • 48.S.S. Bhattacharjee, J.R. Jensen, M.G. Christensen, Sound speed perturbation robust audio: impulse response correction and sound zone control. IEEE Trans. Audio Speech Lang. Process. 33, 2008–2020 (2025). 10.1109/TASLPRO.2025.3570949
  • 49.G. Kearney, C. Masterson, S. Adams, F. Boland, in Proc. 17th Europ. Signal Process. Conf. (EUSIPCO). Dynamic time warping for acoustic response interpolation: Possibilities and limitations (Glasgow, 2009), pp. 705–709
  • 50.G. Enzner, in Proc. 18th Europ. Signal Process. Conf. (EUSIPCO). Bayesian inference model for applications of time-varying acoustic system identification (Aalborg, 2010), pp. 2126–2130
  • 51.K. MacWilliam, T. Dietzen, T. van Waterschoot, in Proc. 11th Convent. Europ. Acoust. Assoc. Forum Acusticum/EuroNoise 2025. Tracking of spatially dynamic room impulse responses along locally linearized trajectories (Malaga, 2025), pp. 4103–4110. 10.61782/fa.2025.0913

Data Availability Statement

The database, including metadata and accompanying code, is publicly available at [37]. The CAD files used to create all the rail and cart blocks, as well as the Python code to operate the robotic cart, are publicly available at [40].


Articles from Journal on Audio, Speech, and Music Processing are provided here courtesy of Springer