Author manuscript; available in PMC: 2024 Mar 1.
Published in final edited form as: Early Hum Dev. 2023 Feb 15;177-178:105723. doi: 10.1016/j.earlhumdev.2023.105723

Enhancing Motor Screening Efficiency: Toward an Empirically Derived Abridged Version of the Alberta Infant Motor Scale

Bharath Modayur 1, Teresa Fair-Field 2, Sheri Komori 3
PMCID: PMC10084811  NIHMSID: NIHMS1876836  PMID: 36841200

Abstract

Use of machine learning (ML) in the early detection of developmental delay is possible through the analysis of infant motor skills, though the large number of potential indicators limits the speed at which a system can be trained. Body joint obstructions, the inability to infer aspects of movement such as muscle tone and volition, and the complexities of the home environment confound machine learning's ability to distinguish between some motor items. Training the system efficiently requires an excerpted list of validated items, a salient set, comprising only those motor items that are the 'easiest' to see and identify while being the most highly correlated with a low/qualifying score. This work describes the examination of motor items, the selection of the 15 items that comprise the salient set, and the ability of the set to reliably screen for motor delay in the first year of life.

Keywords: developmental delay, infant assessment, motor development, milestones, machine learning

Introduction

Early intervention provides a child's best opportunity to ameliorate the potential effects of developmental delay, so improving infant screening is of critical importance. In-person assessment by developmental experts remains the standard point of entry to intervention, but known challenges of access, equity, and outreach put intervention in the first years of life out of reach for many children and families [1]. Detection of delay in one or more motor milestones is one way children can access services, but measurement of this developmental domain is currently limited by several challenges. Delayed motor milestones are assessed during well-child visits with primary care physicians, but this approach relies heavily upon the parent's observational accuracy, since the physician may have very little time to observe and examine the child across multiple domains. Although parent report is considered generally accurate, it may be affected by parental education and motivation [2]. Even when motor milestone achievements are reported as grossly met, more subtle risk factors such as low tone, asymmetry, or a paucity of spontaneous movements are often overlooked.

The use of a standardized motor assessment tool may increase the accuracy of an examination but is not always feasible due to time constraints and lack of expert examiners. Further, poor and underserved parents may lack a consistent provider for their child or may not have access to a provider with adequate training to identify developmental concerns [1]. An observation-based short-item screener could provide objective analysis of motor markers that can be conducted quickly with minimal specialized training.

Pediatric physical and occupational therapists in early intervention settings often prefer evaluation tools that provide a multi-factorial assessment using a team-based approach; however, the Alberta Infant Motor Scale (AIMS) [3] is a widely used and culturally validated tool for detecting motor delay in infants, especially from 3 months of age to those approaching their first year. This motor-specific instrument takes approximately 20–30 minutes to administer [4], is in use worldwide [5, 6], and demonstrates strong psychometric properties, with concurrent validity in video observation [7–9]. While initially developed to "fill a gap in the clinical practice of pediatric physical and occupational therapy" [3], the AIMS is used across clinical environments, research settings, and as a foundation for parent education [2].

Observational Capacity of the Evaluator

The AIMS requires direct interaction with each infant by an expert evaluator, or by a parent prompted by an expert evaluator (via video or telehealth assessment), to produce scorable infant movements. Both direct examiner interaction and prompted parent interaction require the time-intensive involvement of one or more parents in addition to the examiner, whether simultaneously engaged in the assessment, in the act of capturing video, in training the parent what specific movements to capture, or in the review and analysis of collected material. While asynchronous video is more convenient than video collected synchronously by an expert (whether in office or via telehealth), video captured by parents may still fail to be representative of a child's motor skill for a variety of reasons [8]. If video snippets were collected spontaneously from the child's natural environments throughout the day or week and submitted to a cloud server via the parent's smartphone (as easily as one shares with friends and loved ones), it may be possible to automate the extraction of detectable motor items for analysis using a standardized infant motor assessment tool such as the AIMS.

Training a machine learning (ML) system to automate assessment from observational data (e.g., video) is possible and emerging in early detection research, though increased efficiency requires an excerpted set of salient items. Extraction of motor items from video data requires skilled human resources for both development and deployment. Automating the extraction of all 58 AIMS items from videos is impractical. The limited information obtainable from video (e.g., body joints obstructed from view), as well as video's inability to convey aspects of movement such as muscle tone and volition, confounds ML's ability to distinguish between certain items. A short list of empirically derived salient AIMS motor items, in combination with known motor signs such as head lag [10, 11], could propel early screening and surveillance efforts across environments.

This project evaluated whether an abridged set of items could be derived from the AIMS to optimize identification of motor delay in infants in the first year of life. We specifically asked: 1) Does an abridged set of AIMS items acceptably screen for motor delay in the first year of life? 2) Which items comprise this salient set? and 3) How many items are needed to reach acceptability criteria (established a priori as R2 = 0.9 and r = 0.9)? Use of an abridged set cuts AIMS administration and scoring time in half, a crucial step toward developing an automated motor assessment that can predict an infant's motor development from observational videos [18].

Methods

Subjects

Assessment data were obtained as part of an infant movement study – Modeling Infant Motor Movement (MIMM) – designed to gather high-quality observation data for training an ML system to estimate infant pose and motor activity. Procedures were approved by Ethical & Independent Review Services (https://www.eandireview.com/) and all participants provided informed consent. Participants were recruited through parental support groups, social media, and flyers posted at an early intervention partner clinic. Inclusion criteria targeted infants between 3 and 15 months of age at the time of observation, both typically developing and those with a suspected motor condition but no present diagnosis. A total of 102 assessment videos were coded by trained clinicians with the following distribution of age groups: Group 1: ≤ 5-months-old (n = 34); Group 2: 5–8-months-old (n = 29); Group 3: 8–11-months-old (n = 25); and Group 4: 11–15-months-old (n = 14).

Establishing Ground Truth Using a Standardized Measure

The Alberta Infant Motor Scale [3] was normed using a substantial Canadian sample (2,200 infants) to measure gross motor development in infants from birth through 18 months of age using a 58-item total set, divided into 4 'positional' subscales: Prone (21 items; from Prone Lying to Reciprocal Creeping), Supine (9 items; from Supine Lying through Rolling Supine to Prone with Rotation), Sitting (12 items; from Sitting with Support through Sitting without Arm Support), and Standing (16 items; from Supported Standing to Squat). The infant's achievement of an item (and the evaluator's ability to score it using an observational approach) relies upon qualitative aspects of movement such as 'postural alignment, control, balance, and coordination' [3, p. 26], which go beyond the static observation of a posture and indicate an infant's ability to accomplish the postural control, anti-gravity movement, and weight bearing necessary to achieve and sustain a pose.

While a machine learning system is limited in its 'observational' capacity, ground truth coding is a method used to train such a system to identify when these qualitative aspects of movement have been met. Developing these supervised learning algorithms involves providing the machine learning system with expert-labeled data indicating that a movement has been achieved according to the qualitative constructs of the Alberta Infant Motor Scale. In this way, the supervised learning algorithm can be trained to recognize patterns in video images of infants demonstrating postural alignment, control, and coordination in their achievement of a pose. Two trained experts – a doctoral occupational therapist (author Fair-Field, who has decades of experience in infant/child assessments) and a registered nurse trained in AIMS administration and coding – completed ground truth coding of video snippets from the MIMM study. The data from this ground truth set were then used to train the machine learning system to recognize these qualitative constructs in future observations when marking a pose as 'achieved.' Prior to ground truth coding the MIMM dataset (n=102), the two coders scored a separate, non-overlapping dataset (n=11 in the 3-month-old group and n=9 in the 6-month-old group) over a 3-month period. This preparatory coding task was executed to establish inter-rater reliability. Intraclass correlation coefficients (ICC2; absolute agreement, two-way random, average score) were determined for the two raters on this preparatory set. For the 3-month group, the ICC2 value was 0.9 and for the 6-month group, the ICC2 value was 0.907. The ICC2 value for the combined preparatory dataset (n=20) was 0.973, indicating excellent agreement between the raters.
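The ICC2 statistic reported above (two-way random effects, absolute agreement, average score) can be computed directly from the two-way ANOVA mean squares. A minimal NumPy sketch of the Shrout–Fleiss average-measures form follows; the variable names are ours and this is not the study's code:

```python
import numpy as np

def icc2k(ratings):
    """Average-measures ICC(2,k): two-way random effects, absolute agreement.
    `ratings` is an (n subjects x k raters) array. Sketch only; variable
    names are ours, not taken from the study's implementation."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()    # between-subject
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()    # between-rater
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)                                # mean squares
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (msc - mse) / n)
```

With two raters in perfect agreement the statistic is exactly 1; a constant between-rater offset pulls it below 1, which is the sense in which this form measures absolute agreement rather than consistency.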

Another feature within each subscale of the AIMS, a 'window,' is established with the least mature item providing the floor and the most mature item providing the ceiling of the window. All items in the subscale that are below the floor are given the value 'credited.' All items above the ceiling are given the value 'above ceiling.' Within the window, items can have values of observed/not observed. The AIMS scoring algorithm assigns a value of 0 for 'above ceiling' and 'not observed,' and a value of 1 for 'credited' and 'observed.' The four subscale scores are added to obtain the cumulative motor score. The total score is thus computed from the four 'positional scores' (previous items credited + observed items within the window).
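The scoring rule just described can be sketched in a few lines; the state labels are ours, but the 0/1 mapping and the subscale summation follow the AIMS algorithm described above:

```python
# AIMS window scoring: 'credited' and 'observed' -> 1;
# 'not_observed' and 'above_ceiling' -> 0.
SCORE = {"credited": 1, "observed": 1, "not_observed": 0, "above_ceiling": 0}

def subscale_score(item_states):
    """Score one positional subscale from its ordered item states."""
    return sum(SCORE[s] for s in item_states)

def total_motor_score(subscales):
    """Cumulative motor score: sum of the four positional subscale scores.
    `subscales` maps each subscale name to its ordered item states."""
    return sum(subscale_score(states) for states in subscales.values())
```

For example, a subscale with two credited items, one observed item, one not-observed item, and one above-ceiling item scores 3.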

It is impractical, at least in the near term and without a large corpus of training data, to develop an ML system that can identify all 58 AIMS items from videos. We must, instead, rely upon an automated system that can identify only a subset of the items. Conceptually, this restriction leads to the removal of the construct of a scoring 'window' and requires the observer to examine each AIMS item individually.

While a traditional scoring window is important to the structure of the AIMS as an assessment tool and removing the scoring window limits the validity of the tool for assessment use, its value as a screener may still be achievable with acceptable validity and reliability given other inherent item limitations of the AIMS. That is, documented ceiling effects, misfit items, and the non-linear difficulty of items [14] may mitigate the impact of examining each item individually, depending on the items selected. It should be noted that it was necessary to retain the bottom of the scoring window in several scenarios when scoring items individually, since infants may not demonstrate less mature items once more mature items have been mastered, an issue noted by the AIMS authors as one that requires clinical judgment [3]. The application of coding rules was required to credit these less mature items as ‘observed’ when the more mature item was ‘observed.’ For example, once the infant achieves the ability to be left alone in sitting (Sitting without Arm Support 1), then all prior sitting items that require support would have been achieved, even if not ‘observed’ (except for ‘Pull to Sit’ which requires evaluator handling and has benefits as a standalone screening mechanism) [10, 11].
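The crediting rule described above can be sketched as follows. The item ordering here is illustrative rather than the full AIMS Sitting subscale, and the handling-item exception list is our reading of the example in the text:

```python
def credit_prior_items(ordered_items, observed, handling_items=("Pull to Sit",)):
    """Coding-rule sketch: when a more mature item is 'observed', credit all
    less mature items below it in the subscale's maturity order, except items
    that require evaluator handling (e.g., 'Pull to Sit')."""
    if not observed:
        return set()
    # Index of the most mature observed item in the maturity order.
    top = max(ordered_items.index(item) for item in observed)
    credited = {item for item in ordered_items[:top]
                if item not in handling_items}
    return credited | set(observed)
```

So an infant observed in Sitting without Arm Support 1 is automatically credited with the earlier supported-sitting items, while Pull to Sit remains unscored.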

Use of a scoring window may also limit the feasibility of attaining a complete score within a well-child visit or community screening setting. Though the AIMS is designed to be administered in an 'unobtrusive environment with…no arbitrary stimuli' [3, p. 30], such conditions rarely hold in the settings where an infant is most likely to be screened: unfamiliar settings not conducive to eliciting the robust variety of movements of which the infant is capable. The AIMS authors identify "the temperature of the surroundings, restrictive clothing, noise level, and lighting" as environmental constraints that "may affect an infant's motor behaviors" [3, p. 11]. Instructions allow for the AIMS to be paused and the infant to be comforted or soothed if necessary, but such conditions could either extend the time required to detect all items within a window or increase the likelihood of missing items—another benefit of a screening mechanism that relies on salient items detected in the home rather than attempting to elicit an infant's complete repertoire in a medical office or unfamiliar community setting.

Data Analysis and Modeling Methods

We trained Support Vector Machine Regressors (SVRs) [12] to predict AIMS scores using all 58 items as well as a salient set of 15 items. The choice of SVR over simpler techniques (e.g., linear regression) was made due to SVR's ability to alleviate overfitting, which can be an issue with a dataset as small, in supervised machine learning terms, as ours (102 subjects). The salient sets were determined either through a recursive feature elimination (RFE) method or an exhaustive search. RFE is a method of choice for reducing the feature sets used in training an ML algorithm. RFE expects the feature subset size to be specified, then works by searching for the subset of features that, when used in an ML model, best predicts outcomes (in our case, the regressed AIMS motor score). However, given the dataset size (102 subjects) and the feature set (58 items), it is also possible to arrive at the salient subsets by performing an exhaustive search. The performance of the regressors trained on these salient sets was quantified using correlation and the R2 metric, the latter of which measures the accuracy of the SVR in predicting the AIMS motor score. These salient sets can be used to prioritize generation of ground-truth samples from videos toward training ML models that can detect these items automatically from observational videos, leading to full automation of motor score estimation.

SVR Regressors

We used a Support Vector Machine Regressor (SVR) with the Radial Basis Function kernel [12] to regress the total AIMS score from the individual motor item features. We used the Scikit-learn Python package for all our analyses. Since the AIMS subscales are scored independently (in addition to a composite score), we developed four SVRs – one for each subscale (Prone, Supine, Sitting, and Standing) – with each one computing a subscale score, and the total AIMS score determined as the sum of the four predictions. Next, we describe the process of constructing and analyzing the regressors and determining salient subsets of motor items using all subject data (n=102).

Each subscale regressor is trained on a randomly generated training set (90% of total samples) and then tested on the remaining 10%. This process is termed a 'trial,' with 200 such trials conducted for each experiment to generate subscale score predictions, where the true subscale scores are known from the assessment data in the MIMM study. The true and the predicted AIMS total scores (equal to the sum of the four subscale SVR outputs) are then used to compute the Pearson correlation coefficient (r) and the SVR score (R2 coefficient). Since we were aiming to measure the correspondence of the ML-estimated AIMS score with the 'true' gold standard score, we chose the Pearson coefficient as a suitable measure. In addition, we also computed the Intraclass Correlation Coefficient (ICC2) to measure the correspondence between two raters (the 'true' score and the ML estimate) that rated the AIMS scores for all the subjects. The SVR score for a regressor indicates how well the regressor performed in predicting the AIMS scores, with the best possible score being 1.0.
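The trial scheme can be sketched with scikit-learn as below. Function and variable names are ours, and details such as how test subjects are sampled per trial are our assumptions; the structure (four per-subscale RBF SVRs, 90/10 splits, summed subscale predictions, pooled r and R2) follows the description above:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import r2_score
from sklearn.svm import SVR

def run_trials(X_by_scale, y_by_scale, n_trials=200, test_frac=0.1, seed=0):
    """Per trial: fit one RBF-kernel SVR per subscale on a random 90% of
    subjects, predict the held-out 10%, sum the four subscale predictions
    into a total score, then pool all trials for Pearson r and R^2."""
    rng = np.random.default_rng(seed)
    n = len(next(iter(y_by_scale.values())))
    true_all, pred_all = [], []
    for _ in range(n_trials):
        test = rng.choice(n, size=max(1, int(n * test_frac)), replace=False)
        train = np.setdiff1d(np.arange(n), test)
        true_tot = np.zeros(len(test))
        pred_tot = np.zeros(len(test))
        for name, X in X_by_scale.items():
            y = y_by_scale[name]
            svr = SVR(kernel="rbf").fit(X[train], y[train])
            pred_tot += svr.predict(X[test])   # summed subscale predictions
            true_tot += y[test]                # summed true subscale scores
        true_all.append(true_tot)
        pred_all.append(pred_tot)
    true_all = np.concatenate(true_all)
    pred_all = np.concatenate(pred_all)
    return pearsonr(true_all, pred_all)[0], r2_score(true_all, pred_all)
```

The same subject split is reused across the four subscale regressors within a trial so that per-subject totals are well defined.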

Prediction Trials

The objective was to train SVRs on the subject data (n=102) and analyze their performance in terms of their ability to predict AIMS scores. For each prediction experiment, the objective is to train four subscale regressors (SVRs) to predict subscale scores and evaluate their performance. A subscale SVR takes as its input training data of the form (X, y), where X is the AIMS item information in that subscale and y the human-evaluated subscale score. As an example, in the Prone subscale, X will be a vector of length 21 (representing the 21 individual prone items) and y will be the human-evaluated Prone subscale score. An additional point to re-emphasize is that the 21 components of X are represented individually and as binary values: observed or not observed. The concept of a scoring 'window' (which assumes one of four potential values: previously credited, observed, not observed, above ceiling) is not possible within an automated ML system that can only verify whether an item was observed. Therefore, any value other than 'observed' is converted to 'not observed.' Finally, not all 21 components of the Prone subscale are used to train the Prone SVR – only the salient items in the Prone subscale, the determination of which is addressed in the next section.
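The conversion from the four window values to the binary feature coding is a one-liner; the state labels are ours, but the rule (only 'observed' maps to 1) is exactly the one stated above:

```python
def binarize_items(states):
    """Collapse the four AIMS window values to the binary feature coding
    used for ML: an automated system can only verify whether an item was
    seen, so 'observed' -> 1 and everything else ('credited',
    'not observed', 'above ceiling') -> 0."""
    return [1 if s == "observed" else 0 for s in states]
```

Note the contrast with human scoring, where 'credited' counts toward the subscale score; in the ML feature vector it becomes 0.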

Method of Combining & Selecting the Initial Test Set

Based on our extensive experience coding and discerning AIMS items from observational videos, our trained evaluators had difficulty distinguishing between several closely paired motor items within a subscale – Supported Standing 2 and Supported Standing 3, for example – a quandary also presented in Bartlett & Piper's [13] examination of mothers' ability to accurately assess their infants using a maternal version of the AIMS. Coding the movements from video data, we had the ability to slow motor movements down and progress them frame by frame. This revealed far greater detail than what is available to a live evaluator, but also introduced a dilemma: an infant may pass into (and out of) more than one pose/movement, displaying aspects that meet the descriptive criteria of both the more and the less mature item, something undetected in live scoring. This confounded our inter-rater reliability and led to an additional question: can closely resembling motor items be combined without affecting our system's ability to predict motor scores accurately?

We selected an initial set of items to combine which were found to be difficult to distinguish by even trained observers and/or those identified in existing literature on AIMS item analysis as affected by higher rates of misfit or ‘noise’ [14]. Combined items included those described in AIMS as with or without rotation, or items with granular qualities distinguishing subtle levels of achievement which contribute to AIMS’ value as an assessment tool but limit its utility in screening applications (Figure 1b). A source of confusion even among trained developmental experts, and impossible to distinguish by the parent [13], these items are a limiting factor in adoption of the AIMS as a parent screening tool and would pose similar problems for an ML system as well.

Figure 1b.


SVR Performance on All Ages Group (n=102) using Salient Set of 15 Items

Differences between these hard-to-discern items are described in the AIMS manual [3] to assist the human evaluator in distinguishing the factors that differentiate a more mature item from a less mature one; our decision to combine items may therefore further reduce the validity of the salient set. However, these descriptions were meant to be applied in live (synchronous) assessment. Differentiating these items becomes easier in video analysis, where it is possible to tag the specific video frames that demonstrate achievement of the item (i.e., trunk rotation as a condition of 'Reach with Rotation in Sitting'), but this also presents additional confounding variables affecting item discrimination [17]. The AIMS item descriptions do not provide specific guidance as to 'how much,' 'how long,' or 'how often' these conditions must be met to distinguish the less mature item from the more mature. In most cases, AIMS descriptors use modal auxiliaries such as 'can,' 'must,' 'may,' and 'some,' which adds ambiguity as to whether and when conditions have been 'met' in the virtual environment. Thus, using asynchronous video, the medium on which the SVR was trained, it is possible for an item that was not observed in live assessment to be detected in video, though only briefly—granularity that was not intended in the original AIMS. Further, motor milestones developed by the Centers for Disease Control and Prevention (CDC) do not differentiate between these qualities of maturation in parent-facing screening tools. (That is, the CDC's four-month motor milestone stating "Pushes up onto elbows/forearms when on tummy" [16] could equally be ascribed to any or all of the AIMS items expected at or near that age, including Prone Prop, Forearm Support (1), Prone Mobility, or Forearm Support (2).)
The decision to combine these hard-to-discern items was made based on a) the necessity to reduce the number of items needed to train the ML; b) differentiating our use of the salient set as a screening mechanism as distinct from the full AIMS as a validated assessment; and c) our aim to adjunct (not replace) existing developmental surveillance tools and mechanisms.

Method of Selecting the Regressor

We designed two different methods to determine salient subsets in each subscale of the AIMS catalog of motor items. The first one uses recursive feature elimination (RFE) [15] and the second one uses an exhaustive search. For either method of determining salient sets, the number of salient items we seek is required as input.

Gathering infant movement data from the assessments of 102 infants, we created a training library of over 15,000 video snippets featuring the 58 AIMS motor items. The number of salient items sought in each subscale is influenced by factors such as the age group to be targeted, the number of differentiated items in the subscale, and the ease of automation in the extraction of the items from videos. If the number of items we seek in the salient set is fixed as Prone (4), Supine (4), Sitting (5), and Standing (2), we can then employ either of the methods to determine the salient set and then conduct our experiments.

The RFE method, given the number of salient items sought and an estimator (the SVR), recursively considers smaller and smaller sets of items, using the estimator to assign 'importance' to each item and eliminating the least important item at each step. This process is repeated until the desired number of items remains.
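A scikit-learn sketch of this step follows. One caveat, which is our assumption rather than a detail reported here: scikit-learn's RFE needs per-feature importances (coef_), which an RBF-kernel SVR does not expose, so this sketch ranks items with a linear-kernel SVR:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

def rfe_salient_items(X, y, n_items):
    """RFE sketch: repeatedly fit the estimator, rank items by importance,
    and drop the least important until n_items remain. A linear-kernel SVR
    is used here because RFE requires per-feature coefficients (our
    assumption, not a detail from the study)."""
    selector = RFE(estimator=SVR(kernel="linear"),
                   n_features_to_select=n_items, step=1)
    selector.fit(X, y)
    return np.flatnonzero(selector.support_)  # indices of retained items
```

Given a subscale's binary item matrix X and its human-evaluated scores y, this returns the indices of the retained items.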

The RFE method is appropriate when the number of items to consider is large (in the hundreds or thousands) or, as in our case, when we are first attempting to determine a suitable number of salient items in each subscale. Once this is established, we can switch to an exhaustive search method to reveal the specific items to include in that set. If we search for items in Prone (4), Supine (4), Sitting (5), and Standing (2), we can exhaustively evaluate all prone combinations containing 4 items (21C4 = 5,985), all supine combinations containing 4 items (9C4 = 126), all sitting combinations containing 5 items (12C5 = 792), and all standing combinations containing 2 items (16C2 = 120) for the given dataset. For each subscale, we can evaluate the SVR score, which indicates how good a salient set is at determining the associated subscale score. We can then pick the salient set in a subscale that achieves the best SVR score, or use a histogram to determine the set from the top-10 or top-20 best-performing combinations.
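The exhaustive search can be sketched with itertools.combinations. The cross-validation scheme used to score each subset here is our assumption; the study evaluates subsets via its 200-trial 90/10 split protocol:

```python
from itertools import combinations

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

def exhaustive_salient_set(X, y, k):
    """Exhaustive-search sketch: score every k-item subset of a subscale's
    items by cross-validated R^2 of an RBF-kernel SVR and return the best
    subset with its score."""
    best_cols, best_score = None, -np.inf
    for cols in combinations(range(X.shape[1]), k):
        score = cross_val_score(SVR(kernel="rbf"), X[:, cols], y,
                                cv=5, scoring="r2").mean()
        if score > best_score:
            best_cols, best_score = cols, score
    return best_cols, best_score
```

For the Prone subscale this loop would evaluate all 5,985 4-item combinations; collecting the top-scoring subsets instead of only the single best supports the histogram-based selection mentioned above.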

Results

SVR Performance of All Ages, All Items

Using four SVRs for the 'All Ages' group (n=102), incorporating all 58 items and removing the 'window' where four potential item values (credited; observed; not observed; above ceiling) condense to two (observed and not observed), yielded an impressive R2 = 0.972, r = 0.986, and ICC2 = 0.985 (Fig. 1a). Some outliers are apparent in Fig 1a. The SVR score (0.972 in this case) takes this into account as it measures the absolute agreement between the true and predicted scores. The SVR coefficient, R2, is defined as 1 − u/v, where u is the residual sum of squares, sum((score_true − score_pred)^2), and v is the total sum of squares, sum((score_true − mean(score_true))^2).

Figure 1a.


SVR Performance on All Ages Group (n=102) using all 58 AIMS items
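The R2 definition above is the standard coefficient of determination (the value returned by scikit-learn's SVR.score); a direct transcription, checked against sklearn.metrics.r2_score:

```python
import numpy as np

def svr_score(score_true, score_pred):
    """SVR (coefficient of determination) score: R^2 = 1 - u/v, with
    u the residual sum of squares and v the total sum of squares."""
    score_true = np.asarray(score_true, dtype=float)
    u = np.sum((score_true - np.asarray(score_pred, dtype=float)) ** 2)
    v = np.sum((score_true - score_true.mean()) ** 2)
    return 1.0 - u / v
```

A perfect predictor gives 1.0; a predictor no better than the mean of the true scores gives 0, and worse predictors go negative.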

Performance of an Abridged Set

To address our initial questions of a) whether an abridged set of items can acceptably screen for motor delay in the first year of life and b) the identity of motor items comprising that set, the four SVRs were employed to examine the 'All Ages' group (n=102), and a subset of 15 salient items was algorithmically selected via the exhaustive search method, yielding a performance of R2 = 0.95, r = 0.975, and ICC2 = 0.974 (Figure 1b). Shown below are the items comprising the salient set.

  • Prone (4 items): Prone Prop, Extended Arm Support, Reach from Extended Arm Support, Reciprocal Creeping 2

  • Supine (4 items): Supine Lying 3, Supine Lying 4, Hands to Knees, Rolling Supine to Prone with Rotation

  • Sitting (5 items): Unsustained Sitting, Weight Shift in Unsustained Sitting, Sitting Without Arm Support 1, Sitting to Four-Point Kneeling, Sitting Without Arm Support 2

  • Standing (2 items): Pulls to Stand with Support, Cruising without Rotation

Number of Items Comprising the Salient Set

The number of salient items sought in each subscale is influenced by numerous factors, including a) the number of differentiated and distinguishable items in that subscale; b) the ease of automation, in terms of an ML algorithm's ability to detect an item from video; and c) the balance between conciseness – which enables ease of administration and scoring – and the diminishing returns in motor score prediction accuracy as more salient items are added.

While the team is aiming to keep the total salient set concise, at between 12–15 total items (of the full 58), the subscale SVRs allow adjustment to include more or fewer salient items within each subscale. Using recursive feature elimination (RFE) to automate selection of the salient items, we determined that adding more items to the salient set does not dramatically improve the results (Figure 1c). It should be noted that while the final tally of salient items (Table 1) is 15 (supine: 4, prone: 4, sitting: 5, standing: 2), some of the salient items comprise multiple combined subscale items. In the prone subscale, for instance, 8 AIMS items contribute to the salient set, some of them combined. Looking at Figure 1c, it is clear that SVR performance did not benefit significantly from inclusion of additional prone items.

Figure 1c.


Improvement in motor score prediction in the prone subscale as a function of the number of salient items. We chose a 4-item prone salient set after consideration of the diminishing returns in SVR performance across different age groups.

Table 1.

Items Comprising the Salient Set, Including Combined Items

Subscale Notation^a Items
Prone PN03 Prone Prop
PN04 Forearm Support 1
PN06 Forearm Support 2
PN07 Extended Arm Support
PN08 Rolling Prone to Supine without Rotation
PN12 Rolling Prone to Supine with Rotation
PN17 Reciprocal Creeping 1
PN21 Reciprocal Creeping 2
Supine SU03 Supine Lying 3
SU04 Supine Lying 4
SU05 Hands to Knees
SU07 Hands to Feet
SU08 Rolling Supine to Prone without Rotation
SU09 Rolling Supine to Prone with Rotation
Sitting ST02 Sitting with Propped Arms
ST03 Pull to Sit
ST04 Unsustained Sitting
ST06 Unsustained Sitting without Arm Support
ST08 Sitting without Arm Support 1
ST09 Reach with Rotation in Sitting
ST12 Sitting without Arm Support 2
Standing SD04 Pulls to Stand with Support
SD05 Pulls to Stand/Stands
SD07 Cruising without Rotation
SD10 Cruising with Rotation

Note. This table indicates the items comprising the salient set: Prone (4), Supine (4), Sitting (5), and Standing (2).

Row delineations show combined items. If the infant demonstrates any one of the age-appropriate items within the combination, the item is credited.

^a Pose notations first described by Liao & Campbell [14].

Effects of Item Combination on SVR Performance

By combining hard-to-discern motor items, we aimed to build concise salient sets that are easier to score without compromising the accuracy of the trained SVRs in estimating true motor scores. Combining similar items (Table 1) in fact led to better accuracy in predicting total AIMS motor scores: when these items are not combined, SVR performance degrades, though not substantially, with R2 decreasing from 0.95 to 0.904 and r decreasing from 0.975 to 0.95.

Filtering Results by Age Groups

The SVRs trained on all subjects (n=102) perform well in predicting true motor scores (R2 = 0.95, r = 0.975, visualized in Figure 1b). Performance deteriorates when subjects are split by age range. In the <5-month group (n=34), we obtained R2 = 0.72, r = 0.86; and in the 5–8-month group (n=29), we obtained R2 = 0.608, r = 0.878. While there is a lack of sufficient samples to support training SVRs in these age groups, the paucity of standing items in the latter age group also bears mentioning. Supported Standing 3 (SD03) is achieved by 50% of infants at 4.5 months, while Pulls to Stand with Support (SD04) – the very next item in the AIMS Standing subscale – is achieved by 50% of infants at 8.25 months [3, p. 149]. This lack of relatively even spacing of standing items across the age ranges is a 'gap' also noted by Liao & Campbell [14] when examining the item structure of the AIMS. It yields notably low SVR performance in the Standing subscale (0.134), though with only minimal effect on the total SVR. Towards the end of the first year, SVR performance improves (8–11 months: R2 = 0.866, r = 0.944), more closely approaching the All Ages results (R2 = 0.95, r = 0.975). We should note that while we have selected a 15-item salient set to handle all age groups, when sufficient data for the various age groups are available, it might be useful to design SVRs for each age group, with different salient sets of varying sizes. For example, for the 8–11-month age group (Group 3), we might select a salient set that has fewer supine items (and more prone items), as the infants in this age group will likely have attained all supine items. The methodology outlined in this article for salient set size determination and item selection can be utilized to customize the SVRs for specific age groups.

Refining the Salient Set

To examine the SVR regressor's performance, we developed a web app (Appendix A) that let the team access the tool remotely from standard web browsers. The app allowed customization of several experimentation factors – the number of items to include in each subscale, the individual items comprising each subset, and the feature-selection method (recursive feature elimination (RFE) or exhaustive search) – enabling the team to examine the effects of such changes on the performance statistics of the SVR regressors.

With the salient set size fixed at 15, we determined the identity of the subscale items using the exhaustive search method. Taking this as the starting point, we refined the set iteratively using the web app. Based on multiple observers' extensive experience coding the 102 videos, we decided which closely resembling items to combine. The final salient set was then selected and examined for its SVR performance (Table 1).
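As a sketch of the RFE alternative mentioned above: scikit-learn's RFE can rank items by the weight a linear-kernel SVR assigns them, recursively discarding the weakest until the desired subset size remains. The data and item count here are illustrative assumptions, not the study's coded observations.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.feature_selection import RFE

rng = np.random.default_rng(1)
# 21 binary-coded items (e.g., a Prone-sized subscale), synthetic for illustration
X = rng.integers(0, 2, size=(102, 21)).astype(float)
y = X @ rng.uniform(1, 4, 21) + rng.normal(0, 1.0, 102)

# RFE needs a linear kernel so per-item weights (coef_) can rank features
selector = RFE(SVR(kernel="linear"), n_features_to_select=4, step=1)
selector.fit(X, y)
selected = np.flatnonzero(selector.support_)
print("Selected item indices:", selected)
```

Unlike exhaustive search, RFE fits only one model per elimination step, which is why it scales to larger subscales at the cost of possibly missing the best-scoring subset.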

Discussion

In answer to our primary question, our results indicate that a salient set of AIMS motor items can acceptably predict motor delay in infants over the course of their first year and up to 15 months of age. The salient set retains R2 and correlation coefficients well within the acceptable range for population screening, on par with instruments in current use, while using fewer items and ultimately offering substantial time savings.

Although a specific set of items was identified, our analyses determined that multiple salient subsets produce comparable results. Consider the Prone subscale, which has the largest number of potential items: a 4-item salient set of Prone items can be formed in 5985 unique ways. For each of those 4-item sets, an R2 score can be computed (with a maximum of +1) indicating how well that set predicts the subscale score; a score of 1 would indicate that the set explains all of the variance. As Appendix B clearly shows, there is no single winner. This was the case for each of the subscales in AIMS; no one fixed set is more predictive of the total score than any other. This allows flexibility in modifying the items that comprise the salient set, with the data-driven approaches (exhaustive search or RFE) providing a very strong starting point. If ML automation is a priority, items that are not frequently observed in assessment sessions (e.g., Squat, SD16) can be replaced with those that are common (e.g., Cruising, SD10 and SD07).
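The exhaustive search over 4-item Prone subsets can be sketched as below: every one of the C(21, 4) = 5985 combinations is scored by fitting an SVR on just those columns. The synthetic data and the in-sample scoring are simplifying assumptions for illustration; the study's exact scoring protocol may differ.

```python
from itertools import combinations
from math import comb
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
n_items = 21                          # assumed Prone subscale size
X = rng.integers(0, 2, size=(102, n_items)).astype(float)
y = X @ rng.uniform(1, 4, n_items) + rng.normal(0, 1.0, 102)

assert comb(n_items, 4) == 5985       # every possible 4-item salient set

def r2_for(subset):
    # Fit on only the chosen item columns; .score returns R2 (at most 1.0)
    model = SVR(kernel="linear").fit(X[:, subset], y)
    return model.score(X[:, subset], y)

results = {s: r2_for(s) for s in combinations(range(n_items), 4)}
best = max(results, key=results.get)
print("best subset:", best, "R2 =", round(results[best], 3))
```

The dense spread of scores across subsets (as in Appendix B) is what the "no single winner" observation refers to: many different subsets attain near-identical R2.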

The items selected by age-specific regressors vary with the motor activities expected at each age (see Appendices C–E). Recall that development of the regressor required binary coding of each item (observed/not observed). Further examination of the regressor's selection of salient items in each age group reveals a 'window' within the salient set. That is, the regressor's automated selection for each age range includes a low target (achieved by 90% of infants at the bottom of the age range), one or more expected targets for infants within the range, and a 'stretch' target (observed in fewer than half of infants at the upper bound of the age group). This pattern was reproduced in all age-specific regressors. Thus, it may be the window itself (the lowest- and highest-observed items), rather than the specific individual items within the window, that lends relevance to the total score for the purposes of motor screening (versus assessment).

Increasing the number of observed items in each subscale did not materially change SVR performance (differences appeared only in the hundredths place). That is, our 15-item salient set appears to be on par with a 24-item salient set, with diminishing returns as the set approaches all 58 items. This is valuable because an evaluator may have little control over how many items are observed within a screening session, which is often affected by the infant's temperament, the evaluator or caregiver's facilitation and handling, and the affordances of the environment itself.
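The diminishing-returns comparison can be sketched by scoring nested item sets of increasing size. Here the first k of 58 synthetic binary items stand in for salient sets of each size (an illustrative assumption, not the study's item ordering or data).

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score

rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=(102, 58)).astype(float)  # 58 binary-coded AIMS items
y = X @ rng.uniform(1, 4, 58) + rng.normal(0, 1.0, 102)

# Score progressively larger sets; first k columns stand in for a salient set
r2_by_size = {}
for k in (15, 24, 58):
    pred = cross_val_predict(SVR(kernel="linear"), X[:, :k], y, cv=5)
    r2_by_size[k] = round(r2_score(y, pred), 3)
print(r2_by_size)
```

On the study's real data, the reported pattern is that the 15-item and 24-item scores differ only in the hundredths place; this sketch only shows how such a comparison is computed.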

Beyond a high R2 and strong correlation, it is important that clinicians be confident that a salient set used in screening adequately captures items representative of infants at a variety of ages. To accomplish this, it was necessary to minimize gaps in the spread of selected items (beyond the gaps presented by the AIMS scale itself) by selecting items that are detectable across a range of ages (in months), particularly ages that coincide with the American Medical Association's recommended schedule of well-child visits and align with age-expected achievement of motor milestones. This accounts for the shift from an exclusively SVR-selected abridged set to a clinically informed salient set with greater clinical utility.

While the AIMS was constructed to examine infants between 0 and 18 months of age, several authors cited here have suggested it is most useful for differentiating infants between 3 and 12 months. Indeed, none of the earliest-appearing items were determined to be salient (i.e., Prone Lying 1 or 2, Supine Lying 1 or 2, Sitting with Support, and all three Supported Standing items). Eliminating these items from the final set did not impact its statistical performance, and this accounts for the absence of 0–3-month items as well as the small number of standing items in the final set. (A younger infant would simply be scored 'not observed.') As other authors have stated [14], other assessment tools may better screen motor skills in children beyond 12 months of age (Peabody Developmental Motor Scales, Bayley Scales of Infant Development, etc.). For the youngest infants, a tool that examines general movements may be more advantageous than one capturing the earliest emergence of volitional movements in the 0–3-month range.

A primary limitation is that the regression analysis presented has been performed on only a single data sample; the salient sets remain clinically untested, which could account for the high R2 values, so a validation set is an important next step. Additional limitations (as noted in Methods) include departures from standardized AIMS administration – scoring items individually (without the use of a scoring window) and combining similar items – both of which reduce validity relative to a standardized administration. As in any ML research, larger data sets provide additional training for the system; our ML tool will benefit from subjects added beyond the 102 infants presented here, particularly in the 8-to-11-month and 11-to-15-month cohorts, which contained fewer participants because recruitment became more difficult as infants approached one year of age. Given our goal of screening in the natural home environment, it is also a limitation that the majority of videos used in this study were filmed in relatively stable clinic environments. To handle the complexity of backgrounds, clothing, lighting, etc., characteristic of varied home environments, further refinements must be made to the infant pose estimator that automates extraction of infant keypoint landmarks from video. Future research includes evaluating the performance of the salient set as an asynchronous screening from parent-acquired home video, and developing our ML tool to extract the salient items from longitudinal video of the infant in these more complex natural environments.

Conclusion

The identified salient sets can serve as guideposts for the motor items we would like to prioritize for automated extraction from video produced in the child's natural environment. Our results indicate that the exhaustive search method is the best way to obtain their identities. Perhaps because a fair amount of redundancy is built into the design of the Alberta Infant Motor Scale (or any developmental assessment scale, for that matter), we can attain acceptable accuracy in motor score estimation from salient subsets obtained through the methods described.

Symmetry, muscle tone, and the synergy of movement are critical to building a complete picture of normal development; we cannot determine typicality from motor achievement as a function of age alone. However, the potential for this work to enhance the insights provided by parent-reported motor actions is significant. Screening with salient sets may provide a streamlined pathway in primary care and developmental screening across community settings, expanding our reach to vulnerable and underserved children and families.

Highlights.

  • Proposes a 15-item ‘salient set’ of AIMS items for use in screening applications

  • Computer vision automation requires training on an abridged number of motor skills

  • An abridged set of items opens asynchronous screening from home and improves efficiency

  • Proposed tool can improve access and compliance in developmental review/monitoring

Acknowledgements

The authors would like to acknowledge the contributions of the Eunice Kennedy Shriver National Institute of Child Health and Human Development SBIR grants HD095783 and MH107063, Paul Bergmann (Foresight Logic), Mara Modayur RN (Early Markers), the University of Washington Autism Center, Kindering, and the children and families that participated in the MIMM infant study.

Funding:

This work was supported by the National Institutes of Health (NICHD; Award #R44HD095783) and the National Institute of Mental Health (NIMH; Award #R43MH107063).

Appendix A. Web App Designed to Determine Salient Sets

graphic file with name nihms-1876836-f0004.jpg

Note. In its current configuration, the web app provides results when salient sets are extracted from an exhaustive search. The user has selected ‘All Ages’ and the number of items in each subscale.

Appendix B. Performance of All Potential 4-Item Salient Sets in the Prone Subscale

graphic file with name nihms-1876836-f0005.jpg

Note. Each dot indicates the R2 score of a unique 4-item Prone subscale salient set. There are 5985 potential salient combinations. As shown, there is not one highest-performing salient set. This was true amongst all subscales.

Appendix C. Salient Items for Group 1: < 5 mo.

Subscale Salient Items
Prone PN02 Prone Lying 2
PN06 Forearm Support 2
PN08 Rolling Prone to Supine without Rotation
PN09 Swimming
Supine SU04 Supine Lying 4
SU05 Hands to Knees
SU06 Active Extension
SU07 Hands to Feet
Sitting ST01 Sitting with Support
ST04 Unsustained Sitting
ST05 Sitting with Arm Support
ST11 Sitting to Four-Point Kneeling
ST12 Sitting without Arm Support 2
Standing SD02 Supported Standing 2
SD03 Supported Standing 3

graphic file with name nihms-1876836-f0006.jpg

Appendix D. Salient Items for Group 2: 5 to <8 mo

Subscale Salient Items
Prone PN12 Rolling Prone to Supine with Rotation
PN13 Four-Point Kneeling
PN17 Reciprocal Creeping 1
PN19 Reach with Extended Arm Support
Supine SU06 Active Extension
SU07 Hands to Feet
SU08 Rolling Supine to Prone without Rotation
SU09 Rolling Supine to Prone with Rotation
Sitting ST04 Unsustained Sitting
ST05 Sitting with Arm Support
ST06 Unsustained Sitting without Arm Support
ST07 Weight Shift in Unsustained Sitting
ST09 Reach with Rotation in Sitting
Standing SD03 Supported Standing 3
SD09 Controlled Lowering

graphic file with name nihms-1876836-f0007.jpg

Appendix E. Salient Items for Group 3: 8 to <11 mo

Subscale Salient Items
Prone PN06 Forearm Support 2
PN07 Extended Arm Support
PN15 Reciprocal Crawling
PN16 Four-point Kneeling to Sit-Half-Sit
Supine SU03 Supine Lying 3
SU07 Hands to Feet
SU08 Rolling Supine to Prone without Rotation
SU09 Rolling Supine to Prone with Rotation
Sitting ST03 Pull to Sit
ST06 Unsustained Sitting without Arm Support
ST10 Sitting to Prone
ST11 Sitting to Four-Point Kneeling
ST12 Sitting without Arm Support 2
Standing SD06 Supported Standing with Rotation
SD08 Half Kneeling

graphic file with name nihms-1876836-f0008.jpg


References

1. Jimenez ME, Barg FK, Guevara JP, Gerdes M, Fiks AG. The impact of parental health literacy on the early intervention referral process. J. of Healthc. for the Poor and Underserved 2013;24(3):1053–62. 10.1353/hpu.2013.0141
2. Bodnarchuk JL, Eaton WO. Can parent reports be trusted?: Validity of daily checklists of gross motor milestone attainment. J. of Appl. Dev. Psychol 2004;25(4):481–90. 10.1016/j.appdev.2004.06.005
3. Piper MC, Darrah J. Motor Assessment of the Developing Infant. Elsevier; 2021.
4. Mason AN, Haer B, Lively A, Parsemain C. A review of Alberta Infant Motor Scale (AIMS). Crit. Rev. in Phys. and Rehabil. Med. 2018;30(3):255–8. 10.1615/CritRevPhysRehabilMed.2018028941
5. Aimsamrarn P, Janyachareon T, Rattanathanthong K, Emasithi A, Siritaratiwat W. Cultural translation and adaptation of the Alberta Infant Motor Scale, Thai version. Early Hum. Dev 2019;130:65–70. 10.1016/j.earlhumdev.2019.01.018
6. Ko J, Lim HK. Reliability study of the items of the Alberta Infant Motor Scale (AIMS) using Kappa analysis. Int. J. of Environ. Res. and Public Health 2022;19(3):1767. 10.3390/ijerph19031767
7. Boonzaaijer M, van Dam E, van Haastert IC, Nuysink J. Concurrent validity between live and home video observations using the Alberta Infant Motor Scale. Pediatric Phys. Ther 2017;29(2):146–51. 10.1097/PEP.0000000000000363
8. Boonzaaijer M, Van Wesel F, Nuysink J, Volman MJM, Jongmans MJ, Leerstoel J, et al. A home-video method to assess infant gross motor development: Parent perspectives on feasibility. BMC Ped. 2019;19(1):392. 10.1186/s12887-019-1779-x
9. Veena KS, Shyamilee S, Padmanabhan K, Sudhakar S, Aravind S, Samuel AJ, et al. Reliability of Alberta Infant Motor Scale using recorded video observations among the preterm infants in India: A reliability study. Online J. of Health & Allied Sci. 2017;16(3). https://www.ojhas.org/issue63/2017-3-3.html
10. Flanagan JE, Landa R, Bhat A, Bauman M. Head lag in infants at risk for autism: A preliminary study. Amer. J. of Occ. Ther 2012;66(5):577–85. 10.5014/ajot.2012.004192
11. Pineda RG, Reynolds LC, Seefeldt K, Hilton CL, Rogers CL, Inder TE. Head lag in infancy: What is it telling us? Amer. J. of Occ. Ther 2016;70(1):7001220010p1–p8. 10.5014/ajot.2016.017558
12. Chang C-C, Lin C-J. LIBSVM: A library for support vector machines. ACM Transactions on Intell. Systems & Technol. 2011;2(3):1–27. https://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf
13. Bartlett DJ, Piper MC. Mothers' difficulty in assessing the motor development of their infants born preterm: Implications for intervention. Pediatric Phys. Ther 1994;6:55–60. 10.1097/00001577-199400620-00002
14. Liao PJ, Campbell SK. Examination of the item structure of the Alberta Infant Motor Scale. Pediatric Phys. Ther 2004;16(1):31–8. 10.1097/01.PEP.0000114843.92102.98
15. Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Machine Learn. 2002;46(1–3):389–422. 10.1023/A:1012487302797
16. Centers for Disease Control and Prevention [CDC]. Learn the signs. Act early. 2022, November 29. https://www.cdc.gov/ncbddd/actearly/milestones/milestones-4mo.html
