Computational Intelligence and Neuroscience
2022 Jul 9; 2022:2689949. doi: 10.1155/2022/2689949

E-Sports Training System Based on Intelligent Gesture Recognition

Hui Li 1, Yao Lu 1, Hongqiao Yan 1
PMCID: PMC9288316  PMID: 35855795

Abstract

In order to improve the effect of e-sports training, this paper combines intelligent gesture recognition technology to construct an e-sports training system and judges the training effect of players through the recognition of their gestures. Moreover, this paper studies commonly used feature extraction algorithms and proposes an improved SLC-Harris feature extraction algorithm, whose feasibility is verified by experimental results on the EuRoC data set. In addition, this paper uses the KLT optical flow algorithm to track the extracted feature points and calculates the pure visual pose through epipolar geometry, triangulation, and PnP algorithms. The experimental results show that the e-sports training system based on intelligent gesture recognition proposed in this paper is effective.

1. Introduction

The reason why e-sports can become a sports competition is that it is closely related to the progress of society, the development of science and technology, and the spiritual and cultural needs of the people. Although countless people enjoy this high-tech intellectual sport, public opinion has, intentionally or unintentionally, instilled a harmful view of e-sports. Some media reported extensively that some students were addicted to games and could not extricate themselves, wasting their youth and studies, which made e-sports widely denounced as "electronic heroin." The huge pressure of public opinion leaves e-sports facing severe pressure to survive, and it is difficult for enterprises to enter this market openly. Moreover, athletes can only be called "players," and their treatment cannot be compared with that of ordinary athletes. At the same time, most fans can only engage in e-sports secretly. In addition, in the face of huge pressure from public opinion, it is difficult for the government to guide and supervise the industry confidently, and sometimes it has had to impose bans instead. The ban on television broadcasting of e-sports competitions, and the social discrimination it reflects, can be described as a huge obstacle to the normal development of the e-sports industry.

Generally speaking, the development of e-sports is not yet mature and is still in its infancy [1], which is manifested in many aspects: public recognition is insufficient, there are few related large-scale events, there is no professional-scale operation, and there is little research in this area [2]. Especially on college campuses, although students have more time for self-directed activity than before, schools do not pay enough attention to e-sports, and there is no relatively formal organization and management of participants, which has led to a considerable waste of human resources [3].

In order to cater to the trend of e-sports development, vigorously develop the e-sports business, improve the overall level of e-sports, and enable e-sports activities to develop well in colleges and universities, the current primary task is to understand in depth the characteristics of students participating in e-sports activities [4]. Among them, analysis and research on the current situation, development trend, and significance of participation in e-sports in colleges and universities are particularly important, in order to discover the problems existing in the development of campus e-sports and put forward reasonable suggestions for its development [5].

As an emerging sport, e-sports is mainly practiced by the younger generation, and its participants are increasingly young. E-sports can exercise people's thinking ability, resistance to psychological pressure, unity and cooperation, hand-eye coordination, and so on. It can also give the younger generation an awareness of abiding by the rules in the process of participating in e-sports [6]; trained participants develop a fair, open, never-admit-defeat competitive spirit that pursues constant improvement, which has a positive impact on their lives. Many colleges and universities have successively opened majors related to e-sports. Although e-sports is popular around the world, research and guiding theories on how to cultivate e-sports talents are rare [7].

Different scholars have different views on the attributes and characteristics of e-sports. Literature [8] proposed that "e-sports include three basic characteristics: one is electronics, the second is competitive sports, and the third is confrontation between people. At the same time, e-sports are divided into virtualized e-sports and fictionalized sports." Literature [9] pointed out that "the most fundamental characteristics of video games that distinguish them from other artificial games are: virtual environment, absence of the body, and artificial intelligence," emphasizing the central position of electronic communication technology in e-sports. Scholar Yang Fang believes that "e-sports should return to the essence of games, and the path from games to competitive sports follows the evolution of play-game-competitive sports," and, based on the development process of traditional competitive sports, puts forward a plan for the development of e-sports. Jia Peng and Yao Jiaxin believe that e-sports has distinctive characteristics: the diversity of functional structure requirements, the full expansion of self-awareness, the complexity of sports information pattern recognition, the agility of information processing, and the accuracy of intuitive thinking and decision-making; they analyze and clarify the various attributes of e-sports from many aspects [10].

The discussion on the attributes of e-sports is still going on. Based on current research, it can be determined that the two essential attributes of e-sports are electronic interaction and confrontational competition. Without electronic interaction, it becomes a traditional competitive sport; without confrontational competition, it becomes a mere video game; so the two are interdependent and indispensable. With the development of electronic interaction technology, various forms of e-sports have emerged [11].

Event services are mainly engaged in e-sports referees, coaches, club operation and management, game commentary, data and tactical analysis, and so on. Practitioners need to have data analysis capabilities, management capabilities, and commentary capabilities. The production and broadcast of the event include content production and external dissemination, mainly involving the design of live content and promotion plans, venue layout, equipment debugging, video data collection, postprocessing, background data analysis, and so on. The practitioners should have journalism, communication, broadcasting, TV technology, and other related professional abilities [12].

Since the e-sports industry is an emerging industry, most employees do not come from e-sports majors and have not received a complete and systematic theoretical education in e-sports, yet nearly 90% of employees believe that the e-sports industry needs prejob training [13]. Judging from the current state of the industry's development, working for a game manufacturer is undoubtedly the most attractive option, but it is difficult for game manufacturers to absorb more human resources without major business adjustments. Therefore, the need to train practitioners for the support organizations around e-sports events becomes more obvious [14]. For example, training content production capabilities (reporters, screenwriters, copywriters, and anchors) requires a professional background in journalism and communication; training event support capabilities (coaches, data analysts, nutritionists, and brokers) requires a background in sports and information technology; and training public relations and marketing capabilities (product, business, brand marketing, and media) requires a professional background in marketing and management [15].

E-sports self-media is still media: one must have the ability to report news, or dig deep into a vertical field, such as specializing in video commentary of games, game clearance strategies, or sharing game skills; after all, hot spots bring traffic. We-media is personalized media with social attributes; it communicates with users and has its own distinct character orientation [16]. To run a self-media outlet, one should also have strong analytical skills and be able to interpret a topic or special event from a unique or professional perspective.

1.1. Current E-Sports Professional Ability Training Pathways

Most training institutions in society position themselves as training professional players but basically lack training resources. They have no coaches, data analysts, or club managers, and it is difficult for the people they train to find a suitable position in the e-sports circle. Rather than cultivating professional skills, they make money from e-sports hot spots and have neither the intention nor the ability to contribute to the development of the e-sports industry [17].

At present, the main e-sports talents are cultivated by e-sports companies and e-sports clubs. The club mainly trains professional players, coaches, and data analysts in order to achieve better results in the league. Game companies train referees, game developers, commentators, and other related talents to ensure the healthy development of the e-sports industry [18]. An analysis of the revenue structure of the e-sports industry can help us see the e-sports industry more transparently. The truly profitable institutions are still game manufacturers, which continuously create market value through development and operation. In the context of the continuous development and popularization of the video game industry, competition has become a starting point for expanding influence and creating new commercial value. The comprehensive development of competitive value is inseparable from the promotion of surrounding formats, and new jobs such as video, live broadcast, and commentary emerge in an endless stream [19].

This paper combines the intelligent gesture recognition technology to construct an e-sports training system and uses the player's gesture recognition to judge the player's training effect to improve the e-sports training effect.

2. Intelligent Gesture Recognition

2.1. Gesture Intelligent Positioning

The structural framework of the gesture autonomous localization algorithm is shown in Figure 1.

Figure 1. Gesture positioning algorithm structural framework.

Monocular visual-inertial odometry uses a pure camera in the front end for motion estimation. The algorithm firstly extracts the features of the image information collected by the camera, then uses the optical flow method to track the feature points, and finally uses PnP (Perspective-n-Point) to perform motion estimation on the tracked feature points. Then, the algorithm eliminates the mismatched point pairs through random sampling consistency (RANSAC) and uses nonlinear optimization to optimize the pose. The front-end process is shown in Figure 2.

Figure 2. Front-end flowchart.

2.2. SLC-Harris Feature Extraction

The feature is the digital expression of the object in the image, and the image can be quantitatively analyzed by extracting the feature. Commonly used feature extraction methods mainly include SIFT algorithm, SURF algorithm, and ORB algorithm.

The traditional Harris algorithm calculates the corner response as shown below, mainly by weighted summation of the squared and cross-multiplied gradients of all pixels in the window.

R = C − k·(trace C)². (1)

Among them, there are

C = λ1 × λ2, (2)
trace C = λ1 + λ2. (3)

In formula (1), k is a constant ranging from 0.04 to 0.06, and both λ1 and λ2 in formula (2) represent eigenvalues.
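As a concrete illustration of formulas (1)–(3), the sketch below (Python; the function name and the scalar window sums are illustrative assumptions, not from the paper) computes the response from the gradient products already summed over a window:

```python
def corner_response(sxx, syy, sxy, k=0.04):
    """Harris response R = C - k * (trace C)^2, where C is the determinant of
    the 2x2 structure tensor [[sxx, sxy], [sxy, syy]] = lambda1 * lambda2 and
    trace C = sxx + syy = lambda1 + lambda2 (formulas (1)-(3))."""
    c = sxx * syy - sxy * sxy      # lambda1 * lambda2
    trace_c = sxx + syy            # lambda1 + lambda2
    return c - k * trace_c ** 2
```

A window with two large eigenvalues (a corner) yields a large positive response, e.g. corner_response(10, 10, 0) = 84, while an edge-like window with one dominant eigenvalue, such as corner_response(100, 0, 0), gives a negative response.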

For a grayscale image, the value ii(x, y) of any point (x, y) in the integral image is the sum of all grayscale values in the rectangle from the upper-left corner of the image to that point, as shown in Figure 3.

Figure 3. Rectangular window pixel calculation.

The calculation formula of pixels in the rectangular window is as follows:

ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} I(x′, y′). (4)

The most complex part of the Harris algorithm is the calculation of the corner response. The original calculation method causes overlapping computation between pixels in the integration window, resulting in high computational complexity. To address this, integral images of the gradient products gx², gy², and gx·gy are used to speed up the calculation of the corner response. The calculation formulas are as follows:

ii_xx(x, y) = Σ_{x′ ≤ x, y′ ≤ y} gx²(x′, y′),
ii_yy(x, y) = Σ_{x′ ≤ x, y′ ≤ y} gy²(x′, y′),
ii_xy(x, y) = Σ_{x′ ≤ x, y′ ≤ y} gx(x′, y′)·gy(x′, y′). (5)
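A minimal NumPy sketch of formulas (4) and (5) (function names are illustrative): a single pass of cumulative sums builds the integral image, after which the sum over any rectangular window is four table look-ups, independent of window size.

```python
import numpy as np

def integral_image(img):
    """ii(x, y): sum of all values above and to the left, inclusive (formula (4)).
    A zero row/column is prepended so window sums need no boundary checks."""
    ii = np.cumsum(np.cumsum(np.asarray(img, dtype=np.float64), axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def window_sum(ii, r0, c0, r1, c1):
    """Sum over rows r0..r1-1 and cols c0..c1-1 from four table look-ups."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
```

Building such integral images of gx², gy², and gx·gy as in formula (5) makes each per-window sum behind the corner response O(1) per pixel.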

Efficient nonmaximum suppression (E-NMS) is used to efficiently extract unique feature locations for each corner region, and the region thresholds are compared using image patches instead of pixels. The principle is shown in Figure 4.

Figure 4. Efficient nonmaximum suppression.

2.3. KLT Optical Flow Tracking

After the key points are extracted, the optical flow method computes the motion by minimizing the photometric error through an error model. This method needs neither descriptor computation nor feature point matching, which greatly reduces the amount of computation.

The basic idea of LK optical flow is to assume that the optical flow in the local neighborhood of a pixel is invariant, and based on this assumption, construct a least-squares problem about the optical flow of the neighborhood pixels.

First, it is assumed that the light intensity of the pixel in each frame of the image is constant. According to this, for the pixel located at (x, y) at time t, moving to (x + dx, y + dy) at time t + dt, there are

I(x, y, t) = I(x + dx, y + dy, t + dt). (6)

Then, according to another basic assumption of LK optical flow, the displacement of pixels in adjacent images is small; the Taylor expansion of formula (6) is

I(x + dx, y + dy, t + dt) ≈ I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt. (7)

Combining the above formulas and dividing both sides by dt, we get:

(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) = −∂I/∂t, (8)

where dx/dt represents the motion speed of the pixel on the x-axis, dy/dt represents the motion speed of the pixel on the y-axis, and the two speeds are recorded as u and v, respectively. At the same time, ∂I/∂x represents the gradient value of the image in the x-axis direction at the pixel point; ∂I/∂y represents the gradient value in the y-axis direction at the pixel point; and ∂I/∂t represents the derivative value of the image in the t direction, which are denoted as Ix, Iy, and It, respectively. Therefore, formula (8) can be written in matrix form as follows:

[Ix  Iy][u  v]^T = −It. (9)

Finally, according to the third basic assumption of the LK optical flow method, adjacent pixels in the same image plane have similar motion. A w × w window is defined; since all pixels in the window share the same motion, w² equations can be listed, an overdetermined system constructed, and the motion parameters of the center point obtained by the least-squares method. Accordingly, the system can be expressed as follows:

[ΣIxIx  ΣIxIy; ΣIxIy  ΣIyIy][u; v] = −[ΣIxIt; ΣIyIt]. (10)
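Formula (10) is a small least-squares solve per window; a NumPy sketch (the function name is illustrative) taking the stacked spatial gradients and temporal derivative of one w × w window:

```python
import numpy as np

def lk_window_flow(ix, iy, it):
    """Solve the overdetermined system behind formula (10) for one window:
    one row [Ix Iy] per pixel, right-hand side -It, least-squares (u, v)."""
    a = np.stack([np.ravel(ix), np.ravel(iy)], axis=1)  # (w*w) x 2 matrix
    b = -np.ravel(it)                                   # brightness constancy
    (u, v), *_ = np.linalg.lstsq(a, b, rcond=None)
    return u, v
```

The normal equations of this solve are exactly the 2 × 2 system of formula (10); the window must contain gradient variation in both directions, otherwise the matrix is singular (the aperture problem).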

Each image frame is downsampled by pyramid layering, and multilevel pyramids are established.

I^L(x, y) = (1/4)·I^{L−1}(2x, 2y)
+ (1/8)·[I^{L−1}(2x−1, 2y) + I^{L−1}(2x+1, 2y) + I^{L−1}(2x, 2y−1) + I^{L−1}(2x, 2y+1)]
+ (1/16)·[I^{L−1}(2x−1, 2y−1) + I^{L−1}(2x+1, 2y−1) + I^{L−1}(2x−1, 2y+1) + I^{L−1}(2x+1, 2y+1)], (11)

where L represents the Lth layer image.

The algorithm calculates the value of the bottom layer from top to bottom according to the Gaussian pyramid and calculates the pixel value near the edge of the image based on the following formulas:

I^{L−1}(−1, y) ≜ I^{L−1}(0, y),
I^{L−1}(x, −1) ≜ I^{L−1}(x, 0),
I^{L−1}(nx^{L−1}, y) ≜ I^{L−1}(nx^{L−1} − 1, y),
I^{L−1}(x, ny^{L−1}) ≜ I^{L−1}(x, ny^{L−1} − 1),
I^{L−1}(nx^{L−1}, ny^{L−1}) ≜ I^{L−1}(nx^{L−1} − 1, ny^{L−1} − 1). (12)
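One pyramid level of formula (11), with the clamped-border convention of formula (12), can be sketched as follows (NumPy; the function name is illustrative):

```python
import numpy as np

def pyr_down(img):
    """One level of formula (11): weight the centre 1/4, the 4-neighbours 1/8,
    the diagonals 1/16, then subsample by 2. Edge replication implements the
    out-of-range convention of formula (12)."""
    p = np.pad(np.asarray(img, dtype=np.float64), 1, mode="edge")
    centre = p[1:-1, 1:-1]
    cross = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
    diag = p[:-2, :-2] + p[:-2, 2:] + p[2:, :-2] + p[2:, 2:]
    smooth = 0.25 * centre + 0.125 * cross + 0.0625 * diag
    return smooth[::2, ::2]
```

The weights sum to 1, so flat regions keep their intensity while each level halves the resolution, letting the optical flow step handle large displacements coarse-to-fine.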

The camera motion pose is estimated using SFM in the vision front end. For a monocular camera, the pose can be estimated from the geometric relationship between a point in three-dimensional space and its projections onto the imaging planes at two different camera positions. As shown in Figure 5, P is any point in three-dimensional space with coordinates [X, Y, Z]^T; O1 and O2 are the optical centers of the two camera positions; and p1 and p2 are the projections of P on the imaging planes I1 and I2, respectively. From the pixel positions of the matched point pair p1 and p2, the essential matrix E and the fundamental matrix F can be obtained.

Figure 5. Epipolar geometric constraints.

According to the camera imaging model, we assume that K is the camera internal parameter matrix, and R and t represent the rotation matrix and translation vector from plane I1 to plane I2, and the following formula can be obtained:

s1·p1 = K·P, s2·p2 = K(R·P + t). (13)

After homogeneous coordinate transformation and normalization between 2D and 3D, we can get

x1 = K⁻¹·p1, x2 = K⁻¹·p2, (14)

where x1 and x2 represent the coordinates of pixels p1 and p2 in the normalized plane, respectively. The algorithm combines formulas (13) and (14) and left-multiplies both sides by x2^T·t∧ to obtain the essential matrix E and the fundamental matrix F, which can be written as follows:

p2^T·K^{−T}·t∧·R·K⁻¹·p1 = 0, E = t∧·R, F = K^{−T}·E·K⁻¹, (15)

where t∧ represents the antisymmetric matrix of t.

When there are more than eight point pairs such as (p1, p2), the eight-point method can be used to construct a linear system from the simplified constraint, and the solution for R and t can then be obtained:

x2^T·E·x1 = p2^T·F·p1 = 0. (16)
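A quick numerical check of formulas (15) and (16) with a synthetic pose and point (NumPy; the pose values are illustrative assumptions): for any point observed from two views, the epipolar residual with E = t∧R vanishes.

```python
import numpy as np

def skew(t):
    """Antisymmetric matrix t^ with t^ @ v = t x v (as used in formula (15))."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Synthetic relative pose: rotation about z plus a translation (illustrative).
th = 0.1
R = np.array([[np.cos(th), -np.sin(th), 0.0],
              [np.sin(th), np.cos(th), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.5, 0.1, 0.0])

P = np.array([1.0, 2.0, 5.0])   # a 3D point in the first camera frame
x1 = P / P[2]                   # normalized image coordinates, view 1
P2 = R @ P + t
x2 = P2 / P2[2]                 # normalized image coordinates, view 2

E = skew(t) @ R                 # essential matrix E = t^ R (formula (15))
residual = x2 @ E @ x1          # epipolar constraint of formula (16)
```

The residual is zero up to floating-point error; with eight or more such point pairs, each pair contributes one linear equation in the nine entries of E, which is the eight-point method mentioned above.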

When the monocular camera recovers the pose through the epipolar geometric relationship, the obtained translation is only determined up to scale and carries no metric meaning. In order to obtain depth information for the feature points, triangulation needs to be introduced. We assume that s1 and s2 represent the depths of the feature point in the two camera frames; then we have

s2·x2 = s1·R·x1 + t. (17)

The depth values s1 and s2 can be obtained from formula (17) by left-multiplying with the antisymmetric matrices x2∧ and x1∧, respectively, as follows:

s1·x2∧·R·x1 + x2∧·t = 0,
s2·x1∧·R^T·x2 − x1∧·R^T·t = 0. (18)

When the positions of multiple points in space are known, the camera pose can be estimated by the PnP algorithm. Common PnP algorithms include P3P, DLT, and BA optimization. Among them, the P3P algorithm is the most common method. The algorithm needs to know at least three points and their projection points on the camera imaging plane. Then, the camera pose can be estimated by solving the relationship between point pairs according to the similar triangle principle and the cosine theorem. A schematic diagram of the P3P relationship is shown in Figure 6.

Figure 6. Schematic diagram of P3P relationship.

The coordinate system convention is as follows: the world coordinate system is represented by (·)^w, and (·)^b and (·)^c represent the IMU coordinate system and the camera coordinate system, respectively. The relationship between the coordinate systems is shown in Figure 7. (·)^v represents the visual reference frame in the sliding window, which is independent of the IMU measurements and can represent any frame in the visual structure. (·)_b^w represents the transformation from the IMU coordinate system to the world coordinate system; b_k represents the IMU frame of the kth image; (·)_c^v represents the transformation from the camera coordinate system to the visual reference frame; and c_k represents the camera frame of the kth image. (·)^ represents a sensor measurement or parameter estimate; (·)¯ represents the latest scale parameter of the sliding window; and rotation can be represented by the rotation matrix R or the quaternion q. g^w = [0, 0, g]^T represents the gravity vector in the world coordinate system, and g^v represents the gravity vector in the visual reference coordinate system.

Figure 7. Coordinate conversion relationship.

2.4. IMU Preintegration

The sampling frequency of the camera used in this paper is 20 Hz, and the sampling frequency of the IMU is 200 Hz. It can be seen that the frequency of the IMU is much higher than that of the image. In order to avoid the repeated integration phenomenon caused by the frequency change of the visual frame optimization state caused by the high sampling rate of the IMU, a preintegration technique is used for all IMU sampling data between two image key frames. Furthermore, inertial measurements between adjacent image key frames are aggregated into a relative motion constraint through a preintegration technique. The principle of preintegration is shown in Figure 8.

Figure 8. Principle of preintegration.

In Figure 8, from top to bottom are the time scale line, the number of image frames generated, the number of image key frames generated, the number of IMU samples, and the IMU preintegration value.

The measurement error of the system is mainly affected by bias random walk b and white noise η, and other errors such as the Markov process are ignored. Then, the measurement model of the accelerometer and gyroscope in the IMU can be expressed by the following formula:

ω̂_b = ω_b(t) + b_ω + η_ω,
â_b = R_w^b(a^w(t) + g^w) + b_a + η_a, (19)

where ω̂_b and â_b represent the measured angular velocity and acceleration; ω_b(t) and a^w(t) represent the corresponding true values; b_ω, b_a, η_ω, and η_a represent the random walk noise and measurement white noise of the angular velocity and acceleration, respectively; and R_w^b is the rotation matrix from the world coordinate system to the IMU coordinate system.

White noise obeys a Gaussian distribution, that is, ηa ~ N(0, σa2), ηω ~ N(0, σω2). The derivative of random walk noise also obeys the Gaussian distribution, that is, ηba ~ N(0, σba2), ηbω ~ N(0, σbω2).

The differential kinematic formulas for P, V, Q (representing the position, velocity, and rotation expressed in quaternions, respectively) versus time can be written as follows:

ṗ_{b_t}^w = v_t^w, v̇_t^w = a_t^w, q̇_{b_t}^w = q_{b_t}^w ⊗ [0, (1/2)ω_{b_t}]^T, (20)

where ⊗ represents quaternion multiplication.

Through the above derivative relationship, the position, velocity, and rotation at time k + 1 can be obtained from the position, velocity, and rotation at time k and by integrating the measured values of the IMU over time Δtk. The continuous integration formula for PVQ is as follows:

p_{b_{k+1}}^w = p_{b_k}^w + v_{b_k}^w Δt_k + ∬_{t∈[t_k, t_{k+1}]} (R_t^w(â_t − b_{a_t}) − g^w) dt²,
v_{b_{k+1}}^w = v_{b_k}^w + ∫_{t∈[t_k, t_{k+1}]} (R_t^w(â_t − b_{a_t}) − g^w) dt,
q_{b_{k+1}}^w = q_{b_k}^w ⊗ ∫_{t∈[t_k, t_{k+1}]} (1/2) Ω(ω̂_t − b_{ω_t}) q_t^{b_k} dt, (21)

where â_t and ω̂_t represent the acceleration and angular velocity measured in the IMU coordinate system, respectively; Δt_k represents the time difference from the kth frame to the (k + 1)th frame; and R_t^w represents the rotation matrix from the IMU coordinate system to the world coordinate system. Because the measured â_t and ω̂_t belong to the IMU coordinate system, the rotation matrix must be left-multiplied to transform the IMU measurements into the world coordinate system. Ω(ω) denotes the quaternion right-multiplication matrix, and ω× the antisymmetric matrix in quaternion multiplication (ω being the imaginary part of the quaternion). We assume that the quaternion is q = [x y z s]^T = [ω s]^T; then we have

Ω(ω) = [−ω×  ω; −ω^T  0], ω× = [0 −ω_z ω_y; ω_z 0 −ω_x; −ω_y ω_x 0]. (22)

By observing the continuous integral formula of PVQ, it can be seen that the current state is recursively obtained from the state of the previous time, and the estimated value is constantly changing. This will cause the IMU measurements to be repropagated, causing the velocity and rotation to be reintegrated after each nonlinear optimization iteration, resulting in a higher computational cost. Therefore, the optimization variables are separated from the IMU preintegration terms of the two key frames, and the rotation matrix Rwbk of the world coordinate system to the IMU coordinate system can be obtained by simultaneously left-multiplying the left and right sides of the continuous integration formula of PVQ:

R_w^{b_k} p_{b_{k+1}}^w = R_w^{b_k}(p_{b_k}^w + v_{b_k}^w Δt_k − (1/2) g^w Δt_k²) + α_{b_{k+1}}^{b_k},
R_w^{b_k} v_{b_{k+1}}^w = R_w^{b_k}(v_{b_k}^w − g^w Δt_k) + β_{b_{k+1}}^{b_k},
q_w^{b_k} ⊗ q_{b_{k+1}}^w = γ_{b_{k+1}}^{b_k}. (23)

The image frames bk and bk+1 of two consecutive moments are given, and the linear acceleration and angular velocity are preintegrated in the local coordinate system bk to obtain

α_{b_{k+1}}^{b_k} = ∬_{t∈[t_k, t_{k+1}]} R_t^{b_k}(â_t − b_{a_t}) dt²,
β_{b_{k+1}}^{b_k} = ∫_{t∈[t_k, t_{k+1}]} R_t^{b_k}(â_t − b_{a_t}) dt,
γ_{b_{k+1}}^{b_k} = ∫_{t∈[t_k, t_{k+1}]} (1/2) Ω(ω̂_t − b_{ω_t}) γ_t^{b_k} dt, (24)

where αbk+1bk, βbk+1bk, γbk+1bk represent the relative pose, velocity, and rotation constraints, respectively, and are also the relative motion of bk+1 to bk. It can be seen that they are only related to a^t and ω^t in bk and bk+1, and they have nothing to do with the initial position and velocity of coordinate system bk.
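A minimal Euler-discretized sketch of formula (24) (NumPy; the function name and the first-order rotation update are simplifying assumptions): α, β, and the relative rotation are accumulated purely from IMU samples between two key frames, with no reference to global pose or gravity.

```python
import numpy as np

def preintegrate(acc, gyro, dt, ba=np.zeros(3), bw=np.zeros(3)):
    """Accumulate the relative constraints of formula (24) in the b_k frame:
    alpha (position), beta (velocity), and rot (stands in for gamma)."""
    alpha, beta, rot = np.zeros(3), np.zeros(3), np.eye(3)
    for a, w in zip(acc, gyro):
        a_bk = rot @ (a - ba)                  # bias-corrected accel in b_k
        alpha = alpha + beta * dt + 0.5 * a_bk * dt ** 2
        beta = beta + a_bk * dt
        wx, wy, wz = w - bw
        w_skew = np.array([[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]])
        rot = rot @ (np.eye(3) + w_skew * dt)  # first-order update of R_t^{bk}
    return alpha, beta, rot
```

For 1 s of constant acceleration (1, 0, 0) at 100 Hz with zero rotation, β ends at (1, 0, 0) and α at (0.5, 0, 0), independent of where and how fast frame b_k was moving in the world.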

Therefore, the preintegration formula is revisited. α_{b_{k+1}}^{b_k} depends on â_t and ω̂_t of the IMU, whose biases are also variables to be optimized. When the bias change is small, α_{b_{k+1}}^{b_k}, β_{b_{k+1}}^{b_k}, and γ_{b_{k+1}}^{b_k} are adjusted according to their first-order approximations with respect to the bias:

α_{b_{k+1}}^{b_k} ≈ α̂_{b_{k+1}}^{b_k} + J_{b_a}^α δb_a + J_{b_ω}^α δb_ω,
β_{b_{k+1}}^{b_k} ≈ β̂_{b_{k+1}}^{b_k} + J_{b_a}^β δb_a + J_{b_ω}^β δb_ω,
γ_{b_{k+1}}^{b_k} ≈ γ̂_{b_{k+1}}^{b_k} ⊗ [1, (1/2) J_{b_ω}^γ δb_ω]^T, (25)

where J_{b_a}^α and J_{b_ω}^α are block matrices of J_{b_{k+1}}^α, and J_{b_a}^β and J_{b_ω}^β are block matrices of J_{b_{k+1}}^β.

There are errors in the integral values of the IMU at different times, and the error at time t is mainly related to α_t^{b_k}, β_t^{b_k}, θ_t^{b_k}, b_{a_t}, and b_{ω_t} at time t. The following error vector is defined:

δz_t^{b_k} = [δα_t^{b_k}, δβ_t^{b_k}, δθ_t^{b_k}, δb_{a_t}, δb_{ω_t}]^T. (26)

The derivation is based on the derivative of the error-state kinematic formula. First, two concepts are introduced: true, the real measurement value containing noise, and nominal, the theoretical value without noise; δ represents the measurement error. Then

δα̇ = α̇_true − α̇_nominal, δβ̇ = β̇_true − β̇_nominal. (27)

Among them, there are:

β̇_true = R_t^{b_k}[(â_t − η_a − b_{a_t} − δb_{a_t}) − (â_t − b_{a_t})× δθ],
β̇_nominal = R_t^{b_k}(â_t − b_{a_t}). (28)

Combining the above formulas, we can obtain

δβ̇ = −R_t^{b_k}(â_t − b_{a_t})× δθ − R_t^{b_k} δb_{a_t} − R_t^{b_k} η_a. (29)

The derivation of δθ˙ is as follows, and according to the formula in the literature, it can be known that

q_true = q_nominal ⊗ δq. (30)

In this paper, according to the noise model and bias, we can get

δθ̇ = −(ω̂_t − b_{ω_t})× δθ − η_ω − δb_{ω_t}. (31)

In summary, the derivative of the IMU measurement error term at time t can be as follows:

[δα̇_t^{b_k}; δβ̇_t^{b_k}; δθ̇_t^{b_k}; δḃ_{a_t}; δḃ_{ω_t}] =
[0  I  0  0  0;
 0  0  −R_t^{b_k}(â_t − b_{a_t})×  −R_t^{b_k}  0;
 0  0  −(ω̂_t − b_{ω_t})×  0  −I;
 0  0  0  0  0;
 0  0  0  0  0]·[δα_t^{b_k}; δβ_t^{b_k}; δθ_t^{b_k}; δb_{a_t}; δb_{ω_t}] + G_t·[η_a; η_ω; η_{b_a}; η_{b_ω}]. (32)

We set
F_t = [0 I 0 0 0; 0 0 −R_t^{b_k}(â_t − b_{a_t})× −R_t^{b_k} 0; 0 0 −(ω̂_t − b_{ω_t})× 0 −I; 0 0 0 0 0; 0 0 0 0 0],
G_t = [0 0 0 0; −R_t^{b_k} 0 0 0; 0 −I 0 0; 0 0 I 0; 0 0 0 I].
The above formula can be simplified to

δż_t^{b_k} = F_t δz_t^{b_k} + G_t n_t. (33)

According to the definition of the derivative, the prediction formula of the mean is as follows:

δż_t^{b_k} = lim_{δt→0} (δz_{t+δt}^{b_k} − δz_t^{b_k})/δt,
δz_{t+δt}^{b_k} = (I + F_t δt) δz_t^{b_k} + G_t δt n_t. (34)

According to the error value at the current moment, the mean and covariance at the next moment can be predicted. The prediction formula for covariance is as follows:

P_{t+δt}^{b_k} = (I + F_t δt) P_t^{b_k} (I + F_t δt)^T + (G_t δt) Q (G_t δt)^T, (35)

where P_t^{b_k} is initialized to zero and Q is the diagonal covariance matrix of the noise terms, Q = diag(σ_a², σ_ω², σ_{b_a}², σ_{b_ω}²).

According to formula (35), the iterative formula of the error term Jacobian can be obtained as follows:

J_{t+δt} = (I + F_t δt) J_t, (36)

where the iterative initial value of the Jacobian matrix Jt is I.
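Formulas (35) and (36) amount to one discrete propagation step per IMU sample; a NumPy sketch under the stated first-order approximation (the function name and the square toy shapes are illustrative):

```python
import numpy as np

def propagate(p, j, f, g, q, dt):
    """One step of formulas (35)-(36): the error-state covariance P and
    Jacobian J are pushed forward by the transition (I + F dt), with the
    noise injected through G dt and its diagonal covariance Q."""
    a = np.eye(f.shape[0]) + f * dt
    p_next = a @ p @ a.T + (g * dt) @ q @ (g * dt).T
    j_next = a @ j
    return p_next, j_next
```

Starting from P = 0 and J = I as stated above, iterating this over all samples between two key frames yields the covariance of the preintegrated measurement and the bias Jacobians used in formula (25).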

2.5. Sliding Window Initialization

When the camera–IMU extrinsic parameters (p_b^c, q_b^c) are known, the pose obtained by the initialization of the monocular camera is transformed from the visual coordinate system to the IMU coordinate system to obtain the following formula:

q_{b_k}^v = q_{c_k}^v ⊗ q_b^c,
s·p̄_{b_k}^v + q_{b_k}^v p_b^c = s·p̄_{c_k}^v, (37)

where s is the scale factor obtained by visual initialization, which carries no metric information.

The pure visual initialization method lacks absolute scale information. Therefore, the value estimated by the visual SFM is aligned with the IMU after preintegration to estimate the true scale. Visual-inertial alignment initialization is mainly to solve the following problems, including the initialization of gyroscope bias, the initialization of velocity, gravitational acceleration, and scale.

The first step is to initialize the gyroscope bias. The bias can be obtained from consecutive key frames with known orientations. Considering two consecutive frames b_k and b_{k+1} in the sliding window, let q_{b_k}^v and q_{b_{k+1}}^v be the rotations obtained from the pure visual sliding-window optimization. The IMU preintegration term is linearized with respect to the gyroscope bias, and the following function is minimized:

min_{δb_ω} Σ_{k∈ℬ} ‖(q_{b_{k+1}}^v)^{−1} ⊗ q_{b_k}^v ⊗ γ_{b_{k+1}}^{b_k}‖². (38)

Among them, there are:

γ_{b_{k+1}}^{b_k} ≈ γ̂_{b_{k+1}}^{b_k} ⊗ [1, (1/2) J_{b_ω}^γ δb_ω]^T. (39)

In formula (38), ℬ represents all the frames in the window. The product of the two quaternions indicates that the camera rotates from the kth frame to the (k + 1)th frame while the gyroscope rotates from the (k + 1)th frame back to the kth frame, so the optimization objective is

(q_{b_{k+1}}^v)^{−1} ⊗ q_{b_k}^v ⊗ γ_{b_{k+1}}^{b_k} = [1, 0]^T. (40)

Substituting γ̂ from formula (39) into formula (40), left-multiplying both sides by the inverse of the preintegrated relative rotation, and then multiplying both sides by the transpose of J_{b_ω}^γ, the bias is obtained by Cholesky decomposition:

(J_{b_ω}^γ)^T J_{b_ω}^γ δb_ω = 2 (J_{b_ω}^γ)^T [(γ̂_{b_{k+1}}^{b_k})^{−1} ⊗ (q_{b_k}^v)^{−1} ⊗ q_{b_{k+1}}^v]_{vec}, (41)

In this way, the initial calibration value of the gyroscope bias bw can be estimated, and then the IMU preintegration terms α^bk+1bk,β^bk+1bk,γ^bk+1bk are corrected with the new gyroscope bias.

The second is the initialization of velocity, gravitational acceleration, and scale. The initialized state vector is as follows:

χ_I = [v_{b_0}^v, v_{b_1}^v, …, v_{b_n}^v, g^v, s], (42)

where v_{b_k}^v represents the velocity of the kth image frame expressed in the visual coordinate system, g^v represents the gravity vector in the visual coordinate system, and s is the estimated scale parameter. In total, the dimension of χ_I is 3(n + 1) + 3 + 1. The constraint relationship between the scale parameter and the velocities from the visual SFM is as follows:

ẑ_{b_{k+1}}^{b_k} = [α̂_{b_{k+1}}^{b_k}; β̂_{b_{k+1}}^{b_k}] = H_{b_{k+1}}^{b_k} χ_I + n_{b_{k+1}}^{b_k},
H_{b_{k+1}}^{b_k} = [−q_v^{b_k} Δt_k   0   (1/2) q_v^{b_k} Δt_k²   q_v^{b_k}(p̄_{b_{k+1}}^v − p̄_{b_k}^v);
−q_v^{b_k}   q_v^{b_k}   q_v^{b_k} Δt_k   0], (43)

where q_v^{b_k}, p̄_{b_k}^v, and p̄_{b_{k+1}}^v are all obtained from the visual SFM, and q_v^{b_k} and q_{b_k}^v are mutually inverse rotations. The following linear least-squares problem is constructed to complete the initialization of velocity, gravitational acceleration, and scale:

min_{χ_I} Σ_{k∈ℬ} ‖ẑ_{b_{k+1}}^{b_k} − H_{b_{k+1}}^{b_k} χ_I‖². (44)

2.6. Monocular Visual Inertial Coupling Nonlinear Optimization

When coupling the visual constraint value and the IMU constraint value, the data of the inertial sensor should be introduced first, and the constraint value of the IMU on the state should be added to the optimized state vector. Then, nonlinear optimization is performed within a sliding window, and all state vectors of the sliding window are as follows:

χ = [x_0, x_1, …, x_n, x_c^b, λ_0, λ_1, …, λ_m],
x_k = [p_{b_k}^w, v_{b_k}^w, q_{b_k}^w, b_a, b_g], k ∈ [0, n],
x_c^b = [p_c^b, q_c^b], (45)

where xk represents the state of the IMU when the kth image is captured. There are n + 1 states in the sliding window, and each state contains the position, velocity, and rotation in the world coordinate system, and the IMU offsets in the IMU coordinate system. λm represents the inverse depth information of the mth 3D point, and xcb represents the external parameter from the camera to the IMU. The objective function is

min_χ { ‖r_p − H_p χ‖² + Σ_{k∈ℬ} ‖r_B(ẑ_{b_{k+1}}^{b_k}, χ)‖²_{P_{b_{k+1}}^{b_k}} + Σ_{(l,j)∈𝒞} ρ(‖r_C(ẑ_l^{c_j}, χ)‖²_{P_l^{c_j}}) }, (46)

where ρ is the Huber norm, defined as follows:

ρ(s) = { s, s ≤ 1; 2√s − 1, s > 1. (47)

In formula (46), ‖·‖_P represents the Mahalanobis distance weighted by the covariance matrix P, and {r_p, H_p}, r_B(ẑ_{b_{k+1}}^{b_k}, χ), and r_C(ẑ_l^{c_j}, χ) represent the marginalized prior information, the IMU measurement residual, and the visual reprojection error, respectively. ℬ is the set of all IMU measurement frames, and 𝒞 is the set of visual features in the sliding window.

According to the Gauss–Newton method, the minimum of the objective function can be computed incrementally, as follows:

$$\min_{\Delta X}\left\| r\left(\hat z^{b_k}_{b_{k+1}}, \chi \oplus \Delta X\right) \right\|^2_{P^{b_k}_{b_{k+1}}} \approx \min_{\Delta X}\left\| r\left(\hat z^{b_k}_{b_{k+1}}, \chi\right) + J^{b_k}_{b_{k+1}}\Delta X \right\|^2_{P^{b_k}_{b_{k+1}}},$$ (48)

where $J^{b_k}_{b_{k+1}}$ is the Jacobian matrix of the error term $r$ with respect to the state vector $\chi$.

Differentiating the above formula with respect to $\Delta X$ and setting the derivative to 0, the formula for the increment $\Delta X$ can be calculated as follows:

$${J^{b_k}_{b_{k+1}}}^{T} {P^{b_k}_{b_{k+1}}}^{-1} J^{b_k}_{b_{k+1}}\,\Delta X = -{J^{b_k}_{b_{k+1}}}^{T} {P^{b_k}_{b_{k+1}}}^{-1} r.$$ (49)
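The normal equations of formula (49) can be sketched as a small dense solve, assuming the Jacobian $J$, covariance $P$, and residual $r$ for one term are given (names are illustrative):

```python
import numpy as np

def gauss_newton_step(J, P, r):
    """Solve J^T P^{-1} J dx = -J^T P^{-1} r for the increment dx."""
    W = np.linalg.inv(P)   # information matrix (inverse covariance)
    A = J.T @ W @ J        # Gauss-Newton approximation of the Hessian
    b = -J.T @ W @ r
    return np.linalg.solve(A, b)
```

For a linear residual $r(x) = Jx - z$, one step lands exactly on the weighted least-squares solution, which is the sanity check used below.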

The overall incremental equation of the objective function is as follows:

$$\left( H_p + {J^{b_k}_{b_{k+1}}}^{T} {P^{b_k}_{b_{k+1}}}^{-1} J^{b_k}_{b_{k+1}} + {J^{c_j}_{l}}^{T} {P^{c_j}_{l}}^{-1} J^{c_j}_{l} \right)\Delta X = b_p - {J^{b_k}_{b_{k+1}}}^{T} {P^{b_k}_{b_{k+1}}}^{-1} r_{\mathcal{B}} - {J^{c_j}_{l}}^{T} {P^{c_j}_{l}}^{-1} r_{\mathcal{C}},$$ (50)

where $P^{b_k}_{b_{k+1}}$ represents the covariance of the IMU preintegration noise term and $P^{c_j}_{l}$ represents the visually observed noise covariance. When the IMU noise covariance $P^{b_k}_{b_{k+1}}$ is larger, its inverse, that is, its information matrix, is smaller, which means the IMU observations are weighted as less reliable than the visual observations. Formula (50) can be simplified to

$$\left(\Lambda_p + \Lambda_{\mathcal{B}} + \Lambda_{\mathcal{C}}\right)\Delta X = b_p + b_{\mathcal{B}} + b_{\mathcal{C}},$$ (51)

where $\Lambda_p$, $\Lambda_{\mathcal{B}}$, and $\Lambda_{\mathcal{C}}$ represent the Hessian blocks of the prior, IMU, and visual terms, respectively. Using the perturbation method to calculate, we can get

$$J = \frac{\partial r}{\partial X} = \lim_{\delta X \to 0} \frac{r\left(X \oplus \delta X\right) - r\left(X\right)}{\delta X},$$ (52)

where $\delta X$ represents a small disturbance of the state vector (as opposed to the increment $\Delta X$), and $\oplus$ denotes applying that disturbance to the state vector.
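The limit in formula (52) suggests a forward-difference numerical Jacobian, which is also a common way to verify analytic Jacobians. A minimal sketch, where `oplus` stands in for the on-manifold addition $\oplus$ (ordinary vector addition by default; any other composition rule is an assumption of the caller):

```python
import numpy as np

def numerical_jacobian(r, x, oplus=lambda x, d: x + d, eps=1e-7):
    """Forward-difference Jacobian of residual r at state x, Eq. (52)."""
    r0 = np.asarray(r(x))
    J = np.zeros((r0.size, x.size))
    for i in range(x.size):
        d = np.zeros(x.size)
        d[i] = eps                       # perturb one component at a time
        J[:, i] = (np.asarray(r(oplus(x, d))) - r0) / eps
    return J
```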

The continuous-time preintegration formula was derived in the IMU preintegration section, and the IMU residuals are as follows:

$$r_{\mathcal{B}}\left(\hat z^{b_k}_{b_{k+1}}, \chi\right) = \begin{bmatrix} \delta\alpha^{b_k}_{b_{k+1}} \\ \delta\beta^{b_k}_{b_{k+1}} \\ \delta\theta^{b_k}_{b_{k+1}} \\ \delta b_a \\ \delta b_g \end{bmatrix} = \begin{bmatrix} R^{b_k}_{w}\left( p^w_{b_{k+1}} - p^w_{b_k} - v^w_{b_k}\Delta t_k + \frac{1}{2}g^w\Delta t_k^2 \right) - \hat\alpha^{b_k}_{b_{k+1}} \\ R^{b_k}_{w}\left( v^w_{b_{k+1}} - v^w_{b_k} + g^w\Delta t_k \right) - \hat\beta^{b_k}_{b_{k+1}} \\ 2\left[ \left(\hat\gamma^{b_k}_{b_{k+1}}\right)^{-1} \otimes \left(q^w_{b_k}\right)^{-1} \otimes q^w_{b_{k+1}} \right]_{xyz} \\ b_{a_{b_{k+1}}} - b_{a_{b_k}} \\ b_{\omega_{b_{k+1}}} - b_{\omega_{b_k}} \end{bmatrix}.$$ (53)

According to the above formula, the optimization variables of the IMU residual are the position, rotation, velocity, and inertial biases at times $k$ and $k+1$:

$$p^w_{b_k}, q^w_{b_k}, v^w_{b_k}, b_{a_k}, b_{\omega_k},\; p^w_{b_{k+1}}, q^w_{b_{k+1}}, v^w_{b_{k+1}}, b_{a_{k+1}}, b_{\omega_{k+1}}.$$ (54)
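The position, velocity, and bias rows of the residual in formula (53) can be sketched directly; this is a minimal illustration that uses rotation matrices ($R^{b_k}_w = {R^w_{b_k}}^T$) and omits the quaternion rotation row for brevity. All names are illustrative, and $\hat\alpha$, $\hat\beta$ are assumed to come from the preintegration step:

```python
import numpy as np

def imu_residual(p_k, v_k, R_wbk, p_k1, v_k1, g_w, dt,
                 alpha_hat, beta_hat, ba_k, ba_k1, bg_k, bg_k1):
    """Position/velocity/bias rows of the IMU residual, Eq. (53)."""
    R_bkw = R_wbk.T  # body-from-world rotation at frame k
    r_p = R_bkw @ (p_k1 - p_k - v_k * dt + 0.5 * g_w * dt ** 2) - alpha_hat
    r_v = R_bkw @ (v_k1 - v_k + g_w * dt) - beta_hat
    r_ba = ba_k1 - ba_k      # accelerometer bias random-walk residual
    r_bg = bg_k1 - bg_k      # gyroscope bias random-walk residual
    return np.concatenate([r_p, r_v, r_ba, r_bg])
```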

To calculate the Jacobian matrix, perturbation is added to each optimization variable to obtain

$$\delta p^w_{b_k}, \delta\theta^w_{b_k}, \delta v^w_{b_k}, \delta b_{a_k}, \delta b_{\omega_k},\; \delta p^w_{b_{k+1}}, \delta\theta^w_{b_{k+1}}, \delta v^w_{b_{k+1}}, \delta b_{a_{k+1}}, \delta b_{\omega_{k+1}}.$$ (55)

Taking the partial derivatives with respect to each of the above disturbance variables, we can get

$$J_0^{15\times 7} = \left[\frac{\partial r_{\mathcal{B}}}{\partial\delta p^w_{b_k}}, \frac{\partial r_{\mathcal{B}}}{\partial\delta\theta^w_{b_k}}\right], \quad J_1^{15\times 9} = \left[\frac{\partial r_{\mathcal{B}}}{\partial\delta v^w_{b_k}}, \frac{\partial r_{\mathcal{B}}}{\partial\delta b_{a_k}}, \frac{\partial r_{\mathcal{B}}}{\partial\delta b_{\omega_k}}\right], \quad J_2^{15\times 7} = \left[\frac{\partial r_{\mathcal{B}}}{\partial\delta p^w_{b_{k+1}}}, \frac{\partial r_{\mathcal{B}}}{\partial\delta\theta^w_{b_{k+1}}}\right], \quad J_3^{15\times 9} = \left[\frac{\partial r_{\mathcal{B}}}{\partial\delta v^w_{b_{k+1}}}, \frac{\partial r_{\mathcal{B}}}{\partial\delta b_{a_{k+1}}}, \frac{\partial r_{\mathcal{B}}}{\partial\delta b_{\omega_{k+1}}}\right].$$ (56)

The visual residual is a reprojection error, which represents the difference between the estimated and observed values of a feature point in the normalized camera coordinate system. The camera used in this paper follows the fisheye model with a large viewing angle, so its projection onto the unit sphere must be considered, as shown in Figure 9.

Figure 9. Unit spherical projection model.

Through the unit spherical projection model illustrated in Figure 9, the visual residual is decomposed into two tangent-plane directions. The final visual residual model is as follows:

$$r_{\mathcal{C}}\left(\hat z^{c_j}_l, \chi\right) = \left[b_1, b_2\right]^T \cdot \left( \hat p^{c_j}_l - \frac{p^{c_j}_l}{\left\|p^{c_j}_l\right\|} \right), \quad \hat p^{c_j}_l = \pi_c^{-1}\begin{bmatrix}\hat u^{c_j}_l \\ \hat v^{c_j}_l\end{bmatrix}, \quad p^{c_j}_l = {R^b_c}^{-1}\left( {R^w_{b_j}}^{-1}\left( R^w_{b_i}\left( R^b_c\,\frac{1}{\lambda_l}\bar P^{c_i}_l + p^b_c \right) + p^w_{b_i} - p^w_{b_j} \right) - p^b_c \right),$$ (57)

where $\hat p^{c_j}_l$ and $p^{c_j}_l$ represent the observed and estimated coordinates of landmark $l$ in the $j$-th frame image under the normalized camera coordinate system, respectively. The optimization variables of the visual residual are as follows:

$$p^w_{b_i}, q^w_{b_i}, p^w_{b_j}, q^w_{b_j}, p^b_c, q^b_c, \lambda_l,$$ (58)

where $\lambda_l$ represents the inverse depth when landmark $l$ is first observed, in the $i$-th image. The inverse depth is used as the optimization variable because it approximately satisfies a Gaussian distribution and reduces the number of parameters in the optimization. According to the above formula, adding a disturbance to each optimization variable yields the following Jacobians:

$$J_0^{3\times 7} = \left[\frac{\partial r_{\mathcal{C}}}{\partial p^w_{b_i}}, \frac{\partial r_{\mathcal{C}}}{\partial q^w_{b_i}}\right], \quad J_1^{3\times 7} = \left[\frac{\partial r_{\mathcal{C}}}{\partial p^w_{b_j}}, \frac{\partial r_{\mathcal{C}}}{\partial q^w_{b_j}}\right], \quad J_2^{3\times 7} = \left[\frac{\partial r_{\mathcal{C}}}{\partial p^b_c}, \frac{\partial r_{\mathcal{C}}}{\partial q^b_c}\right],$$ (59)
$$J_3^{3\times 1} = \frac{\partial r_{\mathcal{C}}}{\partial\lambda_l}.$$ (60)
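The transformation chain in formula (57) — camera $i$ to body $i$ to world to body $j$ to camera $j$, followed by normalization onto the unit sphere — can be sketched as follows. This is a hedged illustration: the function and argument names are invented for this sketch, and the tangent-plane basis $b_1, b_2$ is constructed ad hoc here rather than taken from the paper:

```python
import numpy as np

def visual_residual(bearing_i, lam, obs_j, R_bc, p_bc,
                    R_wbi, p_wbi, R_wbj, p_wbj):
    """Unit-sphere reprojection residual of Eq. (57)."""
    P_bi = R_bc @ (bearing_i / lam) + p_bc     # camera i -> body i
    P_w = R_wbi @ P_bi + p_wbi                 # body i -> world
    P_bj = R_wbj.T @ (P_w - p_wbj)             # world -> body j
    P_cj = R_bc.T @ (P_bj - p_bc)              # body j -> camera j
    est = P_cj / np.linalg.norm(P_cj)          # estimate on the unit sphere
    # ad hoc orthonormal basis b1, b2 of the tangent plane at the observation
    tmp = np.array([1.0, 0.0, 0.0]) if abs(obs_j[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    b1 = np.cross(obs_j, tmp); b1 /= np.linalg.norm(b1)
    b2 = np.cross(obs_j, b1)
    return np.array([b1 @ (est - obs_j), b2 @ (est - obs_j)])
```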

3. E-Sports Training System Based on Intelligent Gesture Recognition

This paper maps the sensors in the data glove to the finger joints in order to demarcate finger-joint movement. It mainly considers the distal phalanx (TDP) and proximal phalanx (TPP) of the thumb, as shown in Figure 10, and the changes in the middle phalanges (MP) and proximal phalanges (PP) of the remaining four fingers.

Figure 10. Demarcation boundaries of hand movements.
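As a hypothetical sketch (not the paper's implementation) of the joint demarcation above, a glove's raw flex-sensor readings can be mapped to bend angles for the ten demarcated joints — the thumb's TDP/TPP and the MP/PP joints of the other four fingers; the ADC range and linear mapping here are assumptions:

```python
# Ten demarcated joints: thumb TDP/TPP plus MP/PP of the other four fingers.
JOINTS = ["TDP", "TPP"] + [f"{f}_{j}"
                           for f in ("index", "middle", "ring", "little")
                           for j in ("MP", "PP")]

def readings_to_angles(raw, raw_min=0.0, raw_max=1023.0, max_angle=90.0):
    """Linearly map each raw sensor reading to a joint bend angle in degrees."""
    scale = max_angle / (raw_max - raw_min)
    return {name: (value - raw_min) * scale for name, value in zip(JOINTS, raw)}
```

Real data gloves typically require per-sensor calibration, so the fixed range above would be replaced by per-user minimum/maximum readings.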

This paper uses the algorithms described in Section 2 to construct the e-sports training system; the overall framework of the system is shown in Figure 11.

Figure 11. Overall system framework.

The proposed system is simulated on the MATLAB platform, and both its gesture recognition performance and its application effect in the e-sports training system are evaluated; the results are shown in Tables 1 and 2.

Table 1.

Gesture recognition effect of the system.

Number Gesture recognition Number Gesture recognition Number Gesture recognition
1 86.095 13 87.363 25 86.633
2 88.782 14 89.124 26 90.414
3 90.695 15 90.055 27 86.697
4 86.325 16 89.968 28 86.916
5 86.705 17 88.482 29 89.003
6 89.557 18 91.278 30 88.446
7 86.623 19 89.215 31 89.911
8 91.281 20 86.272 32 90.202
9 88.050 21 91.253 33 89.507
10 87.621 22 86.770 34 88.130
11 88.498 23 86.843 35 91.326
12 86.304 24 90.614 36 86.846

Table 2.

The application effect of the method proposed in this paper in the e-sports training system.

Number Training effect Number Training effect Number Training effect
1 82.685 13 82.234 25 79.365
2 80.315 14 78.537 26 81.356
3 79.700 15 81.913 27 78.790
4 78.179 16 82.938 28 79.700
5 82.942 17 78.167 29 78.194
6 80.821 18 78.250 30 78.253
7 81.176 19 81.512 31 80.133
8 78.183 20 80.076 32 82.668
9 82.636 21 81.925 33 83.747
10 81.613 22 82.857 34 82.354
11 80.514 23 82.951 35 78.416
12 80.913 24 79.066 36 83.128

It can be seen from Tables 1 and 2 that the gesture recognition rate remains stable between roughly 86% and 91% across all 36 test samples, and the training-effect score between roughly 78 and 84, indicating that the e-sports training system based on intelligent gesture recognition proposed in this paper is effective.

4. Conclusion

As an emerging sport, e-sports is played mainly by the younger generation, and its participant base keeps getting younger. E-sports can exercise thinking ability, resistance to psychological pressure, teamwork, hand-eye coordination, and so on. Moreover, participating in e-sports can instill in the younger generation an awareness of abiding by the rules and cultivate a fair, open, and tenacious competitive spirit, with a positive impact on participants' lives. This paper combines intelligent gesture recognition technology to construct an e-sports training system and judges the training effect of players through the recognition of their gestures. The research shows that the e-sports training system based on intelligent gesture recognition proposed in this paper is effective.

Acknowledgments

This study was sponsored by Shandong Sport University.

Data Availability

The labeled data set used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

