Computational Intelligence and Neuroscience
2022 Jul 9; 2022:2689949. doi: 10.1155/2022/2689949

E-Sports Training System Based on Intelligent Gesture Recognition

Hui Li 1, Yao Lu 1, Hongqiao Yan 1
PMCID: PMC9288316  PMID: 35855795

Abstract

In order to improve the effect of e-sports training, this paper combines intelligent gesture recognition technology to construct an e-sports training system and judges the training effect of players through the recognition of their gestures. Moreover, this paper studies commonly used feature extraction algorithms and proposes an improved SLC-Harris feature extraction algorithm, whose feasibility is verified by experimental results on the EuRoC data set. In addition, this paper uses the KLT optical flow algorithm to track the extracted feature points and calculates the pure visual pose through epipolar geometry, triangulation, and PnP algorithms. The experimental results show that the e-sports training system based on intelligent gesture recognition proposed in this paper is effective.

1. Introduction

The reason why e-sports can become a sports competition is that it is closely related to the progress of society, the development of science and technology, and the spiritual and cultural needs of the people. Although countless people enjoy this high-tech intellectual sport, public opinion has, intentionally or unintentionally, instilled a harmful view of e-sports. Some media reported extensively that some students were addicted to games and could not extricate themselves, wasting their youth and studies, which made e-sports widely denounced as "electronic heroin." The huge pressure of public opinion leaves e-sports facing severe pressure to survive, and it is difficult for enterprises to enter this market openly. Moreover, athletes can only be called "players," and their treatment cannot be compared with that of ordinary athletes. At the same time, most fans can only engage in e-sports secretly. In addition, in the face of huge pressure from public opinion, it is difficult for the government to guide and supervise the industry confidently, and sometimes it has had to impose bans instead. The ban on television broadcasting of e-sports competitions, and the social discrimination it reflects, can be described as a huge obstacle to the normal development of the e-sports industry.

Generally speaking, the development of e-sports is not yet mature and is still in its infancy [1], which is manifested in many aspects: public recognition is insufficient, there are few related large-scale events, there is no professional-scale operation, and there is little research in this area [2]. Especially on college campuses, although students have more time for self-directed activity than before, schools do not pay enough attention to e-sports, and there is no relatively formal organization and management of participants, which has led to a considerable waste of human resources [3].

In order to cater to the trend of e-sports development, vigorously develop the e-sports business, improve the overall level of e-sports, and enable e-sports activities to develop well in colleges and universities, the current primary task is to understand in depth the characteristics of students participating in e-sports activities [4]. Among them, analysis and research on the current situation, development trend, and significance of participation in e-sports in colleges and universities are particularly important, in order to discover the problems existing in the development of campus e-sports and put forward reasonable suggestions for its development [5].

As an emerging sport, e-sports is mainly practiced by the younger generation, and its participants are increasingly young. E-sports can exercise people's thinking ability, resistance to psychological pressure, unity and cooperation, hand-eye coordination, and so on. It can also give the younger generation an awareness of abiding by the rules in the process of participating in e-sports [6]; trained participants develop a fair, open, never-admit-defeat competitive spirit that pursues constant improvement, which has a positive impact on their lives. Many colleges and universities have successively opened majors related to e-sports. Although e-sports is popular around the world, research and guiding theories on how to cultivate e-sports talents are rare [7].

Different scholars have different views on the attributes and characteristics of e-sports. Literature [8] proposed that "e-sports include three basic characteristics: one is electronics, the second is competitive sports, and the third is confrontation between people. At the same time, e-sports are divided into virtualized e-sports and fictionalized sports." Literature [9] pointed out that "the most fundamental characteristics of video games that distinguish them from other artificial games are: virtual environment, absence of the body, and artificial intelligence," emphasizing the central position of electronic communication technology in e-sports. Scholar Yang Fang believes that "e-sports should return to the essence of games, and the path from games to competitive sports follows the evolution of play-game-competitive sports," and, based on the development process of traditional competitive sports, puts forward a plan for the development of e-sports. Jia Peng and Yao Jiaxin believe that e-sports has distinctive characteristics: the diversity of functional structure requirements, the full expansion of self-awareness, the complexity of sports information pattern recognition, the agility of information processing, and the accuracy of intuitive thinking and decision-making; they analyze and clarify the various attributes of e-sports from many aspects [10].

The discussion on the attributes of e-sports is still going on. Based on current research, it can be determined that the two essential attributes of e-sports are electronic interaction and confrontational competition. Without electronic interaction, it becomes a traditional competitive sport; without confrontational competition, it becomes a mere video game; so the two are interdependent and indispensable. With the development of electronic interaction technology, various forms of e-sports have emerged [11].

Event services are mainly engaged in e-sports referees, coaches, club operation and management, game commentary, data and tactical analysis, and so on. Practitioners need to have data analysis capabilities, management capabilities, and commentary capabilities. The production and broadcast of the event include content production and external dissemination, mainly involving the design of live content and promotion plans, venue layout, equipment debugging, video data collection, postprocessing, background data analysis, and so on. The practitioners should have journalism, communication, broadcasting, TV technology, and other related professional abilities [12].

Since the e-sports industry is an emerging industry, most employees do not come from e-sports majors and have not received a complete and systematic theoretical education in e-sports, yet nearly 90% of employees believe that the e-sports industry needs prejob training [13]. Judging from the current state of the industry's development, working for a game manufacturer is undoubtedly the most attractive option, but it is difficult for game manufacturers to absorb more human resources without major business adjustments. Therefore, the need to train practitioners for the support organizations around e-sports events becomes more obvious [14]. For example, training content production capabilities (reporters, screenwriters, copywriters, and anchors) requires a professional background in journalism and communication; training event support capabilities (coaches, data analysts, nutritionists, and brokers) requires a background in sports and information technology; and training public relations and marketing capabilities (product, business, brand marketing, and media) requires a professional background in marketing and management [15].

E-sports self-media is still media: one must have the ability to report news, or dig deep into a vertical field, such as specializing in video commentary of games, game clearance strategies, or sharing game skills; after all, hot spots bring traffic. We-media is personalized media with social attributes; it communicates with users and has its own distinct character orientation [16]. To run a self-media outlet, one should also have strong analytical skills and be able to interpret a topic or special event from a unique or professional perspective.

1.1. Current E-Sports Professional Ability Training Pathways

Most training institutions in society position themselves as training professional players but basically lack training resources. They have no coaches, data analysts, or club managers, and it is difficult for the people they train to find a suitable position in the e-sports circle. Rather than cultivating professional skills, they make money from e-sports hot spots and have neither the intention nor the ability to contribute to the development of the e-sports industry [17].

At present, the main e-sports talents are cultivated by e-sports companies and e-sports clubs. The club mainly trains professional players, coaches, and data analysts in order to achieve better results in the league. Game companies train referees, game developers, commentators, and other related talents to ensure the healthy development of the e-sports industry [18]. An analysis of the revenue structure of the e-sports industry can help us see the e-sports industry more transparently. The truly profitable institutions are still game manufacturers, which continuously create market value through development and operation. In the context of the continuous development and popularization of the video game industry, competition has become a starting point for expanding influence and creating new commercial value. The comprehensive development of competitive value is inseparable from the promotion of surrounding formats, and new jobs such as video, live broadcast, and commentary emerge in an endless stream [19].

This paper combines the intelligent gesture recognition technology to construct an e-sports training system and uses the player's gesture recognition to judge the player's training effect to improve the e-sports training effect.

2. Intelligent Gesture Recognition

2.1. Gesture Intelligent Positioning

The structural framework of the gesture autonomous localization algorithm is shown in Figure 1.

Figure 1. Gesture positioning algorithm structural framework.

Monocular visual-inertial odometry uses a pure camera in the front end for motion estimation. The algorithm firstly extracts the features of the image information collected by the camera, then uses the optical flow method to track the feature points, and finally uses PnP (Perspective-n-Point) to perform motion estimation on the tracked feature points. Then, the algorithm eliminates the mismatched point pairs through random sampling consistency (RANSAC) and uses nonlinear optimization to optimize the pose. The front-end process is shown in Figure 2.

Figure 2. Front-end flowchart.

2.2. SLC-Harris Feature Extraction

The feature is the digital expression of the object in the image, and the image can be quantitatively analyzed by extracting the feature. Commonly used feature extraction methods mainly include SIFT algorithm, SURF algorithm, and ORB algorithm.

The traditional Harris algorithm calculates the corner response as shown below, mainly by weighted summation of the squared and cross-multiplied gradients of all pixels in the window.

R = C − k·(trace C)². (1)

Among them, there are

C = λ1 × λ2, (2)
trace C = λ1 + λ2. (3)

In formula (1), k is a constant ranging from 0.04 to 0.06, and both λ1 and λ2 in formula (2) represent eigenvalues.
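As a concrete illustration of formulas (1)–(3), the sketch below (Python; the function name and the scalar window sums are illustrative assumptions, not from the paper) computes the response from the gradient products already summed over a window:

```python
def corner_response(sxx, syy, sxy, k=0.04):
    """Harris response R = C - k * (trace C)^2, where C is the determinant of
    the 2x2 structure tensor [[sxx, sxy], [sxy, syy]] = lambda1 * lambda2 and
    trace C = sxx + syy = lambda1 + lambda2 (formulas (1)-(3))."""
    c = sxx * syy - sxy * sxy      # lambda1 * lambda2
    trace_c = sxx + syy            # lambda1 + lambda2
    return c - k * trace_c ** 2
```

A window with two large eigenvalues (a corner) yields a large positive response, e.g. corner_response(10, 10, 0) = 84, while an edge-like window with one dominant eigenvalue, such as corner_response(100, 0, 0), gives a negative response.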

For a grayscale image, the value ii(x, y) of any point (x, y) in the integral image is the sum of all grayscale values in the rectangle from the upper-left corner of the image to that point, as shown in Figure 3.

Figure 3. Rectangular window pixel calculation.

The calculation formula of pixels in the rectangular window is as follows:

ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} I(x′, y′). (4)

The most complex part of the Harris algorithm is the calculation of the corner response. The original calculation method causes overlapping computation between pixels in the integration window, resulting in high computational complexity. To address this, integral images of the gradient products gx², gy², and gx·gy are used to speed up the calculation of the corner response. The calculation formulas are as follows:

ii_xx(x, y) = Σ_{x′ ≤ x, y′ ≤ y} gx²(x′, y′),
ii_yy(x, y) = Σ_{x′ ≤ x, y′ ≤ y} gy²(x′, y′),
ii_xy(x, y) = Σ_{x′ ≤ x, y′ ≤ y} gx(x′, y′)·gy(x′, y′). (5)
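A minimal NumPy sketch of formulas (4) and (5) (function names are illustrative): a single pass of cumulative sums builds the integral image, after which the sum over any rectangular window is four table look-ups, independent of window size.

```python
import numpy as np

def integral_image(img):
    """ii(x, y): sum of all values above and to the left, inclusive (formula (4)).
    A zero row/column is prepended so window sums need no boundary checks."""
    ii = np.cumsum(np.cumsum(np.asarray(img, dtype=np.float64), axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def window_sum(ii, r0, c0, r1, c1):
    """Sum over rows r0..r1-1 and cols c0..c1-1 from four table look-ups."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
```

Building such integral images of gx², gy², and gx·gy as in formula (5) makes each per-window sum behind the corner response O(1) per pixel.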

Efficient nonmaximum suppression (E-NMS) is used to efficiently extract unique feature locations for each corner region, and the region thresholds are compared using image patches instead of pixels. The principle is shown in Figure 4.

Figure 4. Efficient nonmaximum suppression.

2.3. KLT Optical Flow Tracking

After the key points are extracted, the optical flow method computes the motion by minimizing the photometric error through an error model. This method needs neither descriptor computation nor feature point matching, which greatly reduces the amount of computation.

The basic idea of LK optical flow is to assume that the optical flow in the local neighborhood of a pixel is invariant, and based on this assumption, construct a least-squares problem about the optical flow of the neighborhood pixels.

First, it is assumed that the light intensity of the pixel in each frame of the image is constant. According to this, for the pixel located at (x, y) at time t, moving to (x + dx, y + dy) at time t + dt, there are

I(x, y, t) = I(x + dx, y + dy, t + dt). (6)

Then, according to another basic assumption of LK optical flow, the displacement of pixels in adjacent images is small; the Taylor expansion of formula (6) is

I(x + dx, y + dy, t + dt) ≈ I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt. (7)

Combining the above formulas and dividing both sides by dt, we get:

(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) = −∂I/∂t, (8)

where dx/dt represents the motion speed of the pixel on the x-axis, dy/dt represents the motion speed of the pixel on the y-axis, and the two speeds are recorded as u and v, respectively. At the same time, ∂I/∂x represents the gradient value of the image in the x-axis direction at the pixel point; ∂I/∂y represents the gradient value in the y-axis direction at the pixel point; and ∂I/∂t represents the derivative value of the image in the t direction, which are denoted as Ix, Iy, and It, respectively. Therefore, formula (8) can be written in matrix form as follows:

[Ix  Iy][u  v]^T = −It. (9)

Finally, according to the third basic assumption of the LK optical flow method, adjacent pixels in the same image plane have similar motion. A w × w window is defined; since all pixels in the window share the same motion, w² equations can be listed, an overdetermined system constructed, and the motion parameters of the center point obtained by the least-squares method. Accordingly, the system can be expressed as follows:

[ΣIxIx  ΣIxIy; ΣIxIy  ΣIyIy][u; v] = −[ΣIxIt; ΣIyIt]. (10)
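Formula (10) is a small least-squares solve per window; a NumPy sketch (the function name is illustrative) taking the stacked spatial gradients and temporal derivative of one w × w window:

```python
import numpy as np

def lk_window_flow(ix, iy, it):
    """Solve the overdetermined system behind formula (10) for one window:
    one row [Ix Iy] per pixel, right-hand side -It, least-squares (u, v)."""
    a = np.stack([np.ravel(ix), np.ravel(iy)], axis=1)  # (w*w) x 2 matrix
    b = -np.ravel(it)                                   # brightness constancy
    (u, v), *_ = np.linalg.lstsq(a, b, rcond=None)
    return u, v
```

The normal equations of this solve are exactly the 2 × 2 system of formula (10); the window must contain gradient variation in both directions, otherwise the matrix is singular (the aperture problem).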

Each image frame is downsampled by pyramid layering, and multilevel pyramids are established.

I^L(x, y) = (1/4)·I^{L−1}(2x, 2y)
+ (1/8)·[I^{L−1}(2x−1, 2y) + I^{L−1}(2x+1, 2y) + I^{L−1}(2x, 2y−1) + I^{L−1}(2x, 2y+1)]
+ (1/16)·[I^{L−1}(2x−1, 2y−1) + I^{L−1}(2x+1, 2y−1) + I^{L−1}(2x−1, 2y+1) + I^{L−1}(2x+1, 2y+1)], (11)

where L represents the Lth layer image.

The algorithm calculates the value of the bottom layer from top to bottom according to the Gaussian pyramid and calculates the pixel value near the edge of the image based on the following formulas:

I^{L−1}(−1, y) ≜ I^{L−1}(0, y),
I^{L−1}(x, −1) ≜ I^{L−1}(x, 0),
I^{L−1}(nx^{L−1}, y) ≜ I^{L−1}(nx^{L−1} − 1, y),
I^{L−1}(x, ny^{L−1}) ≜ I^{L−1}(x, ny^{L−1} − 1),
I^{L−1}(nx^{L−1}, ny^{L−1}) ≜ I^{L−1}(nx^{L−1} − 1, ny^{L−1} − 1). (12)
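One pyramid level of formula (11), with the clamped-border convention of formula (12), can be sketched as follows (NumPy; the function name is illustrative):

```python
import numpy as np

def pyr_down(img):
    """One level of formula (11): weight the centre 1/4, the 4-neighbours 1/8,
    the diagonals 1/16, then subsample by 2. Edge replication implements the
    out-of-range convention of formula (12)."""
    p = np.pad(np.asarray(img, dtype=np.float64), 1, mode="edge")
    centre = p[1:-1, 1:-1]
    cross = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
    diag = p[:-2, :-2] + p[:-2, 2:] + p[2:, :-2] + p[2:, 2:]
    smooth = 0.25 * centre + 0.125 * cross + 0.0625 * diag
    return smooth[::2, ::2]
```

The weights sum to 1, so flat regions keep their intensity while each level halves the resolution, letting the optical flow step handle large displacements coarse-to-fine.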

The camera motion pose is estimated using SFM in the vision front end. For a monocular camera, the pose can be estimated from the geometric relationship between a point in three-dimensional space and its projections onto the imaging planes at two different camera positions. As shown in Figure 5, P is any point in three-dimensional space with coordinates [X, Y, Z]^T; O1 and O2 are the optical centers of the two camera positions; and p1 and p2 are the projections of P on the imaging planes I1 and I2, respectively. From the pixel positions of the matched point pair p1 and p2, the essential matrix E and the fundamental matrix F can be obtained.

Figure 5. Epipolar geometric constraints.

According to the camera imaging model, we assume that K is the camera internal parameter matrix, and R and t represent the rotation matrix and translation vector from plane I1 to plane I2, and the following formula can be obtained:

s1·p1 = K·P, s2·p2 = K(R·P + t). (13)

After homogeneous coordinate transformation and normalization between 2D and 3D, we can get

x1 = K⁻¹·p1, x2 = K⁻¹·p2, (14)

where x1 and x2 represent the coordinates of pixels p1 and p2 in the normalized plane, respectively. The algorithm combines formulas (13) and (14) and left-multiplies both sides by x2^T·t∧ to obtain the essential matrix E and the fundamental matrix F, which can be written as follows:

p2^T·K^{−T}·t∧·R·K⁻¹·p1 = 0, E = t∧·R, F = K^{−T}·E·K⁻¹, (15)

where t∧ represents the antisymmetric matrix of t.

When there are more than eight point pairs such as (p1, p2), the eight-point method can be used to construct a linear system from the simplified constraint, and the solution for R and t can then be obtained:

x2^T·E·x1 = p2^T·F·p1 = 0. (16)
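A quick numerical check of formulas (15) and (16) with a synthetic pose and point (NumPy; the pose values are illustrative assumptions): for any point observed from two views, the epipolar residual with E = t∧R vanishes.

```python
import numpy as np

def skew(t):
    """Antisymmetric matrix t^ with t^ @ v = t x v (as used in formula (15))."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Synthetic relative pose: rotation about z plus a translation (illustrative).
th = 0.1
R = np.array([[np.cos(th), -np.sin(th), 0.0],
              [np.sin(th), np.cos(th), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.5, 0.1, 0.0])

P = np.array([1.0, 2.0, 5.0])   # a 3D point in the first camera frame
x1 = P / P[2]                   # normalized image coordinates, view 1
P2 = R @ P + t
x2 = P2 / P2[2]                 # normalized image coordinates, view 2

E = skew(t) @ R                 # essential matrix E = t^ R (formula (15))
residual = x2 @ E @ x1          # epipolar constraint of formula (16)
```

The residual is zero up to floating-point error; with eight or more such point pairs, each pair contributes one linear equation in the nine entries of E, which is the eight-point method mentioned above.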

When the monocular camera recovers the pose through the epipolar geometric relationship, the obtained translation is only determined up to scale and carries no metric meaning. In order to obtain depth information for the feature points, triangulation needs to be introduced. We assume that s1 and s2 represent the depths of the feature point in the two camera frames; then we have

s2·x2 = s1·R·x1 + t. (17)

The depth values s1 and s2 can be obtained from formula (17) by left-multiplying with the antisymmetric matrices x2∧ and x1∧, respectively, as follows:

s1·x2∧·R·x1 + x2∧·t = 0,
s2·x1∧·R^T·x2 − x1∧·R^T·t = 0. (18)

When the positions of multiple points in space are known, the camera pose can be estimated by the PnP algorithm. Common PnP algorithms include P3P, DLT, and BA optimization. Among them, the P3P algorithm is the most common method. The algorithm needs to know at least three points and their projection points on the camera imaging plane. Then, the camera pose can be estimated by solving the relationship between point pairs according to the similar triangle principle and the cosine theorem. A schematic diagram of the P3P relationship is shown in Figure 6.

Figure 6. Schematic diagram of P3P relationship.

The coordinate system convention is as follows: the world coordinate system is represented by (·)^w, and (·)^b and (·)^c represent the IMU coordinate system and the camera coordinate system, respectively. The relationship between the coordinate systems is shown in Figure 7. (·)^v represents the visual reference frame in the sliding window, which is independent of the IMU measurements and can represent any frame in the visual structure. (·)_b^w represents the transformation from the IMU coordinate system to the world coordinate system; b_k represents the IMU frame of the kth image; (·)_c^v represents the transformation from the camera coordinate system to the visual reference frame; and c_k represents the camera frame of the kth image. (·)^ represents a sensor measurement or parameter estimate; (·)¯ represents the latest scale parameter of the sliding window; and rotation can be represented by the rotation matrix R or the quaternion q. g^w = [0, 0, g]^T represents the gravity vector in the world coordinate system, and g^v represents the gravity vector in the visual reference coordinate system.

Figure 7. Coordinate conversion relationship.

2.4. IMU Preintegration

The sampling frequency of the camera used in this paper is 20 Hz, and the sampling frequency of the IMU is 200 Hz. It can be seen that the frequency of the IMU is much higher than that of the image. In order to avoid the repeated integration phenomenon caused by the frequency change of the visual frame optimization state caused by the high sampling rate of the IMU, a preintegration technique is used for all IMU sampling data between two image key frames. Furthermore, inertial measurements between adjacent image key frames are aggregated into a relative motion constraint through a preintegration technique. The principle of preintegration is shown in Figure 8.

Figure 8. Principle of preintegration.

In Figure 8, from top to bottom are the time scale line, the number of image frames generated, the number of image key frames generated, the number of IMU samples, and the IMU preintegration value.

The measurement error of the system is mainly affected by bias random walk b and white noise η, and other errors such as the Markov process are ignored. Then, the measurement model of the accelerometer and gyroscope in the IMU can be expressed by the following formula:

ω̂_b = ω_b(t) + b_ω + η_ω,
â_b = R_w^b(a^w(t) + g^w) + b_a + η_a, (19)

where ω̂_b and â_b represent the measured angular velocity and acceleration; ω_b(t) and a^w(t) represent the corresponding true values; b_ω, b_a, η_ω, and η_a represent the random walk noise and measurement white noise of the angular velocity and acceleration, respectively; and R_w^b is the rotation matrix from the world coordinate system to the IMU coordinate system.

White noise obeys a Gaussian distribution, that is, ηa ~ N(0, σa2), ηω ~ N(0, σω2). The derivative of random walk noise also obeys the Gaussian distribution, that is, ηba ~ N(0, σba2), ηbω ~ N(0, σbω2).

The differential kinematic formulas for P, V, Q (representing the position, velocity, and rotation expressed in quaternions, respectively) versus time can be written as follows:

ṗ_{b_t}^w = v_t^w, v̇_t^w = a_t^w, q̇_{b_t}^w = q_{b_t}^w ⊗ [0, (1/2)ω_{b_t}]^T, (20)

where ⊗ represents quaternion multiplication.

Through the above derivative relationship, the position, velocity, and rotation at time k + 1 can be obtained from the position, velocity, and rotation at time k and by integrating the measured values of the IMU over time Δtk. The continuous integration formula for PVQ is as follows:

p_{b_{k+1}}^w = p_{b_k}^w + v_{b_k}^w Δt_k + ∬_{t∈[t_k, t_{k+1}]} (R_t^w(â_t − b_{a_t}) − g^w) dt²,
v_{b_{k+1}}^w = v_{b_k}^w + ∫_{t∈[t_k, t_{k+1}]} (R_t^w(â_t − b_{a_t}) − g^w) dt,
q_{b_{k+1}}^w = q_{b_k}^w ⊗ ∫_{t∈[t_k, t_{k+1}]} (1/2) Ω(ω̂_t − b_{ω_t}) q_t^{b_k} dt, (21)

where â_t and ω̂_t represent the acceleration and angular velocity measured in the IMU coordinate system, respectively; Δt_k represents the time difference from the kth frame to the (k + 1)th frame; and R_t^w represents the rotation matrix from the IMU coordinate system to the world coordinate system. Because the measured â_t and ω̂_t belong to the IMU coordinate system, the rotation matrix must be left-multiplied to transform the IMU measurements into the world coordinate system. Ω(ω) denotes the quaternion right-multiplication matrix, and ω× the antisymmetric matrix in quaternion multiplication (ω being the imaginary part of the quaternion). We assume that the quaternion is q = [x y z s]^T = [ω s]^T; then we have

Ω(ω) = [−ω×  ω; −ω^T  0], ω× = [0 −ω_z ω_y; ω_z 0 −ω_x; −ω_y ω_x 0]. (22)

By observing the continuous integral formula of PVQ, it can be seen that the current state is recursively obtained from the state of the previous time, and the estimated value is constantly changing. This will cause the IMU measurements to be repropagated, causing the velocity and rotation to be reintegrated after each nonlinear optimization iteration, resulting in a higher computational cost. Therefore, the optimization variables are separated from the IMU preintegration terms of the two key frames, and the rotation matrix Rwbk of the world coordinate system to the IMU coordinate system can be obtained by simultaneously left-multiplying the left and right sides of the continuous integration formula of PVQ:

R_w^{b_k} p_{b_{k+1}}^w = R_w^{b_k}(p_{b_k}^w + v_{b_k}^w Δt_k − (1/2) g^w Δt_k²) + α_{b_{k+1}}^{b_k},
R_w^{b_k} v_{b_{k+1}}^w = R_w^{b_k}(v_{b_k}^w − g^w Δt_k) + β_{b_{k+1}}^{b_k},
q_w^{b_k} ⊗ q_{b_{k+1}}^w = γ_{b_{k+1}}^{b_k}. (23)

The image frames bk and bk+1 of two consecutive moments are given, and the linear acceleration and angular velocity are preintegrated in the local coordinate system bk to obtain

α_{b_{k+1}}^{b_k} = ∬_{t∈[t_k, t_{k+1}]} R_t^{b_k}(â_t − b_{a_t}) dt²,
β_{b_{k+1}}^{b_k} = ∫_{t∈[t_k, t_{k+1}]} R_t^{b_k}(â_t − b_{a_t}) dt,
γ_{b_{k+1}}^{b_k} = ∫_{t∈[t_k, t_{k+1}]} (1/2) Ω(ω̂_t − b_{ω_t}) γ_t^{b_k} dt, (24)

where αbk+1bk, βbk+1bk, γbk+1bk represent the relative pose, velocity, and rotation constraints, respectively, and are also the relative motion of bk+1 to bk. It can be seen that they are only related to a^t and ω^t in bk and bk+1, and they have nothing to do with the initial position and velocity of coordinate system bk.
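A minimal Euler-discretized sketch of formula (24) (NumPy; the function name and the first-order rotation update are simplifying assumptions): α, β, and the relative rotation are accumulated purely from IMU samples between two key frames, with no reference to global pose or gravity.

```python
import numpy as np

def preintegrate(acc, gyro, dt, ba=np.zeros(3), bw=np.zeros(3)):
    """Accumulate the relative constraints of formula (24) in the b_k frame:
    alpha (position), beta (velocity), and rot (stands in for gamma)."""
    alpha, beta, rot = np.zeros(3), np.zeros(3), np.eye(3)
    for a, w in zip(acc, gyro):
        a_bk = rot @ (a - ba)                  # bias-corrected accel in b_k
        alpha = alpha + beta * dt + 0.5 * a_bk * dt ** 2
        beta = beta + a_bk * dt
        wx, wy, wz = w - bw
        w_skew = np.array([[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]])
        rot = rot @ (np.eye(3) + w_skew * dt)  # first-order update of R_t^{bk}
    return alpha, beta, rot
```

For 1 s of constant acceleration (1, 0, 0) at 100 Hz with zero rotation, β ends at (1, 0, 0) and α at (0.5, 0, 0), independent of where and how fast frame b_k was moving in the world.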

Therefore, the preintegration formula is revisited. α_{b_{k+1}}^{b_k} depends on â_t and ω̂_t of the IMU, whose biases are also variables to be optimized. When the bias change is small, α_{b_{k+1}}^{b_k}, β_{b_{k+1}}^{b_k}, and γ_{b_{k+1}}^{b_k} are adjusted according to their first-order approximations with respect to the bias:

α_{b_{k+1}}^{b_k} ≈ α̂_{b_{k+1}}^{b_k} + J_{b_a}^α δb_a + J_{b_ω}^α δb_ω,
β_{b_{k+1}}^{b_k} ≈ β̂_{b_{k+1}}^{b_k} + J_{b_a}^β δb_a + J_{b_ω}^β δb_ω,
γ_{b_{k+1}}^{b_k} ≈ γ̂_{b_{k+1}}^{b_k} ⊗ [1, (1/2) J_{b_ω}^γ δb_ω]^T, (25)

where J_{b_a}^α and J_{b_ω}^α are block matrices of J_{b_{k+1}}^α, and J_{b_a}^β and J_{b_ω}^β are block matrices of J_{b_{k+1}}^β.

There are errors in the integral values of the IMU at different times, and the error at time t is mainly related to α_t^{b_k}, β_t^{b_k}, θ_t^{b_k}, b_{a_t}, and b_{ω_t} at time t. The following error vector is defined:

δz_t^{b_k} = [δα_t^{b_k}, δβ_t^{b_k}, δθ_t^{b_k}, δb_{a_t}, δb_{ω_t}]^T. (26)

The derivation is based on the derivative of the error-state kinematic formula. First, two concepts are introduced: true, the real measurement value containing noise, and nominal, the theoretical value without noise; δ represents the measurement error. Then

δα̇ = α̇_true − α̇_nominal, δβ̇ = β̇_true − β̇_nominal. (27)

Among them, there are:

β̇_true = R_t^{b_k}[(â_t − η_a − b_{a_t} − δb_{a_t}) − (â_t − b_{a_t})× δθ],
β̇_nominal = R_t^{b_k}(â_t − b_{a_t}). (28)

Combining the above formulas, we can obtain

δβ̇ = −R_t^{b_k}(â_t − b_{a_t})× δθ − R_t^{b_k} δb_{a_t} − R_t^{b_k} η_a. (29)

The derivation of δθ˙ is as follows, and according to the formula in the literature, it can be known that

q_true = q_nominal ⊗ δq. (30)

In this paper, according to the noise model and bias, we can get

δθ̇ = −(ω̂_t − b_{ω_t})× δθ − η_ω − δb_{ω_t}. (31)

In summary, the derivative of the IMU measurement error term at time t can be as follows:

[δα̇_t^{b_k}; δβ̇_t^{b_k}; δθ̇_t^{b_k}; δḃ_{a_t}; δḃ_{ω_t}] =
[0  I  0  0  0;
 0  0  −R_t^{b_k}(â_t − b_{a_t})×  −R_t^{b_k}  0;
 0  0  −(ω̂_t − b_{ω_t})×  0  −I;
 0  0  0  0  0;
 0  0  0  0  0]·[δα_t^{b_k}; δβ_t^{b_k}; δθ_t^{b_k}; δb_{a_t}; δb_{ω_t}] + G_t·[η_a; η_ω; η_{b_a}; η_{b_ω}]. (32)

We set
F_t = [0 I 0 0 0; 0 0 −R_t^{b_k}(â_t − b_{a_t})× −R_t^{b_k} 0; 0 0 −(ω̂_t − b_{ω_t})× 0 −I; 0 0 0 0 0; 0 0 0 0 0],
G_t = [0 0 0 0; −R_t^{b_k} 0 0 0; 0 −I 0 0; 0 0 I 0; 0 0 0 I].
The above formula can be simplified to

δż_t^{b_k} = F_t δz_t^{b_k} + G_t n_t. (33)

According to the definition of the derivative, the prediction formula of the mean is as follows:

δż_t^{b_k} = lim_{δt→0} (δz_{t+δt}^{b_k} − δz_t^{b_k})/δt,
δz_{t+δt}^{b_k} = (I + F_t δt) δz_t^{b_k} + G_t δt n_t. (34)

According to the error value at the current moment, the mean and covariance at the next moment can be predicted. The prediction formula for covariance is as follows:

P_{t+δt}^{b_k} = (I + F_t δt) P_t^{b_k} (I + F_t δt)^T + (G_t δt) Q (G_t δt)^T, (35)

where P_t^{b_k} is initialized to zero and Q is the diagonal covariance matrix of the noise terms, Q = diag(σ_a², σ_ω², σ_{b_a}², σ_{b_ω}²).

According to formula (35), the iterative formula of the error term Jacobian can be obtained as follows:

J_{t+δt} = (I + F_t δt) J_t, (36)

where the iterative initial value of the Jacobian matrix Jt is I.
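Formulas (35) and (36) amount to one discrete propagation step per IMU sample; a NumPy sketch under the stated first-order approximation (the function name and the square toy shapes are illustrative):

```python
import numpy as np

def propagate(p, j, f, g, q, dt):
    """One step of formulas (35)-(36): the error-state covariance P and
    Jacobian J are pushed forward by the transition (I + F dt), with the
    noise injected through G dt and its diagonal covariance Q."""
    a = np.eye(f.shape[0]) + f * dt
    p_next = a @ p @ a.T + (g * dt) @ q @ (g * dt).T
    j_next = a @ j
    return p_next, j_next
```

Starting from P = 0 and J = I as stated above, iterating this over all samples between two key frames yields the covariance of the preintegrated measurement and the bias Jacobians used in formula (25).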

2.5. Sliding Window Initialization

When the camera–IMU extrinsic parameters (p_b^c, q_b^c) are known, the pose obtained by the initialization of the monocular camera is transformed from the visual coordinate system to the IMU coordinate system to obtain the following formula:

q_{b_k}^v = q_{c_k}^v ⊗ q_b^c,
s·p̄_{b_k}^v + q_{b_k}^v p_b^c = s·p̄_{c_k}^v, (37)

where s is the scale factor obtained by visual initialization, which carries no metric information.

The pure visual initialization method lacks absolute scale information. Therefore, the value estimated by the visual SFM is aligned with the IMU after preintegration to estimate the true scale. Visual-inertial alignment initialization is mainly to solve the following problems, including the initialization of gyroscope bias, the initialization of velocity, gravitational acceleration, and scale.

The first step is to initialize the gyroscope bias. The bias can be obtained from consecutive key frames with known orientations. Considering two consecutive frames b_k and b_{k+1} in the sliding window, let q_{b_k}^v and q_{b_{k+1}}^v be the rotations obtained from the pure visual sliding-window optimization. The IMU preintegration term is linearized with respect to the gyroscope bias, and the following function is minimized:

min_{δb_ω} Σ_{k∈ℬ} ‖(q_{b_{k+1}}^v)^{−1} ⊗ q_{b_k}^v ⊗ γ_{b_{k+1}}^{b_k}‖². (38)

Among them, there are:

γ_{b_{k+1}}^{b_k} ≈ γ̂_{b_{k+1}}^{b_k} ⊗ [1, (1/2) J_{b_ω}^γ δb_ω]^T. (39)

In formula (38), ℬ represents all the frames in the window. The product of the two quaternions indicates that the camera rotates from the kth frame to the (k + 1)th frame while the gyroscope rotates from the (k + 1)th frame back to the kth frame, so the optimization objective is

(q_{b_{k+1}}^v)^{−1} ⊗ q_{b_k}^v ⊗ γ_{b_{k+1}}^{b_k} = [1, 0]^T. (40)

Substituting γ̂ from formula (39) into formula (40), left-multiplying both sides by the inverse of the preintegrated relative rotation, and then multiplying both sides by the transpose of J_{b_ω}^γ, the bias is obtained by Cholesky decomposition:

(J_{b_ω}^γ)^T J_{b_ω}^γ δb_ω = 2 (J_{b_ω}^γ)^T [(γ̂_{b_{k+1}}^{b_k})^{−1} ⊗ (q_{b_k}^v)^{−1} ⊗ q_{b_{k+1}}^v]_{vec}, (41)

In this way, the initial calibration value of the gyroscope bias bw can be estimated, and then the IMU preintegration terms α^bk+1bk,β^bk+1bk,γ^bk+1bk are corrected with the new gyroscope bias.

The second is the initialization of velocity, gravitational acceleration, and scale. The initialized state vector is as follows:

χ_I = [v_{b_0}^v, v_{b_1}^v, …, v_{b_n}^v, g^v, s], (42)

where v_{b_k}^v represents the velocity of the kth image frame expressed in the visual coordinate system, g^v represents the gravity vector in the visual coordinate system, and s is the estimated scale parameter. In total, the dimension of χ_I is 3(n + 1) + 3 + 1. The constraint relationship between the scale parameter and the velocities from the visual SFM is as follows:

ẑ_{b_{k+1}}^{b_k} = [α̂_{b_{k+1}}^{b_k}; β̂_{b_{k+1}}^{b_k}] = H_{b_{k+1}}^{b_k} χ_I + n_{b_{k+1}}^{b_k},
H_{b_{k+1}}^{b_k} = [−q_v^{b_k} Δt_k   0   (1/2) q_v^{b_k} Δt_k²   q_v^{b_k}(p̄_{b_{k+1}}^v − p̄_{b_k}^v);
−q_v^{b_k}   q_v^{b_k}   q_v^{b_k} Δt_k   0], (43)

where q_v^{b_k}, p̄_{b_k}^v, and p̄_{b_{k+1}}^v are all obtained from the visual SFM, and q_v^{b_k} and q_{b_k}^v are mutually inverse rotations. The following linear least-squares problem is constructed to complete the initialization of velocity, gravitational acceleration, and scale:

min_{χ_I} Σ_{k∈ℬ} ‖ẑ_{b_{k+1}}^{b_k} − H_{b_{k+1}}^{b_k} χ_I‖². (44)

2.6. Monocular Visual Inertial Coupling Nonlinear Optimization

When coupling the visual constraint value and the IMU constraint value, the data of the inertial sensor should be introduced first, and the constraint value of the IMU on the state should be added to the optimized state vector. Then, nonlinear optimization is performed within a sliding window, and all state vectors of the sliding window are as follows:

χ = [x_0, x_1, …, x_n, x_c^b, λ_0, λ_1, …, λ_m],
x_k = [p_{b_k}^w, v_{b_k}^w, q_{b_k}^w, b_a, b_g], k ∈ [0, n],
x_c^b = [p_c^b, q_c^b], (45)

where xk represents the state of the IMU when the kth image is captured. There are n + 1 states in the sliding window, and each state contains the position, velocity, and rotation in the world coordinate system, and the IMU offsets in the IMU coordinate system. λm represents the inverse depth information of the mth 3D point, and xcb represents the external parameter from the camera to the IMU. The objective function is

min_χ { ‖r_p − H_p χ‖² + Σ_{k∈ℬ} ‖r_B(ẑ_{b_{k+1}}^{b_k}, χ)‖²_{P_{b_{k+1}}^{b_k}} + Σ_{(l,j)∈𝒞} ρ(‖r_C(ẑ_l^{c_j}, χ)‖²_{P_l^{c_j}}) }, (46)

where ρ is the Huber norm, defined as follows:

ρ(s) = { s, s ≤ 1; 2√s − 1, s > 1. (47)

In formula (46), ‖·‖_P represents the Mahalanobis distance weighted by the covariance matrix P, and {r_p, H_p}, r_B(ẑ_{b_{k+1}}^{b_k}, χ), and r_C(ẑ_l^{c_j}, χ) represent the marginalized prior information, the IMU measurement residual, and the visual reprojection error, respectively. ℬ is the set of all IMU measurement frames, and 𝒞 is the set of visual features in the sliding window.

According to the Gauss–Newton method, the minimum of the objective function can be computed incrementally, as follows:

$$\min_{\Delta X}\left\| r\left(\hat z^{b_k}_{b_{k+1}}, \chi \oplus \Delta X\right) \right\|^2_{P^{b_k}_{b_{k+1}}} \approx \min_{\Delta X}\left\| r\left(\hat z^{b_k}_{b_{k+1}}, \chi\right) + J^{b_k}_{b_{k+1}}\Delta X \right\|^2_{P^{b_k}_{b_{k+1}}},$$ (48)

where $J^{b_k}_{b_{k+1}}$ is the Jacobian matrix of the error term $r$ with respect to the state vector $\chi$.

Differentiating the above formula with respect to $\Delta X$ and setting the derivative to 0, the formula for the increment $\Delta X$ can be calculated as follows:

$${J^{b_k}_{b_{k+1}}}^{T} {P^{b_k}_{b_{k+1}}}^{-1} J^{b_k}_{b_{k+1}}\,\Delta X = -{J^{b_k}_{b_{k+1}}}^{T} {P^{b_k}_{b_{k+1}}}^{-1} r.$$ (49)
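The normal equations of formula (49) can be sketched as a small dense solve, assuming the Jacobian $J$, covariance $P$, and residual $r$ for one term are given (names are illustrative):

```python
import numpy as np

def gauss_newton_step(J, P, r):
    """Solve J^T P^{-1} J dx = -J^T P^{-1} r for the increment dx."""
    W = np.linalg.inv(P)   # information matrix (inverse covariance)
    A = J.T @ W @ J        # Gauss-Newton approximation of the Hessian
    b = -J.T @ W @ r
    return np.linalg.solve(A, b)
```

For a linear residual $r(x) = Jx - z$, one step lands exactly on the weighted least-squares solution, which is the sanity check used below.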

The overall incremental equation of the objective function is as follows:

$$\left( H_p + {J^{b_k}_{b_{k+1}}}^{T} {P^{b_k}_{b_{k+1}}}^{-1} J^{b_k}_{b_{k+1}} + {J^{c_j}_{l}}^{T} {P^{c_j}_{l}}^{-1} J^{c_j}_{l} \right)\Delta X = b_p - {J^{b_k}_{b_{k+1}}}^{T} {P^{b_k}_{b_{k+1}}}^{-1} r_{\mathcal{B}} - {J^{c_j}_{l}}^{T} {P^{c_j}_{l}}^{-1} r_{\mathcal{C}},$$ (50)

where $P^{b_k}_{b_{k+1}}$ represents the covariance of the IMU preintegration noise term and $P^{c_j}_{l}$ represents the visually observed noise covariance. When the IMU noise covariance $P^{b_k}_{b_{k+1}}$ is larger, its inverse, that is, its information matrix, is smaller, which means the IMU observations are weighted as less reliable than the visual observations. Formula (50) can be simplified to

$$\left(\Lambda_p + \Lambda_{\mathcal{B}} + \Lambda_{\mathcal{C}}\right)\Delta X = b_p + b_{\mathcal{B}} + b_{\mathcal{C}},$$ (51)

where $\Lambda_p$, $\Lambda_{\mathcal{B}}$, and $\Lambda_{\mathcal{C}}$ represent the Hessian blocks of the prior, IMU, and visual terms, respectively. Using the perturbation method to calculate, we can get

$$J = \frac{\partial r}{\partial X} = \lim_{\delta X \to 0} \frac{r\left(X \oplus \delta X\right) - r\left(X\right)}{\delta X},$$ (52)

where $\delta X$ represents a small disturbance of the state vector (as opposed to the increment $\Delta X$), and $\oplus$ denotes applying that disturbance to the state vector.
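The limit in formula (52) suggests a forward-difference numerical Jacobian, which is also a common way to verify analytic Jacobians. A minimal sketch, where `oplus` stands in for the on-manifold addition $\oplus$ (ordinary vector addition by default; any other composition rule is an assumption of the caller):

```python
import numpy as np

def numerical_jacobian(r, x, oplus=lambda x, d: x + d, eps=1e-7):
    """Forward-difference Jacobian of residual r at state x, Eq. (52)."""
    r0 = np.asarray(r(x))
    J = np.zeros((r0.size, x.size))
    for i in range(x.size):
        d = np.zeros(x.size)
        d[i] = eps                       # perturb one component at a time
        J[:, i] = (np.asarray(r(oplus(x, d))) - r0) / eps
    return J
```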

The continuous-time preintegration formula was derived in the IMU preintegration section, and the IMU residuals are as follows:

$$r_{\mathcal{B}}\left(\hat z^{b_k}_{b_{k+1}}, \chi\right) = \begin{bmatrix} \delta\alpha^{b_k}_{b_{k+1}} \\ \delta\beta^{b_k}_{b_{k+1}} \\ \delta\theta^{b_k}_{b_{k+1}} \\ \delta b_a \\ \delta b_g \end{bmatrix} = \begin{bmatrix} R^{b_k}_{w}\left( p^w_{b_{k+1}} - p^w_{b_k} - v^w_{b_k}\Delta t_k + \frac{1}{2}g^w\Delta t_k^2 \right) - \hat\alpha^{b_k}_{b_{k+1}} \\ R^{b_k}_{w}\left( v^w_{b_{k+1}} - v^w_{b_k} + g^w\Delta t_k \right) - \hat\beta^{b_k}_{b_{k+1}} \\ 2\left[ \left(\hat\gamma^{b_k}_{b_{k+1}}\right)^{-1} \otimes \left(q^w_{b_k}\right)^{-1} \otimes q^w_{b_{k+1}} \right]_{xyz} \\ b_{a_{b_{k+1}}} - b_{a_{b_k}} \\ b_{\omega_{b_{k+1}}} - b_{\omega_{b_k}} \end{bmatrix}.$$ (53)

According to the above formula, the optimization variables of the IMU residual are the position, rotation, velocity, and inertial biases at times $k$ and $k+1$:

$$p^w_{b_k}, q^w_{b_k}, v^w_{b_k}, b_{a_k}, b_{\omega_k},\; p^w_{b_{k+1}}, q^w_{b_{k+1}}, v^w_{b_{k+1}}, b_{a_{k+1}}, b_{\omega_{k+1}}.$$ (54)
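The position, velocity, and bias rows of the residual in formula (53) can be sketched directly; this is a minimal illustration that uses rotation matrices ($R^{b_k}_w = {R^w_{b_k}}^T$) and omits the quaternion rotation row for brevity. All names are illustrative, and $\hat\alpha$, $\hat\beta$ are assumed to come from the preintegration step:

```python
import numpy as np

def imu_residual(p_k, v_k, R_wbk, p_k1, v_k1, g_w, dt,
                 alpha_hat, beta_hat, ba_k, ba_k1, bg_k, bg_k1):
    """Position/velocity/bias rows of the IMU residual, Eq. (53)."""
    R_bkw = R_wbk.T  # body-from-world rotation at frame k
    r_p = R_bkw @ (p_k1 - p_k - v_k * dt + 0.5 * g_w * dt ** 2) - alpha_hat
    r_v = R_bkw @ (v_k1 - v_k + g_w * dt) - beta_hat
    r_ba = ba_k1 - ba_k      # accelerometer bias random-walk residual
    r_bg = bg_k1 - bg_k      # gyroscope bias random-walk residual
    return np.concatenate([r_p, r_v, r_ba, r_bg])
```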

To calculate the Jacobian matrix, perturbation is added to each optimization variable to obtain

$$\delta p^w_{b_k}, \delta\theta^w_{b_k}, \delta v^w_{b_k}, \delta b_{a_k}, \delta b_{\omega_k},\; \delta p^w_{b_{k+1}}, \delta\theta^w_{b_{k+1}}, \delta v^w_{b_{k+1}}, \delta b_{a_{k+1}}, \delta b_{\omega_{k+1}}.$$ (55)

Taking the partial derivatives with respect to each of the above disturbance variables, we can get

$$J_0^{15\times 7} = \left[\frac{\partial r_{\mathcal{B}}}{\partial\delta p^w_{b_k}}, \frac{\partial r_{\mathcal{B}}}{\partial\delta\theta^w_{b_k}}\right], \quad J_1^{15\times 9} = \left[\frac{\partial r_{\mathcal{B}}}{\partial\delta v^w_{b_k}}, \frac{\partial r_{\mathcal{B}}}{\partial\delta b_{a_k}}, \frac{\partial r_{\mathcal{B}}}{\partial\delta b_{\omega_k}}\right], \quad J_2^{15\times 7} = \left[\frac{\partial r_{\mathcal{B}}}{\partial\delta p^w_{b_{k+1}}}, \frac{\partial r_{\mathcal{B}}}{\partial\delta\theta^w_{b_{k+1}}}\right], \quad J_3^{15\times 9} = \left[\frac{\partial r_{\mathcal{B}}}{\partial\delta v^w_{b_{k+1}}}, \frac{\partial r_{\mathcal{B}}}{\partial\delta b_{a_{k+1}}}, \frac{\partial r_{\mathcal{B}}}{\partial\delta b_{\omega_{k+1}}}\right].$$ (56)

The visual residual is a reprojection error, which represents the difference between the estimated and observed values of a feature point in the normalized camera coordinate system. The camera used in this paper follows the fisheye model with a large viewing angle, so its projection onto the unit sphere must be considered, as shown in Figure 9.

Figure 9. Unit spherical projection model.

Through the unit spherical projection model illustrated in Figure 9, the visual residual is decomposed into two tangent-plane directions. The final visual residual model is as follows:

$$r_{\mathcal{C}}\left(\hat z^{c_j}_l, \chi\right) = \left[b_1, b_2\right]^T \cdot \left( \hat p^{c_j}_l - \frac{p^{c_j}_l}{\left\|p^{c_j}_l\right\|} \right), \quad \hat p^{c_j}_l = \pi_c^{-1}\begin{bmatrix}\hat u^{c_j}_l \\ \hat v^{c_j}_l\end{bmatrix}, \quad p^{c_j}_l = {R^b_c}^{-1}\left( {R^w_{b_j}}^{-1}\left( R^w_{b_i}\left( R^b_c\,\frac{1}{\lambda_l}\bar P^{c_i}_l + p^b_c \right) + p^w_{b_i} - p^w_{b_j} \right) - p^b_c \right),$$ (57)

where $\hat p^{c_j}_l$ and $p^{c_j}_l$ represent the observed and estimated coordinates of landmark $l$ in the $j$-th frame image under the normalized camera coordinate system, respectively. The optimization variables of the visual residual are as follows:

$$p^w_{b_i}, q^w_{b_i}, p^w_{b_j}, q^w_{b_j}, p^b_c, q^b_c, \lambda_l,$$ (58)

where $\lambda_l$ represents the inverse depth when landmark $l$ is first observed, in the $i$-th image. The inverse depth is used as the optimization variable because it approximately satisfies a Gaussian distribution and reduces the number of parameters in the optimization. According to the above formula, adding a disturbance to each optimization variable yields the following Jacobians:

$$J_0^{3\times 7} = \left[\frac{\partial r_{\mathcal{C}}}{\partial p^w_{b_i}}, \frac{\partial r_{\mathcal{C}}}{\partial q^w_{b_i}}\right], \quad J_1^{3\times 7} = \left[\frac{\partial r_{\mathcal{C}}}{\partial p^w_{b_j}}, \frac{\partial r_{\mathcal{C}}}{\partial q^w_{b_j}}\right], \quad J_2^{3\times 7} = \left[\frac{\partial r_{\mathcal{C}}}{\partial p^b_c}, \frac{\partial r_{\mathcal{C}}}{\partial q^b_c}\right],$$ (59)
$$J_3^{3\times 1} = \frac{\partial r_{\mathcal{C}}}{\partial\lambda_l}.$$ (60)
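The transformation chain in formula (57) — camera $i$ to body $i$ to world to body $j$ to camera $j$, followed by normalization onto the unit sphere — can be sketched as follows. This is a hedged illustration: the function and argument names are invented for this sketch, and the tangent-plane basis $b_1, b_2$ is constructed ad hoc here rather than taken from the paper:

```python
import numpy as np

def visual_residual(bearing_i, lam, obs_j, R_bc, p_bc,
                    R_wbi, p_wbi, R_wbj, p_wbj):
    """Unit-sphere reprojection residual of Eq. (57)."""
    P_bi = R_bc @ (bearing_i / lam) + p_bc     # camera i -> body i
    P_w = R_wbi @ P_bi + p_wbi                 # body i -> world
    P_bj = R_wbj.T @ (P_w - p_wbj)             # world -> body j
    P_cj = R_bc.T @ (P_bj - p_bc)              # body j -> camera j
    est = P_cj / np.linalg.norm(P_cj)          # estimate on the unit sphere
    # ad hoc orthonormal basis b1, b2 of the tangent plane at the observation
    tmp = np.array([1.0, 0.0, 0.0]) if abs(obs_j[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    b1 = np.cross(obs_j, tmp); b1 /= np.linalg.norm(b1)
    b2 = np.cross(obs_j, b1)
    return np.array([b1 @ (est - obs_j), b2 @ (est - obs_j)])
```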

3. E-Sports Training System Based on Intelligent Gesture Recognition

This paper maps the sensors in the data glove to the finger joints in order to demarcate finger-joint movement. It mainly considers the distal phalanx (TDP) and proximal phalanx (TPP) of the thumb, as shown in Figure 10, and the changes in the middle phalanges (MP) and proximal phalanges (PP) of the remaining four fingers.

Figure 10. Demarcation boundaries of hand movements.
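As a hypothetical sketch (not the paper's implementation) of the joint demarcation above, a glove's raw flex-sensor readings can be mapped to bend angles for the ten demarcated joints — the thumb's TDP/TPP and the MP/PP joints of the other four fingers; the ADC range and linear mapping here are assumptions:

```python
# Ten demarcated joints: thumb TDP/TPP plus MP/PP of the other four fingers.
JOINTS = ["TDP", "TPP"] + [f"{f}_{j}"
                           for f in ("index", "middle", "ring", "little")
                           for j in ("MP", "PP")]

def readings_to_angles(raw, raw_min=0.0, raw_max=1023.0, max_angle=90.0):
    """Linearly map each raw sensor reading to a joint bend angle in degrees."""
    scale = max_angle / (raw_max - raw_min)
    return {name: (value - raw_min) * scale for name, value in zip(JOINTS, raw)}
```

Real data gloves typically require per-sensor calibration, so the fixed range above would be replaced by per-user minimum/maximum readings.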

This paper uses the algorithms described in Section 2 to construct the e-sports training system; the overall framework of the system is shown in Figure 11.

Figure 11. Overall system framework.

The proposed system is simulated on the MATLAB platform, and both its gesture recognition performance and its application effect in the e-sports training system are evaluated; the results are shown in Tables 1 and 2.

Table 1.

Gesture recognition effect of the system.

Number Gesture recognition Number Gesture recognition Number Gesture recognition
1 86.095 13 87.363 25 86.633
2 88.782 14 89.124 26 90.414
3 90.695 15 90.055 27 86.697
4 86.325 16 89.968 28 86.916
5 86.705 17 88.482 29 89.003
6 89.557 18 91.278 30 88.446
7 86.623 19 89.215 31 89.911
8 91.281 20 86.272 32 90.202
9 88.050 21 91.253 33 89.507
10 87.621 22 86.770 34 88.130
11 88.498 23 86.843 35 91.326
12 86.304 24 90.614 36 86.846

Table 2.

The application effect of the method proposed in this paper in the e-sports training system.

Number Training effect Number Training effect Number Training effect
1 82.685 13 82.234 25 79.365
2 80.315 14 78.537 26 81.356
3 79.700 15 81.913 27 78.790
4 78.179 16 82.938 28 79.700
5 82.942 17 78.167 29 78.194
6 80.821 18 78.250 30 78.253
7 81.176 19 81.512 31 80.133
8 78.183 20 80.076 32 82.668
9 82.636 21 81.925 33 83.747
10 81.613 22 82.857 34 82.354
11 80.514 23 82.951 35 78.416
12 80.913 24 79.066 36 83.128

It can be seen from Tables 1 and 2 that the gesture recognition rate remains stable between roughly 86% and 91% across all 36 test samples, and the training-effect score between roughly 78 and 84, indicating that the e-sports training system based on intelligent gesture recognition proposed in this paper is effective.

4. Conclusion

As an emerging sport, e-sports is played mainly by the younger generation, and its participant base keeps getting younger. E-sports can exercise thinking ability, resistance to psychological pressure, teamwork, hand-eye coordination, and so on. Moreover, participating in e-sports can instill in the younger generation an awareness of abiding by the rules and cultivate a fair, open, and tenacious competitive spirit, with a positive impact on participants' lives. This paper combines intelligent gesture recognition technology to construct an e-sports training system and judges the training effect of players through the recognition of their gestures. The research shows that the e-sports training system based on intelligent gesture recognition proposed in this paper is effective.

Acknowledgments

This study was sponsored by Shandong Sport University.

Data Availability

The labeled data set used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

