Abstract
Pedestrian attribute recognition (PAR) and re-identification (ReID) are important tasks in computer vision. They are widely used in intelligent surveillance and are of great significance to the creation of smart life. This article reviews deep-learning-based ReID and analyzes the associations between PAR and ReID. Firstly, we summarize the major ideas of Attribute-Assisted ReID and compare the differences in datasets and algorithmic concerns between the two areas. Secondly, we introduce a wide range of representative ReID methods. By analyzing cutting-edge research, we summarize the specific network structures, loss function designs, and effective training tricks. Reference methods and solutions are provided for the main challenges of ReID, such as cloth-changing, domain adaptation, occlusion, and resolution changes. Finally, we summarize the performance and characteristics of the SOTA methods, draw inspiration and prospects for future research directions, and demonstrate the effectiveness of Attribute-Assisted ReID.
Keywords: Computer vision, Pedestrian attribute recognition, Person re-identification
1. Introduction
Pedestrian re-identification can be regarded as an image retrieval problem that uses computer vision to judge whether a target pedestrian is present in a specified image or in consecutive frames. Given a monitored pedestrian image, the purpose is to retrieve images of the same pedestrian across devices. Pedestrian re-identification technology can effectively use multiple camera views and often complements pedestrian detection and pedestrian tracking, which is of great importance in the fields of intelligent surveillance and security.
General pedestrian re-identification can be divided into four steps, as shown in Fig. 1: pedestrian detection, feature extraction, similarity measurement, and image retrieval. The input of pedestrian re-identification is a specific pedestrian image called the probe. Feature extraction learns the diverse features of pedestrians across different cameras. Metric learning maps the features to a new space so that people with similar features are clustered together. Image retrieval sorts and re-ranks images according to the similarity distance between features.
Figure 1.

System composition of pedestrian recognition and similarity ranking method.
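The similarity measurement and retrieval steps of this pipeline can be illustrated with a minimal sketch. It assumes that features have already been extracted as plain vectors and that cosine distance is the similarity measure; the function names and toy feature values are hypothetical and are not taken from any surveyed method.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_gallery(probe_feature, gallery_features):
    """Return gallery indices sorted from most to least similar to the probe."""
    distances = [cosine_distance(probe_feature, g) for g in gallery_features]
    return sorted(range(len(gallery_features)), key=lambda i: distances[i])


# Toy vectors standing in for the output of a feature extractor (assumed values).
probe = [1.0, 0.0, 1.0]
gallery = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.5]]
ranking = rank_gallery(probe, gallery)
print(ranking)  # gallery index 1 is the closest match to the probe
```

In a real system the vectors would come from a trained network, and the learned metric may differ from cosine distance.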
Traditional pedestrian ReID algorithms are mainly based on hand-crafted features combined with distance measurement. Research on feature extraction mainly focuses on obtaining robust and discriminative low-level visual features, including color features or histograms in the RGB and HSV color spaces [1], [2], texture features (Gabor) [3], shape features (HOG) [4], and key points (SIFT) [5]. Metric learning finds feature mappings in different spaces so that feature vectors of samples from the same class are closer than feature vectors of samples from different classes. Commonly used metric learning methods include Mahalanobis distance [1], local adaptive decision function [6], and saliency weighted metric learning [2]. However, pedestrian re-identification involves complex scene changes (background, posture, etc.) and other challenges.
Under these conditions, the traditional low-level visual feature extraction methods above struggle to obtain effective and invariant features, which directly affects the accuracy and effectiveness of metric learning methods. In this paper, we mainly explore pedestrian ReID based on deep learning, which uses more advanced algorithms to extract discriminative features. The preprint [7] is available at SSRN.
2. Association between PAR and ReID
2.1. Attribute-Assisted re-identification
From the perspective of system composition, ReID requires feature extraction followed by similarity measurement, and attribute recognition also needs to extract pedestrian features. An attribute recognition network usually uses a CNN to extract the features, and the number of model outputs generally equals the number of attributes to be recognized. The output of pedestrian ReID, in contrast, is generally not directly equal to the extracted features. Most methods use attributes to assist re-identification [8], [9], [10], [11], [12], as shown in Fig. 2. Yin et al. [13] first presented an extensive investigation of the attribute-image ReID problem under an adversarial framework.
Figure 2.

Attributes are consistent with the human recognition mechanism for identifying people [11].
One application of ReID is finding a suspect of interest in a criminal investigation. Witnesses often indicate personal traits of the suspect as seen at the moment when the crime was committed, including hair color, clothing type, etc. Based on that description, the police manually scan the entire video archive looking for a person with similar characteristics. Since people are unlikely to change clothes or shave while moving between two places in a short period of time, attributes can be used for tracking and communication between two related cameras to achieve re-identification.
Human attributes usually remain robust visual properties when conditions change and have been investigated in many works. This part analyzes several methods that leverage features from human attributes for person ReID. The algorithms are summarized in Fig. 4.
Figure 4.

Attribute Assisted Re-Identification Algorithm.
2.1.1. Attribute attention mechanism
Attention models have been used in fine-grained recognition to crop the key parts. In the Comparative Attention Network (CAN) [14], an attention mechanism was used in person re-identification by integrating recurrent attention with the Siamese model. The Attribute-Aware Attention Model [8] explored the attention mechanism to select better attribute features. To generate attention for global category feature fusion and local attribute feature fusion simultaneously, it consists of two components: an attribute-guided attention module and a category-guided attention module. The final feature is the sum of the weighted category feature and attribute feature.
2.1.2. Attribute feature fusion
Layne et al. [9] proposed Attribute Interpreted ReID (AIR), which assigned weights to semantic attributes to serve as a representation of pedestrian features. The model learned an attribute-centric, parts-based feature representation. By selecting the attributes that are most effective for re-identification, it fused the attribute-level information with standard low-level features. The Attributes-aided Part Detection and Refinement model (APDR) [11] designed an effective attribute fusion network and extracted local attribute features to handle the body part misalignment problem, which is another major challenge for ReID. Experiments show that the learned local features, along with the holistic image-level feature, can further improve the accuracy on the ReID task. As shown in Fig. 3, Fariborz et al. [15] used a tensor to non-linearly fuse identity and attribute features, and then constrained the parameters of the tensor in the loss function to generate discriminative fused features for ReID. Lin et al. [16] proposed the ARM module to calculate the weighting of M individual attributes, which effectively complements the global features for ID loss.
Figure 3.

Attribute based Tensor fusion diagram for Person ReID [15].
2.1.3. Attribute search / zero-shot ReID
Attributes provide a low-dimensional mid-level representation that helps to cope with variability in appearance. Although individual attributes vary in robustness and informativeness, they provide a strong cue for identity. Semantic attributes could potentially be used to constrain or permute a search for a particular person, for example by specifying invariance to whether or not they have removed or added a hat. Given an attribute description in the form of a binary vector, a person can be found by nearest-neighbor (NN) matching against the attribute profile of each person i in the dataset [10]. The Parts-and-Attributes Search Framework (PASF) [17] distinguished people by parsing attributes of human body parts. The visual attributes include facial hair type (beards, mustaches, absence of facial hair), type of eyewear (sunglasses, eyeglasses, absence of glasses), hair type (baldness, hair, wearing a hat), and clothing color. Each face detection result was matched to its neighboring regions based on correlation, and each person who leaves the scene is assigned a unique identifier (“track ID”). Lampert et al. [18] collected judgments on the “relative strength of association” between 85 attributes and 48 animal classes and showed that description by high-level attributes makes sense for person ReID on untrained images. Optimized Attribute Re-identification (OAR) [19] showed that mid-level attributes can be an effective representation for ReID and zero-shot identification.
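The nearest-neighbor matching of a binary attribute description against stored attribute profiles can be sketched as follows. This is only an illustration: the use of Hamming distance, the attribute order, and the person IDs are assumptions made for the example and are not specified in [10].

```python
def hamming_distance(a, b):
    """Count the positions where two binary attribute vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def search_by_attributes(query_attributes, attribute_profiles):
    """Return person IDs sorted by attribute distance to the query description."""
    return sorted(
        attribute_profiles,
        key=lambda pid: hamming_distance(query_attributes, attribute_profiles[pid]),
    )


# Hypothetical attribute order: (hat, backpack, long hair, dark top).
profiles = {
    "p1": [1, 0, 0, 1],
    "p2": [0, 1, 1, 0],
    "p3": [1, 1, 0, 1],
}
query = [1, 1, 0, 1]  # description given by a witness (assumed example)
result = search_by_attributes(query, profiles)
print(result)  # "p3" matches the description exactly and is ranked first
```

Because no image of the target is needed, the same routine illustrates the zero-shot setting in which only an attribute description is available.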
2.1.4. Attribute fine-tuning
Su et al. [12] proposed the Semi-supervised Deep Attribute Learning (SSDAL) algorithm, which was further extended in [20]. In the first stage, a deep convolutional neural network is trained on an independent dataset in which each sample is labeled with binary attribute labels. In the second stage, the network is fine-tuned with a triplet loss on a dataset that has person ID labels; N and M denote the sample sizes of the two datasets. The third stage fine-tunes the model with a sigmoid cross-entropy loss on the combined dataset T&U. Experiments showed that fine-tuning the model trained on a PAR dataset gives it more discriminative power for the ReID task. The attribute-complementary re-id net (ACRN) [21] included automatically predicted attribute features in the training stage of the model. This allowed the CNN to focus on learning information for ReID that is complementary to the information contained in the attribute predictions. Tetsu et al. [22] improved the CNN features by fine-tuning on a pedestrian attribute dataset and proposed a loss function for classifying attribute combinations to increase discriminative ability.
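The triplet loss used in the ID-supervised fine-tuning stage can be sketched in its common margin-based form. This is a generic formulation rather than the exact loss of [12]; the Euclidean distance and the margin value of 0.3 are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Margin-based triplet loss: pull the positive closer than the negative.

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive (the margin value is an assumed hyperparameter).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
same_person = [0.0, 1.0]        # feature of another image of the same ID
far_other_person = [3.0, 4.0]   # easy negative: already well separated
near_other_person = [0.0, 1.2]  # hard negative: almost as close as the positive

print(triplet_loss(anchor, same_person, far_other_person))   # no penalty
print(triplet_loss(anchor, same_person, near_other_person))  # positive penalty
```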
2.1.5. Attribute embedding / space mapping
Attribute-Consistent Matching (ACM) [23] projected the input image into a lower-dimensional subspace where the matches are consistent under changing illumination. Su et al. [24] studied tracklet-to-tracklet identification and proposed a novel discriminative model that exploits low-level features and maps attributes to a discriminative latent prototype space (LPSM). The MTL-LORAE framework [25] used semantic attributes in addition to low-level features and transformed the attributes in a low-rank space. The Adaptive Semantic Margin Regularizer (ASMR) [26] conducted person search by finding images that are closest to the query in a co-embedding space of person images and categories. Wang et al. [27] introduced TJ-AIDL and the IIA space to achieve knowledge fusion learning on attribute and ID labels.
2.1.6. Clothing attributes assisted
Li et al. [28] studied clothing-attribute-assisted person re-identification and presented a latent support vector machine (c-LSVM)-based person ReID approach that integrates low-level, mid-level, and high-level attributes. Specifically, it learns the relationship between body parts, clothing information, and ID labels. Yang et al. [29] presented a discriminative latent model (DLM) for joint modeling of object classes and their visual attributes. Wu et al. [30] demonstrated the usefulness of clothing color in surveillance scenarios and evaluated it at the pixel level and the frame level. The Dual Attribute-aware Ranking Network (DARN) [31] could retrieve clothing items that have the same or similar attributes as a given clothing image.
2.2. Differences between attribute recognition and ReID
2.2.1. Label annotation
Person ReID and attribute recognition usually use different datasets to train the network, and these datasets carry different label annotations. Datasets with pedestrian attribute tags, such as Market-1501 [32], can be used for both tasks. Note that the specific usage of the same dataset may differ. Specifically, PAR focuses on the correspondence between images and tags, while pedestrian ReID focuses on personal identity information and, in particular, divides the training and test sets based on ID.
2.2.2. Image feature
Many ReID algorithms combine local and global features, such as PIE [33], PDC [34], and GLAD [35], or merge local features, such as Spindle Net [36]. These local features may be extracted through local blocks or by methods such as pose key points, and they are not necessarily attribute features. The APR [16] network, which combines attribute loss and ID loss, is a typical example of Attribute-Assisted ReID.
2.2.3. Main focus of the model
Pedestrian re-identification aims to find images of the target pedestrian in the gallery data, so the model focuses on the similarity ranking of the recognition results and the Average Precision of each query. The commonly used evaluation metrics mainly include CMC and mAP.
Attribute recognition aims to identify the attributes of pedestrian images as accurately as possible, so the model is evaluated on the difference between the true labels and the predicted values. Accuracy, precision, recall, and F1 score are commonly used evaluation indicators.
In summary, Table 1 and Table 2 show the connections and differences between the two research fields of attribute recognition and pedestrian re-identification.
Table 1.
Comparisons of datasets and evaluations between PAR and ReID.
Table 2.
Comparisons of labels and focus between PAR and ReID.
| Comparison | ReID | PAR |
|---|---|---|
| labels | pedestrian ID | pedestrian attributes |
| Focus | arrangement of the recognition results | difference between true labels and predicted value |
3. Datasets and evaluation
3.1. ReID datasets
Pedestrian re-identification datasets can be divided into image-based and video-based datasets according to whether each sample is a single frame or a sequence, as shown in Table 5 and Table 6, respectively. The former category includes Market-1501 [32], CUHK03 [40], DukeMTMC-reID [38], MSMT-17 [41], VIPeR [42], MARS [43], etc. Typical examples of the latter are iLIDS [44], Duke-video [45], LPW [46], etc. Besides, there are other scene-based datasets, such as PRW [47]. Table 3 and Table 4 provide examples and statistical summaries of these datasets.
Table 5.
Summary of Image-based ReID Datasets.
| Dataset | Time | IDs | Cams | Images | Label method | Size |
|---|---|---|---|---|---|---|
| VIPeR [42] | 2007 | 632 | 2 | 1264 | Hand | 128 × 48 |
| GRID [48] | 2009 | 1025 | 8 | 1275 | Hand | Vary |
| CUHK01 [49] | 2012 | 971 | 2 | 3884 | Hand | 160 × 60 |
| CUHK02 [50] | 2013 | 1816 | 2 | 7264 | Hand | 160 × 60 |
| CUHK03 [40] | 2014 | 1360 | 2 | 13164 | DPM [51] | Vary |
| RAiD [52] | 2014 | 43 | 4 | 6920 | Hand | 128 × 64 |
| Market1501 [32] | 2015 | 1501 | 6 | 32668 | DPM | 128 × 64 |
| DukeMTMC [38] | 2017 | 1812 | 8 | 36441 | Hand | Vary |
| PKU-Reid [53] | 2016 | 114 | 2 | 1824 | Hand | 128 × 64 |
| MARS [43] | 2016 | 1261 | 6 | 20715 | DPM [51]+GMMCP [54] | 256 × 128 |
| Airport [55] | 2017 | 9651 | 6 | 39902 | ACF [56] | 128 × 64 |
| MSMT17 [41] | 2018 | 4101 | 15 | 126441 | Faster RCNN | Vary |
| PRW [47] | 2016 | 932 | 6 | 34304 | Hand | Vary |
Table 6.
Summary of Video-based ReID Datasets.
Table 3.
Sample of Pedestrian Image Attribute Annotation of DukeMTMC-Attribute and Market-1501-Attribute.
| Attribute (DukeMTMC-Attribute) | Label | Attribute (Market-1501-Attribute) | Label |
|---|---|---|---|
| top | 1 | hair | 2 |
| backpack | 2 | up | 2 |
| bag | 1 | down | 2 |
| handbag | 1 | clothes | 1 |
| boots | 2 | hat | 1 |
| shoes | 1 | backpack | 1 |
| upblack | 2 | bag | 1 |
| downblue | 2 | handbag | 2 |
| age | 2 | | |
| downred | 2 | | |
Table 4.
Summary of PAR Datasets.
| Dataset | Pedestrians | Attribute | Source |
|---|---|---|---|
| PETA [37] | 19000 | 61 binary and 4 multi-class | outdoor + indoor |
| RAP | 41585 | 69 binary and 3 multi-class | indoor |
| RAP2.0 | 84928 | 69 binary and 3 multi-class | indoor |
| PA-100K [39] | 100000 | 26 binary | outdoor |
| Market1501-Attribute [16] | 32668 | 26 binary and 1 multi-class | outdoor |
| DukeMTMC-Attribute [16] | 34183 | 23 binary | outdoor |
As for the division of datasets, in pedestrian re-identification research the datasets are usually obtained through manual annotation or a detection algorithm. The pedestrian images are divided into training and validation sets, as well as query and gallery sets. After training, the model extracts features for the query and gallery images separately and calculates the similarity between them. Generally, pedestrian identities used in training are not reused in testing.
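An identity-disjoint split of this kind can be sketched as follows. The sample format, the 50% split ratio, and the fixed random seed are assumptions chosen for illustration; published benchmarks ship with their own predefined splits.

```python
import random


def split_by_identity(samples, train_fraction=0.5, seed=0):
    """Split (image_name, person_id) pairs so that no ID appears in both sets."""
    ids = sorted({pid for _, pid in samples})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)
    train_ids = set(ids[:n_train])
    train = [s for s in samples if s[1] in train_ids]
    test = [s for s in samples if s[1] not in train_ids]
    return train, test


# Hypothetical file names and person IDs.
samples = [("a.jpg", 1), ("b.jpg", 1), ("c.jpg", 2), ("d.jpg", 3), ("e.jpg", 4)]
train, test = split_by_identity(samples)
train_ids = {pid for _, pid in train}
test_ids = {pid for _, pid in test}
print(train_ids.isdisjoint(test_ids))  # True: identities do not overlap
```

The test portion would then be divided further into query and gallery images of the same identities.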
3.2. Attribute datasets
Compared with re-identification datasets, attribute recognition datasets focus more on marking the appearance characteristics of pedestrians, such as gender, age, clothing, and accessories. Table 4 summarizes several commonly used datasets for attribute recognition. Fig. 5 shows the annotation information of several PAR datasets. The datasets are generally labeled with Arabic numerals: PETA [37] and PA-100K [39] use 1 and 0, with 1 indicating that an attribute is present, while Market-1501 Attribute [16] and DukeMTMC Attribute [16] use 1, 2, 3, 4 to represent the classes of each attribute. Table 3 gives an example of annotation.
Figure 5.
Label annotation of several PAR datasets. (a) PA-100K dataset. (b) PETA dataset. (c) Market-1501 Attribute. (d) DukeMTMC-Attribute.
3.3. Metrics to measure ReID experiment results
To evaluate an attribute recognition system, Precision (P) and Recall (R) are two widely used measurements. Moreover, mAP and CMC are two indicators commonly used to measure re-identification and retrieval capabilities.
The attribute recognition task mainly focuses on the differences between the true labels and the predicted values. Both Precision (P) and Recall (R) reflect how many items are correctly retrieved, but they are normalized differently. Specifically, Precision is the proportion of the extracted information that is correct, while Recall is the ratio of the extracted correct information to all correct information in the samples.
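For a single binary attribute, these measurements can be computed as in the following sketch. The label encoding (1 for present, 0 for absent) and the example labels are assumptions; the F1 score is included because it combines Precision and Recall.

```python
def precision_recall_f1(true_labels, predicted_labels):
    """Compute Precision, Recall and F1 for one binary attribute (1 = present)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly predicted present
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted present, actually absent
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed present attributes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ground truth and predictions for an attribute such as "backpack".
truth = [1, 1, 0, 0, 1]
prediction = [1, 0, 1, 0, 1]
p, r, f1 = precision_recall_f1(truth, prediction)
print(p, r, f1)  # two true positives, one false positive, one false negative
```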
3.3.1. mAP (mean Average Precision)
Because more than one pedestrian image is retrieved for each query in the re-identification task, mAP is used as the final evaluation index; it reflects how well the correct search results are arranged at the top of the ranked list. AP (Average Precision) is defined as the area under the Precision-Recall (PR) curve, whose horizontal coordinate is R and whose vertical coordinate is P, and mAP is the mean of AP over all queries. The larger the area under the PR curve, the better the model performance. Precision and Recall are calculated as eq. (1) and eq. (2):
$$P = \frac{TP}{TP + FP} \tag{1}$$
$$R = \frac{TP}{TP + FN} \tag{2}$$
Among them, TP (True Positives) is the number of detection frames with IoU > 0.5 (Intersection over Union, IoU); FP (False Positives) is the number of detection frames with IoU ≤ 0.5, which also covers redundant detection frames that detect the same real label; FN (False Negatives) is the number of real labels that are not detected.
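For ranked retrieval lists, AP is commonly approximated by averaging the precision measured at the rank of each correct match, and mAP is the mean over queries. The sketch below uses this discrete approximation of the area under the PR curve; the boolean list representation of a ranking is an assumption made for the example.

```python
def average_precision(ranked_matches):
    """AP of one query; ranked_matches[i] is True if the i-th ranked gallery
    image has the same identity as the query."""
    hits = 0
    precisions = []
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)  # precision at this correct match
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(all_ranked_matches):
    """mAP: mean of the per-query AP values."""
    return sum(average_precision(r) for r in all_ranked_matches) / len(all_ranked_matches)


# Two hypothetical queries with their ranked match flags.
query_a = [True, False, True, False]  # correct matches at ranks 1 and 3
query_b = [False, True]               # single correct match at rank 2
ap_a = average_precision(query_a)
ap_b = average_precision(query_b)
map_score = mean_average_precision([query_a, query_b])
print(ap_a, ap_b, map_score)
```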
3.3.2. CMC (Cumulative Matching Characteristics)
Another metric is CMC (Rank-k matching accuracy). Rank-n indicates that, with the recognition results sorted in descending order of similarity, the first n results contain the target.
The Rank-n recognition rate is the ratio of the number of queries that hit within Rank-n to the total number of query samples. It is analogous to the commonly used top-1 or top-5 error, except that the Rank-1 recognition rate represents the correct rate rather than the error rate. The relationship between the two is given by eq. (3).
$$\text{Rank-1} = 1 - \text{top-1 err} \tag{3}$$
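The Rank-n recognition rate can be computed from ranked match flags as in this minimal sketch. The list-of-booleans representation and the three example queries are assumptions made for illustration.

```python
def cmc_rank_k(all_ranked_matches, k):
    """Fraction of queries whose first k ranked results contain a correct match."""
    hits = sum(1 for ranked in all_ranked_matches if any(ranked[:k]))
    return hits / len(all_ranked_matches)


# Hypothetical results: each inner list is one query, sorted by descending similarity.
results = [
    [True, False, False],   # target found at rank 1
    [False, False, True],   # target found at rank 3
    [False, True, False],   # target found at rank 2
]
rank1 = cmc_rank_k(results, 1)
rank2 = cmc_rank_k(results, 2)
rank3 = cmc_rank_k(results, 3)
print(rank1, rank2, rank3)  # the rate is non-decreasing in k
```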
3.3.3. mINP (mean inverse negative penalty)
In pedestrian retrieval tasks, the ranking of the correct targets in the search list should be considered, especially for multi-view data. The ranking of hard samples further affects the model evaluation. It may happen that, under the same CMC, Rank List 1 obtains a better AP than Rank List 2 but requires much more effort to find all the right matches. To solve this problem, Ye [61] designed the negative penalty (NP), which considers the hardest correct match.
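$$NP_i = \frac{R_i^{hard} - |G_i|}{R_i^{hard}} \tag{4}$$
where $R_i^{hard}$ in eq. (4) represents the rank position of the hardest sample for query $i$, and $|G_i|$ indicates the number of correct matches. The inverse of NP is defined as INP, and mINP is calculated by eq. (5), where $n$ is the number of queries.
$$mINP = \frac{1}{n}\sum_{i}\left(1 - NP_i\right) = \frac{1}{n}\sum_{i}\frac{|G_i|}{R_i^{hard}} \tag{5}$$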
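As a sketch of how this metric behaves, the INP of a query can be computed as the number of correct matches divided by the rank position of the hardest (last-ranked) correct match, and mINP as the mean over queries. The boolean ranking representation and the example lists are assumptions made for illustration.

```python
def inverse_negative_penalty(ranked_matches):
    """INP of one query: number of correct matches divided by the rank of the
    hardest (lowest-ranked) correct match."""
    num_matches = sum(1 for m in ranked_matches if m)
    if num_matches == 0:
        return 0.0
    hardest_rank = max(rank for rank, m in enumerate(ranked_matches, start=1) if m)
    return num_matches / hardest_rank


def mean_inp(all_ranked_matches):
    """mINP: mean of the per-query INP values."""
    return sum(inverse_negative_penalty(r) for r in all_ranked_matches) / len(all_ranked_matches)


# Both hypothetical lists hit at rank 1, so their Rank-1 scores are identical,
# but the first list needs three results to be inspected to find all matches.
list_1 = [True, False, True, False]
list_2 = [True, True, False, False]
inp_1 = inverse_negative_penalty(list_1)
inp_2 = inverse_negative_penalty(list_2)
minp = mean_inp([list_1, list_2])
print(inp_1, inp_2, minp)
```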
3.3.4. RS (Robustness Score)
To further evaluate the robustness of retrieval over a long time span, Huang [62] defined a Robustness Score (RS), given by eq. (6):
| (6) |
In eq. (6), the two accuracy terms are the long-term and short-term re-ID accuracy, each measured with a given evaluation index. For a given ReID model, the closer the long-term and short-term accuracies are to each other, the greater the value of the RS indicator.
4. Pedestrian reidentification based on deep learning
Deep learning-based pedestrian ReID includes feature learning and distance metric learning. In the feature learning approach, the model treats ReID as a classification or verification task rather than directly determining the similarity between images.
Feature representation learning mainly involves classification loss and verification loss. Specifically, the pedestrian ID is often used as the basis for classification loss. The ID Embedding network (IDE) [47], [63] is a widely used baseline model. Because it uses the ID of a pedestrian as the training label, it only needs one image as input at a time. In contrast, the input of verification loss is a pair of pedestrian images, and the network learns whether they show the same person, which is equivalent to a binary classification problem. The verification model aims to learn the similarity of two images and generally uses end-to-end models such as the Siamese network [64]. Fig. 6 summarizes various ReID algorithms and potential future directions. In this part, we introduce cutting-edge ReID methods from four perspectives: global and local feature-based, video sequence-based, GAN-based, and attention mechanism-based methods.
Figure 6.
Summary of deep learning-based ReID algorithm and its challenges and future research directions.
4.1. Global and local features-based method
4.1.1. Global features
A global feature is extracted from the overall information of each image and does not carry any spatial information. A local feature is extracted from a certain region of the pedestrian image, and multiple local features are finally merged into the final feature.
In the early years of applying deep learning to PAR and pedestrian ReID, global features were the well-known choice. Global features can usually be extracted end-to-end, and their advantage lies in simple calculation. Their limitations are that noisy regions may greatly interfere with global features and that pose misalignment also leads to poor global feature matching.
4.1.2. Horizontal slices
Deep-learning-based local feature extraction methods include image blocks [65], [66], [67] and pose-based local features [36], [68]. There are many ways to partition an image into blocks, such as uniform blocks [65], [67] and context-sensing blocks [69]; uniform blocks can be further divided into grid blocks [65] and horizontal slices [66]. This part introduces several horizontal slice methods.
In Gate Siamese [70], each image is characterized by a CNN, and the local features are fed to an LSTM network in order, which automatically produces the final image feature. Deep-Person [71] regarded the pedestrian as a sequence of body parts and applied LSTM to take into account the contextual information between body parts. AlignedReID [72] introduced Dynamically Matching Local Information (DMLI), which uses the shortest path to find the most effective dynamic alignment (Dynamic Time Warping). PCB [66] horizontally cut the input image into 6 pieces, extracted a part feature for each stripe, and finally concatenated the local features. The Integration convolutional neural network (ICNN) [73] jointly learned global and local features. Fan et al. [74] proposed SCPNet, in which discriminative features are learned using the spatial-channel relationship and each channel is assigned to focus on a specific spatial region of the body.
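The horizontal slicing idea can be illustrated with a simplified sketch that splits a feature map into six stripes and average-pools each one. It assumes a single-channel map stored as a list of rows whose height is divisible by the number of stripes; it shows only the partitioning step and is not the PCB network itself.

```python
def horizontal_stripe_features(feature_map, num_stripes=6):
    """Split a 2-D feature map (list of rows) into horizontal stripes and
    average-pool each stripe into one value."""
    height = len(feature_map)
    if height % num_stripes != 0:
        raise ValueError("feature map height must be divisible by num_stripes")
    stripe_height = height // num_stripes
    features = []
    for s in range(num_stripes):
        rows = feature_map[s * stripe_height:(s + 1) * stripe_height]
        values = [v for row in rows for v in row]
        features.append(sum(values) / len(values))
    return features  # concatenation of the per-stripe (local) features


# Hypothetical 12 x 2 single-channel feature map: row r is filled with the value r.
feature_map = [[float(r)] * 2 for r in range(12)]
stripes = horizontal_stripe_features(feature_map)
print(stripes)  # one pooled value per horizontal stripe, from head to feet
```

With a multi-channel feature map, each stripe would yield a vector instead of a scalar, and the stripe vectors would be concatenated in the same way.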
4.1.3. Posture information
The motivation of pose-based local features is to make full use of pedestrian key points [36], [68]. First, a trained pose estimation model is transferred to the pedestrian dataset to locate the skeletal key points of the human body; local features are then extracted based on these key points. Fig. 7 shows an effective approach under pose changes, i.e., fusing global and local features based on extracted key points of the human body.
Figure 7.
Fusion method of local features and global features based on the extraction of key points of human posture.
Spindle Net [36] was the first to use human body structure information to facilitate feature learning, employing feature decomposition and feature fusion based on body region information. Zheng et al. [33] introduced PIE to describe pedestrians; its Pose-Box structure, generated through pose estimation followed by affine transformations, was used to align pedestrians. Pose-driven Deep Convolutional (PDC) [34] proposed an end-to-end architecture that uses human part cues to mitigate pose changes. Wei et al. [35] explicitly leveraged the local and global information of the body to generate a more robust representation. PABP [75] learned a part-aligned representation: the body feature maps are connected to a bilinear ensemble layer and then fused into a single image descriptor.
4.1.4. Semantic segmentation
Semantic segmentation of images is also applied to re-identification; representative methods are as follows. Generating realistic images of persons is challenging because of the complex interplay between different image factors, such as the foreground, background, and pose information. Ma et al. [88] aimed at generating such images with a novel two-stage reconstruction pipeline, which offers a new idea for image foreground extraction. SPReID [89] used additional semantic parsing branches to generate probability maps associated with different regions. Other semantic segmentation methods that can be referred to include DeepLabV2 [90] and Mask RCNN [91].
4.1.5. Grid characteristics
IDLA [92] simultaneously learned features and a similarity metric for person ReID. The input to the network is a pair of images, and higher layers compute relationships by calculating the difference of feature values over a grid of a certain size. PersonNet [93] employed a layer that computes neighborhood range differences across two input images to capture the local relationship between patches. To handle person images of different sizes, DSR [94] used FCN to make the pixel-level features consistent.
In general, the grid feature is a relatively fine-grained feature of a physical area. Some early works extended the grid feature to a part feature to calculate the feature map difference of two images. The typical studies above used the grid feature to solve the ReID task, but because of the large amount of calculation and the lack of obvious performance gains, the grid feature is not very commonly used.
4.2. Video sequence-based method
Video re-identification (Video ReID), also called sequence re-identification, refers to the use of a continuous sequence of pedestrian images for the re-identification task. Generally, Video ReID is characterized by rich pedestrian pose changes, frequent occlusion, difficulty in fusing frame information, and inconsistent quality between frames.
Earlier video ReID methods extract per-frame features and then directly obtain the final feature through average pooling or max pooling. This is relatively simple, and the performance depends on the single-frame features. The difficulty of sequence re-identification lies in how to judge the quality of each frame and fuse the features of multiple frames, how to generate motion features for the sequence, and how to improve computational efficiency.
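The simple pooling baseline described above can be sketched as follows. Per-frame features are assumed to be equal-length vectors that have already been extracted; the function name and the toy values are hypothetical.

```python
def temporal_pool(frame_features, mode="avg"):
    """Fuse per-frame feature vectors into one sequence-level feature by
    element-wise average or max pooling over time."""
    columns = list(zip(*frame_features))  # group the values of each dimension
    if mode == "avg":
        return [sum(c) / len(c) for c in columns]
    if mode == "max":
        return [max(c) for c in columns]
    raise ValueError("mode must be 'avg' or 'max'")


# Three hypothetical frames of one tracklet, each with a 2-D feature.
frames = [[1.0, 4.0], [3.0, 2.0], [2.0, 0.0]]
avg_feature = temporal_pool(frames, "avg")
max_feature = temporal_pool(frames, "max")
print(avg_feature, max_feature)
```

Both variants weight every frame equally, which is why a single low-quality or occluded frame can degrade the fused feature.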
Accumulative Motion Context (AMOC) [95] contained a spatial motion sub-network and employed an RNN to fuse the feature information of all frames. Focusing on how to integrate features, Deep Feature Guided Pooling (DFGP) [96] used a PCA-based convolutional network (PCN) to extract the features of each frame, offering a new way to take full advantage of the spatio-temporal information. The Region-based Quality Estimation Network (RQEN) [97] mainly focused on the case of occlusion. Besides, Saha et al. [98] used a residual network (ResNet) along with LSTM for feature extraction. Zhang et al. [99] introduced the concept of “mean-body” and defined an intra-video loss. Yu et al. [100] exploited unlabeled tracklets. Li et al. [101] proposed a new spatio-temporal attention model that automatically discovers a diverse set of distinctive body parts. TCA-Net [102] considered the temporal alignment problem.
Research on video ReID across time and space has received increasing attention in recent studies. Table 7 compares the reported performance of several cross-temporal video re-identification algorithms published at top conferences.
Table 7.
Performance comparison of several cross-temporal video re-recognition algorithms on MARS dataset.
| Method | Source | Rank1 | Rank5 | Rank20 | mAP |
|---|---|---|---|---|---|
| CTL [76] | CVPR 21 | 91.4 | 96.8 | - | 86.7 |
| BiCnet-TKS [77] | CVPR 21 | 90.2 | - | - | 86.0 |
| GRL [78] | CVPR 21 | 91.0 | 96.7 | 98.4 | 84.8 |
| ST-GCN [79] | CVPR 20 | 89.95 | 96.41 | 98.28 | 83.70 |
| MGH [80] | CVPR 20 | 90.0 | 96.7 | 98.5 | 85.8 |
| MG-RAFA [81] | CVPR 20 | 88.8 | 97.0 | 98.5 | 85.9 |
| AP3D [82] | ECCV 20 | 90.7 | - | - | 85.6 |
| AFA [83] | ECCV 20 | 90.2 | 96.6 | - | 82.9 |
| TCLNet [84] | ECCV 20 | 89.8 | - | - | 85.1 |
| STRF [85] | ICCV 21 | 90.30 | - | - | 86.10 |
| STMN [86] | ICCV 21 | 90.5 | - | - | 84.5 |
| STEP-Emb [87] | ICCV 21 | 90.8 | 97.1 | 98.8 | 87.0 |
4.3. GAN based method
Generally, models trained on different datasets differ in performance. Changes in experimental results caused by changing the training dataset are likely due to overfitting of the model. To address such problems, GANs are widely used for data augmentation and expansion. GAN [105] could randomly generate sample images; CGAN [106] added conditional constraints to image generation on the basis of GAN; Pix2pix [107] and CycleGAN [108] could convert paired images and arbitrary (unpaired) images between domains A and B, respectively. Zheng et al. [109] applied GAN to create unlabeled samples to expand the data and used label smoothing regularization for outliers (LSRO) to smooth the ID labels. StyleGAN [110] and CamStyle [111] could achieve style transfer between any two cameras.
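The LSRO idea mentioned above can be sketched as a per-sample loss: a real image is trained with the usual cross-entropy on its ID label, while a GAN-generated image, which has no ID, is assigned a uniform label distribution over all training identities. The function names and the use of `None` to mark generated samples are assumptions made for this illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lsro_loss(logits, label=None):
    """Cross-entropy for a real image (label = ID index) or, for a generated
    image (label = None), cross-entropy against a uniform distribution."""
    log_probs = [math.log(p) for p in softmax(logits)]
    if label is None:
        # Uniform target 1/K over the K identities for generated samples.
        return -sum(log_probs) / len(log_probs)
    return -log_probs[label]


# Hypothetical classifier scores over three identities.
real_loss = lsro_loss([5.0, 0.0, 0.0], label=0)  # confident and correct
generated_loss = lsro_loss([0.0, 0.0, 0.0])      # uniform prediction
print(real_loss, generated_loss)
```

The uniform target discourages the classifier from assigning generated images confidently to any single identity, which acts as a regularizer.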
PTGAN [112] and SPGAN [113] addressed the obvious deviation between data collected in different scenarios: they improved a network trained on dataset A when applied to scene B by translating images of the former into the style of the latter. PNGAN [114] generated an image in the desired pose from the input image and the target pose; the features extracted from multiple fixed poses of the same pedestrian make the model more robust. Several GAN-based methods are compared and summarized in Table 8. Fig. 8 shows the images generated by several GAN networks, in which CamStyle transfers between two cameras, PNGAN generates a variety of poses, and PTGAN realizes style conversion between different domains.
Table 8.
The realization methods and targets of a typical GAN network.
| Method | Basis | Tricks | Target |
|---|---|---|---|
| DCGAN | GAN | Label smoothing | Data expansion |
| CamStyle | CycleGAN | Label smoothing | Camera style deviation |
| PTGAN | CycleGAN + PSPNet [103] | Background segmentation | Domain dissimilarity |
| SPGAN | CycleGAN + SiaNet | Constrained mapping | Domain dissimilarity |
| PNGAN | GAN + OpenPose [104] | Body pose extraction | Pose normalization |
Figure 8.
Examples of images transferred by CamStyle, DCGAN, PNGAN and PTGAN.
4.4. Attention mechanism-based method
The attention mechanism [115] resembles the human way of selectively processing signals: it focuses on specific areas while ignoring other information. It can be divided into temporal attention and spatial attention. The former focuses on which frames of a sequence are more important, while the latter focuses on specific parts of an image. Song et al. [116] used a mask to divide the complete image into background and pedestrian body regions, and supervised the network with a triplet loss to learn body-region features and ignore background features. Li et al. [101] addressed pedestrian occlusion and misalignment in video sequences with a newly proposed spatio-temporal attention model. HA-CNN [117] jointly learned soft pixel attention and hard regional attention. Xu et al. [118] introduced the Attention-Aware Compositional Network (AACN) for ReID, which consists of Pose-guided Part Attention (PPA) and Attention-aware Feature Composition (AFC). Another typical example is HydraPlus-Net [39], which feeds multi-level attention maps multi-directionally to different feature layers.
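To make the notion of spatial attention concrete, the following is a simplified sketch of attention-weighted pooling over spatial positions. It is an illustration only and does not reproduce any of the cited models; a binary body mask is treated as the hard special case of the weights, and all names are hypothetical.

```python
def attention_pool(features, weights):
    """Weighted average of per-position feature vectors.

    features: list of feature vectors, one per spatial position.
    weights: non-negative attention weight per position; a 0/1 body mask
             keeps body positions and ignores background positions.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("attention weights must not all be zero")
    dim = len(features[0])
    return [
        sum(w * f[d] for w, f in zip(weights, features)) / total
        for d in range(dim)
    ]
```

For example, with three positions and the mask `[1, 1, 0]`, the third (background) position does not contribute to the pooled feature.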
5. Challenges and frontier research
With increasing public safety requirements, pedestrian re-identification often faces the task of cross-camera retrieval. The primary difficulty is the appearance change caused by differences in viewing angle or by clothing changes across scenes. At the same time, fine-grained attribute information needs to be considered to distinguish different people. In summary, the challenge of pedestrian re-identification lies in learning a discriminative visual representation that is robust to such changes. Specifically, it may face difficulties such as limited training data, algorithm performance (both effectiveness and efficiency), domain gaps, and unconstrained environments. Frontier research also explores new network structures, loss function designs, and effective training tricks.
5.1. Network structure design
5.1.1. ReID specific architecture
Existing ReID methods usually make architectural modifications to image classification networks, mostly following the CaffeNet [119], VGG-16 [120] and ResNet-50 backbones. For example, SVDNet [121] used the Eigenlayer as the second-to-last FC layer, and PCB [66] set the last convolutional stride to 1 to increase the resolution of the feature map. Researchers have also proposed new networks designed specifically for ReID. Noting that the metric loss and the classification loss are inconsistent in a single embedding space, BNNeck [122] added a BN layer after global pooling to separate them into two different feature spaces. Omni-Scale Network (OSNet) [123] designed a residual block consisting of multiple feature streams, each of which detects features at a certain scale. Auto-ReID [124] introduced a new ReID search space and a new retrieval-based search algorithm, combining a typical classification search space with a novel part-aware module. Faster-ReID [125] proposed an All-in-One framework that learns multiple codes of different lengths in a single model with a code pyramid structure and self-distillation learning. Other typical networks include FPNN [65], MLFN [126], BraidNet [127] and the Siamese network [128]. AGW [129] provided a baseline that achieves competitive performance on cross-modality ReID tasks.
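The effect of setting the last stride to 1 can be checked with simple arithmetic. The sketch below assumes a ResNet-50-like backbone whose stem convolution, max pooling and four stages use strides 2, 2, 1, 2, 2, 2, and an input of 384×128 pixels; both the stride list and the input size are assumptions made for illustration.

```python
def feature_map_size(height, width, stage_strides):
    """Spatial size of the final feature map for the given strides.

    Assumes every stride divides the size exactly (padding effects ignored).
    """
    total = 1
    for s in stage_strides:
        total *= s
    return height // total, width // total


# Assumed strides: stem conv, max pooling, then four residual stages.
DEFAULT_STRIDES = [2, 2, 1, 2, 2, 2]   # overall down-sampling of 32
LAST_STRIDE_ONE = [2, 2, 1, 2, 2, 1]   # overall down-sampling of 16

print(feature_map_size(384, 128, DEFAULT_STRIDES))  # (12, 4)
print(feature_map_size(384, 128, LAST_STRIDE_ONE))  # (24, 8)
```

Under these assumptions the feature map doubles in each spatial dimension, which gives part-based methods more rows to split into horizontal stripes.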
5.1.2. Loss function design
To find pedestrians in the gallery with the same ID as the probe, pedestrian re-identification needs to cluster samples by similarity. Therefore, beyond generic classification losses such as cross entropy, the metric learning methods and loss functions commonly used in pedestrian re-identification include ID loss [16],[121], verification loss [92],[130],[131] and triplet loss [132], [133], [134], [135], as shown in Fig. 9, as well as contrastive loss [136], quadruplet loss [137], fusion of part loss and global loss [36], multiple loss combinations [138], circle loss [139], etc. In unsupervised learning, non-parametric classification losses such as InfoNCE [140], [141], ClusterNCE loss [142] and OIM loss [143] are also used.
Figure 9.

Typical widely used loss functions [61]. (a) Identity Loss, (b) Verification Loss, (c) Triplet Loss.
Classification loss: It is also known as ID loss. The number of pedestrian IDs in the training set equals the number of network output classes. The feature layer is connected to an FC layer for classification, and the cross-entropy loss is calculated after the softmax activation function.
$$L_{id} = -\frac{1}{n}\sum_{i=1}^{n}\log p(y_i \mid x_i) \tag{7}$$
In eq. (7), $y_i$ is the label of the input image $x_i$, $p(y_i \mid x_i)$ is the probability predicted by the network that $x_i$ belongs to class $y_i$, and $n$ is the number of training samples per batch.
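The classification (ID) loss described above amounts to the mean negative log-probability of the true identity over a batch. A minimal sketch, assuming the network outputs one raw score (logit) per identity, is:

```python
import math


def id_loss(batch_logits, labels):
    """Mean softmax cross-entropy over a batch.

    batch_logits: list of per-image score lists, one score per identity.
    labels: ground-truth identity index for each image.
    """
    total = 0.0
    for logits, y in zip(batch_logits, labels):
        m = max(logits)
        # log of the softmax denominator, computed stably
        log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum - logits[y]  # equals -log softmax(logits)[y]
    return total / len(labels)
```

When the scores are uninformative (all equal), the loss equals the logarithm of the number of identities.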
Verification Loss: It fuses the feature information of two images to calculate a contrastive loss [70]; each time, a pair of pictures is fed into the same Siamese network to extract features. The contrastive loss optimizes the pairwise sample distance and can be formulated as eq. (8):
$$L_{con} = y\, d_{a,b}^{2} + (1-y)\left[\alpha - d_{a,b}\right]_{+}^{2} \tag{8}$$
where $y = 1$ when $x_a$ and $x_b$ have the same identity and $y = 0$ otherwise, $\alpha$ is a margin parameter, $[z]_{+}$ represents $\max(z, 0)$, and $d_{a,b}$ indicates the Euclidean distance between the features $f_a$ and $f_b$ of the input pair, which can be expressed by eq. (9):
$$d_{a,b} = \lVert f_a - f_b \rVert_2 \tag{9}$$
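The contrastive loss for one image pair can be sketched as below, assuming the common form in which a same-identity pair is penalized by its squared distance and a different-identity pair is penalized only when it lies inside the margin. The default margin value is an arbitrary illustrative choice.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(feat_a, feat_b, same_identity, margin=1.0):
    """Pull same-identity pairs together, push others beyond the margin."""
    d = euclidean(feat_a, feat_b)
    if same_identity:
        return d ** 2
    return max(margin - d, 0.0) ** 2
```

A different-identity pair that is already farther apart than the margin contributes zero loss, so training effort concentrates on pairs that are still confusable.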
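Triplet Loss: Arising in the context of nearest-neighbor classification [144], an image $x$ is embedded into Euclidean space by $f(x)$ and constrained to a $d$-dimensional hypersphere, $\lVert f(x) \rVert_2 = 1$. For a specific person, to ensure that the anchor $x_i^a$ is closer to the positive sample $x_i^p$ than to the negative sample $x_i^n$, the loss function is defined as eq. (10), where $\alpha$ is the margin:
$$L_{tri} = \sum_{i}\left[\lVert f(x_i^a) - f(x_i^p) \rVert_2^2 - \lVert f(x_i^a) - f(x_i^n) \rVert_2^2 + \alpha\right]_{+} \tag{10}$$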
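A minimal sketch of the triplet loss for a single (anchor, positive, negative) triplet is given below. It assumes squared Euclidean distances between already-extracted embeddings; the margin value is an illustrative choice, and embedding normalization is left to the caller.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return max(d_ap - d_an + margin, 0.0)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin.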
Unified Loss: Maximizing intra-class similarity and minimizing inter-class similarity are the main purposes of deep feature learning. In the feature space, given a sample $x$, assume that there are $K$ within-class similarity scores and $L$ between-class similarity scores. We denote these similarity scores as $\{s_p^i\}\ (i = 1, \dots, K)$ and $\{s_n^j\}\ (j = 1, \dots, L)$, respectively. $\gamma$ is a scale factor and $m$ is a margin for better similarity separation. The loss iterates through every similarity pair to reduce $(s_n^j - s_p^i)$, and it degenerates to triplet loss or classification loss through slight modifications. To minimize each $s_n^j$ as well as to maximize each $s_p^i$, the unified loss function is defined by:
$$L_{uni} = \log\left[1 + \sum_{i=1}^{K}\sum_{j=1}^{L}\exp\left(\gamma\left(s_n^j - s_p^i + m\right)\right)\right] \tag{11}$$
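Circle Loss: It allows each similarity score to be learned at its own pace depending on its current optimization status, thus increasing the flexibility of optimization. $\alpha_n^j$ and $\alpha_p^i$ are non-negative weighting factors. Neglecting the margin term $m$ in eq. (11) first and weighting each similarity score transforms the unified loss function into the circle loss of eq. (12):
$$L_{circle} = \log\left[1 + \sum_{i=1}^{K}\sum_{j=1}^{L}\exp\left(\gamma\left(\alpha_n^j s_n^j - \alpha_p^i s_p^i\right)\right)\right] \tag{12}$$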
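The relation between the unified loss and the circle loss can be sketched numerically. The code below assumes the pairwise log-sum-exp form over all (within-class, between-class) similarity pairs and takes the weighting factors as plain inputs; how those weights are derived from the optimization status is outside this sketch.

```python
import math


def unified_loss(s_pos, s_neg, gamma, margin):
    """Log-sum-exp over every (between-class, within-class) similarity pair."""
    total = sum(
        math.exp(gamma * (sn - sp + margin)) for sp in s_pos for sn in s_neg
    )
    return math.log(1.0 + total)


def circle_loss(s_pos, s_neg, w_pos, w_neg, gamma):
    """Same structure without the margin, each score scaled by its weight."""
    total = sum(
        math.exp(gamma * (wn * sn - wp * sp))
        for sp, wp in zip(s_pos, w_pos)
        for sn, wn in zip(s_neg, w_neg)
    )
    return math.log(1.0 + total)
```

Setting every weight to 1 recovers the unified loss with zero margin, which is a quick sanity check for an implementation.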
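Softened Cross-entropy Loss: Lin et al. [145] proposed softened similarity learning to relieve the error of hard quantization. Unlike original one-hot labels, an image is encouraged to be associated with $k$ possibly related classes, whose labels are treated as a distribution. The softened cross-entropy loss is formulated as eq. (13):
$$L_{soft} = -\lambda \log p(\hat{y}_i \mid x_i) - \frac{1-\lambda}{k}\sum_{j \in \mathcal{K}_i}\log p(j \mid x_i) \tag{13}$$
where $\hat{y}_i$ is the ground-truth class of $x_i$, $\mathcal{K}_i$ is the set of its $k$ reliable classes, and the probability of class $j$ is defined as $p(j \mid x_i) = \exp(V_j^{\top} v_i / \tau) / \sum_{c}\exp(V_c^{\top} v_i / \tau)$, with $v_i$ the feature of $x_i$ and $\tau$ a temperature parameter. $V$ is the lookup table that stores the feature of each class. The hyper-parameter $\lambda$ balances the effect of the ground truth and the reliable classes.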
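A rough sketch of a softened cross-entropy is shown below. It assumes that class probabilities come from a softmax over inner products between the image feature and the lookup-table rows, scaled by a temperature; the temperature, the function names and the exact weighting are assumptions for illustration and may differ from the formulation in [145].

```python
import math


def class_probabilities(feature, lookup_table, tau):
    """Softmax over similarity scores between a feature and each class row."""
    scores = [sum(a * b for a, b in zip(row, feature)) / tau for row in lookup_table]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softened_cross_entropy(feature, lookup_table, target, reliable, lam, tau=0.1):
    """Blend the hard-label term with the average over reliable classes.

    target: index of the ground-truth class.
    reliable: indices of the k classes considered related to the image.
    lam: weight of the ground-truth term (lam = 1 gives plain cross-entropy).
    """
    probs = class_probabilities(feature, lookup_table, tau)
    hard_term = -math.log(probs[target])
    soft_term = -sum(math.log(probs[j]) for j in reliable) / len(reliable)
    return lam * hard_term + (1.0 - lam) * soft_term
```

With `lam = 1.0` the reliable classes are ignored and the loss reduces to the ordinary hard-label cross-entropy.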
5.1.3. Bag of tricks
Research on deep learning-based person ReID has made tremendous progress and achieved remarkable performance, but most methods rely on complex structural designs. Luo et al. [122] showed experimentally that some tricks can improve model performance significantly without increasing training cost. These tricks include data preprocessing as well as minor modifications to the network design.
The learning rate has a non-negligible impact on a ReID model. The warmup strategy [146] in Fig. 10(a) can be applied to bootstrap the network for better performance. To address the occlusion issue and improve generalization ability, random erasing augmentation (REA) [147] was proposed to augment the data. To prevent overfitting in the classification task, label smoothing (LS) [148], [149] in Fig. 10(b) is widely used. To obtain feature maps with higher resolution and richer granularity, Sun et al. [66] removed the last spatial down-sampling operation; more conveniently, the last stride of the backbone network can be set to 1. Besides, Luo et al. added a new BNNeck layer [122] and put the triplet loss and ID loss in different embedding spaces, which improved the performance of the model to a large extent.
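Two of these tricks are easy to sketch. The code below assumes a linear warmup over the first 10 epochs and the standard label smoothing distribution; the base learning rate and the smoothing factor are illustrative values, and later learning-rate decay steps are omitted.

```python
def warmup_lr(epoch, base_lr=3.5e-4, warmup_epochs=10):
    """Linear warmup for 1-indexed epochs, then a constant base rate."""
    if epoch <= warmup_epochs:
        return base_lr * epoch / warmup_epochs
    return base_lr


def smooth_labels(label, num_classes, eps=0.1):
    """Smoothed target: eps is spread evenly over all classes."""
    off_value = eps / num_classes
    return [
        1.0 - eps + off_value if c == label else off_value
        for c in range(num_classes)
    ]
```

For four classes and `eps = 0.1`, the true class receives 0.925 and every other class 0.025, so the target still sums to 1 while discouraging over-confident predictions.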
Figure 10.

Training tricks. (a). With label smoothing [149], the distribution centers at the theoretical value. (b). With warmup strategy [146], there is gradual increase in learning rate over the first 10 epochs.
5.2. Cloth-changing
Most current research assumes that the same person does not change clothes. In real-world scenarios, however, pedestrians are likely to reappear in the surveillance network after a long time interval wearing different clothes [150]. Existing research on re-identification in cloth-changing scenarios can be broadly classified into two categories according to whether the pedestrians' clothes are known. For the case where the clothes are given, Yu et al. [151] designed the Biometric-Clothes Network (BC-Net), which effectively integrates biometric and clothes features and is promising for tracking suspects and finding lost children or elders in real-world scenarios. For general re-identification with clothing changes over a long time span, Wu et al. [152] exploited depth information to provide body shape and skeleton information that is more invariant to illumination and color changes. The contour sketch [153] of a person image was proposed to take advantage of human body shape for feature extraction. 3D information captured by RGB-D sensors [154] is considered a very fruitful research direction because of its main advantage as a soft biometric cue. Huang et al. [62] adopted Feature Sparse Representation and Soft Embedding Attention, and integrated capsules to deal with the person ReID task [155]. The Clothing Change Aware Network (CCAN) [156] performs identity recognition and clothing change detection simultaneously. The above methods also established some typical datasets, including COCAS [151], RGB-D [154], PRCC [153], Celeb [157], VC-Clothes [150] and DG-Net [158], samples of which are shown in Fig. 11.
Figure 11.
Pedestrian re-identification dataset samples in cloth-changing scene. (a). Celeb [157], obtained by crawling celebrity images. (b). VC-Clothes [150], generated by the game engine. (c). DG-Net [158], expanded by style transfer.
5.3. Domain adaptation
When the application scenario changes, the performance of an algorithm trained on a specific dataset can vary greatly. It is well known that there are large domain gaps among different datasets [64],[159]. For example, in data collected by a camera in summer, short sleeves and shorts account for a larger share of pedestrian attributes, and backpack attributes are more common in data collected by outdoor cameras. To address insufficient data and limited scene coverage, the data can be augmented by synthesizing images through style transfer methods such as GAN [105].
Training across datasets to achieve domain adaptation is a common approach in existing research. In addition, it is possible to train a domain-generalized model on multiple source datasets so that it can be transferred to new datasets without additional learning. The Domain Invariant Mapping Network [160] was proposed to enable a ReID model to be deployed out-of-the-box for any new camera network domain. Methods such as PUL [161] do not require labels for the new scene; instead, they assign pseudo-labels to the new data by clustering and then fine-tune the existing model. HHL [162] and Tracklet Association [163] used prior knowledge and soft labels to further mine information in the target domain. Progress has also been made on more general classification and semantic segmentation [164] problems. Kang et al. [165] aligned the attention maps of different domains so that information is better transferred from the source network to the target network, which opened a new direction for unsupervised domain adaptation (UDA).
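The pseudo-labeling step used by clustering-based adaptation can be illustrated with a deliberately small k-means sketch. This is not the PUL algorithm itself: the deterministic initialization with the first k features, the fixed iteration count and the function names are simplifying assumptions, and the subsequent fine-tuning step is omitted.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_pseudo_labels(features, k, iterations=10):
    """Cluster unlabeled target-domain features and return cluster indices.

    The returned indices play the role of pseudo identity labels.
    """
    centroids = [list(f) for f in features[:k]]  # simplistic initialization
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every feature.
        labels = [
            min(range(k), key=lambda c: sq_dist(f, centroids[c]))
            for f in features
        ]
        # Update step: move each non-empty centroid to its cluster mean.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

In practice the features would come from the source-trained model, and the pseudo-labeled samples would then be used to fine-tune that model on the target domain.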
5.4. Occlusion conditions
In real scenes, occlusion is often unavoidable and only a portion of the human body is visible [67], which decreases model accuracy and requires the model to generalize well. Approaches to this problem in person re-identification fall into two types. One uses key point detection [166] to extract representations of the non-occluded parts for similarity matching; the other builds large-scale occlusion datasets, representative examples of which are Occluded-ReID [167], Partial-ReID [67] and P-ETHZ [168].
Using pose information to extract key points is an effective way to locate the occluded parts of the body. OG-Net [169] mapped the human body to 3D space and used point clouds to integrate structural information, learning a human body representation with robust features. PGFA [166] exploited pose landmarks to disentangle useful information from occlusion noise. HO-Net [170] learned high-order relation information assisted by pose for robust alignment.
For partial re-identification, the Visibility-aware Part Model (VPM) [168] perceives the visibility of regions through self-supervision, which enables it to estimate the regions shared by two images and thus suppress spatial misalignment. Deep Spatial feature Reconstruction (DSR) [94] used half-body images to search for partially occluded full-body images, and generated spatial feature maps of a fixed size to keep features consistent at the pixel level. FPR [171] was proposed for alignment-free occluded person ReID. AlignedReID [172],[173] used Dynamically Matching Local Information (DMLI) to resolve the misalignment of horizontal stripes.
The performance of typical methods on various occlusion datasets is shown in Table 9. Besides, multi-modal input can also improve model robustness in unconstrained cases and bring the model closer to practical use, for example by searching with a verbal description. Textual-visual matching [197] measures similarities between sentence descriptions and images.
Table 9.
Performance comparison of occluded ReID methods on the reported datasets.
| Method | Occluded-REID Rank1 | Occluded-REID mAP | Partial-REID Rank1 | Partial-REID mAP | P-DukeMTMC Rank1 | P-DukeMTMC mAP |
|---|---|---|---|---|---|---|
| IDE [63] | 52.6 | 46.4 | 51.7 | 52.4 | 36.0 | 19.7 |
| PCB [66] | 59.3 | 53.2 | 66.3 | 63.8 | 43. | 24.7 |
| PGFA [166] | 57.1 | 56.2 | 68.0 | 56.2 | 44.2 | 23.1 |
| PVPM [174] | 66.8 | 59.5 | 75.3 | 71.4 | 50.1 | 29.4 |
| Occluded-ReID [167] | 68.14 | - | 78.52 | - | 46.15 | - |
| VPM [175] | - | - | 67.7 | - | - | - |
| DSR [94] | 72.80 | 62.83 | 73.67 | 68.07 | - | - |
| FPR [171] | 78.30 | 68.00 | 81.00 | 76.60 | - | - |
5.5. Model efficiency
Lightweight ReID models are receiving more and more attention in current research. For higher accuracy, most existing methods use a large deep network to learn high-dimensional features for computing similarities. However, the query time increases massively as the gallery grows.
The most direct way to reduce model complexity is network pruning, which accelerates image retrieval. Conventional global center-based methods only keep the filters far away from the geometric center; in contrast, Progressive Local Filter Pruning (PLFP) [198] prunes filters according to their local relationships with neighboring filters to preserve the representation ability of the model. Based on a new search space and a new retrieval algorithm, Auto-ReID [124] enables an automated search for an efficient and effective CNN architecture.
To search person images quickly and accurately, the main idea of recent fast ReID methods is hashing, which learns binary codes instead of real-valued features. Several fast ReID methods [199], [200], [201], [162], [202] have been proposed to increase speed while maintaining competitive accuracy; a typical example is the coarse-to-fine (CtF) [125] search strategy. Another research direction is to design a lightweight network by modifying the model [123],[203].
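The speed advantage of hashing comes from replacing real-valued distance computations with Hamming distances between short binary codes. The following sketch uses simple sign thresholding as a stand-in for a learned hash function, which is an assumption made for illustration; it does not implement the CtF strategy.

```python
def binarize(feature):
    """Map a real-valued feature to a binary code by thresholding at zero."""
    return [1 if v > 0 else 0 for v in feature]


def hamming(code_a, code_b):
    """Number of positions in which two binary codes differ."""
    return sum(x != y for x, y in zip(code_a, code_b))


def rank_gallery(query_code, gallery_codes):
    """Gallery indices sorted from the most to the least similar code."""
    return sorted(
        range(len(gallery_codes)),
        key=lambda i: hamming(query_code, gallery_codes[i]),
    )
```

A coarse-to-fine scheme could, for instance, rank the whole gallery with a short code first and re-rank only the top candidates with a longer code.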
5.6. Limited training data
Existing pedestrian datasets are generally collected in common scenarios. In most cases, training an effective model for a specific task requires task-specific data. However, collecting and labeling data is very time-consuming and laborious. To expand limited datasets, the most popular approach is GAN [105] and its derivative models, including CamStyle [111], PTGAN [112], SPGAN [113] and PNGAN [114]. Besides, APR [16] added annotated appearance attributes to the Market-1501 and DukeMTMC datasets, so that datasets originally built for pedestrian ReID can also be applied to attribute recognition. DG-Net [158] used clothes-changing pedestrian images to augment data. SOMAnet [204], PersonX [205] and VC-Clothes [150] used 3D game engines to generate human data in different views. Huang et al. [157] crawled street snapshots of celebrities from the Internet and built Celeb-ReID.
5.7. Resolution changes
Due to changes in camera angles and inconsistent target distances, the resolution of pedestrian images varies greatly. SING [206] jointly optimized image super-resolution and person ReID matching, using different methods to extract features at different resolutions. DSPDL [207] designed a Discriminative Semi-coupled Projective Dictionary Learning framework for matching pedestrian images with large resolution differences. SDALF [208] (Symmetry-Driven Accumulation of Local Features) extracts features that model three complementary aspects of human appearance, thereby achieving robustness to viewpoint and illumination variations. INTACT [209] (Inter-Task Association Critic), published at CVPR 2020, uses an image super-resolution model to enhance low-resolution query images, which reduces the misalignment between feature distributions and improves identity matching performance.
6. State-of-the-arts
6.1. Analysis of ReID on SOTA
This part mainly reviews ReID works published at top conferences in computer vision and artificial intelligence, and makes a comparative analysis with some other representative works. As shown in Table 10, we summarize the results mainly on the Market-1501 and DukeMTMC datasets. The corresponding scatter plot is depicted in Fig. 12.
Table 10.
Comparison of state-of-the-art ReID methods on Market-1501 and DukeMTMC.
| Method | Source | Market-1501 mAP | Market-1501 Rank1 | DukeMTMC mAP | DukeMTMC Rank1 |
|---|---|---|---|---|---|
| Supervised Learning | |||||
| Deep-Person [71] | Pattern Recognition | 79.58 | 92.31 | 64.80 | 80.90 |
| AlignedReID++ [173] | Pattern Recognition | 77.6 | 91.0 | 68.0 | 80.7 |
| FMN [176] | Pattern Recognition Letters | 67.12 | 85.99 | 56.88 | 74.51 |
| MGN [177] | ACM Multimedia | 86.9 | 95.7 | 78.4 | 88.7 |
| PAN [178] | TCSVT | 63.5 | 82.81 | 51.51 | 71.59 |
| GP-reID [179] | ArXiv 18 | 81.2 | 92.2 | 72.8 | 85.2 |
| OG-Net [169] | ArXiv 20 | 59.97 | 80.94 | 50.81 | 71.77 |
| ACRN [21] | CVPR 17 | 62.60 | 83.61 | 51.96 | 72.58 |
| SVDNet [121] | CVPR 17 | 62.1 | 82.3 | 56.8 | 76.7 |
| AACN [118] | CVPR 18 | 66.87 | 85.90 | - | 41.37 |
| SPreID [89] | CVPR 18 | 79.67 | 91.45 | 68.78 | 83.3 |
| HA-CNN [117] | CVPR 18 | 75.7 | 91.2 | 63.8 | 80.5 |
| HOReID [170] | CVPR 20 | 84.9 | 94.2 | 75.6 | 86.9 |
| CamStyle [111] | CVPR 18 | 71.55 | 89.49 | 57.61 | 78.32 |
| PCB [66] | ECCV 18 | 77.3 | 92.4 | 65.3 | 81.9 |
| PISNet [180] | ECCV 20 | 87.1 | 95.6 | 78.7 | 88.8 |
| Unsupervised Learning | |||||
| UnityStyle [181] | CVPR 20 | 95.8 | 98.5 | 93.6 | 95.1 |
| M3 [182] | CVPR 20 | 82.6 | 5.4 | 68.5 | 84.7 |
| Unsupervised re-ID [183] | CVPR 20 | 37.8 | 71.7 | 28.6 | 52.5 |
| SADA [184] | CVPR 20 | 59.8 | 83 | 55.8 | 74.5 |
| Generalizing REID [185] | ECCV 20 | 71.5 | 88.1 | 65.2 | 79.5 |
| MEB-Net [186] | ECCV 20 | 76.0 | 89.9 | 66.1 | 79.6 |
| CrowdReID [187] | ECCV 20 | 84.7 | 95.3 | 74.4 | 88.3 |
| Light-ReID [125] | ECCV 20 | 84.9 | 93.7 | 74.8 | 87.6 |
| UNRN [188] | AAAI 21 | 78.1 | 91.9 | 69.1 | 82.0 |
| Transfer Learning | |||||
| SPGAN [113] | CVPR 18 | 26.9 | 58.1 | 26.4 | 46.9 |
| VPM [175] | CVPR 19 | 80.8 | 93.0 | 72.6 | 83.6 |
| HCT [189] | CVPR 20 | 56.4 | 80.0 | 50.7 | 69.6 |
| SCSN [190] | CVPR 20 | 88.5 | 95.7 | 79 | 91 |
| AD-Cluster [191] | CVPR 20 | 68.3 | 86.7 | 54.1 | 72.6 |
| HHL [162] | ECCV 18 | 31.4 | 62.2 | 27.2 | 46.9 |
| BUC [192] | AAAI 18 | 38.3 | 66.2 | 27.5 | 47.4 |
| ARN [193] | CVPR 18 | 39.4 | 70.3 | 33.4 | 60.2 |
| SSG [194] | ICCV 19 | 58.3 | 80.0 | 53.4 | 73.0 |
| MMCL [195] | CVPR 20 | 60.4 | 84.4 | 51.4 | 72.4 |
| NRMT [196] | ECCV 20 | 71.7 | 87.8 | 62.2 | 77.8 |
Figure 12.
State-of-the-art (SOTA) ReID results on the Market-1501 dataset.
Analyzing how the above new algorithms improve model performance shows that some tricks bring considerable gains, as discussed below. Applying GAN models (e.g., DI-ReID [210] and UnityStyle [181]) yields more style-robust deep features for querying. Domain-adaptive person ReID, including SADA [184], AD-Cluster [191], MEB-Net [186] and Generalizing ReID [185], has gradually become a research hotspot. Besides, unsupervised ReID [183] adopts iterative training. PRI [211] addresses the low-resolution problem. Ahmed et al. [212] used transfer learning that transfers knowledge using only source models and limited labeled data. HCT [189] trained with a hard-batch triplet loss. There are also novel methods that focus on specific challenges; representatives are HOReID [170], PISNet [180] and CrowdReID-GASM [187], which studied ReID under occlusion, pedestrian interference and crowded conditions, respectively.
For cross-domain re-identification, as displayed in Table 10, we evaluated some cross-domain ReID methods published at top conferences on two public person ReID datasets, Market-1501 and DukeMTMC, and compared these state-of-the-art domain adaptation approaches quantitatively. The comparison reports mAP and Rank-1, 5, 10 (%), with Market-1501 and DukeMTMC-ReID used as source and target datasets, respectively.
Most works published at ECCV 2020 focused on unsupervised ReID, especially unsupervised domain adaptation (UDA), as the methods listed in the table show. Typically, GDS [213] and JVTC [214] designed new training strategies. DG-Net++ [215] separated out ID-unrelated noise to make the adaptation process more effective. AD-Cluster [191] introduced adaptive sample augmentation to generate more diverse samples.
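For readers unfamiliar with the two metrics, the sketch below computes a Rank-k hit and the average precision for a single query from a gallery ranking; mAP is the mean of the average precision over all queries. It is a simplified illustration: standard ReID protocols also exclude gallery images taken by the same camera as the query, which is omitted here.

```python
def rank_k_hit(ranked_ids, query_id, k):
    """True if a correct identity appears among the top-k gallery results."""
    return query_id in ranked_ids[:k]


def average_precision(ranked_ids, query_id):
    """Mean of the precision values at each correct match in the ranking."""
    hits = 0
    precision_sum = 0.0
    for position, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0
```

For a ranking `[2, 1, 1, 3, 1]` and query identity 1, the correct matches are at positions 2, 3 and 5, so the average precision is (1/2 + 2/3 + 3/5) / 3, while the Rank-1 hit is false.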
6.2. Potential research direction of ReID
Domain Adaptation: Table 10 shows that domain-adaptive re-identification has been extensively studied in recent years, but it is still challenging to achieve Rank-1 above 80% when a model trained on a specific dataset is transferred to other datasets, real-world applications, or even unseen domains. Therefore, ReID in rich scenarios is a promising research direction. In particular, most datasets are collected in scenes with a wide field of view, so “partial” pedestrian re-identification technology is very important.
Visible-Infrared ReID: Pedestrian re-identification at night or under poor lighting is also a problem worth studying and an important task in night-time surveillance applications. The main idea is to use infrared cameras to collect images. Hi-CMD [216], cm-SSFT [217], CMAlign [218] and DDAG [219] each provided new ideas for cross-modality person re-identification.
Unsupervised and transfer learning: Collected data is limited in most cases, and manual labeling is laborious and costly. Semi-supervised [220] and unsupervised learning [221] methods and transfer learning [222] have received widespread attention in recent studies, which is also reflected by the increasing amount of work published in top venues. There have been some research works in this direction [183], [223], [224], but much room for improvement and many challenges remain.
High-quality datasets: Most current pedestrian re-identification approaches are evaluated on well-defined datasets. In practical scenarios, ReID data might be collected from a variety of heterogeneous modalities. Because of domain differences, only a sufficiently large, high-quality standard dataset can train algorithms to be robust. For clothing-change scenarios, researchers have created new datasets [150], [151], [153], [12]. Human interaction or active learning [225], [226] provides another possible way to reduce the dependence on manual annotation.
6.3. Analysis of Attribute-Assisted ReID on SOTA
A pedestrian can be described by a combination of various kinds of attributes, which can be a decisive clue for distinguishing people with very similar global appearance. From the Attribute-Assisted ReID algorithms mentioned in Section 2.1, it can be concluded that person attribute recognition plays an auxiliary role in re-identification.
PAR and ReID can transfer information by sharing attributes. There are two ways to integrate attribute and ID information: (a) Independent Supervision [20]: a deep CNN is trained independently for the attribute labels and for the identity labels, and the concatenated features are then used for ReID matching. (b) Joint Supervision [27]: a multi-task CNN is trained end-to-end, with identity and attribute supervision applied to a shared feature representation.
Table 11 and Table 12 show the performance of several Attribute-Assisted ReID algorithms on the Market-1501/DukeMTMC datasets and the VIPeR dataset, respectively. Experimental results demonstrate the ReID performance gain obtained with attribute assistance [11],[15]. At the same time, integrating ID information introduces complementary information to some extent and helps improve overall attribute recognition [19],[16].
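The two integration schemes can be contrasted in a few lines. In this sketch, independent supervision is represented by concatenating separately learned identity and attribute features before matching, and joint supervision by a weighted sum of the identity loss and the mean attribute loss. The weighting scheme and the names are hypothetical and are not taken from [20] or [27].

```python
def concat_features(id_feature, attr_feature):
    """Independent supervision: fuse two separately learned features."""
    return list(id_feature) + list(attr_feature)


def joint_loss(id_loss_value, attr_loss_values, id_weight=0.5):
    """Joint supervision: one objective over a shared representation.

    id_weight is a hypothetical balancing factor between the identity loss
    and the mean of the per-attribute losses.
    """
    attr_mean = sum(attr_loss_values) / len(attr_loss_values)
    return id_weight * id_loss_value + (1.0 - id_weight) * attr_mean
```

In the joint scheme, gradients from both terms flow into the same backbone, which is what allows the attribute labels to shape the identity embedding.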
Table 11.
Comparison of several state-of-the-art Attribute-Assisted ReID methods on the Market-1501 and DukeMTMC-ReID datasets.
Table 12.
Comparison of the accuracy and re-identification performance of several state-of-the-art Attribute-Assisted ReID methods on the VIPeR dataset.
Analyzing the attribute-assisted ReID methods, we can draw the following conclusions:
1) Attribute attention models generate features by “seeing” and “comparing” person images to locate the most discriminative parts [14], which provides new clues for searching for target persons given appearance descriptions [17].
2) The attribute consistency principle can be exploited to achieve a clear advantage in the unsupervised fusion of multiple supervisions for cross-domain ReID [27], as well as in addressing human misalignment caused by pose changes [11].
3) Attribute semantic integration models, which jointly learn a discriminative projection onto the appearance attribute subspace, can effectively assist ReID [23].
4) Fine-tuning CNN features on attribute datasets, combined with metric learning on ReID datasets, can further improve discriminative ability [12],[21],[22].
7. Conclusion
Through the analysis and comparison of the different methods mentioned in this article, we can draw the following conclusions. Firstly, attributes are consistent with the human mechanism for recognizing people. Drawing on this perceptual ability, person attributes can be exploited as mid-level human semantic information to improve the performance of the person re-identification task. Secondly, the fusion of global and local features represents pedestrian images more comprehensively and enhances the robustness of the extracted features. Thirdly, methods based on pose and key point extraction of the human body are effective for occluded and partial re-identification. Fourthly, training datasets can be expanded using pose or style transfer methods, which can effectively mitigate uncontrollable environmental factors such as domain deviation and clothing change. In general, with the booming development of deep learning technology and the urgent pace of building an intelligent society, re-identification and attribute recognition face many challenges and have significant potential research value.
Declarations
Author contribution statement
All authors listed have significantly contributed to the development and the writing of this article.
Declaration of interests statement
The authors declare no conflict of interest.
Data availability statement
Data will be made available on request.
Funding statement
Jie Hu was supported by National Social Science Fund of China [17ZDA020], National Natural Science Foundation of China [51975360 & 52035007]. This work was supported by Cross Fund for medical and Engineering of Shanghai Jiao Tong University (YG2021QN118).
Additional information
No additional information is available for this paper.
References
- 1.Zhao R., Ouyang W., Wang X. Proceedings of the IEEE International Conference on Computer Vision. 2013. Person re-identification by salience matching; pp. 2528–2535.
- 2.Martinel N., Micheloni C., Foresti G.L. European Conference on Computer Vision. Springer; 2014. Saliency weighted features for person re-identification; pp. 191–208.
- 3.Zheng W.-S., Gong S., Xiang T. Reidentification by relative distance comparison. IEEE Trans. Pattern Anal. Mach. Intell. 2012;35:653–668. doi: 10.1109/TPAMI.2012.138.
- 4.Dalal N., Triggs B. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 1. IEEE; 2005. Histograms of oriented gradients for human detection; pp. 886–893.
- 5.Lowe D.G. Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2. IEEE; 1999. Object recognition from local scale-invariant features; pp. 1150–1157.
- 6.Li Z., Chang S., Liang F., Huang T.S., Cao L., Smith J.R. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013. Learning locally-adaptive decision functions for person verification; pp. 3610–3617.
- 7.D. Wu, H. Huang, J. Qi, G. Xue, Overview of deep learning based pedestrian attribute recognition and reidentification, 2022, available at SSRN 4082891.
- 8.Han K., Guo J., Zhang C., Zhu M. Proceedings of the 26th ACM International Conference on Multimedia. 2018. Attribute-aware attention model for fine-grained representation learning; pp. 2040–2048.
- 9.Layne R., Hospedales T.M., Gong S., Mary Q. BMVC, vol. 2. 2012. Person re-identification by attributes; p. 8.
- 10.Layne R., Hospedales T.M., Gong S. European Conference on Computer Vision. Springer; 2012. Towards person identification and re-identification with attributes; pp. 402–412.
- 11.Li S., Yu H., Hu R. Attributes-aided part detection and refinement for person re-identification. Pattern Recognit. 2020;97
- 12.Su C., Zhang S., Xing J., Gao W., Tian Q. European Conference on Computer Vision. Springer; 2016. Deep attributes driven multi-camera person re-identification; pp. 475–491.
- 13.Yin Z., Zheng W.-S., Wu A., Yu H.-X., Wan H., Guo X., Huang F., Lai J. Adversarial attribute-image person re-identification. 2017. arXiv:1712.01493 arXiv preprint.
- 14.Liu H., Feng J., Qi M., Jiang J., Yan S. End-to-end comparative attention networks for person re-identification. IEEE Trans. Image Process. 2017;26:3492–3506. doi: 10.1109/TIP.2017.2700762.
- 15.Taherkhani F., Dabouei A., Soleymani S., Dawson J., Nasrabadi N.M. Attribute guided sparse tensor-based model for person re-identification. 2021. arXiv:2108.04352 arXiv preprint.
- 16.Lin Y., Zheng L., Zheng Z., Wu Y., Hu Z., Yan C., Yang Y. Improving person re-identification by attribute and identity learning. Pattern Recognit. 2019;95:151–161.
- 17.Vaquero D.A., Feris R.S., Duan T., Brown L., Turk M. Applications of Computer Vision. 2009. Attribute-based people search in surveillance environments.
- 18.Lampert C.H., Nickisch H., Harmeling S. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) 2009. Learning to detect unseen object classes by between-class attribute transfer.
- 19.Layne R., Hospedales T.M., Gong S. Person Re-Identification. Springer; 2014. Attributes-based re-identification; pp. 93–117.
- 20.Su C., Zhang S., Xing J., Gao W., Tian Q. Multi-type attributes driven multi-camera person re-identification. Pattern Recognit. 2018;75:77–89.
- 21.Schumann A., Stiefelhagen R. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017. Person re-identification by deep learning attribute-complementary information; pp. 20–28.
- 22.Matsukawa T., Suzuki E. 2016 23rd International Conference on Pattern Recognition (ICPR) IEEE; 2016. Person re-identification using CNN features learned from combination of attributes; pp. 2428–2433.
- 23.Khamis S., Kuo C.-H., Singh V.K., Shet V.D., Davis L.S. European Conference on Computer Vision. Springer; 2014. Joint learning for attribute-consistent person re-identification; pp. 134–146.
- 24.Su C., Zhang S., Yang F., Zhang G., Tian Q., Gao W., Davis L.S. Attributes driven tracklet-to-tracklet person re-identification using latent prototypes space mapping. Pattern Recognit. 2017;66:4–15.
- 25.Su C., Yang F., Zhang S., Tian Q., Davis L.S., Gao W. Proceedings of the IEEE International Conference on Computer Vision. 2015. Multi-task learning with low rank attribute embedding for person re-identification; pp. 3739–3747. [Google Scholar]
- 26.Jeong B., Park J., Kwak S. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. ASMR: learning attribute-based person search with adaptive semantic margin regularizer; pp. 12016–12025. [Google Scholar]
- 27.Wang J., Zhu X., Gong S., Li W. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. Transferable joint attribute-identity deep learning for unsupervised person re-identification; pp. 2275–2284. [Google Scholar]
- 28.Li A., Liu L., Wang K., Liu S., Yan S. Clothing attributes assisted person reidentification. IEEE Trans. Circuits Syst. Video Technol. 2014;25:869–878. [Google Scholar]
- 29.Yang W., Mori G. European Conference on Computer Vision. 2010. A discriminative latent model of object classes and attributes. [Google Scholar]
- 30.Wu G., Rahimi A., Chang E.Y., Goh K., Tsai T., Jain A., Wang Y.-F. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), vol. 1. IEEE; 2006. Identifying color in motion in video sensors; pp. 561–569. [Google Scholar]
- 31.Huang J., Feris R.S., Chen Q., Yan S. Proceedings of the IEEE International Conference on Computer Vision. 2015. Cross-domain image retrieval with a dual attribute-aware ranking network; pp. 1062–1070. [Google Scholar]
- 32.Zheng L., Shen L., Tian L., Wang S., Wang J., Tian Q. Proceedings of the IEEE International Conference on Computer Vision. 2015. Scalable person re-identification: a benchmark; pp. 1116–1124. [Google Scholar]
- 33.Zheng L., Huang Y., Lu H., Yang Y. Pose-invariant embedding for deep person re-identification. IEEE Trans. Image Process. 2019;28:4500–4509. doi: 10.1109/TIP.2019.2910414. [DOI] [PubMed] [Google Scholar]
- 34.Su C., Li J., Zhang S., Xing J., Gao W., Tian Q. Proceedings of the IEEE International Conference on Computer Vision. 2017. Pose-driven deep convolutional model for person re-identification; pp. 3960–3969. [Google Scholar]
- 35.Wei L., Zhang S., Yao H., Gao W., Tian Q. Proceedings of the 25th ACM International Conference on Multimedia. 2017. GLAD: global-local-alignment descriptor for pedestrian retrieval; pp. 420–428. [Google Scholar]
- 36.Zhao H., Tian M., Sun S., Shao J., Yan J., Yi S., Wang X., Tang X. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. Spindle Net: person re-identification with human body region guided feature decomposition and fusion; pp. 1077–1085. [Google Scholar]
- 37.Deng Y., Luo P., Loy C.C., Tang X. Proceedings of the 22nd ACM International Conference on Multimedia. 2014. Pedestrian attribute recognition at far distance; pp. 789–792. [Google Scholar]
- 38.Zheng Z., Zheng L., Yang Y. Proceedings of the IEEE International Conference on Computer Vision. 2017. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro; pp. 3754–3762. [Google Scholar]
- 39.Liu X., Zhao H., Tian M., Sheng L., Shao J., Yi S., Yan J., Wang X. 2017 IEEE International Conference on Computer Vision (ICCV) 2017. HydraPlus-Net: attentive deep features for pedestrian analysis. [Google Scholar]
- 40.Li W., Zhao R., Xiao T., Wang X. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014. DeepReID: deep filter pairing neural network for person re-identification; pp. 152–159. [Google Scholar]
- 41.Wei L., Zhang S., Gao W., Tian Q. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. Person transfer GAN to bridge domain gap for person re-identification; pp. 79–88. [Google Scholar]
- 42.Gray D., Brennan S., Tao H. Proc. IEEE International Workshop on Performance Evaluation for Tracking and Surveillance (PETS), vol. 3. 2007. Evaluating appearance models for recognition, reacquisition, and tracking; pp. 1–7. [Google Scholar]
- 43.Zheng L., Bie Z., Sun Y., Wang J., Su C., Wang S., Tian Q. European Conference on Computer Vision. Springer; 2016. MARS: a video benchmark for large-scale person re-identification; pp. 868–884. [Google Scholar]
- 44.Wang T., Gong S., Zhu X., Wang S. Person re-identification by discriminative selection in video ranking. IEEE Trans. Pattern Anal. Mach. Intell. 2016;38:2501–2514. doi: 10.1109/TPAMI.2016.2522418. [DOI] [PubMed] [Google Scholar]
- 45.Wu Y., Lin Y., Dong X., Yan Y., Ouyang W., Yang Y. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. Exploit the unknown gradually: one-shot video-based person re-identification by stepwise learning; pp. 5177–5186. [Google Scholar]
- 46.Song G., Leng B., Liu Y., Hetang C., Cai S. vol. 32. 2018. Region-based quality estimation network for large-scale person re-identification. (Proceedings of the AAAI Conference on Artificial Intelligence). [Google Scholar]
- 47.Zheng L., Zhang H., Sun S., Chandraker M., Yang Y., Tian Q. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. Person re-identification in the wild; pp. 1367–1376. [Google Scholar]
- 48.Loy C.C., Liu C., Gong S. 2013 IEEE International Conference on Image Processing. IEEE; 2013. Person re-identification by manifold ranking; pp. 3567–3571. [Google Scholar]
- 49.Li W., Zhao R., Wang X. Asian Conference on Computer Vision. Springer; 2012. Human reidentification with transferred metric learning; pp. 31–44. [Google Scholar]
- 50.Li W., Wang X. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013. Locally aligned feature transforms across views; pp. 3594–3601. [Google Scholar]
- 51.Felzenszwalb P.F., Girshick R.B., McAllester D., Ramanan D. Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 2009;32:1627–1645. doi: 10.1109/TPAMI.2009.167. [DOI] [PubMed] [Google Scholar]
- 52.Das A., Chakraborty A., Roy-Chowdhury A.K. European Conference on Computer Vision. Springer; 2014. Consistent re-identification in a camera network; pp. 330–345. [Google Scholar]
- 53.Ma L., Liu H., Hu L., Wang C., Sun Q. Orientation driven bag of appearances for person re-identification. 2016. arXiv:1605.02464 arXiv preprint.
- 54.Dehghan A., Modiri Assari S., Shah M. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. GMMCP tracker: globally optimal generalized maximum multi clique problem for multiple object tracking; pp. 4091–4099. [Google Scholar]
- 55.Gou M., Wu Z., Rates-Borras A., Camps O., Radke R.J., et al. A systematic evaluation and benchmark for person re-identification: features, metrics, and datasets. IEEE Trans. Pattern Anal. Mach. Intell. 2018;41:523–536. doi: 10.1109/TPAMI.2018.2807450. [DOI] [PubMed] [Google Scholar]
- 56.Dollár P., Appel R., Belongie S., Perona P. Fast feature pyramids for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2014;36:1532–1545. doi: 10.1109/TPAMI.2014.2300479. [DOI] [PubMed] [Google Scholar]
- 57.Gou M., Karanam S., Liu W., Camps O., Radke R.J. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017. DukeMTMC4ReID: a large-scale multi-camera person re-identification dataset; pp. 10–19. [Google Scholar]
- 58.Benenson R., Omran M., Hosang J., Schiele B. European Conference on Computer Vision. Springer; 2014. Ten years of pedestrian detection, what have we learned? pp. 613–627. [Google Scholar]
- 59.Wang T., Gong S., Zhu X., Wang S. European Conference on Computer Vision. Springer; 2014. Person re-identification by video ranking; pp. 688–703. [Google Scholar]
- 60.Li J., Wang J., Tian Q., Gao W., Zhang S. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019. Global-local temporal representations for video person re-identification; pp. 3958–3967. [Google Scholar]
- 61.Ye M., Shen J., Lin G., Xiang T., Shao L., Hoi S.C. Deep learning for person re-identification: a survey and outlook. IEEE Trans. Pattern Anal. Mach. Intell. 2021 doi: 10.1109/TPAMI.2021.3054775. [DOI] [PubMed] [Google Scholar]
- 62.Huang Y., Xu J., Wu Q., Zhong Y., Zhang P., Zhang Z. Beyond scalar neuron: adopting vector-neuron capsules for long-term person re-identification. IEEE Trans. Circuits Syst. Video Technol. 2019;30:3459–3471. [Google Scholar]
- 63.L. Zheng, Y. Yang, A.G. Hauptmann, 2016, Person re-identification: past, present and future.
- 64.Yi D., Lei Z., Liao S., Li S.Z. 2014 22nd International Conference on Pattern Recognition. IEEE; 2014. Deep metric learning for person re-identification; pp. 34–39. [Google Scholar]
- 65.Li W., Zhao R., Xiao T., Wang X. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014. DeepReID: deep filter pairing neural network for person re-identification; pp. 152–159. [Google Scholar]
- 66.Sun Y., Zheng L., Yang Y., Tian Q., Wang S. Proceedings of the European Conference on Computer Vision (ECCV) 2018. Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline) pp. 480–496. [Google Scholar]
- 67.Zheng W.-S., Li X., Xiang T., Liao S., Lai J., Gong S. Proceedings of the IEEE International Conference on Computer Vision. 2015. Partial person re-identification; pp. 4678–4686. [Google Scholar]
- 68.Su C., Li J., Zhang S., Xing J., Gao W., Tian Q. Proceedings of the IEEE International Conference on Computer Vision. 2017. Pose-driven deep convolutional model for person re-identification; pp. 3960–3969. [Google Scholar]
- 69.Zhao L., Li X., Zhuang Y., Wang J. Proceedings of the IEEE International Conference on Computer Vision. 2017. Deeply-learned part-aligned representations for person re-identification; pp. 3219–3228. [Google Scholar]
- 70.Varior R.R., Bing S., Lu J., Dong X., Gang W. European Conference on Computer Vision. 2016. A Siamese long short-term memory architecture for human re-identification. [Google Scholar]
- 71.Bai X., Yang M., Huang T., Dou Z., Yu R., Xu Y. Deep-person: learning discriminative deep features for person re-identification. Pattern Recognit. 2020;98 [Google Scholar]
- 72.Z. Xuan, L. Hao, F. Xing, W. Xiang, S. Jian, AlignedReID: surpassing human-level performance in person re-identification, 2017.
- 73.Zhang Z., Si T., Liu S. Integration convolutional neural network for person re-identification in camera networks. IEEE Access. 2018:1. [Google Scholar]
- 74.Fan X., Luo H., Zhang X., He L., Zhang C., Jiang W. Springer; Cham: 2018. SCPNet: Spatial-Channel Parallelism Network for Joint Holistic and Partial Person Re-Identification. [Google Scholar]
- 75.Suh Y., Wang J., Tang S., Mei T., Lee K.M. Springer; Cham: 2018. Part-Aligned Bilinear Representations for Person Re-Identification. [Google Scholar]
- 76.J. Liu, Z.J. Zha, W. Wu, K. Zheng, Q. Sun, Spatial-temporal correlation and topology learning for person re-identification in videos, 2021.
- 77.R. Hou, H. Chang, B. Ma, R. Huang, S. Shan, BiCnet-TKS: learning efficient spatial-temporal representation for video person re-identification, 2021.
- 78.X. Liu, P. Zhang, C. Yu, H. Lu, X. Yang, 2021, Watching you: global-guided reciprocal learning for video-based person re-identification.
- 79.Yang J., Zheng W.S., Yang Q., Chen Y.C., Tian Q. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020. Spatial-temporal graph convolutional network for video-based person re-identification. [Google Scholar]
- 80.Y. Yan, J. Qin, J. Chen, L. Liu, F. Zhu, Y. Tai, L. Shao, Learning multi-granular hypergraphs for video-based person re-identification, 2021.
- 81.Zhang Z., Lan C., Zeng W., Chen Z. IEEE; 2020. Multi-granularity reference-aided attentive feature aggregation for video-based person re-identification. [Google Scholar]
- 82.X. Gu, H. Chang, B. Ma, H. Zhang, X. Chen, Appearance-Preserving 3D Convolution for Video-based Person Re-identification, 2020.
- 83.Chen G., Rao Y., Lu J., Zhou J. Computer Vision – ECCV 2020. 2020. Temporal coherence or temporal motion: which is more critical for video-based person re-identification? [Google Scholar]
- 84.R. Hou, H. Chang, B. Ma, S. Shan, X. Chen, 2020, Temporal complementary learning for video person re-identification.
- 85.Aich A., Zheng M., Karanam S., Chen T., Roy-Chowdhury A.K., Wu Z. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. Spatio-temporal representation factorization for video-based person re-identification; pp. 152–162. [Google Scholar]
- 86.C. Eom, G. Lee, J. Lee, B. Ham, Video-based person re-identification with spatial and temporal memory networks, 2021.
- 87.T. He, X. Jin, X. Shen, J. Huang, X.S. Hua, Dense interaction learning for video-based person re-identification, 2021.
- 88.L. Ma, Q. Sun, S. Georgoulis, L.V. Gool, B. Schiele, M. Fritz, 2019, Disentangled person image generation supplementary material.
- 89.Kalayeh M.M., Basaran E., Gokmen M., Kamasak M.E., Shah M. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018. Human semantic parsing for person re-identification. [Google Scholar]
- 90.Chen L.C., Papandreou G., Kokkinos I., Murphy K., Yuille A.L. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018;40:834–848. doi: 10.1109/TPAMI.2017.2699184. [DOI] [PubMed] [Google Scholar]
- 91.He K., Gkioxari G., Dollár P., Girshick R. Proceedings of the IEEE International Conference on Computer Vision. 2017. Mask R-CNN; pp. 2961–2969. [Google Scholar]
- 92.Ahmed E., Jones M., Marks T.K. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015. An improved deep learning architecture for person re-identification. [Google Scholar]
- 93.W. Lin, C. Shen, A. Hengel, PersonNet: person re-identification with deep convolutional neural networks, 2016.
- 94.He L., Liang J., Li H., Sun Z. IEEE; 2018. Deep spatial feature reconstruction for partial person re-identification: alignment-free approach. [Google Scholar]
- 95.Hao L., Jie Z., Jayashree K., Qi M., Feng J. Video-based person re-identification with accumulative motion context. IEEE Trans. Circuits Syst. Video Technol. 2016:1. [Google Scholar]
- 96.Li Y., Li Z., Li J., Jing Z., Qi T. 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2017. Video-based person re-identification by deep feature guided pooling. [Google Scholar]
- 97.G. Song, B. Leng, L. Yu, C. Hetang, S. Cai, Region-based quality estimation network for large-scale person re-identification, 2017.
- 98.B. Saha, K.S. Ram, J. Mukhopadhyay, A. Roy, A. Navelkar, Video based person re-identification by re-ranking attentive temporal information in deep recurrent convolutional networks, 2018, pp. 1663–1667.
- 99.Zhang W., Li Y., Lu W., Xu X., Liu Z., Ji X. Learning intra-video difference for person re-identification. IEEE Trans. Circuits Syst. Video Technol. 2019;29:3028–3036. [Google Scholar]
- 100.Yu W., Lin Y., Dong X., Yan Y., Yi Y. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2018. Exploit the unknown gradually: one-shot video-based person re-identification by stepwise learning. [Google Scholar]
- 101.Li S., Bak S., Carr P., Wang X. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018. Diversity regularized spatiotemporal attention for video-based person re-identification. [Google Scholar]
- 102.Chen D., Zha Z.J., Liu J., Xie H., Zhang Y. Pacific Rim Conference on Multimedia. 2018. Temporal-contextual attention network for video-based person re-identification. [Google Scholar]
- 103.Zhao H., Shi J., Qi X., Wang X., Jia J. IEEE Computer Society; 2016. Pyramid scene parsing network. [Google Scholar]
- 104.Zhe C., Simon T., Wei S.E., Sheikh Y. IEEE; 2017. Realtime multi-person 2D pose estimation using part affinity fields. [DOI] [PubMed] [Google Scholar]
- 105.Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014;27 [Google Scholar]
- 106.Mirza M., Osindero S. Conditional generative adversarial nets. Comput. Sci. 2014:2672–2680. [Google Scholar]
- 107.Isola P., Zhu J.Y., Zhou T., Efros A.A. IEEE Conference on Computer Vision Pattern Recognition. 2016. Image-to-image translation with conditional adversarial networks. [Google Scholar]
- 108.Zhu J.Y., Park T., Isola P., Efros A.A. IEEE; 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. [Google Scholar]
- 109.Zheng Z., Liang Z., Yi Y. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. IEEE Comput. Soc. 2017 [Google Scholar]
- 110.Karras T., Laine S., Aila T. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019. A style-based generator architecture for generative adversarial networks. [DOI] [PubMed] [Google Scholar]
- 111.Zhong Z., Liang Z., Zheng Z., Li S., Yi Y. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018. Camera style adaptation for person re-identification. [Google Scholar]
- 112.Wei L., Zhang S., Wen G., Qi T. IEEE; 2018. Person transfer GAN to bridge domain gap for person re-identification. [Google Scholar]
- 113.W. Deng, Z. Liang, G. Kang, Y. Yi, J. Jiao, Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification, 2017.
- 114.X. Qian, Y. Fu, T. Xiang, W. Wang, J. Qiu, Y. Wu, Y.G. Jiang, X. Xue, 2017, Pose-normalized image generation for person re-identification.
- 115.Xiao T., Xu Y., Yang K., Zhang J., Peng Y., Zhang Z. IEEE; 2014. The application of two-level attention models in deep convolutional neural network for fine-grained image classification. [Google Scholar]
- 116.Song C., Yan H., Ouyang W., Liang W. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2018. Mask-guided contrastive attention model for person re-identification. [Google Scholar]
- 117.Li W., Zhu X., Gong S. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018. Harmonious attention network for person re-identification. [Google Scholar]
- 118.Xu J., Zhao R., Zhu F., Wang H., Ouyang W. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. Attention-aware compositional network for person re-identification; pp. 2119–2128. [Google Scholar]
- 119.Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, T. Darrell, 2014, Caffe: convolutional architecture for fast feature embedding, ACM.
- 120.Simonyan K., Zisserman A. Very deep convolutional networks for large-scale image recognition. Comput. Sci. 2014 [Google Scholar]
- 121.Sun Y., Zheng L., Deng W., Wang S. 2017 IEEE International Conference on Computer Vision (ICCV) 2017. SVDNet for pedestrian retrieval. [Google Scholar]
- 122.Hao L. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2019. Bags of tricks and a strong baseline for deep person re-identification. [Google Scholar]
- 123.Zhou K., Yang Y., Cavallaro A., Xiang T. 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2020. Omni-scale feature learning for person re-identification. [Google Scholar]
- 124.Quan R., Dong X., Wu Y., Zhu L., Yang Y. IEEE; 2019. Auto-ReID: searching for a part-aware ConVnet for person re-identification. [Google Scholar]
- 125.G. Wang, S. Gong, J. Cheng, Z. Hou, Faster Person Re-Identification, 2020.
- 126.Chang X., Hospedales T.M., Xiang T. IEEE; 2018. Multi-level factorisation net for person re-identification. [Google Scholar]
- 127.Wang Y., Chen Z., Feng W., Gang W. IEEE Conference on Computer Vision Pattern Recognition. 2018. Person re-identification with cascaded pairwise convolutions. [Google Scholar]
- 128.Guo Y., Cheung N.M. IEEE; 2018. Efficient and deep person re-identification using multi-level similarity. [Google Scholar]
- 129.Ye M., Shen J., Lin G., Xiang T., Hoi S. Deep learning for person re-identification: a survey and outlook. IEEE Trans. Pattern Anal. Mach. Intell. 2021:1. doi: 10.1109/TPAMI.2021.3054775. [DOI] [PubMed] [Google Scholar]
- 130.Zheng Z., Zheng L., Yang Y. A discriminatively learned CNN embedding for person re-identification. ACM Trans. Multimed. Comput. Commun. Appl. 2018;14 [Google Scholar]
- 131.Chen D., Dan X., Li H., Sebe N., Wang X. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2018. Group consistent similarity learning via deep CRF for person re-identification. [Google Scholar]
- 132.Hermans A., Beyer L., Leibe B. In defense of the triplet loss for person re-identification. 2017. arXiv:1703.07737 arXiv preprint.
- 133.Schroff F., Kalenichenko D., Philbin J. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. FaceNet: a unified embedding for face recognition and clustering; pp. 815–823. [Google Scholar]
- 134.Cheng D., Gong Y., Zhou S., Wang J., Zheng N. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. Person re-identification by multi-channel parts-based CNN with improved triplet loss function; pp. 1335–1344. [Google Scholar]
- 135.Liu H., Feng J., Qi M., Jiang J., Yan S. End-to-end comparative attention networks for person re-identification. IEEE Trans. Image Process. 2017;26:3492–3506. doi: 10.1109/TIP.2017.2700762. [DOI] [PubMed] [Google Scholar]
- 136.Varior R.R., Haloi M., Wang G. European Conference on Computer Vision. Springer; 2016. Gated Siamese convolutional neural network architecture for human re-identification; pp. 791–808. [Google Scholar]
- 137.Chen W., Chen X., Zhang J., Huang K. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. Beyond triplet loss: a deep quadruplet network for person re-identification; pp. 403–412. [Google Scholar]
- 138.Cheng W., Qian Z., Chang H., Liu W., Wang X. Mancs: a multi-task attentional network with curriculum sampling for person re-identification. European Conference on Computer Vision, Proceedings, Part IV; 15th European Conference, Munich, Germany, September 8-14, 2018; 2018. [Google Scholar]
- 139.Sun Y., Cheng C., Zhang Y. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. Circle loss: a unified perspective of pair similarity optimization; pp. 6398–6407. [Google Scholar]
- 140.Oord A.v.d., Li Y., Vinyals O. Representation learning with contrastive predictive coding. 2018. arXiv:1807.03748 arXiv preprint.
- 141.He K., Fan H., Wu Y., Xie S., Girshick R. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. Momentum contrast for unsupervised visual representation learning; pp. 9729–9738. [Google Scholar]
- 142.Dai Z., Wang G., Yuan W., Liu X., Zhu S., Tan P. Cluster contrast for unsupervised person re-identification. 2021. arXiv:2103.11568 arXiv preprint.
- 143.Xiao T., Li S., Wang B., Lin L., Wang X. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. Joint detection and identification feature learning for person search; pp. 3415–3424. [Google Scholar]
- 144.Weinberger K.Q., Saul L.K. Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 2009;10:207–244. [Google Scholar]
- 145.Lin Y., Xie L., Wu Y., Yan C., Tian Q. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. Unsupervised person re-identification via softened similarity learning; pp. 3390–3399. [Google Scholar]
- 146.X. Fan, W. Jiang, H. Luo, M. Fei, SphereReID: deep hypersphere manifold embedding for person re-identification, 2018.
- 147.Zhong Z., Zheng L., Kang G., Li S., Yang Y. Random erasing data augmentation. Proc. AAAI Conf. Artif. Intell. 2017;34 [Google Scholar]
- 148.Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z. IEEE; 2016. Rethinking the inception architecture for computer vision; pp. 2818–2826. [Google Scholar]
- 149.T. He, Z. Zhang, H. Zhang, Z. Zhang, J. Xie, M. Li, Bag of tricks for image classification with convolutional neural networks, 2018.
- 150.Wan F., Wu Y., Qian X., Chen Y., Fu Y. IEEE; 2020. When person re-identification meets changing clothes. [Google Scholar]
- 151.S. Yu, S. Li, D. Chen, R. Zhao, J. Yan, Y. Qiao, COCAS: a large-scale clothes changing person dataset for re-identification, 2020.
- 152.Wu A., Zheng W.-S., Lai J.-H. Robust depth-based person re-identification. IEEE Trans. Image Process. 2017;26:2588–2603. doi: 10.1109/TIP.2017.2675201. [DOI] [PubMed] [Google Scholar]
- 153.Yang Qize, Wu Ancong, Zheng Wei-Shi. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2019. Person re-identification by contour sketch under moderate clothing change. [DOI] [PubMed] [Google Scholar]
- 154.Barbosa I.B., Cristani M., Bue A.D., Bazzani L., Murino V. International Conference on Computer Vision. 2012. Re-identification with RGB-D sensors. [Google Scholar]
- 155.Peng Z., Qiang W., Xu J., Jian Z. IEEE Winter Conference on Applications of Computer Vision. 2018. Long-term person re-identification using true motion from videos. [Google Scholar]
- 156.Jia X., Meng Z., Katipally K., Wang H., Zon K.V. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018. Clothing change aware person identification. [Google Scholar]
- 157.Huang Y., Wu Q., Xu J., Zhong Y. 2019 International Joint Conference on Neural Networks (IJCNN) 2019. Celebrities-ReID: a benchmark for clothes variation in long-term person re-identification. [Google Scholar]
- 158.Zheng Z., Yang X., Yu Z., Zheng L., Yang Y., Kautz J. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020. Joint discriminative and generative learning for person re-identification. [Google Scholar]
- 159.S. Liao, L. Shao, Interpretable and generalizable person re-identification with query-adaptive convolution and temporal lifting, 2019.
- 160.Song J., Yang Y., Song Y.Z., Xiang T., Hospedales T.M. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019. Generalizable person re-identification by domain-invariant mapping network. [Google Scholar]
- 161.Fan H., Zheng L., Yan C., Yang Y. Unsupervised person re-identification: clustering and fine-tuning. ACM Trans. Multimed. Comput. Commun. Appl. 2018;14:1–18. [Google Scholar]
- 162.Zhong Z., Zheng L., Li S., Yang Y. Springer; Cham: 2018. Generalizing a Person Retrieval Model Hetero- and Homogeneously. [Google Scholar]
- 163.Fan H., Zheng L., Yan C., Yang Y. Unsupervised person re-identification by deep learning tracklet association. ACM Trans. Multimed. Comput. Commun. Appl. 2018;14:1–18. [Google Scholar]
- 164.Zheng Z., Yang Y. Unsupervised scene adaptation with memory regularization in vivo. 2019. arXiv:1912.11164 arXiv preprint.
- 165.Kang G., Zheng L., Yan Y., Yang Y. Springer; Cham: 2018. Deep Adversarial Attention Alignment for Unsupervised Domain Adaptation: the Benefit of Target Expectation Maximization. [Google Scholar]
- 166.Miao J., Wu Y., Liu P., Ding Y., Yang Y. 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019. Pose-guided feature alignment for occluded person re-identification. [Google Scholar]
- 167.Zhuo J., Chen Z., Lai J., Wang G. 2018 IEEE International Conference on Multimedia and Expo (ICME) IEEE; 2018. Occluded person re-identification; pp. 1–6. [Google Scholar]
- 168.A. Ess, A mobile vision system for robust multi-person tracking, 2008, pp. 1–8.
- 169.Zheng Z., Yang Y. Person re-identification in the 3D space. 2020. arXiv:2006.04569 arXiv preprint. [DOI] [PubMed]
Data Availability Statement
Data will be made available on request.






