Cross-Viewpoint Semantic Mapping: Integrating Human and Robot Perspectives for Improved 3D Semantic Reconstruction

2023 May 27;23(11):5126. doi: 10.3390/s23115126

Algorithm 2: Supervoxel-based projection
	Input: Lower-view RGBD image ($I \in [0, 255]^{H \times W \times 3}$ and $D \in [0, 255]^{H \times W}$), upper-view 3D semantic reconstruction (including point cloud $P_{upper} \in \mathbb{R}^{N \times 3}$ and labels $L_{upper} \in \{0, 1, \dots, L\}^{N}$, where $L \in \mathbb{N}$ denotes the number of semantic labels).
	Output: Lower-view semantic segmentation ($S \in \{0, 1, \dots, L\}^{H \times W}$).
1	Create a colored 3D point cloud ($P_{lower}$) from the lower-view RGB image ($I$) and depth image ($D$).
2	Downsample the point cloud ($P_{lower}$), keeping track of each point’s original location.
3	Convert the lower-view point cloud ($P_{lower}$) to a voxel grid ($V_{lower}$).
4	Apply SLIC with masking to the voxel grid ($V_{lower}$) to obtain supervoxels ($SV$).
5	Match the lower-view point cloud ($P_{lower}$) with the upper-view 3D semantic reconstruction ($P_{upper}, L_{upper}$) to determine the semantic label ($L_{lower}$) of each supervoxel ($SV$).
6	Project the semantic labels of the lower-view point cloud ($L_{lower}$) onto the image plane to obtain the semantic segmentation ($S$).
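
As an illustration of steps 1, 5, and 6, the following is a minimal Python sketch, not the authors' implementation. It makes several assumptions that the algorithm above does not state: a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`; lower-view points already expressed in the same coordinate frame as $P_{upper}$; matching by brute-force nearest neighbor within a hypothetical `max_dist` threshold; a majority vote to pick each supervoxel's label; and label 0 reserved for "unlabeled". Supervoxel assignments (`sv_ids`, the result of steps 2 to 4) are taken as given, and all function names are illustrative.

```python
from collections import Counter
import math


def backproject(depth, fx, fy, cx, cy):
    """Step 1 (geometry only): pinhole back-projection of a depth image.

    Returns (point, pixel) pairs so that every 3D point remembers the
    pixel it came from. Pixels without a valid depth are skipped.
    """
    pairs = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            pairs.append(((x, y, z), (v, u)))
    return pairs


def nearest_label(p, upper_points, upper_labels, max_dist):
    """Label of the closest upper-view point, or None if none is within max_dist.

    Brute force for clarity; a spatial index (e.g. a KD-tree) would
    normally replace this loop.
    """
    best_d, best_label = max_dist, None
    for q, label in zip(upper_points, upper_labels):
        d = math.dist(p, q)
        if d <= best_d:
            best_d, best_label = d, label
    return best_label


def label_supervoxels(lower_points, sv_ids, upper_points, upper_labels,
                      max_dist=0.05):
    """Step 5 (assumed rule): majority vote of matched labels per supervoxel."""
    votes = {}
    for p, sv in zip(lower_points, sv_ids):
        label = nearest_label(p, upper_points, upper_labels, max_dist)
        if label is not None:
            votes.setdefault(sv, Counter())[label] += 1
    return {sv: c.most_common(1)[0][0] for sv, c in votes.items()}


def project_labels(pixels, sv_ids, sv_labels, height, width, unlabeled=0):
    """Step 6: write each point's supervoxel label back to its source pixel."""
    seg = [[unlabeled] * width for _ in range(height)]
    for (v, u), sv in zip(pixels, sv_ids):
        seg[v][u] = sv_labels.get(sv, unlabeled)
    return seg
```

Under these assumptions, a point with no nearby match in the upper-view reconstruction still receives the label voted for by the other points in its supervoxel, which is the practical benefit of transferring labels per supervoxel rather than per point.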