Abstract
Traditional dental prosthetic fabrication requires significant labor and time. To simplify the process, a method was developed to convert tooth scan images, acquired with an intraoral scanner, into 3D images for design. Furthermore, several studies have used deep learning to automate dental prosthetic processes. Tooth images are required to train deep learning models, but they are difficult to use in research because they contain personal patient information. Therefore, we propose a method for generating virtual tooth images using image-to-image translation (pix2pix) and contextual reconstruction fill (CR-Fill). Various virtual images can be generated using pix2pix, and these images are used as training data for CR-Fill; the real and virtual images are then compared to confirm that the generated teeth are well-shaped and meaningful. The experimental results demonstrate that the images generated by the proposed method are similar to actual images. In addition, using only virtual images as training data did not perform well; however, using both real and virtual images as training data yielded results nearly identical to using only real images.
Keywords: Data augmentation, GAN (generative adversarial network), Image inpainting, CR-Fill, Pix2pix
Subject terms: Health care, Engineering
Introduction
In the fabrication of dental prosthetics, a wax-like material is commonly utilized to accurately replicate the morphology of the original tooth1. This method is labor-intensive and time-consuming; because it is performed manually, the results can vary depending on the person making the prosthesis, and adapting and adjusting to the prosthesis takes considerable time and causes discomfort for the patient2. To simplify this process, a new method using an intraoral scanner and a 3D printer has emerged. The oral cavity is scanned, converted into a 3D image, and the teeth are designed using computer-aided design (CAD) software. This method improves prosthesis precision while also reducing patient discomfort caused by bruxism. With the introduction of fourth-generation industrial technology, the process of fabricating dental prostheses is becoming digital3. Furthermore, several studies in the field of dentistry have used deep learning to automate the fabrication of dental prostheses and apply it to dental treatment. Chen et al. proposed a model for generating images of corrected teeth using StyleGAN as a prediction model from images of patients with exposed teeth, to show visual results to patients prior to orthodontic treatment4. Shen et al. aimed to show predicted result images that considered the actual patient’s oral structure rather than just the aligned teeth5. They proposed a multi-modal encoder-decoder-based generative model that shows the patient’s orthodontic results more realistically by taking the patient’s facial image and 3D scanned teeth as input. Gu et al. proposed a CGAN-based image-generation technique for dental fissures that considers the occlusion of the upper and lower teeth6. Chau et al. conducted a study to generate teeth using various generative adversarial network (GAN)-based algorithms, considering the occlusion and dentition of missing teeth during tooth restoration7.
However, research on the application of deep learning to the generation of virtual tooth images is limited. Sufficient tooth imaging data are required to conduct such studies, but in the medical field, sensitive patient information places many restrictions on the data available for research. In particular, tooth data are not readily available because teeth can be used to identify individuals and are therefore sensitive. Obtaining enough data for training is thus difficult, a problem common to many medical fields.
In this study, we propose a virtual tooth image generation technique that considers the oral structure and tooth sulcus features to compensate for the limited data available in the dental field. First, we selected an image-to-image translation method (pix2pix) to generate virtual tooth images considering the oral structure. GANs have been widely used in the field of data generation8,9, and limited medical data can be supplemented by generating virtual tooth images. As an extension of the GAN, pix2pix generates virtual images from sketch images, so multiple virtual tooth images can be produced from the same sketch. To extract detailed tooth sulcus features, contextual reconstruction fill (CR-Fill), a GAN-based inpainting technique, was used.
Therefore, in this study, we propose a method for generating virtual tooth images by inputting actual tooth images into pix2pix and using the results to inpaint areas of missing teeth. This method aims to generate meaningful teeth that appropriately reflect the tooth features in missing tooth areas. Image evaluation metrics and expert evaluations were used to verify the effectiveness of the generated virtual images.
The remainder of this paper is organized as follows: Section "Related works" describes the image generation models, introducing pix2pix, and describes image inpainting. In Section "Virtual tooth image generation model", we describe the proposed method and the virtual tooth image generation model. In Section "Experiment and results", we measure the performance of the proposed method and discuss the results. Finally, Section "Conclusion and future works" presents conclusions and directions for future research.
Related works
This section describes the image generation and inpainting models. The difference is that an image generation model creates an entirely new image based on the conditions provided, whereas image inpainting focuses on restoring or modifying a specific part of an image. Section "Pix2pix" describes the image generation model and pix2pix for creating images. Section "Image inpainting" describes image inpainting.
Pix2pix
Image generation models use deep learning techniques to generate new images based on the conditions provided. Typical models include pixelCNN, the variational autoencoder (VAE), and the GAN. PixelCNN is a model that learns to predict the next pixel based on the previous pixels10; it uses a probability distribution-based autoregressive method to generate each pixel sequentially. The VAE is a variant of the autoencoder, consisting of an encoder and a decoder that compress and restore the features of the input data while minimizing the difference between the input and output images11. The VAE transforms and recovers the latent vector by adding a normal distribution term to the latent vector of the conventional autoencoder. GANs, the most popular generative models, learn through a competition between a generator that produces data and a discriminator that distinguishes real from generated data12. The generator is trained to produce data that the discriminator cannot distinguish from real data, whereas the discriminator is trained to better separate generated data from real data. Many GAN variants have been studied to address this problem with different focuses13,14.
Since the advent of the GAN, it has been applied to many generation models in various ways to solve various problems. Among them, pix2pix is a representative conditional GAN (CGAN) model for image-to-image translation15. The CGAN uses conditional information in the training process, which can be a label or another form of data16. Unlike a conventional GAN, pix2pix uses sketch images as input instead of noise, so it can generate various images within the same framework. In addition, whereas the output of conventional generative models cannot be controlled, it can be controlled here by adding conditional information. U-Net, the generator used in pix2pix, is an encoder-decoder structure with skip connections that form symmetric encoder and decoder layers. Skip connections are network structures used in deep learning in which input data are connected directly to later layers by skipping multiple layers of the network. By using skip connections, detailed information is preserved alongside the key features. Pix2pix is based on the GAN architecture, which consists of a generator and a discriminator in the image generation learning process; therefore, various GANs can be applied to the loss function of pix2pix. In this study, we compare the results of generating virtual tooth images by applying the GAN, WGAN, and LSGAN losses to pix2pix.
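To make the U-Net idea above concrete, the following is a minimal PyTorch sketch of a two-level encoder-decoder generator with a skip connection. The layer widths, depth, and class name are illustrative only and do not reproduce the exact pix2pix architecture used in this study.

```python
# Minimal U-Net-style generator sketch with one skip connection (PyTorch).
# Layer widths and depth are illustrative, not the exact pix2pix configuration.
import torch
import torch.nn as nn

class TinyUNetGenerator(nn.Module):
    def __init__(self, in_ch=3, out_ch=3, base=64):
        super().__init__()
        # Encoder: downsample the sketch image.
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2))
        self.down2 = nn.Sequential(nn.Conv2d(base, base * 2, 4, 2, 1),
                                   nn.BatchNorm2d(base * 2), nn.LeakyReLU(0.2))
        # Decoder: upsample back to the input resolution.
        self.up1 = nn.Sequential(nn.ConvTranspose2d(base * 2, base, 4, 2, 1),
                                 nn.BatchNorm2d(base), nn.ReLU())
        # The skip connection doubles the channel count entering the last stage.
        self.up2 = nn.Sequential(nn.ConvTranspose2d(base * 2, out_ch, 4, 2, 1), nn.Tanh())

    def forward(self, x):
        d1 = self.down1(x)               # kept for the skip connection
        d2 = self.down2(d1)
        u1 = self.up1(d2)
        u1 = torch.cat([u1, d1], dim=1)  # skip connection: concatenate encoder features
        return self.up2(u1)

sketch = torch.randn(1, 3, 512, 512)      # a 512x512 sketch image batch
fake_tooth = TinyUNetGenerator()(sketch)  # -> (1, 3, 512, 512)
```

Concatenating encoder features into the decoder is what allows fine outline details from the sketch to survive to the generated image.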
Image inpainting
Currently, image inpainting is used in many fields, such as computer vision, photo retouching, and video production, and remains an active area of research17,18. Image inpainting refers to the natural restoration of damaged or missing parts of an image19. Image inpainting methods use the surrounding context of a corrupted region to predict and generate that region.
Previous image inpainting techniques were mostly machine learning algorithms based on statistical probability; however, various methods and algorithms have since been developed using deep learning. In particular, neural network architectures such as VAEs and GANs are being used for image inpainting, similar to the image generation models described above. VAEs model data in a probabilistic latent space and can be used to generate or reconstruct images while accounting for diversity20,21. GANs have a generator that is trained to make the image look real and a discriminator that is trained to better distinguish between real and fake images. However, because there is no single correct answer when filling in damaged areas, this ambiguity can result in blurry or distorted results. VAE-based image inpainting techniques also have the disadvantage of diluting features when representing diverse possibilities in the latent space, which can result in blurry results and lower image quality22.
The diffusion-based method, which has attracted considerable attention owing to its excellent performance, diffuses information from neighboring pixels to fill in the damaged area from the outer border to the inner area; however, it has the disadvantage that it is difficult to depict complex textures and details23,24. Therefore, it is necessary to develop an appropriate model to solve these problems. The patch-based method finds and fills in on a patch-by-patch basis from the undamaged part to the damaged part of the image, which is effective in restoring small damaged areas25. It also highlights local characteristics by dividing the image into regions and is cost-effective by dividing a large image into small patches. However, patch-based methods can import inappropriate regions, resulting in artifacts in the inpainted results.
To solve the aforementioned problems, the CR-Fill model was proposed to fetch appropriate reference regions26. CR-Fill introduces a new loss function, known as the contextual reconstruction (CR) loss, to understand the context of the entire image. In CR-Fill, the CR loss minimizes the L1 and adversarial losses so that the best region can be referenced. CR-Fill is based on a GAN and consists of a generator and discriminator. The generator is similar to that of DeepFillv227; however, it replaces the contextual attention (CA) layer with the CR loss, which, unlike the CA layer, is not involved in image generation and is used only during training. Considering the quality of the image data in this study, CR-Fill was applied to inpaint the missing tooth area.
Virtual tooth image generation model
In this study, we proposed a GAN-based virtual tooth image generation model. A block diagram of the proposed method is shown in Fig. 1.
Fig. 1.
System configuration of the proposed method.
The proposed method aims to generate and use virtual tooth images by applying deep learning to dental management systems. The scenario for the proposed method is as follows. There have been attempts to apply deep learning in the dental field28,29; here, we assume that a management system is shared by several dental clinics. The server contains a database with users’ medical records and a virtual tooth image generation model. The generation model uses existing images of the patient’s teeth to augment the data and consists of pix2pix and CR-Fill. Organized in this manner, even if a dental clinic lacks available images, a generation model with good performance can be shared using virtual images. In addition, as mentioned in the previous section, the use of virtual tooth generation models can reduce the time and labor required for prosthesis production.
The tooth images used in this study were obtained using an intraoral scanner. Panoramic CT images combine 2D images to show the entire tooth and oral region, which can result in distortion or inaccurate representation of certain structures. In addition, panoramic CT images may have difficulty representing the depth and detail of structures; when they show inaccurate anatomical features, dental professionals cannot obtain accurate anatomical information. Unlike panoramic CT, newer intraoral scanners reproduce the actual tooth geometry with a high degree of accuracy. They use 3D scanning technology to build a digital model that provides dental professionals with a more accurate picture of the patient’s actual anatomy. Intraoral scanners are also less cumbersome for patients and reduce scanning time, making them less burdensome for both patients and doctors owing to their speed and cost-effectiveness.
Figure 2 shows the process of generating virtual tooth images in the proposed method using pix2pix and CR-Fill. The proposed method first uses pix2pix to generate a virtual image from an actual tooth image. The pix2pix model was trained using the existing real tooth images, and sketch data were then input into the trained model. The sketch data were generated by extracting the outline of each tooth image using Python’s cv2 (OpenCV) module, so multiple virtual tooth images could be generated from the same sketch. CR-Fill was then trained to regenerate teeth by inpainting images in which tooth areas had been erased; CR-Fill is used to preserve details such as fissures in the teeth. CR-Fill was trained using real tooth images and the virtual tooth images generated by pix2pix. To generate a virtual image of the desired tooth, the tooth was removed from the real image and a virtual tooth was generated in its place using CR-Fill.
Fig. 2.
Process of virtual tooth image generation model.
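The paper states only that the outlines were extracted with Python's cv2 module; the snippet below is a minimal sketch of one plausible way to do this, assuming Canny edge detection with arbitrary thresholds and a hypothetical file name.

```python
# Sketch (outline) extraction from a tooth image with OpenCV.
# Canny edge detection and these thresholds are assumptions for illustration;
# the file names are placeholders.
import cv2

def make_sketch(image_path: str, low: int = 50, high: int = 150):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)  # load as grayscale
    img = cv2.GaussianBlur(img, (5, 5), 0)               # suppress scanner noise
    edges = cv2.Canny(img, low, high)                    # binary outline image
    return cv2.bitwise_not(edges)                        # white background, dark outlines

sketch = make_sketch("tooth_scan_0001.png")              # hypothetical file name
cv2.imwrite("tooth_sketch_0001.png", sketch)
```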
Generating various virtual tooth images using Pix2pix
Figure 3 shows the process of applying pix2pix to a real tooth image to generate a virtual image.
Fig. 3.
Pix2pix implementation process.
First, we generated sketch data to apply pix2pix to the existing images of real teeth. Pix2pix is based on the GAN architecture, which consists of a generator and a discriminator. In this study, we experimented with the GAN, LSGAN, and WGAN losses to obtain better results; in this section, the original GAN formulation is used for simplicity.
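As a rough illustration of the three adversarial losses compared in this study, the sketch below writes each discriminator and generator criterion over raw discriminator outputs. It is a simplification under stated assumptions: WGAN weight clipping (or a gradient penalty) and all conditioning details are omitted.

```python
# Illustrative sketch of the GAN, LSGAN, and WGAN adversarial criteria used as
# alternative losses for pix2pix; not the exact training code of this study.
import torch
import torch.nn.functional as F

def d_loss(real_logits, fake_logits, kind="lsgan"):
    if kind == "gan":    # original GAN: binary cross-entropy on logits
        return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) +
                F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    if kind == "lsgan":  # LSGAN: least-squares targets 1 (real) and 0 (fake)
        return (real_logits - 1).pow(2).mean() + fake_logits.pow(2).mean()
    if kind == "wgan":   # WGAN critic: maximize the score gap (clipping/GP omitted here)
        return fake_logits.mean() - real_logits.mean()
    raise ValueError(kind)

def g_loss(fake_logits, kind="lsgan"):
    if kind == "gan":
        return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    if kind == "lsgan":
        return (fake_logits - 1).pow(2).mean()
    if kind == "wgan":
        return -fake_logits.mean()
    raise ValueError(kind)
```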
The overall pix2pix process is shown in Fig. 3. Pix2pix is composed of a generator and discriminator, similar to a conventional GAN. The generator learns to render the discriminator unable to distinguish between real and fake images, whereas the discriminator learns to distinguish fake images well. Pix2pix is based on the CGAN but differs from the existing model in that it includes conditional information $x$. The loss of the CGAN for generator $G$ and discriminator $D$ is expressed as Eq. (1) below, where $x$ represents the conditional information, $z$ represents random noise, and $y$ represents the target label:

$$\mathcal{L}_{cGAN}(G,D)=\mathbb{E}_{x,y}\big[\log D(x,y)\big]+\mathbb{E}_{x,z}\big[\log\big(1-D(x,G(x,z))\big)\big] \tag{1}$$
In our previous study, we applied dropout instead of the random noise $z$ and explained that mixing in a traditional loss is more effective. Because L1 shows less blurred results than L2, the final loss for pix2pix is given in Eq. (2):

$$G^{*}=\arg\min_{G}\max_{D}\,\mathcal{L}_{cGAN}(G,D)+\lambda\,\mathcal{L}_{L1}(G),\qquad \mathcal{L}_{L1}(G)=\mathbb{E}_{x,y,z}\big[\lVert y-G(x,z)\rVert_{1}\big] \tag{2}$$
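A minimal sketch of how the generator objective of Eq. (2) can be assembled in PyTorch follows. The weighting lam=100 follows the original pix2pix paper and is an assumption here, as are the tensor layout and the practice of concatenating the condition with the image before the discriminator.

```python
# Sketch of the pix2pix generator objective in Eq. (2): cGAN term + lambda * L1.
# lam = 100 and the conditional discriminator D(condition, image) are assumptions.
import torch
import torch.nn.functional as F

def generator_objective(D, G, sketch, real, lam=100.0):
    fake = G(sketch)                              # virtual tooth image from the sketch
    pred = D(torch.cat([sketch, fake], dim=1))    # discriminator sees (condition, image)
    adv = F.binary_cross_entropy_with_logits(pred, torch.ones_like(pred))
    l1 = F.l1_loss(fake, real)                    # L1 term of Eq. (2)
    return adv + lam * l1
```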
Image inpainting of missing tooth areas using CR-fill
In this step, a single specified tooth is removed both from the virtual tooth images generated using pix2pix and from the existing data, and the removed tooth is filled in by image inpainting. CR-Fill replaces the CA layer of the DeepFillv2 structure with the CR loss. As shown in Fig. 4, the CR loss is not involved in image generation and is used only to train the generator.
Fig. 4.
Generator with CR loss.
The CR loss consists of an auxiliary encoder-decoder and a similarity encoder. The L1 and adversarial losses of the auxiliary result, which is assembled from image patches of known regions, are minimized so that the network can find the optimal reference region. The similarity encoder encodes the similarity between image patches, computed as cosine similarity. Patch replacement is performed on the features extracted by the auxiliary encoder according to the similarities from the similarity encoder, which can be viewed as a similarity filter over all patches in the missing regions. Following patch replacement, the features are passed through the auxiliary decoder to produce the auxiliary image, which fills in the missing regions.
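The following is a minimal sketch of the cosine-similarity patch matching idea behind the similarity encoder. The patch size, the pooling-based handling of the mask, and all tensor shapes are illustrative assumptions, not CR-Fill's actual implementation.

```python
# Cosine-similarity patch matching: for every patch, find the most similar patch
# from the known (unmasked) region. Shapes and patch size are illustrative.
import torch
import torch.nn.functional as F

def best_reference_patches(feat, mask, patch=4):
    # feat: (1, C, H, W) feature map; mask: (1, 1, H, W) float, 1 inside the missing region.
    patches = F.unfold(feat, kernel_size=patch, stride=patch)   # (1, C*p*p, N)
    patches = F.normalize(patches, dim=1)                       # unit-norm patch vectors
    sim = torch.bmm(patches.transpose(1, 2), patches)           # (1, N, N) cosine similarities
    miss = F.max_pool2d(mask, kernel_size=patch, stride=patch).flatten(1) > 0.5
    known = ~miss                                                # candidate reference patches
    sim = sim.masked_fill(~known.unsqueeze(1), float("-inf"))    # only match against known patches
    return sim.argmax(dim=-1)   # (1, N): index of the most similar known patch for each patch
```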
The CR-Fill generator is a convolutional encoder-decoder network composed of a coarse network and a refinement network. The coarse network takes the incomplete image and the mask of the missing area as input and produces a coarse prediction, which is then passed to the refinement network to generate the final inpainting result. For the discriminator, we used the SN-PatchGAN discriminator, which applies spectral normalization to PatchGAN. The discriminator loss is expressed by Eq. (3), where $D$ denotes the discriminator, $G$ denotes the generator, $y$ denotes the actual image, $x$ represents the incomplete image, and $m$ represents the mask of the missing region:

$$\mathcal{L}_{D}=\mathbb{E}_{y}\big[\mathrm{ReLU}\big(1-D(y)\big)\big]+\mathbb{E}_{x,m}\big[\mathrm{ReLU}\big(1+D\big(G(x,m)\big)\big)\big] \tag{3}$$
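An illustrative sketch of the hinge losses implied by Eq. (3) and the matching generator term, written over raw patch-level discriminator outputs; it is not the CR-Fill reference code.

```python
# Hinge losses for an SN-PatchGAN-style discriminator (Eq. 3) and its generator
# counterpart. d_real / d_fake are raw patch-level discriminator outputs.
import torch
import torch.nn.functional as F

def sn_patchgan_d_loss(d_real, d_fake):
    # Eq. (3): ReLU hinge on real images and on generated (inpainted) images.
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def sn_patchgan_g_loss(d_fake):
    # The generator tries to raise the discriminator score on inpainted images.
    return -d_fake.mean()
```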
Experiment and results
In this section, experiments were conducted to verify whether the virtual tooth images generated by the proposed method can replace real images. We also examined whether the empty tooth areas in the resulting images were generated well, reflecting the features of the corresponding teeth.
All experiments were conducted in accordance with relevant guidelines and regulations to ensure ethical procedures. Furthermore, informed consent was obtained from all subjects and/or their legal guardians. Subjects were adequately informed about the research purpose, procedures, and potential risks before providing consent.
Data sets and evaluation measures
To train pix2pix, 121 real tooth images with dimensions of 512 × 512 were used. To train the CR-Fill, 80 real tooth images were used that were completely different from the pix2pix images.
The generated images were evaluated to demonstrate the performance of the proposed method. Frechet inception distance (FID), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM) were used to evaluate the model. First, the FID is one of the most popular metrics currently used to compare images produced by a generative model with real images. It measures the similarity between the distributions of the generated and original images, which are modeled as two Gaussian distributions: $\mathcal{N}(\mu_{g},\Sigma_{g})$ for the generated images and $\mathcal{N}(\mu_{r},\Sigma_{r})$ for the actual images. The FID is the distance between these two distributions, where $\mathrm{Tr}$ is the trace, i.e., the sum of the main diagonal elements of a matrix. The lower the score, the more similar the two distributions are, indicating that the model performs well. The FID score is expressed by Eq. (4):

$$\mathrm{FID}=\lVert \mu_{g}-\mu_{r}\rVert^{2}+\mathrm{Tr}\big(\Sigma_{g}+\Sigma_{r}-2(\Sigma_{g}\Sigma_{r})^{1/2}\big) \tag{4}$$
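As a sketch of how Eq. (4) is evaluated in practice, the function below computes the FID from Inception activation statistics. Feature extraction is omitted; the arrays are assumed to be (num_images, feature_dim) activations from the same Inception layer.

```python
# FID of Eq. (4) from activation statistics (means and covariances).
import numpy as np
from scipy.linalg import sqrtm

def fid(acts_generated, acts_real):
    mu_g, mu_r = acts_generated.mean(axis=0), acts_real.mean(axis=0)
    sigma_g = np.cov(acts_generated, rowvar=False)
    sigma_r = np.cov(acts_real, rowvar=False)
    covmean = sqrtm(sigma_g @ sigma_r)     # matrix square root of the covariance product
    if np.iscomplexobj(covmean):           # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_g - mu_r) ** 2) + np.trace(sigma_g + sigma_r - 2 * covmean))
```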
The PSNR is a metric used to evaluate the amount of quality loss in an image, with higher values indicating less loss and better quality. The mean square error (MSE) measures the pixel-wise error between two images $I$ and $K$ of size $m \times n$, and the PSNR places the MSE in the denominator, where $\mathrm{MAX}_{I}$ is the maximum possible pixel value of the image. These are expressed in Eqs. (5) and (6):

$$\mathrm{MSE}=\frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\big[I(i,j)-K(i,j)\big]^{2} \tag{5}$$

$$\mathrm{PSNR}=10\log_{10}\!\left(\frac{\mathrm{MAX}_{I}^{2}}{\mathrm{MSE}}\right) \tag{6}$$
SSIM evaluates the correlation between two images in terms of luminance, contrast, and structure. This is expressed in Eq. (7), where $x$ and $y$ are the images to be compared; $l(x,y)$, $c(x,y)$, and $s(x,y)$ are the luminance, contrast, and structure comparison functions; and $\alpha$, $\beta$, and $\gamma$ are weights representing the importance of luminance, contrast, and structure, respectively. The SSIM ranges from 0 to 1, with higher values indicating greater similarity between the two images.

$$\mathrm{SSIM}(x,y)=\big[l(x,y)\big]^{\alpha}\big[c(x,y)\big]^{\beta}\big[s(x,y)\big]^{\gamma} \tag{7}$$
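For reference, PSNR and SSIM can be computed directly with scikit-image, as in the sketch below; the file names are placeholders for a real/generated image pair, and data_range=255 assumes 8-bit images.

```python
# PSNR (Eq. 6) and SSIM (Eq. 7) for a pair of tooth images using scikit-image.
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

real = cv2.imread("real_tooth.png")       # placeholder file names
fake = cv2.imread("virtual_tooth.png")

psnr = peak_signal_noise_ratio(real, fake, data_range=255)
ssim = structural_similarity(real, fake, channel_axis=-1, data_range=255)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```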
Results of the experiments
The FID, PSNR, and SSIM were measured to ensure that the virtual image was generated by reflecting the features of each tooth without losing its shape. In the proposed method, the pix2pix output serves as the training images for CR-Fill; therefore, the pix2pix results must be of high quality for CR-Fill to perform well, and we first examined the pix2pix results for each GAN variant. The models used in the experiment were the GAN, LSGAN, and WGAN30,31. In addition, to demonstrate the effectiveness of virtual tooth images, we trained CR-Fill on datasets consisting only of real tooth images and on datasets that also contain virtual tooth images.
Experiment 1: performance variation of pix2pix based on various GANs
Figure 5 shows a visual comparison of the images produced by training pix2pix with the GAN, LSGAN, and WGAN. It shows an input image, a sketch of that image, and the images generated from the sketch by each of the three models. In the case of the GAN, the image appears visually less complete, with unfilled pixels in the middle. The WGAN produced noise at the edges of the tooth image, which appeared less detailed than the other images. In contrast, the LSGAN produced a smoother image with little noise and few empty pixels. However, we found that the WGAN and GAN were better at representing detailed features, such as pits and fissures.
Fig. 5.
Compare pix2pix visual results based on GANs.
Table 1 lists the quantitative evaluation results of training pix2pix with the GAN, LSGAN, and WGAN. Bold text indicates the best value for each measure. The FID, PSNR, and SSIM metrics show that the LSGAN performs best, whereas the WGAN performs poorly. The LSGAN loss is continuous and differentiable over its range, which helps it produce smoother images with relatively less noise; this appears to explain its better quantitative performance.
Table 1.
Quantitative evaluation results of pix2pix based on GANs.
| Method | FID | PSNR | SSIM |
|---|---|---|---|
| GAN | 34.82 | 16.57 | 0.728 |
| LSGAN | 32.14 | 31.59 | 0.940 |
| WGAN | 47.80 | 15.85 | 0.692 |
Pix2pix’s visual and quantitative evaluations show that the LSGAN produces visually superior results, with smooth image representations and little noise or few empty pixels, and it also performs well in the quantitative evaluation. Based on these results, we consider the LSGAN-generated images suitable candidates as training data for image inpainting.
Experiment 2: performance variation of CR-fill according to dataset
To verify the usefulness of the virtual images, we trained CR-Fill using various training data configurations. The mandibular right first molar was selected from among the 32 teeth as the virtual tooth to be generated, and it was removed from the images generated by pix2pix prior to the inpainting process. It was chosen as the sample for this study because it has a large occlusal surface and a higher proportion of prosthetics than other teeth32.
The virtual teeth used here are pix2pix result images generated with the LSGAN loss, which scored highest in the quantitative evaluation, trained for 8,000 epochs. In addition, the real tooth images used for CR-Fill and pix2pix were completely different in terms of the angle and shape of the teeth.
The training data were organized as follows, using different ratios of real to virtual images (a code sketch of assembling these splits follows the list):
When training only on real tooth images
When training on an equal amount of real and virtual tooth images (1:1)
When training on a larger amount of real tooth images compared to virtual tooth images (2:1)
When training only on virtual tooth images
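A minimal sketch of how these four splits could be assembled from file lists; the directory names, the fixed seed, and the assumption that enough virtual images exist for the sampled subsets are all illustrative.

```python
# Assembling the four CR-Fill training splits from real and pix2pix-generated
# virtual images. Directory names and the fixed seed are placeholders.
import random
from pathlib import Path

real = sorted(Path("data/real").glob("*.png"))
virtual = sorted(Path("data/virtual").glob("*.png"))
random.seed(0)

datasets = {
    "(a) real only":        list(real),
    "(b) real:virtual 1:1": list(real) + random.sample(virtual, len(real)),
    "(c) real:virtual 2:1": list(real) + random.sample(virtual, len(real) // 2),
    "(d) virtual only":     list(virtual),
}
for name, files in datasets.items():
    print(name, len(files), "training images")
```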
Figure 6 shows the results of inpainting the missing tooth region with CR-Fill for each dataset. Visual inspection revealed that the overall tooth shape was well represented. Figure 6b shows the details of the fissure well; (a) and (c) are not as good as (b) but are similar, whereas in (d) it is difficult to tell whether the region is a tooth at all, and not only because of its color. Table 2 lists the quantitative evaluation of the CR-Fill results using the FID, PSNR, and SSIM for the different dataset configurations. Compared with (a), (b) achieved a better FID, a similar SSIM, and a PSNR approximately 0.2 lower, indicating that the virtual images had a somewhat positive impact. (c) showed lower results overall, although not by a large margin relative to (a), and (d) was worse than (a) by a large margin. The poor FID indicates that the difference between the two image distributions was large, whereas the differences in PSNR and SSIM were not significant because these metrics measure image quality and structural similarity between individual images. Therefore, except for (b), the results were similar to the real images in quality and structure, but their distributions differed considerably from that of the real images. According to these results, it is difficult to expect satisfactory performance when building a dataset from virtual images only; however, relying on real images alone also performed worse than combining them with virtual images. This indicates that adding virtual images helps the model learn the teeth better than using real images alone, showing that virtual images capture useful features of missing teeth and can be utilized when mixed with real images in an appropriate proportion.
Fig. 6.
Visual comparison of CR-Fill results for the dataset.
Table 2.
Quantitative evaluation results of CR-Fill.
| Method | FID | PSNR | SSIM |
|---|---|---|---|
| (a) Real image data | 6.066 | 35.31 | 0.9874 |
| (b) Real + virtual image data (1:1) | 5.848 | 35.13 | 0.9874 |
| (c) Real + virtual image data (2:1) | 6.492 | 34.88 | 0.9868 |
| (d) Virtual image data | 8.087 | 33.03 | 0.9872 |
FID Frechet inception distance, PSNR Peak signal-to-noise ratio, SSIM Structural similarity index measure.
Expert evaluation of CR-fill result images
The virtual tooth generation study was conducted to evaluate whether the generated images could replace actual images and to assess their usability. In addition to the FID score, the generated virtual teeth were evaluated by dental morphology experts for their clinical utility. Expert evaluations were scored from 1 to 5 based on how closely the tooth shape matched actual images. Figure 7 depicts the materials distributed to the experts for evaluation. The evaluation included criteria such as the appearance of the occlusal area, the number of cusps, and the positions of grooves and pits, with reference to prior work33.
Fig. 7.
Evaluation sheet images for expert assessment.
Expert evaluations comparing the four groups (experiment 2: (a), (b), (c), (d)) against actual images showed that the (b) group yielded the most similar tooth shapes. When examining the occlusal surface of the mandibular first molars, most virtual teeth displayed similar occlusal appearances, but differences were observed in the sharpness of the main grooves and the number of cusps. In particular, the main grooves and cusps were more distinct in the (b) group, whereas the virtual teeth generated solely from virtual data showed only faint traces of them.
These results suggest the meaningfulness of using virtually generated tooth images for training. However, additional research is needed regarding potential differences in results based on the proportion of real to virtual teeth.
Conclusion and future works
In this study, we proposed a method for generating virtual tooth images to supplement data restricted by privacy protection in the dental field. The proposed method uses GAN-based pix2pix and CR-Fill to generate virtual tooth images. First, we examined the pix2pix results experimentally and observed that the GAN produced unfilled pixels, whereas the WGAN showed noise at the image edges. The LSGAN outperformed the other methods in both visual and quantitative evaluations; therefore, we used its resulting images as training data for image inpainting. Despite being trained with less real data, the image inpainting models trained with the LSGAN-generated images produced good results, demonstrating the feasibility of using virtual tooth images. While it is difficult to use only virtual images as training data, we found that using virtual images in appropriate proportions helps the model learn good overall tooth features, and that such models can learn nearly as well as those trained on real images alone.
The reason for the favorable outcome observed when combining real and virtual dental images lies in the presence of damaged teeth, such as erosion or wear, in the actual dental images; when the model was trained exclusively on real dental images, these damaged teeth influenced the results. Additionally, since the virtual images are created by learning the features of the corresponding teeth, combining them produced images with better tooth features than the method trained on real teeth alone.
The results indicate that using real and virtual images in an appropriate ratio for training can result in not only data augmentation but also improved model performance. In addition, it has been shown to be applicable in the medical field by addressing issues caused by data acquisition constraints and supplementing the lack of data in the dental field. In the future, if the shape of the teeth can be accurately reproduced, it may be used in various artificial intelligence fields.
While this study demonstrates promising results, further research is necessary to represent more complex and accurate tooth shapes. In future work, we plan to compare the proposed method with other image-generation techniques and explore approaches for reproducing more sophisticated dental structures. A key limitation of this study is its reliance on 2D images, whereas most modern dental procedures employ 3D patient models for enhanced precision. Moreover, the automatic design of 3D dental prostheses is already possible using digital dental CAD software. Thus, future research should focus on extending this method to 3D models to align more closely with contemporary dental practices and improve the accuracy of prosthesis design. By exploring the models developed in this study and generating virtual tooth images in 3D, we anticipate significant benefits for the field of dentistry, alongside potential applications in other areas of medicine.
Author contributions
All authors contributed to the study conception and design. Methodology, Review and editing of the study, data curation, validation and visualization were performed by S.-Y.J., E.-J.B. and S.-Y.I. Software coding was performed by S.-J.N. and H.-S.J.
Funding
This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1C1C2011105) and the “Regional Innovation Strategy (RIS)” through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (MOE) (2021RIS-004).
Data availability
The data that support the findings of this study are not openly available due to reasons of sensitivity and are available from the corresponding author upon reasonable request.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Shamseddine, L., Mortada, R., Rifai, K. & Chidiac, J. J. Marginal and internal fit of pressed ceramic crowns made from conventional and computer-aided design and computer-aided manufacturing wax patterns: An in vitro comparison. J. Prosthet. Dent.116(2), 242–248. 10.1016/j.prosdent.2015.12.005 (2016). [DOI] [PubMed] [Google Scholar]
- 2.Bessadet, M., Drancourt, N. & El Osta, N. Time efficiency and cost analysis between digital and conventional workflows for the fabrication of fixed dental prostheses: A systematic review. J. Prosthet. Dent.10.1016/j.prosdent.2024.01.003 (2024). [DOI] [PubMed] [Google Scholar]
- 3.Farah, R. F. I. & Alresheedi, B. Evaluation of the marginal and internal fit of CAD/CAM crowns designed using three different dental CAD programs: A 3-dimensional digital analysis study. Clin. Oral Invest.27(1), 263–271. 10.1007/s00784-022-04720-6 (2023). [DOI] [PubMed] [Google Scholar]
- 4.Chen, B., Fu, H., Zhou, K. & Zheng, Y. OrthoAligner: Image-based teeth alignment prediction via latent style manipulation. IEEE Trans. Vis. Comput. Graph.10.1109/TVCG.2022.3166159 (2022). [DOI] [PubMed] [Google Scholar]
- 5.Shen, F. et al. OrthoGAN: High-precision image generation for teeth orthodontic visualization. 10.48550/arXiv.2212.14162 (2022).
- 6.Gu, Z., Wu, Z. & Dai, N. Image generation technology for functional occlusal pits and fissures based on a conditional generative adversarial network. PLoS ONE18(9), e0291728. 10.1371/journal.pone.0291728 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Chau, R. C. W. et al. Artificial intelligence-designed single molar dental prostheses: A protocol of prospective experimental study. PLoS ONE17(6), e0268535. 10.1371/journal.pone.0268535 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Han, C. et al. GAN-based synthetic brain MR image generation. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018) 734–738. 10.1109/ISBI.2018.8363678 (2018)
- 9.Mariani, G., Scheidegger, F., Istrate, R., Bekas, C., Malossi, C. Bagan: Data augmentation with balancing GAN. 10.48550/arXiv.1803.09655 (2018).
- 10.VanDenOord, A., Kalchbrenner, N. & Kavukcuoglu, K. Pixel recurrent neural networks. In International Conference on Machine Learning PMLR 1747–1756 (2016).
- 11.Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. 10.48550/arXiv.1312.6114 (2013).
- 12.Goodfellow, I., et al. Generative adversarial nets. Advances in neural information processing systems27. 10.48550/arXiv.1406.2661 (2014).
- 13.Mao, X. et al. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision 2794–2802. 10.1109/ICCV.2017.304 (2017).
- 14.Arjovsky, M., Chintala, S., Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning PMLR vol. 70, 214–223. 10.48550/arXiv.1701.07875 (2017).
- 15.Isola, P., Zhu, J. Y., Zhou, T. & Efros, A. A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 1125–1134. 10.1109/CVPR.2017.632. (2017)
- 16.Mirza, M. & Osindero, S. Conditional generative adversarial nets. 10.48550/arXiv.1411.1784 (2014).
- 17.Agarwal, C., Bhatnagar, C. & Mishra, A. Evaluation of image inpainting methods for face reconstruction of masked faces. In 2023 International Conference on Electrical, Communication and Computer Engineering (ICECCE) 1–6. 10.1109/ICECCE61019.2023.10442807 (2023).
- 18.Li, X. et al. Leveraging inpainting for single-image shadow removal. In Proceedings of the IEEE/CVF International Conference on Computer Vision 13055–13064. 10.48550/arXiv.2302.05361 (2023).
- 19.Bertalmio, M., et al. Image inpainting. In Proceedings of the 27th Annual Conference on COMPUTER GRAPHICS and Interactive Techniques 417–424. 10.1145/344779.344972 (2000).
- 20.Zheng, C., Cham, T. J. & Cai, J. Pluralistic image completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 1438–1447. 10.1109/CVPR.2019.00153. (2019)
- 21.Zhao, L. et al. Uctgan: Diverse image inpainting based on unsupervised cross-space translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 5741–5750. 10.1109/CVPR42600.2020.00578. (2020)
- 22.Zhao, S., Song, J. & Ermon, S. Towards deeper understanding of variational autoencoding models. 10.48550/arXiv.1702.08658 (2017).
- 23.Saharia, C. et al. Palette: Image-to-image diffusion models. In ACM SIGGRAPH 2022 conference proceedings 1–10. 10.1145/3528233.3530757 (2022).
- 24.Lugmayr, A. et al. Repaint: Inpainting using denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 11461–11471. 10.48550/arXiv.2201.09865 (2022).
- 25.Demir, U. & Unal, G. Patch-based image inpainting with generative adversarial networks. 10.48550/arXiv.1803.07422 (2018).
- 26.Zeng, Y., Lin, Z., Lu, H. & Patel, V. M. Cr-fill: Generative image inpainting with auxiliary contextual reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision 14164–14173. 10.1109/ICCV48922.2021.01390 (2021).
- 27.Yu, J., et al. Free-form image inpainting with gated convolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision 4471–4480. 10.1109/ICCV.2019.00457 (2019).
- 28.Elsayed, A. et al. Oral dental diagnosis using deep learning techniques: A review. In Annual Conference on Medical Image Understanding and Analysis 814–832. 10.1007/978-3-031-12053-4_60 (2022).
- 29.Schwendicke, F., Golla, T., Dreher, M. & Krois, J. Convolutional neural networks for dental image diagnostics: A scoping review. J. Dent.91, 103226. 10.1016/j.jdent.2019.103226 (2019). [DOI] [PubMed] [Google Scholar]
- 30.Wang, Z., She, Q. & Ward, T. E. Generative adversarial networks in computer vision: A survey and taxonomy. ACM Comput. Surv. (CSUR)54(2), 1–38. 10.1145/3439723 (2021). [Google Scholar]
- 31.Lucic, M., Kurach, K., Michalski, M., Gelly, S. & Bousquet, O. Are GANs created equal? A large-scale study. Adv. Neural. Inf. Process. Syst.31. 10.48550/arXiv.1711.10337 (2018).
- 32.Kikuchi, H., Hasegawa, Y. & Kageyama, I. The relationship of tooth crown dimensions between first molar and central incisor in maxilla. Odontology111(4), 1003–1008. 10.1007/s10266-023-00795-z (2023). [DOI] [PubMed] [Google Scholar]
- 33.Bae, E. J., Jeong, J. H., Son, Y. S. & Lim, J. Y. A study on virtual tooth image generation using deep learning - based on the number of learning. J. Technologic Dentistry42(1), 1. 10.14347/kadt.2020.42.1.1 (2020).