Author manuscript; available in PMC 2017 May 30. Published in final edited form as: CEA '13, Oct. 2013, pp. 75–80. doi: 10.1145/2506023.2506037

Image-Based Food Volume Estimation

Chang Xu 1, Ye He 2, Albert Parra 3, Edward Delp 4, Nitin Khanna 5, Carol Boushey 6
PMCID: PMC5448987  NIHMSID: NIHMS823612  PMID: 28573255

Abstract

In this paper, we propose an extension to our previous work on food portion size estimation using a single image, together with a multi-view volume estimation method. The single-view technique estimates food volume using prior information (segmentation and food labels) generated by the food identification methods we described earlier. For multi-view volume estimation, we use "Shape from Silhouettes" to estimate the food portion size. Experimental results demonstrate the accuracy and reliability of our volume estimation methods.

Keywords: dietary assessment, portion estimation, 3D reconstruction, pose estimation, mobile application

INTRODUCTION

There is a growing concern about chronic diseases and other health problems related to diet, including hypertension, obesity, heart disease, and cancer. The need for accurate methods and tools to measure food and nutrient intake becomes imperative for epidemiological and clinical research linking diet and disease. Dietary assessment, the process of determining what someone eats during the course of a day, provides valuable insights into addressing these problems. Our studies have shown that the use of mobile telephones can improve the accuracy and reliability of dietary assessment [3, 21]. We are developing an image analysis-based system for dietary assessment, known as the mobile telephone food record (mpFR), to automatically identify and quantify foods and beverages consumed at an eating occasion from images of food acquired using a mobile device [27, 1, 7].

Classifying foods in an image poses unique challenges because of the large visual similarity between food classes such as brownie and chocolate cake, or margarine and butter. In addition, foods are non-rigid objects that can deform in many ways, and consequently there is also large variation within classes, such as scrambled eggs and boiled eggs, or green grapes and red grapes. Appearance variations may also arise from changes in illumination and viewpoint. One difficult aspect of food image analysis is estimating the volume of the food present in the image. Estimating the volume is important because without it one cannot determine the nutrient content of the food in the image (e.g. the number of calories in a food item). This paper concentrates on volume estimation and assumes that the food item has been properly segmented and classified. We have described several methods for food image segmentation and classification in [27, 2, 10, 26, 9]. In this paper, we propose a volume estimation approach based on geometric constraints and a contextual 3D model. We also describe a multi-view volume estimation method using shape from silhouettes to automatically estimate the food portion size.

RELATED WORK

Food portion size estimation is extremely difficult since many foods have large variations in shape and appearance due to eating or food preparation conditions. Most image-based dietary assessment systems use a single image [23, 5], multiple images [14], video [22], or 3D rangefinding [19]. For example, "DietCam" [15] is a mobile application where automatic food intake assessment is based on images acquired from multiple views. It requires users to acquire three images separated by about 120°, which increases user burden. "SapoFitness" [20], a mobile health application for dietary assessment, provides users with a simple way to record their meals. A mobile structured light system (SLS) to measure daily food intake is being developed by Shang et al. [19]. A laser device attached to a mobile telephone is used to capture depth images of the food objects. This system seems burdensome and not suitable for daily use. Jia et al. [12] developed a wearable camera device to collect eating occasion information. It makes use of a plate of known size as the geometric reference. They define several simple geometric shapes to model food shapes, and manual adjustment is required. Chen et al. [5] proposed a 3D/2D model-based image registration method for quantitative food intake assessment. The method uses a global contour to solve for the position, orientation, and scale of the user-selected 3D shape model. It obtains reliable food volume estimates for most simple-model food items. However, it does not have a solution for foods that do not fit a simple model (e.g. banana, pear) or for food items with complex structure (e.g. fries, salad). In addition, it uses only the outline of the object and discards the internal structure (lines, curves, and ridges) of the segments, which can lead to low accuracy in pose registration.

SINGLE VIEW RECONSTRUCTION USING PRIOR KNOWLEDGE

In the early development of our system, we used a shape template method for 3D reconstruction of some food objects [4]. We used the corner points from the segmented image to compute the geometric information for the shape template. However, this method is highly dependent on the accuracy of the segmentation method, and the corner point detection is not robust. Moreover, it fails when the food item has a complex and amorphous shape.

We then proposed a volume estimation method [24] that utilized a 3D graphical model of the food object from multiple training images. First, we create a 3D graphical model during the training step using 3D reconstruction from multiple views. Then, for each food image, we determine the pose and scale of each of the food items according to the camera matrix and food segment.

In this paper, we extend our method in [24] by pre-defining the shape models. Instead of reconstructing the 3D models in the training step, we generate several conventional 3D shape models. Some food items can be represented by these conventional shape models (e.g. cylinder, sphere, and cone). Using this prior knowledge can greatly improve the accuracy of the volume estimation for food objects. For other food items whose shape cannot be approximated by a regular shape model, we use a prismatic model to approximate the shape. A prismatic model is based on the assumption that the horizontal cross-section is the same along the height of the object.

For our model-based method (Figure 1), we designed a shape dictionary consisting of pre-defined or pre-built 3D models. Examples of food items along with their 3D models are shown in Figure 2. The shape dictionary contains conventional shapes such as the sphere, cylinder, box, and frustum. However, food shapes can be quite complex; therefore, we pre-build the 3D models for some food shapes (e.g. bananas and pears) in the dictionary using the method described in [24].

Figure 1. Our single view volume estimation system.

Figure 2. Shape dictionary: food items with their corresponding 3D shapes.

The conventional shape models can be generated without training. For the sphere, a voxel at (x, y, z) belongs to the model if it satisfies Equation 1:

$x^2 + y^2 + z^2 \leq \mathrm{radius}^2$  (1)

A cylinder can be represented by Equation 2:

$\begin{cases} x^2 + y^2 \leq \mathrm{radius}^2 \\ z \leq \mathrm{height} \end{cases}$  (2)

A square box is defined by Equation 3:

$\begin{cases} x_{\min} \leq x \leq x_{\max} \\ y_{\min} \leq y \leq y_{\max} \\ z_{\min} \leq z \leq z_{\max} \end{cases}$  (3)

A slice of a cone is represented by Equation 4:

$x^2 + y^2 \leq \left(\mathrm{bottom\_r} + \dfrac{z}{\mathrm{height}}\,(\mathrm{top\_r} - \mathrm{bottom\_r})\right)^2$  (4)
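These membership tests translate directly into voxel masks. The sketch below (illustrative only, using NumPy; the function names, grid size, and extents are our own choices, not part of the original system) shows how such conventional shape models could be generated and how a model volume follows from the voxel count:

```python
import numpy as np

def voxel_grid(extent, n):
    """Regular n x n x n grid of voxel centers spanning [-extent, extent]^3 (cm)."""
    axis = np.linspace(-extent, extent, n)
    return np.meshgrid(axis, axis, axis, indexing="ij")  # x, y, z coordinate arrays

def sphere(x, y, z, radius):                       # Equation 1
    return x**2 + y**2 + z**2 <= radius**2

def cylinder(x, y, z, radius, height):             # Equation 2 (assuming the base sits at z = 0)
    return (x**2 + y**2 <= radius**2) & (z >= 0) & (z <= height)

def box(x, y, z, xlim, ylim, zlim):                # Equation 3
    return ((xlim[0] <= x) & (x <= xlim[1]) &
            (ylim[0] <= y) & (y <= ylim[1]) &
            (zlim[0] <= z) & (z <= zlim[1]))

def frustum(x, y, z, bottom_r, top_r, height):     # Equation 4 (slice of a cone, base at z = 0)
    r = bottom_r + (z / height) * (top_r - bottom_r)
    return (x**2 + y**2 <= r**2) & (z >= 0) & (z <= height)

# Example: volume of a spherical model = number of inside voxels * voxel volume.
x, y, z = voxel_grid(extent=5.0, n=200)
voxel_volume = (2 * 5.0 / (200 - 1)) ** 3
volume_cc = sphere(x, y, z, radius=4.0).sum() * voxel_volume
```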

In our system, we first need to calibrate the camera. A credit-card-sized colored checkerboard is used as the fiducial marker; it is included in every image as a geometric reference for the scale of the world coordinates and to provide color calibration information [25]. We then estimate the camera pose from the checkerboard and establish the world coordinates. We assume that the food image has been segmented and the food items identified, producing a segmentation mask and a food label [2, 26, 9]. The segmentation mask provides the location of a food item and the food label indicates the food identification. A locator M on the segmentation mask is also used to define the 3D coordinate (x, y, z). The locator is chosen so that it is easy to locate on the 2D image and to back-project into 3D space. For all the regular shape models, the locator is defined as the lowest point of the food segment. To make the position of the locator more robust, we draw a vertical line through the centroid of the mask; the lowest point lying on both this line and the food segment is the locator M, as shown in Figure 2. The locator M for the prismatic model or an irregular shape model is simply defined as the centroid of the segmentation mask.

In most circumstances, this point on the food object will be close to the tablecloth or eating surface. Therefore, the 3D coordinate of this point is easy to obtain using Equation 5:

$s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[R \mid T] \begin{bmatrix} X \\ Y \\ Z = 0 \\ 1 \end{bmatrix}$  (5)

where s is a scale factor, (u, v) is the pixel location in the image, and K, R, T are, respectively, the intrinsic camera matrix, the rotation matrix, and the translation vector.
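Because the locator M is assumed to lie on the Z = 0 table plane, Equation 5 reduces to a plane homography H = K [r1 r2 t] (r1, r2 being the first two columns of R) that can be inverted directly. A minimal sketch of this back-projection, assuming OpenCV conventions and a pose (rvec, tvec) already estimated from the checkerboard; the function name is ours:

```python
import numpy as np
import cv2

def backproject_to_plane(u, v, K, rvec, tvec):
    """Back-project pixel (u, v) to the Z = 0 world plane (Equation 5)."""
    R, _ = cv2.Rodrigues(rvec)                    # rotation vector -> 3x3 rotation matrix
    # Plane-induced homography: only the first two columns of R and the translation matter.
    H = K @ np.column_stack((R[:, 0], R[:, 1], np.asarray(tvec).reshape(3)))
    XY1 = np.linalg.inv(H) @ np.array([u, v, 1.0])
    XY1 /= XY1[2]                                 # remove the projective scale factor s
    return np.array([XY1[0], XY1[1], 0.0])        # world coordinates (X, Y, Z = 0) of M
```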

The food label is used to indicate which shape model to use. Since the food model is either pre-built or pre-defined, we can convert the 3D reconstruction problem into an optimization problem. The object has 9 degrees of freedom (DOF):

$W = (X, Y, Z, \Theta_X, \Theta_Y, \Theta_Z, s_x, s_y, s_z)^T$  (6)

Equation 6 consists of the object translation along the three coordinate axes, the three rotation angles about the axes, and three relative scale parameters. The best-matching pose and scale parameters of the food item can be found using Equation 7:

$W = \arg\max_{W} \big(\mathrm{similarity\_measure}(I_{\mathrm{seg}}, I_{\mathrm{project}}^{W})\big)$  (7)

where $I_{\mathrm{seg}}$ is the segment of the food item and $I_{\mathrm{project}}^{W}$ is the projected image for a given set of pose and scale parameters. The similarity_measure is a function indicating the similarity between two images. For conventional shapes, we use the XOR binary operation as the similarity measure, since we only have the silhouette of the projected object. For reconstructed models [24], we use a normalized cross-correlation function, since a textured projection image is available in that case. The 3D locator M can be used to determine X, Y, Z. The number of degrees of freedom of the pose for different foods depends on the 3D shape model. We use several geometric constraints and food placement constraints (e.g. a banana will not be standing on its long axis) to solve the pose registration problem [24]. Once the pose of a food item is determined, we are able to estimate the volume of the food.
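To make the search in Equation 7 concrete, the sketch below projects a (convex) shape model under a candidate pose and scale, rasterizes its silhouette, and scores it with an XOR-based measure. It is illustrative only: it searches a single scale parameter, whereas the full problem covers up to the 9 DOF of Equation 6 under the constraints of [24], and the function names are ours.

```python
import numpy as np
import cv2

def project_silhouette(model_pts, scale, rvec, tvec, K, image_shape):
    """Rasterize the silhouette of a convex 3D model (Nx3 points) under a candidate pose/scale."""
    pts, _ = cv2.projectPoints((model_pts * scale).astype(np.float32), rvec, tvec, K, None)
    hull = cv2.convexHull(pts.reshape(-1, 2).astype(np.int32))
    sil = np.zeros(image_shape, dtype=np.uint8)
    cv2.fillConvexPoly(sil, hull, 1)              # filled silhouette of the projected model
    return sil

def xor_similarity(seg_mask, proj_mask):
    """Higher is better: negative count of pixels where the two silhouettes disagree (XOR)."""
    return -np.logical_xor(seg_mask > 0, proj_mask > 0).sum()

def best_scale(model_pts, seg_mask, rvec, tvec, K, scales):
    """Toy 1-DOF search over scale; the full search covers the 9 DOF of Equation 6."""
    scores = [xor_similarity(seg_mask,
                             project_silhouette(model_pts, s, rvec, tvec, K, seg_mask.shape))
              for s in scales]
    return scales[int(np.argmax(scores))]
```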

For the prism approximation technique, the base area is computed by rectifying the segmentation mask using the 2D homography matrix obtained from the checkerboard corners. Twelve corner points are detected on the checkerboard and labeled $p_{\mathrm{src},1}, p_{\mathrm{src},2}, \ldots, p_{\mathrm{src},12}$ in the following order: top to bottom, left to right. We define the 3 × 3 projective transformation matrix H as a mapping from the corner points in the original image to the rectified image:

$p_{\mathrm{dst}} = H\, p_{\mathrm{src}}$  (8)

where $p_{\mathrm{dst}}$ is the corresponding corner point in the rectified image. The homography matrix H has nine elements, but only their ratios are significant, so the transformation is specified by 8 parameters. We use the Direct Linear Transformation (DLT) method [8] to estimate H. The area of the rectified segment is then used as the base of the prismatic shape. We choose an average height for each food type and multiply it by the base area to estimate the food volume.
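A hedged sketch of the prism approximation, assuming OpenCV 4: the homography is estimated from the 12 corner correspondences (OpenCV's estimator is DLT-based, as in [8]), the mask contour is mapped to the metric table plane, and the rectified base area is multiplied by an average height. The metric corner layout and the average height are inputs the caller must supply; the function name is ours.

```python
import numpy as np
import cv2

def prism_volume(seg_mask, src_corners, dst_corners_cm, avg_height_cm):
    """Prism approximation: rectified base area times an average height per food type.

    src_corners    : 12x2 detected checkerboard corners in the image (pixels).
    dst_corners_cm : 12x2 corresponding corner positions on the table plane (cm).
    """
    # Homography from the image plane to the metric table plane.
    H, _ = cv2.findHomography(src_corners.astype(np.float32),
                              dst_corners_cm.astype(np.float32))
    # Map the outer contour of the segmentation mask into the rectified plane.
    contours, _ = cv2.findContours(seg_mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).astype(np.float32)
    rectified = cv2.perspectiveTransform(contour.reshape(-1, 1, 2), H)
    base_area_cm2 = cv2.contourArea(rectified)
    return base_area_cm2 * avg_height_cm          # volume in cubic centimeters
```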

Once the volume is estimated, the nutrient content of the food is obtained using the density for that particular category of food [13].
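As a small illustration of this conversion (densities taken from Table 1; the lookup table and function name are ours, and in practice the densities come from [13]):

```python
# Apparent densities in g/cc, taken from Table 1.
APPARENT_DENSITY = {"2% milk": 0.973, "orange juice": 1.011, "toast": 0.276}

def estimated_weight_g(volume_cc, food_label):
    """Weight (g) = estimated volume (cc) x apparent density (g/cc) of the identified food."""
    return volume_cc * APPARENT_DENSITY[food_label]
```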

VOLUME ESTIMATION USING SHAPE FROM SILHOUETTES

We have also developed a variation of a multi-view shape recovery method, space carving [16], also known as Shape from Silhouettes. This method attempts to reconstruct a 3D model from a set of contours that outline the projection of an object onto a sequence of 2D image planes. The perspective projection model used in this section is given by Equation 9:

$s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & C_x \\ 0 & f_y & C_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$  (9)

The ideal image acquisition for space carving is a set of images of the object from various view angles, such as images acquired with a turntable device (see Figure 3). The capture step begins by acquiring a video of the food objects. Empirical evidence indicates that we obtain the best results using 14 to 20 frames from the video sequence.
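A minimal sketch of this frame-sampling step, assuming OpenCV video I/O; the target frame count follows the empirical observation above, and the function name is ours:

```python
import cv2

def sample_frames(video_path, n_frames=16):
    """Grab n_frames roughly evenly spaced frames from the captured video."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in range(0, total, max(1, total // n_frames)):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)     # seek to the sampled frame index
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
        if len(frames) == n_frames:
            break
    cap.release()
    return frames
```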

Figure 3. Training angles for a food object and the reconstructed 3D model.

The intrinsic matrix K and the extrinsic camera parameters R, T are determined for each image. In our case, in order to calibrate the images, each image needs to include the checkerboard in the scene. Since the same camera is used to capture the sequence of images for the multi-view method, the intrinsic matrix K is identical for each view and can be determined using the camera calibration procedure presented in [8]. After detecting the corners on the checkerboard, the pose of the camera, i.e. the rotation vector R and the translation vector T, can be found by minimizing the reprojection error. The reprojection error is the image distance between the 3D corners projected using the intrinsic matrix K and the extrinsic parameters R, T, and the locations of the detected corners in the image [8]. After determining the camera parameters, each camera image is converted to a binary image using the segmentation mask, which indicates the object silhouette with "1" for object pixels and "0" for background. This method requires accurate segmentation, so we use morphological operators to clean up the boundary and remove small holes in the segmentation mask.
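A hedged sketch of the per-frame pose estimation and mask clean-up, assuming OpenCV; the 12-corner layout mirrors the fiducial marker described earlier, but the board dimensions, square size, and function names below are placeholders of ours:

```python
import numpy as np
import cv2

BOARD = (4, 3)            # placeholder inner-corner layout of the fiducial marker (12 corners)
SQUARE_CM = 1.0           # placeholder square size in cm

# Known 3D corner positions on the Z = 0 checkerboard plane.
OBJ_PTS = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
OBJ_PTS[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE_CM

def camera_pose(gray, K, dist=None):
    """Estimate the rotation vector R and translation T for one frame from the checkerboard."""
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if not found:
        return None
    ok, rvec, tvec = cv2.solvePnP(OBJ_PTS, corners, K, dist)   # minimizes reprojection error
    return (rvec, tvec) if ok else None

def clean_mask(mask, kernel_size=5):
    """Morphological closing then opening to fill small holes and smooth the silhouette boundary."""
    k = np.ones((kernel_size, kernel_size), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, k)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, k)
```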

There are mainly two types of 3D representations: volumetric models and 3D surface grids [6, 11]. We use the volumetric representation of a 3D object. The bounding box of the cleaned object masks obtained from the segmentation masks is then back-projected into 3D world coordinates using the camera parameters (camera matrix) and Equation 9. In Equation 9, (X, Y, Z) are the coordinates of a 3D point M in world coordinates, and (u, v) are the coordinates of its 2D projection m in pixels of the image coordinate system. $C_x$, $C_y$ are the coordinates of the principal point (which we place at the center of the image), and $f_x$, $f_y$ are the focal lengths in units of pixels. $r_{ij}$ and $t_i$ are the rotation and translation (extrinsic) parameters that relate the world coordinate system to the camera image coordinates. This equation is used to back-project a 2D image pixel to its corresponding 3D world point.

Based on the 3D bounding box, we fill a 3D grid of voxels for carving. The next step is to back-project the silhouettes onto the voxel grid one by one. Each silhouette bounds the object and carves away any voxels that fall outside the reprojected mask. As the number of silhouettes increases, the 3D boundary of the object becomes tighter. After carving away every voxel that does not belong to the 3D object model, we estimate the volume of the object by multiplying the number of remaining voxels by the voxel size obtained from the world coordinates.
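Putting Equation 9 and the carving step together, a simplified carving loop might look like the following (illustrative only; the function and parameter names are ours): fill a voxel grid over the bounding box, project every voxel into each calibrated view, keep only voxels that land inside every silhouette, and multiply the remaining count by the voxel size.

```python
import numpy as np
import cv2

def carve(views, bbox, n=100):
    """Shape-from-silhouettes volume estimate.

    views : list of (silhouette_mask, K, rvec, tvec) tuples, one per calibrated frame.
    bbox  : ((xmin, xmax), (ymin, ymax), (zmin, zmax)) in world coordinates (cm).
    """
    axes = [np.linspace(lo, hi, n) for lo, hi in bbox]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    voxels = np.column_stack([X.ravel(), Y.ravel(), Z.ravel()]).astype(np.float32)
    keep = np.ones(len(voxels), dtype=bool)

    for mask, K, rvec, tvec in views:
        pts, _ = cv2.projectPoints(voxels, rvec, tvec, K, None)   # Equation 9
        uv = np.round(pts.reshape(-1, 2)).astype(int)
        h, w = mask.shape
        inside = ((uv[:, 0] >= 0) & (uv[:, 0] < w) &
                  (uv[:, 1] >= 0) & (uv[:, 1] < h))
        hit = np.zeros(len(voxels), dtype=bool)
        hit[inside] = mask[uv[inside, 1], uv[inside, 0]] > 0
        keep &= hit                       # carve away voxels outside this silhouette

    voxel_size = np.prod([(hi - lo) / (n - 1) for lo, hi in bbox])
    return keep.sum() * voxel_size        # estimated volume in cubic centimeters
```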

In the following section, we evaluate both methods, i.e. single-view reconstruction and multi-view/video-based reconstruction, and compare them with ground truth information.

EXPERIMENTAL RESULTS

The evaluation of our proposed volume estimation methods was done by conducting an experiment with 15 adolescent participants under controlled conditions. Images of their eating occasions over a 24 hour period were captured with three different types of cameras: a Canon PowerShot S3, a Canon PowerShot SD200, and HTC P4351 mobile telephones. A total of 19 types of foods and beverages were weighed and the ground truth weights collected. Ground truth segmentation (i.e. manual segmentation) is used in this experiment because the segmentation noise is relatively large in this study, since each eating occasion image contains about 7 food items. After the automated volume estimation is completed, we convert it into weight (g) using the food density, based on the method derived from [13]. The volume and weight results for the 19 foods are presented in Table 1. The apparent density is "the density of a particle including all pores (porosity) remaining in the material" [13]. The weight percentage error is obtained by $|W_e - W_g| / W_g$, where $W_e$ is the estimated weight and $W_g$ is the ground truth weight. As shown in Table 1, most of the errors for the foods and beverages that can be approximated using a conventional model (i.e. not an irregular shape) are small (less than 10%). However, the errors for foods approximated by the prismatic model are relatively high (from 7.4% to 57.3%). The reason is that the method using conventional shape models is more precise for foods with regular shapes. The average error over all food types is 17.9%. The error in weight estimation does not come only from the volume estimation; it also comes from the density measurement method. The food densities we obtained are apparent densities. However, for foods such as lettuce (salad), it would be more precise to use a density measured "when particles are packed or stacked in bulk including void spaces (void fraction)" [13], namely the bulk density. Overall, the result is a significant improvement over our previous experimental result described in [17].

Table 1.

Estimated weight for 19 food items using the estimated volume and apparent density, compared with the ground truth weight. The mean and standard deviation of each value are presented. (n = number of food images containing a particular food item.)

| Food name | n | Apparent density (g/cc) | Estimated volume (cc ± SD) | Estimated weight (g ± SD) | Ground truth weight (g ± SD) | Weight percentage error (%) |
|---|---|---|---|---|---|---|
| 2% Milk (C) | 54 | 0.973 | 226.3 ± 19.1 | 220.2 ± 18.5 | 220 ± 0.0 | 0.9 |
| Sausage links (P) | 22 | 0.863 | 49.9 ± 14.8 | 43.1 ± 12.8 | 46.5 ± 1.0 | 7.3 |
| Scrambled eggs (P) | 22 | 1.123 | 69.8 ± 30.9 | 78.4 ± 34.7 | 61.5 ± 0.7 | 27.5 |
| Toast (P) | 22 | 0.276 | 185.2 ± 82.8 | 51.1 ± 22.9 | 47.4 ± 3.4 | 7.8 |
| Garlic bread (P) | 15 | 0.564 | 98.6 ± 12.8 | 55.6 ± 7.2 | 41.1 ± 3.0 | 35.3 |
| Chocolate cake w/ icing (SB) | 15 | 0.683 | 128.5 ± 26.0 | 87.8 ± 17.8 | 81.5 ± 12.5 | 7.7 |
| Sugar cookie (P) | 17 | 0.860 | 35.9 ± 6.0 | 30.8 ± 5.2 | 27.8 ± 1.9 | 10.8 |
| Spaghetti w/ sauce, cheese (P) | 15 | 0.670 | 385.4 ± 62.5 | 258.2 ± 41.9 | 240.3 ± 2.6 | 7.4 |
| Orange juice (C) | 22 | 1.011 | 124.6 ± 9.1 | 125.9 ± 9.2 | 124.0 ± 0.0 | 1.5 |
| Peach slices (P) | 17 | 0.953 | 94.8 ± 31.7 | 90.4 ± 30.2 | 69.3 ± 9.9 | 30.4 |
| Pear, canned halves (P) | 15 | 1.047 | 80.8 ± 21.4 | 84.6 ± 22.4 | 75.6 ± 4.9 | 11.9 |
| French fries (P) | 17 | 0.241 | 230.0 ± 76.6 | 55.4 ± 18.5 | 70.5 ± 4.3 | 21.4 |
| Ketchup (C) | 17 | 1.141 | 12.7 ± 2.5 | 14.5 ± 2.8 | 15.5 ± 0.4 | 14.7 |
| Lettuce (salad) (C) | 15 | 0.316 | 259.9 ± 33.7 | 82.1 ± 10.7 | 48.3 ± 4.8 | 70.0 |
| Margarine (C) | 22 | 0.957 | 29.4 ± 6.5 | 28.1 ± 6.2 | 27.8 ± 0.6 | 10.8 |
| French dressing (C) | 15 | 1.108 | 32.6 ± 3.9 | 36.1 ± 4.3 | 35.7 ± 1.0 | 1.1 |
| Strawberry jam (C) | 22 | 1.307 | 25.4 ± 6.5 | 33.2 ± 8.5 | 21.1 ± 1.1 | 57.3 |
| Coke (SC) | 32 | 1.027 | 223.9 ± 13.7 | 229.9 ± 14.1 | 227.2 ± 2.3 | 1.2 |
| Cheeseburger sandwich (P) | 17 | 0.598 | 380.2 ± 99.0 | 227.3 ± 59.2 | 198.8 ± 11.5 | 14.3 |

We also compared the single-view method with the multi-view method by conducting an experiment with four plastic food items (orange juice, bagel, rice krispy treat, and banana). In this experiment, automatic segmentation is used because only one food item is present in each meal image. We are interested in comparing the single-view and the multi-view volume estimation methods.

Orange juice and a rice krispy treat can be reconstructed using a single view since they have very regular shapes (a cylinder and a square box). A bagel could also be considered a regularly shaped object, but because of its homogeneous color, its height and depth information cannot be clearly distinguished. Moreover, its textureless, uniform color composition does not allow us to use shape information to distinguish height from depth. A banana has a complex shape, and there is no regular 3D geometric template that can be used from the 2D segmentation mask. Orange juice and rice krispy treats are examples of foods where our previous template-based approach [4] can be used, whereas bagels and bananas require a more complex model for their volume reconstruction.

We obtained 14 to 20 images for multi-view volume estimation; we also acquired 35 images per food from various view angles and estimated the corresponding volumes using the single-view method. The estimated volumes and estimation errors for the four food items are shown in Table 2 in milliliters and compared with the ground truth volumes obtained from a water displacement measurement. The estimation error is determined by $|V_e - V_g| / V_g$, where $V_e$ is the estimated volume and $V_g$ is the ground truth volume. The volume estimation results for the bagel using the prior-based method are satisfactory, but our multi-view method further improves the volume estimation. We also observed that the single-view method performs better for the rice krispy treat, since the performance of the multi-view approach depends on the manner of image capture. The multi-view method performs better when the images are taken from side views around the food object.

Table 2.

Comparison of the multi-view method and the model-based single-view method for 4 food items. The estimation error is shown in parentheses.

| Food item | Multi-view volume estimation (ml) | Single-view volume estimation (ml) | Ground truth (ml) |
|---|---|---|---|
| Banana | 180.6 (5.0%) | 183.9 (6.9%) | 172 |
| Bagel | 161.2 (7.3%) | 157.2 (9.7%) | 174 |
| Orange juice | 179.9 (10.0%) | 221.3 (10.7%) | 200 |
| Rice krispy treat | 82.8 (18.2%) | 77.5 (10.1%) | 70 |

Overall, the 3D model-based method achieves an average volume estimation error of 10%. Given that portion size estimation errors of more than 50% from human observation have been reported in traditional dietary assessment methods [3, 18], our results are promising. With this approach, we would not need a 3D model for each type of food. For example, the cylinder model proposed in the paper can model most liquids, and the spherical model can model foods such as oranges, apples, and peaches. In a real-life setting, where thousands of food types are possible, we would be able to reconstruct most of them with fewer than 50 models.

CONCLUSION

In this paper we extended our previously reported single view volume estimation method and proposed a multi-view volume estimation method using “Shape from Silhouettes” to automatically estimate the food portion size.

Based on the experimental results, we observed that our single-view volume estimation technique not only improves the volume estimation accuracy for foods with simple shapes, but also provides a quantitative approach to estimate volume for foods with irregular shapes.

For the multi-view volume estimation method, the image sequence must be taken from different viewing angles, and the intrinsic and extrinsic camera parameters need to be determined for each image. However, compared with other methods, it appears to be robust to segmentation noise. Furthermore, this approach does not require any prior shape information from the food identification, and it may work for many arbitrarily shaped food objects (e.g. scrambled eggs, cut carrots). Our next steps will include investigation of stereo views and three-view geometric reconstruction.

Acknowledgments

This work was sponsored by the National Institutes of Health under grants NIDDK 1R01DK073711-01A1 and NCI 1U01CA130784-01. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Institutes of Health.

Contributor Information

Chang Xu, School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana.

Ye He, School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana.

Albert Parra, School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana.

Edward Delp, School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana.

Nitin Khanna, Department of Electronics and Communication Engineering, Graphic Era University, Dehradun, India.

Carol Boushey, Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii.

References

1. Bosch M, Schap T, Zhu F, Khanna N, Boushey C, Delp E. Integrated database system for mobile dietary assessment and analysis. Proceedings of the 1st IEEE International Workshop on Multimedia Services and Technologies for E-health (in conjunction with the International Conference on Multimedia and Expo); Barcelona, Spain. Jul 2011. pp. 1–6.
2. Bosch M, Zhu F, Khanna N, Boushey C, Delp E. Combining global and local features for food identification and dietary assessment. Proceedings of the International Conference on Image Processing; Brussels, Belgium. Sep 2011. pp. 1789–1792.
3. Boushey CJ, Kerr DA, Wright J, Lutes KD, Ebert DS, Delp EJ. Use of technology in children's dietary assessment. European Journal of Clinical Nutrition. Feb 2009;63:50–57. doi: 10.1038/ejcn.2008.65.
4. Chae J, Woo I, Kim S, Maciejewski R, Zhu F, Delp E, Boushey C, Ebert D. Volume estimation using food specific shape templates in mobile image-based dietary assessment. Proceedings of the IS&T/SPIE Conference on Computational Imaging IX; San Francisco, CA. Feb 2011. p. 78730K-1–8.
5. Chen H, Jia W, Li Z, Sun Y, Sun M. 3D/2D model-to-image registration for quantitative dietary assessment. Proceedings of the 38th Annual Northeast Bioengineering Conference; Mar 2012. pp. 95–96.
6. Curless B, Levoy M. A volumetric method for building complex models from range images. Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques; Aug 1996. pp. 303–312.
7. Daugherty BL, Schap TE, Ettienne-Gittens R, Zhu F, Bosch M, Delp EJ, Ebert DS, Kerr DA, Boushey CJ. Novel technologies for assessing dietary intake: Evaluating the usability of a mobile telephone food record among adults and adolescents. Journal of Medical Internet Research. Apr 2012;14(2):156–167. doi: 10.2196/jmir.1967.
8. Hartley RI, Zisserman A. Multiple View Geometry in Computer Vision. 2nd ed. Cambridge University Press; 2004.
9. He Y, Khanna N, Boushey C, Delp E. Snakes assisted food image segmentation. Proceedings of the IEEE International Workshop on Multimedia Signal Processing; Banff, Canada. Sep 2012. pp. 181–185.
10. He Y, Xu C, Khanna N, Boushey C, Delp E. Food image analysis: Segmentation, identification and weight estimation. Proceedings of the IEEE International Conference on Multimedia and Expo; San Jose, CA. Jul 2013.
11. Hill FS Jr, Kelley SM. Computer Graphics Using OpenGL. 3rd ed. Pearson; 2006.
12. Jia W, Yue Y, Fernstrom J, Zhang Z, Yang Y, Sun M, et al. 3D localization of circular feature in 2D image and application to food volume estimation. Proceedings of the 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 2012. pp. 4545–4548.
13. Kelkar S, Stella S, Boushey C, Okos M. Developing novel 3D measurement techniques and prediction method for food density determination. Procedia Food Science. 2011;1:483–491.
14. Kong F, Tan J. DietCam: Regular shape food recognition with a camera phone. Proceedings of the International Conference on Body Sensor Networks; May 2011. pp. 127–132.
15. Kong F, Tan J. DietCam: Automatic dietary assessment with mobile camera phones. Pervasive and Mobile Computing. 2012;8(1):147–163.
16. Kutulakos KN, Seitz SM. A theory of shape by space carving. International Journal of Computer Vision. 2000;38(3):199–218.
17. Lee CD, Chae J, Schap TE, Kerr DA, Delp EJ, Ebert DS, Boushey CJ. Comparison of known food weights with image-based portion-size automated estimation and adolescents' self-reported portion size. Journal of Diabetes Science and Technology. 2012;6(2):428. doi: 10.1177/193229681200600231.
18. Schap TRE, Six BL, Delp EJ, Ebert DS, Kerr DA, Boushey CJ. Adolescents in the United States can identify familiar foods at the time of consumption and when prompted with an image 14 h postprandial, but poorly estimate portions. Public Health Nutrition. 1(1):1–8. doi: 10.1017/S1368980010003794.
19. Shang J, Duong M, Pepin E, Zhang X, Sandara-Rajan K, Mamishev A, Kristal A. A mobile structured light system for food volume estimation. Proceedings of the IEEE International Conference on Computer Vision Workshops; Nov 2011. pp. 100–101.
20. Silva BM, Lopes IM, Rodrigues JJ, Ray P. SapoFitness: A mobile health application for dietary evaluation. Proceedings of the 13th IEEE International Conference on e-Health Networking, Applications and Services; Jun 2011. pp. 375–380.
21. Six BL, Schap TE, Zhu F, Mariappan A, Bosch M, Delp EJ, Ebert DS, Kerr DA, Boushey CJ. Evidence-based development of a mobile telephone food record. Journal of the American Dietetic Association. Jan 2010;110(1):74–79. doi: 10.1016/j.jada.2009.10.010.
22. Sun M, Fernstrom JD, Jia W, Hackworth SA, Yao N, Li Y, Li C, Fernstrom MH, Sclabassi RJ. A wearable electronic system for objective dietary assessment. Journal of the American Dietetic Association. Jan 2010;110(1):45–47. doi: 10.1016/j.jada.2009.10.013.
23. Woo I, Ostmo K, Kim S, Ebert DS, Delp EJ, Boushey CJ. Automatic portion estimation and visual refinement in mobile dietary assessment. Proceedings of the IS&T/SPIE Conference on Computational Imaging VIII; San Jose, CA. Jan 2010. p. 75330O-1–10.
24. Xu C, He Y, Khanna N, Boushey C, Delp E. Model-based food volume estimation using 3D pose. Proceedings of the IEEE International Conference on Image Processing; Melbourne, Australia. Sep 2013.
25. Xu C, Zhu F, Khanna N, Boushey C, Delp E. Image enhancement and quality measures for dietary assessment using mobile devices. Proceedings of the IS&T/SPIE Conference on Computational Imaging X; San Francisco, CA. Feb 2012. p. 82960Q-1–10.
26. Zhu F, Bosch M, Khanna N, Boushey C, Delp E. Multilevel segmentation for food classification in dietary assessment. Proceedings of the 7th International Symposium on Image and Signal Processing and Analysis; Dubrovnik, Croatia. Sep 2011. pp. 337–342.
27. Zhu F, Bosch M, Woo I, Kim S, Boushey C, Ebert D, Delp E. The use of mobile devices in aiding dietary assessment and evaluation. IEEE Journal of Selected Topics in Signal Processing. Aug 2010;4(4):756–766. doi: 10.1109/JSTSP.2010.2051471.
