Abstract
A novel method to estimate the 3D location of a circular feature from a 2D image is presented and applied to the problem of objective dietary assessment from images taken by a wearable device. Instead of using a dedicated reference (e.g., a checkerboard card), we use a food container (e.g., a circular plate) as the reference for volumetric measurement. We establish a mathematical model of the system formed by a camera and a circular object in 3D space and, based on this model, calculate food volume. Our experiments showed that, for 240 pictures of regular objects and food replicas, the relative error of the image-based volume estimation was less than 10% in 224 pictures.
I. Introduction
The accurate measurement of food intake in real-world settings is extremely important in determining overall diet composition, its contribution to weight gain, and thus its role in the development of obesity and obesity-associated diseases (e.g., diabetes and heart disease). Recently, image-based dietary analysis has become an active area of research, since food photographs can be easily acquired with a hand-held device, such as a cell phone or a tablet computer. After the food items in photographs are manually or automatically identified and their volumes are determined, their caloric and nutrient contents can be calculated using on-line databases (e.g., the USDA Food and Nutrient Database for Dietary Studies [1]). Software and cell phone apps are being developed to help individuals assess their dietary habits and pursue a healthy diet [2,3]. However, such approaches usually require the active involvement of users in acquiring photos before and after meal and snack consumption, which is burdensome and may disrupt normal dietary habits. Moreover, individuals may hesitate to take pictures of foods in certain social environments. The accuracy and usefulness of the information collected may therefore be questionable. To eliminate these problems, we have developed a small, wearable device ("eButton") to automate food intake data collection (see Fig. 1 and [4]). The device is worn on a shirt or blouse and takes pictures of the scene in front of the wearer at a pre-set rate (e.g., 1 frame per second) [4,5]. All recorded images are saved to a microSD card in the device and later uploaded to a computer for off-line evaluation of the amounts of foods ingested. An event segmentation algorithm, together with the detection of elliptical objects (e.g., the image of the dining plate), has been developed to facilitate the manual selection of food pictures for dietary analysis [4–6].
Fig. 1.

(a) eButton worn naturally on chest; (b) and (c) typical pictures taken by eButton.
The calculation of the amounts of foods consumed from a 2D image requires a method for determining food volumes. It is well known that 3D volume estimation from a 2D image is impossible if no reference is provided. Typically, a checkerboard card is used for this purpose [2,3]. However, subjects may forget to place the card into the field of view at each eating episode, leading to a loss of data. A better approach is to use a referent that is usually present in the field of view, such as a circular plate with a known diameter. In the free-living environment, the diameter of the dining plate can be measured before or after food images are taken. We therefore propose a new approach to estimate food volume by determining the 3D location of a circular plate. Although circular objects are very common in daily life, 3D localization of a circular feature from a single image is a non-trivial problem in the field of computer vision [7,8]. In 1992, a novel method was presented by Safaee-Rad and his colleagues [9], in which four transformations were used to facilitate the solution. In fact, only two transformations are necessary if the coordinate system is redefined as in this paper. We have also proved that the 3D location has a unique solution if the diameter of the circular feature is known and the feature lies on a horizontal plane [10]. In our application, the plate is placed on a horizontal table while the wearer is eating a meal, so the food location can be determined from a single image taken by our wearable device.
In this paper, we first introduce our algorithm to determine the 3D location of a circular object. Then, two methods to estimate food volume based on its determined location are proposed. Finally, experimental results are presented which validate the performance of our approach.
II. 3D Location Estimation of a Circular Object
Assuming the camera can be modeled as a pinhole camera, the relationship between a circular feature (e.g., a circular dining plate) on the object plane and its image on the image plane is illustrated in Fig. 2(a). Given such a perspective projection between a circular feature in 3D space and its 2D image, a 3D quadric cone is determined by the base (the image of the circular feature) and the vertex (the focus of the camera lens). The problem of determining the circular feature's 3D orientation then reduces to finding a plane that intersects the cone in a circle.
Fig. 2.
Geometric model of the camera perspective system and schematic diagram of the rotational transformations. The origin o is the focus of the camera lens. f = |o oi| is the focal length of the camera. xyz is the camera coordinate system and xiyizi is the image coordinate system. oz is the optical axis of the camera. ox is parallel to the horizontal axis of the image. (a) The perspective relationship between the circular feature and its image with respect to the camera coordinate system. (b) The coordinate system after transformation T1, in which the cone is centralized. (c) The coordinate system after transformation T2, in which the Z' axis is perpendicular to the object plane.
In the camera coordinate system xyz defined in Fig. 2(a), the base of the cone is the image of the circular feature in the image plane (i.e., an ellipse). Its equation can be written as:
ax² + 2hxy + by² + 2gx + 2ey + d = 0,  z = f        (1)
where f is the focal length of the camera and a, h, b, g, e, d are constant coefficients of the ellipse.
In this coordinate system, the cone can be formulated as the following quadric surface [11]:
ax² + 2hxy + by² + 2(g/f)xz + 2(e/f)yz + (d/f²)z² = 0        (2)
which can also be written as:
[x y z] Q [x y z]ᵀ = 0        (3)

where Q is the symmetric coefficient matrix of (2), Q = [[a, h, g/f], [h, b, e/f], [g/f, e/f, d/f²]].
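To make the construction of Q concrete, the following sketch assembles the cone matrix from the ellipse coefficients of (1) under the pinhole model; the coefficient naming (with e as the linear y-coefficient, to avoid clashing with the focal length f) and the function name are assumptions of this sketch, not part of the original implementation.

```python
import numpy as np

def cone_matrix(a, h, b, g, e, d, f):
    """Symmetric matrix Q of the viewing cone [x y z] Q [x y z]^T = 0 (eq. (3)).

    The plate boundary in the image plane z = f is assumed to satisfy
        a*x^2 + 2h*x*y + b*y^2 + 2g*x + 2e*y + d = 0.
    A 3D point (x, y, z) lies on the cone through the lens focus (the origin)
    and this ellipse iff its central projection (f*x/z, f*y/z) satisfies the
    ellipse equation; clearing denominators gives the quadratic form below.
    """
    return np.array([
        [a,     h,     g / f],
        [h,     b,     e / f],
        [g / f, e / f, d / f**2],
    ])
```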
The object plane on which the circular feature lies can be defined as [11]:
lx + my + nz = p        (4)
i.e., [l m n] · [x y z]ᵀ = p, where (l, m, n) represents the orientation (unit normal) of the object plane, with l² + m² + n² = 1, and p is a constant. Thus, determining the orientation of the circular feature amounts to solving for (l, m, n) under the condition that the intersection between the quadric cone and the object plane is a circle.
To solve this problem more simply, two rotational transformations are applied. The first transformation, T1, centralizes the quadric cone (see Fig. 2(b)). The relationship between the xyz and XYZ coordinate systems becomes:
[x y z]ᵀ = T1 [X Y Z]ᵀ        (5)
If the three eigenvalues of the matrix Q are denoted λ1, λ2, λ3, then T1 can be calculated as:
T1 = [[l1, l2, l3], [m1, m2, m3], [n1, n2, n3]]        (6)
where [li mi ni]ᵀ is the unit eigenvector corresponding to the eigenvalue λi, i = 1, 2, 3 [11]. The eigenvectors must be ordered and signed so that the resulting coordinate system satisfies the right-hand rule.
After applying this transformation, the cone equation (3) becomes:
[X Y Z] (T1ᵀ Q T1) [X Y Z]ᵀ = 0        (7)

λ1X² + λ2Y² + λ3Z² = 0        (8)
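A minimal sketch of this diagonalization step using NumPy's symmetric eigensolver. The ordering convention λ1 ≥ λ2 > 0 > λ3 and the sign flip enforcing det(T1) = +1 are choices made here (the text only requires the right-hand rule); the later sketches rely on this convention.

```python
import numpy as np

def centralizing_rotation(Q):
    """Eigen-decomposition step of transformation T1 (eqs. (5)-(8)).

    Returns (lam, T1), where lam = (lam1, lam2, lam3) with lam1 >= lam2 > 0 > lam3
    and T1 is a proper rotation whose i-th column [l_i, m_i, n_i]^T is the unit
    eigenvector for lam_i, so that in the new coordinates the cone becomes
    lam1*X^2 + lam2*Y^2 + lam3*Z^2 = 0.
    """
    w, V = np.linalg.eigh(Q)          # Q is symmetric
    if np.sum(w > 0) == 1:            # the cone equation is homogeneous, so its
        w = -w                        # overall sign is free: keep two eigenvalues positive
    order = np.argsort(w)[::-1]       # descending: lam1 >= lam2 > 0 > lam3
    lam, T1 = w[order], V[:, order]
    if np.linalg.det(T1) < 0:         # enforce the right-hand rule
        T1[:, 2] *= -1
    return lam, T1
```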
In the new coordinate system XYZ, the equation of the object plane (4) can be re-written as:
LX + MY + NZ = p        (9)
where (L, M, N) represents the orientation of the object plane and
L² + M² + N² = 1        (10)
The relationship between (l, m, n) and (L, M, N) can then be easily derived as:
[l m n]ᵀ = T1 [L M N]ᵀ        (11)
In this system the cone equation has been simplified, but the plane equation has not. A second transformation, T2, is therefore performed to simplify the equation of the object plane by making the Z' axis normal to the object plane (see Fig. 2(c)).
[X Y Z]ᵀ = T2 [X' Y' Z']ᵀ        (12)
| (13) |
In the new coordinate system X'Y'Z', the equation of the object plane becomes:
Z' = p        (14)
Substituting (12) and (14) into (8), we obtain the intersection curve between the quadric cone and the object plane:
| (15) |
AX'² + BX'Y' + CY'² + DX' + EY' + F = 0        (16)
The necessary and sufficient condition for (16) to represent a circle is A = C and B = 0:
| (17) |
| (18) |
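The explicit forms of (17) and (18) are not reproduced above, but the condition A = C, B = 0 is easy to verify numerically for any candidate plane: rotate the cone into a frame whose third axis is the candidate normal and read off the quadratic coefficients of the cross-section. The helper below is our own construction for such a check, not part of the paper's derivation.

```python
import numpy as np

def circle_condition_residuals(Q, normal):
    """Return (A - C, B) for the section of the cone x^T Q x = 0 cut by a plane
    with the given unit normal (camera coordinates); both residuals vanish
    exactly when the section is a circle.
    """
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    # Any proper rotation R whose third column is n will do: in the rotated
    # frame the plane is Z'' = const and the section's quadratic part is the
    # upper-left 2x2 block of R^T Q R.
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(helper, n)
    u = u / np.linalg.norm(u)
    v = np.cross(n, u)
    R = np.column_stack([u, v, n])
    M = R.T @ Q @ R
    return M[0, 0] - M[1, 1], 2.0 * M[0, 1]   # (A - C, B)
```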
Combining (17) and (18) with (10), three equations in three unknowns are obtained. Although there are four solutions, symmetric with respect to the origin of the X'Y'Z' frame, the solution can be determined uniquely in our application because the camera is worn by a participant who takes pictures while eating at a table. In this case, it has been shown that the solutions for L, M, N take the following form in the coordinate system defined in Fig. 2(a) [10]:
| (19) |
The plate orientation (l, m, n) can then be calculated from (11). If L = M = 0, the object plane is perpendicular to the optical axis of the camera and the center of the circle lies on the optical axis. In this case, the image of the circular feature is itself a circle, and its location can be derived easily if the diameter of the circular feature is known.
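Since (19) is not transcribed above, the sketch below instead uses the standard closed form for the circular sections of the centralized cone λ1X² + λ2Y² + λ3Z² = 0 (with λ1 ≥ λ2 > 0 > λ3) and maps the candidates back to camera coordinates through (11). The "plate faces the camera" filter is only a stand-in for the full disambiguation in (19) and [10], which also exploits the horizontal-table constraint.

```python
import numpy as np

def plate_normal_candidates(lam, T1):
    """Candidate unit normals (l, m, n) of the plate plane in camera coordinates.

    In the frame of eq. (8), planes cutting the cone in a circle have normals
        (L, M, N) = (+/- sqrt((lam1 - lam2)/(lam1 - lam3)), 0,
                     +/- sqrt((lam2 - lam3)/(lam1 - lam3))),
    which are mapped back by (l, m, n)^T = T1 (L, M, N)^T (eq. (11)).
    Only candidates whose normal points back toward the camera (n_z < 0,
    i.e. the top of the plate is visible) are kept here.
    """
    lam1, lam2, lam3 = lam
    L = np.sqrt((lam1 - lam2) / (lam1 - lam3))
    N = np.sqrt((lam2 - lam3) / (lam1 - lam3))
    candidates = [T1 @ np.array([sL * L, 0.0, sN * N])
                  for sL in (1.0, -1.0) for sN in (1.0, -1.0)]
    return [n for n in candidates if n[2] < 0]
```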
In the X'Y'Z' system, the Z' axis is perpendicular to the object plane. The z-distance of the object plane can then be derived as [9]:
| (20) |
Here r is the radius of the circular feature, and C1–C4 are parameters determined by the eigenvectors and eigenvalues of Q:
| (21) |
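Equations (20) and (21) are likewise not transcribed; as a sketch of the same step, the distance and the 3D plate centre can be recovered from the scale invariance of the cone (the radius of the circular section grows linearly with the distance of the cutting plane). The conventions below (the non-negative (L, 0, N) candidate and the particular rotation T2 whose third column is that normal) are those of the previous sketches, so this is not a transcription of (20); the pixel units of the ellipse coefficients cancel in the eigenvalue ratios, so the result is in the units of the plate radius.

```python
import numpy as np

def plate_pose(lam, T1, radius):
    """Perpendicular distance to the plate plane and 3D centre of the plate circle.

    lam    : (lam1, lam2, lam3) with lam1 >= lam2 > 0 > lam3 (from centralizing_rotation)
    T1     : centralizing rotation, columns = eigenvectors of Q
    radius : true plate radius r, e.g. 130 mm for the 260 mm plate used here
    """
    lam1, lam2, lam3 = lam
    L = np.sqrt((lam1 - lam2) / (lam1 - lam3))       # plane normal (L, 0, N) in the XYZ frame
    N = np.sqrt((lam2 - lam3) / (lam1 - lam3))
    # The circular section at normal distance p has radius p*sqrt(-lam1*lam3)/lam2,
    # so the known plate radius fixes p directly.
    p = radius * lam2 / np.sqrt(-lam1 * lam3)
    centre_prime = np.array([                        # circle centre in the X'Y'Z' frame
        0.0,
        -np.sqrt((lam1 - lam2) * (lam2 - lam3)) / lam2 * p,
        p,
    ])
    T2 = np.array([[0.0,  N,   L],                   # rotation taking the Z' axis to (L, 0, N)
                   [-1.0, 0.0, 0.0],
                   [0.0, -L,   N]])
    centre_cam = T1 @ T2 @ centre_prime              # plate centre in camera coordinates
    if centre_cam[2] < 0:                            # keep the nappe in front of the camera
        centre_cam = -centre_cam
    return p, centre_cam

# Putting the pieces together for one image (coefficients from the fitted plate ellipse):
#   Q = cone_matrix(a, h, b, g, e, d, f)
#   lam, T1 = centralizing_rotation(Q)
#   p, centre = plate_pose(lam, T1, radius=130.0)    # 260 mm plate
```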
III. Food Volume Estimation from a Single Image
To estimate food volume, we define several simple geometric shapes to model food shape (e.g., cuboid, sphere). The volumes of such geometric shapes can be determined from dimensional measurements (e.g., length, width, height, and/or diameter). We propose two methods to calculate the volume of these geometric shapes (see Fig. 3). The first method, called the "point-clicking" method, involves manually clicking points in the image (the red points in Fig. 3(a)) to represent the length, width and height of the object. Each dimension is then calculated using derived equations for the distance between two arbitrary points parallel or perpendicular to the plane of the plate [10]. The second method is called the "wireframe-fitting" method. A 3D virtual wireframe, placed on the plate plane in the camera coordinate system, is projected onto the 2D image and displayed (see Fig. 3(b)). The operator then manually adjusts (drags or zooms) the projection to fit the shape of the food item. Since the volume of the fitted wireframe is known, the volume of the food item can be estimated [12].
Fig. 3.

The measurement of geometric dimensions in images. (a) “point-clicking” method; (b) “wireframe-fitting” method.
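To make the two measurement methods concrete, the sketch below shows the two geometric primitives they rest on: back-projecting a clicked pixel onto the recovered plate plane (the basis of the point-clicking measurements) and projecting 3D wireframe vertices into the image (the display step of wireframe fitting). The intrinsic matrix K, the plane parameters (n, p) from Section II and all names are assumptions of this sketch; the perpendicular-distance (height) equations of [10] and the interactive adjustment of [12] are not reproduced.

```python
import numpy as np

def pixel_to_plate_plane(u, v, K, n, p):
    """Back-project a clicked pixel onto the plate plane.

    u, v : pixel coordinates of the click
    K    : 3x3 camera intrinsic matrix from calibration
    n, p : unit normal and perpendicular distance of the plate plane (camera frame),
           so that the plane is n . x = p
    The pixel defines a ray d from the lens focus; its intersection with the
    plane is (p / (n . d)) * d.  The distance between two such points gives the
    length or width of a food item lying on the plate.
    """
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return (p / np.dot(n, d)) * d

def project_points(points_3d, K):
    """Perspective-project 3D points (camera coordinates) to pixel coordinates,
    as needed to display a virtual wireframe over the food item for manual fitting."""
    P = np.atleast_2d(points_3d)
    uvw = (K @ P.T).T
    return uvw[:, :2] / uvw[:, 2:3]

# Example (hypothetical pixel values): footprint length between two clicked points.
#   P1 = pixel_to_plate_plane(812, 604, K, n, p)
#   P2 = pixel_to_plate_plane(955, 641, K, n, p)
#   length = np.linalg.norm(P1 - P2)
```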
IV. Experiments and Results
Two experiments were conducted to validate the feasibility and accuracy of the volume estimation. One regularly shaped object (a LEGO block) and seven food models (Nasco, Fort Atkinson, Wisconsin) were selected as samples representing different shapes. We used a cuboid to model the LEGO block, a piece of bread and a block of cornbread; a cylinder to model a hamburger; a cone for a glass of milk and a half grapefruit; a sphere for an orange; and a half-sphere for a serving of ice cream. The diameter of the circular plate used in these experiments was 260 mm. Since the volumes of the food replicas provided by the manufacturer were not accurate, the true volume of each food model (except the milk) was determined as the average over three measurements using the water displacement method [13]. A webcam (Logitech Pro 9000) was used to take pictures of all the samples at 2-megapixel resolution. The camera's intrinsic parameters were pre-calibrated using a checkerboard pattern. Typical pictures are shown in Fig. 4. For each picture, a plate detection algorithm was first used to automatically find the boundary of the circular plate [6]; detection results are also shown in Fig. 4. An operator was then asked to manually select a shape model for the food item and measure the food volume using our method.
Fig. 4.

Typical pictures taken in the experiment and the detected boundary of the plate shown as a red ellipse in each picture.
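The plate detector of [6] itself is not reproduced here. As a stand-in, the red boundary shown in Fig. 4 could be summarized with OpenCV's cv2.fitEllipse, whose centre/axes/angle output is converted below into the general-conic coefficients assumed in Section II, re-centred on the principal point so that they refer to the camera's optical axis. The function name and parameter conventions are ours.

```python
import numpy as np

def ellipse_to_conic(rect, cx, cy):
    """Convert an ellipse ((x0, y0), (d1, d2), angle_deg), as returned by
    cv2.fitEllipse on the detected plate boundary, to the coefficients
    (a, h, b, g, e, d) of  a*x^2 + 2h*x*y + b*y^2 + 2g*x + 2e*y + d = 0.

    cx, cy are the principal point from the intrinsic calibration; the first
    axis d1 is taken to lie along the direction given by angle_deg.
    """
    (x0, y0), (d1, d2), angle = rect
    x0, y0 = x0 - cx, y0 - cy                      # re-centre on the principal point
    A, B = d1 / 2.0, d2 / 2.0                      # semi-axes
    c, s = np.cos(np.radians(angle)), np.sin(np.radians(angle))
    a = c**2 / A**2 + s**2 / B**2
    b = s**2 / A**2 + c**2 / B**2
    h = c * s * (1.0 / A**2 - 1.0 / B**2)
    g = -(a * x0 + h * y0)
    e = -(h * x0 + b * y0)
    d = a * x0**2 + 2.0 * h * x0 * y0 + b * y0**2 - 1.0
    return a, h, b, g, e, d
```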
A. Volume estimation for a regularly-shaped object
In the first experiment, 107 pictures of a LEGO block were taken at 107 different locations and measured by the "point-clicking" method. For this block, clicking four points is enough to calculate its volume, which is much simpler than manually dragging and fitting a virtual wireframe. To identify the error introduced by manually selecting points, two operators were asked to estimate the volume of the LEGO block. The mean distance between the camera's optical center and the plate center over the 107 locations was 40.8 cm (range: 25.1–68.4 cm). Table I shows the results for one expert and one novice. The mean relative error was less than 2%. A Student's t-test showed no significant difference between the measurements made by the two operators (a minimal sketch of such a comparison follows Table I).
Table I.
Relative error estimated by an expert and a novice
| | Ground Truth | Relative Error (Expert) | Relative Error (Novice) |
|---|---|---|---|
| Bottom Area | 10.14 cm² | −1.96% ± 3.69% | −2.79% ± 4.16% |
| Height | 7.68 cm | 0.46% ± 1.41% | 0.84% ± 1.81% |
| Volume | 77.95 cm³ | −1.52% ± 3.64% | −1.96% ± 4.70% |
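A minimal sketch of the operator comparison mentioned above; whether the original analysis used a paired or an unpaired test is not stated, so a paired Student's t-test on per-picture relative errors is shown here, with made-up numbers standing in for the 107 measurements.

```python
import numpy as np
from scipy import stats

# Hypothetical per-picture relative volume errors (fractions) for the two operators.
expert_err = np.array([-0.015, 0.021, -0.034, 0.002, -0.011, 0.007])
novice_err = np.array([-0.028, 0.013, -0.041, 0.009, -0.022, 0.004])

# Paired t-test: both operators measured the same set of pictures.
t_stat, p_value = stats.ttest_rel(expert_err, novice_err)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")      # a p-value above 0.05 indicates no significant operator effect
```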
B. Volume estimation for food replicas
In the second experiment, thirty pictures were taken of each food replica, as well as the LEGO block, at thirty different locations. Their volumes were estimated by both the "point-clicking" and "wireframe-fitting" methods. A total of 240 pictures were tested in our study. The estimation errors for all the samples are listed in Table II and shown in Fig. 5. For the LEGO block, similar accuracy was obtained with both methods. However, for the food replicas, the "wireframe-fitting" method produced much better results. Over all 240 pictures, the errors in the estimated volumes were less than 25% in 209 pictures using the "point-clicking" method, and less than 10% in 224 pictures using the "wireframe-fitting" method. From Fig. 5, we can also see that the "wireframe-fitting" method is much more robust than the "point-clicking" method, especially for food items whose dimensional lengths are difficult to identify (e.g., the hamburger). The error in the volume of the bread is relatively large because of its low height (about 1.2 cm) and irregular border shape. Notably, the height is difficult to estimate accurately when the angle between the camera optical axis and the table is large (see the bread image in Fig. 4).
Table II.
Relative error for food replicas and a LEGO block
| | Ground Truth | Point-Clicking Method | Wireframe-Fitting Method |
|---|---|---|---|
| LEGO | 77.95 cm³ | −2.92% ± 3.10% | −1.32% ± 2.22% |
| Cornbread | 93 cm³ | −20.43% ± 9.98% | 0.99% ± 2.40% |
| Orange | 152 cm³ | −15.63% ± 6.20% | 0.10% ± 2.86% |
| Hamburger | 308 cm³ | 14.75% ± 19.35% | 0.31% ± 6.32% |
| Bread | 106 cm³ | 15.01% ± 28.50% | 2.85% ± 8.56% |
| Milk | 240 cm³ | −1.45% ± 8.33% | 2.49% ± 1.84% |
| Grapefruit | 272 cm³ | −0.27% ± 12.35% | −4.31% ± 4.10% |
| Ice Cream | 80 cm³ | −19.22% ± 4.6% | −6.58% ± 4.28% |
Fig. 5.

The distribution of relative errors for all the test samples. Each sample was tested at thirty different locations. The blue circles represent the relative errors estimated by the point-clicking method, and the red diamonds represent the errors estimated by the wireframe-fitting method.
V. Conclusions
In this paper, we have introduced a novel method to calculate the 3D location of a circular feature and applied this method to the estimation of food volume. We have also developed two model-based approaches to estimate food volume from a single input image given the 3D food location. A cuboid and seven food replicas on a circular plate were used in our experiments for performance evaluation. Our results indicate that, using the "wireframe-fitting" method, the volume estimation error was less than 10% in 224 of the 240 input pictures. This represents a significant improvement over current self-reporting methods, which are inaccurate and unreliable.
Footnotes
Research supported by National Institutes of Health grant U01 HL91736.
References
- [1] USDA Food and Nutrient Database for Dietary Studies. Agricultural Research Service, Food Surveys Research Group; Beltsville, MD: 2010.
- [2] Thompson FE, Subar AF. Chapter 1: Dietary assessment methodology. In: Coulston AM, Rock CL, Monsen ER, editors. Nutrition in the Prevention and Treatment of Disease. Academic Press; San Diego, CA: 2001. pp. 3–30.
- [3] Zhu F, Bosch M, Woo I, Kim S, Boushey CJ, Ebert DS, Delp EJ. The use of mobile devices in aiding dietary assessment and evaluation. IEEE J Sel Top Signal Process. 2010;4:756–766. doi: 10.1109/JSTSP.2010.2051471.
- [4] eButton: a wearable computer developed by the University of Pittsburgh. Available at www.lcn.pitt.edu/eButon.
- [5] Sun M, Fernstrom JD, Jia W, Hackworth SA, Yao N, Li Y, Li C, Fernstrom MH, Sclabassi RJ. A wearable electronic system for objective dietary assessment. J Am Diet Assoc. 2010;110:45–47. doi: 10.1016/j.jada.2009.10.013.
- [6] Nie J, Wei Z, Jia W, Li L, Fernstrom JD, Sclabassi RJ, Sun M. Automatic detection of dining plates for image-based dietary evaluation. Conf Proc IEEE Eng Med Biol Soc; Buenos Aires, Argentina. 2010. pp. 4312–4315.
- [7] Shiu YC, Ahmad S. 3D location of circular and spherical features by monocular model-based vision. Proc. IEEE International Conference on Systems, Man and Cybernetics; Cambridge, MA. 1989. pp. 576–581.
- [8] Sheu RD, Bond AH. A generalized method for 3D object location from single 2D images. Pattern Recogn. 1992;25(8):771–786.
- [9] Safaee-Rad R, Tchoukanov I, Smith KC, Benhabib B. Three-dimensional location estimation of circular features for machine vision. IEEE Trans. Robot. Autom. 1992;8(5):624–640.
- [10] Jia W, Yue Y, Fernstrom JD, Yao N, Sclabassi RJ, Fernstrom MH, Sun M. Image-based estimation of food volume using circular referents in dietary assessment. J. Food Eng. 2012;109(1):76–86. doi: 10.1016/j.jfoodeng.2011.09.031.
- [11] Bell RJT. An Elementary Treatise on Coordinate Geometry of Three Dimensions. 3rd ed. Macmillan; London, UK: 1944.
- [12] Zhang Z, Yang Y, Yue Y, Fernstrom JD, Jia W, Sun M. Food volume estimation from a single image using virtual reality technology. Proc. IEEE 37th Annual Northeast Biomedical Engineering Conference; Troy, NY. 2011 April 1–3.
- [13] Hughes SW. Archimedes revisited: a faster, better, cheaper method of accurately measuring the volume of small objects. Physics Education. 2005;40(5):468–474.

