Novel Method for Pairing Wood Samples in Choice Tests

Sebastian Oberst; Theodore A Evans; Joseph C S Lai

doi:10.1371/journal.pone.0088835

2014 Feb 14;9(2):e88835. doi: 10.1371/journal.pone.0088835


Editor: Lee A Newsom

PMCID: PMC3925169 PMID: 24551173

Abstract

Choice tests are a standard method to determine preferences in bio-assays, e.g. for food types and food additives such as bait attractants and toxicants. Choice between food additives can be determined only when the food substrate is sufficiently homogeneous. This is difficult to achieve for wood-eating organisms, as wood is a highly variable biological material, even within a tree species, owing to the age of the tree (e.g. sapwood vs. heartwood) and the components therein (sugar, starch, cellulose and lignin). The current practice to minimise variation is to use wood from the same tree, yet the variation can still be large and the quantity of wood from one tree may be insufficient. We used wood samples of identical volume from multiple sources, measured three physical properties (dry weight, moisture absorption and reflected light intensity), then ranked and clustered the samples using fuzzy c-means clustering. A reverse analysis of the clustered samples found a high correlation between their physical properties and their source of origin. The suggested approach provides a quantifiable, consistent, repeatable, simple and quick method to maximise control over the similarity of wood used in choice tests.

Introduction

Choice-tests are perhaps the most common experimental method used to determine preferences of insects, especially for food. A simple ISI search for papers on ‘choice tests’ to determine food preferences of insects found around 600 papers from the past decade. One common use of food choice tests is to determine food additives (e.g. bait substrates, attractants, toxicants) for pest control applications, e.g. for cockroaches [1], [2], moths [3], ants [4], [5] and termites [6].

Choice-tests for food additives in artificial foods are straightforward as the base food matrix is identical across the choices under test. Those for food additives in natural foods are more problematic as the natural foods are often highly variable, thus the palatability of the base matrix may confound the effect of the food additive. Wood is such a variable food, not just between tree species but within any one tree species, owing to the age of the fibre, the horizontal (sapwood versus heartwood) and vertical position within the stem as well as ecological (e.g. growth site and conditions: natural vs. plantation, growth rate) and functional (e.g. stems vs. branches, reaction wood formed in leaning stems and branches and juvenile vs. mature growth) variations [7], [8], [9], [10]. Such variation has been demonstrated to affect termite consumption of wood, especially due to the age of a tree [11], [12], [13]. Other factors identified include moisture [14], [15] or previous termite attack [16].

Hence the interpretation of wood loss in choice experiments [17], [18], [19], [20], [21] may not be straightforward, because wood consumption may also have differed owing to inter-sample palatability. Attempts to reduce this natural variation have used sequentially cut wood [22], [23], [24]; however, there is variation in the wood quality within logs cut from the same tree [25], [26], owing to growth increment variation, for example. Complicating this limitation is unknown provenance, when wood for laboratory experiments on termites may have been sourced from retailers. This is most problematic for large experiments that require more wood samples than one retail-sourced cut length of timber can provide.

A simple, consistent, repeatable method to characterise wood using easy-to-measure physical properties would reduce the variability within samples for choice tests, and thus increase confidence in results. The aim of this paper is to test the similarity of wood samples cut sequentially from different Pinus radiata sources, by applying fuzzy c-means clustering [27], [28] to three simple measurements of physical properties: dry weight, moisture absorption and reflected light intensity. Fuzziness in the algorithm allows wood pieces to be selected from different clusters, which accounts for the uncertainty in material properties that are not measured.

Materials and Methods

Physical Properties of Wood

The wood used was plantation grown P. radiata, cut as veneer into sheets (ca. 1250 mm×25 mm×1 mm), from a retailer; thus the source trees were unknown. Veneer discs (60 mm in diameter) were punched from each sheet and were given a unique identification code, which included the original sheet from which the samples were cut (hence ‘sheet membership’). Only undamaged veneer discs without knotholes or obvious fungal attack were chosen.

Two sets of veneer discs were prepared, for which the veneer was most likely from different trees or even different geographical locations. In any case, the two sets represented a variation in sets for the statistical analysis: for the Small Set, N = 505 discs were cut from 10 sheets; and for the Large Set, N = 1417 discs were cut from 22 sheets. For both sets three physical properties were measured. (1) The dry weight was recorded after discs were held for 7 hrs at 105°C in a drying oven (4 hrs for the weight to stabilise). (2) The moisture absorption was recorded as a percentage of the dry weight, with the oven-dried veneer discs kept for 4 days at 28°C and 80% RH, calculated as (wet weight - dry weight)/dry weight×100. All weights were measured to four significant figures (AEA 250 g, Adam Equipment Co Ltd, Milton Keynes, UK). (3) The brightness was recorded as the mode-skewness of the reflected light (pixel intensity (I) distribution) from digital photographs (1660×1200 pixels) taken with a tripod-mounted camera (µ tough 12 MP, Olympus). Flash was not used, to avoid 50 Hz flickering; instead constant lighting was provided by six arrays of white LED packages (ea. 25×3 mm LEDs, avg. I = 5 cd (ea.), peak wavelength at 465 nm). The mode-skewness of the reflected light, calculated as (mean - mode)/standard deviation [29], was used as an estimate of the ratio of early- and latewood in the sapwood to heartwood, as the former is pale (negative mode-skewness) and the latter is dark.
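The two formulas given above can be written out as a minimal Python sketch. The function names are hypothetical, and the use of the population standard deviation and the tie-breaking rule for the mode are assumptions; the text does not state which conventions were used.

```python
from collections import Counter
from statistics import mean, pstdev


def moisture_absorption(dry_weight, wet_weight):
    """Moisture absorption as a percentage of the dry weight."""
    return (wet_weight - dry_weight) / dry_weight * 100.0


def mode_skewness(intensities):
    """(mean - mode) / standard deviation of integer pixel intensities.

    Assumptions: population standard deviation; if several intensities
    are equally frequent, the smallest one is taken as the mode.
    """
    counts = Counter(intensities)
    mode = max(counts, key=lambda value: (counts[value], -value))
    return (mean(intensities) - mode) / pstdev(intensities)
```

For example, a disc whose pixel intensities are 100, 100, 100, 120 and 140 has a mean of 112, a mode of 100 and a standard deviation of 16, which gives a mode-skewness of 0.75.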

Photographs of 2 MP were found sufficient, as distributions from photographs with higher resolutions (3 or 5 MP) did not show different results in distribution parameters but increased the computational load from 8 hrs to 1 and 3 days, respectively. Images were processed from 16 bit colour depth to 8 bit (gray) and a cut-off pixel intensity of 60 was selected from plotting sorted veneer intensities over pixels for all discs; pixels with I < 60 represented black background and were discarded. The intensity distribution and its mode-skewness were calculated for each veneer disc. Signal processing and optimisation were performed in Matlab R2012 and the statistical analyses were performed with RStudio 0.97.332 and R x64 2.13.1.
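The background cut-off described above can be illustrated with a short Python sketch. It assumes the image has already been converted to a flat list of 8 bit gray values; the function name is hypothetical, and the authors' Matlab processing is not reproduced here.

```python
def intensity_distribution(gray_pixels, cutoff=60):
    """Normalised histogram (256 bins) of pixels with intensity >= cutoff.

    Pixels with I < cutoff are treated as black background and discarded,
    following the cut-off of 60 given in the text.
    """
    kept = [p for p in gray_pixels if p >= cutoff]
    histogram = [0] * 256
    for p in kept:
        histogram[p] += 1
    total = len(kept)
    if total == 0:
        return [0.0] * 256  # nothing but background in this image
    return [count / total for count in histogram]
```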

Determination of Similarity

The analysis process had three steps. Step 1 calculated fuzzy c-means clusters from the measured physical properties only; in other words, the origin of the veneer discs was treated as unknown. Step 2 used the known origin of the veneer discs, here termed ‘sheet membership’, to search for neighbouring veneer discs. In Step 3, for the clusters derived by fuzzy c-means, uniform distributions were assigned to the sheet membership in each cluster, with the same length as the cluster found in the experimental data (referred to as ‘experimental uniform data’). This ‘experimental uniform data’ was benchmarked against a hypergeometric distribution, thus allowing the effectiveness of fuzzy c-means clustering in finding similar veneer discs to be assessed. The hypergeometric distribution describes a process equivalent to drawing balls from an urn without replacement [30]. For this purpose the clustered groups were subsequently degraded in order to benchmark the clustering algorithm by comparing the effect of completely randomised draws and the effect of population size on the sorting quality of the fuzzy c-means clustering. Benchmarks with respect to neighbourhood size, and the significance of the performance of the clustering algorithm with respect to the calculated medians, are discussed.
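As an illustration of the urn model behind the hypergeometric benchmark, the following Python sketch computes the probability of drawing at least one disc from a given sheet when discs are drawn without replacement. The function name and the example numbers are hypothetical and are not taken from the experiments.

```python
from math import comb


def prob_at_least_one(population, successes, draws):
    """P(at least one success) when drawing without replacement.

    population: total number of discs in the urn
    successes:  number of discs from the sheet of interest
    draws:      number of discs drawn
    """
    if not 0 <= successes <= population or not 0 <= draws <= population:
        raise ValueError("need 0 <= successes, draws <= population")
    # Complement of drawing only discs from other sheets.
    return 1.0 - comb(population - successes, draws) / comb(population, draws)
```

With 10 discs in the urn, 3 of them from the sheet of interest, and 2 draws, the result is 1 - 21/45 = 24/45, i.e. about 0.53.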

Clustering

A statistical cluster analysis used optimised subsets to group elements. Fuzziness was implemented to take the uncertainty into account, e.g. by assigning one veneer disc to several clusters. Fuzziness also implies that the method approximates “non-crisp” (not optimal) values [28].

The fuzzy c-means clustering algorithm [28] was implemented in Matlab R2012, and based on minimising the cost function [27], [28]

$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert_2^{2} \qquad (1)$$

with m>1 being the fuzzifier (default m = 2), x_i the ith vector of dimension d (i = 1, ..., N), u_ij the element of the partition matrix of the veneer disc x_i in a cluster j describing the membership grade, and with ||.||₂ as the Euclidean distance between the vector x_i and its cluster centre c_j to be evaluated (j = 1, ..., C, where C is the number of clusters). Fuzzy c-means clustering iteratively approximated the mean centroid of each cluster, from which a membership value for each veneer disc was calculated based on the strength of the association between the element and the centroid. A list of veneer discs from each cluster was sorted in descending membership order; the closer the membership to 1, the more the veneer disc belongs to the cluster. In order to partition each cluster into one set with unique membership and one fuzzy set, a second distance function was applied to sub-cluster veneer discs in subsets of lower membership (u_ij <0.5) into a fuzzy region.
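The iteration described above can be sketched in plain Python. This is the textbook alternating update for fuzzy c-means (centres from weighted means, memberships from relative distances) and is not the authors' Matlab implementation; the function names, the random initialisation, the tolerance and the iteration limit are all assumptions, and the second distance function for the fuzzy region is not included.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Return (centres, u), where u[i][j] is the membership of point i in cluster j."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial partition matrix; each row sums to 1 (assumed initialisation).
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centres = []
    power = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by u_ij ** m.
        centres = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centres.append(
                [sum(w[i] * points[i][k] for i in range(n)) / w_sum for k in range(dim)]
            )
        # Membership update from Euclidean distances to all centres.
        new_u = []
        for x in points:
            dist = [math.dist(x, c) for c in centres]
            zeros = [j for j, d in enumerate(dist) if d == 0.0]
            if zeros:
                # Point coincides with one or more centres: full membership there.
                row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(n_clusters)]
            else:
                row = [
                    1.0 / sum((dist[j] / d_k) ** power for d_k in dist)
                    for j in range(n_clusters)
                ]
            new_u.append(row)
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    return centres, u


def ranked_members(u, j):
    """Indices of points whose strongest membership is cluster j, by descending grade."""
    members = [i for i, row in enumerate(u) if row.index(max(row)) == j]
    return sorted(members, key=lambda i: u[i][j], reverse=True)
```

Each point would here be a triple of the three measured properties (dry weight, moisture absorption, mode-skewness). Since the properties have different units, some scaling before clustering may be sensible; the text does not say whether this was done.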

Sorting Quality and Statistics

The fuzzy c-means algorithm gives clusters which are optimised according to their material properties and are characterised by (1) a sequence of veneer discs, (2) the sheet memberships of the veneer discs and (3) the number of veneer discs in the cluster (referred to as cluster length). The accuracy of the assignment of veneer discs into similar groups using the fuzzy c-means clustering process was tested by comparing these groups with sheet membership.

Veneer discs were mapped using their descending membership grade order in each cluster list, which created a spiral in a 3-dimensional veneer disc property space, starting from a region with unique cluster membership, and ending in a fuzzy region. The clustering can therefore be seen as a mapping from the 3-dimensional space onto a 1-dimensional subspace of membership grades.

(2)

Equation (2) gives a Euclidean distance as an argument of the indicator (or characteristic) function [29]. The indicator function searched for neighbouring veneer discs from the same sheet, which were then given a value of one; all others were assigned zero. The percentage of successful neighbour searches relative to the population’s vector length was calculated for each neighbourhood size. From the centroid, the optimisation procedure moved iteratively outwards, such that the neighbourhood size was successively increased, by checking first direct neighbours, then second neighbours and so on. The percentage of neighbours was subsequently interpreted as the probability of encountering at least one neighbour of the same sheet for discrete neighbourhood widths. This was compared to 8 averages of bootstrapped uniform data of clusters of the same length and all sheet memberships as the experimental data, and to a hypergeometric distribution (n = 8000 draws).
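One possible reading of the neighbour search is sketched below in Python. It assumes that "neighbours" are adjacent positions in the membership-ordered list of a cluster and that a disc scores one if any disc within the given width comes from the same sheet. This positional definition and the function name are assumptions; the sketch does not reproduce Equation (2).

```python
def same_sheet_neighbour_fraction(sheets, width):
    """Fraction of discs with at least one same-sheet disc within `width` positions.

    sheets: sheet labels of the discs, in descending membership order
    width:  neighbourhood width (1 = direct neighbours, 2 = second neighbours, ...)
    """
    n = len(sheets)
    if n == 0:
        return 0.0
    hits = 0
    for i, sheet in enumerate(sheets):
        lo, hi = max(0, i - width), min(n, i + width + 1)
        # Indicator: 1 if any neighbour in the window shares the sheet label.
        if any(sheets[k] == sheet for k in range(lo, hi) if k != i):
            hits += 1
    return hits / n
```

For the ordered labels A, A, B, C, B, A, two of the six discs have a same-sheet direct neighbour (width 1), and four of the six have one within width 2.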

In order to find out the influence of the cluster length obtained by fuzzy c-means clustering, the probabilities to encounter neighbours were benchmarked against bootstrapped uniform distributions of sheet membership, first for clusters with the same length as the experimental data and then for clusters with identical length.

Thus the order of sheet membership was randomised so that the membership matrix (i.e. the list) for each cluster was not sorted. This corresponded to theoretical sets of identically weighted values and hence reduced the dependency on the measured wood properties, which led to the degradation of results obtained by the optimisation algorithm. For clusters with identical lengths, the bootstrapped averaged samples (8 repeats) for the Small Set comprised a total of 360 veneer discs (10 sheets with n = 36 veneer discs each) and for the Large Set 506 veneer discs (22 sheets, n = 23 veneer discs each).
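The randomised benchmark can be illustrated with a Python sketch that shuffles the sheet labels and averages the neighbour fraction over 8 repeats. It uses a positional neighbour definition (adjacent places in the list), which is an assumption, as are the function names and the seed; the authors' actual bootstrapping procedure may differ.

```python
import random


def neighbour_fraction(sheets, width):
    """Fraction of discs with a same-sheet disc within `width` list positions."""
    n = len(sheets)
    hits = 0
    for i, sheet in enumerate(sheets):
        lo, hi = max(0, i - width), min(n, i + width + 1)
        if any(sheets[k] == sheet for k in range(lo, hi) if k != i):
            hits += 1
    return hits / n if n else 0.0


def shuffled_benchmark(sheets, width, repeats=8, seed=0):
    """Mean neighbour fraction over `repeats` random orderings of the labels."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repeats):
        shuffled = list(sheets)
        rng.shuffle(shuffled)  # destroys any ordering by wood properties
        total += neighbour_fraction(shuffled, width)
    return total / repeats


# Equal-length layout of the Small Set from the text: 10 sheets, 36 discs each.
small_set = [sheet for sheet in range(10) for _ in range(36)]
```

A perfectly sorted list, in which all discs of a sheet are adjacent, gives a fraction of 1 for direct neighbours, whereas the shuffled benchmark is expected to be much lower; a clustering result would fall somewhere between the two.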

Distribution tests (Lilliefors test) and significance tests (Student’s t-test or Fisher’s one-way ANOVA) were used to determine whether differences in the probability of encountering at least one direct neighbour were statistically significant for the ‘experimental uniform data’.
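For reference, the test statistic of a two-sample Student's t-test with pooled variance can be sketched in Python as follows. The function name is hypothetical; the analyses in the paper were run in R, and this sketch returns only the statistic and the degrees of freedom, without a p-value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample Student's t statistic (equal variances assumed) and its df."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled sample variance from the two unbiased sample variances.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return t, df
```

For the samples 1, 2, 3, 4, 5 and 2, 3, 4, 5, 6, the means are 3 and 4, both sample variances are 2.5, and the statistic is -1 with 8 degrees of freedom.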