Skip to main content
eLife logoLink to eLife
. 2018 Dec 13;7:e40183. doi: 10.7554/eLife.40183

Structured illumination microscopy combined with machine learning enables the high throughput analysis and classification of virus structure

Romain F Laine 1,†,, Gemma Goodfellow 1, Laurence J Young 1, Jon Travers 2, Danielle Carroll 3, Oliver Dibben 3, Helen Bright 3, Clemens F Kaminski 1,
Editors: Antoine M van Oijen4, Arup K Chakraborty5
PMCID: PMC6331195  PMID: 30543181

Abstract

Optical super-resolution microscopy techniques enable high molecular specificity with high spatial resolution and constitute a set of powerful tools in the investigation of the structure of supramolecular assemblies such as viruses. Here, we report on a new methodology which combines Structured Illumination Microscopy (SIM) with machine learning algorithms to image and classify the structure of large populations of biopharmaceutical viruses with high resolution. The method offers information on virus morphology that can ultimately be linked with functional performance. We demonstrate the approach on viruses produced for oncolytic viriotherapy (Newcastle Disease Virus) and vaccine development (Influenza). This unique tool enables the rapid assessment of the quality of viral production with high throughput obviating the need for traditional batch testing methods which are complex and time consuming. We show that our method also works on non-purified samples from pooled harvest fluids directly from the production line.

Research organism: Virus

eLife digest

Viruses are like the Trojan horses of the biological world; they sneak their genetic code into a living cell and then hijack it, forcing that cell to produce their own viral proteins. Yet, if scientists replace the harmful genes in a virus with other genes, the virus can be transformed into a powerful tool for biology and medical science. For example, viruses can be turned into vaccines that prime the immune system to ward off future infections. Viruses could also be made to deliver the genetic code needed to repair faulty cells, and thus treat the cause of an illness from inside the body. Nevertheless, it is complicated to produce viruses like these on a large scale.

The individual viruses in one batch can be very different shapes and sizes; they can also end up displaying different proteins on their outer surface – which is the part of the virus that our immune system will see first. To optimise the production of standardised viruses, scientists need a way to test the viruses throughout the manufacture process. At the moment, the best way to do this would be with electron microscopes. Yet these microscopes cannot tell exactly which proteins are in the outer surface of the virus. Also, these methods often need purified samples of virus, so cannot be used to look at the viruses until the final stage of production.

Laine et al. now report a method that can test virus production at every step of the process. This new method uses a different type of microscopy called super-resolution imaging, which is quicker than electron microscopy and more able to deal with impurities, but can still see objects that are 500 times smaller than the width of a human hair. First, Laine et al. took pictures of many viruses with this new imaging technique, sorted the images into groups based on their appearance, and then trained computer algorithms with the pre-sorted groups (a technique called “supervised learning”). Next, the trained algorithms were shown new images of viruses and asked to classify them. The algorithms could separate images of a mixed population of viruses into six groups according to their shape and size, and then analyse each group in a specific way. For example, they would measure and report the length of filament-shaped viruses, the radius of spherical viruses and the length and width of rod-shaped viruses.

The first set of test images were of Newcastle Disease Virus, which is currently under development as a treatment for cancer. But further testing revealed that the algorithm also works for the influenza virus, which is used to make flu vaccines. The algorithm could classify the viruses in pure and impure samples, and the imaging technique could handle over 200 viruses each second.

This approach of combining super-resolution imaging with artificial intelligence could help scientists to understand what makes good vaccines and how best to optimise the production of viruses for medical purposes. It could also allow researchers to respond more rapidly to outbreaks of viral infections. The next step is to build this work into a system that can be used by the pharmaceutical industry.

Introduction

The potential of super-resolution microscopy (SRM) to unravel details of the structure and replication of viruses was recognised early on in the development of the methodology (Betzig et al., 2006; Müller and Heilemann, 2013). Since then, SRM has been used to provide unprecedented insights into viral protein architecture (Laine et al., 2015; Zhang et al., 2015; Gray et al., 2016; Albecka et al., 2016). Previous work has focused on those SRM techniques that achieve the highest theoretical resolution, such as Stimulated Emission Depletion (STED) (Hell and Wichmann, 1994) and Single Molecule Localisation Microscopy (SMLM) (Rust et al., 2006; Heilemann et al., 2008). Whilst offering high fidelity data, the downside is the associated long acquisition time required by these methods, limiting their application to the imaging of static samples at low throughput. A much faster technique, although inferior in spatial resolution, is Structured Illumination Microscopy (SIM) (Gustafsson, 2000; Heintzmann and Cremer, 1999) and this has been applied to study large viruses such as the prototypic poxvirus (Gray et al., 2016; Horsington et al., 2012). In addition to understanding the structure of viruses, there is also a need to identify and analyse classes of structures within large viral populations, especially in the biotechnology industry where virus quality is often compromised by large scale production operations and the virus product is often characterised by significant morphological heterogeneities. In particular, campaigns of influenza immunization rely heavily on the timely and efficient production of specific virus strains. Similarly, a deeper understanding of the structural heterogeneity of oncolytic viruses such as Newcastle Disease Virus (NDV) (Ganar et al., 2014; Lichty et al., 2014) would enable optimization of the production processes and in turn improve the development of viriotherapy. However, quantifying and understanding this structural heterogeneity and relating it to virus efficacy requires the imaging of large numbers of viruses at sufficient spatial resolution to reveal characteristic morphological details. Typically, this is achieved by extracting batches from the production process, with elaborate subsequent purification and preparation steps before characterisation by Transmission Electron Microscopy (TEM) (Gad, 2007; Goldsmith and Miller, 2009; Brenner and Horne, 1959). Although TEM can achieve relatively high imaging throughput if the highest resolution is not necessary, its contrast remains unspecific and therefore does not typically permit discerning the presence of particular proteins in the virus envelope. Also, the typical signal-to-noise ratios achieved by TEM are not sufficient to permit automated, robust and efficient downstream analysis of structural features at the single particle level. It is therefore challenging for TEM to be of practical use during production operations.

Here, we demonstrate that rapid high resolution imaging with Total Internal Reflection SIM fluorescence microscopy (TIRF-SIM) (Shao et al., 2011; Kner et al., 2009; Young et al., 2016), combined with a machine learning (ML) approach to analyse and classify structures in virus batches offer a great opportunity to circumvent these problems. We present MiLeSIM (Machine Learning Structured Illumination Microscopy) as an efficient combination of SRM, ML-based classification (Van Valen et al., 2016; Sommer et al., 2011) and advanced image analysis for the quantification of morphological heterogeneities in large virus populations. We use ML algorithms to perform a classification of super-resolved images of a heterogeneous virus population into particle classes with distinct and characteristic structural features (e.g. spherical, filamentous). The classified subpopulations are then further analysed through image analysis pipelines that are specifically adapted for each structural class. We and others have shown that appropriate model fitting can lead to precision in structural parameters beyond the resolution of the images used (Laine et al., 2015; Manetsberger et al., 2015). The method combines speed and specificity and allows an in-depth exploration of large virus populations that is unachievable by TEM. The method has potentials in the industrial production of viruses, for example for oncolytic viriotherapy and vaccine development.

First, we compare TIRF-SIM with alternative imaging modalities and show that it is the method of choice to investigate virus structure at high-throughput (~220 virus particles/s, see Supplementary Note 1) with a spatial resolution reaching ~90 nm. The large datasets obtained with TIRF-SIM are then fed into an ML algorithm for the automated classification of Newcastle Disease Virus (NDV) and live attenuated influenza virus (LAIV) vaccines, enabling further shape-specific quantitative analyses for a structural description of viral subpopulations. The purpose of our study is to validate the MiLeSIM approach as a powerful analysis tool for biotechnological processes involving virus production both in industry and in the research laboratory.

Results

TIRF-SIM offers an optimal combination of throughput and resolution for the imaging of virus structure

First, we explored and compared three common SRM modalities for the structural investigation of purified NDV virus, namely direct stochastic optical reconstruction microscopy, dSTORM, stimulated emission depletion microscopy, STED and TIRF-SIM. NDV viruses were labelled for the envelope glycoprotein Hemagglutinin-Neuraminidase (HN) and imaged with all three SRM imaging techniques (see Figure 1). Labelling for HN allows us to directly and specifically observe the shape of the virus particles. For comparison, a conventional (non-super-resolved) TIRF wide-field image is also shown. Typical shapes observed with TIRF-SIM are shown in Figure 1(b). TIRF-SIM provides clear structural details to discern filamentous, spherical and rod-like structures in large NDV populations. A comparison of performance parameters (resolution and imaging speed) for the different methods is presented in Figure 1—figure supplement 1(a). It is clear that improving resolution beyond the ~90 nm offered by TIRF-SIM (see Figure 1—figure supplement 1(b)) comes at a significant cost in acquisition times and throughput. Furthermore, although dSTORM and STED offer theoretically higher resolution than SIM, the images obtained with these methods do not reveal additional structural details that are not also resolved by TIRF-SIM images. This indicates that the ~2 fold resolution improvement provided by TIRF-SIM is sufficient for the structural study presented here.

Figure 1. Super-resolution microscopy (SRM) for the study of NDV virus structure.

(a) Representative images of purified NDV viruses with different imaging modalities. (b) Representative images of a purified NDV virus population imaged with TIRF-SIM and their corresponding TIRF wide-field image. WF: wide-field TIRF microscopy. Scale bar: 1 µm.

Figure 1.

Figure 1—figure supplement 1. Resolution in SRM.

Figure 1—figure supplement 1.

(a) Typical spatial resolution and acquisition times for the imaging of NDV structures highlighting the trade-off between speed and resolution (b) Representative TIRF-SIM image obtained from purified B Victoria LAIV and its corresponding Fourier transform (c). The Fourier transform highlights the resolution ~90 nm. The plot was obtained using the SIMcheck plugin (Ball et al., 2015). (d) Image and cross section of a single secondary antibody labelled with DyLight 488, showing a FWHM of ~90 nm. FFT: Fast Fourier transform.
Figure 1—figure supplement 2. Electron micrographs of NDV and B-Victoria viruses.

Figure 1—figure supplement 2.

These images were obtained from a Philips CM 100 Compustage (FEI) TEM and negative staining.

Traditionally, EM has been the method of choice for observing sub-diffraction structures of virus particles (see Figure 1—figure supplement 2 for examples of particles). Here we show that TIRF-SIM can offer significant advantages compared to EM (summarized in Table 1). The improvement in molecular specificity allows an unambiguous identification of viral components; the high signal-to-noise ratio (SNR) furthermore enables a robust and straightforward application of further image analysis steps (identification and classification of virus particles). Also, the capability of investigating unpurified and aqueous samples makes TIRF-SIM ideally suited to the present application.

Table 1. Comparison of the key performance parameters of TIRF-SIM (proposed method) and EM in the context of high throughput imaging of virus structure.

The resolution and acquisition time of EM were quoted for a standard TEM imaging (Philips CM 100 Compustage (FEI) Transmission Electron Microscope with an AMT CCD camera). *for a comparable field-of-view.

Tirf-sim EM
Contrast Fluorescence Electron scattering
Molecular specificity Very high Medium to low
Spatial resolution achievable ~90 nm ~1 Å
Acquisition time/1000 virus particles* 2 s 2 s
Typical field of view size 30 µm x 30 µm 500 nm x 500 nm
Sample preparation complexity Low Low to Medium
Compatibility with aqueous buffers High Low
Compatibility with non-purified samples High Low
Signal to noise ratio achievable Very high Medium
Sample preparation time Low (2–3 hr) Low to High
Expertise required for imaging Medium Medium
Cost Low (£100 k) Medium (£250 k)

Workflow of MiLeSIM

The images obtained with TIRF-SIM show a number of stereotypical virus structures in NDV samples labelled for HN, indicating a large morphological diversity in the virus populations that may stem from variability occurring during viral replication or at the purification stage. Understanding the origins and consequences of such heterogeneity informs not only on the life cycle of the virus but can also provide essential insights into the virus production process to manufacturers of virus-based therapeutics. An automated classification of virus shapes would enable the quantification of virus heterogeneity and permit further analysis of each individual class independently. The workflow to achieve these goals is shown in Figure 2.

Figure 2. Workflow of automated detection, classification and analysis of NDV viral particles.

SIM image (a) and segmented particles (b). The classified single-virus images (b) can be further analysed with a set of class-specific tools (c). For the backbone analysis the mask and backbone are showed in blue and white respectively. For the model-fitting approach (spherical and rod-like), the data and model are showed in green and magenta respectively. LF: long filamentous, SF: short filamentous, LS: large spherical, SS: small spherical, RD: rod-shape, UK: unknown. LF, DLS, LRD and WRD represent the length of the filamentous particles, the diameter of the large spherical, the length of the rod-shaped particles and the width of the rod-shaped particles respectively. Images of individual particles cover a field of view of 1.6 × 1.6 µm.

Figure 2.

Figure 2—figure supplement 1. Image model for the analysis of the rod-shaped particles.

Figure 2—figure supplement 1.

The original image is segmented and thinned to obtain the backbone of the particle. The backbone is up-sampled and interpolated outside the particle. It is then used to compute the model scaffold. For this, each end of the backbone is independently grown or reduced to adjust its length, hence LRD = Ltot – L1 – L2, where Ltot is the maximum length of the extended backbone, and L1 and L2 are the adjusted distances by which the backbone length is adjusted on each end respectively. Then, the image is dilated by a disk-shaped kernel of radius equal to half WRD. The outline of this image gives the model scaffold. The scaffold is then convolved with a Gaussian kernel representing the effect of image resolution (here 90 nm) and the image is down-sampled again to the original image size. The optimal LRD and WRD are those that minimize the difference image and the χ (Müller and Heilemann, 2013).

Individual virus particles are first identified by automated segmentation and then fed to the ML routine for classification. We used a supervised ML algorithm (here a random forest algorithm (Breiman, 2001)) to ensure the robustness of the method and for ease of implementation. We identified six major structural classes in the NDV samples which we divide into long and short filamentous, small and large spherical, rod-like and unknown structures. The unknown class is made of clumps of viral material with no consistent and identifiable shapes. A control sample that was prepared identically to the other samples except without virus particles present allowed us to identify that non-specific bindings of antibodies appear as rare, dim and small point-like structures that could easily be discriminated and excluded from further analysis.

The filamentous (long and short) class is further analysed by automatic extraction of the linear backbone of structures and measurement of their length. The width of these filamentous structures appeared to be limited by the resolution of the imaging technique (~90 nm for TIRF-SIM, see Figure 1—figure supplement 1, but also observed in higher resolution approaches such as dSTORM) and therefore, we considered the filamentous class as 1D structures. The spherical structures were analysed by estimating their equivalent radius from the area of the particle. We note that other methods for estimation of the radius, for example the ellipsoid localization microscopy (ELM) analysis (Manetsberger et al., 2015), could also be used here. The latter fits a shape model to imaging data to permit the extraction of structural parameters with precision higher than the inherent resolution of the imaging method (Laine et al., 2015; Manetsberger et al., 2015). A similar model-based fitting approach was used to fit rod-like viral particles and to obtain length and width parameters for this structural class (see Materials and methods section and Figure 2—figure supplement 1 for details).

Classification of virus structures using supervised machine learning algorithms

The structural classification was performed using a supervised ML algorithm which allows for rapid and automated classification of large datasets. The choice of algorithm and the set of features (often called predictors) extracted for each identified particle were optimised to maximise the overall accuracy of the model based on the training dataset (comprising of 370 manually annotated particles). Here, the model accuracy is defined as the fraction of correctly classified particles across all classes. Figure 3(a) describes the list of chosen individual features (selected from basic shapes features, Hu’s image moments (Hu, 1962), features obtained from the pre-trained convolutional neural network (CNN) AlexNet (Krizhevsky et al., 2012) and from Speeded Up Robust Features, SURF (Bay et al., 2008)). The predictors were selected based on the following criteria: basic structural features of the particles (e.g. area, eccentricity) and Hu’s moments were chosen because they are rotationally and translationally invariant. For the features from AlexNet and SURF a feature selection approach was designed based on maximising the standard deviation across the different structural classes. This approach constitutes a more rational choice compared to simple principal component analysis (PCA), which does not typically take the information regarding the classes into account, therefore our method selects for predictors that have high potential for class discrimination. This data reduction narrowed down the number of predictors to six for AlexNet and six for SURF.

Figure 3. Machine learning-based classification.

(a) Building the list of predictors from basic features, image moments, convolutional neural network (CNN) features and SURF bag of features (BoF). (b) Example of 2D scatter plots of pairs of predictors showing how some predictors allow identification of class clusters. (c) Confusion matrix obtained from the random forest showing the high true positive rate (TPR) and positive predictive values (PPV) of the classification. All numbers shown here are in percentage. (d) Scoring of the predictors sorted in descending order. IM: image moment. AN: AlexNet feature. BoF: SURF features. L1/L2: ratio of long axis over short axis. <I> : average intensity. P/A: perimeter to area ratio. σI: standard deviation of intensity. LF: long filamentous, SF: short filamentous, LS: large spherical, SS: small spherical, RD: rod-shape, UK: unknown.

Figure 3.

Figure 3—figure supplement 1. Flowchart describing the machine learning pipeline used here for image classification.

Figure 3—figure supplement 1.

A strong emphasis should be put on the choice of predictors and the quality of the manual annotation (training dataset) prior to classification, as this will largely determine the quality of the classification.

A total of 24 predictors was finally chosen: seven based on basic shapes (area, ratio of axis lengths, eccentricity, solidity, perimeter-to-area, mean intensity, standard deviation of pixel intensities), 5 of Hu’s image moments (Hu1, Hu4, Hu5, Hu6 and Phi4), six features obtained from the pre-trained convolutional neural network (CNN) AlexNet and six from a SURF bag of features. The classification workflow is described in Figure 3—figure supplement 1.

Panels described in Figure 3(b) show examples of scatter plot from arbitrarily chosen pairs of predictors highlighting that some predictors support classification across specific classes better than other combinations (identifiable clusters of certain classes). Here, the training dataset was used to build a scatter plot of pairs of predictors with the knowledge of their true classification (see colour scheme). For instance, the pair of predictors L1/L2 and Area shows a good separation between long filamentous (dark blue labels) and unknown structures (black labels). The confusion matrix (Figure 3(c)) highlights the effective true positive rate (TPR) and positive predictive values (PPV) across the different classes with a model accuracy of 88.4%. We note that some long filamentous viruses are misclassified as small filamentous, some small filamentous are misclassified as small spherical and that, on occasion, some unknown structures populate the predicted long and large spherical structure classes. Considering the simple shapes of these viruses, it is expected that a small fraction of particles are misclassified as structures with close resemblance.

The scoring of the predictors presented in Figure 3(d) indicates the average accuracy of each individual predictor. A high score indicates a high capacity to discriminate between different classes. The scoring was performed by measuring the accuracy of the classification for many combinations of predictors and distributing the accuracy score across the predictors tested (see Materials and methods for details). In other words, if a combination of 2 predictors alone give an accuracy of 60%, a score of 30% is awarded to both individual predictors. This method was repeated and scores represent averages across >13,000 different combinations of predictors.

Structural details of an NDV virus population

We analysed a total of ~6500 particles using MiLeSIM and established that 49.7% of NDV particles presented a filamentous shape whereas the large spherical, small spherical and rods represent 18.6%, 7.8% and 7.3% of the total population, respectively (Figure 4(a)). In addition to structural classification, the high-resolution images also permitted a dimensional analysis to be performed at the single particle level. We estimated the particle radius from both small and large spherical particles by calculating the equivalent radius from the particle area; backbone extraction to the short and long filamentous particles, to estimate the particle length; and designed a model fitting for the rod structures. Figure 4 shows the distribution of structural parameters for each class. We observe that both long filamentous and large spherical are well described by a Gamma distribution whereas the small filamentous and small spherical are well described by a Gaussian distribution. The model-fitting applied to the rod-shaped particles (see Materials and methods and Figure 2—figure supplement 1 for details) allows the extraction of both the width and length of each particle. Therefore, it is possible to plot the distribution of structural parameters as a contour plot Figure 4(c)).

Figure 4. Quantitative analysis of NDV.

The distribution of structural parameters for all classes was obtained from a total of ~6500 virus particles. LF: long filamentous, SF: short filamentous, LS: large spherical, SS: small spherical, RD: rod-shape, UK: unknown. Images of individual particles cover a field of view of 1.6 × 1.6 µm.

Figure 4—source data 1. Source data for Figure 4.
elife-40183-fig4-data1.xlsx (118.6KB, xlsx)
DOI: 10.7554/eLife.40183.013

Figure 4.

Figure 4—figure supplement 1. ELM analysis of the large spherical NDV.

Figure 4—figure supplement 1.

(a) and B-Victoria LAIV (b) viruses. The distribution of diameters were fitted to a Gamma and Gaussian distributions respectively. The mean diameters and standard deviations of the data are shown.

We estimated the mean and standard deviation of the structural parameters from the distributions and obtained: LLF = 650 ± 430 nm, LSF = 200 ± 100 nm, DLS = 338 ± 94 nm, DSS = 190 ± 10 nm. For the rod-shaped particles, we observed that the width WRD = 135 ± 30 nm and the length LRD = 610 ± 350 nm (all rounded to two significant figures, ± represents the standard deviation of the distribution). These values are distributed around two populations as shown on the contour plot in Figure 4(d).

However, we note that the radius analysis based on the area of the particle used here constitutes an overestimate of the physical radius of the particle due to the broadening caused by the point-spread function. It is possible to estimate a more accurate diameter of the underlying spherical structures by using the ELM analysis. The results obtained from the ELM analysis of the large spherical structures are shown in Figure 4—figure supplement 1(a). The ELM diameter obtained for the large spherical particles (220 ± 69 nm) is in good agreement with an area-based diameter of 338 nm and a resolution of 90 nm.

It should be noted that the small spherical distribution is centred on the value of optical resolution of our SIM microscope, which indicates that the small spherical structures the small spherical structures are smaller than the point-spread-function.

MiLeSIM is capable of assaying influenza strains used for vaccine production in purified and non-purified samples from the production line

We applied our approach to four different strains of Live Attenuated Influenza Virus (LAIV) immuno-labelled for the glycoprotein Hemagglutinin (HA) present on the exterior of the viral envelope. The shape of the virus particles obtained here were classified using the same classifier as for NDV. The LAIV virus population was dominated by spherical structures (>60%). Figure 5 shows the distribution of particle sizes for four virus strains: a B-Victoria subtype (B/Brisbane/60/2008), a B-Yamagata subtype (B/Phuket/3073/2013) and two subtype A H1N1 strains (A/South Dakota/06/07 and A/Bolivia/559/2013). The fractions of small and large spherical particles are shown, as well as the equivalent radii and representative images of the viruses. It is clear that B-Victoria particles consist of mostly large hollow particles with an equivalent radius of ~130 nm, a value that is in good agreement with the ELM analysis and a resolution of 90 nm (Figure 4—figure supplement 1(b)).

Figure 5. MiLeSIM approach applied to Live Attenuated Influenza Virus (LAIV).

2 types of B and A viruses were analysed here. The population was dominated by small and large spherical particles. The distributions of equivalent radius are shown here for both the large and small spherical for direct comparisons. The number of particles analysed were N = 3,821, 4704, 1062 and 1756 for B/Brisbane/60/2008 (B-Victoria), B/Phuket/3073/2013 (B-Yamagata), A/South Dakota/06/2007 and A/Bolivia/559/2013 respectively. Images of individual particles cover a field of view of 1.6 × 1.6 µm.

Figure 5—source data 1. Source data for Figure 5.
elife-40183-fig5-data1.xlsx (168.1KB, xlsx)
DOI: 10.7554/eLife.40183.016

Figure 5.

Figure 5—figure supplement 1. Structural analysis of B-Victoria LAIV obtained from pool harvested fluid (PHF).

Figure 5—figure supplement 1.

(a) TIRF-SIM images. The images acquired here using PHF show an identical image quality as with highly purified samples. (b) Structural analysis of N = 1295 virus particles. Images of individual particles cover a field of view of 1.6 × 1.6 µm.

In contrast, the B-Yamagata strain shows small and large particles of equal amount, indicating that the particles sizes are distributed around the region of overlap between small and large particles. This is confirmed by the nearly identical equivalent radius distributions.

Both A strains appeared clearly dominated by small spherical particles with sizes close to the resolution limit of our imaging. However, our high-throughput approach reveals subtle differences in the distribution of small spherical structures where the A/South Dakota viruses appear more heterogeneous (standard deviation ~10 nm), whereas the A/Bolivia viruses are sharply distributed (standard deviation ~4 nm).

We also investigated the potential of directly imaging pool harvested fluid (PHF). LAIV are commonly propagated in embryonated hens’ eggs where progeny viruses are released into the allantoic fluid of the egg. This fluid is harvested from numerous eggs and pooled. This constitutes a very basic and commonly used virus material. It is easy to produce and does not undergo any downstream purification. Consequently, PHF is impure, containing a variety of egg-derived impurities. The high molecular specificity of fluorescence microscopy allowed us to visualize the structure of the viruses with the same image quality directly in PHF despite the presence of a large amount of impurities (Figure 5—figure supplement 1(a)). The structural analysis of the B-Victoria strains from MVB (Figure 5) and PHF (Figure 5—figure supplement 1(b)) allows us to decipher the effect of purifications steps on the structural properties of the population. The fraction of unknown structures dropped from 23% to 8% between MVB and PHF respectively. This change in fraction of unknown structure may be a result of the different densities of viruses on the cover slip. We observed a lower density of virus particles in the PHF preparations, which may lead to fewer aggregated classes and therefore fewer unidentifiable structures. In addition, whereas the PHF shows a nearly equal amount of small and large spherical structures, the MVB preparation is missing a large population of small spherical compared to the PHF. This is also reflected by the larger average diameters observed in the MVB compared to the PHF (DSS = 191 ± 12 nm and 198 ± 14 nm and DLS = 241 ± 56 nm and 275 ± 49 nm for PHF and MVB respectively).

MiLeSIM therefore enables the study of unpurified samples and allows probing the virus production at any intermediate levels of production and purification. This constitutes a strong advantage over EMtechniques which require the use of highly purified samples and elaborate preparation protocols.

Discussion

We have demonstrated the potential of high-throughput imaging of virus structures, taking advantage of the optimal combination of speed and resolution afforded by the TIRF-SIM imaging method. TIRF-SIM provided sufficient resolution to identify, discriminate and analyse individual viral structural classes with high specificity, even in non-purified samples. Our approach combines machine learning to classify NDV viruses, followed by a model-based or direct quantification of virus structural parameters. The method yielded similar results both in purified samples and in samples from unfiltered PHF offering promise for use as an assay during virus production. We were able to image up to ~220 particles/second at 90 nm resolution, vastly increasing imaging throughput compared to alternative super-resolution methods, improving sensitivity and specificity in comparison to EM. Furthermore EM does not feature the specificity to analyse virus samples in their aqueous, unaltered unpurified forms. We were able to observe large structural variabilities in the NDV population and also between different strains of LAIV.

Our particular classification uses random forest with a selection of predictors from simple shape parameters, rotational and translational invariant image moments and features from AlexNet and common feature for image recognition such as SURF. The model accuracy is ~88.4% and the mis-classifications occur between classes that are similar (between small spherical and small filamentous for instance). The structural parameters that we extract from the model fitting are precise beyond the image resolution as they take into account the finite optical resolution. This therefore reveals subtle differences in populations such as the two sub-classes observed in the rod-shaped class. This approach will be beneficial especially when heterogeneous populations are present and need to be quantified. In future, such information can be correlated with functional characteristics of produced virus classes and production parameters can accordingly be optimised. The approach thus holds great promise for the production of virus-based therapeutics. We note, however, that the methods presented are generally applicable to other systems and they are not restricted to a particular type of fluorescence microscopy, SRM or not.

Materials and methods

Sample preparation

The purified NDV samples were prepared on cover slips as previously described (Laine et al., 2015). Briefly, viruses were adhered on poly-L-lysine-coated Ibidi 8-well dishes, fixed, permeabilised and immuno-labelled for the envelope glycoprotein Hemagglutinin-Neuraminidase (HN) with primary antibodies (mouse anti-hemagglutinin-neuraminidase HN, Abcam, UK) followed by secondary labelling (goat anti-mouse labelled with Alexa Fluor 647 for dSTORM, with Alexa Fluor 488 for TIRF-SIM and with ATTO647-N for STED, Abcam, UK).

The LAIV samples were prepared identically but using primary antibodies originating from MedImmune in-house, non-commercially available monoclonals that target the viral glycoprotein Hemagglutinin (HA) present on the exterior of the viral envelope: F16 mouse antibody for B-Victoria, Infa0121 mouse antibody for B-Yamagata and FY1 human antibody (Kallewaard et al., 2016) for A South Dakota and A Bolivia. The corresponding secondary antibodies were used (donkey anti-mouse DyLight 488 labelled or rabbit anti-human DyLight 488 labelled antibodies, ThermoFisher). All virus samples originated from the monovalent bulk (MVB) and are therefore highly purified, unless indicated in the text, where the direct pool harvest fluid (PHF) was used.

TIRF-SIM, STED and dSTORM imaging

Our custom-built TIRF-SIM system was described previously (Young et al., 2016). We used an Olympus UAPON 100x TIRF NA = 1.49 and an Orca Flash 4.0 camera, with a sample pixel size of 64 nm. A total of 9 SIM images were acquired (three phases, three orientations) with a camera exposure time of 200 ms and ~250 µW of 488 nm laser, measured at the back aperture of the objective. The SIM images were obtained using the reconstruction code provided by Dr Lin Shao (Shao et al., 2011), providing images with doubled resolution and 32 nm final pixel size using a Wiener filter of 0.01. The STED imaging was performed on our custom-built STED microscope as described previously (Mahou et al., 2015). The dSTORM imaging was performed on a custom-built single-molecule microscope previously described (Ströhl et al., 2017; Wong et al., 2017) and with mercaptoethylamine (MEA) buffer as previously described (Laine et al., 2015). The dSTORM image reconstruction was carried out using rapidSTORM 3 (Wolter et al., 2012).

The resolution achieved by the TIRF-SIM microscope was assessed by identifying the edge of the spatial frequency support using the SIMcheck plugin (Ball et al., 2015), as shown in Figure 1—figure supplement 1. For STED microscopy, the resolution was estimated from cross-sections of 20 nm beads and reporting the full width at half maximum (FWHM). The dSTORM resolution reported here was obtained from the FWHM of the localization precision, estimated by (Thompson et al., 2002).

Classification

All segmentations, predictors extractions and classifications were performed using MATLAB (Mathworks). The code is freely available (Laine, 2018). A general diagram of the method is shown in Figure 1—figure supplement 1. The segmentation was obtained by an initial Otsu binarization and refined by active contour. This allowed a better outline of the particles and efficient separation of particles in close proximity. The particles that were judged too small or too dim to be real particles (based on criteria obtained from the control sample) were excluded from further analysis.

The basic shape features were extracted using the MATLAB function regionprops. Hu’s image moments were computed from the 71 × 71 pixels particle image centred on the centre of mass of the particle. The absolute values of the logarithm of the moments were used in the classification. For the features obtained from AlexNet (Krizhevsky et al., 2012), the individual 71 × 71 pixels images were resized to 227 × 227 pixels and used as all three color layers of the RGB images taken by AlexNet. Then, feature extraction was performed using AlexNet as a pre-trained network. 4096 features were obtained and data reduction was performed to limit the number of predictors used. For this, the features were averaged within each individual class and the standard deviation of every feature across the classes was computed. The six features with the highest standard deviation was selected. For the SURF features, first a bag of visual words was created from the training dataset, this bag was then used to check the presence of visual words in the 71 × 71 pixels images of individual particles. Similarly to AlexNet features, we selected only the six visual word features with the highest standard deviation across the different classes for classification. This allowed the computation of a total of 24 features for ML.

The classification was performed using a random forest algorithm. The training dataset was made of 370 manually labelled individual particles and was used to train the random forest across 60 epochs. The classification was validated by 10-fold cross validation on the same dataset. The confusion matrix obtained from this cross-validation is shown in Figure 3. At the training stage, the training dataset was augmented 5-fold by transforming the images with image translation and rotation randomly picked between 0 and 1 pixel and between 0 and 360 degrees respectively.

The accuracy of the model was estimated by calculating the fraction of correctly classified particles across all classes.

accuracy=NumberofparticlescorrectlyclassifiedTotalnumberofparticles

Predictor scoring

The predictors were scored by computing the accuracy of the random forest trained on the training dataset but with only subsets of features. Out of the 24 predictors all combinations of 2, 3, 4, 24, 23 and 22 predictors were tested corresponding to a total of 13,227 combinations of predictors. For each combination of predictors, the accuracy obtained was split equally across the different predictors used, producing a ‘local’ accuracy for each feature. This local score was average across all combinations using a specific feature to obtain the global score.

Si=1Ncij=1NcaijPjnj

Where Si is the global score of the feature i, Nci is the total number of combinations tested involving feature i, aij is a factor reflecting the presence of the feature i in the combination j. aij is equal to one if i is present in j, 0 otherwise. Pj is the accuracy of the combination j, Nc is the total number of combination tested and nj is the number of features present in the combination j.

Quantitative analysis

All quantitative analyses were performed using MATLAB (Mathworks). The code is freely available (Laine, 2018). The length of the filamentous structures were extracted by measuring the geodesic distance along the skeletonized image of the filament. The ELM analysis is freely available (Manetsberger et al., 2015) and the code was adapted to insert within the workflow of our approach. For ELM analysis, we observed no significant ellipticity in the spherical virus particles and fitted spherical shapes to extract the radius of the particles (Figure 4—figure supplement 1).

The equivalent radius r of the spherical particles were simply calculated from the area A of the segmented particle.

r=Aπ

The image model for the rod-shaped particles is presented in Figure 1—figure supplement 1. Briefly, the backbone of the particle was extracted by image thinning and then dilated by a disk-shaped kernel of radius equal to half of the width of the rod. The length of the rod could be adjusted by shortening the ends of the backbone or by extrapolating it outwards to lengthen it. The interior pixels of the image obtained were removed to leave the outline of the particle shape. This outline was then convolved with a Gaussian kernel in order to take into account the effect of the image resolution (here 90 nm). The intensity, the width and length of the model image were adjusted to minimize the sum of the square difference of intensity χ2.

χ2=ijImi,j-Id(i,j)2

Where i and j refer to the indices in the image, Im(i,j) is the image model, and Id(i,j) is the data image.

Supplementary Note 1: Throughput of the method

The imaging throughput of the method can be assessed in terms of number of particles imaged per second. The field-of-view achievable in our TIRF-SIM system is ~32 µm x 32 µm and a high quality sample preparation can yield a virus particle density of ~1 particle/µm2. Therefore, with an acquisition time of 200 ms/SIM raw frame (with a total of 9 frames), we assess that our single frame particle throughput can reach ~500 imaged particles/s. However, the acquisition of two consecutive fields-of-view are affected by imaging dead time as a consequence of stage movement and refocussing. In the study presented here, this step was done manually and took approximately 2–3 s. Therefore, a practical throughput achievable for the imaging is of the order of ~220 particles/s. We note however that both acquisition times and the stage movement time can be easily reduced by increasing illumination power and automation respectively. This makes the 500 particles/s not an unreasonable estimation for the achievable throughput of a further optimised acquisition. The throughput of the method can also be regarded as the time necessary to perform the complete study from sample preparation to analysis. Table 2 indicates typical times necessary to perform the individual steps of the workflow. This table indicates that a full structural analysis of a particular sample can be obtained within a day.

Table 2. Estimation of the time necessary to perform individual steps involved in MiLeSIM.

Sample preparation was estimated based on standard immuno-labelling protocols. The computational times were assessed on an analysis machine with an i7 processor at 3.5 GHz and 64 GB of RAM.

Step Description Time
Sample preparation Plating, permeabilising and immune-labelling of
virus particles
2–3 hr
Instrument set-up Quality check of set-up alignment, calibration and
sample mounting
30 min
Imaging Image acquisition, stage movement and refocus
for ~ 50,000 particles (50 fields-of-view)
30 min
SIM reconstruction SR reconstruction of 50 fields-of-view <30 min
Classification on unknown
data
Extraction of predictors and classification (for 50
fields-of view)
1h
Structural analysis Extraction of structural parameters for each classes
(for 50 fields-of view)
1h
Data curation for training
dataset
Generating manually labelled particle dataset
(performed only once) for ~ 500 particles
2h
Generating classification model Data augmentation, extraction of predictors, training
of the model, cross-validation on ~ 500 particles (performed only once)
5–6 hr

Acknowledgements

RFL and GG were supported by funding from MedImmune Ltd. We thank C F van der Walle for critical reading of the manuscript. CFK acknowledges funding from the UK Engineering and Physical Sciences Research Council, EPSRC (grants EP/L015889/1 and EP/H018301/1), the Wellcome Trust (grants 3–3249/Z/16/Z and 089703/Z/09/Z) and the UK Medical Research Council, MRC (grants MR/K015850/1 and MR/K02292X/1), MedImmune and Infinitus (China) Ltd. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 722380. RFL acknowledges a BBSRC TRDF grant (BB/P027431/1). We would like to thank Hong Jin and Xing Cheng from the Medimmune ID Vaccines group to supply the NDV virus. We would also like to thank Dr Kathryn White and Tracey Davey from the EM Research Services at the Newcastle Medical School for providing examples of TEM images of LAIV samples. Thanks also go to Hong Jin and Dan Pappas for providing TEM images of NDV.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Romain F Laine, Email: r.laine@ucl.ac.uk.

Clemens F Kaminski, Email: cfk23@cam.ac.uk.

Antoine M van Oijen, University of Wollongong, Australia.

Arup K Chakraborty, Massachusetts Institute of Technology, United States.

Funding Information

This paper was supported by the following grants:

  • Engineering and Physical Sciences Research Council EP/L015889/1 to Romain F Laine, Gemma Goodfellow, Laurence J Young, Clemens F Kaminski.

  • Medical Research Council MR/K015850/1 to Romain F Laine, Gemma Goodfellow, Laurence J Young, Clemens F Kaminski.

  • Medical Research Council MR/K02292X/1 to Romain F Laine, Gemma Goodfellow, Laurence J Young, Clemens F Kaminski.

  • Wellcome Trust 3-3249/Z/16/Z to Romain F Laine, Gemma Goodfellow, Laurence J Young, Clemens F Kaminski.

  • Engineering and Physical Sciences Research Council EP/H018301/1 to Romain F Laine, Gemma Goodfellow, Laurence J Young, Clemens F Kaminski.

  • Biotechnology and Biological Sciences Research Council BB/P027431/1 to Romain F Laine.

  • MedImmune to Romain F Laine, Gemma Goodfellow, Jon Travers, Danielle Carroll, Oliver Dibben, Helen Bright.

  • Infinitus to Clemens F Kaminski.

  • Horizon 2020 - Research and Innovation Framework Programme to Clemens F Kaminski.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Data curation, Software, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing.

Conceptualization, Data curation, Software, Methodology.

Resources, Formal analysis, Methodology.

Resources, Methodology.

Resources.

Resources.

Conceptualization, Resources, Funding acquisition, Investigation.

Conceptualization, Resources, Supervision, Funding acquisition, Investigation, Writing—original draft, Project administration, Writing—review and editing.

Additional files

Transparent reporting form
DOI: 10.7554/eLife.40183.018

Data availability

Analysis code for image segmentation, machine learning training, classification and structural analysis is available via GitHub (https://github.com/Romain-Laine/MiLeSIM; copy archived at https://github.com/elifesciences-publications/MiLeSIM). Source data files have been provided for Figures 4 and 5.

References

  1. Albecka A, Laine RF, Janssen AF, Kaminski CF, Crump CM. HSV-1 Glycoproteins Are Delivered to Virus Assembly Sites Through Dynamin-Dependent Endocytosis. Traffic. 2016;17:21–39. doi: 10.1111/tra.12340. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Ball G, Demmerle J, Kaufmann R, Davis I, Dobbie IM, Schermelleh L. SIMcheck: a toolbox for successful super-resolution structured illumination microscopy. Scientific Reports. 2015;5:15915. doi: 10.1038/srep15915. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Bay H, Ess A, Tuytelaars T, Van Gool L. Speeded-Up Robust Features (SURF) Computer Vision and Image Understanding. 2008;110:346–359. doi: 10.1016/j.cviu.2007.09.014. [DOI] [Google Scholar]
  4. Betzig E, Patterson GH, Sougrat R, Lindwasser OW, Olenych S, Bonifacino JS, Davidson MW, Lippincott-Schwartz J, Hess HF. Imaging intracellular fluorescent proteins at nanometer resolution. Science. 2006;313:1642–1645. doi: 10.1126/science.1127344. [DOI] [PubMed] [Google Scholar]
  5. Breiman L. Random Forests. Machine Learning. 2001;45:5–32. [Google Scholar]
  6. Brenner S, Horne RW. A negative staining method for high resolution electron microscopy of viruses. Biochimica Et Biophysica Acta. 1959;34:103–110. doi: 10.1016/0006-3002(59)90237-9. [DOI] [PubMed] [Google Scholar]
  7. Gad SC. Handbook of Pharmaceutical Biotechnology. Wiley Online Library; 2007. [Google Scholar]
  8. Ganar K, Das M, Sinha S, Kumar S. Newcastle disease virus: current status and our understanding. Virus Research. 2014;184:71–81. doi: 10.1016/j.virusres.2014.02.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Goldsmith CS, Miller SE. Modern uses of electron microscopy for detection of viruses. Clinical Microbiology Reviews. 2009;22:552–563. doi: 10.1128/CMR.00027-09. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Gray RD, Beerli C, Pereira PM, Scherer KM, Samolej J, Bleck CK, Mercer J, Henriques R. VirusMapper: open-source nanoscale mapping of viral architecture through super-resolution microscopy. Scientific Reports. 2016;6:29132. doi: 10.1038/srep29132. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Gustafsson MG. Surpassing the lateral resolution limit by a factor of two using structured illumination microscopy. Journal of Microscopy. 2000;198:82–87. doi: 10.1046/j.1365-2818.2000.00710.x. [DOI] [PubMed] [Google Scholar]
  12. Heilemann M, van de Linde S, Schüttpelz M, Kasper R, Seefeldt B, Mukherjee A, Tinnefeld P, Sauer M. Subdiffraction-resolution fluorescence imaging with conventional fluorescent probes. Angewandte Chemie International Edition. 2008;47:6172–6176. doi: 10.1002/anie.200802376. [DOI] [PubMed] [Google Scholar]
  13. Heintzmann R, Cremer CG. Laterally modulated excitation microscopy: improvement of resolution by using a diffraction grating. In: Bigio I. J, Schneckenburger H, Slavik J, Svanberg K, Viallet P. M, editors. In Proc.SPIE 3568, Optical Biopses and Microscopic Techniques III. Spie digital library; 1999. pp. 185–196. [DOI] [Google Scholar]
  14. Hell SW, Wichmann J. Breaking the diffraction resolution limit by stimulated emission: stimulated-emission-depletion fluorescence microscopy. Optics Letters. 1994;19:780. doi: 10.1364/OL.19.000780. [DOI] [PubMed] [Google Scholar]
  15. Horsington J, Turnbull L, Whitchurch CB, Newsome TP. Sub-viral imaging of vaccinia virus using super-resolution microscopy. Journal of Virological Methods. 2012;186:132–136. doi: 10.1016/j.jviromet.2012.07.003. [DOI] [PubMed] [Google Scholar]
  16. Hu M-K. Visual pattern recognition by moment invariants. Inf. Theory, IEEE Trans. 1962;8:179–187. doi: 10.1109/TIT.1962.1057692. [DOI] [Google Scholar]
  17. Kallewaard NL, Corti D, Collins PJ, Neu U, McAuliffe JM, Benjamin E, Wachter-Rosati L, Palmer-Hill FJ, Yuan AQ, Walker PA, Vorlaender MK, Bianchi S, Guarino B, De Marco A, Vanzetta F, Agatic G, Foglierini M, Pinna D, Fernandez-Rodriguez B, Fruehwirth A, Silacci C, Ogrodowicz RW, Martin SR, Sallusto F, Suzich JA, Lanzavecchia A, Zhu Q, Gamblin SJ, Skehel JJ. Structure and Function Analysis of an Antibody Recognizing All Influenza A Subtypes. Cell. 2016;166:596–608. doi: 10.1016/j.cell.2016.05.073. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Kner P, Chhun BB, Griffis ER, Winoto L, Gustafsson MG. Super-resolution video microscopy of live cells by structured illumination. Nature Methods. 2009;6:339–342. doi: 10.1038/nmeth.1324. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Krizhevsky A, Sutskever I, Hinton GE. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems. 2012;9 [Google Scholar]
  20. Laine RF, Albecka A, van de Linde S, Rees EJ, Crump CM, Kaminski CF. Structural analysis of herpes simplex virus by optical super-resolution imaging. Nature Communications. 2015;6:5980. doi: 10.1038/ncomms6980. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Laine RF. MiLeSIM. bc9ef5fGitHub. 2018 https://github.com/Romain-Laine/MiLeSIM
  22. Lichty BD, Breitbach CJ, Stojdl DF, Bell JC. Going viral with cancer immunotherapy. Nature Reviews Cancer. 2014;14:559–567. doi: 10.1038/nrc3770. [DOI] [PubMed] [Google Scholar]
  23. Mahou P, Curry N, Pinotsi D, Kaminski Schierle G, Kaminski C. Stimulated Emission Depletion Microscopy to Study Amyloid Fibril Formation. In: Enderlein J, Gregor I, Gryczynski Z. K, Erdmann R, Koberling F, editors. In SPIE Conference Series. 93310U. Laser Analytics group; 2015. [Google Scholar]
  24. Manetsberger J, Manton JD, Erdelyi MJ, Lin H, Rees D, Christie G, Rees EJ. Ellipsoid Localization Microscopy Infers the Size and Order of Protein Layers in Bacillus Spore Coats. Biophysical Journal. 2015;109:2058–2066. doi: 10.1016/j.bpj.2015.09.023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Müller B, Heilemann M. Shedding new light on viruses: super-resolution microscopy for studying human immunodeficiency virus. Trends in Microbiology. 2013;21:522–533. doi: 10.1016/j.tim.2013.06.010. [DOI] [PubMed] [Google Scholar]
  26. Rust MJ, Bates M, Zhuang X. Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM) Nature Methods. 2006;3:793–796. doi: 10.1038/nmeth929. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Shao L, Kner P, Rego EH, Gustafsson MG. Super-resolution 3D microscopy of live whole cells using structured illumination. Nature Methods. 2011;8:1044–1046. doi: 10.1038/nmeth.1734. [DOI] [PubMed] [Google Scholar]
  28. Sommer C, Straehle C, Kothe U, Hamprecht FA. Ilastik: Interactive learning and segmentation toolkit. 2011 IEEE International Symposium on Biomedical Imaging From Nano to Macro; 2011. pp. 230–233. [DOI] [Google Scholar]
  29. Ströhl F, Lin JQ, Laine RF, Wong HH, Urbančič V, Cagnetta R, Holt CE, Kaminski CF. Single Molecule Translation Imaging Visualizes the Dynamics of Local β-Actin Synthesis in Retinal Axons. Scientific Reports. 2017;7:709. doi: 10.1038/s41598-017-00695-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Thompson RE, Larson DR, Webb WW. Precise nanometer localization analysis for individual fluorescent probes. Biophysical Journal. 2002;82:2775–2783. doi: 10.1016/S0006-3495(02)75618-X. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Van Valen DA, Kudo T, Lane KM, Macklin DN, Quach NT, DeFelice MM, Maayan I, Tanouchi Y, Ashley EA, Covert MW. Deep Learning Automates the Quantitative Analysis of Individual Cells in Live-Cell Imaging Experiments. PLOS Computational Biology. 2016;12:e1005177. doi: 10.1371/journal.pcbi.1005177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Wolter S, Löschberger A, Holm T, Aufmkolk S, Dabauvalle MC, van de Linde S, Sauer M. rapidSTORM: accurate, fast open-source software for localization microscopy. Nature Methods. 2012;9:1040–1041. doi: 10.1038/nmeth.2224. [DOI] [PubMed] [Google Scholar]
  33. Wong HH, Lin JQ, Ströhl F, Roque CG, Cioni JM, Cagnetta R, Turner-Bridger B, Laine RF, Harris WA, Kaminski CF, Holt CE. RNA docking and local translation regulate site-specific axon remodeling in vivo. Neuron. 2017;95:852–868. doi: 10.1016/j.neuron.2017.07.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Young LJ, Ströhl F, Kaminski CF. A Guide to structured illumination tirf microscopy at high speed with multiple colors. Journal of Visualized Experiments. 2016 doi: 10.3791/53988. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Zhang Z, Kenny SJ, Hauser M, Li W, Xu K. Ultrahigh-throughput single-molecule spectroscopy and spectrally resolved super-resolution microscopy. Nature Methods. 2015;12:935–938. doi: 10.1038/nmeth.3528. [DOI] [PubMed] [Google Scholar]

Decision letter

Editor: Antoine M van Oijen1
Reviewed by: Peter Dedecker2

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Structured illumination microscopy combined with machine learning for the high throughput analysis of virus structure" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Arup Chakraborty as the Senior Editor. The following individual involved in review of your submission has agreed to reveal his identity: Peter Dedecker (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission. We hope you will be able to submit the revised version within two months.

Summary:

The authors describe the use of TIRF-SIM to visualize structural heterogeneity within populations of viral particles and use machine-learning approaches to classify the different morphologies. With the speed of data acquisition and the automated classification, the method is an interesting approach to introduce robust and fast quality control of production of viral particles in an industrial setting (e.g. vaccine production).

Essential revisions:

1) Since this manuscript is a submission for the Tools and Resources section of the journal, there is less of a requirement for new mechanistic insight, but an essential need for a clear justification of why the methodology will be better than what is currently used. The reviewers commented on a lack of clear justification of the work and the need to place it in context of methods that are currently used to characterise morphologies in heterogeneous populations of viral particles. Importantly, the work should provide a quantitative analysis of how the described workflow performs compared to established workflows (the authors mention electron microscopy).

2) Also, the manuscript lacks detail in many places to truly understand what the authors did and in what order.

Not until the Materials and methods does it become clear what was being imaged in the paper, and even then, a description of the primary antibodies is not very informative with respect to what structures are being identified (envelope, capsid…?). A brief mention of what is being measured early would greatly improve the readability of the paper.

More details regarding the high throughput nature of the measurements are needed. How do the authors achieve the 500 particles/second? Do they have 1000 particles in a field of view? (1000 particles/1.8 s data collection time). How long does it take to switch between fields of view? How long does the software need to extract the 500 super resolution particles? The authors should also mention how long the analysis takes (both for training and analyzing the data with the trained algorithm).

3) The described method for feature selection hinges on the evaluation of their predictive power, though no clear definition is provided of how this is calculated.

The main issue is with Figure 4C, showing the radii of large and small spherical particles. First of all, LS and SS both seem to have the same average radius. Why? The authors also remark that "It may first appear surprising that the distribution of radius of small spherical lies within that of the large spherical but in the ELM analysis the broadening of the image structure due to the finite optical resolution is effectively removed by taking into account the point spread function (PSF)." We don't understand this argument. Consider also that the example SS and LS images (e.g. Figure 4A) are clearly different in size. Are these images then not representative?

In the next sentence, the authors argue that the SS distribution is centered on the optical resolution of the instrument, suggesting that the SS particles are effectively point-like. First of all, the LS distribution is also centered on this value, so the conclusion is that these are also point-like? Second of all, the ELM analysis is supposed to correct for this, judging by the immediately preceding sentence? It is unclear what the authors are trying to argue here.

4) When analyzing the viruses in the pool harvested fluid, the authors should quantify the percentage of unknown objects compared to measurements with purified samples. The authors could also use their method to see what impact purification has on the distribution of structures in the sample.

5) Figure 4. Extraction of the size distribution is an interesting aspect of this work. However, it is problematic that the radius of the larger spheres is often smaller than the small spheres. Particularly for spherical particles, it is possible to determine radii with better resolution than the resolution limit when using deconvolution especially for the virus, which is on the size scale of the PSF. If it is really a point-like structure, this suggest that you are not measuring viruses. Could it be unspecific binding of the secondary antibodies to the surface or individual proteins that are not associated to the viruses in this class?

eLife. 2018 Dec 13;7:e40183. doi: 10.7554/eLife.40183.021

Author response


Essential revisions:

1) Since this manuscript is a submission for the Tools and Resources section of the journal, there is less of a requirement for new mechanistic insight, but an essential need for a clear justification of why the methodology will be better than what is currently used. The reviewers commented on a lack of clear justification of the work and the need to place it in context of methods that are currently used to characterise morphologies in heterogeneous populations of viral particles. Importantly, the work should provide a quantitative analysis of how the described workflow performs compared to established workflows (the authors mention electron microscopy).

We have now included a comparison between electron microscopy and our approach with a number of comments throughout the text and have also included a table summarising important performance parameters of the two methods (Table 1).

“Typically, this is achieved by extracting batches from the production process, with elaborate subsequent purification and preparation steps before characterisation by Transmission Electron Microscopy (TEM) (Gad, 2007; Goldsmith and Miller 2009; Brenner and Horne, 1959. […] It is therefore challenging for TEM to be of practical use during production operations.”

“Traditionally, EM has been the method of choice for observing sub-diffraction structures of virus particles (see Figure 1—figure supplement 2 for examples of particles). […] Also, the capability of investigating unpurified and aqueous samples makes TIRF-SIM ideally suited to the present application.”

“We were able to image up to ~220 particles/second at 90 nm resolution, vastly increasing imaging throughput compared to alternative super-resolution methods, improving sensitivity and specificity in comparison to TEM. Furthermore EM does not feature the specificity to analyse virus samples in their aqueous, unaltered unpurified forms.”

2) Also, the manuscript lacks detail in many places to truly understand what the authors did and in what order.

Not until the Materials and methods does it become clear what was being imaged in the paper, and even then, a description of the primary antibodies is not very informative with respect to what structures are being identified (envelope, capsid…?). A brief mention of what is being measured early would greatly improve the readability of the paper.

We have worked on the main text to improve clarity and flow of information. Regarding antibodies, NDV viruses were labelled by targeting Hemagglutinin-Neuraminidase (HN) via primary/secondary antibody labelling. HN is a viral envelope glycoprotein and therefore identifies the outline and overall shape of the virus particles. Additional details were added to the Results and Materials and methods sections.

“NDV viruses were labelled for the envelope glycoprotein Hemagglutinin-Neuraminidase (HN) and imaged with all three SRM imaging techniques (see Figure 1). […] Typical shapes observed with TIRF-SIM are shown in Figure 1B. TIRF-SIM provides clear structural details to discern filamentous, spherical and rod-like structures in large NDV populations.”

“The images obtained with TIRF-SIM show a number of stereotypical virus structures in NDV samples labelled for HN, […]”

For influenza, we would like to clarify that primary antibodies were all MedImmune inhouse, non-commercially available monoclonals that target the viral glycoprotein Hemagglutinin (HA) present on the exterior of the viral envelope. Additional details were added to the Results and Materials and methods sections.

“We applied our approach to four different strains of Live Attenuated Influenza Virus (LAIV) immuno-labelled for the glycoprotein Hemagglutinin (HA) present on the exterior of the viral envelope. The shape of the virus particles

obtained here were classified using the same classifier as for NDV.”

More details regarding the high throughput nature of the measurements are needed. How do the authors achieve the 500 particles/second? Do they have 1000 particles in a field of view? (1000 particles/1.8 s data collection time). How long does it take to switch between fields of view? How long does the software need to extract the 500 super resolution particles? The authors should also mention how long the analysis takes (both for training and analyzing the data with the trained algorithm).

We agree that aspects of imaging throughput should have been presented in more detail. We have now added a discussion on throughput achievable with our method in a supplementary note (Supplementary Note 1). The value originally quoted as 500 particles/s was based on an estimation of the maximal achievable throughput for the imaging conditions presented in the manuscript. As the reviewers point out, this estimation is based on ~1000 / field-of-view and a ~2s acquisition, conditions that are readily achievable in practice. In the examples shown, the microscope stage was moved and the field refocused manually, which took an average of ~2-3s. This therefore brings the actual imaging throughput to ~ 220 particles/s, which is what we now quote in the manuscript. However this value can be easily improved on through a reduction of acquisition times and automation of stage movement. This makes the 500 particles/s not an unreasonable estimation for a possible throughput.

Supplementary Note 1 also includes estimations of the time necessary for the different steps of the workflow proposed here.

3) The described method for feature selection hinges on the evaluation of their predictive power, though no clear definition is provided of how this is calculated.

Here, we defined the predictive power as the accuracy of the model. This describes the fraction of correctly classified particles, which corresponds to the sum of the correctly predicted fractions in the confusion matrix shown in Figure 3C. We now name our metric “accuracy” (instead of predictive power) and give a definition at its first instance.

“The choice of algorithm and the set of features (often called predictors) extracted for each identified particle were optimised to maximise the overall accuracy of the model based on the training dataset (comprising of 370 manually annotated particles). Here, the model accuracy is defined as the fraction of correctly classified particles across all classes.”

We also explicitly define what we mean by accuracy in the Materials and methods section.

“The accuracy of the model was estimated by calculating the fraction of correctly classified particles across all classes.

accuracy=NumberofparticlescorrectlyclassifiedTotalnumberofparticles

The main issue is with Figure 4C, showing the radii of large and small spherical particles. First of all, LS and SS both seem to have the same average radius. Why? The authors also remark that "It may first appear surprising that the distribution of radius of small spherical lies within that of the large spherical but in the ELM analysis the broadening of the image structure due to the finite optical resolution is effectively removed by taking into account the point spread function (PSF)." We don't understand this argument. Consider also that the example SS and LS images (e.g. Figure 4A) are clearly different in size. Are these images then not representative?

We agree that the way we presented the results for the spherical structures was confusing. The confusion came from extracting the radii for large and small spherical particles using different analysis methods and plotting them on the same graph. In the original manuscript, the radius obtained for the small spherical was extracted from an area analysis whereas the radius shown for the large spherical originated from the ELM analysis. The radii obtained by these 2 methods represent two different structural parameter for the images.

We have therefore changed Figure 4 to show only radius estimation by area analysis for both small and large spherical particles in Figure 4C.

The average diameters are now clearly separated: DLS = 338 ± 94 nm, DSS = 190 ± 10 nm.

We also added the following clarifications.

“The spherical structures were analysed by estimating their equivalent radius from the area of the particle. We note that other methods for estimation of the radius, e.g. the ellipsoid localization microscopy (ELM) analysis (Manetsberger et al., 2015), could also be used here.”

And the confusing section has now been deleted.

The ELM analysis now appears in Figure 4—figure supplement 1A and is introduced in the main text as the following paragraph.

“However, we note that the radius analysis based on the area of the particle used here constitutes an overestimate of the physical radius of the particle due to the broadening caused by the point-spread function. […] The ELM diameter obtained for the large spherical particles (220 nm ± 69 nm) is in good agreement with an area-based diameter of 338 nm and a resolution of 90 nm.”

In the next sentence, the authors argue that the SS distribution is centered on the optical resolution of the instrument, suggesting that the SS particles are effectively point-like. First of all, the LS distribution is also centered on this value, so the conclusion is that these are also point-like? Second of all, the ELM analysis is supposed to correct for this, judging by the immediately preceding sentence? It is unclear what the authors are trying to argue here.

We hope that the amended presentation of the data addresses this point clearly. By using the equivalent (area-based) radius analysis for both small and large spherical particles, the distinction between the two classes is now clear, see the amended Figure 4.

The ELM analysis is now presented as an alternative analysis for the estimation of the radius in Figure 4—figure supplement 1. See the comment above and comment 5) below for further discussion on ELM.

4) When analyzing the viruses in the pool harvested fluid, the authors should quantify the percentage of unknown objects compared to measurements with purified samples. The authors could also use their method to see what impact purification has on the distribution of structures in the sample.

We now present the results of the structural analysis of B-Victoria from PHF in Figure 5—figure supplement 1B and describe our observations in terms of unknown objects and changes in distribution of structures.

“The high molecular specificity of fluorescence microscopy allowed us to visualize the structure of the viruses with the same image quality directly in PHF despite the presence of a large amount of impurities (Figure 5—figure supplement 1A). […] This is also reflected by the larger average diameters observed in the MVB compared to the PHF (DSS = 191 ± 12 nm and 198 ± 14 nm and DLS = 241 ± 56 nm and 275 ± 49 nm for PHF and MVB respectively).”

5) Figure 4. Extraction of the size distribution is an interesting aspect of this work. However, it is problematic that the radius of the larger spheres is often smaller than the small spheres. Particularly for spherical particles, it is possible to determine radii with better resolution than the resolution limit when using deconvolution especially for the virus, which is on the size scale of the PSF. If it is really a point-like structure, this suggest that you are not measuring viruses. Could it be unspecific binding of the secondary antibodies to the surface or individual proteins that are not associated to the viruses in this class?

With the amended analysis, now using equivalent diameter for both large and small spherical particles, the two classes are clearly discriminated and, as expected, the radii of large spherical particles are distributed around a range of radii larger than those of the small spherical particles (see amended Figure 4C). The reviewers mention the possibility to extract radius with better precision than the resolution limit. This is the intention with the ELM analysis, which takes the PSF into account (to some extent in a similar way to a deconvolution operation, but here based on a parametric shape modelling of the sample). The ELM analysis is now shown in Figure 4—figure supplement 1 for both the large spherical of NDV and that of B-Victoria.

As for unspecific binding, we excluded this possibility of this by measuring a control sample (sample prepared identically to the virus samples but in the absence of virus particles) and having a thresholding step prior to further analysis to exclude background signal. This point was made clearer in the main text:

“A control sample that was prepared identically to the other samples except without virus particles present allowed us to identify that non-specific bindings of antibodies appear as rare, dim and small point-like structures that could easily be discriminated and excluded from further analysis.”

As well as in the Materials and methods:

“The particles that were judged too small or too dim to be real particles (based on criteria obtained from the control sample) were excluded from further analysis.”

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Figure 4—source data 1. Source data for Figure 4.
    elife-40183-fig4-data1.xlsx (118.6KB, xlsx)
    DOI: 10.7554/eLife.40183.013
    Figure 5—source data 1. Source data for Figure 5.
    elife-40183-fig5-data1.xlsx (168.1KB, xlsx)
    DOI: 10.7554/eLife.40183.016
    Transparent reporting form
    DOI: 10.7554/eLife.40183.018

    Data Availability Statement

    Analysis code for image segmentation, machine learning training, classification and structural analysis is available via GitHub (https://github.com/Romain-Laine/MiLeSIM; copy archived at https://github.com/elifesciences-publications/MiLeSIM). Source data files have been provided for Figures 4 and 5.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd

    RESOURCES