Figure - PMC

2021 Mar 11;121(8):4561–4677. doi: 10.1021/acs.chemrev.0c00752

© 2021 The Authors. Published by American Chemical Society

Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).


Basic principles of data analysis: data exploration and statistical modeling. (A) Input and output data matrix. (B) Scatterplot of the number of cells as a function of stiffness and WCA. Each sample is one data point. (C) Hierarchical clustering of the 6 data points into a dendrogram. The dendrogram, or tree, can be cut at any height to determine data clusters. Cutting the tree at the highest level (dashed line) results in 2 clusters (samples 1–3–6 and 2–4–5). Cutting the tree at a lower level (broken line) results in 4 clusters (samples 1–3, 6, 2–4, and 5). (D) On the basis of stiffness, the data points can be classified into two groups: high # cells and low # cells. (E) Simple linear regression, in which the dependent variable Y (here: the number of cells) is predicted from the independent variable X (here: stiffness). (F) Classification of data points into two classes (open and solid circles) with underfitted (gray line), reasonable (black curve), and overfitted (dashed curve) decision boundaries. The overfitted boundary is influenced by a single black data point (arrow) and would classify a new point (blue dot) as a solid circle, which would probably be an error. (G) Example of K-fold cross-validation with K = 5. Reprinted with permission from refs (55) and (972). Copyright 2017 Elsevier, Ltd. and 2016 Nature Publishing Group.
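
The ideas in panels E and G can be combined in a short sketch: a simple linear regression of cell number on stiffness, evaluated with K-fold cross-validation at K = 5. This is only an illustration, not the procedure used for the figure. The data values and the function names `fit_line`, `k_fold_indices`, and `cross_validate` are hypothetical and do not come from the original article.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def k_fold_indices(n, k):
    """Split the indices 0..n-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, k=5):
    """Return the mean squared prediction error on each held-out fold."""
    errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        # Fit only on the samples that are NOT in the current fold.
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        slope, intercept = fit_line(train_x, train_y)
        # Score the fitted line on the held-out samples.
        errors.append(
            mean((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx)
        )
    return errors


# Hypothetical measurements: X = stiffness (arbitrary units), Y = number of cells.
stiffness = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
cells = [12, 15, 21, 22, 30, 31, 38, 41, 44, 52]

fold_errors = cross_validate(stiffness, cells, k=5)
print("MSE per fold:", fold_errors)
print("Mean cross-validated MSE:", mean(fold_errors))
```

Each sample is held out exactly once, so the averaged error estimates how well the fitted line predicts data it has not seen, which is the property that separates the reasonable from the overfitted model in panel F. The folds here are contiguous for readability; in practice the samples would usually be shuffled before splitting.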