Journal of Applied Statistics. 2023 Nov 24;51(12):2489–2491. doi: 10.1080/02664763.2023.2286426

Post-shrinkage strategies in statistical and machine learning for high dimensional data

by Syed Ejaz Ahmed, Feryaal Ahmed and Bahadir Yüzbaşı, New York: Chapman and Hall/CRC Press, 2023, 408 pp., US$112.50 (hardback), US$48.71 (eBook), ISBN 9780367763442, eBook ISBN 9781003170259, https://doi.org/10.1201/9781003170259

Reviewed by: Shuangzhe Liu and Tiefeng Ma
PMCID: PMC11389623

In the contemporary landscape of data-driven insights, the challenge of extracting meaningful patterns from vast datasets has become paramount. Addressing the complexities of high-dimensional data analysis, Syed Ejaz Ahmed, Feryaal Ahmed and Bahadir Yüzbaşı’s (2023) book ‘Post-Shrinkage Strategies in Statistical and Machine Learning for High Dimensional Data’, published by Chapman and Hall/CRC Press, offers a guide to extracting valuable insights and navigating the intricate landscape of modern data science. With a focus on both theory and practical applications, the book provides valuable guidance for a diverse audience, including statisticians, data analysts, practitioners, and students across various disciplines.

The book’s primary objective revolves around untangling the complexities inherent in high-dimensional data analysis. It addresses questions that resonate with today’s data analysts: How can we harness the power of big data to unveil meaningful patterns? How do we effectively analyse data with an abundance of predictors and a limited number of observations? The authors carefully guide readers through a range of tools, techniques, challenges, and opportunities, enabling them to make informed decisions and contribute to this ever-evolving field of data science.

A notable feature of this book is its framework for high-dimensional shrinkage analysis, which accommodates both strong and weak signals within a non-binary selection process. This approach empowers practitioners to analyse data in scenarios where full sparsity may not be attainable. By integrating statistical and machine learning techniques, the authors present post-estimation and prediction strategies that require only a mild assumption of weak sparsity. These strategies are tailored to various models with practical applications in applied statistics and data science, and build on penalised estimators including the LASSO, Adaptive LASSO, Elastic Net, Smoothly Clipped Absolute Deviation (SCAD), Minimax Concave Penalty (MCP), Ridge, and Bridge estimators; a brief illustration of several of these is sketched below.
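
To make this family of penalised estimators concrete, here is a minimal Python sketch (the book itself works in R) fitting three of them to simulated data containing both strong and weak signals; all settings are illustrative assumptions, and SCAD, MCP, and Bridge are omitted because scikit-learn does not implement them:

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge, ElasticNet

    rng = np.random.default_rng(0)
    n, p = 100, 200                      # fewer observations than predictors
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = 3.0                       # strong signals
    beta[5:15] = 0.3                     # weak signals
    y = X @ beta + rng.standard_normal(n)

    for name, model in [('LASSO', Lasso(alpha=0.1)),
                        ('Ridge', Ridge(alpha=1.0)),
                        ('Elastic Net', ElasticNet(alpha=0.1, l1_ratio=0.5))]:
        fit = model.fit(X, y)
        nonzero = int(np.sum(np.abs(fit.coef_) > 1e-8))
        error = float(np.linalg.norm(fit.coef_ - beta))
        print(f'{name}: {nonzero} nonzero coefficients, estimation error {error:.2f}')

Note how the LASSO is apt to discard the weak signals entirely; recovering the efficiency lost in exactly such cases is the motivation for the post-shrinkage strategies developed in the book.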

Central to the book’s narrative is the exploration of bias issues commonly encountered in machine learning and statistical practice. The authors mitigate this bias through an effective shrinkage methodology that combines submodels selected by penalised methods with the feature-rich full model.
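
The combination at the heart of this methodology can be illustrated with a James–Stein-type shrinkage estimator of the general form used in the pretest/shrinkage literature (see [1]). The sketch below is a simplified illustration under assumed names, not the book’s exact estimator: beta_fm is the full-model estimate, beta_sm the submodel estimate produced after a penalised method has set some coefficients to zero, k the number of restricted coefficients, and t_n a test statistic measuring the distance between the two fits.

    import numpy as np

    def shrinkage_estimator(beta_fm, beta_sm, k, t_n):
        # Shrink the full-model estimate towards the submodel estimate:
        #   beta_sm + (1 - (k - 2) / t_n) * (beta_fm - beta_sm).
        # A large t_n (strong evidence against the submodel restriction)
        # pulls the result back towards the full-model estimate.
        factor = 1.0 - (k - 2) / t_n
        return beta_sm + factor * (beta_fm - beta_sm)

    def positive_part_shrinkage(beta_fm, beta_sm, k, t_n):
        # Positive-part variant: truncates the factor at zero so the
        # estimator never over-shrinks past the submodel.
        factor = max(0.0, 1.0 - (k - 2) / t_n)
        return beta_sm + factor * (beta_fm - beta_sm)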

The book caters comprehensively to students, researchers, and practitioners across a number of disciplines. Its lucid definitions and accessible strategies for estimation and prediction within data science and regression analysis render it a valuable resource for advanced undergraduate and graduate courses. Moreover, the book’s wealth of tangible ideas, techniques, real-life examples, graphs, and R code makes it a key practical reference.

The 10 chapters cover a range of topics in both low- and high-dimensional data analysis. Chapter 1 provides an introduction to big data, with a specific focus on high-dimensional data analysis and relevant regularisation approaches, setting the stage for subsequent discussions. Chapter 2 delves into popular statistical and machine learning techniques, including unsupervised learning methods such as principal component analysis and k-means clustering. This chapter is well written and provides a clear review of the techniques.

Chapter 3 examines estimation strategies for parameters in multiple regression models, emphasising the integration of full-model and submodel estimation through classical shrinkage strategies. It considers the scenario where models encompass strong, weak, and zero signals. Chapter 4 addresses model selection, parameter estimation, and prediction problems in high-dimensional regression models, where the number of predictors recorded for each observation exceeds the sample size. The chapter also explores these problems in sparse regression models in which many potential predictors may have little or no influence on the response of interest.

Chapters 5 to 9 explore the application of shrinkage strategies to various models, including partially linear models, generalised linear models, sparse linear mixed models, sparse nonlinear regression models, and sparse robust regression models. These chapters incorporate Monte Carlo simulation studies to assess the relative performance of the discussed estimators and provide results on the asymptotic distributional bias, quadratic asymptotic distributional bias, and asymptotic distributional risk of these estimators.
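
These Monte Carlo comparisons are typically summarised through simulated relative efficiency, the risk of a benchmark estimator divided by that of a competitor. As a toy illustration of the general approach (assumed settings, not the book’s simulation code), the following sketch compares full-model least squares with the positive-part shrinkage estimator defined above when the last k coefficients are truly zero:

    import numpy as np

    rng = np.random.default_rng(1)
    n, p, k, reps = 50, 10, 6, 2000      # submodel restricts the last k coefficients
    beta = np.r_[np.ones(p - k), np.zeros(k)]
    mse_fm = mse_shrink = 0.0

    for _ in range(reps):
        X = rng.standard_normal((n, p))
        y = X @ beta + rng.standard_normal(n)
        beta_fm, *_ = np.linalg.lstsq(X, y, rcond=None)   # full-model OLS
        beta_sm = beta_fm.copy()
        beta_sm[p - k:] = 0.0                             # submodel: zeros imposed
        diff = beta_fm - beta_sm
        t_n = float(diff @ X.T @ X @ diff)                # Wald-type distance statistic
        factor = max(0.0, 1.0 - (k - 2) / t_n)            # positive-part shrinkage
        beta_shrink = beta_sm + factor * diff
        mse_fm += np.sum((beta_fm - beta) ** 2)
        mse_shrink += np.sum((beta_shrink - beta) ** 2)

    print(f'relative efficiency (full model / shrinkage): {mse_fm / mse_shrink:.3f}')

A ratio above one indicates that the shrinkage estimator achieves smaller simulated risk than the full-model estimator in this sparse setting.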

In the final chapter, the relative performance of the Liu estimator and its variants, along with shrinkage and penalty estimators, is examined within the context of multiple linear regression models whose design matrix exhibits multicollinearity and which may additionally be sparse. The chapter presents the asymptotic theory of the Liu regression estimator and the shrinkage estimators, with numerical analysis based on simulation studies corroborating the conclusions drawn from the asymptotic properties of the estimators.
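
For readers unfamiliar with it, the Liu estimator augments least squares with a biasing parameter 0 < d < 1 to stabilise estimation when X'X is ill-conditioned; a minimal sketch of its standard form, with names chosen here for illustration, is:

    import numpy as np

    def liu_estimator(X, y, d=0.5):
        # Standard Liu estimator:
        #   beta_d = (X'X + I)^{-1} (X'X + d I) beta_ols,  0 < d < 1.
        # As d -> 1 it approaches ordinary least squares; smaller d
        # shrinks more aggressively to counter multicollinearity.
        p = X.shape[1]
        xtx = X.T @ X
        beta_ols = np.linalg.solve(xtx, X.T @ y)
        return np.linalg.solve(xtx + np.eye(p), (xtx + d * np.eye(p)) @ beta_ols)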

Recently, there have been notable monographs that contribute significantly to the field. For instance, [1] provides historical context for pretest and shrinkage strategies, crucial elements in statistical estimation. [2] concentrates on accurate point and loss estimation of mean vectors in multivariate normal and spherically symmetric distributions. [4] offers a unified approach to both low and high-dimensional models, with a focus on mean matrix dimensions and decision-theoretic covariance matrix estimation. Additionally, a practical guide by [3] covers rank-based statistical estimation methods, effectively bridging theory and application.

The book ‘Post-Shrinkage Strategies in Statistical and Machine Learning for High Dimensional Data’ fills a critical gap, examining shrinkage strategies against penalty, full-model, and submodel estimators. It offers foundational insights into model selection and estimation challenges, uniting theory with practice, and is a valuable resource for statisticians and practitioners navigating complex big data scenarios.

In conclusion, this book stands as a timely contribution. Its appeal spans practitioners, researchers, and graduate students engaged in high-dimensional data analysis and estimation strategies. Notably, the authors’ dedication of the book to the memory of Don A. Fraser and Kjell Doksum, accompanied by a rare picture, adds a heartfelt touch to a work deeply committed to advancing the boundaries of statistical and machine learning knowledge.

References

1. Ahmed, S.E., Penalty, Shrinkage and Pretest Strategies: Variable Selection and Estimation, Springer, Cham, Switzerland, 2014.
2. Fourdrinier, D., Strawderman, W.E., and Wells, M.T., Shrinkage Estimation, Springer, Cham, Switzerland, 2018.
3. Saleh, A.K.Md.E., Arashi, M., Saleh, R.A., and Norouzirad, M., Rank-Based Methods for Shrinkage and Selection: With Application to Machine Learning, Wiley, Hoboken, NJ, USA, 2022.
4. Tsukuma, H. and Kubokawa, T., Shrinkage Estimation for Mean and Covariance Matrices, Springer, Singapore, 2020.
