2020 Mar 13;12076:97–118. doi: 10.1007/978-3-030-45234-6_5

An Empirical Study on the Use and Misuse of Java 8 Streams

Raffi Khatchadourian, Yiming Tang, Mehdi Bagherzadeh, Baishakhi Ray
Editors: Heike Wehrheim, Jordi Cabot
PMCID: PMC7418129

Abstract

Streaming APIs allow for big data processing of native data structures by providing MapReduce-like operations over these structures. However, unlike traditional big data systems, these data structures typically reside in shared memory accessed by multiple cores. Although popular, this emerging hybrid paradigm opens the door to possibly detrimental behavior, such as thread contention and bugs related to non-execution and non-determinism. This study explores the use and misuse of a popular streaming API, namely, Java 8 Streams. The focus is on how developers decide whether to run these operations sequentially or in parallel, and on bugs both specific and tangential to this paradigm. Our study involved analyzing 34 Java projects and 5.53 million lines of code, along with 719 manually examined code patches. We employed both automated methodologies, including interprocedural static analysis, and manual ones. The results indicate that streams are pervasive, parallelization is not widely used, and performance is a crosscutting concern that accounted for the majority of fixes. We also present coincidences that both confirm and contradict the results of related studies. The study advances our understanding of streams and benefits practitioners, programming language and API designers, tool developers, and educators alike.
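To make the terms in the abstract concrete, the following is a minimal Java sketch of a MapReduce-like stream pipeline, the sequential-versus-parallel choice, and a non-execution bug. It is an illustration only and is not taken from the studied projects; the class name `StreamSketch` and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of the study's artifacts.
final class StreamSketch {

    // MapReduce-like pipeline: filter, map, then reduce (sum).
    // The 'parallel' flag models the developer's sequential/parallel decision.
    public static int sumOfEvenSquares(List<Integer> data, boolean parallel) {
        Stream<Integer> s = parallel ? data.parallelStream() : data.stream();
        return s.filter(x -> x % 2 == 0)
                .mapToInt(x -> x * x)
                .sum();
    }

    // Non-execution: intermediate operations such as peek() are lazy.
    // With no terminal operation the lambda never runs, so the counter stays 0.
    public static int sideEffectsWithoutTerminalOp(List<Integer> data) {
        AtomicInteger seen = new AtomicInteger();
        data.stream().peek(x -> seen.incrementAndGet()); // pipeline is never executed
        return seen.get();
    }

    // Ordering: forEach() on a parallel stream gives no encounter-order
    // guarantee, whereas collect(toList()) on an ordered source preserves it.
    public static List<Integer> doubledInEncounterOrder(List<Integer> data) {
        return data.parallelStream()
                   .map(x -> x * 2)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(sumOfEvenSquares(data, false));          // 220
        System.out.println(sumOfEvenSquares(data, true));           // 220
        System.out.println(sideEffectsWithoutTerminalOp(data));     // 0
        System.out.println(doubledInEncounterOrder(data));
    }
}
```

The reduction here is associative and free of side effects, so both execution modes yield the same sum. Pipelines that mutate shared state from within lambdas do not enjoy that guarantee when run in parallel.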

Keywords: empirical studies, functional programming, Java 8, streams, multi-paradigm programming, static analysis.

Contributor Information

Heike Wehrheim, Email: wehrheim@upb.de.

Jordi Cabot, Email: Jordi.cabot@icrea.cat.

Raffi Khatchadourian, Email: raffi.khatchadourian@hunter.cuny.edu.

Yiming Tang, Email: ytang3@gradcenter.cuny.edu.

Mehdi Bagherzadeh, Email: mbagherzadeh@oakland.edu.

Baishakhi Ray, Email: rayb@cs.columbia.edu.

Articles from Fundamental Approaches to Software Engineering are provided here courtesy of Nature Publishing Group
