Abstract

When microliter drops of salt solutions dry on nonporous surfaces, they form erratic yet characteristic deposit patterns influenced by complex crystallization dynamics and fluid motion. Using OpenAI’s image-enabled language models, we analyzed deposits from 12 salts, with 200 test images per salt and model. GPT-4o correctly classified 57% of the images, significantly outperforming random chance and GPT-4o mini. This study underscores the promise of general-use AI tools for reliably identifying salts from their drying patterns.
1. Introduction
The macroscopic patterns and compositional variations formed by drying solutions have long fascinated scientists. For instance, chemistry’s first Nobel laureate, van ‘t Hoff, aimed to predict the mineral sequence of evaporating seawater, partly motivated by economically important potash deposits.1 Another example is the creeping salt phenomenon, where dissolved salts migrate along surfaces during evaporation, leading to distinctive deposition patterns influenced by capillary forces and crystallization dynamics.2
A specific variant of this type of pattern formation is the deposit structures formed by sessile, evaporating drops of solutions and dispersions. An iconic case is the coffee-ring effect, where particles suspended in an evaporating droplet migrate to the periphery, leaving a characteristic ring-like deposit.3 Lesser-known examples include self-lifting macroscopic NaCl crystals, where smaller crystals grow into “legs” that push larger crystals upward from a hydrophobic surface during the final stages of evaporation.4
The diverse dynamics of deposit formation have inspired studies on using drying patterns to reveal liquid compositions. For example, deposits from tap water and alcoholic beverages have been analyzed for compositional features. In diagnostics, fern-like patterns in dried tears have been linked to dry-eye disease,5 while dried blood patterns show potential for detecting leukemia and anemia.6 Similarly, deposits of KCl or KCl/MgCl2 mixed with urine have been analyzed with deep neural networks to assist in diagnosing bladder cancer.7
Our groups have studied the deposit patterns formed by 10 μL drops of salt solutions on glass. The first study analyzed 12 inorganic salts based on a total of 6,000 images, using a reduction of each photo to 16 image metrics.8 Simple optimization of Euclidean distances in the resulting 16-dimensional morphospace yielded prediction accuracies of 90% from single test images. Machine learning analyses of 42 different salts with only 14 training images still yielded correct salt identifications in 75% of all cases.
Very recently, we developed an automated experimental method for the rapid collection of large image libraries of deposit patterns.9 This robotic drop imager (RODI) allowed the analysis of seven different salts at five different initial concentrations. Using a deep learning method, higher-dimensional metrics vectors, and over 23,000 images, we achieved prediction accuracies of 98.6% for the salt type and 92.5% for the combined salt type and initial concentration.
Here, we investigate a subset of the image data in ref (8) using a very different approach: multimodal large language models (LLMs). Unlike traditional image analysis techniques, these models combine advanced natural language processing capabilities with image recognition by integrating visual inputs into their neural network architecture. Images are processed as input, allowing the models to extract and interpret features, which are then correlated with textual or structured data through a shared embedding space.10 In contrast, domain-adapted models such as Cephalo follow a more specialized strategy, using tailored architectures and data sets.11 Recent work on GPT-4’s vision capabilities has demonstrated success in tasks such as interpreting complex visual scenes and analyzing graphs, suggesting its potential for nonstandard applications like salt deposit analysis.12,13
While many image types could be used to assess multimodal LLMs, dried salt deposits present a particularly relevant and chemically meaningful challenge. Their patterns result from complex processes including evaporation, crystallization, and fluid flow—factors that make classification nontrivial even for experts. Unlike standard test images, these patterns are grounded in real physicochemical behavior, offering a unique opportunity to evaluate whether general-purpose models can extract compositional information from subtle morphological cues.
2. Methods
The GPT-4o and GPT-4o-mini models exhibit broad general knowledge but lack the specific training to recognize macroscopic images of salt deposits formed by evaporating droplets.14 To address this, a curated set of training images was included to familiarize the models with the characteristic patterns of each salt type. These images, generated in a laboratory setting, have been described previously.8 All images and the Python script used are available for download at refs (15) and (16), respectively.
OpenAI’s Application Programming Interfaces (APIs) were employed to facilitate batch processing, enabling multiple simultaneous requests with higher token limits and reduced costs compared to sequential approaches.17,18 Each request was stateless, i.e., no memory was retained between interactions;19 the training images required for salt differentiation therefore had to be included in every individual request. While this ensured consistent processing, it also substantially increased the token count, contributing to the overall computational cost.
For batch processing, a JSON Lines (JSONL) file was prepared, with each line containing a valid JSON object representing one API request. Because of file size and token constraints, each trial required multiple JSONL files. Specifically, 200 images of each salt were randomly selected to capture morphological diversity; these images were then added to a JSONL file as individual requests. Consequently, each trial comprised 2,400 requests (200 images per salt for 12 salts), distributed across as many JSONL files as necessary. To reduce image repetition, we split the 200 images per salt into two sets of 100 images with no duplicates between them.
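To illustrate this batch setup, the following Python sketch shows how such request lines could be assembled and written to a JSONL file. It is a minimal illustration rather than the published script: the folder-per-salt image layout, the file extension, and the function names are assumptions, and the request body is filled in separately (see the sketch further below).

import json
import random
from pathlib import Path

SALTS = ["NaCl", "KCl", "NH4Cl", "Na2SO4", "K2SO4", "NH4NO3",
         "NaH2PO4", "NaNO3", "Na3PO4", "KBr", "KNO3", "RbCl"]

def build_requests(image_root: Path, n_per_salt: int = 200) -> list[dict]:
    # Randomly select n_per_salt test images per salt (hypothetical layout:
    # one subfolder per salt) and wrap each in a Batch API request skeleton.
    requests = []
    for salt in SALTS:
        images = random.sample(sorted((image_root / salt).glob("*.jpg")), n_per_salt)
        for img in images:
            requests.append({
                "custom_id": f"{salt}_{img.stem}",  # parent folder + unique file name
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {},                         # model, messages, etc. (next sketch)
            })
    return requests

def write_jsonl(requests: list[dict], out_file: Path) -> None:
    # One valid JSON object per line, as required for batch submission
    with out_file.open("w") as fh:
        for req in requests:
            fh.write(json.dumps(req) + "\n")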
Each request contained several key parameters: (1) an ID combining the image’s filename (a randomly generated unique number) with its parent folder name, allowing later confirmation of the true salt identity; (2) the specified model name (either gpt-4o-mini-2024-07-18 or gpt-4o-2024-08-06); and (3) a temperature set to 0 and a seed set to 17 (arbitrarily chosen) to bring results closer to being deterministic, despite the inherently probabilistic nature of these models.20 Additionally, each request included the following system prompt:
“You are a helpful assistant who is knowledgeable about different types of salt crystals and can identify them from images. You can identify these 12 different salts: NaCl, KCl, NH4Cl, Na2SO4, K2SO4, NH4NO3, NaH2PO4, NaNO3, Na3PO4, KBr, KNO3, and RbCl.”
For training, each request was accompanied by 12 user messages passed as context, each containing five images and a line of text informing the model of the training images’ identities. These images were compressed and embedded in Base64 format in every request (see Python script). A 13th user message stating “Identify this salt with just the name.” was then included alongside the test image, prompting the model to provide only a single salt name as its output. While we constrained the model output to a single predicted label for consistency, vision models like GPT-4o are also capable of generating more structured responses (e.g., confidence scores) depending on the prompt. Such structured outputs and the potential for fine-tuning—particularly in open-source models—may offer valuable extensions for future implementations.
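As a minimal sketch of how one such request body could be assembled, the Python code below combines the system prompt, the 12 training messages with five Base64-encoded example images each, and the final query message with the test image, together with the model name, temperature, and seed described above. The helper names, the data-URL image format, and the wording of the per-salt training text are assumptions of this illustration, not a reproduction of the published script.

import base64
from pathlib import Path

SYSTEM_PROMPT = ("You are a helpful assistant who is knowledgeable about different "
                 "types of salt crystals and can identify them from images. You can "
                 "identify these 12 different salts: NaCl, KCl, NH4Cl, Na2SO4, K2SO4, "
                 "NH4NO3, NaH2PO4, NaNO3, Na3PO4, KBr, KNO3, and RbCl.")

def encode_image(path: Path) -> str:
    # Base64 string of one (already compressed) training or test image
    return base64.b64encode(path.read_bytes()).decode("utf-8")

def image_part(b64: str) -> dict:
    # Vision input for the chat completions endpoint as a data URL (JPEG assumed)
    return {"type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}

def build_body(training: dict[str, list[str]], test_b64: str,
               model: str = "gpt-4o-2024-08-06") -> dict:
    # training: salt name -> list of five Base64-encoded example images
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for salt, examples in training.items():
        messages.append({
            "role": "user",
            "content": [{"type": "text", "text": f"These are examples of {salt} deposits."}]
                       + [image_part(b64) for b64 in examples],
        })
    messages.append({                               # 13th user message: the query
        "role": "user",
        "content": [{"type": "text", "text": "Identify this salt with just the name."},
                    image_part(test_b64)],
    })
    return {"model": model, "temperature": 0, "seed": 17, "messages": messages}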
Once all JSONL files were programmatically created, they were submitted to the designated model. For convenience, the OpenAI web interface21 was utilized, offering the same functionality as the API. Initially, usage tier restrictions limited file size and permitted only one smaller file to be processed at a time; however, elevated usage tiers allowed multiple large files to be processed concurrently, circumventing rate limits. Each fully processed file produced an output JSONL file containing the response for every request in the input file. To analyze these results, the custom ID was used to match each image back to its correct salt type, and the first mention (typically the only one) of any of the 12 salts in the model’s output was recorded as the prediction.
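A sketch of the corresponding result parsing is given below. The nested field names follow the documented Batch API output format, and the custom ID convention matches the earlier sketch; both are assumptions rather than the exact published code.

import json

def parse_results(output_jsonl: str, salts: list[str]) -> list[tuple[str, str]]:
    # Returns (true_salt, predicted_salt) pairs for every processed request
    pairs = []
    with open(output_jsonl) as fh:
        for line in fh:
            rec = json.loads(line)
            true_salt = rec["custom_id"].split("_")[0]    # salt folder encoded in the ID
            reply = rec["response"]["body"]["choices"][0]["message"]["content"]
            hits = [(reply.find(s), s) for s in salts if s in reply]
            prediction = min(hits)[1] if hits else "none"  # first mentioned salt wins
            pairs.append((true_salt, prediction))
    return pairs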
3. Results and Discussion
Examples of the deposit patterns formed during the evaporation of the 12 saturated salt solutions are shown in Figure 1. Some of the salts form rather characteristic patterns, such as NH4NO3, which is a creeping salt, or NaCl and KCl, which dry into small crystals. However, the variations of the deposit patterns formed by any given salt can be substantial and, as discussed in ref (8), some compounds even form two distinct pattern types, a phenomenon termed “bifurcated salts”.
Figure 1.
Evaporating solution drops create deposit patterns that depend on the dissolved salt. The figure shows examples for the 12 analyzed salts taken from the training sets with enhanced contrast. Scale bar: 1 cm (applies to all panels).
Our attempts to identify the salt type using GPT-4o-mini and GPT-4o were based on photos similar to those in Figure 1. As detailed in the Methods section, each test was performed by providing five training images for each of the 12 salts and then testing the response with 200 random images per salt and model. We reemphasize that the key steps of these identification attempts were plain-English prompts.
Figure 2 shows the main results of our study as confusion matrices. The ordinate denotes the true salt name, while the abscissa shows the predicted chemical formulas. Accordingly, high numbers along the top-left to bottom-right diagonal indicate successful predictions. Notice that the numbers in each row add up to 200, i.e., the number of images presented for identification per salt.
Figure 2.
Confusion matrices summarizing the prediction outcomes of (a) GPT-4o-mini and (b) GPT-4o. The corresponding total prediction accuracies are (a) 10.5% and (b) 57.2%.
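For reference, a minimal sketch of how such a confusion matrix and the overall accuracy can be tallied from the (true, predicted) pairs of the previous sketch is shown below; it is illustrative and not necessarily how the published figures were generated.

import numpy as np

def confusion_matrix(pairs: list[tuple[str, str]], salts: list[str]) -> np.ndarray:
    # Rows: true salt, columns: predicted salt; replies naming none of the 12 salts are skipped
    idx = {s: i for i, s in enumerate(salts)}
    cm = np.zeros((len(salts), len(salts)), dtype=int)
    for true, pred in pairs:
        if pred in idx:
            cm[idx[true], idx[pred]] += 1
    return cm

# Example usage (with pairs and SALTS from the earlier sketches):
# cm = confusion_matrix(pairs, SALTS)
# accuracy = np.trace(cm) / cm.sum()   # fraction of counts on the diagonal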
The performance of the GPT-4o-mini model is summarized in Figure 2a. The results are overall disappointing, with a total prediction accuracy of 10.5%, which is only marginally above the expectation of 8.3% for random guesses. Furthermore, we find that the model shows a peculiar preference for identifying patterns as Na3PO4 and, to a lesser extent, as NaNO3. Indeed, 55.7% of all queries are identified as Na3PO4, which occupies the unremarkable ninth position in the prompt’s list of salts. The origin of this preference is unknown.
Despite the poor performance of the mini model, GPT-4o demonstrates a clear ability to carry out the image-based identification task. The confusion matrix for GPT-4o in Figure 2b shows large counts along the diagonal and corresponds to an overall prediction accuracy of 57.2%. Notice that the score for NaH2PO4 is a perfect 200 out of 200. The weakest results are found for KBr and KCl, with correct scores of 5 out of 200 and typical misidentifications as NaCl. We note that visual inspection of the patterns of these three salts (KBr, KCl, and NaCl) indicates close similarities (see Figure 1).
The performance of GPT-4o, however, remains below the 90–99% accuracy recently achieved in our group using deep learning methods,9 specifically a multilayer perceptron (MLP) trained on vectors of 47 image metrics representing the deposit patterns. Nonetheless, this comparison is limited, as the MLP was trained on over 20,000 labeled images and leveraged extensive feature engineering. In contrast, GPT-4o received only a small number of examples per salt and no task-specific adaptation. Its performance is therefore more reminiscent of one-shot (or few-example) learning, a hallmark of human visual reasoning.22
We also investigated how consistent the repeated runs for each model are. For this analysis, we determined Cohen’s kappa coefficients (Table S1), a measure of inter-rater agreement that corrects for chance, allowing us to assess how consistently each model performed across multiple trials. For the two GPT-4o sets, we found κ = 0.962, indicating near-perfect agreement between the repeated runs and underscoring the model’s robust reproducibility. Similarly, the GPT-4o-mini sets yielded κ = 0.907.
Lastly, we computed F1 scores (Table S2), a single metric that balances precision and recall, offering a deeper understanding of how the models handle each salt classification under imbalanced conditions. The macro F1 scores for each model differed slightly from their overall accuracies, indicating that even with equal representation of all salts in our training set, certain salts pose more challenges. Notably, both GPT-4o and GPT-4o-mini had lower F1 scores for visually similar salts such as KBr, KCl, and NaCl. Nevertheless, GPT-4o consistently outperformed GPT-4o-mini, highlighting the enhanced capability of the larger model in this classification task.
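Both metrics are standard and can be computed, for example, with scikit-learn, as in the short sketch below; the function and variable names are illustrative rather than taken from our analysis scripts.

from sklearn.metrics import cohen_kappa_score, f1_score

def agreement_and_macro_f1(run_a: list[str], run_b: list[str],
                           true_labels: list[str], salts: list[str]) -> tuple[float, float]:
    # run_a, run_b: predicted labels from two repeated trials on the same test images;
    # true_labels: the actual salt of each image
    kappa = cohen_kappa_score(run_a, run_b, labels=salts)     # chance-corrected agreement
    macro_f1 = f1_score(true_labels, run_a, labels=salts,
                        average="macro", zero_division=0)     # unweighted mean over salts
    return kappa, macro_f1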
4. Conclusions
Artificial intelligence tools offer unprecedented opportunities for novel chemical and biological analyses from image data.23 In this context, our team recently showed that specialized approaches such as multilayer perceptrons and other deep learning models can achieve accuracies of >90% for the identification of inorganic salts from macroscopic deposit patterns formed during the evaporation of solution drops.8,9 Here we studied how well multimodal LLMs like GPT-4o can perform similar tasks. Despite the lower accuracy achieved, our results indicate promise in such a nontailored approach. We also showed that performance is strongly model dependent, with GPT-4o-mini barely outperforming random guesses. Considering the rapid rate of improvement of multimodal LLMs, it seems likely that at least some image analysis tasks in chemistry will soon be solvable with broadly accessible tools. As these general-use models continue to mature—alongside open-source alternatives like CLIP24—they may no longer require specialized architectures or extensive domain-specific data sets to achieve high accuracy. This expectation is supported by recent advances in model performance, particularly the marked improvements in visual reasoning between GPT-4V and GPT-4o. Additionally, emerging evidence suggests that these systems can efficiently adapt from minimal training data, further highlighting their potential for AI-based chemical analysis.25
Acknowledgments
We thank Dr. Bruno C. Batista, Dr. Amrutha S V, Dr. Jéssica A. Nogueira, Prof. Jie Yan, Dr. Suman Sinha Ray, Dr. Ruth Agada, and Srinivasakranthikiran Kolachina for discussions.
Data Availability Statement
The Python script used for generating JSONL files for submission to OpenAI’s API, the training images, CSV files of the data in Figure 2, and figure-generating MATLAB scripts are freely available on GitHub: https://github.com/osteinbock/GPT4oBatch. All image data are freely available at https://www.chem.fsu.edu/~steinbock/saltscapes.php.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.5c01150.
Cohen’s kappa values between pairs of analysis sets and F1 scores for every salt across all trials (PDF)
Author Contributions
D.B.D. and O.S. designed research; D.B.D. performed research and analyzed data; B.B.D. and O.S. obtained funding; and D.B.D., B.B.D., and O.S. wrote the paper.
This material is based on work supported by NASA under grant no. 80NSSC23M0050.
The authors declare no competing financial interest.
References
- Hardie L. A. On the Significance of Evaporites. Annu. Rev. Earth Planet. Sci. 1991, 19, 131–168. 10.1146/annurev.ea.19.050191.001023.
- Shahidzadeh N.; Schut M. F. L.; Desarnaud J.; Prat M.; Bonn D. Salt Stains from Evaporating Droplets. Sci. Rep. 2015, 5, 10335. 10.1038/srep10335.
- Deegan R. D.; Bakajin O.; Dupont T. F.; Huber G.; Nagel S. R.; Witten T. A. Capillary Flow as the Cause of Ring Stains from Dried Liquid Drops. Nature 1997, 389, 827–829. 10.1038/39827.
- Salim H.; Kolpakov P.; Bonn D.; Shahidzadeh N. Self-Lifting NaCl Crystals. J. Phys. Chem. Lett. 2020, 11, 7388–7393. 10.1021/acs.jpclett.0c01871.
- Akhtar S.; Masmali A.; Khan A.; Almubrad T. Structure and Microanalysis of Tear Film Ferning of Camel Tears, Human Tears, and Refresh Plus. Acta Ophthalmol. 2018, 92, e1–e7. 10.1111/j.1755-3768.2014.S051.x.
- Yakhno T. A.; Sedova O. A.; Sanin A. G.; Pelyushenko A. S. On the Existence of Regular Structures in Liquid Human Blood Serum (Plasma) and Phase Transitions in the Course of Its Drying. Technol. Phys. 2003, 48, 399–403. 10.1134/1.1568479.
- Demir R.; Koc S.; Ozturk D. G.; et al. Artificial Intelligence Assisted Patient Blood and Urine Droplet Pattern Analysis for Non-Invasive and Accurate Diagnosis of Bladder Cancer. Sci. Rep. 2024, 14, 2488. 10.1038/s41598-024-52728-7.
- Batista B. C.; Tekle S. D.; Yan J.; Dangi B. B.; Steinbock O. Chemical Composition from Photos: Dried Solution Drops Reveal a Morphogenetic Tree. Proc. Natl. Acad. Sci. U.S.A. 2024, 121, e2405963121. 10.1073/pnas.2405963121.
- Batista B. C.; S V A.; Yan J.; Dangi B. B.; Steinbock O. High-Throughput Robotic Collection, Imaging, and Machine Learning Analysis of Salt Patterns: Composition and Concentration from Dried Droplet Photos. Digital Discovery 2025, 4, 1030–1041. 10.1039/D4DD00333K.
- OpenAI. GPT-4o System Card. arXiv 2024, arXiv:2410.21276. 10.48550/arXiv.2410.21276 [last accessed: January 21, 2025].
- Buehler M. J. Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design. arXiv 2024, arXiv:2405.19076. 10.48550/arXiv.2405.19076.
- OpenAI. GPT-4 Technical Report. OpenAI 2023. URL: https://cdn.openai.com/papers/gpt-4.pdf [last accessed: January 16, 2025].
- Alayrac J.-B.; et al. Flamingo: A Visual Language Model for Few-Shot Learning. arXiv 2022, arXiv:2204.14198. 10.48550/arXiv.2204.14198 [last accessed: January 16, 2025].
- Shahriar S.; et al. Putting GPT-4o to the Sword: A Comprehensive Evaluation of Language, Vision, Speech, and Multimodal Proficiency. arXiv 2024, arXiv:2407.09519. 10.48550/arXiv.2407.09519 [last accessed: January 21, 2025].
- Saltscapes 1.0 Image Library. URL: https://www.chem.fsu.edu/~steinbock/saltscapes.php [last accessed: January 17, 2025].
- Python JSONL file generation script. URL: https://github.com/osteinbock/GPT4oBatch.git [last accessed: January 17, 2025].
- Cheng Z.; Kasai J.; Yu T. Batch Prompting: Efficient Inference with Large Language Model APIs. Proc. EMNLP 2023: Industry Track, 2023. 10.18653/v1/2023.emnlp-industry.74.
- OpenAI Batch API. URL: https://platform.openai.com/docs/guides/batch [last accessed: November 24, 2024].
- OpenAI Assistants API Overview (Python SDK). URL: https://cookbook.openai.com/examples/assistants_api_overview_python [last accessed: November 24, 2024].
- Ouyang S.; Zhang J. M.; Harman M.; Wang M. An Empirical Study of the Non-determinism of ChatGPT in Code Generation. ACM Trans. Softw. Eng. Methodol. 2024 (just accepted). 10.1145/3697010.
- OpenAI web interface. URL: https://platform.openai.com/batches [last accessed: January 16, 2025].
- Lake B. M.; Salakhutdinov R.; Gross J.; Tenenbaum J. B. One Shot Learning of Simple Visual Concepts. Proc. Annu. Meet. Cogn. Sci. Soc. 2011, 33, 2568–2573. URL: https://escholarship.org/uc/item/4ht821jx.
- Knoll P.; Ouyang B.; Steinbock O. Patterns Lead the Way to Far-from-Equilibrium Materials. ACS Phys. Chem. Au 2024, 4, 19–30. 10.1021/acsphyschemau.3c00050.
- Radford A.; Kim J. W.; Hallacy C.; Ramesh A.; Goh G.; Agarwal S.; Sastry G.; Askell A.; Mishkin P.; Clark J.; Krueger G.; Sutskever I. Learning Transferable Visual Models from Natural Language Supervision. arXiv 2021, arXiv:2103.00020. 10.48550/arXiv.2103.00020.
- Bubeck S.; Chandrasekaran V.; Eldan R.; Gehrke J.; Horvitz E.; Kamar E.; Lee P.; Lee Y. T.; Li Y.; Lundberg S.; Nori H.; Palangi H.; Ribeiro M. T.; Zhang Y. Sparks of Artificial General Intelligence: Early Experiments with GPT-4. arXiv 2023, arXiv:2303.12712. 10.48550/arXiv.2303.12712.


