AMIA Annual Symposium Proceedings. 2026 Feb 14;2025:1345–1354.

RAG vs Reddit: Decoding Autism Conversations on Reddit with LLMs and Topic Modeling

Deshan Wattegama 1, Benjamin Black 2,3, Marcus Moen 5, Chi-Ren Shyu 1,3,4
PMCID: PMC12919513  PMID: 41726480

Abstract

Social media platforms like Reddit have become vital spaces for autistic individuals and caregivers to seek advice, share experiences, and discuss challenges. Simultaneously, Large Language Models (LLMs) are increasingly used to provide medical guidance. This study examines autism-related discussions on Reddit, comparing them with clinician-patient discussions and evaluating the effectiveness of an autism-specific Retrieval-Augmented Generation (RAG) system. We applied BERTopic to identify key discussion themes in r/autism and r/autism_parenting, revealing substantial discussion of behavioral challenges and practical support. Comparing these themes with clinical messages from the University of Missouri Thompson Center for Autism and Neurodevelopment, we found that caregivers in clinical settings focused more on medication management, whereas online discussions emphasized non-traditional therapies. We then assessed LLM-generated responses against Reddit peer advice, discussing the differences in accuracy, relevance, empathy, and helpfulness. This work underscores the potential of RAG systems in enhancing autism-related guidance while emphasizing the importance of community-driven insights in healthcare conversations.

Introduction

Autism spectrum disorder is a complex neurodevelopmental condition characterized by persistent challenges in social communication and interaction, along with restricted, repetitive patterns of behavior1. Due to the diverse challenges associated with autism, autistic individuals and their caregivers increasingly seek immediate, accessible health guidance through online forums2,3. Social media platforms like Reddit4 host autism communities where users exchange personal experiences, practical strategies, and urgent questions about diagnoses, therapies, and daily challenges5,6. These peer-driven discussions fill critical gaps in formal healthcare access, offering support to those navigating under-resourced systems or unmet needs. Simultaneously, LLMs such as ChatGPT have emerged as popular alternatives for answering medical questions, raising concerns about the accuracy and relevance of AI-generated advice compared to community-based or clinical perspectives7,8. Despite the increasing use of social media and AI tools in healthcare, there is limited understanding of how themes in online discussions align with clinical priorities or how responses from LLMs compare to insights from real life experiences. This study addresses this gap by conducting an analysis of autism specific Reddit communities, clinical interactions, and LLM generated outputs.

Prior work highlights the value of online autism discussions in revealing undiscovered needs. For instance, Larnyo et al. identified post-COVID spikes in Reddit discussions about diagnostic delays and caregiver burnout, exposing systemic service gaps9. Similarly, studies of Reddit parenting forums discuss the lack of tailored resources for autistic mothers navigating healthcare systems10. While analyses of social media have been significant in highlighting community health concerns11,12, there is a notable gap in research comparing these discussions to clinical dialogues or assessing the effectiveness of LLMs in addressing such issues. This gap is significant, as discrepancies between patient priorities expressed online and the focus areas of healthcare providers can lead to care disparities. Moreover, unverified AI-generated responses and community-generated replies on social media platforms can contain inaccuracies, and reliance on them may compromise patient safety and public health.

This study pursues three objectives to bridge these gaps. First, we identify common discussion themes in autism-related Reddit communities using BERTopic, a state-of-the-art NLP technique that leverages BERT embeddings to detect coherent topics in large text corpora13. Second, we contrast these themes with clinical communication patterns from the University of Missouri Thompson Center for Autism and Neurodevelopment14, an autism-specific clinical institution, to uncover overlaps and differences between community-driven and provider-mediated autism discourse. Third, we evaluate the quality of responses generated by an autism-specific Retrieval-Augmented Generation15,16 (RAG) system by comparing them to those provided by Reddit users, to assess how AI-mediated guidance compares to peer-supported advice.

Methods

Data Sources

In this study, we utilized three primary data sources. The first two were collected from Reddit, while the third consisted of patient messages from the University of Missouri Thompson Center for Autism and Neurodevelopment’s14 Patient Messaging Portal.

Reddit as a Data Source

We collected data from two subreddits:

  1. r/autism: This public subreddit, created on April 9, 2008, has over 440,000 members and serves as a space for individuals to share personal experiences, seek advice, and discuss various aspects of autism. Data for this study were extracted using Pushshift17, an archival tool for Reddit data, covering discussions from November 1, 2009, to December 31, 2023. After preprocessing (removing submissions from deleted users, moderators, Reddit poll-related posts, and the top 5% of users with a high volume of irrelevant submissions), the final cleaned dataset comprised 116,712 posts.

  2. r/autism_parenting: This subreddit, with over 60,000 members, is focused on discussions related to autism caregiving and challenges faced by parents of autistic children. Data were collected using the Pullpush API18, a third party Reddit data extraction tool, for the period February 15, 2025, to March 9, 2025. After preprocessing, this dataset contained 1,216 posts.

Thompson Center Patient Messaging Portal

Thompson Center’s patient messaging portal, which operates via PowerChart, is an electronic health record system that facilitates communication between clinicians and patients. For this study, we extracted patient submitted questions from the portal within the same timeframe as the r/autism_parenting dataset (February 15, 2025, to March 9, 2025). The inclusion criteria required that messages contain references to behavioral challenges or issues faced by autistic children. Messages related to appointment scheduling, medication refills, or other administrative requests were excluded. All messages were manually collected, de-identified, and stored securely under Institutional Review Board (IRB) approval. The final dataset comprised 153 messages.

Topic Modeling Pipeline

In this study, we first analyzed posts from r/autism independently, as it represents a general audience perspective. Due to the diverse nature of the r/autism subreddit, as shown in Figure 1, we employed an LLM prompt-based classification19–22 method to categorize posts into five distinct groups before extracting the themes. We then compared posts from r/autism_parenting with messages from the Thompson Center portal to examine the most prevalent themes in autism discussions within a social community of caregivers and a clinical setting for parents.

Figure 1:

Data analysis workflow: Reddit post classification, topic extraction using the BERTopic model, and RAG answer generation for sampled questions from selected themes.

Large Language Model-Based Reddit Post Classification

Prior to implementing LLM-based classification, we conducted an exploratory review of Reddit posts from the r/autism subreddit and of the messages that we collected from the Thompson Center, to identify common types of caregiver questions. The primary objective was to assess whether these questions could be meaningfully grouped into high-level help-seeking categories. Through this informal analysis, we observed that most help-seeking posts fell into two broad themes: Autism-Related Medical/Behavioral Help, which included inquiries about therapies, medications, and behavior management strategies; and Practical, Non-Medical Support, covering topics such as accommodations, daily routines, and assistive tools. To capture the broader range of discourse, we also included three additional categories: Diagnosis-Related Questions/Concerns, focused on formal diagnostic processes and uncertainty about autism-related traits; Personal Opinions, Experiences, and Stories, encompassing first-person reflections shared by users; and Unrelated or Off-Topic Content, comprising posts not directly related to autism. To ensure the clinical relevance and appropriateness of these categories, we consulted with a board-certified clinician specializing in autism care. These validated categories were then used to guide the LLM-based classification of the full dataset.

To develop a reliable evaluation corpus, we applied a three-stage annotation workflow that aligns with best practice23. First, 1,000 posts were sampled at random from the r/autism subreddit. Two annotators independently labeled each post into one of the five categories. Annotation guidelines were refined iteratively: the annotators double-coded an initial calibration batch of 100 posts, met to reconcile mismatches, then repeated the process in a second batch before annotating the remaining data. Inter-rater agreement on the full set of $D$ = 1,000 posts was measured with Cohen’s κ (0.86), indicating “almost perfect” agreement. Residual disagreements were resolved by discussion between the annotators, producing a single consensus label per instance. Because the natural class distribution was highly skewed, we applied stratified undersampling to ensure class balance in experiments. The final evaluation set ($D_{val}$) therefore contains 570 posts (114 per class), preserving diversity across categories.

To classify the Reddit posts, we formulated the problem as a multi-class text classification task. Given a dataset of posts $D = \{x_1, x_2, \ldots, x_n\}$, where each post $x_i$ belongs to one of five predefined categories $C = \{c_1, c_2, \ldots, c_K\}$ with $K = 5$, our goal was to assign each post to its most appropriate category. We employed three prompting strategies with LLMs: Zero-Shot Prompting (ZS), Few-Shot Prompting (FS), and Chain-of-Thought Prompting (COT). In Zero-Shot Prompting, the model was given only the raw post text $x_i$ and instructions. Few-Shot Prompting additionally provided the model with a small set (n = 5) of labeled examples in the prompt. Chain-of-Thought Prompting asked the model to generate intermediate reasoning steps before predicting the category, and it achieved the highest accuracy in the classification task. Based on the initial evaluation results, we selected OpenAI’s GPT-4o-mini to classify the entire dataset using the Chain-of-Thought prompting strategy.
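The inter-rater agreement statistic reported for the annotation workflow can be illustrated with a short sketch. This is not the authors' code: the function name `cohens_kappa` and the toy label lists are assumptions made for illustration, and only the standard definition of Cohen's κ (observed agreement corrected for chance agreement) is used.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy labels (hypothetical, not taken from the study's data).
annotator_1 = ["medical", "practical", "diagnosis", "story", "off_topic", "medical"]
annotator_2 = ["medical", "practical", "diagnosis", "story", "medical", "medical"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

On real annotations, the two lists would hold each annotator's category for the same 1,000 posts, in the same order.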

Text Pre-Processing and BERT Topic Modelling

We implemented a comprehensive text preprocessing pipeline using the Python Natural Language Toolkit (NLTK) to prepare posts for topic modeling. Initially, we combined the title and body text of each submission into a single paragraph to create the content for analysis. The preprocessing steps included converting all text to lowercase to ensure uniformity and removing URLs and special characters using regular expressions to reduce noise. We then removed common English stop words to focus on more meaningful words, applied part-of-speech tagging, and retained only nouns, verbs, adjectives, and adverbs to ensure the interpretability of the resulting topics. Finally, we lemmatized these word tokens to their base forms to prepare them for the topic modeling task.
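The first steps of this pipeline can be sketched with the standard library alone. The sketch below is a simplified stand-in, not the study's implementation: the stop-word set is a tiny illustrative list rather than NLTK's English list, the function name `preprocess` is hypothetical, and the part-of-speech filtering and lemmatization steps (which the study performed with NLTK) are deliberately left out.

```python
import re

# Tiny illustrative stop-word list (assumption); the study used NLTK's list.
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "my", "of", "for",
              "with", "at", "in", "he", "she", "it"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def preprocess(title, body):
    """Lowercase, strip URLs and special characters, drop stop words."""
    # Combine title and body into one text, as described above.
    text = f"{title} {body}".lower()
    # Remove URLs first so their fragments do not survive as tokens.
    text = URL_RE.sub(" ", text)
    # Replace anything that is not a lowercase letter or whitespace.
    text = NON_ALPHA_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


tokens = preprocess(
    "Picky eater?",
    "My son refuses food at school, see https://example.com/tips",
)
```

A fuller version would pass the returned tokens through a part-of-speech tagger and a lemmatizer before topic modeling.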

We focused on the autism-related medical help and practical support categories to extract the most prevalent themes. Each Reddit post $x_i$ (for $i = 1, 2, 3, \ldots, N$) is first pre-processed and then converted into a dense vector representation using a pre-trained SentenceTransformer, “thenlper/gte-small”24. The embedding function $f_{embed}$ maps each post $x_i$ to a $d$-dimensional vector $E_i \in \mathbb{R}^d$: $E_i = f_{embed}(x_i)$, $d = 384$. These embeddings capture the semantic nuances of the posts and serve as input for the BERTopic model. The BERTopic model is then configured with: 1) Dimensionality Reduction - UMAP is used with parameters n_neighbors = 15 and n_components = 2 to reduce the high-dimensional embeddings; 2) Clustering - Bisecting K-Means25,26, a variation of K-Means, is applied to cluster the reduced-dimension embeddings into K clusters, each of which is used to extract a candidate topic; and 3) Vectorizer - a CountVectorizer with an n-gram range of 1 to 3 covers unigrams, bigrams, and trigrams when extracting the top words for each topic. Following the approach outlined by Ni et al.27 and Xin et al.28, we determined the optimal number of topics K by plotting the average semantic coherence and exclusivity scores for BERTopic models with K ranging from 5 to 20. We then selected the number of topics that achieved the best balance between these two criteria. Semantic coherence quantifies how frequently the top words of a topic co-occur in the corpus. High coherence implies that the words within a topic are likely to appear together in documents, which increases interpretability. We used the Gensim topic modeling framework to calculate coherence values for each topic. Exclusivity measures the uniqueness of the top words within a topic. A topic with high exclusivity will have top words that rarely appear in other topics, ensuring that the topics are distinct. For each topic $k$ with top words $\{w_1, w_2, \ldots, w_N\}$, exclusivity is calculated as:

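$$\mathrm{Exclusivity}(k) = \frac{1}{N} \sum_{w \in \mathrm{top}_N(k)} \mathbb{I}\{w \text{ is unique to } k\},$$

where the indicator function $\mathbb{I}\{\cdot\}$ equals 1 if the word $w$ is not among the top words in any other topic, and 0 otherwise. The mean exclusivity across topics is:

$$\bar{E}(K) = \frac{1}{K} \sum_{k=1}^{K} \mathrm{Exclusivity}(k).$$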

Once the optimal K was obtained, the BERTopic13 model was re-run with this value to extract the main topic for each cluster. After extracting the topics, we annotated each topic with a meaningful label based on its keywords and related documents.
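The exclusivity measure defined above is simple enough to compute directly from each topic's top-word list. The following is a minimal sketch under stated assumptions: the function name `exclusivity`, the topic keys, and the toy word lists are illustrative and are not the study's fitted topics.

```python
def exclusivity(top_words_by_topic):
    """Return (per-topic exclusivity, mean exclusivity across topics).

    top_words_by_topic maps a topic id to its list of top words.
    """
    scores = {}
    for k, words in top_words_by_topic.items():
        # Collect the top words of every other topic.
        others = set()
        for j, other_words in top_words_by_topic.items():
            if j != k:
                others.update(other_words)
        # A word counts as unique if no other topic lists it as a top word.
        unique = sum(1 for w in words if w not in others)
        scores[k] = unique / len(words)
    mean_score = sum(scores.values()) / len(scores)
    return scores, mean_score


# Toy topics (hypothetical) that share one top word.
topics = {
    "T_a": ["food", "picky eater", "sensory issue"],
    "T_b": ["sensory issue", "toothbrush", "shower"],
    "T_c": ["headphone", "noise", "earplugs"],
}
scores, mean_score = exclusivity(topics)
```

In this toy case the two topics that share "sensory issue" each score 2/3, while the third topic scores 1.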

RAG pipeline to generate LLM based answers

Reddit questions and comment selection

After clustering the posts and extracting topics using BERTopic, we identified specific posts that best represent selected topics for further analysis. Since not all Reddit posts receive comments, the availability of comments serves as a key criterion for question selection. To ensure a meaningful analysis, we only consider posts with at least three comments when selecting representative and outlier questions, while excluding those comments from moderators, deleted comments, and self-replies by the original poster. After the filtration, to determine these questions, we calculate the cosine distance between each post and its cluster centroid. The post with the smallest cosine distance is selected as the representative question, as it closely reflects the core theme of the topic. In contrast, the post with the largest cosine distance is chosen as the outlier question, offering a unique, rare, or extreme perspective that deviates from the primary discussion.
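The selection rule can be sketched as follows. This is an illustrative sketch, not the authors' code: the post schema (`id`, `embedding`, `n_comments`) and the function names are hypothetical, `n_comments` is assumed to already exclude moderator comments, deleted comments, and self-replies, and the centroid is assumed to be the mean embedding of all posts in the cluster.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def pick_representative_and_outlier(posts, min_comments=3):
    """Return (representative_id, outlier_id) for one cluster of posts."""
    dim = len(posts[0]["embedding"])
    # Assumption: the centroid is the mean embedding over the whole cluster.
    centroid = [
        sum(p["embedding"][i] for p in posts) / len(posts) for i in range(dim)
    ]
    # Only posts with enough qualifying comments are candidates.
    eligible = [p for p in posts if p["n_comments"] >= min_comments]
    if not eligible:
        raise ValueError("no post meets the comment threshold")
    by_distance = sorted(
        eligible, key=lambda p: cosine_distance(p["embedding"], centroid)
    )
    return by_distance[0]["id"], by_distance[-1]["id"]


# Toy cluster with 2-dimensional embeddings (hypothetical values).
cluster = [
    {"id": "p1", "embedding": [1.0, 0.0], "n_comments": 5},
    {"id": "p2", "embedding": [0.9, 0.1], "n_comments": 4},
    {"id": "p3", "embedding": [0.0, 1.0], "n_comments": 3},
    {"id": "p4", "embedding": [1.0, 0.1], "n_comments": 0},
]
representative, outlier = pick_representative_and_outlier(cluster)
```

Here `p4` is excluded by the comment threshold even though it still contributes to the centroid.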

Design of the RAG pipeline

To compare and evaluate LLM-based answers with Reddit community answers, we designed a RAG pipeline using carefully selected document sources, as shown in the RAG answer generation module in Figure 1. Our approach consists of two primary components: (1) offline knowledge base construction and (2) online query processing and response generation. The offline component involves curating a domain-specific knowledge base under expert guidance, embedding relevant textual data, and storing it in a high-performance vector database. As document sources we used publicly available research papers, toolkits from Autism Speaks29, and peer-reviewed journal articles. Each document was segmented into smaller chunks to facilitate efficient retrieval. For each document chunk $d_i$, an embedding vector $e_i$ was generated using an embedding function $f$: $e_i = f(d_i)$ (all-minilm-l6-v2, 256-token max, 384-dimensional). These embeddings were then stored in ChromaDB30, a high-performance vector database optimized for similarity search and efficient retrieval. The online component employs a retrieval mechanism based on cosine similarity to identify and extract the most relevant information. To retrieve the most relevant document chunks, we computed the cosine similarity between the query embedding $e_q$ and each stored document chunk embedding $e_i$: $\mathrm{sim}(e_q, e_i) = \frac{e_q \cdot e_i}{\lVert e_q \rVert \, \lVert e_i \rVert}$. The top 10 document chunks with the highest similarity scores were retrieved from ChromaDB and subsequently passed to the LLM together with the prompt instructions for generation. For answer generation we used the gpt-4o-mini model (128K-context limit) with a temperature of 0.40, default top-p = 1.0, and zero frequency/presence penalties; these settings balance determinism and diversity while avoiding truncation.
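The retrieval step reduces to ranking stored chunk embeddings by cosine similarity to the query embedding and keeping the top 10. The sketch below is an in-memory stand-in for that step, assuming the embeddings are already computed; in the study this lookup was handled by ChromaDB, and the function names and toy vectors here are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def retrieve_top_k(query_embedding, chunks, k=10):
    """Rank (text, embedding) pairs by similarity to the query.

    Returns up to k (score, text) pairs, highest score first.
    """
    scored = [
        (cosine_similarity(query_embedding, emb), text) for text, emb in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy chunks with 2-dimensional embeddings (hypothetical values).
chunks = [
    ("chunk a", [1.0, 0.0]),
    ("chunk b", [0.0, 1.0]),
    ("chunk c", [1.0, 1.0]),
]
top = retrieve_top_k([1.0, 0.0], chunks, k=2)
```

The texts of the returned chunks would then be placed in the generation prompt alongside the user's question.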

Expert evaluation of Reddit comments and LLM-RAG responses

Ten clinicians from the Thompson Center for Autism, including nurses, doctors, and social workers, evaluated the quality of Reddit comments and RAG-generated responses using a four-dimensional assessment framework8. For each selected Reddit question, evaluators assessed Reddit comments and RAG-generated responses using correctness, relevance, empathy, and helpfulness criteria (Table 2). The correctness criterion assesses whether the medical information provided aligns with established healthcare standards. Relevance measures how well the response stays on topic, while empathy evaluates whether the answer acknowledges and considers the emotional state of the individual. Additionally, we incorporated a helpfulness criterion to determine whether the response effectively addresses the individual’s needs and provides meaningful support.

Table 2:

Four categories of evaluation criteria and instructions for the evaluators, used to evaluate Reddit and RAG answers.

| Criterion | Rating | Description | Score |
|---|---|---|---|
| Correctness | Absolutely incorrect | Contains serious errors that could endanger patient health and does not adhere to established standards. | 1 |
| Correctness | Incorrect | Includes significant inaccuracies or mistakes that compromise medical quality. | 2 |
| Correctness | Partly correct | Provides some correct information but contains multiple noncritical errors or omissions. | 3 |
| Correctness | Mostly correct | Largely accurate with only minor, noncritical inaccuracies. | 4 |
| Correctness | Absolutely correct | Fully accurate, aligning with current medical guidelines and evidence-based practices. | 5 |
| Relevance | Absolutely irrelevant | Does not address the question; the response is entirely unrelated to the patient’s concern. | 1 |
| Relevance | Irrelevant | Barely considers the question, with most of the information being off topic. | 2 |
| Relevance | Partly relevant | Somewhat related to the question but includes irrelevant details. | 3 |
| Relevance | Mostly relevant | Stays on topic with minimal irrelevant information. | 4 |
| Relevance | Absolutely relevant | Directly and concisely answers the question without unnecessary details. | 5 |
| Empathy | Absolutely unempathetic | Completely lacks warmth or sensitivity and does not acknowledge the emotional concerns of the individual. | 1 |
| Empathy | Unempathetic | Largely detached and impersonal, showing minimal consideration for the person’s emotions. | 2 |
| Empathy | Partly empathetic | Displays some awareness of the person’s concerns but remains distant or formal. | 3 |
| Empathy | Mostly empathetic | Supportive and friendly, demonstrating clear understanding of the individual’s emotions. | 4 |
| Empathy | Absolutely empathetic | Highly compassionate and encouraging, offering strong emotional support and validation. | 5 |
| Helpfulness | Not helpful at all | The response is misleading, irrelevant, or could cause harm. It does not provide any useful support or guidance. | 1 |
| Helpfulness | Barely helpful | Provides vague or generic information that does not meaningfully address the person’s concern. Lacks actionable advice. | 2 |
| Helpfulness | Somewhat helpful | Contains some useful information but is incomplete, unclear, or lacks depth. The person would need additional guidance. | 3 |
| Helpfulness | Mostly helpful | Provides relevant and mostly clear advice that addresses the concern well, though it could be more precise or detailed. | 4 |
| Helpfulness | Very helpful | Thorough, well-structured, and actionable. Fully addresses the concern with clear, evidence-based guidance. | 5 |

Results

The initial classification of r/autism posts revealed key discussion areas, with Diagnosis-Related Questions/Concerns being the most common (27.73%). Autism-Related Medical/Behavioral Help followed at 22.82%, reflecting frequent requests for advice on challenges like meltdowns, potty training, and adult life issues. After this initial analysis we directed our focus on themes within medical and practical support discussions.

Selection of the number of topics K for medical and practical support discussions

To identify coherent and distinct themes within medical and practical help-related posts, we evaluated topic models with the number of topics (K) ranging from 5 to 20. We used semantic coherence to assess topic interpretability and exclusivity to evaluate thematic uniqueness. Based on this evaluation, the model with K=17 offered the best trade-off, balancing high coherence and low overlap across topics. A brief qualitative review further confirmed that the resulting topics were interpretable and aligned well with the intended focus.

Autism Related Medical and Practical Help Discussions

Table 4 presents the top 12 topics extracted from caregiver queries within the medical and practical help category, with an emphasis on behavioral and practical help. The topics identified include medical help-related queries such as food preferences and picky eating, potty training and toileting challenges, stimming, teeth brushing and sensory toys, sensory overload and noise sensitivities, and touch issues. We observed that the majority of the other topics revolve around social skills and navigating daily life with family, siblings, and broader social interactions. A notable insight from the family dynamics and sibling relationships topic concerns the challenges faced by family members of autistic children. Unlike autistic individuals and primary caregivers, who have direct access to clinicians, siblings and extended family members often lack professional support. This highlights the need for more comprehensive family-centered resources to address their emotional and logistical difficulties31.

Table 4:

Themes discussed in medical and practical assistance-related posts. The following top 12 topics have been identified based on coherence score and uniqueness, derived from an analysis of 37,495 posts.

| Topic | Category | Label | Top Words | Percentage (%) | Coherence Score |
|---|---|---|---|---|---|
| T1 | Behavioral | Food Sensitivities & Eating Challenges | food, eat food, picky eater, eating, food texture, sensory issue | 3.45 | 0.8592 |
| T2 | | Personal Hygiene Related Sensory Issues | sensory issue, brush teeth, toothbrush, hygiene, shower, smell | 3.74 | 0.7261 |
| T3 | | Behavioral Challenges in School & Educational Challenges | school, teacher, classroom, education, behavior, parent | 7.6 | 0.5845 |
| T4 | | Stimming & Pain Related Issues | stop stimming, stimming, stim toy, pain, sensory | 3.65 | 0.5123 |
| T5 | | Social Communication Issues & Friendships Related Issues | friendship, friends, make friend, conversation, communicate, feeling | 8.7 | 0.5006 |
| T6 | | Meltdowns related Issues | meltdown, sensory overload, panic, upset, overstimulation, overwhelm, situation | 9.07 | 0.4819 |
| T7 | | Sensory Overload Issues | sensory issue, sensory overload, sensitivity, uncomfortable, feeling | 7.69 | 0.4713 |
| T8 | Practical Help | Headphone Usage for Noise Sensitivities | noise cancel, headphone, earplugs, noise, background noise | 3.41 | 0.8863 |
| T9 | Relationships & Employment | Employment & Workplace Challenges | job, work, career, workplace, coworkers, hire, life | 8.6 | 0.5023 |
| T10 | | Family Dynamics & Sibling Relationships | brother, little brother, family, parent, talk, advice, mom, dad | 5.41 | 0.4886 |
| T11 | | Romantic Relationships & Love | relationship, boyfriend, love, couple, romantic, emotional, girlfriend | 4.47 | 0.4839 |
| T12 | Research Participation | Questions about Research Participation & Studies | participant, participate, study, college, program, survey | 5.45 | 0.7143 |

Comparison of Discussions in Clinical vs. Online Autism Communities

In the Thompson Center messaging portal, the most frequently discussed topics were medication-related concerns (Table 5). Compared to the clinical discussions, r/autism_parenting posts focused more on day-to-day caregiving challenges (Table 6), such as sleep disturbances, potty training, and communication barriers, which could also be addressed and treated in a clinical setting. In contrast to the clinical messages, discussions on Reddit rarely centered on prescription drugs, instead emphasizing alternative treatments. Education and schooling challenges were prevalent in both clinical and online discussions, highlighting the shared struggles caregivers face in securing appropriate accommodations and behavioral support for their autistic children.

Table 5:

The top 5 key topics identified from 153 Thompson Center messages between February 15, 2025, and March 9, 2025.

| Topic Label | Top Words | Percentage (%) | Coherence Score |
|---|---|---|---|
| Ritalin and Medication Adjustments | try ritalin, ritalin afternoon, ritalin need, ritalin, ritalin morning, leave ritalin, ritalin ritalin, refill ritalin, need guanfacine, change guanfacine | 8.5 | 0.81 |
| Education and Schooling | need help, help know, know need, need discuss, help plan, teacher help, check teacher, need specific, behavior check, need work | 5.88 | 0.58 |
| Abilify Usage and Concerns | start abilify, improvement, update, abilify, worry, gain, attention, occur, routine, dosage | 11.76 | 0.41 |
| Therapy and Autism Assessment | test autism, progress therapist, autism, need letter, therapist, leave supervision, email, felt need, attention, support | 3.92 | 0.35 |
| Behavioral Issues and Mood Changes | increase behavior, behavior bad, behavior week, help mood, affect, anger, pseudobulbar affect, aggression, outburst, angry | 17.65 | 0.35 |

Table 6:

The top 6 key topics identified from 1,216 r/autism_parenting posts between February 15, 2025, and March 9, 2025.

| Topic Label | Top Words | Percentage (%) | Coherence Score |
|---|---|---|---|
| Sleep Issues and Bedtime Concerns | night sleep, want sleep, bedtime, able sleep, fall asleep, melatonin, night bed, hour sleep, asleep, night wake | 17.54 | 0.80 |
| Potty Training and Toileting | potty train, potty training, toilet, pee, poop, poop underwear, diaper, pooped, train, input | 20.39 | 0.68 |
| Cannabinol usage and food habits | child cbn, affect child, try cbd, aid try, try food, milk, cbn sleep, cbn affect, cbn, cbn know | 5.87 | 0.55 |
| Education and Schooling | special education, special ed, change school, school know, school say, teacher tell, teacher say, education process, education, elementary | 13.21 | 0.50 |
| Autism and Nonverbal Communication | son autistic, autistic kid, autistic non, cause autism, nonverbal autistic, old autistic, parent child, nephew, mum, child know | 19.17 | 0.50 |
| Autism Diagnosis and Therapy | autism son, child asd, asd level, autism diagnosis, help parent, kid spectrum, aba therapy, feedback, product service, intellectual disability | 23.82 | 0.38 |

Qualitative evaluation of Reddit answers vs RAG answers

This study investigated the quality of peer-generated versus LLM-generated responses in autism-related behavioral discussions. We selected six high-impact behavioral themes from the r/autism subreddit: Food Sensitivities & Eating Challenges, Personal Hygiene Related Sensory Issues, Behavioral Challenges in School & Educational Settings, Stimming & Pain Related Issues, Social Communication & Friendship Issues, and Meltdowns and Sensory Overload. For each theme, we selected one representative and one outlier question and extracted the top three Reddit comments and a corresponding RAG-generated answer, resulting in a total of 12 questions evaluated by n=10 clinician evaluators across four core dimensions: correctness, relevance, empathy, and helpfulness. Across all dimensions, RAG responses consistently demonstrated higher quality and reliability. While Reddit comments often reflected valuable lived experience, they varied widely in clarity, emotional tone, and clinical accuracy. Some shared personal experiences, though relatable, lacked the clarity and structure needed for effective guidance. In contrast, RAG responses were more clinically aligned, providing clear, actionable answers without unnecessary detail.

Discussion

The prominence of diagnosis-related posts (27.73%) underscores lingering uncertainty around assessment pathways and an interest in anonymous validation online. Simultaneously, the large share of medical/behavioral help requests (22.82%) and practical support posts highlights a demand for concrete, day-to-day guidance that traditional clinics struggle to meet quickly. Integrating a vetted RAG assistant into patient portals or trusted community sites could provide timely, evidence-based answers while easing clinician workload. Comparisons between Reddit and Thompson Center messages, despite differences in sample size, revealed a clear divergence in treatment focus. Importantly, while both datasets involve caregiver concerns, the Thompson Center messages reflect direct clinician–patient interactions, whereas Reddit posts are shaped by peer-to-peer, community-based discussions without clinical oversight. Caregivers using the clinical platform frequently inquired about prescription medications, including Abilify (Aripiprazole), Ritalin (Methylphenidate), and Guanfacine, with a focus on dosage adjustments and alternative options. In contrast, discussions on Reddit rarely centered on prescription drugs, instead emphasizing alternative treatments, particularly cannabinoid products such as cannabinol (CBN) and cannabidiol (CBD)32–34. Education and schooling challenges were prevalent in both clinical and online discussions, highlighting the shared struggles caregivers face in securing appropriate accommodations and behavioral support for their autistic children. Notably, caregivers in clinical settings demonstrated greater familiarity with prescription medication management, often seeking guidance on dosage modifications or alternative medications. This suggests that while clinical discussions are structured around pharmacological interventions, online communities are more exploratory, engaging in discussion of non-traditional therapies.

Across six behavioral themes, clinician evaluators rated RAG answers higher on correctness (+1.29 mean difference) and empathy (+1.52) than Reddit comments. One significant difference was the consistency with which RAG responses addressed all aspects of a question without introducing unrelated personal commentary. For instance, in response to a hygiene-related query, “I have sensory problems with taking care of my teeth, washing my face, and shaving”, Reddit users primarily shared personal experiences, such as recommending specific products like different toothpastes. While relatable, these responses often lacked empathy or failed to directly engage with the broader concern. Some replies were minimally helpful or emotionally detached, such as simply asking, “What do you find difficult about brushing your teeth?”. In contrast, the RAG-generated response acknowledged the user’s struggle (“It’s great that you’re looking for tips to help you manage this better”) and offered clear, structured strategies tailored to sensory sensitivities. These findings highlight the strength of LLM-driven, domain-specific tools in delivering practical, structured guidance while remaining empathetic. Notably, evaluators judged that RAG answers also displayed a higher level of emotional sensitivity, validating the user’s struggles and offering encouragement in a structured, compassionate tone. In actual clinical care settings as well, quality of care improves when physicians explore and validate patient concerns35.

Conclusion

This study analyzes discussions from online autism communities and clinical caregiver communications, identifying key themes such as potty training, sensory sensitivities, picky eating, relationship challenges, and underrepresented family dynamics. It underscores the value of online platforms in bridging support gaps and calls for integrating peer-shared insights with formal care strategies. In comparing Reddit responses with Retrieval-Augmented Generation responses, RAG answers were often more accurate, empathetic, and concise, suggesting that such systems can complement peer discussions by reducing misinformation and enhancing caregiver support in autism-related contexts.

Limitations and future work

This study has several limitations. First, the Thompson Center data was sourced solely from the patient-portal trash folder, which retains messages for only 30 days, thereby constraining both sample size and topic diversity. Second, our analysis of Reddit posts included only threads with three or more comments, introducing a selection bias that over-represents popular or controversial content and excludes unanswered queries. Third, our primary evaluation of RAG vs. Reddit responses relied on just ten clinicians reviewing twelve Q&A pairs, limiting generalizability. Fourth, our initial categorization of caregiver questions was derived from an informal review of 100 Reddit posts and 153 portal messages rather than a systematic thematic analysis, and may therefore have overlooked nuanced themes. Finally, while our three-stage annotation workflow achieved strong inter-rater reliability (Cohen’s κ = 0.86), it relied on only two annotators. Best practices suggest involving more annotators and larger sample sizes to ensure dataset reliability and reduce bias.

To address these shortcomings, future work will extract a Thompson Center message corpus directly from the record database, expanding both the message volume and topical breadth. We plan to include more Q&A pairs and a more diverse evaluation panel that includes caregivers and autistic adults. Future objectives include developing a RAG-based application to actively support the autistic community and its stakeholders. Such applications have shown promise in providing tailored information and assistance in various contexts.
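For readers unfamiliar with the agreement statistic cited among the limitations, Cohen’s κ compares the observed agreement between two annotators (p_o) with the agreement expected by chance from their label frequencies (p_e), as κ = (p_o − p_e) / (1 − p_e). The following is a minimal sketch of that standard definition. The function name and the ten example labels are hypothetical and are not data from this study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: share of items that received the same label twice.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal frequencies,
    # summed over every label either annotator used.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical class labels (1-5) for ten posts; NOT data from the study.
annotator_a = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
annotator_b = [1, 1, 2, 2, 3, 4, 4, 4, 5, 5]
kappa = cohens_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

With only two annotators, κ summarizes a single pairwise comparison, which is why the limitation above recommends adding annotators.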

Figures & Tables

Figure 2: Chain-of-Thought Prompt Example for LLM Reddit Post Classification.

Figure 3: LLM-based answer generation prompt for the RAG pipeline.

Figure 4: Clinician evaluations across six topics and four main criteria indicated that LLM-generated responses consistently outperformed those from the Reddit community.

Table 1: LLM classification results for r/Autism posts using Zero-shot (ZS), Few-shot (FS), and Chain-of-Thought (COT) prompts (Figure 2). Focus is on Class 1 (Medical/Behavioral Help) and Class 3 (Practical Support); Classes 2, 4, and 5 refer to Diagnosis-Related Questions, Personal Experiences, and Off-Topic posts, respectively.

| Metric | GPT-4o-mini ZS | GPT-4o-mini FS | GPT-4o-mini COT | LLaMA2 ZS | LLaMA2 FS | LLaMA2 COT | DeepSeek-r1 ZS | DeepSeek-r1 FS | DeepSeek-r1 COT |
|---|---|---|---|---|---|---|---|---|---|
| Parameters | NA | NA | NA | 70B | 70B | 70B | 70B | 70B | 70B |
| Accuracy | 0.66 | 0.64 | 0.67 | 0.45 | 0.51 | 0.54 | 0.59 | 0.61 | 0.58 |
| Macro-F1 | 0.65 | 0.63 | 0.67 | 0.43 | 0.50 | 0.53 | 0.59 | 0.62 | 0.58 |
| Class 1 F1 | 0.58 | 0.53 | 0.65 | 0.32 | 0.54 | 0.36 | 0.53 | 0.53 | 0.47 |
| Class 1 Precision | 0.50 | 0.51 | 0.61 | 0.58 | 0.45 | 0.58 | 0.57 | 0.54 | 0.67 |
| Class 1 Recall | 0.71 | 0.54 | 0.70 | 0.22 | 0.68 | 0.26 | 0.49 | 0.52 | 0.37 |
| Class 2 F1 | 0.79 | 0.79 | 0.81 | 0.48 | 0.71 | 0.78 | 0.70 | 0.69 | 0.71 |
| Class 2 Precision | 0.80 | 0.80 | 0.85 | 0.90 | 0.84 | 0.85 | 0.95 | 0.93 | 0.86 |
| Class 2 Recall | 0.77 | 0.78 | 0.77 | 0.33 | 0.61 | 0.71 | 0.55 | 0.55 | 0.60 |
| Class 3 F1 | 0.58 | 0.56 | 0.67 | 0.43 | 0.45 | 0.57 | 0.61 | 0.64 | 0.59 |
| Class 3 Precision | 0.72 | 0.77 | 0.66 | 0.56 | 0.63 | 0.45 | 0.71 | 0.68 | 0.46 |
| Class 3 Recall | 0.49 | 0.44 | 0.68 | 0.36 | 0.34 | 0.77 | 0.53 | 0.61 | 0.81 |
| Class 4 F1 | 0.59 | 0.60 | 0.53 | 0.39 | 0.28 | 0.49 | 0.51 | 0.57 | 0.52 |
| Class 4 Precision | 0.66 | 0.58 | 0.55 | 0.35 | 0.45 | 0.42 | 0.44 | 0.51 | 0.46 |
| Class 4 Recall | 0.53 | 0.62 | 0.52 | 0.45 | 0.21 | 0.59 | 0.61 | 0.65 | 0.60 |
| Class 5 F1 | 0.74 | 0.69 | 0.70 | 0.54 | 0.51 | 0.47 | 0.62 | 0.66 | 0.62 |
| Class 5 Precision | 0.70 | 0.61 | 0.72 | 0.39 | 0.41 | 0.64 | 0.52 | 0.59 | 0.74 |
| Class 5 Recall | 0.78 | 0.80 | 0.68 | 0.88 | 0.67 | 0.37 | 0.75 | 0.75 | 0.53 |
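The per-class and macro figures in Table 1 can be related to one another with the standard definitions: F1 is the harmonic mean of precision and recall, and macro-F1 is the unweighted mean of the per-class F1 scores. The paper does not state how its metrics were computed, so the sketch below is an assumption-based consistency check. It uses the GPT-4o-mini Chain-of-Thought column, and the function and variable names are illustrative. Because the table reports two decimals, small rounding differences are expected.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class_f1):
    """Unweighted mean of per-class F1 scores."""
    return sum(per_class_f1) / len(per_class_f1)


# GPT-4o-mini, Chain-of-Thought column of Table 1:
# class -> (precision, recall, reported F1)
cot_rows = {
    "Class 1": (0.61, 0.70, 0.65),
    "Class 2": (0.85, 0.77, 0.81),
    "Class 3": (0.66, 0.68, 0.67),
    "Class 4": (0.55, 0.52, 0.53),
    "Class 5": (0.72, 0.68, 0.70),
}

# Recompute each class F1 from its precision and recall.
recomputed_f1 = {name: f1_score(p, r) for name, (p, r, _) in cot_rows.items()}
for name, (_, _, reported) in cot_rows.items():
    print(f"{name}: recomputed F1 = {recomputed_f1[name]:.3f}, reported = {reported:.2f}")

# Macro-F1 from the reported per-class F1 values (Table 1 lists 0.67).
macro = macro_f1([reported for _, _, reported in cot_rows.values()])
print(f"macro-F1 = {macro:.3f}")
```

Because macro-F1 weights every class equally, the smaller Practical Support class (Class 3) influences the headline score as much as the larger classes do.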

Table 3: Classification results of primary discussion areas in r/Autism (116,712 posts).

| Category | Count | Percentage (%) |
|---|---|---|
| Autism-Related Medical Behavioral Help | 26,634 | 22.82 |
| Diagnosis-Related Questions | 32,367 | 27.73 |
| Practical, Non-Medical Support | 10,861 | 9.30 |
| Personal Opinions, Experiences, and Stories | 19,457 | 16.67 |
| Unrelated or Off-Topic | 27,393 | 23.47 |
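The percentages in Table 3 follow from dividing each category count by the 116,712-post total. The short sketch below recomputes them from the listed counts as a consistency check; the variable names are illustrative. The recomputed shares agree with the listed values to within 0.01 percentage points. The Practical, Non-Medical Support share rounds to 9.31 rather than the listed 9.30, which may indicate that the listed values were truncated rather than rounded. That reading is an inference and is not stated in the paper.

```python
# Counts as listed in Table 3.
counts = {
    "Autism-Related Medical Behavioral Help": 26634,
    "Diagnosis-Related Questions": 32367,
    "Practical, Non-Medical Support": 10861,
    "Personal Opinions, Experiences, and Stories": 19457,
    "Unrelated or Off-Topic": 27393,
}

# Percentages as listed in Table 3.
reported = {
    "Autism-Related Medical Behavioral Help": 22.82,
    "Diagnosis-Related Questions": 27.73,
    "Practical, Non-Medical Support": 9.30,
    "Personal Opinions, Experiences, and Stories": 16.67,
    "Unrelated or Off-Topic": 23.47,
}

total = sum(counts.values())
# Share of each category in the full set of classified posts.
recomputed = {name: 100 * n / total for name, n in counts.items()}

print("total posts:", total)
for name, pct in recomputed.items():
    print(f"{name}: {pct:.2f}% (listed {reported[name]:.2f}%)")
```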

Table 7: Clinician Ratings for RAG vs. Reddit Answers Across Four Dimensions.

| Dimension | RAG Median | RAG Mean (SD) | Reddit Median | Reddit Mean (SD) |
|---|---|---|---|---|
| Correctness | 4 | 4.34 (0.56) | 3 | 3.05 (0.86) |
| Relevance | 5 | 4.45 (0.68) | 3 | 3.31 (0.81) |
| Empathy | 5 | 4.47 (0.59) | 3 | 2.95 (0.81) |
| Helpfulness | 4 | 4.16 (0.62) | 3 | 2.98 (0.95) |
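The mean differences quoted in the discussion (+1.29 for correctness, +1.52 for empathy) can be reproduced by subtracting the Reddit mean from the RAG mean in Table 7. The sketch below does this for all four dimensions. The relevance and helpfulness differences are not quoted in the text and are derived here by the same subtraction; the variable names are illustrative.

```python
# Mean clinician ratings from Table 7: dimension -> (RAG mean, Reddit mean).
mean_ratings = {
    "Correctness": (4.34, 3.05),
    "Relevance": (4.45, 3.31),
    "Empathy": (4.47, 2.95),
    "Helpfulness": (4.16, 2.98),
}

# Positive values mean the RAG answers were rated higher on average.
mean_differences = {
    dimension: round(rag_mean - reddit_mean, 2)
    for dimension, (rag_mean, reddit_mean) in mean_ratings.items()
}

for dimension, diff in mean_differences.items():
    print(f"{dimension}: {diff:+.2f}")
```

These are raw differences in mean ratings; they do not account for the rating variability reported in the SD columns.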

References

1. Psychiatry.org. What Is Autism Spectrum Disorder? [Internet] [cited 2025 Mar 20]. Available from: https://www.psychiatry.org/patients-families/autism/what-is-autism-spectrum-disorder.
2. Papadopoulos C. Large language models for autistic and neurodivergent individuals: Concerns, benefits and the path forward. Neurodiversity. 2024 Jan;2:27546330241301938.
3. Daynes-Kearney R, Gallagher S. Online Support Groups for Family Caregivers: Scoping Review. J Med Internet Res. 2023 Dec 13;25:e46858. doi: 10.2196/46858.
4. Reddit - Dive into anything [Internet] [cited 2025 Mar 3]. Available from: https://www.reddit.com/
5. Mann S, Carter MC. Emotional disclosures and reciprocal support: The effect of account type and anonymity on supportive communication over the largest parenting forum on Reddit. Hum Behav Emerg Technol. 2021 Dec;3(5):668–76.
6. Plank L, Zlomuzica A. Reduced speech coherence in psychosis-related social media forum posts. Schizophrenia. 2024 Jul 4;10(1):60. doi: 10.1038/s41537-024-00481-1.
7. Denecke K, May R, LLMHealthGroup, Rivera Romero O. Potential of Large Language Models in Health Care: Delphi Study. J Med Internet Res. 2024 May 13;26:e52399. doi: 10.2196/52399.
8. Weber MT, Noll R, Marchl A, Facchinello C, Grünewaldt A, Hügel C, et al. MedBot vs RealDoc: efficacy of large language modeling in physician-patient communication for rare diseases. J Am Med Inform Assoc. 2025 Feb 25:ocaf034.
9. Larnyo E, Nutakor JA, Addai-Dansoh S, Nkrumah ENK. Sentiment analysis of post-COVID-19 health information needs of autism spectrum disorder community: insights from social media discussions. Front Psychiatry. 2024 Oct 11;15:1441349. doi: 10.3389/fpsyt.2024.1441349.
10. Thom-Jones S, Melgaard I, Gordon CS. Autistic Women’s Experience of Motherhood: A Qualitative Analysis of Reddit. J Autism Dev Disord [Internet] 2024 Apr 26. [cited 2025 Mar 14]; Available from: https://link.springer.com/10.1007/s10803-024-06312-7.
11. Edwards C, Love AMA, Jones SC, Cai RY, Nguyen BTH, Gibbs V. ‘Most people have no idea what autism is’: Unpacking autism disclosure using social media analysis. Autism. 2024 May;28(5):1107–19. doi: 10.1177/13623613231192133.
12. Bellon-Harn ML, Boyd RL, Manchaiah V. Applied Behavior Analysis as Treatment for Autism Spectrum Disorders: Topic Modeling and Linguistic Analysis of Reddit Posts. Front Rehabil Sci. 2022 Jan 5;2:682533. doi: 10.3389/fresc.2021.682533.
13. Grootendorst M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure [Internet] arXiv. 2022. [cited 2025 Mar 7]. Available from: https://arxiv.org/abs/2203.05794.
14. Home - Thompson Center for Autism & Neurodevelopment [Internet] [cited 2025 Mar 14]. Available from: https://thompsoncenter.missouri.edu/
15. Lewis P, Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N, et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [Internet] arXiv. 2020. [cited 2025 Mar 20]. Available from: https://arxiv.org/abs/2005.11401.
16. Gao Y, Xiong Y, Gao X, Jia K, Pan J, Bi Y, et al. Retrieval-Augmented Generation for Large Language Models: A Survey [Internet] arXiv. 2023. [cited 2025 Mar 20]. Available from: https://arxiv.org/abs/2312.10997.
17. Watchful1. Subreddit comments/submissions 2005-06 to 2023-12. Available from: https://www.reddit.com/r/pushshift/comments/1akrhg3/separate_dump_files_for_the_top_40k_subreddits/
18. PullPush API Forum [Internet] [cited 2025 Mar 14]. Available from: https://forum.pullpush.io/
19. Sun X, Li X, Li J, Wu F, Guo S, Zhang T, et al. Text Classification via Large Language Models. In: Findings of the Association for Computational Linguistics: EMNLP 2023 [Internet] Singapore: Association for Computational Linguistics; 2023. p. 8990–9005. [cited 2024 Nov 6]. Available from: https://aclanthology.org/2023.findings-emnlp.603.
20. Wulcan JM, Jacques KL, Lee MA, Kovacs SL, Dausend N, Prince LE, et al. Classification performance and reproducibility of GPT-4 omni for information extraction from veterinary electronic health records. Front Vet Sci. 2025 Jan 16;11:1490030. doi: 10.3389/fvets.2024.1490030.
21. Wang Z, Pang Y, Lin Y, Zhu X. Adaptable and Reliable Text Classification using Large Language Models [Internet] arXiv. 2024. [cited 2025 Feb 7]. Available from: https://arxiv.org/abs/2405.10523.
22. Kostina A, Dikaiakos MD, Stefanidis D, Pallis G. Large Language Models For Text Classification: Case Study And Comprehensive Review [Internet] arXiv. 2025. [cited 2025 Mar 14]. Available from: https://arxiv.org/abs/2501.08457.
23. Törnberg P. Best Practices for Text Annotation with Large Language Models. Sociologica. 2024 Oct 30;18(2):67–85.
24. Li Z, Zhang X, Zhang Y, Long D, Xie P, Zhang M. Towards general text embeddings with multi-stage contrastive learning. arXiv preprint arXiv:2308.03281. 2023.
25. Lossio-Ventura JA, Gonzales S, Morzan J, Alatrista-Salas H, Hernandez-Boussard T, Bian J. Evaluation of clustering and topic modeling methods over health-related tweets and emails. Artif Intell Med. 2021 Jul;117:102096. doi: 10.1016/j.artmed.2021.102096.
26. Curiskis SA, Drake B, Osborn TR, Kennedy PJ. An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit. Inf Process Manag. 2020 Mar;57(2):102034.
27. Ni C, Song Q, Chen Q, Song L, Commiskey P, Stratton L, et al. Sentiment Dynamics Among Informal Caregivers in Web-Based Alzheimer Communities: Systematic Analysis of Emotional Support and Interaction Patterns. JMIR Aging. 2024 Dec 4;7:e60050. doi: 10.2196/60050.
28. Xin Y, Ni C, Song Q, Yin Z. Fatigue, Pain, and Medication: Mining Online Posts Regarding Rheumatoid Arthritis From Reddit. AMIA Annu Symp Proc AMIA Symp. 2023;2023:754–63.
29. Information by topic | Autism Speaks [Internet] [cited 2025 Mar 20]. Available from: https://www.autismspeaks.org/information-topic.
30. Chroma [Internet] [cited 2025 Mar 20]. Available from: https://www.trychroma.com/
31. Russa MB, Matthews AL, Owen-DeSchryver JS. Expanding Supports to Improve the Lives of Families of Children With Autism Spectrum Disorder. J Posit Behav Interv. 2015 Apr;17(2):95–104.
32. Hacohen M, Stolar OE, Berkovitch M, Elkana O, Kohn E, Hazan A, et al. Children and adolescents with ASD treated with CBD-rich cannabis exhibit significant improvements particularly in social symptoms: an open label study. Transl Psychiatry. 2022 Sep 9;12(1):375. doi: 10.1038/s41398-022-02104-8.
33. Bilge S, Ekici B. CBD-enriched cannabis for autism spectrum disorder: an experience of a single center in Turkey and reviews of the literature. J Cannabis Res. 2021 Dec;3(1):53. doi: 10.1186/s42238-021-00108-7.
34. Ma L, Platnick S, Platnick H. Cannabidiol in Treatment of Autism Spectrum Disorder: A Case Study. Cureus [Internet] 2022 Aug 26. [cited 2025 Mar 12]; Available from: https://www.cureus.com/articles/109585-cannabidiol-in-treatment-of-autism-spectrum-disorder-a-case-study.
35. Epstein RM, Shields CG, Franks P, Meldrum SC, Feldman M, Kravitz RL. Exploring and Validating Patient Concerns: Relation to Prescribing for Depression. Ann Fam Med. 2007 Jan 1;5(1):21–8. doi: 10.1370/afm.621.

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association
