With a growing number of Americans turning to artificial intelligence for health advice, health systems nationwide are increasingly exploring and implementing their own branded chatbots. The goal is to leverage this popular technology, guide patients toward their services, and potentially address some of the persistent challenges within the U.S. healthcare system. At the same time, the trend is raising significant questions about the safety, efficacy, and ethical implications of integrating AI into patient care.
Health system executives often present these new AI-powered tools as a significant step toward patient convenience, aiming to meet individuals where they are and enhance digital equity. Furthermore, they position these in-house chatbots as a safer alternative to the commercially available large language models (LLMs) that many Americans are already utilizing for health-related queries.
"We are at an inflection point in healthcare," stated Allon Bloch, CEO of clinical AI company K Health, in a press release. "Demand is accelerating, and patients are already using AI to navigate their lives." K Health, in collaboration with Hartford HealthCare in Connecticut, is at the forefront of this movement, rolling out its PatientGPT chatbot to tens of thousands of existing patients. Bloch elaborated, "The question isn’t whether AI will shape healthcare; it’s about how we do it in a safe, transparent way, inside a health system that connects to your medical records and your care team. PatientGPT represents that turning point."
Despite these optimistic projections, a segment of experts remains cautious. They are raising critical questions about the readiness of these chatbots for widespread deployment, the adequacy of monitoring mechanisms, the complex landscape of liability, and whether AI chatbots truly address the fundamental care access and quality issues that patients are experiencing. The tangible benefits for patients, at this stage, remain largely theoretical. "It’s a tempting idea," admitted Adam Rodman, a clinical reasoning researcher and internist at Beth Israel Deaconess Medical Center in Boston, in a recent interview with Stat News. He underscored the current lack of evidence demonstrating that integrating chatbots into health systems demonstrably improves patient outcomes, concluding, "We’re not there yet."
The Unmet Needs Driving AI Adoption in Healthcare
To fully grasp the potential role of AI in healthcare, it is crucial to examine the broader context of the U.S. healthcare system. Despite being one of the world’s wealthiest nations, the United States consistently underperforms in healthcare outcomes compared with other high-income countries. Americans experience lower life expectancy, a higher burden of preventable deaths, and disproportionately high rates of maternal and infant mortality. The nation also grapples with elevated levels of obesity and chronic conditions. Access to care remains a significant challenge, with a substantial portion of the population lacking adequate coverage or a regular healthcare provider. A 2023 report found that nearly a third of Americans, more than 100 million people, do not have a primary care provider, a critical gap in essential health services.
Into this landscape steps artificial intelligence. The accessibility of LLM-powered chatbots, offering seemingly comforting and confident responses, has led a significant portion of the American public to turn to these tools for health and medical inquiries. A recent poll by KFF revealed that one in three adults has utilized an AI chatbot for health information, a figure on par with those who use social media for similar purposes.
Among individuals who have engaged with AI for health advice, a striking 41 percent reported uploading personal medical information, such as test results. When questioned about their primary motivations for seeking AI assistance, 19 percent cited affordability issues, and 18 percent pointed to the absence of a regular healthcare provider or difficulties in securing appointments. The most common reason, cited by 65 percent of users, was the desire for a quick answer. Worryingly, a substantial number of these individuals did not follow up with a medical professional after their AI consultations; this included 58 percent of those who inquired about mental health and 42 percent who asked about physical health concerns.
Mounting Concerns Over AI’s Medical Accuracy and Information Integrity
As more Americans rely on AI to bridge existing gaps in healthcare access and information, cautionary tales and concerning anecdotes are emerging. These instances highlight potential pitfalls in both the way users formulate their queries to LLMs and the veracity of the information these models process and disseminate.
A study published in Nature Medicine in February 2026, which involved nearly 1,300 participants, sought to evaluate the medical accuracy of leading LLMs, specifically GPT-4o, Llama 3, and Command R+. When researchers provided the LLMs with text describing specific medical scenarios, the models correctly identified the medical condition approximately 95 percent of the time and suggested appropriate next steps, such as seeking emergency care, about 56 percent of the time. However, when participants used their own unstructured prompts to describe the same medical scenarios, the LLMs’ accuracy significantly diminished. They were able to correctly identify a medical condition in only about one-third of cases and guided participants to appropriate next steps in just 43 percent of instances.

"The study essentially shows that ‘people don’t know what they are supposed to be telling the model,’" explained lead author Andrew Bean, an AI researcher at Oxford University, in a recent NPR interview. This highlights a critical user-input challenge, where the effectiveness of the AI is heavily dependent on the user’s ability to articulate their symptoms and concerns in a way the model can accurately interpret.
Senior author Adam Mahdi further emphasized the gravity of these findings, stating, "The disconnect between benchmark scores and real-world performance should be a wake-up call for AI developers and regulators." This disparity underscores the need for more robust validation and oversight before these technologies are widely deployed in sensitive areas like healthcare.
Beyond the accuracy of AI responses, there is a significant concern regarding the quality of the medical information that LLMs may draw upon. Just last week, Nature News reported that LLMs were engaging users in discussions about "bixonimania," a skin condition entirely fabricated by researchers in Sweden. The research team had intentionally posted two fake studies online about this fictional condition to assess how readily medical misinformation could be adopted by AI tools. The experiment demonstrated that the uptake of such fabricated information was alarmingly easy, prompting the researchers to subsequently remove the false studies. This incident serves as a stark reminder of the potential for AI to amplify and spread misinformation, a particularly dangerous prospect in the realm of health advice.
Health Systems Embrace AI: Rollouts and Pilot Programs
Despite these significant concerns, several healthcare systems are moving forward with the implementation of their own proprietary chatbots. The PatientGPT chatbot, a collaboration between Hartford HealthCare and K Health, was launched in a beta version to a select group of patients last month. According to reports from Stat, the company plans to expand this rollout to tens of thousands more patients this week.
Hartford HealthCare has published a pre-print study, not yet peer-reviewed, detailing its iterative stress-testing process, a "red teaming" approach involving 75 participants. The testing reportedly reduced the chatbot’s failure rate in "high-risk" scenarios from 30 percent to 8.5 percent over successive rounds. However, the real-world implications of that reduction and the potential severity of the remaining failures are unclear.
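The pre-print’s headline numbers are easiest to read as a simple metric tracked across red-teaming rounds. As a purely hypothetical illustration, assuming each round yields human-labeled transcripts (the data shapes and names below are invented, not drawn from the study), the bookkeeping might look like this:

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    """One red-teaming conversation, labeled by a human reviewer."""
    risk_level: str   # e.g. "high" or "routine" (labels assumed for illustration)
    failed: bool      # True if the chatbot's response was judged unsafe or wrong

def failure_rate(transcripts: list[Transcript], risk_level: str = "high") -> float:
    """Share of conversations at the given risk level judged as failures."""
    relevant = [t for t in transcripts if t.risk_level == risk_level]
    if not relevant:
        return 0.0
    return sum(t.failed for t in relevant) / len(relevant)

# Tracking the metric across rounds would show the kind of improvement
# Hartford HealthCare reports: 30 failures in 100 high-risk conversations
# is 30 percent; 17 in 200 gives the reported 8.5 percent.
round_1 = [Transcript("high", True)] * 30 + [Transcript("high", False)] * 70
round_n = [Transcript("high", True)] * 17 + [Transcript("high", False)] * 183

print(f"round 1: {failure_rate(round_1):.1%}")  # 30.0%
print(f"round n: {failure_rate(round_n):.1%}")  # 8.5%
```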
PatientGPT operates in two distinct modes. The first is a general medical question-and-answer mode, which can potentially incorporate patient-specific information. The second is a "medical intake" mode, where patients begin to describe their symptoms. In this mode, the chatbot becomes less conversational and adheres to predefined clinical flowcharts. Once the AI agent has gathered sufficient information, it provides a recommended next step, which could include scheduling a follow-up appointment with primary care or seeking urgent or emergency care. If the latter is advised, the chatbot ceases to respond to further queries, a built-in safety mechanism.
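Based on that description, PatientGPT’s behavior resembles a small state machine: a conversational Q&A mode, a flowchart-driven intake mode, and a terminal state once emergency care is advised. The following is a minimal, hypothetical sketch of that control flow; every mode name, trigger, and triage rule below is invented for illustration and does not reflect K Health’s actual implementation:

```python
from enum import Enum, auto

class Mode(Enum):
    QA = auto()      # general medical Q&A, conversational
    INTAKE = auto()  # symptom intake, follows predefined clinical flowcharts
    LOCKED = auto()  # emergency care advised; chatbot stops answering

class IntakeChatbot:
    """Hypothetical sketch of the two-mode design described for PatientGPT."""

    def __init__(self) -> None:
        self.mode = Mode.QA

    def handle(self, message: str) -> str:
        if self.mode is Mode.LOCKED:
            # Built-in safety mechanism: once urgent/emergency care is
            # advised, the bot no longer responds to further queries.
            return "Please seek emergency care. This chat is now closed."
        if self.mode is Mode.QA and self._describes_symptoms(message):
            self.mode = Mode.INTAKE  # switch to flowchart-driven intake
        if self.mode is Mode.INTAKE:
            next_step = self._run_flowchart(message)
            if next_step == "emergency":
                self.mode = Mode.LOCKED
                return "Based on your symptoms, please seek emergency care now."
            return next_step  # e.g. a follow-up question or scheduling advice
        return self._answer_general_question(message)

    # The helpers below are stand-ins for clinical logic the reporting
    # does not describe.
    def _describes_symptoms(self, message: str) -> bool:
        return "pain" in message.lower()

    def _run_flowchart(self, message: str) -> str:
        return "emergency" if "chest" in message.lower() else "Schedule a primary care visit."

    def _answer_general_question(self, message: str) -> str:
        return "Here is some general health information..."
```

In this sketch, a message mentioning chest pain would move the bot from Q&A into intake and then into the locked state, after which every reply is the closure message, mirroring the safety mechanism described above.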
Hartford HealthCare has committed to continuous monitoring of the chatbot’s performance as the rollout expands. During the initial piloting phase, every interaction was meticulously monitored. Under the current broader rollout, human reviews will be conducted on 20 interactions daily, with a separate AI agent overseeing the remainder. Additionally, batch studies of every 1,000 conversations will be performed.
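Taken at face value, this monitoring scheme splits each day’s traffic between a small human-reviewed sample and an AI reviewer, with a batch audit at every 1,000-conversation mark. Here is a hedged sketch of that routing, where only the sample size and batch threshold come from the reporting and everything else is assumed:

```python
import random

DAILY_HUMAN_REVIEWS = 20  # per the reporting: 20 human-reviewed interactions per day
BATCH_AUDIT_SIZE = 1_000  # per the reporting: batch studies of every 1,000 conversations

def route_for_review(conversations: list[dict]) -> tuple[list[dict], list[dict]]:
    """Randomly assign a small daily sample to humans; an AI agent reviews the rest."""
    sample_size = min(DAILY_HUMAN_REVIEWS, len(conversations))
    picked = set(random.sample(range(len(conversations)), sample_size))
    human = [c for i, c in enumerate(conversations) if i in picked]
    ai_agent = [c for i, c in enumerate(conversations) if i not in picked]
    return human, ai_agent

def due_for_batch_audit(total_seen: int, new: int) -> bool:
    """True when the running total crosses another 1,000-conversation boundary."""
    return (total_seen + new) // BATCH_AUDIT_SIZE > total_seen // BATCH_AUDIT_SIZE
```

Random sampling is one plausible way to choose the 20 daily human reviews; Hartford HealthCare has not said how its sample is actually selected.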
"We’re on a mission to be the most consumer-centric health system in the country," stated Jeff Flaks, president and CEO of Hartford HealthCare, in a recent address. "So much of healthcare has traditionally been organized around the provider, but it’s clear we have to meet people where they are and where they desire to be met. With PatientGPT, we are introducing a new tool that supports your health and provides access to a 24/7 care team, while protecting the human relationships at the heart of care."
Epic’s Emmie: A Cautious Approach to AI Integration
Beyond PatientGPT, another significant player in the AI chatbot space for healthcare is Emmie, an AI chat assistant being released by Epic, the electronic health records (EHR) behemoth that powers the widely used MyChart patient portal. Several health systems are beginning to deploy Emmie to their users through their respective online portals, including Sutter Health in California and Reid Health in Indiana.

In an executive address last year, Epic’s founder and CEO, Judy Faulkner, described Emmie as an assistant designed to help patients prepare for appointments by drafting visit agendas and, subsequently, to assist patients in understanding test results and answering follow-up questions. This initiative was first reported by Becker’s Hospital Review.
Sutter Health’s frequently asked questions (FAQ) page regarding Emmie clarifies its capabilities and limitations. The chatbot can "answer general health questions, and find or summarize information already visible in your chart—such as notes, results, past visits or messages." Crucially, it emphasizes that Emmie "doesn’t give personalized medical advice or make care decisions. Emmie is not intended for use in the diagnosis of disease or other conditions, or in the cure, mitigation, treatment or prevention of disease. Emmie is also not intended to replace, modify or be substituted for a physician’s professional clinical judgment." This explicit framing aims to manage patient expectations and reinforce the chatbot’s role as a supportive tool rather than a diagnostic or prescriptive one.
Currently, Emmie is accessible to a limited subset of Sutter Health patients. These early adopters are encouraged to provide feedback on Emmie’s responses through simple thumbs-up or thumbs-down reactions, contributing to the ongoing refinement of the system.
Reid Health is following Sutter Health’s lead as the second adopter of Epic’s Emmie. In a recent interview with Becker’s, Muhammad Siddiqui, CIO at Reid Health, highlighted that his system primarily serves rural communities. He views Emmie as a vital tool to broaden access to care and assist patients in navigating their healthcare journeys. "Patients want clearer answers, easier access and more guidance between visits," Siddiqui stated. "If we can provide that inside the health system experience, in a way that is connected to trusted clinical workflows, that is a much better path than leaving people on their own with public tools that may or may not be accurate." This perspective underscores the strategic advantage of integrating AI within established health systems, ensuring that the information and guidance provided are aligned with verified clinical protocols and patient records.
The Broader Implications: Navigating the Future of AI in Healthcare
The introduction of branded AI chatbots by health systems marks a pivotal moment, presenting both immense potential and significant challenges. On one hand, these tools could democratize access to basic health information, streamline administrative processes, and offer a more convenient and personalized patient experience, particularly for those facing barriers to traditional care. The ability of AI to process vast amounts of data and provide instant responses could alleviate pressure on healthcare professionals and improve patient engagement.
However, the risks associated with medical misinformation, data privacy, algorithmic bias, and the potential erosion of the crucial human element in patient care cannot be overstated. As demonstrated by the "bixonimania" incident, LLMs are susceptible to incorporating and disseminating fabricated information, posing a direct threat to patient safety. Furthermore, the collection and use of sensitive patient data by AI systems raise significant privacy concerns, demanding robust security measures and transparent data governance policies.
The current landscape suggests a cautious but determined push toward AI integration. Health systems are grappling with the ethical and practical considerations, seeking to balance innovation with patient safety and quality of care. The ongoing development and deployment of tools like PatientGPT and Emmie, coupled with critical insights from researchers and the experiences of early adopters, will shape the future trajectory of AI in healthcare. Success will hinge on whether health systems and AI developers can forge a path that is not only technologically advanced but also safe, equitable, and fundamentally centered on patient well-being. That will require continuous evaluation, rigorous oversight, and a commitment to transparency to ensure AI serves as a complement to, rather than a replacement for, human-centered medical care.
