June 15, 2025
Customer Research Deserves Better AI

Today’s AI tools can summarize feedback, but is summarization enough to truly understand customers in a way that drives business impact?

In my experience building data-science models and AI systems over the past few years, and more recently experimenting with publicly available LLMs, most of these tools fall apart when we ask deeper, product-critical questions.

Here’s why most LLM-based approaches fall short:

Lack of Grounded Quantitative Reasoning

LLMs are trained on language - not structured data.

They can describe patterns, but they don’t natively reason over metrics like funnel conversion, adoption curves, or engagement deltas. Ask them to correlate qualitative themes with behavioral trends (say, feature complaints with actual usage drop-off) and they fall short. This brings us back to the classic Achilles’ heel of customer research: the risk of being swayed by the loudest or most recent voices.
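
To make that concrete, here is a minimal sketch of the kind of grounded computation the question requires, using pandas and hypothetical per-feature data (a `complaints` count tagged from feedback, a `usage_delta` from product analytics):

```python
import pandas as pd

# Hypothetical per-feature data: qualitative complaint counts joined
# with quantitative week-over-week usage change from product analytics.
df = pd.DataFrame({
    "feature": ["export", "search", "dashboards", "alerts"],
    "complaints": [42, 7, 19, 3],               # themes tagged in feedback
    "usage_delta": [-0.18, 0.02, -0.09, 0.04],  # week-over-week usage change
})

# Grounded quantitative reasoning: test whether complaint volume actually
# tracks usage drop-off, instead of trusting the loudest voices.
corr = df["complaints"].corr(df["usage_delta"], method="spearman")
print(f"Spearman correlation, complaints vs. usage change: {corr:.2f}")
```

A strong negative correlation here is the signal a researcher actually needs: complaints that co-occur with real drop-off deserve priority, and that judgment comes from the data join, not from the model’s prose.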

A thorough benchmark of LLMs’ performance on tabular and structured-data reasoning found that most leading models, including GPT-4, show significant deficiencies when reasoning over realistic, complex tables (Kalo et al., 2024). Performance drops further in the presence of missing values, duplicate entries, or structural variations common in real-world data. These results point to a persistent gap in LLMs’ ability to handle structured input robustly without external tools or grounding in structured data sources.
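
One common mitigation, sketched here under assumptions rather than as a prescription, is to run the structured computation outside the model and hand the LLM only verified aggregates. The `summarize_for_llm` helper and its column names below are hypothetical:

```python
import pandas as pd

def summarize_for_llm(df: pd.DataFrame) -> str:
    """Pre-compute trustworthy aggregates so the model reasons over
    verified facts rather than a raw (and possibly messy) table."""
    # Handle exactly the messiness the benchmark flags: duplicates and gaps.
    clean = df.drop_duplicates().dropna(subset=["usage_delta"])
    worst = clean.nsmallest(3, "usage_delta")
    lines = [
        f"- {row.feature}: usage change {row.usage_delta:+.0%}, "
        f"{row.complaints} complaints"
        for row in worst.itertuples()
    ]
    return "Top usage declines this period:\n" + "\n".join(lines)

feedback = pd.DataFrame({
    "feature": ["export", "search", "export", "alerts"],
    "complaints": [42, 7, 42, 3],
    "usage_delta": [-0.18, 0.02, -0.18, None],
})
# This string, not the raw table, is what goes into the prompt.
print(summarize_for_llm(feedback))
```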

I investigated this topic in more depth in a previous blog post.

Surface-Level Pattern Matching ≠ Insight

Large Language Models (LLMs) are fundamentally trained to predict the next most likely token based on patterns observed in their training data. This makes them highly effective at capturing and echoing what is most frequently said within a dataset, but not necessarily what is most relevant or important.
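
The gap is easy to demonstrate with a toy example. Ranking themes by raw frequency, which is what pattern matching rewards, can diverge sharply from ranking by business impact. The themes and revenue weights below are hypothetical:

```python
from collections import Counter

# Hypothetical tagged feedback: (theme, ARR of the account that raised it).
feedback = [
    ("dark mode", 5_000), ("dark mode", 3_000), ("dark mode", 2_000),
    ("sso bug", 250_000), ("export fails", 120_000),
]

# What echoing the dataset surfaces: the most-repeated theme.
by_frequency = Counter(theme for theme, _ in feedback).most_common()

# What the business needs: themes weighted by the revenue they put at risk.
by_impact: dict[str, int] = {}
for theme, arr in feedback:
    by_impact[theme] = by_impact.get(theme, 0) + arr
impact_ranked = sorted(by_impact.items(), key=lambda kv: kv[1], reverse=True)

print("Most frequent theme:", by_frequency[0][0])    # dark mode
print("Highest-impact theme:", impact_ranked[0][0])  # sso bug
```

Frequency says “dark mode”; revenue-weighted impact says “sso bug.” A model that only echoes the dataset will keep surfacing the former.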

Recent research underscores this limitation. Jia et al. (2024) observe that “Large Language Models (LLMs) are excellent pattern matchers… they recognize patterns from the input text, drawing from their vast training, and produce outputs.” However, they caution that while such patterns are computationally efficient and easily discoverable through gradient descent and attention mechanisms, they are “inherently unreliable on their own.” This highlights a persistent gap between surface-level pattern recognition and the kind of deep, analytic reasoning required to interpret complex, real-world data.

Inability to Track Shifting Signals

LLMs are stateless by default. They don’t persist history across sessions. They don’t track individuals over time. This means they can’t tell you what’s changed in opinions over the past two months, how newly onboarded users behave differently from long-term power users, or whether frustration around a feature is rising or fading.

Unless you’ve explicitly built a pipeline that embeds time-awareness, cohort tracking, and user-level metadata, your LLM is working blind.
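
Here is a minimal sketch of what that pipeline layer might look like, assuming feedback records already carry a timestamp, a cohort label, and a sentiment score (all hypothetical field names):

```python
import pandas as pd

# Hypothetical feedback records with the metadata an LLM alone won't keep:
# when it was said, which cohort said it, and how they felt.
feedback = pd.DataFrame({
    "ts": pd.to_datetime([
        "2025-04-03", "2025-04-20", "2025-05-05", "2025-05-28", "2025-06-10",
    ]),
    "cohort": ["new", "power", "new", "power", "new"],
    "sentiment": [0.6, 0.1, 0.3, -0.2, -0.4],  # -1 negative .. +1 positive
})

# Time-aware, cohort-level view: monthly mean sentiment per cohort, so
# "is frustration around this feature rising or fading?" becomes a query.
trend = (
    feedback
    .groupby(["cohort", pd.Grouper(key="ts", freq="ME")])["sentiment"]
    .mean()  # "ME" = month-end frequency (pandas >= 2.2)
)
print(trend)
```

Nothing here is exotic; it is ordinary cohort bookkeeping. The point is that the model never does this for you. The pipeline has to.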

As Qiu et al. (2024) highlight, LLMs consistently lag behind both human judgment and smaller, specialized models, particularly when it comes to maintaining self-consistency. In their study, LLMs produced incoherent or contradictory outputs in over 27% of cases. The authors conclude that “current LLMs lack a consistent temporal model of textual narratives,” underscoring a key structural limitation in how these models process evolving information.

What This Means for Customer Researchers (and Builders)

I’m not arguing that research should slow down. In fact, for it to have any real impact, it needs to move faster than ever, keeping pace with, or even outpacing, product development. But speed without clarity is noise. AI should help us analyze more intelligently, connecting signals across time, behavior, and context, not just summarize what was said most often.

We need systems built for real-world decisions. That means going beyond plug-and-play LLMs to architectures that surface insights grounded in business logic, structured data, and real usage patterns (P.S. that’s what we’re building at Riley).

Don’t settle for faster noise. Aim for faster truth.

Claudia is the CEO & Co-Founder of Riley AI. Prior to founding Riley AI, Claudia led product, research, and data science teams across the enterprise and financial technology space. Her product strategies led to a $5B total valuation, a successful international acquisition, and organizations scaled to multi-million-dollar revenue. Claudia is passionate about making data-driven strategies collaborative and accessible to every organization. She completed her MBA and bachelor’s degrees at the University of California, Berkeley.
