By MBI Deep Dives in AMZN — Aug 26, 2025

The Promise and Pitfall of Agentic Commerce

We are entering the age of agentic commerce, where autonomous AI agents, acting as our digital deputies, will navigate the endless aisles of online stores, evaluating products and making purchases on our behalf. Agentic commerce promises to crush search frictions and expand the practical consideration set from a handful of clicks to the whole shelf. While the promises seem quite lofty, I recently came across this paper, which was quite instructive in appreciating the challenges as well. The paper explored four specific questions:

Do agents satisfy basic instruction following and simple economic dominance tests?

What are product market shares when purchases are fully mediated by AI agents, and how do such market shares vary across AI agents?

How do AI agents respond to observable attributes (price, rating, reviews, text) and platform levers (position, promotions, sponsorship)?

How might outcomes change when sellers and/or marketplace platforms deploy their own optimizing AI agents?

The paper describes a series of experiments designed to test how well AI agents perform in different contexts.

In these experiments, even SOTA models struggled with basic economic rationality, sometimes failing to select the lowest-priced or highest-rated product when all other attributes were identical. From the paper:

In one of the price-based rationality tests, we construct a scenario with all listings being identical except for one listing having a lower price. Here, even state of the art models (GPT4.1) can register failure rates exceeding 9%. The failure rate tends to decrease with the increase in price difference. In rating dominance tests, all listings are identical except that one listing has an average rating which is higher by 0.1. We see significant heterogeneity in performance, with some models registering no failures (Gemini 2.0 Flash) and others registering up to 71.7% failures (GPT-4o, on which OpenAI Operator is built). However with GPT-4.1, this fail rate comes down to 16.0%, supporting the insight that failures reduce with more advanced models. These findings imply consumers delegating purchases may sometimes pay more or obtain lower-rated products, and sellers cannot rely on modest price cuts or rating advantages to guarantee being selected by agents.
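The dominance tests quoted above boil down to a simple harness: clone one listing, make a single copy strictly better on one attribute, and count how often the agent misses it. Below is a minimal Python sketch of that idea. The `Listing` fields, the 8-listing page, the $1.00 discount, the +0.1 rating bump and the `Agent` callable are all hypothetical stand-ins. The paper's agents browse a rendered storefront, which this sketch does not attempt to reproduce.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Listing:
    title: str
    price: float
    rating: float
    reviews: int


# Hypothetical agent interface: given the listings on a page, return the index it buys.
Agent = Callable[[List[Listing]], int]
# A "dominance" transform makes one listing strictly better on a single attribute.
Dominate = Callable[[Listing], Listing]


def cheaper(listing: Listing) -> Listing:
    # Price dominance: same product, $1.00 cheaper (illustrative amount).
    return replace(listing, price=round(listing.price - 1.00, 2))


def better_rated(listing: Listing) -> Listing:
    # Rating dominance: same product, average rating higher by 0.1.
    return replace(listing, rating=round(listing.rating + 0.1, 1))


def failure_rate(agent: Agent, trials: int, dominate: Dominate = cheaper,
                 n_listings: int = 8, seed: int = 0) -> float:
    """Share of trials in which the agent does NOT pick the single dominant listing."""
    rng = random.Random(seed)
    base = Listing(title="Stainless steel mug", price=20.00, rating=4.5, reviews=120)
    failures = 0
    for _ in range(trials):
        listings = [base] * n_listings        # identical clones of one product
        target = rng.randrange(n_listings)    # random slot, so a fixed position preference
        listings[target] = dominate(base)     # is averaged over, not confounded with the target
        if agent(listings) != target:
            failures += 1
    return failures / trials


def rational_agent(listings: List[Listing]) -> int:
    # Lowest price first, then highest rating, then most reviews.
    return min(range(len(listings)),
               key=lambda i: (listings[i].price, -listings[i].rating, -listings[i].reviews))


def make_random_agent(seed: int) -> Agent:
    # A deliberately bad baseline that ignores all attributes.
    rng = random.Random(seed)
    return lambda listings: rng.randrange(len(listings))


if __name__ == "__main__":
    for name, transform in [("price", cheaper), ("rating", better_rated)]:
        print(name,
              failure_rate(rational_agent, 500, transform),
              failure_rate(make_random_agent(1), 500, transform))
```

The rational baseline should never fail, and the random picker should fail about 7 times in 8. To test a real model, you would wrap its API call in a function with the same `Agent` signature. Any failure rate between those two extremes is the kind of gap the paper reports.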

It is perhaps not surprising that a product's placement on a webpage dramatically influences an AI agent's purchasing decision. However, humans likely focus on just the first couple of options, whereas the position preferences of AI agents can be quite confounding. Again, from the paper:

Holding all attributes constant, each model assigns a clear premium to the top row relative to the bottom row. However, the horizontal (column) patterns vary sharply. GPT-4.1 strongly favors the first column; Claude Sonnet 4, in contrast, largely ignores the first column and prefers the two middle columns; and Gemini 2.5 Flash tilts toward the third column, while columns one and two are comparatively disfavored.

Position alone can drastically change selection rates. For example, for Claude Sonnet 4, moving a product from the bottom-right corner (where it is selected 4.5% of the time) to the top row in the second or third column leads to a 5-fold increase in selection rate! Interestingly, the top left corner would only yield

Like humans, AI agents do prefer cheaper products with better ratings and more reviews. However, the degree to which they value these attributes differs significantly across AI models. This means a small boost in a product's rating can produce a much larger increase in purchase probability with one AI than with another, creating an unpredictable market dynamic. In one experiment, the paper showed that for a product with a baseline selection probability of 10%, a +0.1 increase in rating lifts the probability to 15.4%, 20.3% and 16.0% with Claude Sonnet 4, GPT-4.1 and Gemini 2.5 Flash, respectively.
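One way to see why the same +0.1 bump lands so differently is to convert the quoted probabilities into odds. The Python sketch below assumes a logit-style choice model, in which an attribute change multiplies a product's selection odds by a constant factor. That framing is my assumption for illustration. The multipliers are back-of-the-envelope values derived only from the three rounded percentages quoted above, not coefficients estimated in the paper.

```python
import math


def odds(p: float) -> float:
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)


def implied_odds_multiplier(p_before: float, p_after: float) -> float:
    """Factor by which a change multiplied the odds of being selected."""
    return odds(p_after) / odds(p_before)


def apply_multiplier(p: float, multiplier: float) -> float:
    """Apply an odds multiplier to a different baseline probability (assumes constant multiplier)."""
    o = odds(p) * multiplier
    return o / (1.0 + o)


BASELINE = 0.10
# Selection probabilities after a +0.1 rating bump, as quoted from the paper.
AFTER_RATING_BUMP = {"Claude Sonnet 4": 0.154, "GPT-4.1": 0.203, "Gemini 2.5 Flash": 0.160}

if __name__ == "__main__":
    for model, p_after in AFTER_RATING_BUMP.items():
        m = implied_odds_multiplier(BASELINE, p_after)
        print(f"{model}: odds x{m:.2f}, log-odds +{math.log(m):.2f}, "
              f"30% baseline -> {apply_multiplier(0.30, m):.1%}")
```

On the quoted figures, the implied odds multipliers come out at roughly 1.64 (Claude Sonnet 4), 2.29 (GPT-4.1) and 1.71 (Gemini 2.5 Flash). If the multiplier really were constant, the same rating bump would take a product starting at 30% to roughly 50% with GPT-4.1. This illustrates why an identical tweak can matter very differently across agents and across starting positions.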

This, of course, leads us to the ultimate "meta game" of agentic commerce: the interaction between buyer and seller agents. As sellers deploy their own AI to optimize product descriptions and pricing in response to the behavior of buyer agents, we may enter a new era of algorithmic cat-and-mouse. The paper showed that even minor, AI-driven tweaks to a product's description can lead to substantial gains in market share in some product segments.

Of course, in the real world, it won’t just be one seller agent tweaking descriptions; EVERY seller out there will do exactly that! This sets the stage for a kind of "SEO game of chicken," where sellers are constantly trying to outsmart each other's algorithms. The promise here is a hyper-efficient marketplace, where supply and demand are perfectly matched in real time. The pitfall, however, is a potential race to the bottom, where the richness and diversity of the marketplace are sacrificed in the name of algorithmic optimization. If you hated the SEO slop that the pre-AI world produced to satisfy Google’s algorithm, we may be entering a world of exponentially more slop trying to cater to multiple AI agents with varying levels of bias.

Can a technology with a non-deterministic approach solve all these challenges? It's unlikely that we will ever achieve a perfectly predictable and rational system, but that may not be the point. We humans aren’t quite the embodiment of rationality either. But going through the paper makes me deeply unwilling, as of today, to use agents to buy anything through any of the chatbots. Ultimately, I think for agentic commerce to thrive, agents need to largely imitate my preferences. So, instead of deploying their own internal logic, they need to decipher my own “internal” logic of how I decide to buy products online. Perhaps if such an agent could observe me making decisions in a variety of contexts for weeks (months?), it could truly become my personal agent, and I would feel a lot more comfortable allowing it to buy things online for me.

In addition to "Daily Dose" posts (yes, DAILY) like this one, MBI Deep Dives publishes one Deep Dive on a publicly listed company every month. You can find all 62 Deep Dives here.

Current Portfolio:

Please note that these are NOT my recommendations to buy or sell these securities, just a disclosure on my end so that you can assess potential biases I may have because of my own portfolio holdings. Always consider my write-up my personal investing journal, and never forget that my objectives, risk tolerance, and constraints may bear no resemblance to yours.

My current portfolio is disclosed below:

This post is for paying subscribers only
