A trendy workplace behavior has recently emerged that offers an important lesson for HR leaders: when companies measure activity instead of business outcomes, employees will often optimize for the metric rather than the result.
In this case, it’s “tokenmaxxing,” the practice of maximizing consumption of AI tokens (the units of data processed by large language models) to signal productivity, innovation, and commitment to AI adoption. What began as an effort to encourage experimentation with AI tools is increasingly revealing the risks of poorly designed performance metrics.
Several major technology companies have promoted AI usage through formal or informal tracking systems. In some cases, employees competed on leaderboards that measured token consumption or activity on AI development platforms. The assumption was straightforward: higher usage signaled greater innovation and productivity.
However, organizations are now discovering unintended consequences. According to reports, Amazon recently shut down an internal leaderboard that tracked employee activity on the company’s AI development platform. The ranking system reportedly encouraged some employees to deploy AI agents to perform unnecessary tasks simply to improve their standing. As a result, AI-related costs rose without a corresponding increase in business value.
The phenomenon highlights a familiar challenge for HR: employees tend to perform to the metrics by which they are evaluated. The pattern mirrors other workplace measurement failures, where activity metrics eventually become disconnected from organizational goals.
“The path from informal leaderboard to team OKR to formal competency on a performance review is short. And it’s already moving,” warned Jake Paul, product and innovation analyst at Kyle and Co, a research firm based in Boston. “I’d bet that within 12 months, ‘demonstrates AI fluency’ shows up as a rated competency in a meaningful share of enterprise performance rubrics, with some semblance of usage volume hiding underneath.”
That prospect raises important questions about how organizations define and measure AI effectiveness.
A Familiar Measurement Mistake
Paul said that AI token tracking represents the latest version of a management error organizations have made for decades.
“Every generation of knowledge work seems to go through a different flavor of the same mistake,” he said. “IBM spent decades evaluating engineers on total lines of code written. The result wasn’t better software. It was bloated software, written by engineers who’d figured out that the metric rewarded quantity, not quality.”
The same dynamic can emerge when organizations treat AI usage as a proxy for performance. Once AI usage metrics become tied to recognition, promotions, compensation, or performance evaluations, employees have strong incentives to maximize visible activity, he said.
“The moment that lands on a comp plan, performative work isn’t a risk,” Paul said. “It’s the rational employee response.”
Instead of producing better outcomes, organizations may find themselves rewarding what Paul described as “a performance of work.”
Wesley Paterson, a senior management consultant and president of Paterson Consulting Inc., in Alberta, Canada, believes the problem reflects a broader failure of measurement.
“Measuring tool usage instead of business outcomes guarantees systemic organizational failure,” Paterson said. The danger, he said, is that leaders often gravitate toward metrics that are easy to collect rather than those that truly reflect organizational performance.
“When leaders do not know what outcomes actually matter, they measure what is easy to see,” Paterson said. “Then everyone optimizes for visibility instead of value.”
Call quotas, hours worked, emails sent, meetings attended, and other activity-based metrics have frequently created incentives that undermine the very outcomes they were intended to improve.
The challenge becomes particularly acute as organizations seek to define AI fluency. While AI adoption remains a legitimate business priority, Paul said that leaders should distinguish between measuring adoption and measuring performance.
“ ‘Are they using it?’ is an adoption question,” he said. “ ‘Tokens per quarter’ is a performance metric. Those aren’t the same thing, and they shouldn’t be evaluated the same way.”
What HR Should Measure Instead
The metrics HR leaders already track may provide a better foundation for evaluating AI’s effectiveness. Measures such as quality of hire, customer satisfaction, productivity, revenue per employee, retention, and employee experience can help determine whether AI is actually creating value.
“The question isn’t whether your team is using AI enough,” Paul said. “It’s whether your outcomes are getting better because of AI.”
Paterson agreed, saying that outcome-based measurement should focus on the specific problems AI was intended to solve. “Did decision speed improve? Did error rates drop? Did customer satisfaction move? If you cannot connect the tool directly to the outcome, you are managing illusions,” he said.
For HR leaders, that may require rethinking how AI competency is incorporated into performance management systems. Rather than rewarding visible AI activity, organizations should assess whether employees are using AI to improve the quality, speed, and impact of their work.
Paterson offers a simple prescription: “Stop tracking activity. Start tracking capability.”