Improving Performance Evaluations Using Calibration

Get managers on board and keep the process on track

DALLAS—“Few things come out lower on employee engagement surveys than performance appraisals, but we do them because the data is needed” to ensure fairness with compensation, promotions—or terminations, said Dick Grote of Grote Consulting Corp., during his presentation at the 2014 WorldatWork Total Rewards Conference, held here May 19-21.

“A good system doesn’t start with assessment; a good system starts with performance planning,” he noted. “Performance appraisals will always be difficult, but managers make them even more so when they don’t make their objectives clear at the start of the year” — for instance, regarding expectations for measurable outcomes and accomplishments (results) and demonstration of competencies (behaviors).

“If you believe in pay for performance, you must have data to differentiate where people stand,” Grote explained. He proposed the radical notion that performance management systems can actually work, and that employers have “an ethical obligation” to employees to see that they do.

To ensure that performance appraisal ratings are accurate, organizations should use ratings distribution guidelines, have appraisals reviewed by the appraisal writer’s supervisor, and hold calibration sessions.

Distribution Guidelines

In performance appraisal systems, 3-tier and 5-tier rating scales are the most prevalent, Grote noted, although he recommends seven levels, having frequently seen managers try to add “plusses” or “minuses” to the scale in an effort to increase granularity and differentiation.

An ongoing challenge is that when employees receive a middle rating such as a 3 on a 5-point scale, it carries “a connotation of mediocrity.” Workers “make the bogus analogy with a ‘C’ average in school.” Instead, employers should “send a connotation of success” by defining a “3” as “a good solid performer.” At pharmaceutical firm Merck, a “3” rating is labeled “full Merck standard.”

This is vital because, in an effective rating program, the majority of workers will be clustered in the middle, which is what makes it possible to truly recognize those who do outstanding work. Relatedly, if almost no one is being rated as needing improvement, “you should consider raising the bar for acceptable performance,” Grote said.

Distribution requirement guidelines, whether “forced” (as in controversial stacked ratings) or, more commonly, simply recommended, help ensure that the ratings have meaning.

“From our school days, we all remember that some teachers were easy graders and some were strict. An ‘A’ from professor Smith was equivalent to a ‘B’ from professor Jones.” Distribution requirements can “ensure that the same yardstick is used, so that performance ratings are accurate.”

Grote offered the following basic distribution requirement as “a reasonable example”:

  • Unsatisfactory (1): 2%–5%
  • Needs improvement (2): 10%–15%
  • Good solid performer (3): 50%–60%
  • Superior (4): 20%–30%
  • Distinguished (5): 5%–10%
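The guideline bands above lend themselves to a simple automated check. The sketch below is illustrative only (the article describes no software); it assumes ratings are stored as integers 1–5 and flags any tier whose share of proposed ratings falls outside Grote’s example bands, which is the kind of signal a calibration facilitator might want before a session.

```python
from collections import Counter

# Guideline bands from the article: rating tier -> (min %, max %)
GUIDELINES = {
    1: (2, 5),    # Unsatisfactory
    2: (10, 15),  # Needs improvement
    3: (50, 60),  # Good solid performer
    4: (20, 30),  # Superior
    5: (5, 10),   # Distinguished
}

def flag_outliers(ratings):
    """Return {tier: actual %} for tiers outside their guideline band."""
    total = len(ratings)
    counts = Counter(ratings)
    outliers = {}
    for tier, (lo, hi) in GUIDELINES.items():
        pct = 100 * counts.get(tier, 0) / total
        if not lo <= pct <= hi:
            outliers[tier] = round(pct, 1)
    return outliers

# Example: a lenient manager who rates almost everyone a 4 or 5
proposed = [5] * 4 + [4] * 10 + [3] * 6
print(flag_outliers(proposed))
```

Running this on the sample above flags every tier, showing at a glance how far a single manager’s “easy grading” drifts from the recommended distribution.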

One issue is weak managers who tell subordinates that “the process” forces them to give out less-than-stellar appraisals. Again, train managers to communicate that a midlevel rating connotes a good, solid performance — and that they are expected to “wear the company hat” instead of passing the buck.

As a matter of policy, Grote also advised that “all performance appraisals and ratings must be reviewed and approved by the appraisal-writer’s boss before any other action is taken.” When managers know that their own supervisor will see how they are rating their direct reports, “it encourages managers to take the matter seriously.”

Calibration Sessions

The objective of calibration sessions is to ensure that different managers apply similar standards in measuring and evaluating the performance of subordinates — that is, “to ensure a level playing field by neutralizing the effect of ‘tough graders’ and ‘easy graders’ on performance appraisal ratings,” Grote said.

The process typically operates this way:

  • Managers prepare preliminary performance appraisals, including proposed appraisal ratings.
  • Managers who supervise similar groups of employees meet and post names and ratings for all to review.
  • Participants review and discuss their proposed appraisal ratings for every employee.
  • Participants adjust ratings to assure accuracy and consistency.
  • Final performance appraisals are prepared.

Calibration “makes it easier for managers to deliver honest but negative performance appraisals,” Grote said. It also has the benefit of exposing talented employees to a larger number of senior leaders.

To work well, managers must be trained to prepare for and participate appropriately in a calibration session, and to differentiate performance accurately — particularly when many people are rated the same. Grote also highlighted the need to “provide skilled and brave facilitators” for calibration sessions, particularly the first time. “The facilitator’s role is to make sure that ratings managers have data, not just favorable or unfavorable views,” he noted.

Grote recommends the traditional “low tech” system of using wall-mounted flip charts divided by the ratings scale, with a Post-it note for each employee being evaluated, so that the distribution of Post-it notes in relation to the distribution guidelines is visually clear.

While it’s important to group employees together in appropriate pools based on the common nature of work, the process can work when evaluating large groups, using big boards that take up a good share of wall space and dozens of Post-its, with sessions lasting up to two to three hours.

“Give out ground rules for appropriate behavior during the session, and make participants responsible for policing each other,” Grote advised.

With discussion, Post-it notes get moved as managers realize they may be applying different standards than other managers, and as they share their experiences with employees who aren’t direct reports or their evaluations of those employees’ work results.

Managers may be asked why someone is a “4” and not a “3” — and to justify that decision with data and examples. Similarly, they may be asked why someone whose work is highly regarded wasn’t rated a “4” or a “5.”

“There’s an emotional impact when a manager realizes his or her rating of an employee was too low or too high, and gets up and changes the Post-it’s position,” Grote observed.

Stephen Miller, CEBS, is an online editor/manager for SHRM. Follow him on Twitter @SHRMsmiller.

