25 AI and Machine Learning in Performance Management

Important: Learning Objectives

By the end of this chapter, readers will be able to:

  • Distinguish the four modes in which AI and machine learning enter performance management: automation, augmentation, prediction, and generation.
  • Explain the theoretical foundations of algorithmic decision-making, sociotechnical design, and fairness in machine learning as applied to people systems.
  • Identify specific applications, including natural language processing for feedback synthesis, predictive analytics for attrition and potential, sentiment analysis, and generative AI for drafting reviews and goals.
  • Articulate the risks of AI in PM: algorithmic bias, surveillance, opacity, deskilling, and false precision.
  • Design governance and human-in-the-loop arrangements that keep AI a decision support rather than a decision maker.
  • Describe Indian-context considerations for enterprise AI adoption in people processes.

25.1 Introduction

Note: Why This Chapter Exists

The last decade has seen AI and machine learning move from research laboratories into mainstream performance management tooling. Every major HR platform now embeds machine learning in some form: for text suggestions, for predictive analytics, for anomaly detection, for skill inference, or for generative content. The conversation is no longer whether AI belongs in PM but how, where, and under what safeguards.

Note: A Principled Approach

This chapter adopts a cautious-but-curious stance. AI in PM can genuinely help: it can surface patterns invisible to the eye, reduce the drudgery of documentation, democratise access to good coaching prompts, and free managers’ attention for the irreducible human work. It can also genuinely harm: it can encode and amplify bias, introduce surveillance, deskill judgement, and produce false precision that misleads decision-makers.

Tip: The Stance of This Chapter

AI should augment human judgement in performance management, not replace it. Decisions about people’s development, rewards, and futures are decisions where accountability must rest with identifiable humans who understand both the individual and the system. The right question is not how much AI can do but where its help is most valuable and least harmful.

25.2 Four Modes of AI in Performance Management

Figure 25.1: Four modes of AI in performance management
```mermaid
flowchart TD
    A[AI in Performance Management] --> B[Automation]
    A --> C[Augmentation]
    A --> D[Prediction]
    A --> E[Generation]

    B --> B1[Workflow routing<br/>Reminder scheduling<br/>Data extraction<br/>Report assembly]
    C --> C1[Manager assistants<br/>Feedback prompts<br/>Coaching suggestions<br/>Pattern surfacing]
    D --> D1[Attrition risk<br/>Performance trajectory<br/>Potential scoring<br/>Bias detection]
    E --> E1[Review drafting<br/>Goal language<br/>Development plan drafts<br/>Feedback synthesis]

    classDef main fill:#1f4e79,color:#fff,stroke:#0d2840,stroke-width:2px
    classDef mode fill:#2e75b6,color:#fff,stroke:#1f4e79,stroke-width:2px
    classDef example fill:#9dc3e6,color:#000,stroke:#2e75b6,stroke-width:2px
    class A main
    class B,C,D,E mode
    class B1,C1,D1,E1 example
```

Note: Automation

Automation takes a rule-based task and removes the human from its execution. Routing a goal submission to an approver, sending a calibration reminder, extracting metrics from source systems, assembling a standard report: these are tasks where the logic is clear and the value of human attention is low. Automation is the oldest and least controversial AI use in PM.
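To make the mode concrete, here is a minimal sketch of rule-based routing in Python. The fields, rules, and queue names are hypothetical illustrations, not drawn from any particular platform; the point is that the logic is explicit and auditable.

```python
# A minimal sketch of rule-based routing; all field and queue names are
# hypothetical. Explicit rules keep automated routing auditable.
from dataclasses import dataclass

@dataclass
class GoalSubmission:
    employee_id: str
    manager_id: str
    goal_count: int
    has_metric: bool

def route_submission(sub: GoalSubmission) -> str:
    """Route a goal submission using explicit, human-readable rules."""
    if sub.goal_count == 0:
        return "return_to_employee"        # nothing to approve yet
    if not sub.has_metric:
        return "flag_for_hr_review"        # goals without measures need a look
    return f"approve_queue:{sub.manager_id}"  # standard path to the manager
```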

Note: Augmentation

Augmentation enhances human judgement with information, prompts, or suggestions. A manager writing a review gets a prompt about specific behaviours observed; a calibration participant sees suggested comparisons; a leader is alerted to anomalies in their team’s rating patterns. The human remains the decision maker; AI shapes the information environment in which the decision is made.
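As an illustration of pattern surfacing, the sketch below flags managers whose average rating deviates sharply from the organisation-wide mean, assuming a simple ratings table with illustrative column names. The output is an alert for a human conversation, not a judgement.

```python
# A sketch of rating-pattern surfacing; column names are illustrative.
import pandas as pd

def rating_anomalies(ratings: pd.DataFrame, threshold: float = 2.0) -> pd.DataFrame:
    """ratings has columns ['manager_id', 'rating']; returns flagged managers."""
    per_manager = ratings.groupby("manager_id")["rating"].agg(["mean", "count"])
    org_mean = ratings["rating"].mean()
    org_std = ratings["rating"].std()
    per_manager["z"] = (per_manager["mean"] - org_mean) / org_std
    # Surface, not decide: a high |z| is a prompt for inquiry, not a verdict.
    return per_manager[per_manager["z"].abs() >= threshold]
```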

Note: Prediction

Prediction uses historical data to forecast outcomes: who is at risk of leaving, whose performance trajectory is diverging, who might be ready for stretch roles. Prediction is powerful but epistemologically fragile: it depends on the past being a reliable guide to the future, on data being unbiased, and on predictions being used as input to human judgement rather than as verdicts.

Note: Generation

Generation, the newest mode, uses large language models to produce content: review drafts, goal statements, feedback syntheses, development plan templates. Generation can reduce the tyranny of the blank page but introduces risks of homogenisation, hallucination, and a false confidence in polished-sounding but shallow output.

25.3 Theoretical Foundations

Note: Algorithmic Decision-Making

The literature on algorithmic decision-making distinguishes between decisions delegated to algorithms (fully automated) and decisions informed by algorithms (human in the loop). The former transfers accountability to whoever designed the algorithm; the latter retains accountability with the human decision-maker. In people systems, the case for the former is weak, and the case for the latter is strong only when the human actually exercises judgement rather than rubber-stamping algorithmic output.

Note: Fairness in Machine Learning

Machine learning models trained on historical data inherit the biases embedded in that data. If past promotions favoured a particular group, a model trained to predict promotion readiness will replicate that bias. The technical literature offers multiple definitions of fairness, each with trade-offs; none removes the underlying issue that a model is only as fair as the data and the choices that produced it (H. Aguinis, 2013).
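One widely used heuristic, the four-fifths (disparate impact) rule, can be sketched in a few lines. The column names are assumptions for illustration, and passing the check does not establish fairness; it merely flags one symptom of its absence.

```python
# A minimal disparate impact check; 'group_col' and 'outcome_col' are
# illustrative (e.g., 'gender' and 'promoted'). A ratio below 0.8 is the
# conventional threshold for scrutiny under the four-fifths rule.
import pandas as pd

def disparate_impact_ratio(df: pd.DataFrame, group_col: str, outcome_col: str) -> float:
    rates = df.groupby(group_col)[outcome_col].mean()  # selection rate per group
    return rates.min() / rates.max()
```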

Note: Sociotechnical Design Revisited

AI systems are sociotechnical: the algorithm, the user interface, the organisational policies, and the human practices together determine outcomes. An unbiased algorithm deployed badly can produce biased decisions; a well-deployed but flawed algorithm can still cause harm. Design must consider the whole system.

Tip: Epistemic Humility

The theoretical literature on AI consistently counsels epistemic humility: awareness that models are approximations, that confidence intervals are wider than point predictions suggest, and that edge cases matter disproportionately in human judgements. Organisations that deploy AI in PM without building this humility into their processes tend to over-trust outputs and under-question assumptions (M. London, 2003).

25.4 Specific Applications

Note: Natural Language Processing for Feedback Synthesis

Free-text feedback from 360 surveys, exit interviews, engagement surveys, and customer feedback accumulates into volumes no human can read. NLP models can categorise, summarise, and surface themes, making this qualitative data accessible for decision-making. The risk is over-reliance on summaries that may miss subtle but important signals.
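A minimal sketch of theme surfacing with scikit-learn appears below, assuming English-language comments and enough volume for topics to be stable. Production systems need multilingual handling, far more preprocessing, and human review of the themes.

```python
# A sketch of theme surfacing over free-text feedback using TF-IDF and NMF.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

def surface_themes(comments: list[str], n_themes: int = 5, n_words: int = 6) -> None:
    vec = TfidfVectorizer(stop_words="english", max_features=5000)
    X = vec.fit_transform(comments)                    # document-term matrix
    model = NMF(n_components=n_themes, random_state=0).fit(X)
    terms = vec.get_feature_names_out()
    for i, weights in enumerate(model.components_):
        top = [terms[j] for j in weights.argsort()[::-1][:n_words]]
        print(f"Theme {i + 1}: {', '.join(top)}")      # a human names the theme
```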

Note: Predictive Analytics for Attrition

Attrition prediction models flag individuals or populations at elevated risk of leaving, based on patterns in historical data, engagement scores, compensation trajectories, promotion timing, and other signals. Used well, they direct retention investment to where it is most needed. Used badly, they label individuals in ways that become self-fulfilling and that can discriminate against groups whose historical patterns differ.
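The sketch below pairs a simple attrition model with a per-group error audit, which is where “used well” begins. The column names (‘left’, ‘group’) and feature list are assumptions for illustration.

```python
# A hedged sketch: fit an attrition model, then report error rates by
# demographic group rather than overall accuracy alone.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def fit_and_audit(df: pd.DataFrame, features: list[str]):
    X, y, g = df[features], df["left"], df["group"]
    X_tr, X_te, y_tr, y_te, _, g_te = train_test_split(
        X, y, g, test_size=0.3, random_state=0, stratify=y)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    errors = model.predict(X_te) != y_te.values
    audit = pd.DataFrame({"group": g_te.values, "error": errors})
    return model, audit.groupby("group")["error"].mean()  # per-group error rate
```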

Note: Potential Scoring

Some platforms offer algorithmic potential scoring, ranking employees on likelihood of advancement based on performance history, skills inferences, and comparable trajectories. These scores can be useful inputs to succession conversations but become dangerous when treated as definitive. The concerns raised in Chapter 21 about labelling, equity, and self-fulfilling prophecies apply with added force when the label is produced by an algorithm and carries the authority of apparent objectivity.

Note: Sentiment Analysis

Sentiment analysis of employee communications, survey responses, and feedback text can track morale trends at population level. At individual level, it raises acute surveillance concerns and should generally be avoided. At team or organisation level, with appropriate aggregation and governance, it can complement traditional engagement measurement.
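One concrete governance control is a minimum group size below which scores are suppressed, so that no individual can be singled out. A sketch, with an illustrative threshold:

```python
# Aggregation-only sentiment reporting; the threshold is a governance choice.
import pandas as pd

MIN_GROUP_SIZE = 8  # illustrative; set by policy, not by the tooling team

def team_sentiment(scores: pd.DataFrame) -> pd.DataFrame:
    """scores has columns ['team', 'sentiment'] with sentiment in [-1, 1]."""
    agg = scores.groupby("team")["sentiment"].agg(["mean", "count"])
    agg.loc[agg["count"] < MIN_GROUP_SIZE, "mean"] = float("nan")  # suppress
    return agg.rename(columns={"mean": "avg_sentiment", "count": "n"})
```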

Note: Generative AI for Drafting

Large language models can draft review language, goal statements, development plans, and feedback. This can reduce cycle time and help managers who struggle with writing. The risks are real: homogenisation of voice, hallucination of details not supported by evidence, and the moral hazard of managers signing off on text they did not compose and may not fully endorse. Effective deployments keep the human firmly in the generative loop, using AI as a first-draft tool rather than a final-output tool.
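The first-draft discipline can be made structural, as the sketch below suggests. Here call_llm is a placeholder for whatever model endpoint an organisation has approved, not a real library call, and the prompt wording is illustrative.

```python
# A sketch of a first-draft workflow; `call_llm` is a hypothetical callable.
DRAFT_PROMPT = """Draft a performance review paragraph using ONLY the evidence
below. Do not infer or invent achievements. Mark any gap as [NEEDS MANAGER INPUT].

Evidence:
{evidence}
"""

def draft_review(evidence: str, call_llm) -> str:
    draft = call_llm(DRAFT_PROMPT.format(evidence=evidence))
    # The draft is never final: the manager must edit, verify, and own the text.
    return draft + "\n\n[DRAFT - requires manager review and edits]"
```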

Tip: Bias Detection

AI can also be turned on the performance management system itself, surfacing patterns that suggest bias: rating distributions that differ by gender or ethnicity in ways not explained by performance, promotion timing that varies by group, feedback language that is systematically different for different populations. This is perhaps AI’s most valuable application in PM: the technology audits the very processes in which it might otherwise amplify bias.
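A sketch of one such audit appears below: compare rating distributions across two groups and test whether the gap is larger than chance would explain. Column names are illustrative, and a statistically significant gap is a prompt for inquiry, not proof of bias.

```python
# A sketch of a rating-distribution audit across two groups.
import pandas as pd
from scipy import stats

def rating_gap_audit(df: pd.DataFrame, group_col: str = "group") -> None:
    groups = [g["rating"].values for _, g in df.groupby(group_col)]
    if len(groups) == 2:
        _, p = stats.ttest_ind(*groups, equal_var=False)  # Welch's t-test
        print(f"p-value for the mean rating gap: {p:.4f}")
    print(df.groupby(group_col)["rating"].agg(["count", "mean", "std"]))
```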

25.5 Risks

Figure 25.2: Principal risks of AI in performance management
```mermaid
flowchart LR
    A[Risks of AI in PM] --> B[Algorithmic Bias]
    A --> C[Surveillance]
    A --> D[Opacity]
    A --> E[Deskilling]
    A --> F[False Precision]

    B --> B1[Historical bias in data<br/>Proxy variables<br/>Disparate impact]
    C --> C1[Activity monitoring<br/>Sentiment tracking<br/>Erosion of trust]
    D --> D1[Black-box models<br/>Unexplainable outputs<br/>Reduced accountability]
    E --> E1[Atrophy of manager skill<br/>Over-reliance<br/>Loss of judgement]
    F --> F1[Confident wrong answers<br/>Misleading precision<br/>Overstated certainty]

    B1 --> G[Need for Governance]
    C1 --> G
    D1 --> G
    E1 --> G
    F1 --> G

    classDef main fill:#1f4e79,color:#fff,stroke:#0d2840,stroke-width:2px
    classDef risk fill:#c00000,color:#fff,stroke:#7f0000,stroke-width:2px
    classDef detail fill:#fbe5e5,color:#000,stroke:#c00000,stroke-width:2px
    classDef outcome fill:#70ad47,color:#fff,stroke:#385723,stroke-width:2px
    class A main
    class B,C,D,E,F risk
    class B1,C1,D1,E1,F1 detail
    class G outcome
```

Warning: Algorithmic Bias

Models learn from data. If data encodes the biases of past decisions, models replicate them. Proxy variables, such as tenure, commute distance, or educational institution, can carry discriminatory signal even when protected attributes are removed. Disparate impact on protected groups can violate both ethics and, in some jurisdictions, law.
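Proxy risk can be probed directly: if a single feature predicts the protected attribute well, it is carrying that attribute’s signal. A sketch, assuming a binary protected attribute and illustrative column names:

```python
# A sketch of a proxy-variable check: per-feature ability to predict a
# (binary) protected attribute. AUC near 0.5 is uninformative; well above
# 0.5 flags the feature as a potential proxy.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def proxy_risk(df: pd.DataFrame, features: list[str], protected: str) -> pd.Series:
    scores = {}
    for f in features:
        scores[f] = cross_val_score(
            LogisticRegression(max_iter=1000), df[[f]], df[protected],
            cv=5, scoring="roc_auc").mean()
    return pd.Series(scores).sort_values(ascending=False)
```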

Warning: Surveillance Drift

The data infrastructure required for AI can tempt organisations into more intrusive monitoring than they originally intended. Productivity tracking, application activity, communication monitoring, and location tracking all become technically possible, and the instinct to use available data is strong. The erosion of trust this produces can damage the very performance it aims to improve.

Warning: Opacity and Accountability

Complex models, particularly deep learning, can be difficult or impossible to explain in human terms. When an algorithm contributes to a consequential decision about an employee and cannot be explained, accountability becomes unclear. Regulatory frameworks such as the EU AI Act and emerging Indian guidance increasingly require explainability for high-risk uses.

Warning: Deskilling of Managers

When AI drafts feedback, suggests goals, and recommends ratings, managers’ own skills can atrophy. A manager who has always used AI prompts for coaching conversations may struggle without them. The long-run risk is a management cadre dependent on tooling that, if ever removed or compromised, leaves a capability vacuum.

Warning: False Precision

AI outputs often come with confidence scores, similarity rankings, or probabilistic predictions that appear precise. The precision can be false. A 73 per cent attrition risk score may convey nothing more than a rough categorical hunch; treating it as a scientifically derived probability misleads decision-makers into unwarranted confidence (H. Aguinis, 2013).
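Whether a score of this kind means anything can be tested with a calibration check: a well-calibrated model’s predicted risks should match observed frequencies. A sketch using scikit-learn:

```python
# A sketch of a calibration check; large gaps between predicted and observed
# frequencies mean the point scores overstate what the model knows.
import numpy as np
from sklearn.calibration import calibration_curve

def check_calibration(y_true: np.ndarray, y_prob: np.ndarray) -> None:
    observed, predicted = calibration_curve(y_true, y_prob, n_bins=10)
    for p, o in zip(predicted, observed):
        print(f"predicted ~{p:.2f} -> observed {o:.2f}")
```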

25.6 Governance of AI in Performance Management

Note: Model Validation

Any AI model used in PM decisions should be validated: tested for accuracy on representative data, audited for bias across protected groups, and reviewed periodically as underlying data changes. Validation is not a one-time event; models drift as the populations they serve evolve.
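Drift can be monitored with simple statistics. The sketch below computes the Population Stability Index (PSI) between a score distribution at validation time and one in production; the 0.25 alarm level is a common convention, not a law.

```python
# A sketch of drift monitoring via the Population Stability Index.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, n_bins: int = 10) -> float:
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    e = np.histogram(expected, bins=edges)[0] / len(expected)
    # Clip production values into the reference range so nothing falls outside.
    a = np.histogram(np.clip(actual, edges[0], edges[-1]), bins=edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)   # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))            # > 0.25: investigate
```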

Note: Transparency

Users and subjects of AI-influenced decisions should know when AI is in the loop, what it does, and what its limitations are. Employees whose performance is influenced by algorithmic input have a legitimate interest in understanding it. Transparent disclosure builds trust; opacity destroys it.

Note: Human in the Loop

For any decision of consequence, a human must meaningfully exercise judgement. Rubber-stamping algorithmic recommendations is worse than no algorithm at all, because it transfers the moral and legal weight of a decision to a machine while preserving the appearance of human accountability.

Note: Audit and Documentation

Decisions influenced by AI should be logged with sufficient detail to allow later review: what input data were used, what model version produced the output, what human judgement was applied on top. This enables both learning and defensibility.
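The two requirements above can be combined in the decision record itself, as the sketch below suggests: a recommendation cannot become a decision without a named human and a written rationale. The field names are illustrative.

```python
# A sketch of a human-in-the-loop decision record with an audit trail.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class DecisionRecord:
    employee_id: str
    model_version: str          # which model produced the recommendation
    model_output: str           # what the algorithm recommended
    inputs_snapshot: dict       # the data the model actually saw
    human_decision: str = ""
    human_rationale: str = ""
    decided_by: str = ""
    decided_at: Optional[datetime] = None

    def finalise(self, decision: str, rationale: str, decided_by: str) -> "DecisionRecord":
        if not rationale.strip():
            raise ValueError("A human rationale is required before finalising.")
        self.human_decision, self.human_rationale = decision, rationale
        self.decided_by = decided_by
        self.decided_at = datetime.now(timezone.utc)
        return self
```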

Tip: The Ethical Review Board

Progressive organisations are establishing internal ethics boards or review mechanisms for AI in people systems, drawing on models from clinical research. These bodies review proposed uses, establish guardrails, and adjudicate edge cases. The investment is modest; the protection against reputational and regulatory harm is substantial.

25.7 The Regulatory Environment

Note: Global Trends

The EU AI Act classifies AI in employment decisions as high-risk, triggering requirements around documentation, conformity assessment, and human oversight. US federal agencies and several states have issued guidance specific to AI in hiring and performance. These regimes are evolving rapidly; compliance is becoming a first-order concern for multinational organisations.

Note: India’s Emerging Framework

India’s regulatory approach to AI is developing through a combination of the Digital Personal Data Protection Act, sectoral guidance, and emerging AI-specific principles. Organisations operating in India must track these developments and design their AI uses to be robust to probable tightening of rules.

25.8 Indian Context

Note: Enterprise AI Adoption Curve

Indian large enterprises, particularly in IT services, financial services, and e-commerce, are adopting AI in HR processes at pace. Mid-market and traditional-industry adopters are further back on the curve but accelerating. Vendor offerings increasingly include AI features by default, making adoption sometimes incidental rather than deliberate.

Note: Talent and Skills

India has a deep pool of AI and data science talent, much of which staffs global technology companies. Within the HR function itself, however, AI literacy is often thin. Building HR teams that can critically evaluate AI vendor claims, understand limitations, and govern deployments is an under-invested priority.

Note: Data Quality and Representation

Indian organisational data often has quality and representation issues: missing fields, inconsistent coding across business units, legacy assumptions embedded in data structures, and skewed representation of demographic groups. Models trained on such data inherit these problems. Data quality investment is a precondition to responsible AI deployment.

Warning: The Cultural Misfit Risk

AI models trained largely on Western data may not reflect Indian workplace dynamics, communication norms, or cultural context. A sentiment model trained on English corpora may misread Indian workplace communication; a potential model calibrated on Western career patterns may miss Indian career pathways. Localisation is not optional for AI in Indian HR.

25.9 Case Studies

Note: Case: Wipro and Enterprise AI in HR Processes

Wipro, a leading Indian IT services company, has long invested in AI both as a business capability sold to clients and as an internal tool applied to its own operations. In performance management, Wipro has deployed AI across multiple functions. Natural language processing is used to analyse employee feedback at scale, with themes surfaced to HR business partners and executive leadership. Predictive models inform attrition risk and skills-gap identification, helping the company pre-empt losses and target reskilling investments. Generative AI features have been piloted for review drafting and goal-setting support, with explicit emphasis on managers editing drafts rather than accepting them unchanged.

Wipro’s scale, with hundreds of thousands of employees and operations across dozens of countries, makes AI deployment a matter of both opportunity and care. The company has published guidance on responsible AI use and has built governance mechanisms around its internal deployments, including model validation, bias auditing, and human-in-the-loop requirements for consequential decisions.

Challenges have included managing the pace of AI feature adoption across a diverse manager population, maintaining model performance as the business evolves, and reconciling global AI policies with local regulatory requirements across jurisdictions. The case illustrates the strengths of a technically mature organisation deploying AI in HR thoughtfully, and the continuing difficulty of keeping AI genuinely subordinated to human judgement at scale.

Note: Case: Swiggy and Algorithmic Performance Management for a Distributed Workforce

Swiggy, India’s leading food delivery platform, presents a distinctive case: performance management for a very large workforce of delivery partners who operate largely through algorithmic systems. The platform’s algorithms allocate orders, optimise routes, measure performance, calculate incentives, and make decisions about continued engagement, all largely without human intermediation at the individual level. This is not performance management as traditionally understood; it is algorithmic management, a newer and rawer application of AI to workforce oversight.

Swiggy’s experience illustrates both what algorithmic management can do and what it cannot. On the capability side, the platform can coordinate a massive distributed workforce at a speed and scale that human management cannot match. Performance metrics are immediate, feedback is frequent, and incentive adjustments are responsive. On the difficulty side, the platform must manage worker relations, fairness concerns, and regulatory attention in ways for which traditional employment practice offers little precedent. Disputes over algorithmic decisions, concerns about data transparency, and questions about the social contract between platform and worker have all surfaced.

Swiggy, in common with peer platforms in India and globally, has iterated its algorithms and policies in response to worker feedback and regulatory dialogue, experimenting with human escalation paths, greater transparency, and mechanisms for voice. The case illustrates that algorithmic performance management is not a small extension of traditional PM but a category change, raising questions that will shape employment law, labour economics, and management practice for the next decade.

25.10 The Future Direction

Note: The Integration Continues

AI capabilities will continue to integrate into performance management platforms. Some integrations will be useful; others will be solutions in search of problems. Organisations that treat each new capability as requiring justification rather than adopting by default will make better choices.

Tip: Where Humans Remain Essential

No foreseeable advance in AI will remove the need for human judgement in decisions about people. The reasons are both ethical (humans must be accountable for decisions that shape others’ lives) and practical (performance ultimately rests on relationships, trust, and shared understanding that are irreducibly human). AI can support these; it cannot replace them (T. V. Rao, 2008).

Warning: The Dystopia to Avoid

The dystopian trajectory is visible: algorithmic micromanagement, pervasive surveillance, opaque decision-making, and the reduction of employees to data points. It is not inevitable, but it is not automatically avoided either. Each organisation’s choices, and the regulatory frameworks that shape them, determine which direction AI in PM takes.

Tip: A Closing Principle

The measure of AI in performance management is not how much it automates, predicts, or generates. The measure is whether it helps people work better, grow more fully, and be treated more fairly than they would be without it. Judged by this standard, much current AI in PM falls short, and the task of the coming years is to raise that standard.

25.11 Summary

Important: Summary
  • AI enters performance management in four modes. Automation handles routine work. Augmentation supports human judgement. Prediction forecasts attrition, performance, and potential. Generation drafts text at scale. Each offers real capability and each introduces distinct risk, and confusing the modes produces policy built on the wrong concerns (H. Aguinis, 2013; M. London, 2003).

  • Theoretical foundations counsel restraint. Algorithmic decision-making theory, fairness in machine learning, and sociotechnical design each argue that AI is a component of a broader human system, not a substitute for human judgement. Organisations that miss this framing build systems that drift toward replacement rather than support (H. Aguinis, 2013; M. London, 2003).

  • The applications are specific and growing. NLP-based feedback synthesis, predictive analytics for attrition and potential, sentiment analysis, generative drafting, and bias detection each have defensible uses and characteristic failure modes. The discipline is matching tool to purpose rather than deploying capability because it is available (H. Aguinis, 2013; M. London, 2003).

  • The risks are not hypothetical. Algorithmic bias replicates historical patterns. Surveillance drift expands monitoring beyond original consent. Opacity prevents meaningful appeal. Manager deskilling erodes the judgement the system was meant to support. False precision mistakes sophistication for accuracy. Each requires deliberate governance to manage (H. Aguinis, 2013; M. London, 2003).

  • Governance is the work. Model validation, transparency, human-in-the-loop requirements, audit trails, and ethical review are the governance mechanisms that turn AI from a liability into a support. Organisations that deploy AI without them typically discover the governance gap after a harmful outcome rather than before (H. Aguinis, 2013; T. V. Rao, 2008).

  • The regulatory environment is evolving quickly. The EU AI Act, emerging US guidance, and India’s developing framework all shape what is permissible and advisable, and any deployment plan that treats the regulatory landscape as fixed will be overtaken within its own life cycle (H. Aguinis, 2013; T. V. Rao, 2008).

  • The Indian context shapes feasible practice. Enterprise AI adoption is accelerating. HR AI literacy lags deployment speed. Data quality remains a structural constraint. The cultural fit of Western-trained models requires careful localisation. Each of these deserves explicit attention in Indian programmes rather than the default assumption that Western playbooks transfer intact (H. Aguinis, 2013; T. V. Rao, 2008).

  • The measure of value is human. AI’s value in performance management is measured not by what it replaces but by whether it helps people work better, grow more fully, and be treated more fairly. Applied consistently, that principle gives AI its proper place as a servant to the human purposes of performance management, not its master (H. Aguinis, 2013; M. London, 2003).

  • Case lessons: Wipro illustrates the thoughtful deployment of AI within a traditional performance management system, using the technology to augment manager capacity and surface patterns without displacing the performance conversation at the centre of the system. Swiggy shows how algorithmic management has emerged for a distributed gig-adjacent workforce, where the algorithm is not an adjunct to the appraisal but the appraisal itself, raising fairness and transparency questions that the discipline is only beginning to answer. Together they define the range within which the next decade of practice will unfold (H. Aguinis, 2013; T. V. Rao, 2008).