9 Rethinking Assessment: Process Over Product
If AI can produce the product, then the product was never what mattered. The thinking was.
9.1 The Fundamental Question
What are you really trying to assess in business education?
Traditional approach: Can the student define key concepts? Can they list the steps in a process? Can they identify relevant frameworks?
This is assessing product — the knowledge artifact.
Process-based approach: Watch the student conduct a professional task — a negotiation, analysis, consultation, or design. Did they demonstrate the competency? Did they follow sound methodology? Did they apply frameworks appropriately in real-time?
This is assessing process — the professional methodology.
The difference matters enormously. A student can memorise definitions and still conduct a terrible negotiation. They can recite frameworks and still make decisions that expose an organisation to risk.
Professional work is a process discipline. The value lies not in what you know, but in what you do with what you know — how you investigate, communicate, analyse evidence, and make decisions under uncertainty. AI makes it possible, for the first time, to assess process at scale.
9.2 Why This Was Not Possible Before
Traditional process assessment has serious limitations. Role-play in class is time-consuming, allowing only a few students to participate per session. Peer actors vary in quality. Public performance anxiety reduces authenticity. There is minimal documentation of what actually happened.
Written case analysis tests knowledge about process rather than demonstration of process. Students can look up answers, and the format does not capture decision-making under pressure.
AI-enabled assessment addresses all of these problems. Every student gets unlimited practice in realistic scenarios where conversations are dynamic and unpredictable. Complete transcripts provide evidence of methodology. And because the conversation responds in real-time, students cannot rehearse a scripted answer.
9.3 Three Assessment Models
9.3.1 Model 1: Simulated Consultation and Process Audit
Students conduct a simulated professional consultation (a conversation with an AI persona) and then audit their own process against professional standards. The grade focuses on methodology, not outcome.
Example scenario:

> You are the HR representative meeting with Taylor Kim, an employee who has requested a formal meeting to discuss concerns about their working conditions. Taylor has been with the company for 3 years and has never raised concerns before. You do not know what the specific issues are yet.
Students navigate the conversation, uncover issues, demonstrate appropriate process, and conclude professionally. Then they submit the transcript and a structured process audit where they identify every point where they applied (or failed to apply) professional standards, cite relevant principles, note missed opportunities, and explain what they would do differently.
The process audit is where the real assessment happens. The conversation produces the evidence. The audit demonstrates the understanding.
9.3.2 Model 2: Evidence-Based Intervention Plan
Students analyse data or a complex scenario using AI, then critique and improve the AI’s recommendations.
Structure:

1. Students receive data or a complex scenario
2. AI generates analysis and recommendations
3. Students critique the AI’s output — what did it miss? What assumptions are flawed? What context did it not have?
4. Students produce their own recommendation with explicit justification
The grade weights the critique and justification, not the final recommendation. A student who identifies what AI got wrong and explains why demonstrates more learning than one who accepts a correct answer uncritically.
9.3.3 Model 3: Competency-Based Critical Override
Students use AI to generate professional outputs (job descriptions, audit plans, policy documents), then critically evaluate and override the AI’s work.
Example — Recruitment:

1. AI generates 10 behavioural interview questions
2. Student selects the 5 best and rejects 5, explaining why each was kept or cut
3. AI scores mock candidate responses against the student’s rubric
4. Student overrides at least 2 AI scores with justification referencing theory, legal principles, or evidence the AI missed
The critical override is the assessment. A student who can identify what AI got wrong — and articulate why using professional knowledge — has demonstrated competence no amount of delegation can fake.
9.4 The Engagement Spectrum
Not all AI-assisted work is equivalent. A useful spectrum for assessment thinking:
- Genuine collaborative thinking: the student drives the inquiry, pushes back on AI outputs, and iterates toward their own understanding. Highest cognitive engagement.
- Guided drafting: the student provides context and direction, evaluates outputs critically, and modifies toward a coherent submission. Moderate engagement.
- Curated delegation: the student uses AI to produce a submission and exercises judgment about what passes. Lower engagement, but not zero.
- Pure delegation with no engagement: the actual failure case. Harder to achieve than assumed once any reflective or demonstrative component is required.
Rather than asking “did the student use AI?”, the question becomes “where on the engagement spectrum did the student operate?” That question is answerable from process evidence. For a rubric that operationalises this spectrum into markable performance levels, see the Rubric System appendix.
9.5 Making AI Engagement Visible: Transcript Analysis
At scale — 50 to 100 students — no marker can read every transcript in full. The solution is signal-based analysis. A lightweight script can extract the student’s own prompts from a conversation transcript and compute:
- Flesch readability score on student prompts (measuring the student’s own language, not the AI’s)
- Average prompt length (very short prompts suggest low engagement)
- Number of turns (a four-turn conversation is not a working session)
- Prompt specificity over time (do prompts become more precise? This signals learning in progress)
- Presence of pushback (does the student ever question or redirect an AI response?)
The marker reviews a one-page summary per student rather than full transcripts, dipping into specific conversations only when signals flag something worth examining. This keeps marking feasible at scale in a way that reading every transcript is not.
One caveat: these metrics support marker judgement; they do not replace it. A low turn count might reflect efficient thinking. A high Flesch score might reflect a second-language student. The signals are a triage tool, not a verdict.
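For concreteness, here is a minimal sketch of what such a script might look like, in Python. It assumes each transcript is available as a list of turns with `role` and `content` fields and that the student's turns are tagged `"student"`; the pushback markers and the specificity proxy are illustrative choices for this sketch, not a validated instrument.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups (heuristic, not exact)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Flesch reading ease computed on the student's own prompts."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

# Illustrative pushback markers; a real deployment would tune these.
PUSHBACK_MARKERS = ("why", "are you sure", "that's not", "i disagree",
                    "what about", "instead")

def summarise_transcript(turns: list[dict]) -> dict:
    """Compute engagement signals from a transcript of {'role', 'content'} turns."""
    prompts = [t["content"] for t in turns if t["role"] == "student"]
    if not prompts:
        return {"turns": 0}
    lengths = [len(p.split()) for p in prompts]
    return {
        "turns": len(prompts),
        "avg_prompt_words": round(sum(lengths) / len(lengths), 1),
        "flesch": round(flesch_reading_ease(" ".join(prompts)), 1),
        # Crude specificity proxy: are later prompts longer than early ones?
        "specificity_trend": lengths[-1] - lengths[0] if len(lengths) > 1 else 0,
        # Number of prompts containing any pushback marker
        "pushback_count": sum(
            any(m in p.lower() for m in PUSHBACK_MARKERS) for p in prompts
        ),
    }
```

Run against each student's exported transcript, the resulting dictionary is the raw material for the one-page summary described above.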
9.6 The NotebookLM Problem
Sophisticated students are already building personal research infrastructure: a notebook per unit loaded with lecture slides, readings, and rubrics, and a separate notebook per assignment loaded with everything they have found on the topic. At that point the tool is not answering isolated questions — it is acting as a personalised tutor with full context.
This applies to essays, literature reviews, case studies, and reflective writing. The reflective section that feels resistant to AI is completable once a student has uploaded their own notes as context. The literature review requiring synthesis is completable once the sources are in the notebook.
Assessment types traditionally considered AI-resistant are not resistant to a student who has invested in building their own AI-ready research context. This is not a reason for alarm, but it is a reason for honesty. The stress test sequence in the appendix helps you assess where your assignments actually stand.
9.7 The Marks Split: Process Over Product
If the process of engaging with material is what we are actually trying to assess, the marks weighting should reflect that:
- 30% for the submitted artefact (the essay, report, or other product)
- 70% for the process (evidence of critical engagement, iteration, and reflective thinking)
An important clarification: the process component should not require AI use. A student who chose to think independently and kept a research journal, annotated their readings, or maintained a design log should be able to submit that instead. The assessment measures engaged thinking, not AI use. Students with limited internet access or preferences against AI should be able to demonstrate the same engagement through equivalent means.
This inversion signals to students that the destination matters less than the journey. It makes the final artefact almost irrelevant as an integrity concern. And it rewards the student who thinks carefully over time rather than the one who writes well under pressure on one occasion.
9.8 A Minimum Viable Adoption Path
The full framework represents a significant shift. Three starting points, in order of increasing commitment:
Tier 1: Add a single oral checkpoint. Ask students to spend five minutes explaining two or three decisions in their submission. No transcript required, no new rubric. This alone closes the most significant integrity gap.
Tier 2: Request a transcript with self-reflection. Ask students to submit their AI conversation alongside their work, with a brief reflection highlighting three moments where they pushed back, changed direction, or learned something unexpected. The marker reads the reflection and dips into the transcript if something seems thin.
Tier 3: Full signal-based analysis. Implement transcript analysis with automated metrics providing a one-page summary per student. Most appropriate once staff are comfortable with the earlier tiers.
Each tier is a genuine improvement on the status quo. Starting at Tier 1 and moving up over successive semesters is more sustainable than a full redesign at once.
9.9 Group Assessment in the AI Era
Group assessment has always required balancing individual accountability with shared outcomes. AI sharpens that tension in ways existing frameworks were not designed to handle.
9.9.1 The Presentation Assumption
Group presentations are widely assumed to be AI-resistant because students must show up and speak. The assumption is partially correct. But a group can rehearse AI-generated content to fluency. Fluency under rehearsal is not the same as understanding under pressure.
The integrity mechanism is not the presentation — it is the Q&A. Questions the group could not have anticipated, distributed randomly across members, surface understanding in a way rehearsed presentations cannot. Five minutes of genuine interrogation reveals more than twenty minutes of polished presentation. The marks weighting should reflect that.
9.9.2 New Problems AI Creates
AI introduces three challenges existing group frameworks were not designed for:
- The rewritten section problem. Students report group members rewriting each other’s sections using AI without consent. This is not plagiarism in the traditional sense, but it undermines individual contribution. Groups need explicit agreements about what counts as editing versus replacing.
- The free rider via AI problem. A student who contributes nothing can generate a plausible-looking section at the last minute. Traditional free rider detection through peer assessment becomes less reliable when the output signal is no longer correlated with effort.
- The attribution problem. A polished section no longer signals high individual effort. Attribution requires process evidence, and process evidence requires infrastructure.
9.9.3 Marks Structure for Groups
All three problems point toward the same solution: assess the process, not the product.
- Individual process mark (60-80%): each student’s AI conversation transcript, quality of engagement, and demonstrated personal understanding
- Group product mark (20-40%): the shared artefact, assessed as a team
This changes the incentive structure completely. A student who contributes nothing cannot fabricate a rich individual transcript across a full semester. The free rider problem largely dissolves. And it mirrors professional life — nobody receives a group performance review. Every individual is assessed on their contribution.
A useful bridging component: a short Group AI Reflection where the group answers one question together — how did you decide which AI-generated ideas to keep, which to discard, and why? This surfaces collaborative sense-making that distinguishes a genuine group from a collection of individuals who assembled separately.
9.10 Core Assessment Design Principles
Whether designing individual or group assessments, four principles apply:
- Process transparency — students must show their thinking process, not just final output
- Critical engagement — assess how students interact with AI, not whether they use it
- Authentic application — evaluate whether students can apply AI outputs in realistic professional contexts
- Metacognitive development — assess students’ ability to reflect on their own learning and identify gaps
For ready-to-use rubric templates that operationalise these principles across disciplines, see the Rubric System appendix. For a structured process to stress-test any assessment against AI capabilities before deploying it, see the Stress Test Sequence appendix.
9.11 Your Action Step
Choose one assessment in your current curriculum. Ask: what am I really trying to measure — knowledge recall, professional judgement, or process skill? Then sketch how you would redesign it using one of the three models. You do not have to implement it immediately. Just think through how the shift from product to process might work in your context.