How CAIS is examined.
How the cut score is set.
How we would defend it.
An examination is only as credible as the methodology behind it. This document is the public, citable specification for how every CAIS examination is constructed, administered, scored, and audited — from the Common Body of Knowledge down to the individual item. It is designed to be held up to the standards an accreditation body would apply under ISO/IEC 17024 §9.2, and to be referenced in regulatory filings, employer due-diligence dossiers, and judicial review. Candidates, employers, regulators, and accreditation bodies read the same blueprint.
The blueprint is public. The methodology is citable. The cut score is not a negotiation.
Six principles this blueprint is built to satisfy.
Defensibility over cleverness. Every choice below is designed to survive audit.
01 · Criterion-referenced, not norm-referenced
Candidates are measured against a fixed competency standard derived from the CAIS Common Body of Knowledge, not against the performance of other candidates. A candidate passes because they demonstrated the required competency — not because they outperformed a cohort.
02 · Competency-based and job-task anchored
Every domain weight, item, scenario, and Build Task is traceable to a documented job task in the CAIS Practice Analysis. The Practice Analysis is refreshed on a three-year cycle under Standards Council oversight and is published in the Standards Library.
03 · Public blueprint, private item bank
Domain weights, item-type distributions, scoring methodology, and cut-score methodology are public. Individual items, the item bank, SME rating sheets, and candidate response data are not. The public part is what makes the credential citable; the private part is what makes it secure.
04 · Standards-aligned construction
Test construction follows principles derived from ISO/IEC 17024 §9.2 (examination) and the Standards for Educational and Psychological Testing (AERA/APA/NCME) where applicable. Divergences are documented in the Document Control section of this instrument.
05 · Fairness review before every administration
Every form is reviewed for content fairness and cultural sensitivity by a Fairness Subcommittee drawn from the Ethics Review Board. Post-administration, items are reviewed for statistical fairness via Differential Item Functioning (DIF) analysis.
06 · Public Registry record of every administration
Every completed examination attempt is committed to the Public Verification Registry as a signed attestation record referencing the candidate wallet, form ID hash, administration timestamp, result status, and the issuing Standards Council. The attestation is the authoritative record of administration; the Registry is the human-readable mirror. If the two disagree, the attestation prevails.
One standard. Four examinations.
Each tier measures a distinct zone of professional competency. Each has its own blueprint, its own cut score, its own security regime.
| Tier | What it measures | Written exam | Evidence component | Review | Target rate* |
|---|---|---|---|---|---|
| Practitioner CAIS-P | Safe AI application within a defined role | 75 items · 3 hr seat time · MCQ + scenarios | 7 in-course builds · formative, not assessed | — | 60–70% pass |
| Builder CAIS-B | End-to-end AI system construction | 50 items · 2 hr seat time · MCQ + scenarios | Capstone deliverable · built, deployed, documented | SME panel + video defense | 45–55% pass |
| Operator CAIS-O | Production deployment & operations at scale | 50 items · 2 hr seat time · MCQ + scenarios | Production portfolio · deployed systems + ops & incident log | SME panel · portfolio review | 35–45% pass |
| Architect CAIS-A | Contribution to Standards, Frameworks, Policy, or Ecosystems | No exam | Body-of-work dossier · Standards, Frameworks, Policy, Ecosystems | Standards Council vote (≥ 2/3) | 20–30% inducted |
*Pass / induction rate targets are planning ranges used for blueprint calibration, not policy floors. For examined tiers (P, B, O), the criterion-referenced cut score governs; the pass rate is an output. For the Architect tier, the Council's vote threshold governs; the induction rate is an output.
Architect eligibility, application requirements, and Council-review procedure are documented in §11 · Architect Council Review below and will be re-codified as standalone instrument ARC-2026-01 in the Standards Library.
What each tier weighs. And why.
Weights are derived from the CAIS Practice Analysis. They are not marketing choices; they are the statistical centre of gravity of each tier's observed job tasks.
Practitioner (CAIS-P)
| CBK Domain | Domain code | Weight | Items (~) |
|---|---|---|---|
| Strategic Mindset & the Age of AI | AIS-101 | 10% | 8 |
| Foundations of Generative AI | AIS-120 | 25% | 19 |
| Prompt Engineering & System Design for LLMs | AIS-130 | 25% | 18 |
| AI Agents & Agentic Workflows | AIS-140 | 15% | 11 |
| Ethics, Data & Responsible AI | AIS-160 | 15% | 11 |
| Strategy, Transformation & Business Innovation | AIS-210 | 5% | 4 |
| Innovation & Applied AI Foundations | AIS-230 | 5% | 4 |
| Total | — | 100% | 75 |
Practitioner item counts are blueprint targets; minor ±1 variation per domain is permitted to satisfy form-balancing constraints. The 7 in-course builds are formative practice and are not assessed in the credential examination.
Builder (CAIS-B)
| CBK Domain | Domain code | Weight | Items (~) |
|---|---|---|---|
| Strategic Mindset & the Age of AI | AIS-101 | 5% | 3 |
| Foundations of Generative AI | AIS-120 | 15% | 7 |
| Prompt Engineering & System Design for LLMs | AIS-130 | 25% | 13 |
| AI Agents & Agentic Workflows | AIS-140 | 30% | 15 |
| Ethics, Data & Responsible AI | AIS-160 | 10% | 5 |
| Strategy, Transformation & Business Innovation | AIS-210 | 5% | 2 |
| Innovation & Applied AI Foundations | AIS-230 | 10% | 5 |
| Total | — | 100% | 50 |
The Builder written exam is paired with a panel-reviewed capstone build + video defense. Both components must be passed independently to earn the credential. Capstone scoring is governed by CAP-2026-01.
Operator (CAIS-O)
| CBK Domain | Domain code | Weight | Items (~) |
|---|---|---|---|
| Strategic Mindset & the Age of AI | AIS-101 | 10% | 5 |
| Foundations of Generative AI | AIS-120 | 10% | 5 |
| Prompt Engineering & System Design for LLMs | AIS-130 | 15% | 8 |
| AI Agents & Agentic Workflows | AIS-140 | 20% | 10 |
| Ethics, Data & Responsible AI | AIS-160 | 10% | 5 |
| Strategy, Transformation & Business Innovation | AIS-210 | 20% | 10 |
| Innovation & Applied AI Foundations | AIS-230 | 15% | 7 |
| Total | — | 100% | 50 |
The Operator written exam is paired with a portfolio of deployed AI systems in production and a documented operational outcomes & incident log, reviewed by an SME panel. Both the written exam and the portfolio review must be passed independently to earn the credential.
Architect (CAIS-A) — Council Review (no proctored exam)
The Architect tier is a recognition credential, not an examined credential. There is no proctored written examination and no item bank. The competency being measured is contribution to the field, and the validity evidence is the candidate’s body of work as evaluated by the GAISB Standards Council. The eligibility floor and review dimensions below replace the domain-weight table used for examined tiers.
| Review dimension | What the Council evaluates | Weight |
|---|---|---|
| Tenure | Held CAIS-O (or higher) in good standing for ≥ 12 months at the time of application | Eligibility gate |
| Standard contribution | Authored or materially contributed to a published technical or professional standard (any recognized SDO, including GAISB) | 25% |
| Framework contribution | Authored or materially shaped a framework adopted by an institution, community, or jurisdiction | 25% |
| Policy contribution | Authored or materially shaped a policy instrument adopted by a government, regulator, or institution | 25% |
| Ecosystem contribution | Built, governed, or substantially advanced an ecosystem (community, protocol, methodology) with documented adoption | 25% |
| Professional standing | No open ERB matters; references confirm professional integrity | Eligibility gate |
| Threshold for induction | Material contribution in ≥ 1 of the four contribution dimensions | — |
A candidate must demonstrate material contribution in at least one of the four contribution dimensions. The four dimensions are weighted equally; no single category is privileged. Tenure and Professional standing are eligibility gates, not weighted dimensions. Full procedure is documented in §11.
Two proctored item types. One panel-reviewed component.
The proctored written exam uses MCQ and Scenario items only. Sandboxed performance items are not used. Builds and Capstones are assessed by SME panel as a separate component.
Proctored written-exam items (Practitioner, Builder, Operator)
| Item type | Format | What it measures | Practitioner | Builder | Operator | Architect |
|---|---|---|---|---|---|---|
| MCQ-SBA Single-best-answer | 1 correct of 4 options, all plausible | Recall, recognition, concept discrimination | 70% | 60% | 55% | — |
| Scenario Multi-stage case | Stem + 3–5 linked sub-items | Applied reasoning under constraint | 30% | 40% | 45% | — |
| Total of proctored written exam | — | — | 100% | 100% | 100% | No exam |
Sandboxed performance items have been retired from the CAIS examination. Skill execution is assessed instead through panel-reviewed Capstones and Portfolios (see below). Scenario weight rises with tier to reflect the increasing share of applied reasoning required at Operator level. Architect tier uses no proctored item bank; the credential is awarded via Standards Council review (see §11).
Panel-reviewed component — Capstones & Portfolios
Capstones and Portfolios are not proctored exam items. They are deliverables submitted asynchronously inside Prompt Atlas, scored by an SME panel against a published rubric, and (at Builder tier) accompanied by a recorded video defense. Each examined tier carries the panel-reviewed component listed below; both the written exam and the panel-reviewed component must be passed independently to earn the credential.
| Tier | In-course builds | Credential-bearing component | Defense format |
|---|---|---|---|
| Practitioner | 7 builds (formative, completed during coursework) | None assessed in the credential exam | — |
| Builder | In-course builds (formative) | Capstone build deliverable scored by SME panel against CAP-2026-01 rubric | Recorded video defense reviewed by panel |
| Operator | In-course builds (formative) | Portfolio of deployed AI systems in production + documented operational outcomes & incident log | Panel review of portfolio + outcomes documentation |
| Architect | — | Body-of-work dossier evidencing contribution to a Standard, Framework, Policy, or Ecosystem | Standards Council review and vote (see §11) |
See Capstone Specification & Rubric (CAP-2026-01) for Builder/Operator scoring procedures, defense protocols, and inter-rater reliability requirements. Architect Council Review is documented in §11 below and in ARC-2026-01.
Item-writing standards
- Every MCQ-SBA item has exactly one correct answer defensible in the published literature or in named GAISB standards. No "best-of-bad" items.
- Distractors are plausible misconceptions drawn from observed candidate errors in piloting, not fabricated nonsense.
- Stems are positively phrased unless negative phrasing is the pedagogical point; the words "EXCEPT" and "NOT" are bolded in the stem, and such items are limited to ≤10% of any form.
- No trick items. No tricks with formatting. No tricks with grammar. Cognitive load is domain-competency load, never language load.
- Every item carries a CBK domain tag, a sub-domain tag, a Bloom-level tag, and a Practice Analysis job-task reference; a sketch of such an item record appears after this list.
- Every item passes through: author draft → peer SME review → Fairness Subcommittee review → pilot administration → psychometric screen (difficulty, discrimination, DIF) → bank admission.
Modified Angoff.
SEM-adjusted. Publicly defensible.
The cut score is not a number we choose. It is a number we compute and commit to.
Modified Angoff is one of the most widely used methods for setting the passing mark on a professional examination. A panel of nine subject-matter experts reviews every question and estimates what percentage of minimally-qualified candidates should answer it correctly. Those estimates, averaged and adjusted for statistical uncertainty (the Standard Error of Measurement, or SEM), become the passing score. It is the same family of methods used by medical-licensure, engineering, and accounting examinations worldwide.
For each examination form, the passing standard is established via a Modified Angoff standard-setting study conducted before the form goes live. The methodology is criterion-referenced, panel-based, and replicable. This section is written so that an auditor or a regulator can reconstruct the process from the public record.
Panel composition
A standing Standard-Setting Panel of 9 SMEs is convened for each examination cycle. Panel composition is documented in the Council Register and is constructed for balance across: CBK domain expertise, industry sector (technology, financial services, public sector, media, healthcare, education), geography (minimum three regions), and career stage (minimum two early-career, two mid-career, two senior). Panelists complete conflict-of-interest declarations. No panelist has authored items for the form under review.
The minimally-competent candidate (MCC)
Before rating any item, the panel reaches consensus on a written description of the minimally-competent candidate (MCC) at the tier under review: a hypothetical candidate whose performance is exactly at the boundary of acceptable competency. In plain English, the panel agrees on the profile of the weakest candidate who should still pass, and uses that agreed profile as the reference point for every question they rate. This description is referenced repeatedly during rating and is published in the form's Standard-Setting Report.
Three rounds of rating
Round 1 (independent): Each panelist reviews each item independently and records the probability, expressed as a decimal from 0.00 to 1.00, that the MCC would answer that item correctly. Panelists do not see each other's ratings or any item performance data.
Round 2 (calibration): The panel discusses items with substantial rater disagreement (typically a rating standard deviation σ ≥ 0.15), then re-rates all items independently. Item difficulty and discrimination data from piloting are provided at this stage. The goal is convergence, not forced consensus.
Round 3 (confirmation): Panelists are shown their own Round 2 ratings alongside the panel mean and the impact data (the projected pass rate at each cut score under consideration). They may adjust their ratings one final time. The Round 3 mean, summed across items, is the unadjusted cut score.
Standard-error adjustment
The unadjusted cut score is adjusted downward by one conditional Standard Error of Measurement (SEM) at the cut score. The effect is to grant the candidate the benefit of measurement error at the decision boundary. The adjusted cut score is the operative passing standard.
Published: the MCC description, the panel's round-by-round pass-rate projections, the unadjusted and adjusted cut scores (as proportion-correct values), the SEM at the cut score, and the demographic composition of the panel. Not published: individual panelist ratings, individual item identities, or item-level rater data.
Cut-score expression
The cut score is reported in three forms: (a) the raw proportion-correct value; (b) the scaled score equivalent on the form-invariant 200–800 reporting scale; (c) a pass/fail decision with a Standard Error band disclosed to the candidate. Candidates within ±1 SEM of the cut score are noted as "cut-score band" results in the Standard-Setting Report; the pass/fail decision is nevertheless final.
Build + Capstone scoring
Panel-reviewed Build Tasks and Capstone deliverables are scored against the published rubric by a minimum of two SME reviewers, with adjudication by a third reviewer when dimension scores diverge by more than one rubric level. Inter-rater reliability is computed quarterly and published. See CAP-2026-01 for Capstone-specific procedures.
Different forms. Same standard.
A candidate who sat a harder form should not be penalized for it. A candidate who sat an easier one should not benefit. Equating is how we enforce that.
We administer multiple versions of the exam so that not every candidate sees the same questions. To make that fair, we use a statistical method called equating that corrects for small differences in difficulty between versions. Every candidate is held to the same standard regardless of which version they sat.
Multiple forms of each tier’s examination are in active rotation for security and scheduling reasons. Scores across forms are made comparable through statistical equating. The procedure is documented here and re-documented in each form’s Equating Report.
Common-item non-equivalent groups (CINEG) design
CINEG is the equating design used by most major professional certification programs. Each new form carries an anchor set of 20–30 items drawn from a stable anchor-item bank. Anchor items are domain-balanced, represent the full difficulty range, and do not contribute to candidate scores. Anchor-item performance is used to place the new form onto the reference scale of the prior form via the Tucker linear-equating method (while the bank operates under classical test theory) and subsequently via Item Response Theory (IRT) characteristic-curve equating once item-bank stability permits.
Transition to IRT
As the CAIS item bank matures, equating will transition to an Item Response Theory (IRT) framework. IRT is the modern statistical model used to score exams like the GMAT and GRE — specifically, a 3-parameter logistic (3PL) model for multiple-choice items and a Partial Credit Model (PCM) for multi-stage scenario items. The transition plan, including the minimum-sample-size thresholds and the back-equating procedure, is published in the Equating Transition Memorandum (EQT-2026-01, planned Q4 2026).
Scaled scoring
Reported scaled scores use a 200–800 range with a fixed cut score of 500 (i.e., the scale is anchored so that the passing standard on every form maps to 500). This convention decouples reporting from raw-score volatility across forms.
Fairness is a process, not a claim.
No examination can be proven unbiased. It can be subjected to a documented, defensible fairness regime.
After every exam, we check whether any question was systematically harder for one group of candidates than for another group of equal competency. This check is called Differential Item Functioning (DIF). Questions that fail the check are pulled and reviewed. This is the same fairness procedure used by major licensing exams in medicine, law, and engineering.
Pre-administration: content fairness review
Every candidate form passes through the Fairness Subcommittee of the Ethics Review Board (ERB) before going live. The Subcommittee reviews items for: cultural specificity that is job-task-irrelevant, gendered or region-coded language, scenarios that privilege a narrow cultural frame, and illustrations or contexts that depend on assumed background unrelated to the competency tested. Flagged items are revised or retired.
Post-administration: Differential Item Functioning (DIF)
For every form administration that reaches the minimum sample threshold, items are screened for DIF using the Mantel-Haenszel procedure — a widely adopted statistical test for item bias — across candidate sub-groups defined by self-reported demographic and geographic categories. Items classified as Category C (large and significant DIF) are removed from the form and returned to the Fairness Subcommittee for review. Category B items (moderate DIF) are flagged for subject-matter-expert content review.
Published fairness metrics
The annual CAIS Psychometric Report publishes, at minimum: form-level pass rates by sub-group, item-level DIF flag counts, Subcommittee review outcomes, form-level Cronbach’s α and conditional SEMs, and any retired-item counts. The underlying micro-data is not published; the aggregate reporting is sufficient for audit.
Accommodations
Testing accommodations are available on documented request and are administered to protect the validity of the competency inference — extended time, alternative formats, adaptive interface settings, and private administration are supported. Accommodation requests are adjudicated by the CAIS Accommodations Panel under published criteria.
A credential that can't be verified isn't worth earning.
A credential that can be forged isn't worth verifying.
Security is layered from identity through item bank through administration through attestation.
Candidate identity
- Pre-enrollment identity verification via government-issued ID matched to candidate profile.
- Live proctor face-match and environment scan at session start.
- Continuous proctor monitoring throughout administration.
- Holder wallet binding: the candidate wallet used for Standards Council issuance is registered at enrollment and bound to all future attempts.
Item bank protection
- Item bank is never published. A cryptographic hash of the active bank is published in the Standards Library on each version update; this is the integrity anchor, not the content. A minimal sketch of such a hash follows this list.
- Item exposure is tracked per item per administration. Items exceeding exposure thresholds are rotated out.
- Authors, reviewers, and psychometric staff sign bank-access agreements; all access is audit-logged.
Administration
- All administrations are delivered inside the Prompt Atlas secure examination environment.
- Browser lockdown, clipboard isolation, network policy enforcement, keystroke monitoring.
- Session video retained for Council review for 24 months. Retention policy disclosed to candidate at enrollment.
- Prohibited-conduct flags (multiple persons in frame, secondary device, external communication) trigger immediate session suspension and ERB referral.
Public Registry attestation
Upon session completion, a structured attestation record is committed to the Public Verification Registry via the GAISB Standards Council. The record contains: candidate wallet, form ID hash, administration window, seat-time, pass/fail decision, and (on pass) the credential tier. It does not contain item-level responses or video material. The attestation signature is the authoritative record; the human-readable Registry is the mirror.
A database can be edited. A signed transaction on a Public Registry cannot. Committing administration records to the Public Registry provides three properties no hosted database can: independent third-party verifiability, institutional permanence that survives even if GAISB ceases operations, and cryptographic proof-of-administration that is useful in evidentiary and regulatory contexts.
Suspected misconduct
Suspected misconduct is referred to the Ethics Review Board for adjudication under the published Sanction Guidelines Matrix. Established misconduct, such as impersonation, item exfiltration, or coordinated cheating, results in: credential revocation (if issued), a public revocation instruction in the Public Registry with a reason code, and a five-year re-sit ban. The revocation is permanent; the Registry retains both issuance and revocation instructions in perpetuity.
When a candidate fails. What happens next.
Failure is a data point, not a verdict.
- Waiting period: 90 days between attempts, regardless of tier.
- Attempt cap: A maximum of three attempts per 12-month rolling window.
- After three failed attempts: mandatory remedial pathway inside Prompt Atlas — directed CBK review, Faculty-reviewed practice Builds, and a documented readiness sign-off before a fourth attempt is permitted.
- Form refresh: A candidate will not sit the same form twice; form rotation is enforced at administration.
- Code of Conduct bans: Where misconduct is established, re-sit is barred for the full ban period (typically five years for cheating; ten years or permanent for impersonation). See Code of Professional Conduct.
- Fee relief: Candidates whose first failing attempt falls within the cut-score band may be offered a discounted re-sit fee as a policy matter; this is a Standards Council discretion, not a right.
How this blueprint can be challenged. How it can be audited.
A published methodology you can't contest is a press release, not a standard.
Public comment
This blueprint is open for structured public comment for 180 days from the publication of each revision. Comments are submitted through the Standards Library public-comment form, received on the public record, and disposed of by the Examination Working Group with a published disposition matrix (accepted, accepted-with-modification, rejected-with-reason, deferred). Material changes trigger a new 180-day window.
Regulator audit access
National competent authorities and recognized accreditation bodies may request Regulator Audit Access under the Regulator Engagement Office charter. Access includes, under NDA: the full Standard-Setting Report for specified forms, panel composition with conflict-of-interest declarations, item-bank integrity artefacts (not items), Fairness Subcommittee minutes, and the Psychometric Report appendices. Access is free of charge. See Cryptographic Auditability.
Employer due diligence
Employers conducting due diligence on the credential may request a Blueprint Briefing through the Employer Recognition Network. Briefings cover the public blueprint in Q&A format and map CAIS tiers to hiring bands. See For Employers.
No exam.
Body of work.
Council vote.
The Architect tier is awarded for what a candidate has shipped into the field, not for what they can recall in a test booth.
Architect (CAIS-A) is a recognition credential. To be considered, a candidate must (1) hold CAIS-Operator in good standing for at least 12 months, and (2) have made a documented contribution to a Standard, Framework, Policy, or Ecosystem. Eligible candidates apply directly to the GAISB Standards Council, which reviews the application and decides by vote. There is no proctored examination and no item bank.
Eligibility
- Tenure. CAIS-Operator (or higher) held in good standing for ≥ 12 months at the date of application.
- Contribution. Material contribution to at least one of: a published Standard (technical or professional), a Framework adopted by an institution or community, a Policy instrument adopted by a government or regulator, or an Ecosystem (community, protocol, methodology) with documented adoption.
- Professional standing. No open Ethics Review Board matters; references confirm professional integrity.
Application requirements
- Body-of-work dossier. A structured submission identifying the Standard, Framework, Policy, or Ecosystem; the candidate’s specific contribution; the date and venue of publication or adoption; and any third-party citations, adoption metrics, or institutional attestations.
- Contribution statement. A written statement (≤ 2,000 words) describing the contribution, the problem it addressed, the candidate’s specific role, and the documented impact.
- Two professional references. At least one from a current CAIS-A holder (waived during charter phase, see Founding Cohort below).
- Application fee. Non-refundable review fee per the published fee schedule.
Council review process
The full GAISB Standards Council reviews each application. The dossier is circulated to seated Council members along with the references and contribution statement. Council members may request additional materials or a clarifying interview with the candidate. After deliberation, the Council votes by secret ballot.
Decision threshold
Induction requires a two-thirds (≥ 2/3) majority of seated Council members. Decisions are recorded in the Council Register with a brief written rationale. Candidates not inducted receive written feedback and may reapply after a 12-month cooling-off period.
Cadence
Architect applications are reviewed in annual cohorts with a single application window per year. The annual cohort is published in the Public Verification Registry on induction.
Founding Cohort — Charter Phase only
Because there are no existing CAIS-A holders to provide references at launch, the Founding Cohort of Architects is seated directly by the GAISB Standards Council Charter Members under the Founding Charter authority. Founding Cohort members are inducted in two windows during charter phase and are subject to the same eligibility and contribution standards as later cohorts; the only departure is the reference-letter waiver. The Founding Cohort is published transparently in the Registry with the basis of induction recorded for each member.
At Architect altitude, the construct being measured is contribution to the field, not capability under timed conditions. The validity evidence is the body of work itself — it is in the world, citable, and verifiable by third parties. This is consistent with how senior recognition credentials are awarded by IEEE, ACM, the Royal Academy of Engineering, and other mature professional bodies. Auditors and regulators inspect the Council process and the documented contributions, not an item bank.
The Architect Council Review process will be re-codified as standalone instrument ARC-2026-01 in the Standards Library on its first scheduled refresh.
Document Control
A methodology your regulator can cite.
A cut score your auditor can reconstruct.
Every CAIS examination is scored against a Modified Angoff cut score set by a 9-member Standard-Setting Panel. Every administration is proctored inside Prompt Atlas and attested in the Public Verification Registry. Every form is equated. Every item is screened for fairness. The blueprint is public. The item bank is not.
Authored by GAISB · Earned inside Prompt Atlas · Proven by Real Builds