Claude Code in CI/CD Pipelines
Your CI pipeline script runs claude "Analyze this pull request for security issues" and the job hangs indefinitely. The logs show Claude Code waiting for interactive input. You cancel the job and try redirecting stdin from /dev/null; it still hangs. You set a CLAUDE_HEADLESS=true environment variable. Nothing changes.
The fix is one flag: -p.
claude -p "Analyze this pull request for security issues"
The -p flag (short for --print) runs Claude Code in non-interactive mode. It processes the prompt, outputs the result to stdout, and exits. No waiting for user input. No interactive terminal. This is how Claude Code works in CI/CD pipelines, pre-commit hooks, and any automated script.
Exam Question 10 directly tests this. The scenario describes a pipeline that hangs because -p is missing. The correct answer is adding -p. The other options (CLAUDE_HEADLESS, stdin redirect, --batch) are not real Claude Code features. Know the flag.
Exam Question 11 tests a related concept: when to use the Message Batches API (50% cheaper, up to 24 hours processing) vs real-time API calls. Blocking pre-merge checks need real-time. Overnight technical debt reports are a batch job. Task Statement 3.6 covers the full CI integration picture.
The -p Flag: Non-Interactive Mode
The -p flag transforms Claude Code from an interactive assistant into a scriptable command-line tool. Everything you can do interactively, you can do non-interactively with -p:
# Ask a question about the codebase
claude -p "What does the auth module do?"

# Run a task with tool access
claude -p "Run the test suite and fix any failures" \
  --allowedTools "Bash,Read,Edit"

# Create a commit from staged changes
claude -p "Look at my staged changes and create an appropriate commit" \
  --allowedTools "Bash(git diff *),Bash(git log *),Bash(git commit *)"
The --allowedTools flag is critical in CI. Interactive Claude Code asks for permission before running tools. In a pipeline, nobody is there to approve. You pre-approve the specific tools Claude needs:
| Tool pattern | What it allows |
|---|---|
| Read | Read any file |
| Edit | Edit any file |
| Bash | Run any shell command |
| Bash(git diff *) | Only commands starting with git diff |
| Bash(npm test *) | Only commands starting with npm test |
The trailing * enables prefix matching. Bash(git diff *) allows git diff --staged, git diff HEAD~1, and any other command starting with git diff. The space before the * matters: Bash(git diff*) without the space would also match git diff-index.
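The difference is easy to see with ordinary shell glob matching. The sketch below is an illustration of the prefix rule described above, not Claude Code's actual implementation; matches_prefix is a hypothetical helper:

```shell
# Illustration only: mimic prefix matching with shell case patterns.
# matches_prefix is a hypothetical helper, not part of Claude Code.
matches_prefix() {
  command="$1"
  prefix="$2"
  case "$command" in
    "$prefix"*) echo "match" ;;
    *) echo "no match" ;;
  esac
}

matches_prefix "git diff --staged"   "git diff "  # match
matches_prefix "git diff-index HEAD" "git diff "  # no match
matches_prefix "git diff-index HEAD" "git diff"   # match: missing space is too broad
```

The third call shows why the space matters: without it, the prefix also covers git diff-index, which is a different command entirely.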
Bare Mode for Reproducible CI
Add --bare to skip auto-discovery of hooks, skills, plugins, MCP servers, and CLAUDE.md files:
claude --bare -p "Summarize this file" --allowedTools "Read"
In bare mode, only the flags you pass explicitly take effect. A hook in a teammate's ~/.claude or an MCP server in the project's .mcp.json will not run. This makes CI runs reproducible across machines.
If you need project context in bare mode, pass it explicitly:
claude --bare -p "Review this PR for security issues" \
--append-system-prompt-file ./CLAUDE.md \
--allowedTools "Read,Bash(git diff *)"
Structured Output with JSON Schema
Plain text output works for human-readable reports. But CI pipelines need machine-parseable data: structured findings that a script can turn into PR comments, Slack notifications, or issue tracker entries.
Basic JSON Output
Use --output-format json to get structured metadata with the response:
claude -p "Summarize this project" --output-format json
The response includes:
{
  "session_id": "abc-123",
  "result": "This project is a REST API built with FastAPI...",
  "usage": {
    "input_tokens": 15420,
    "output_tokens": 892
  }
}
The result field contains Claude's free-text response. The surrounding fields provide metadata for tracking costs, resuming sessions, and debugging.
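As a sketch of what a pipeline step might do with this metadata, the snippet below parses a saved response with jq. The sample file mirrors the response shape shown above; the cost-tracking step itself is a hypothetical example, not a built-in feature:

```shell
# Sample file standing in for real `claude -p ... --output-format json` output.
cat > response.json <<'EOF'
{
  "session_id": "abc-123",
  "result": "This project is a REST API built with FastAPI...",
  "usage": {"input_tokens": 15420, "output_tokens": 892}
}
EOF

# Pull out the free-text answer and the token counts for cost tracking.
result=$(jq -r '.result' response.json)
in_tokens=$(jq -r '.usage.input_tokens' response.json)
out_tokens=$(jq -r '.usage.output_tokens' response.json)
echo "tokens: $in_tokens in, $out_tokens out"
```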
Schema-Constrained Output
For CI, you typically need output in a specific structure. Use --json-schema to enforce a schema:
claude -p "Review src/auth.py for security issues" \
  --output-format json \
  --json-schema '{
    "type": "object",
    "properties": {
      "findings": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "severity": {"type": "string", "enum": ["critical", "high", "medium", "low"]},
            "file": {"type": "string"},
            "line": {"type": "integer"},
            "description": {"type": "string"},
            "suggestion": {"type": "string"}
          },
          "required": ["severity", "file", "line", "description"]
        }
      },
      "summary": {"type": "string"}
    },
    "required": ["findings", "summary"]
  }'
The response now has a structured_output field that conforms to your schema:
{
  "session_id": "abc-123",
  "result": "I found 3 security issues...",
  "structured_output": {
    "findings": [
      {
        "severity": "high",
        "file": "src/auth.py",
        "line": 45,
        "description": "SQL query built with string concatenation",
        "suggestion": "Use parameterized queries to prevent SQL injection"
      }
    ],
    "summary": "1 high-severity SQL injection risk found"
  },
  "usage": { ... }
}
Use jq to extract the structured output in a pipeline:
claude -p "Review this PR" \
--output-format json \
--json-schema "$SCHEMA" \
| jq '.structured_output.findings[] | select(.severity == "critical")'
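A common next step is to gate the merge on what the review found. This sketch counts critical findings in a saved review; the sample data (file paths, descriptions) is a stand-in for a real schema-constrained run, and a real pipeline would exit nonzero where the comment indicates:

```shell
# Stand-in for output from a real `claude -p ... --json-schema` run.
cat > review_output.json <<'EOF'
{
  "structured_output": {
    "findings": [
      {"severity": "critical", "file": "src/auth.py", "line": 45,
       "description": "SQL query built with string concatenation"},
      {"severity": "low", "file": "src/utils.py", "line": 12,
       "description": "Broad exception handler"}
    ],
    "summary": "1 critical, 1 low"
  }
}
EOF

# Count critical findings; a real pipeline would `exit 1` here to block the merge.
critical=$(jq '[.structured_output.findings[] | select(.severity == "critical")] | length' review_output.json)
if [ "$critical" -gt 0 ]; then
  echo "Blocking merge: $critical critical finding(s)"
fi
```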
CLAUDE.md as CI Context
In Lesson 1, you learned that CLAUDE.md provides project conventions to every interactive session. The same mechanism works in CI. When Claude Code runs with -p (without --bare), it reads the same CLAUDE.md hierarchy: user-level, project-level, and directory-level.
For CI-specific context, add a section to your project CLAUDE.md or use --append-system-prompt:
# CI Review Standards
## What to report
- SQL injection, XSS, command injection, and other OWASP Top 10 vulnerabilities
- Race conditions in concurrent code
- Missing input validation at API boundaries
- Hardcoded secrets or credentials
## What NOT to report
- Minor style issues (handled by linters)
- TODO comments (tracked separately)
- Missing documentation (separate review)
## Test generation standards
- Use pytest with fixtures from conftest.py
- Test both success and failure paths
- Include edge cases: empty input, unicode, maximum length
- Do not duplicate scenarios already covered in existing test files
This is the CI equivalent of briefing a human reviewer. Without it, Claude reviews everything and produces noise. With it, Claude focuses on what matters and skips what your linters already handle.
Providing Existing Test Files
When using Claude Code for test generation in CI, provide existing test files so it does not suggest scenarios you already cover:
claude -p "Generate additional test cases for src/auth.py. \
Here are the existing tests: @tests/test_auth.py \
Only suggest tests for scenarios NOT already covered." \
--allowedTools "Read"
This avoids the common problem of AI-generated tests duplicating your existing suite.
Avoiding Duplicate PR Comments
When a CI review runs on every push to a PR, subsequent runs may flag the same issues. Your PR ends up with 15 identical comments about the same SQL injection risk. This erodes developer trust in the automated review.
The solution: feed prior review findings back into the next run.
# Step 1: Fetch existing review comments from this PR
prior_comments=$(gh pr view "$PR_NUMBER" --json comments \
--jq '.comments[] | select(.author.login == "claude-bot") | .body')
# Step 2: Run the review with prior findings in context
claude -p "Review this PR for security issues.
PRIOR REVIEW FINDINGS (already reported):
$prior_comments
Only report NEW issues or issues from the prior review that are
still unresolved in the current code. Do not duplicate findings
that have already been reported and are still present." \
--output-format json \
--json-schema "$SCHEMA"
The prompt explicitly tells Claude what has already been reported. Claude then only flags new issues or issues from the prior review that the developer has not yet addressed.
Session Context Isolation
In Lesson 5, you learned about the interview pattern and test-driven iteration. In CI, there is a related principle: the session that generated code should not review its own code.
Why? The generator session retains its reasoning context. It made deliberate decisions about tradeoffs, shortcuts, and assumptions. When you ask that same session to review the code, it is reviewing its own decisions. It is less likely to question choices it just made.
An independent review instance has no prior context. It sees only the code, not the reasoning behind it. This makes it more likely to catch:
- Assumptions that were never validated
- Edge cases the generator did not consider
- Inconsistencies with project patterns the generator was not aware of
In CI, this isolation happens naturally. The pipeline spawns a fresh Claude Code process for each job. The review job has no memory of the generation job. This is the correct architecture.
If you are running both generation and review in the same pipeline, use separate steps with separate sessions:
steps:
  - name: Generate code
    run: claude -p "Implement the feature from issue #${{ github.event.issue.number }}" --allowedTools "Read,Edit,Bash"

  - name: Review generated code
    run: claude -p "Review the changes in this PR for bugs, security issues, and consistency with project patterns" --output-format json
Each claude -p invocation is a fresh session. The review step has no access to the generation step's reasoning context.
Building a GitHub Actions Workflow
Here is a complete workflow that runs Claude Code as a PR reviewer. It triggers when someone comments @claude review on a pull request, runs a structured review, and posts the findings as a PR comment.
# .github/workflows/claude-review.yml
name: Claude Code Review

on:
  issue_comment:
    types: [created]

jobs:
  review:
    if: |
      github.event.issue.pull_request &&
      contains(github.event.comment.body, '@claude review')
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get PR diff
        id: diff
        run: |
          gh pr diff ${{ github.event.issue.number }} > pr_diff.txt
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Run Claude Code review
        id: review
        run: |
          claude -p "Review the following PR diff for bugs, security
          issues, and deviations from project conventions.
          $(cat pr_diff.txt)
          Focus on actionable findings. Skip style issues handled by
          linters." \
            --output-format json \
            --allowedTools "Read,Bash(git log *),Bash(git show *)" \
            > review_output.json
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

      - name: Post review comment
        run: |
          REVIEW=$(jq -r '.result' review_output.json)
          gh pr comment ${{ github.event.issue.number }} \
            --body "## Claude Code Review
          $REVIEW
          ---
          *Automated review by Claude Code*"
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Using the Official Claude Code Action
For a more streamlined setup, the official anthropics/claude-code-action handles trigger detection, context gathering, and comment posting automatically:
# .github/workflows/claude.yml
name: Claude Code

on:
  issue_comment:
    types: [created]
  pull_request_review_comment:
    types: [created]

jobs:
  claude:
    if: contains(github.event.comment.body, '@claude')
    runs-on: ubuntu-latest
    steps:
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
This minimal configuration responds to @claude mentions in PR and issue comments. Claude reads the PR context, analyzes the code, and responds directly in the conversation.
For automated reviews on every PR (no trigger needed):
name: Claude PR Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: "Review this pull request for code quality, security, and correctness. Post findings as review comments."
          claude_args: "--max-turns 5"
Real-Time vs Batch: Choosing the Right API
Not every CI workflow needs real-time responses. The Message Batches API offers 50% cost savings but with processing times up to 24 hours:
| Workflow type | API choice | Why |
|---|---|---|
| Pre-merge check (blocking) | Real-time (claude -p) | Developers wait for results before merging |
| Nightly technical debt report | Batch API | No urgency; 50% cost savings |
| PR review comment | Real-time (claude -p) | Developers expect prompt feedback |
| Weekly code health dashboard | Batch API | Runs overnight, report ready by morning |
The decision rule: if a human is waiting for the result before they can take their next action, use real-time. If the result can wait hours, use batch for the cost savings.
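To make the batch side concrete, here is a sketch of submitting a nightly report through the Message Batches API. The endpoint and header names follow Anthropic's public HTTP API; the custom_id, prompt, and model name are assumptions chosen for illustration:

```shell
# Build the batch request body with jq. Each request carries a custom_id
# so its result can be matched up when the batch completes (up to 24 hours later).
jq -n '{
  requests: [
    {
      custom_id: "nightly-debt-report",
      params: {
        model: "claude-sonnet-4-5",
        max_tokens: 4096,
        messages: [{role: "user", content: "Produce a technical debt report for this codebase."}]
      }
    }
  ]
}' > batch_request.json

# Submit the batch (commented out: requires network access and an API key):
# curl https://api.anthropic.com/v1/messages/batches \
#   -H "x-api-key: $ANTHROPIC_API_KEY" \
#   -H "anthropic-version: 2023-06-01" \
#   -H "content-type: application/json" \
#   -d @batch_request.json
```

Because nothing blocks on the result, the pipeline can submit the batch and exit; a separate morning job polls for completion and publishes the report.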
Try With AI
Exercise 1: Run Claude Code Non-Interactively (Apply)
Run Claude Code with the -p flag locally to see how non-interactive mode works before moving to CI.
Open your terminal in any project directory and run:
claude -p "List the 5 most recently modified files in this directory and explain what each one does" --allowedTools "Read,Bash(ls *),Bash(stat *)"
Then try structured output:
claude -p "List all functions in the main source file" \
--output-format json \
--json-schema '{"type":"object","properties":{"functions":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"line":{"type":"integer"},"description":{"type":"string"}},"required":["name","line"]}}},"required":["functions"]}'
Pipe the result through jq to extract just the function names:
claude -p "List all functions in the main source file" \
--output-format json \
--json-schema '...' \
| jq '.structured_output.functions[].name'
What you're learning: The -p flag is the foundation of all CI integration. Without it, Claude Code waits for interactive input and your pipeline hangs. With it, Claude processes the prompt, outputs the result, and exits. The --output-format json and --json-schema flags let you enforce a specific output structure so downstream scripts can parse the result programmatically. This is exactly what Exam Question 10 tests.
Exercise 2: Write CI Review Criteria in CLAUDE.md (Configure)
Create or update your project's CLAUDE.md to include CI-specific review criteria. Start a Claude Code session and paste:
I want to add CI review standards to my CLAUDE.md. Interview me about:
1. What types of issues should the CI review flag? (security, bugs, performance?)
2. What should it skip? (style issues handled by linters? TODO comments?)
3. What testing standards should generated tests follow?
4. What frameworks and fixtures are available?
After the interview, add a "CI Review Standards" section to my CLAUDE.md
with the answers.
After the section is added, test it by running:
claude -p "Review src/main.py using the review standards in CLAUDE.md" --allowedTools "Read"
Compare the output to a run without CLAUDE.md (use --bare):
claude --bare -p "Review src/main.py" --allowedTools "Read"
What you're learning: CLAUDE.md is how you give CI-invoked Claude Code the same project context that a human reviewer would have. Without it, Claude reviews everything generically. With it, Claude focuses on what your team cares about and skips what your existing tools already handle. The comparison between --bare and normal mode makes the difference visible.
Exercise 3: Build a Review Workflow (Create)
Create a GitHub Actions workflow file that runs Claude Code on pull requests. Start a Claude Code session and paste:
Create a file at .github/workflows/claude-review.yml that:
1. Triggers when someone comments "@claude review" on a PR
2. Checks out the repository
3. Gets the PR diff using gh pr diff
4. Runs claude -p to review the diff with --output-format json
5. Posts the review as a PR comment
Use these permissions: contents read, pull-requests write.
The ANTHROPIC_API_KEY comes from secrets.
Also create a second workflow at .github/workflows/claude-tests.yml that:
1. Triggers on every PR push (opened, synchronize)
2. Runs claude -p to suggest new test cases for changed files
3. Includes existing test files in context to avoid duplicate suggestions
4. Posts suggestions as a PR comment
Review the generated workflows. Check that both use -p for non-interactive mode and that the test generation workflow provides existing tests in context.
What you're learning: Building CI workflows with Claude Code requires combining several concepts: the -p flag for non-interactive mode, --allowedTools for permission management, --output-format json for structured output, and CLAUDE.md for project context. The test generation workflow specifically demonstrates duplicate avoidance: by providing existing tests in context, Claude only suggests novel test scenarios. This is a practical application of Exam Task 3.6.