CLI Reference

mcpbr provides a command-line interface for running evaluations and managing configurations.

mcpbr --help
mcpbr run --help

Commands Overview

Command            Description
mcpbr run          Run benchmark evaluation with configured MCP server
mcpbr init         Generate an example configuration file
mcpbr config       Manage configuration templates
mcpbr models       List supported models for evaluation
mcpbr providers    List available model providers
mcpbr harnesses    List available agent harnesses
mcpbr benchmarks   List available benchmarks
mcpbr cleanup      Remove orphaned mcpbr Docker resources

mcpbr run

mcpbr run -c CONFIG [OPTIONS]

Option               Short   Description
--config PATH        -c      Path to YAML configuration file (required)
--model TEXT         -m      Override model from config
--benchmark TEXT     -b      Override benchmark from config
--sample INTEGER     -n      Override sample size from config
--mcp-only           -M      Run only MCP evaluation (skip baseline)
--baseline-only      -B      Run only baseline evaluation (skip MCP)
--output PATH        -o      Path to save JSON results
--report PATH        -r      Path to save Markdown report
--output-yaml PATH   -y      Path to save YAML results
--verbose            -v      Verbose output (-vv for detailed)
--task TEXT          -t      Run specific task(s) by instance_id
--filter-difficulty  (none)  Filter by difficulty (repeatable)
--filter-category    (none)  Filter by category (repeatable)

Examples

# Full evaluation with verbose output
mcpbr run -c config.yaml -v

# MCP-only with specific tasks
mcpbr run -c config.yaml -M -t django__django-11099

# Override model and sample size
mcpbr run -c config.yaml -m opus -n 50

# Save all output formats
mcpbr run -c config.yaml -o results.json -y results.yaml -r report.md
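
The options above combine naturally in a small sweep over sample sizes. The following is a minimal sketch, not part of mcpbr: `build_run_cmd` is a hypothetical helper, and the `results-nN.json` and `report-nN.md` file names are assumptions chosen for illustration. It only prints command lines built from the documented `-c`, `-n`, `-o`, and `-r` options, and it executes nothing.

```shell
#!/bin/sh
# Sketch: print one `mcpbr run` command line per sample size.
# build_run_cmd is a hypothetical helper, not part of mcpbr.
build_run_cmd() {
    config=$1
    sample=$2
    # -c, -n, -o and -r are the options documented in the table above.
    # The output file names are an assumed naming scheme.
    printf 'mcpbr run -c %s -n %s -o results-n%s.json -r report-n%s.md\n' \
        "$config" "$sample" "$sample" "$sample"
}

# Print the commands for three sample sizes (nothing is executed here).
for n in 10 25 50; do
    build_run_cmd config.yaml "$n"
done
```

Printing the commands first makes it easy to review them before running anything. The helper does no quoting, so it assumes the config path contains no spaces.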

mcpbr init

mcpbr init [OPTIONS]

Option           Short   Description
--output PATH    -o      Path to write config (default: mcpbr.yaml)
--template TEXT  -t      Template ID to use
--interactive    -i      Interactive template selection

mcpbr cleanup

Remove orphaned mcpbr Docker resources (containers, volumes, networks).

# Preview what would be removed
mcpbr cleanup --dry-run

# Remove with confirmation
mcpbr cleanup

# Force remove all immediately
mcpbr cleanup -f

Exit Codes

Code  Meaning
0     Success: at least one task resolved
1     Fatal error: config invalid, Docker unavailable, or API error
2     No resolutions: evaluation ran but 0% success
3     Nothing evaluated: all tasks cached or skipped
130   Interrupted by user (Ctrl+C)
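
Because the exit codes distinguish "ran but resolved nothing" from "failed to run", a wrapper script can react to each case separately. The sketch below maps the documented codes to short messages. `describe_exit_code` is a hypothetical helper, not something mcpbr provides, and the message wording is illustrative.

```shell
#!/bin/sh
# describe_exit_code is a hypothetical helper, not part of mcpbr.
# It maps the exit codes documented above to short messages.
describe_exit_code() {
    case "$1" in
        0)   echo "success: at least one task resolved" ;;
        1)   echo "fatal error: check config, Docker, and API access" ;;
        2)   echo "no resolutions: evaluation ran with 0% success" ;;
        3)   echo "nothing evaluated: all tasks cached or skipped" ;;
        130) echo "interrupted by user" ;;
        *)   echo "unknown exit code: $1" ;;
    esac
}

# Example use (assumes mcpbr is installed and config.yaml exists):
#   mcpbr run -c config.yaml
#   describe_exit_code "$?"
```

In a CI job, one plausible choice is to fail the build on code 1 only and treat codes 2 and 3 as warnings, since in both cases the tool itself ran without a fatal error.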

Environment Variables

Variable           Required  Description
ANTHROPIC_API_KEY  Yes       Anthropic API key for Claude models
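
Since ANTHROPIC_API_KEY is required, a script can check for it before starting a run. This is a minimal sketch under that assumption. `require_api_key` is a hypothetical helper, not part of mcpbr, and how mcpbr itself reports a missing key is not covered here.

```shell
#!/bin/sh
# require_api_key is a hypothetical helper, not part of mcpbr.
# It returns 0 when ANTHROPIC_API_KEY is set and non-empty, 1 otherwise.
require_api_key() {
    # ${VAR:-} expands to an empty string when the variable is unset or empty.
    if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
        echo "ANTHROPIC_API_KEY is not set" >&2
        return 1
    fi
    return 0
}

# Example use (assumes mcpbr is installed and config.yaml exists):
#   require_api_key && mcpbr run -c config.yaml
```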

Created by Grey Newell