About mcpbr

mcpbr (Model Context Protocol Benchmark Runner) is an open-source framework for evaluating whether MCP servers actually improve AI agent performance. It provides controlled, reproducible evaluation across 25+ benchmarks so developers can stop guessing and start measuring.

The Origin Story

mcpbr was created by Grey Newell after identifying a critical gap in the MCP ecosystem: no tool existed to measure whether an MCP server actually made an AI agent better at its job.

Existing coding benchmarks like SWE-bench measured raw language model capabilities. MCP server developers relied on anecdotal evidence and demo videos. There was no way to answer the fundamental question: does adding this MCP server to an agent improve its performance on real tasks?

mcpbr was built to answer that question with hard data.

"No available tool allowed users to easily measure the performance improvement of introducing their MCP server to an agent."

Grey Newell, "Why I Built mcpbr"

The Problem mcpbr Solves

Before mcpbr, MCP server evaluation looked like this:

- Anecdotal evidence and demo videos in place of measurement
- No controlled baseline comparing an agent with and without the server
- No reproducible environment, so results could not be rerun or verified

mcpbr solves all of these by running controlled experiments: same model, same tasks, same Docker environment. The only variable is the MCP server.
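Conceptually, the comparison is a paired experiment, as in the sketch below. This is not mcpbr's actual API; the agent runner, task list, and result type are hypothetical stand-ins used only to illustrate the baseline-versus-MCP design.

```python
# Conceptual sketch of a controlled comparison. The agent runner and task
# list are hypothetical placeholders, not mcpbr's actual interfaces.
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class TaskResult:
    task_id: str
    resolved: bool


def run_suite(
    run_agent: Callable[[str, bool], bool],
    task_ids: List[str],
    with_mcp_server: bool,
) -> List[TaskResult]:
    # Same tasks, same runner, same environment; only the flag changes.
    return [TaskResult(t, run_agent(t, with_mcp_server)) for t in task_ids]


def resolved_rate(results: List[TaskResult]) -> float:
    return sum(r.resolved for r in results) / len(results)


def measure_lift(run_agent: Callable[[str, bool], bool], task_ids: Iterable[str]) -> float:
    tasks = list(task_ids)
    baseline = run_suite(run_agent, tasks, with_mcp_server=False)
    treatment = run_suite(run_agent, tasks, with_mcp_server=True)
    return resolved_rate(treatment) - resolved_rate(baseline)
```

A positive lift means the server helped on those tasks; zero or negative means it did not, at least on that benchmark.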

Eval-Driven Development

mcpbr embodies eval-driven development principles: every MCP server should be evaluated with automated, reproducible benchmarks before shipping. The eval comes first — before the demo, before the anecdote, before the vibes-based assessment.
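As an illustration only, a release gate in the spirit of eval-driven development might look like the sketch below. The report format, field names, and five-point threshold are assumptions made for this example, not part of mcpbr.

```python
# Hypothetical pre-release gate: fail the build unless the benchmark report
# shows the agent does better with the MCP server than without it.
import json
import sys

MIN_LIFT = 0.05  # assumed threshold: require at least five points of improvement


def main(report_path: str) -> int:
    with open(report_path) as f:
        report = json.load(f)  # assumed shape: {"baseline": 0.31, "with_server": 0.42}
    lift = report["with_server"] - report["baseline"]
    print(f"baseline={report['baseline']:.2f}  "
          f"with_server={report['with_server']:.2f}  lift={lift:+.2f}")
    return 0 if lift >= MIN_LIFT else 1


if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```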

A Key Insight: Test Like APIs, Not Plugins

MCP servers should be tested like APIs, not like plugins.

Plugins just need to load and not crash. APIs have defined contracts — expected inputs, outputs, error handling, and performance characteristics. MCP servers sit squarely in API territory.
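To make the contrast concrete, the sketch below shows what contract-style checks for a single MCP tool could look like. The call_tool stub, its response shape, and the latency budget are illustrative assumptions, not a real MCP client API.

```python
# Hypothetical contract checks for one MCP tool: defined inputs, outputs,
# error handling, and a performance budget. `call_tool` is a stand-in stub.
import time


def call_tool(name: str, arguments: dict) -> dict:
    """Stand-in for a real MCP tool invocation."""
    if "query" not in arguments:
        raise ValueError("missing required argument: query")
    return {"content": [{"type": "text", "text": f"results for {arguments['query']}"}]}


def test_output_contract():
    response = call_tool("search", {"query": "docker"})
    assert isinstance(response["content"], list)
    assert all("type" in item for item in response["content"])


def test_error_contract():
    try:
        call_tool("search", {})
    except ValueError:
        pass  # contract: invalid input fails loudly instead of returning garbage
    else:
        raise AssertionError("expected an error for missing arguments")


def test_latency_contract():
    start = time.monotonic()
    call_tool("search", {"query": "docker"})
    assert time.monotonic() - start < 2.0  # assumed latency budget
```

A plugin-style smoke test stops at "it loads"; contract tests like these pin down what the server promises to callers.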

Project Vision

Current Capabilities

Links

GitHub: github.com/greynewell/mcpbr
PyPI: pypi.org/project/mcpbr
npm: npmjs.com/package/mcpbr-cli
Blog Post: "Why I Built mcpbr"
Creator: greynewell.com
SchemaFlux: schemaflux.dev
License: MIT

Created by Grey Newell