Commit Queue Gate

Pattern

A named solution to a recurring problem.

The automated landing gate between human review approval and a change reaching chromium/src: the commit queue accepts a CL only after ownership, presubmit, trybot, and tree-status checks agree that it can merge.

Review approval in Chromium isn’t a merge button. A change list (CL) can have the right OWNERS LGTM (looks good to me), a clear description, and a contributor ready to move on, yet still stop before it reaches the source tree. That stop is not a social veto. It is the commit queue (CQ), the automated gate that asks a narrower question than review did: can this exact patch set merge against the current tree without breaking the configured pre-submit signal?

Context

In Chromium’s landing path, this gate sits underneath review and governance. OWNERS File Governance decides which human reviewers have authority over a CL. Cross-Timezone Review Etiquette helps the contributor and reviewer reach an LGTM across the time gap. CQ comes next: Gerrit labels, presubmit checks, trybot builders (the continuous-integration builders run before landing), tree status, and submission policy turn the reviewed change into a landed commit.

The distinction matters because Chromium’s source tree is not a private branch. Hundreds of contributors land into the same trunk, and every downstream consumer inherits whatever trunk produces. A CIO estimating upstream contribution cost, a downstream vendor trying to land a fix, or an AI coding agent preparing a Chromium CL needs the same rule: human review is necessary, but it is not sufficient. The commit queue is the merge authority.

Problem

Human review can decide that a change is correct in context, but it cannot prove the change still builds and passes tests against the current trunk. A reviewer can LGTM a patch in the morning, another contributor can land a conflicting dependency at noon, and the original CL can be wrong by the time it would merge. A contributor can also miss a presubmit failure, rely on a manually selected trybot subset, or submit while the tree is closed after a breakage.

The recurring problem is how to let contributors land quickly without turning the shared tree into a negotiation surface. If every CL required a human release engineer to re-check ownership, test coverage, tree state, and submit order, the project would stall. If every contributor could merge after review, the project would import build failures at the speed of human optimism. Chromium needs a gate that is fast, mechanical, auditable, and strict enough to stop a reviewed change that is not safe to land right now.

Forces

  • Review approval and merge safety are different facts. An OWNERS LGTM says the change is acceptable to the directory’s authority; it doesn’t say the current patch set still passes all required checks.
  • The tree’s state changes under every CL. A patch that passed trybots yesterday may conflict with a landed change today, so the gate has to evaluate the current patch set against the current trunk.
  • Coverage is expensive. Running every builder for every CL would waste infrastructure capacity; running too small a subset lets breakage through.
  • Flake handling has to be bounded. A failing trybot may be an intermittent test, a real regression, or an infrastructure issue. The gate needs retry rules that don’t turn a flaky test into either a permanent block or an ignored signal.
  • Exceptional changes need a larger gate. Broad refactors, toolchain changes, and infrastructure-sensitive CLs need more builder coverage than ordinary CLs, but that larger coverage cannot become the default for every patch.

Solution

Place an automated commit queue between review approval and merge, and require every ordinary Chromium CL to clear it before it lands. The queue reads Gerrit’s state for the change, checks that the required human approvals are present, runs the configured presubmit and trybot set, respects the current tree status, and submits only the patch set that passed. A contributor doesn’t merge the reviewed patch directly. They ask CQ to prove the patch is landable.
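
The gate described above can be sketched as a function that collects every reason a CL cannot land. This is a minimal illustration, not CQ's implementation: the `ChangeState` record and its field names are hypothetical and do not correspond to Gerrit's data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChangeState:
    """Hypothetical snapshot of one CL; field names are illustrative only."""
    current_patch_set: int
    tested_patch_set: int   # patch set the checks actually ran against
    owners_approved: bool
    presubmit_passed: bool
    trybots_passed: bool
    tree_open: bool


def blocking_reasons(state: ChangeState) -> List[str]:
    """Return every reason the gate would refuse to submit; empty means landable."""
    reasons = []
    if not state.owners_approved:
        reasons.append("missing required OWNERS approval")
    if state.tested_patch_set != state.current_patch_set:
        reasons.append("checks ran on a different patch set")
    if not state.presubmit_passed:
        reasons.append("presubmit failed")
    if not state.trybots_passed:
        reasons.append("trybot failed")
    if not state.tree_open:
        reasons.append("tree is closed")
    return reasons


def can_land(state: ChangeState) -> bool:
    """The gate clears only when no blocking reason remains."""
    return not blocking_reasons(state)
```

Returning the full list of reasons rather than a bare boolean mirrors the point that the queue's decision is recorded against the patch set, so a contributor can see what to fix.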

The mechanism has two common entry modes. CQ Dry Run (Commit-Queue +1) runs the queue’s checks without submitting the CL. It is the contributor’s rehearsal: the patch set is tested in the same mechanical regime that submit will later use, but the result is evidence rather than a merge. Submit to CQ (Commit-Queue +2) asks the queue to land the patch if the gate clears. The distinction is operationally important. A dry run that passes is encouraging, but it is not a landing decision; the submit pass still has to evaluate the patch set at the moment it enters the merge path.
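
The difference between the two modes can be shown as a small mapping from the Commit-Queue label value to what happens to the CL. The label values +1 and +2 come from the text; the `CqMode` enum, the `outcome` function, and the returned strings are hypothetical names chosen for this sketch.

```python
from enum import Enum


class CqMode(Enum):
    NONE = 0      # no Commit-Queue vote on the patch set
    DRY_RUN = 1   # Commit-Queue +1: run the checks, never submit
    SUBMIT = 2    # Commit-Queue +2: run the checks, submit if they pass


def outcome(label_value: int, checks_passed: bool) -> str:
    """Map a Commit-Queue label value and a check result to the CL's fate."""
    mode = CqMode(label_value)  # raises ValueError for values other than 0, 1, 2
    if mode is CqMode.NONE:
        return "not queued"
    if not checks_passed:
        return "rejected"
    if mode is CqMode.SUBMIT:
        return "submitted"
    return "dry run passed, not submitted"
```

A passing dry run and a passing submit run the same checks in this sketch; only the submit mode ends in a merge.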

The queue selects trybots from the builders configured for the files and project area the CL touches. Presubmit checks run first; then the queue runs the selected builders, retries according to its flake policy, and refuses submission when the failures are not within the retryable bounds. If the tree is closed or throttled, the queue waits or rejects according to the tree-state policy. The contributor sees the queue’s decision in Gerrit, attached to the patch set rather than held in an informal chat or private dashboard.
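
Bounded flake handling can be sketched as a retry loop with a fixed budget. The budget of two retries is an assumption made for illustration, not a value taken from Chromium's configuration, and `run_builder` is a hypothetical stand-in for triggering one builder run.

```python
from typing import Callable, Tuple


def run_with_bounded_retries(
    run_builder: Callable[[], bool], max_retries: int = 2
) -> Tuple[bool, int]:
    """Run a builder until it passes or the retry budget is spent.

    Returns (passed, attempts). The first run is not counted as a retry,
    so at most max_retries + 1 runs happen in total.
    """
    attempts = 0
    while attempts <= max_retries:
        attempts += 1
        if run_builder():
            return True, attempts
    return False, attempts
```

A test that fails once and then passes clears the gate on the second attempt; a builder that keeps failing exhausts the budget and blocks submission instead of being retried indefinitely.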

Mega-CQ is the exceptional form. It runs a much broader builder set for changes whose risk is wider than the ordinary affected-file selection captures: build-system changes, large refactors, dependency rolls, or other CLs that might break a platform the normal CQ subset would not exercise. Mega-CQ costs more infrastructure time and more wall-clock time, so it is not the default. It exists because Chromium’s ordinary queue is deliberately sized for the common case, and the common case would be too weak for changes that alter the build or test surface itself.
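
The contrast between the ordinary affected-file selection and the Mega-CQ set can be sketched as follows. The path prefixes and builder names are invented for this example; Chromium's real builder configuration uses different names and a different mechanism.

```python
from typing import Dict, List, Set

# Hypothetical mapping from path prefix to the builders that cover it.
PATH_BUILDERS: Dict[str, Set[str]] = {
    "chrome/browser/": {"example-linux", "example-win", "example-mac"},
    "android_webview/": {"example-android"},
    "build/": {"example-linux"},
}

# Hypothetical builders that only the broad run exercises.
MEGA_ONLY_BUILDERS: Set[str] = {"example-fuchsia", "example-ios"}


def select_builders(changed_files: List[str], mega: bool = False) -> Set[str]:
    """Pick builders by touched path; with mega=True, run every known builder."""
    selected: Set[str] = set()
    for path in changed_files:
        for prefix, builders in PATH_BUILDERS.items():
            if path.startswith(prefix):
                selected |= builders
    if mega:
        for builders in PATH_BUILDERS.values():
            selected |= builders
        selected |= MEGA_ONLY_BUILDERS
    return selected
```

In this sketch a change under `build/` normally triggers a single builder, which is exactly the case where the ordinary selection understates the risk and the broader run earns its cost.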

How It Plays Out

A downstream enterprise-browser vendor has a small fix for a WebView2 integration issue in a Chromium directory they do not normally touch. The contributor uploads the CL, Gerrit computes the required OWNERS set, and a Google reviewer LGTMs the patch. The contributor runs CQ Dry Run before asking for submission. One Linux trybot fails on a test the contributor had not run locally. The fix takes ten minutes: update the affected test expectation, upload a new patch set, and rerun CQ Dry Run. Only after the dry run passes does the contributor apply Commit-Queue +2. The CL lands because the queue checked facts that review did not.

A platform team lands an implementation CL behind a Blink Intent. The Intent to Ship Pipeline has its own governance record, but the code still enters the source tree through Gerrit and CQ. The CL has three relevant states: approved by OWNERS, accepted by CQ, and later eligible for channel progression. Collapsing those states is how a team ends up telling stakeholders that a feature has shipped when the code has only been approved. The queue’s role is narrower and earlier: it proves that this reviewed implementation can merge into trunk today.

A build-system owner changes a template that affects several platform builders. The ordinary affected-file trybot selection is not enough because the blast radius is the builder graph itself. The owner runs Mega-CQ, accepts the longer wait, and catches a Windows-only build failure the normal queue would not have exercised. The extra cost is the point. A change that alters the test or build surface should pay a larger validation bill before landing than a localized source change does.

Consequences

Benefits. The commit queue gives Chromium a fast, legible landing rule. Contributors know that the merge decision is not a hidden judgment by a release engineer; it is the recorded result of Gerrit labels, ownership approval, presubmit checks, trybot outcomes, and tree status. A downstream organization can estimate upstream landing cost because the gate is visible and repeatable. An AI coding agent can explain why a CL is not ready to land without inventing a social reason: one required label is missing, the dry run has not passed, the tree is closed, or a trybot is red.

The queue also narrows the blast radius of human timing. A reviewer can LGTM before lunch and the contributor can submit hours later, but CQ evaluates the current patch set against the current tree. That re-check catches the conflict, stale patch, missing dependency, or late presubmit failure that human review could not guarantee away. The Tree Sheriff then handles the post-submit side if a change still breaks the tree. The two gates divide the work: CQ blocks what it can know before merge; the sheriff reverts what only appears after merge.

Liabilities. The gate creates latency and opacity for contributors who are new to Chromium. A reviewed CL that sits in CQ for an hour feels stalled, even when the queue is doing exactly what it is supposed to do. Trybot failures require logs from platforms the contributor may not have locally, and the right response depends on the failure: adjust a test, retry a known flake, or ask an owner whether the failure is pre-existing. The queue’s result is mechanical, but interpreting it still takes project knowledge.

The queue can also teach the wrong habit if contributors treat passing CQ as proof that the change is safe in every sense. CQ’s builder subset is finite. It does not replace security review, performance review, API-owner approval, or the post-submit monitoring that Perf Sheriff and Tree Sheriff perform. A green queue result means the patch cleared the configured pre-submit gate. It doesn’t mean the change will not regress a benchmark, break a downstream fork’s private configuration, or violate a governance gate outside Gerrit.

Notes for Agent Context

Before marking a Chromium CL ready to land, check the Gerrit state for the exact patch set: required OWNERS approval, presubmit status, CQ Dry Run result, Commit-Queue +2 submit state, and current tree status. Do not treat a reviewer LGTM or a passing local test run as equivalent to CQ acceptance. Do not treat CQ Dry Run as a landing event; it is a rehearsal, and the submit pass still has to clear. When CQ fails, surface the failing builder, presubmit, or tree-state reason to the human and avoid retry advice unless the failure is explicitly classified as flake or infrastructure. Recommend Mega-CQ only for broad build, toolchain, dependency-roll, or cross-platform refactor changes whose risk exceeds ordinary affected-file trybot selection.
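
The retry guidance above can be sketched as a small triage helper for an agent. The `CqFailure` record, the classification strings, and the wording of the advice are all hypothetical; the only rule taken from the text is that a retry is suggested solely when the failure is explicitly classified as flake or infrastructure.

```python
from dataclasses import dataclass
from typing import Optional

# Classifications for which a retry may be suggested (assumed labels).
RETRYABLE = {"flake", "infrastructure"}


@dataclass(frozen=True)
class CqFailure:
    """Hypothetical failure record for one CQ run."""
    source: str                            # a builder name, "presubmit", or "tree status"
    classification: Optional[str] = None   # None when nobody has classified the failure


def advise(failure: CqFailure) -> str:
    """Suggest a retry only for explicitly classified flake or infra failures."""
    if failure.classification in RETRYABLE:
        return (f"{failure.source}: classified as {failure.classification}; "
                "a retry is reasonable")
    return f"{failure.source}: failed; report to the human, do not suggest a retry"
```

An unclassified failure falls through to the reporting branch, so the default behavior is to surface the reason rather than to retry.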

Sources

The current Chromium commit queue mechanics are documented in the project’s docs/infra/cq.md, which defines the queue’s purpose, dry-run and submit modes, builder selection, flake handling, and Mega-CQ distinction. The contributor-facing workflow appears in docs/contributing.md and docs/commit_checklist.md, which tell a first-time contributor how review approval, CQ Dry Run, and Submit to CQ fit together. Manual trybot behavior is covered by docs/infra/trybot_usage.md, the source for the distinction between an ad hoc tryjob and the CQ-selected builder set.

The historical Commit Queue design document records the original scaling motive: moving from human verification toward automated submission for a project already landing roughly one hundred commits per day. The 2018 chromium-dev LUCI migration PSA is the public record that CQ builders moved to LUCI, which is why current CQ behavior is read through the LUCI builder and tryjob vocabulary rather than the older Rietveld-era flow.

Technical Drill-Down