
Surface Aggregation

Concept

Vocabulary that names a phenomenon.

The Viz display compositor’s model for combining the CompositorFrames submitted by the browser’s own UI and every sandboxed renderer into the single image presented to the screen, where each client’s surface is embedded into the scene without any client trusting another’s contents.

Where the name comes from

Viz is the Chromium project’s contraction of visuals: the components/viz/ subsystem that holds, in the project’s own words, “the client library and service implementations for compositing and gpu presentation.” The service half runs in the GPU process and is the display compositor. A surface is the unit of compositing it works in: one client’s submitted frame, identified and embeddable by other clients. Aggregation is the step that walks the tree of embedded surfaces and flattens them into one frame to draw. The trace category that exposes all of this is viz, distinct from the cc category that covers the per-renderer compositor frame scheduling upstream of it.

What It Is

Compositor Frame Scheduling ends when one renderer’s compositor thread submits a CompositorFrame across the process boundary. Surface aggregation is what the GPU process does next: the browser’s own UI is one frame source, each renderer is another, and the display compositor combines those sources into the one image the platform presents. The display compositor “uses Gpu or software to composite a set of frames, from multiple clients, into a single backing store for display to the user.”

Each client submits frames through a CompositorFrameSink, a Mojo interface that carries a CompositorFrame from the client to the GPU process. The browser’s UI compositor has a sink; each renderer has a sink; an embedded <iframe> running in its own renderer has its own. A client doesn’t draw to the screen and doesn’t see any other client’s frame. It hands its frame to its sink and stops there.

The data model that lets those frames combine is the surface. A surface is one client’s most recently submitted CompositorFrame, held in the GPU process and addressable by an identifier so that other clients can embed it. The identifier is a SurfaceId, which is the pair of a FrameSinkId and a LocalSurfaceId:

  • The FrameSinkId is derived from the embedded client. It is stable for the lifetime of that client’s sink: the same renderer keeps the same FrameSinkId across many frames.
  • The LocalSurfaceId comes from the embedder. It changes when the embedded content’s size or other surface-invalidating property changes, so a resize keeps the FrameSinkId and mints a new LocalSurfaceId. The pair lets the embedder say precisely which version of the embedded surface its own frame expects, and lets the compositor hold the old surface until the new one is ready rather than flashing a half-resized frame.
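The identifier scheme above can be modeled in a few lines. This is a minimal Python sketch, not Chromium code. The class names mirror the text, but EmbedderAllocator is hypothetical, and the real viz::LocalSurfaceId carries more state than the single counter used here. The sketch shows the one rule that matters: a surface-invalidating change such as a resize keeps the FrameSinkId and mints a fresh LocalSurfaceId, while an unchanged size keeps the current one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FrameSinkId:
    # Stable for the lifetime of one client's sink.
    client_id: int
    sink_id: int


@dataclass(frozen=True)
class LocalSurfaceId:
    # Simplified to one counter; a new value names a new version of the surface.
    sequence: int


@dataclass(frozen=True)
class SurfaceId:
    frame_sink_id: FrameSinkId
    local_surface_id: LocalSurfaceId


class EmbedderAllocator:
    """Hypothetical embedder-side allocator for one embedded client."""

    def __init__(self, frame_sink_id: FrameSinkId) -> None:
        self.frame_sink_id = frame_sink_id
        self._sequence = 1
        self._size: Optional[Tuple[int, int]] = None

    def surface_id_for(self, size: Tuple[int, int]) -> SurfaceId:
        # A size change invalidates the surface: keep the FrameSinkId and
        # mint a new LocalSurfaceId. An unchanged size reuses the current one.
        if self._size is not None and size != self._size:
            self._sequence += 1
        self._size = size
        return SurfaceId(self.frame_sink_id, LocalSurfaceId(self._sequence))


alloc = EmbedderAllocator(FrameSinkId(client_id=7, sink_id=1))
a = alloc.surface_id_for((800, 600))
b = alloc.surface_id_for((800, 600))   # unchanged size: same SurfaceId
c = alloc.surface_id_for((1024, 768))  # resize: same sink, new local id
print(a == b, c.frame_sink_id == a.frame_sink_id, c.local_surface_id == a.local_surface_id)
# True True False
```

Keeping the two halves separate is what lets the compositor tell “same client, new version” apart from “different client,” which the hold-until-ready behavior depends on.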

The pieces fit together in the GPU process. The SurfaceManager holds the live surfaces and the embedding relationships between them. When the Display decides it is time to produce a frame, the SurfaceAggregator walks the surface-embedding tree from the root surface (the browser window), follows each embedded SurfaceId to the surface it names, and flattens the tree into one aggregated CompositorFrame. viz::DirectRenderer then draws that aggregated frame with Skia, on the GPU or in software, and presents it with SwapBuffers or, where available, by handing individual quads to hardware overlay planes.

flowchart TD
  UI[Browser UI compositor] -->|CompositorFrameSink| SM[SurfaceManager]
  R1[Renderer A] -->|CompositorFrameSink| SM
  R2[Renderer B: cross-site iframe] -->|CompositorFrameSink| SM
  SM --> AGG[SurfaceAggregator]
  AGG -->|one aggregated frame| DR[DirectRenderer]
  DR -->|SwapBuffers / overlays| Screen[Screen]
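The submission-and-flattening loop in the diagram can be sketched as a toy model. Everything here is an assumption for illustration. SolidQuad and SurfaceRef stand in for Chromium’s real quad types (the embedding quad in Viz is a SurfaceDrawQuad). The SurfaceManager and aggregate() shown are hypothetical simplifications that ignore transforms, clips, and render passes, and the output order is illustrative rather than Viz’s actual draw order. The point is structural: a client’s frame contains only its own quads plus references by id, and the only place a reference is resolved into another client’s content is the aggregator, which in Viz runs inside the GPU process.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple, Union

SurfaceId = Tuple[str, int]  # simplified (frame_sink_id, local_surface_id)


@dataclass
class SolidQuad:
    # Stands in for any quad whose pixels the submitting client owns.
    label: str


@dataclass
class SurfaceRef:
    # Stands in for an embedding quad: a reference by id, carrying no pixels.
    surface_id: SurfaceId


Quad = Union[SolidQuad, SurfaceRef]


@dataclass
class CompositorFrame:
    quads: List[Quad] = field(default_factory=list)


class SurfaceManager:
    """Hypothetical GPU-process registry of each surface's latest frame."""

    def __init__(self) -> None:
        self._surfaces: Dict[SurfaceId, CompositorFrame] = {}

    def submit(self, surface_id: SurfaceId, frame: CompositorFrame) -> None:
        # What a client's CompositorFrameSink delivers; nothing is returned.
        self._surfaces[surface_id] = frame

    def get(self, surface_id: SurfaceId) -> Optional[CompositorFrame]:
        return self._surfaces.get(surface_id)


def aggregate(manager: SurfaceManager, root: SurfaceId) -> CompositorFrame:
    """Flatten the embedding tree under root into one frame of owned quads."""
    out = CompositorFrame()

    def walk(surface_id: SurfaceId, visiting: Set[SurfaceId]) -> None:
        frame = manager.get(surface_id)
        if frame is None or surface_id in visiting:
            return  # missing surface or embedding cycle: contribute nothing
        visiting.add(surface_id)
        for quad in frame.quads:
            if isinstance(quad, SurfaceRef):
                walk(quad.surface_id, visiting)  # references resolve only here
            else:
                out.quads.append(quad)
        visiting.discard(surface_id)

    walk(root, set())
    return out


mgr = SurfaceManager()
root, page, iframe = ("browser-ui", 1), ("renderer-a", 1), ("renderer-b", 1)
mgr.submit(iframe, CompositorFrame([SolidQuad("blue")]))
mgr.submit(page, CompositorFrame([SolidQuad("white"), SurfaceRef(iframe)]))
mgr.submit(root, CompositorFrame([SolidQuad("toolbar"), SurfaceRef(page)]))
flat = aggregate(mgr, root)
print([q.label for q in flat.quads])  # ['toolbar', 'white', 'blue']
```

Note that renderer A’s frame holds only a SurfaceRef to renderer B’s surface; nothing in A’s submission path ever sees the "blue" quad. The walk also visits every embedded surface on every call, which is where the per-frame cost discussed later comes from.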

The load-bearing idea is that aggregation embeds untrusting clients. Surfaces exist, in the design document’s framing, “to allow graphical embedding of heterogeneous untrusting clients efficiently into one scene.” The browser’s own UI compositor and each sandboxed renderer are separate Viz clients. The aggregator combines their frames into one scene, but no client supplies another’s pixels, reads another’s surface, or learns another’s content. Embedding is a reference by SurfaceId, resolved entirely inside the trusted GPU process; the embedded client never hands its bitmap to the embedder.

Why It Matters

The Rendering Pipeline names Display as its seventh stage and says it “composites the rastered layers into a single back buffer in the GPU process … and swaps the back buffer to the screen,” then stops. That level of detail maps the pipeline, but it is not enough for three concrete questions: why a Display-stage frame dropped, why one renderer cannot read another’s pixels, and what an AI coding agent may assume about embedded content.

The first is debugging a dropped-frame trace that attributes the loss to Display rather than to any main-thread stage. Without the surface vocabulary, “the GPU process is dropping frames” is a dead end. With it, the trace becomes legible: the viz track shows whether the SurfaceAggregator is waiting on a surface whose LocalSurfaceId the embedder expects but the embedded client hasn’t yet submitted, whether aggregation itself is overrunning because the surface tree is large, or whether the DirectRenderer is back-pressured on SwapBuffers. Each points at a different fix, and the names are the prerequisite for telling them apart.

The second is reasoning about why one renderer cannot read another’s pixels. Site Isolation places each cross-site iframe in its own renderer, which raises the question of how those separately-rendered frames become one image without a channel that lets a malicious iframe read its embedder’s content. Surface aggregation is the answer: the iframe is a separate Viz client, its frame is held as a surface the embedder references by SurfaceId, and the flattening happens inside the GPU process where neither renderer can reach the other’s bitmap. The cross-renderer separation is reconciled into one screen image here, and only here, without breaching the boundary.

The third is an AI coding agent reasoning about a feature that touches embedded content or cross-process rendering. It needs to know that the embedder references a surface rather than receiving its contents. Otherwise it may generate code that assumes an embedder can inspect or modify an embedded renderer’s frame.

For Chromium itself, aggregation is the architectural point where the multi-process decision pays its rendering bill. Multi-Process Architecture scattered rendering across processes for isolation; aggregation is the reconvergence that turns that scattered work back into one coherent frame, at the cost of a per-frame cross-process flattening pass the single-process predecessor never paid.

How to Recognize It

The aggregation step is directly observable from tools available in a running browser.

A chrome://tracing capture with the viz category enabled shows the GPU-process compositing loop: Display::DrawAndSwap, the SurfaceAggregator::Aggregate slice that flattens the surface tree, and the SwapBuffers (or overlay-scheduling) slice that presents the result. A frame that drops at Display shows its cause in that track: an Aggregate slice that overran, or a gap where the aggregator waited on a surface that had not yet arrived. Recording the cc category alongside viz puts the per-renderer frame production and the GPU-side aggregation on separate process tracks, so the two bands can be read against each other in one capture; the DevTools Performance panel’s GPU track gives a coarser view of the same GPU-process activity.
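A viz capture can also be screened programmatically. The Python sketch below reads a trace exported as JSON in the Trace Event Format that chrome://tracing saves and lists SurfaceAggregator::Aggregate slices that exceed a frame budget. It assumes the slices are recorded as complete events (ph "X", with ts and dur in microseconds) under the viz category and with exactly the slice name given above; a real capture may name or record them differently. The sample events are synthetic, and the 16.7 ms default assumes a 60 Hz display.

```python
import json
from typing import Dict, List


def aggregate_overruns(trace_json: str, budget_ms: float = 16.7) -> List[Dict]:
    """Return Aggregate slices longer than budget_ms, longest first."""
    data = json.loads(trace_json)
    # Saved traces are either {"traceEvents": [...]} or a bare list of events.
    events = data["traceEvents"] if isinstance(data, dict) else data
    slow = []
    for ev in events:
        categories = ev.get("cat", "").split(",")
        if ev.get("ph") != "X" or "viz" not in categories:
            continue  # only complete events in the viz category
        if ev.get("name") != "SurfaceAggregator::Aggregate":
            continue  # assumed slice name, taken from the text
        dur_ms = ev.get("dur", 0) / 1000.0  # trace durations are microseconds
        if dur_ms > budget_ms:
            slow.append({"ts_us": ev["ts"], "dur_ms": dur_ms})
    return sorted(slow, key=lambda s: s["dur_ms"], reverse=True)


sample = json.dumps({"traceEvents": [
    {"name": "SurfaceAggregator::Aggregate", "cat": "viz", "ph": "X", "ts": 1000, "dur": 2500},
    {"name": "SurfaceAggregator::Aggregate", "cat": "viz", "ph": "X", "ts": 20000, "dur": 21000},
    {"name": "Display::DrawAndSwap", "cat": "viz", "ph": "X", "ts": 19000, "dur": 30000},
]})
print(aggregate_overruns(sample))  # [{'ts_us': 20000, 'dur_ms': 21.0}]
```

A long Aggregate slice points at the surface-tree cost; a Display drop whose Aggregate slices are short points instead at a wait on a not-yet-submitted surface or at back-pressured SwapBuffers.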

Where a developer build exposes it, the internal page chrome://surfaces enumerates the live surfaces, their SurfaceIds, and the embedding relationships between them: the SurfaceManager’s state made visible, with a page of cross-site iframes showing several surfaces with distinct FrameSinkIds embedded under the root window’s surface.

The source tree maps the model to specific code. components/viz/ is the subsystem; components/viz/service/display/ holds the Display and DirectRenderer; viz::SurfaceManager holds the surfaces; viz::SurfaceAggregator performs the flattening; viz::mojom::CompositorFrameSink is the Mojo interface clients submit through. A regression bisect that lands in components/viz/service/display/ is specifically an aggregation-or-presentation regression, distinct from a cc/-side per-renderer scheduling regression upstream.

The SurfaceId pair is a subtler cue. When an embedded element resizes and the screen briefly holds the old size before snapping to the new one, that is the embedder’s frame still referencing the previous LocalSurfaceId while the embedded client rasters the new size; the new LocalSurfaceId activates only once its surface is ready. That brief hold is surface synchronization working as designed, not a paint bug.
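The hold-then-snap behavior can be modeled as a pending frame with a dependency. This is a hypothetical Python model, not Viz’s implementation. The embedder’s new frame references a SurfaceId that may not exist yet, so it stays pending while the previously active frame keeps being drawn, and it activates once the dependency arrives. The deadline_frames value is an assumed parameter standing in for the deadline after which Viz stops waiting. What the real compositor draws for a still-missing embedded surface at that point is outside this sketch.

```python
from typing import Optional, Set, Tuple

SurfaceId = Tuple[str, int]  # simplified (frame_sink_id, local_surface_id)


class EmbedderSurface:
    """Hypothetical model of one embedder surface under surface synchronization."""

    def __init__(self, deadline_frames: int = 4) -> None:
        self.active: Optional[str] = None       # frame currently drawn
        self.pending: Optional[str] = None      # frame waiting on a dependency
        self.pending_dep: Optional[SurfaceId] = None
        self.frames_waited = 0
        self.deadline_frames = deadline_frames  # assumed value, not Viz's

    def submit(self, frame: str, depends_on: SurfaceId) -> None:
        # A new embedder frame referencing possibly not-yet-submitted content.
        self.pending, self.pending_dep, self.frames_waited = frame, depends_on, 0

    def on_begin_frame(self, available: Set[SurfaceId]) -> Optional[str]:
        """Called once per display frame; returns the frame to aggregate."""
        if self.pending is not None:
            ready = self.pending_dep in available
            timed_out = self.frames_waited >= self.deadline_frames
            if ready or timed_out:
                self.active, self.pending = self.pending, None
            else:
                self.frames_waited += 1  # keep drawing the old active frame
        return self.active


emb = EmbedderSurface()
available = {("child", 1)}
emb.submit("page@800x600", depends_on=("child", 1))
shown = [emb.on_begin_frame(available)]      # dependency present: activates
emb.submit("page@1024x768", depends_on=("child", 2))
shown.append(emb.on_begin_frame(available))  # child not resized yet: old frame held
available.add(("child", 2))
shown.append(emb.on_begin_frame(available))  # dependency arrives: new frame activates
print(shown)  # ['page@800x600', 'page@800x600', 'page@1024x768']
```

The repeated 800x600 entry is the brief hold the text describes. The screen never shows the 1024x768 embedder frame around child content that has not yet been drawn at the new size.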

How It Plays Out

A team building a product with many cross-site iframes reports that scrolling the top-level page hitches whenever several iframes are visible, even though each iframe scrolls smoothly on its own. A viz trace shows the SurfaceAggregator::Aggregate slice growing with the number of embedded surfaces, because the aggregator flattens a tree that grows a branch per embedded renderer on every frame it produces. The diagnosis is at aggregation, not at any single renderer’s pipeline; the cost is the flattening pass over a large surface tree. The fix is to reduce the count of independently-composited surfaces on screen at once, a different move from optimizing any one renderer’s frame.

A security engineer auditing a Chromium-based runtime asks whether an embedded third-party iframe can read the pixels of the page that embeds it. The answer runs through the surface model: the iframe is a separate Viz client whose frame is held as a surface in the GPU process, the embedder references that surface by SurfaceId rather than receiving its contents, and the flattening that combines them happens inside the trusted GPU process where neither renderer has the other’s bitmap. The audit’s conclusion is that the compositing path doesn’t provide a cross-renderer pixel-read channel, and the reasoning names the exact step (SurfaceAggregator in the GPU process) where the separation is preserved.

Consequences

Naming the aggregation step buys several operational properties.

Display-stage frame drops become attributable. A drop at Display isn’t uniformly “the GPU is slow”; it is an aggregation pass that overran on a large surface tree, an aggregator waiting on a surface whose LocalSurfaceId has not arrived, or a back-pressured SwapBuffers. The viz-track slices attribute the drop to one of these, each with its own remediation, turning “Display dropped a frame” into a specific question.

The trust boundary becomes locatable. The point where Site Isolation’s per-renderer separation is reconciled into one screen image is exactly the aggregator inside the GPU process. Anything that lets an embedder receive an embedded surface’s contents rather than reference it would move that boundary, and that is the thing to refuse.

The cost of the model is locatable too. Aggregation is a per-frame flattening pass over the surface tree, a cost the single-process predecessor never paid. The lever is the size and count of independently-composited surfaces: a window that embeds many separately-composited clients makes each pass more expensive. The cost lands neither in any single renderer nor in the raster backend, but in the GPU-process flattening between them.

Surface synchronization carries its own small liability. The consistent view that the SurfaceId pair guarantees across a resize is paid for with a brief hold mid-transition, during which the old-size content stays on screen until the new surface is ready. The benefit is that users never see a torn or half-resized embedded frame in the steady state.

Notes for Agent Context

When generating code that touches embedded or cross-process rendering in Chromium, treat each renderer and the browser UI as a separate Viz client that submits a CompositorFrame through its own CompositorFrameSink and never draws to the screen directly. Do not write code that assumes an embedder can read, copy, or modify an embedded renderer’s pixels; the embedder references the embedded content by SurfaceId (the FrameSinkId + LocalSurfaceId pair), and the flattening that combines surfaces happens only inside the trusted GPU process, never in any renderer.

When code changes an embedded surface’s size or other surface-invalidating property, mint a new LocalSurfaceId rather than reusing the old one, and rely on surface synchronization to hold the embedder’s frame that references the new LocalSurfaceId until the embedded surface for that size is ready; reusing a stale LocalSurfaceId across a resize produces a mismatched or half-resized aggregated frame. Keep the number of independently-composited surfaces on screen bounded: each one is a branch the SurfaceAggregator flattens on every frame, so an unbounded count of embedded surfaces makes the per-frame aggregation pass the bottleneck.

Sources

The canonical primary source is the Chromium project’s components/viz/README.md, which defines Viz as “the client library and service implementations for compositing and gpu presentation,” describes the display compositor as the component that composites “a set of frames, from multiple clients, into a single backing store for display to the user,” and names Frame Sinks as the Mojo interfaces clients submit through and Surfaces as the compositing service’s data model. The Life of a Frame document in the Chromium docs/ tree gives the submission-to-presentation sequence (SubmitCompositorFrame, the display-compositor deadline provided by the GPU process, AggregateSurfaces, GPU draw, Swap, Presentation) and states that the CompositorFrames from individual compositors go to the SurfaceManager in the GPU process where the SurfaceAggregator combines them when the Display asks. The Chromium Graphics Surfaces design document on chromium.org is the authoritative description of the SurfaceId model: the SurfaceId as the FrameSinkId + LocalSurfaceId pair, the FrameSinkId derived from the embedded client and the LocalSurfaceId from the embedder, the resize behavior that keeps the FrameSinkId and mints a new LocalSurfaceId, and the framing that surfaces exist “to allow graphical embedding of heterogeneous untrusting clients efficiently into one scene.” Steve Kobes’s Life of a Pixel lecture, recorded annually for Chrome University, walks the compositing-to-presentation half of rendering in motion and is the most thorough public long-form treatment of the path through Viz.

Technical Drill-Down