UXarray MCP Server: AI-Agent Dataset Exploration with Globus Compute

Presented at SciFM26

Agentic analysis of production Earth-system meshes at facility scale — typed tools, Globus Compute, provenance, and a natural-language regional explorer.

Tags: HPC, climate, AI agents, UXarray, Globus Compute, MCP

20+ typed MCP tools covering the full mesh analysis workflow

0 bytes of raw mesh data transferred over the network

HPC: validated on Argonne Improv and UCAR Derecho via Globus Compute

Getting a topology summary from a production Earth-system mesh today requires an SSH session, a conda environment, hand-written analysis scripts, a batch job, and a download step. When a colleague asks “what is the resolution near Florida?” the answer is a project, not a question.

We have been building toward a better answer: an MCP server for UXarray that exposes mesh inspection, area diagnostics, subsetting, and plotting as typed, provenance-producing tools. A Globus Compute backend routes computation to leadership-class hardware so multi-gigabyte files never leave facility storage.

I presented this UXarray MCP work at SciFoundationModels 2026 (SciFM26).

The problem

Production Earth-system meshes are large, opaque, and live on HPC clusters. A coastal ocean mesh can weigh ten gigabytes or more. Getting scientifically meaningful information from it — face counts, area distributions, resolution near a coastline — normally requires direct cluster access and the right scripts in the right environment.

That friction adds up. It blocks scientists from exploring datasets quickly. It produces results with no durable record of how they were generated. And it makes it hard for an AI agent to act on scientific data in a controlled, reproducible way.

The Model Context Protocol (MCP) addresses the interface problem: it gives an agent a typed tool catalog with explicit input schemas, structured return values, and a clear server-side execution boundary. But MCP by itself does not make scientific software agent-ready. The contribution of this work is an MCP server that makes UXarray usable as a controlled, provenance-producing action surface for agents working with unstructured Earth-system meshes at facility scale.

What the server does

The server exposes mesh analysis as a set of typed tools that any MCP-compatible client — including Claude — can call directly. The tool surface covers a coherent scientific workflow:

Discovery — scan a directory tree on a local or remote filesystem, classify files as grid or data, and return suggested next tool calls
Inspection — load a mesh file (MPAS, UGRID, SCRIP, ESMF, HEALPix) and return topology metadata: face count, node count, edge count, mesh format
Area diagnostics — compute face area statistics with automatic unit detection (steradians vs. m²)
Validation — check for NaN, Inf, fill values, and topology mismatches before any analysis runs
Subsetting — select faces by bounding box or polygon; extract cross-sections
Visualization — render inline PNG plots of mesh wireframes, face-centered variables, and zonal means
Session and workflow state — register datasets by handle, run multi-step workflows, resume from saved state, export results
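To make the "automatic unit detection" item above concrete, here is a minimal sketch of one plausible heuristic. It is an assumption for illustration, not necessarily how the server decides: the total face area on a unit sphere can never exceed 4π steradians, so anything larger is treated as square metres. The function names and the Earth-radius constant are hypothetical.

```python
import math
from statistics import mean

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius, used only for conversion


def detect_area_units(face_areas):
    """Guess the units of a list of face areas.

    Heuristic (an assumption): areas on a unit sphere sum to at most 4*pi,
    so a larger total is interpreted as square metres.
    """
    total = math.fsum(face_areas)
    if total <= 4 * math.pi * 1.01:  # 1% tolerance for numerical error
        return "steradians"
    return "m2"


def area_summary(face_areas):
    """Return compact area statistics, always reported in square metres."""
    units = detect_area_units(face_areas)
    scale = EARTH_RADIUS_M ** 2 if units == "steradians" else 1.0
    areas_m2 = [a * scale for a in face_areas]
    return {
        "detected_units": units,
        "n_faces": len(areas_m2),
        "min_m2": min(areas_m2),
        "max_m2": max(areas_m2),
        "mean_m2": mean(areas_m2),
        "total_m2": math.fsum(areas_m2),
    }
```

The heuristic also works for regional meshes, since a regional total in steradians is smaller than 4π while any realistic regional total in square metres is far larger.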

Every tool call produces a structured provenance record containing the inputs, parameters, and execution context. That record is a complete specification: any result can be reproduced or modified months later without reconstructing what was done.

The HPC path is handled by Globus Compute. The MCP tool interface does not change when the backend moves from a laptop to an HPC cluster. Raw mesh files never transit the network — only compact JSON summaries or PNG images cross back to the client.
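The backend swap described here can be sketched with the standard `concurrent.futures` interface. This is a sketch under assumptions: `summarize_faces` and `run_tool` are hypothetical names, and a local thread pool stands in for the remote endpoint. The assumption is that the Globus Compute SDK's `Executor`, constructed with an endpoint ID, can be passed in place of the local pool because it follows the same `submit`/`result` pattern.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def summarize_faces(face_areas):
    """Runs wherever the data lives and returns only a compact summary."""
    n = len(face_areas)
    return {"n_faces": n, "mean_area": sum(face_areas) / n if n else None}


def run_tool(executor: Executor, fn, *args):
    """Submit fn to any Executor-compatible backend and wait for the result."""
    future = executor.submit(fn, *args)
    return future.result()


# Local stand-in backend. For the HPC path, the assumption is that an
# executor bound to a Globus Compute endpoint is passed to run_tool instead;
# the calling code does not change.
with ThreadPoolExecutor(max_workers=1) as local_backend:
    summary = run_tool(local_backend, summarize_faces, [1.0, 2.0, 3.0])
```

Only the small dictionary returned by `summarize_faces` crosses back to the caller, which mirrors the point that raw mesh data stays where it is.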

**Global companion mesh.** MPAS atmosphere grid — one of the production datasets in the campaign.

**EESMPI / SEATS dataset.** NSF Raijin and DOE SEATS unstructured grids used in end-to-end agent workflow tests.

**Florida Gulf coast mesh.** High-resolution coastal refinement region, the anchor example for the regional mesh explorer.

The campaign

We ran validation campaigns on two HPC systems — Argonne Improv and the UCAR Derecho cluster — using production Earth-system meshes staged on facility filesystems. The meshes span a wide range of formats, resolutions, and file sizes — from a compact global atmosphere grid to a large coastal-refined ocean mesh. The same MCP server and Globus Compute backend ran on both without code changes.

The campaign tested four things:

Data estate discovery — can the agent find and classify files on facility storage without SSH?
Multi-mesh topology diagnostics — does UXarray produce scientifically consistent geometry across mesh families and formats?
Structured failure recovery — can the server classify realistic failure modes (CF violations, format mismatches, topology errors) into actionable triage records rather than raw tracebacks?
Artifact economics — what does the scientist receive in exchange for the overhead of a remote call?
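The structured failure recovery item above can be illustrated with a small classifier that turns an exception into a triage record. This is a sketch only: the category names follow the failure modes listed above, but the keyword rules, field names, and suggested actions are hypothetical and much simpler than what a production server would need.

```python
from dataclasses import asdict, dataclass


@dataclass
class TriageRecord:
    category: str
    message: str
    suggested_action: str


# Hypothetical keyword rules; a real implementation would likely also
# inspect exception types and file metadata.
_RULES = [
    ("cf_violation", ("time units", "calendar", "standard_name"),
     "Check the CF attributes on the affected variable"),
    ("format_mismatch", ("unrecognized format", "unknown grid", "not a valid"),
     "Confirm the mesh format and pass it explicitly"),
    ("topology_error", ("face_node_connectivity", "dimension mismatch"),
     "Verify that the data file matches the grid file"),
]


def triage(exc: Exception) -> TriageRecord:
    """Classify an exception into an actionable record instead of a traceback."""
    text = str(exc).lower()
    for category, keywords, action in _RULES:
        if any(keyword in text for keyword in keywords):
            return TriageRecord(category, str(exc), action)
    return TriageRecord("unknown", str(exc),
                        "Inspect the full traceback on the endpoint")


record = asdict(triage(ValueError("Unable to decode time units 'days since 0000'")))
```

Returning a plain dictionary keeps the record JSON-serializable, so the agent can read the category and decide on a next step.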

The presentation covered the campaign results, timing behavior, topology checks, failure triage, and artifact economics.

**Agent reasoning loop in the UXarray MCP campaign.** Analyze → Plan → Execute → Verify. Each iteration proposes tool calls, executes them via Globus Compute, and decides whether the result closes the question or requires another pass.

The regional mesh explorer

The most visually direct piece of the work is an agentic pipeline we built on top of the server. A scientist types a region name in plain English. The Claude API converts it to a lat/lon bounding box. That bounding box is forwarded to the HPC endpoint (Argonne Improv or UCAR Derecho) via Globus Compute, where a worker subsets the mesh, renders a wireframe plot, and returns the image to the laptop.

No coordinates entered by hand. No SSH. No code written.
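Because the bounding box comes from a language model, it is worth checking before it is sent anywhere. The sketch below shows one way that check could look, assuming the model is asked to answer with a JSON object. The key names and the example coordinates are illustrative assumptions, not the pipeline's actual contract.

```python
import json

BBOX_KEYS = ("lon_min", "lon_max", "lat_min", "lat_max")  # assumed key names


def parse_bbox(llm_text: str) -> dict:
    """Parse and validate a bounding box returned by the model as JSON."""
    raw = json.loads(llm_text)
    missing = [key for key in BBOX_KEYS if key not in raw]
    if missing:
        raise ValueError(f"bounding box is missing keys: {missing}")
    box = {key: float(raw[key]) for key in BBOX_KEYS}
    if not (-90.0 <= box["lat_min"] < box["lat_max"] <= 90.0):
        raise ValueError("latitudes must satisfy -90 <= lat_min < lat_max <= 90")
    if not (-180.0 <= box["lon_min"] <= 180.0 and -180.0 <= box["lon_max"] <= 180.0):
        raise ValueError("longitudes must lie in [-180, 180]")
    if box["lon_min"] >= box["lon_max"]:
        # Boxes crossing the antimeridian are not handled in this sketch.
        raise ValueError("lon_min must be less than lon_max")
    return box


# Illustrative model output for a region name; the numbers are rough
# placeholders, not values taken from the campaign.
florida = parse_bbox(
    '{"lon_min": -88.0, "lon_max": -79.0, "lat_min": 24.0, "lat_max": 31.0}'
)
```

A box that fails these checks never reaches the HPC endpoint, so a bad extraction costs a retry on the laptop rather than a remote call.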

**Florida coast (WC14to60 mesh, rendered on Improv).** Subset of a Western-Atlantic coastal refinement mesh, rendered on an HPC worker via Globus Compute and returned to the laptop as a PNG. The agent extracted this bounding box from the phrase "Florida coast" with no scientist-provided coordinates.

**Continental United States.** Full CONUS bounding box. Coastal refinement visible along the East Coast and Gulf of Mexico.

**New York City coast.** Tight coastal region. LLM-extracted bounding box; warm Globus Compute worker.

**San Francisco Bay coast.** West-coast fine cells. Same pipeline, different region description.

The ratio of the mean cell resolution inside the subsetted region to the full-mesh mean tells you whether the mesh actually achieves its stated coastal refinement goal. The paper confirms this numerically.
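As a worked illustration of that ratio, the sketch below uses the square root of each face area as the characteristic cell size, which is one common convention and an assumption here; the paper's exact definition may differ. The areas are synthetic, chosen only so the arithmetic is easy to follow.

```python
import math


def mean_resolution_km(face_areas_m2):
    """Mean characteristic cell size in km, taking sqrt(area) as the length scale."""
    sizes_m = [math.sqrt(area) for area in face_areas_m2]
    return sum(sizes_m) / len(sizes_m) / 1000.0


def resolution_ratio(region_areas_m2, full_areas_m2):
    """Regional mean cell size divided by the full-mesh mean cell size."""
    return mean_resolution_km(region_areas_m2) / mean_resolution_km(full_areas_m2)


# Synthetic mesh: 100 fine cells of 10 km and 100 coarse cells of 50 km.
fine_cells = [10_000.0 ** 2] * 100
coarse_cells = [50_000.0 ** 2] * 100
ratio = resolution_ratio(fine_cells, fine_cells + coarse_cells)
```

With these synthetic numbers the regional mean is 10 km and the full-mesh mean is 30 km, so the ratio is one third. A ratio well below 1 indicates that the region is refined relative to the mesh as a whole.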

Provenance as infrastructure

Every tool call in this server returns a structured provenance record alongside the result. That record captures the grid path, variable name, bounding box, plot parameters, endpoint ID, library version, and wall time — everything needed to reproduce or modify the result without reconstruction.
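A record with the fields listed above could be produced by a thin wrapper around each tool. The following is a sketch under assumptions: the class, function, and tool names are hypothetical, the path is a placeholder, and the real server's record layout may differ.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ProvenanceRecord:
    tool: str
    grid_path: str
    variable: Optional[str]
    bbox: Optional[Dict[str, float]]
    plot_params: Dict[str, Any]
    endpoint_id: Optional[str]
    library_version: str
    wall_time_s: float


def call_with_provenance(tool, fn, *, grid_path, variable=None, bbox=None,
                         plot_params=None, endpoint_id=None,
                         library_version="unknown"):
    """Run fn and return its result together with a provenance record."""
    plot_params = dict(plot_params or {})
    start = time.perf_counter()
    result = fn(grid_path=grid_path, variable=variable, bbox=bbox, **plot_params)
    wall = time.perf_counter() - start
    record = ProvenanceRecord(tool, grid_path, variable, bbox, plot_params,
                              endpoint_id, library_version, wall)
    return {"result": result, "provenance": asdict(record)}


def fake_plot(grid_path, variable, bbox, **plot_params):
    # Stand-in for a real tool; returns a tiny summary instead of a PNG.
    return {"rendered": True, "n_params": len(plot_params)}


out = call_with_provenance(
    "plot_region", fake_plot,
    grid_path="/path/to/grid.nc",  # placeholder path
    bbox={"lon_min": -88.0, "lon_max": -79.0, "lat_min": 24.0, "lat_max": 31.0},
    plot_params={"dpi": 150},
)
provenance_json = json.dumps(out["provenance"], indent=2)
```

Keeping the record as plain JSON means it can be stored next to the artifact it describes and read back without the server.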

This matters most six months later, when a scientist wants to regenerate a figure for a revised manuscript, compare results across mesh versions, or zoom in on a different subregion. With MCP provenance, that is a matter of editing one JSON field and resubmitting. Without it, it is a matter of finding the script, guessing the parameters, and hoping the environment still matches.
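The "edit one JSON field and resubmit" step might look like the sketch below. The saved record, its field names, and the `replay_request` helper are hypothetical, and the values are illustrative rather than campaign results.

```python
import copy
import json

# A saved record as it might sit next to a figure (illustrative values only).
SAVED = json.loads("""
{
  "tool": "plot_region",
  "grid_path": "/path/to/grid.nc",
  "variable": null,
  "bbox": {"lon_min": -88.0, "lon_max": -79.0, "lat_min": 24.0, "lat_max": 31.0},
  "plot_params": {"dpi": 150},
  "endpoint_id": null,
  "library_version": "unknown",
  "wall_time_s": 12.4
}
""")

# Fields that describe what happened rather than what to run.
OUTPUT_ONLY = ("wall_time_s",)


def replay_request(record, **overrides):
    """Build a new tool request from a saved record, changing only the overrides."""
    unknown = set(overrides) - set(record)
    if unknown:
        raise KeyError(f"not in provenance record: {sorted(unknown)}")
    request = copy.deepcopy(record)
    request.update(overrides)
    for key in OUTPUT_ONLY:
        request.pop(key, None)
    return request


# Zoom in on a different subregion; everything else is reused as recorded.
zoomed = replay_request(
    SAVED,
    bbox={"lon_min": -83.0, "lon_max": -80.0, "lat_min": 25.0, "lat_max": 28.0},
)
```

The deep copy leaves the original record untouched, so the earlier figure stays reproducible alongside the new one.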

The three-tier model

The system composes three layers:

Natural language (LLM) converts unstructured intent into structured tool calls — bounding boxes from region names, variable selections from prose descriptions. This is the tier where hallucination risk is highest, and where the schema contract of the MCP layer provides the most protection.

Typed tool surface (MCP server) is the trust boundary. It validates every tool call against a schema, rejects ill-formed inputs before they reach any compute resource, routes to local or HPC execution, and attaches provenance. The agent cannot call a function absent from the catalog.

Execution backend (HPC / Globus Compute) runs actual computation on data-local hardware. Raw mesh files never leave facility storage. Only compact artifacts cross the network. The backend has no knowledge of MCP or the LLM; it receives a serialized Python callable and returns a result.
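The trust-boundary behaviour of the middle tier can be sketched as a small catalog with schema checks. This is a simplified illustration, not the MCP SDK and not the server's code: the registry, the type-only schema, and the `inspect_mesh` tool name are assumptions made for the example.

```python
from typing import Any, Callable, Dict

# Hypothetical in-memory catalog: tool name -> callable and expected argument types.
CATALOG: Dict[str, Dict[str, Any]] = {}


def register(name: str, schema: Dict[str, type]) -> Callable:
    """Decorator that adds a function to the catalog under a typed schema."""
    def wrap(fn: Callable) -> Callable:
        CATALOG[name] = {"fn": fn, "schema": schema}
        return fn
    return wrap


@register("inspect_mesh", {"grid_path": str})
def inspect_mesh(grid_path: str) -> dict:
    # Placeholder body; a real tool would open the mesh here.
    return {"grid_path": grid_path, "status": "accepted"}


def dispatch(name: str, arguments: Dict[str, Any]) -> Any:
    """Validate a proposed tool call, then run it; reject anything off-catalog."""
    if name not in CATALOG:
        raise LookupError(f"unknown tool: {name}")
    entry = CATALOG[name]
    schema = entry["schema"]
    missing = set(schema) - set(arguments)
    extra = set(arguments) - set(schema)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for key, expected in schema.items():
        if not isinstance(arguments[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    return entry["fn"](**arguments)


accepted = dispatch("inspect_mesh", {"grid_path": "/path/to/grid.nc"})
```

All three rejection paths raise before any compute resource is touched, which is the property the tier description relies on.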

Future work: MCP SEPs we are watching

This project also gives us a practical lens on the MCP standards work now moving through SEPs. Scientific workflows stress parts of the protocol that are easy to ignore in short local demos: authentication, long-running operations, progress reporting, artifact handling, and pre-execution review.

For UXarray MCP, the most important pieces are:

Authentication and delegation — remote mesh analysis often runs against facility filesystems and HPC endpoints, so the client/server boundary needs a clean way to represent user identity, delegated access, and revocation without baking site-specific credentials into every tool.
Long-running tasks and progress — a 20-second mesh inspection, a queued PBS job, and a multi-step regional workflow should not all look like the same blocking function call. We are watching the task/progress proposals because scientific users need stage-level status, cancellation, retry, and resumability.
Artifacts and provenance — plots, JSON summaries, logs, and derived mesh subsets need first-class metadata. The useful return value is not only a PNG or table; it is the artifact plus where it came from, which file was used, which endpoint ran it, and how to reproduce it.
Pre-execution hooks and policy checks — before an agent launches an HPC job or reads a restricted path, the user or site policy may need a checkpoint. Interceptor-style SEPs matter here because scientific agents need controlled execution, not just tool access.

The broader point is that UXarray MCP is not only waiting for more tools. It is waiting for the protocol surface to mature around the realities of scientific work: authenticated data, expensive runs, durable artifacts, and human checkpoints at the right places.

Why it matters

I presented this work at SciFoundationModels 2026 (SciFM26), with the broader goal of making scientific mesh analysis more reproducible and easier to run where the data already lives.

Supported by the U.S. National Science Foundation under Grant No. 2126458 (EarthCube) and the U.S. Department of Energy Office of Science SEATS project.