Expanse treats a cloud account, project, subscription, or workspace as a cloud-batch compute. The daemon observes managed batch and training services through provider control planes, so jobs launched through native cloud APIs appear beside the rest of your Expanse telemetry.

What gets captured

For each managed batch or training job, Expanse records:
  • Job definition, image, command, entrypoint, queue, pool, region, and project context.
  • Submitter, service account, tags, labels, and parameters.
  • Requested CPU, memory, GPU, accelerator, and machine shape.
  • Status transitions, runtime, exit status, retry state, logs, and failure context.
  • Live utilisation metrics where the provider exposes them.
Those records appear in the Console and feed expanse analyse, expanse diagnose, and the intelligence layer.
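
To make the list above concrete, the fields can be pictured as a single record per job. The sketch below is illustrative only: `JobRecord` and every field name in it are hypothetical, chosen to mirror the bullets, and do not describe Expanse's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class JobRecord:
    """Hypothetical shape of one captured job; not Expanse's real schema."""

    # Definition and placement
    job_definition: str
    image: str
    command: List[str]
    queue: str
    region: str
    project: str
    # Identity and metadata
    submitter: str
    service_account: Optional[str] = None
    labels: Dict[str, str] = field(default_factory=dict)
    # Requested resources
    cpus: float = 0.0
    memory_mib: int = 0
    gpus: int = 0
    machine_shape: Optional[str] = None
    # Lifecycle: (timestamp, status) pairs in the order they were observed
    status_transitions: List[Tuple[str, str]] = field(default_factory=list)
    exit_status: Optional[int] = None
    retries: int = 0

    def current_status(self) -> Optional[str]:
        """Return the most recently observed status, or None if none yet."""
        if not self.status_transitions:
            return None
        return self.status_transitions[-1][1]
```

Keeping status transitions as an ordered list, rather than a single status field, preserves the history needed to compute queue time and runtime after the fact.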

Supported services

  • AWS Batch.
  • AWS SageMaker Training Jobs.
  • Azure Batch.
  • Azure ML Jobs.
  • Google Batch.
  • Vertex AI Custom Training.

Register Cloud Batch

Register a compute and choose cloud-batch when prompted:
expanse compute register
The CLI prints the provider-specific configuration for the account, project, subscription, or workspace to monitor. Run one daemon per boundary you want represented as a separate compute in Expanse.

Typical boundaries

| Provider | Register one compute for |
| --- | --- |
| AWS | An account and region running AWS Batch or AWS SageMaker Training Jobs. |
| Azure | A subscription, resource group, or workspace running Azure Batch or Azure ML Jobs. |
| Google Cloud | A project and region running Google Batch or Vertex AI Custom Training. |
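
When jobs run in several accounts or regions, it can help to list the distinct boundaries first, since each one needs its own daemon and registration. The following is a minimal planning sketch, not part of Expanse: the `Boundary` type, the provider labels, and the `boundaries_for` helper are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Boundary:
    """Hypothetical description of one boundary to register as a compute."""

    provider: str     # illustrative labels: "aws", "azure", "gcp"
    scope: str        # account, subscription, resource group, workspace, or project
    region: str = ""  # used for AWS and Google Cloud; left empty for Azure


def boundaries_for(workloads: Iterable[Tuple[str, str, str]]) -> List[Boundary]:
    """Return the unique boundaries, in first-seen order."""
    seen = {}
    for provider, scope, region in workloads:
        # dict keys keep insertion order, so duplicates collapse without reordering
        seen.setdefault(Boundary(provider, scope, region), None)
    return list(seen)


if __name__ == "__main__":
    workloads = [
        ("aws", "111111111111", "us-east-1"),
        ("aws", "111111111111", "us-east-1"),  # same boundary, second workload
        ("aws", "111111111111", "eu-west-1"),
        ("azure", "my-subscription", ""),
        ("gcp", "my-project", "us-central1"),
    ]
    for boundary in boundaries_for(workloads):
        print(boundary)
```

Each entry in the result corresponds to one daemon and one run of the registration command.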

Verify capture

Submit a managed batch or training job after the daemon starts. Within a minute of the provider reporting the job, the cloud-batch compute and the job appear in console.expanse.sh.
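
For an AWS boundary, one way to produce a job for this check is to submit a small AWS Batch job with boto3. This is a sketch under stated assumptions: the queue name, job definition name, and job name are placeholders for resources that must already exist in the monitored account and region, and the helper functions are hypothetical. Only `boto3.client("batch")` and its `submit_job` call are real AWS APIs; nothing here is an Expanse API.

```python
from typing import Dict, Optional, Sequence


def build_submit_request(
    job_name: str,
    job_queue: str,
    job_definition: str,
    command: Optional[Sequence[str]] = None,
) -> Dict[str, object]:
    """Build the keyword arguments for the AWS Batch submit_job call."""
    request: Dict[str, object] = {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
    }
    if command:
        # Override the container command defined in the job definition
        request["containerOverrides"] = {"command": list(command)}
    return request


def submit_test_job(region: str, **kwargs) -> str:
    """Submit the job and return its AWS Batch job ID."""
    import boto3  # imported lazily so the builder can be used without boto3

    client = boto3.client("batch", region_name=region)
    response = client.submit_job(**build_submit_request(**kwargs))
    return response["jobId"]


if __name__ == "__main__":
    # Placeholder names: replace with a queue and job definition you own.
    job_id = submit_test_job(
        "us-east-1",
        job_name="expanse-capture-check",
        job_queue="my-job-queue",
        job_definition="my-job-definition",
        command=["echo", "hello"],
    )
    print("Submitted job", job_id)
```

Once the job is submitted, look for it under the cloud-batch compute registered for that account and region.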

Next steps

  • Computes: How cloud batch maps to the Expanse compute model.
  • Telemetry: What Expanse captures before, during, and after each job.