Intelligence

Expanse exposes two intelligence commands. Both reason over evidence from your own compute.

expanse analyse

Run before submission. Recommends CPU, memory, GPU, and walltime.

expanse diagnose

Run after a failure. Returns solution-oriented guidance with cited evidence.

Resource recommendation

expanse analyse recommends the resources a workload will need before you submit it. It accepts a SLURM batch script, a source file, a Kubernetes workload manifest, or a Nomad jobspec.

expanse analyse train.slurm

EXPANSE ANALYSE
target                          train.slurm
job_name                        train
run                             8b2f4c1e-…
status                          run_status_succeeded

RESOURCE RECOMMENDATION
cpu                             16
memory                          36.0 GiB
gpu                             1
walltime                        02:30:00
memory_floor_required           29.8 GiB
memory_requested                24.0 GiB
confidence                      high

FAILURE RISK
oom                             high
    Requested memory is below the observed peak of similar runs.

SIMILAR FAILED EXECUTIONS
  - 9c41d2ae-… (SLURM job 41982) provenance=same_workload score=0.93 finished=2026-06-28

RESOURCE PATCH (review-first)
  Raise #SBATCH --mem from 24G to 36G.
  inspect: expanse analyse diff 8b2f4c1e-… --artefact <artefact-id> --expected-hash sha256:…
  apply:   expanse analyse apply 8b2f4c1e-… --artefact <artefact-id> --expected-hash sha256:…

RECOMMENDATION ANCHORS (successful runs)
  - 5b8e1f04-… (SLURM job 41776) provenance=same_workload score=0.91 finished=2026-06-26: succeeded with 36G, peak memory 31.2 GiB

expanse analyse train.slurm --json

{
  "run_id": "8b2f4c1e-…",
  "status": "RUN_STATUS_SUCCEEDED",
  "result": {
    "recommended_resources": {
      "cpu_cores": 16,
      "memory_bytes": "38654705664",
      "gpu_count": 1,
      "walltime_seconds": "9000",
      "rationale": "…"
    },
    "confidence": "CONFIDENCE_LABEL_HIGH",
    "evidence": [
      {
        "execution_id": "5b8e1f04-…",
        "similarity_tier": "same_workload",
        "similarity_score": 0.91,
        "summary": "succeeded with 36G, peak memory 31.2 GiB"
      }
    ],
    "failure_risks": [
      {
        "kind": "FAILURE_RISK_KIND_OOM",
        "severity": "RISK_SEVERITY_HIGH",
        "explanation": "Requested memory is below the observed peak of similar runs."
      }
    ],
    "artifacts": [
      {
        "artifact_id": "…",
        "artifact_type": "ARTIFACT_TYPE_UNIFIED_DIFF",
        "content_hash": "sha256:…"
      }
    ]
  }
}
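
The JSON view carries memory_bytes and walltime_seconds as strings, while the terminal view shows GiB and HH:MM:SS. The following is a minimal Python sketch of that conversion, assuming the field names in the sample above. The helper names format_gib and format_walltime are hypothetical and not part of Expanse.

```python
import json

# Trimmed copy of the `expanse analyse train.slurm --json` sample above.
SAMPLE = """
{
  "result": {
    "recommended_resources": {
      "cpu_cores": 16,
      "memory_bytes": "38654705664",
      "gpu_count": 1,
      "walltime_seconds": "9000"
    }
  }
}
"""


def format_gib(memory_bytes: str) -> str:
    """Render a string-encoded byte count as GiB with one decimal place."""
    return f"{int(memory_bytes) / 2**30:.1f} GiB"


def format_walltime(walltime_seconds: str) -> str:
    """Render a string-encoded number of seconds as HH:MM:SS."""
    hours, remainder = divmod(int(walltime_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


resources = json.loads(SAMPLE)["result"]["recommended_resources"]
memory = format_gib(resources["memory_bytes"])
walltime = format_walltime(resources["walltime_seconds"])
print(memory, walltime)  # prints: 36.0 GiB 02:30:00
```

The result matches the memory and walltime rows of the terminal view for the same run.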

Recommendations are tied to your compute: when similar runs exist, the output cites the executions that anchored the numbers. For genuinely novel workloads there is no history to cite, so the recommendation is derived from your source alone and flagged with low confidence and explicit missing-evidence warnings. It never fabricates a history.

Patches are review-first: analyse never touches your files on its own. Inspect the patch with expanse analyse diff, then apply it with expanse analyse apply, which asks for interactive confirmation unless you pass --yes.
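
A script that consumes the JSON output may want to tell an anchored recommendation from one derived from source alone. The sketch below treats a non-empty evidence list as "anchored". It assumes that a novel workload yields an empty or missing evidence field, which the sample output does not show, and the function names are hypothetical.

```python
def cited_executions(result: dict) -> list[str]:
    """Return the execution IDs the recommendation cites as evidence."""
    return [item["execution_id"] for item in result.get("evidence", [])]


def is_anchored(result: dict) -> bool:
    """True when at least one prior execution backs the recommendation."""
    return bool(cited_executions(result))


# Shape taken from the sample `result` object above.
anchored_result = {
    "confidence": "CONFIDENCE_LABEL_HIGH",
    "evidence": [
        {
            "execution_id": "5b8e1f04-…",
            "similarity_tier": "same_workload",
            "similarity_score": 0.91,
        }
    ],
}

# Assumption: a novel workload carries no evidence entries.
novel_result = {"evidence": []}

print(is_anchored(anchored_result), cited_executions(anchored_result))
print(is_anchored(novel_result))
```

A pipeline could use such a check to require human review before acting on a recommendation that cites no prior runs.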

Failure diagnosis

expanse diagnose returns solution-oriented guidance for a failed execution. It cites the evidence it used: telemetry, logs, the captured source bundle, and similar executions of the same workload.

expanse diagnose 7f3e9a2b-…

EXPANSE DIAGNOSE
target                          7f3e9a2b-…
job_name                        train
state                           failed
failure_class                   out_of_memory
compute                         hpc-01

RESOURCE SNAPSHOT
runtime_s                       2531
gpu_type                        NVIDIA A100-SXM4-80GB
cuda_version                    12.4
gpu_count                       1
gpu_util_pct                    91%
gpu_mem_avail                   80.0 GB
gpu_mem_observed                78.9 GB
cpu_requested                   16
host_ram_requested_gb           24.0 GB

PATCH
diff                            expanse diagnose diff <run-id> --artefact <artefact-id>
patch_status                    pending

Would you like to see the suggested fix and patch? [y/n] y

SUGGESTED FIX
Raise #SBATCH --mem from 24G to 36G, or reduce batch_size from 64 to 32.

SUGGESTED PATCH
diff --git a/train.slurm b/train.slurm
--- a/train.slurm
+++ b/train.slurm
@@ -3,7 +3,7 @@
 #SBATCH --job-name=train
 #SBATCH --cpus-per-task=16
-#SBATCH --mem=24G
+#SBATCH --mem=36G
 #SBATCH --gres=gpu:1
 #SBATCH --time=02:30:00

Apply this fix with git apply? [y/n]

expanse diagnose 7f3e9a2b-… --json

{
  "run_id": "d41c8a7e-…",
  "status": "RUN_STATUS_SUCCEEDED",
  "result": {
    "root_cause": "The job was killed by the cgroup out-of-memory handler: peak host memory reached the 24G request during loss.backward.",
    "recommended_fix": "Raise #SBATCH --mem from 24G to 36G, or reduce batch_size from 64 to 32.",
    "confidence": "CONFIDENCE_LABEL_HIGH",
    "evidence": [
      {
        "execution_id": "5b8e1f04-…",
        "summary": "9 similar workloads succeeded with 36G or more"
      }
    ],
    "artifacts": [
      {
        "artifact_id": "…",
        "artifact_type": "ARTIFACT_TYPE_UNIFIED_DIFF",
        "content_hash": "sha256:…"
      }
    ],
    "display": {
      "target": "7f3e9a2b-…",
      "job_name": "train",
      "state": "failed",
      "failure_class": "out_of_memory",
      "compute": "hpc-01"
    }
  }
}

The interactive fix prompts appear only on a terminal, and only when the diagnosis produced a unified-diff artefact. Patch bytes are never embedded in the response; fetch and apply them explicitly from any shell:

expanse diagnose diff <run-id> --artefact <artefact-id>    # review the patch
expanse diagnose apply <run-id> --artefact <artefact-id>   # apply with git apply (add --yes to skip confirmation)

Patches are review-first: the server never applies them. Application always happens on your machine, after you have seen the diff.
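
Outside an interactive terminal, the same review-first flow can be scripted from the JSON response. The sketch below only builds the argument vectors for the two commands shown above; it does not run them. It assumes that <run-id> is the response's run_id field (as in the analyse example) and that <artefact-id> is the artifact_id of an ARTIFACT_TYPE_UNIFIED_DIFF entry. The function name patch_commands is hypothetical.

```python
def patch_commands(response: dict, action: str) -> list[list[str]]:
    """Build `expanse diagnose diff|apply` argument vectors for each diff artefact."""
    if action not in ("diff", "apply"):
        raise ValueError("action must be 'diff' or 'apply'")
    run_id = response["run_id"]  # assumption: this is the <run-id> argument
    commands = []
    for artifact in response.get("result", {}).get("artifacts", []):
        # Only unified-diff artefacts can be reviewed or applied as patches.
        if artifact.get("artifact_type") != "ARTIFACT_TYPE_UNIFIED_DIFF":
            continue
        commands.append(
            ["expanse", "diagnose", action, run_id,
             "--artefact", artifact["artifact_id"]]
        )
    return commands


# Trimmed copy of the `expanse diagnose … --json` sample above; elided IDs kept as-is.
response = {
    "run_id": "d41c8a7e-…",
    "result": {
        "artifacts": [
            {
                "artifact_id": "…",
                "artifact_type": "ARTIFACT_TYPE_UNIFIED_DIFF",
                "content_hash": "sha256:…",
            }
        ]
    },
}

review = patch_commands(response, "diff")
print(review)
```

Each entry is an argument list suitable for subprocess.run. Keeping "diff" and "apply" as separate calls preserves the review step: a script can show the diff first and build the apply command only after a person has approved it.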
