.. _testing: Testing ======= The project includes comprehensive tests for both the library and the MCP server. .. _mcp-testing: Testing the MCP Server ---------------------- The MCP server ships with a dedicated integration test suite in ``integration_testing/test_mcp_server.py``. All tests call the MCP tool functions directly (not via the protocol transport) so they exercise exactly the same code paths an AI assistant uses — without requiring a running server process. The suite is **excluded from the default** ``pytest`` run (``--ignore=integration_testing`` in ``pyproject.toml``) because some tests make live S3 requests or execute full Jupyter notebooks, which would be too slow and network-dependent for a standard CI run. Prerequisites ------------- Install the ``mcp`` optional dependency group before running: .. code-block:: bash pip install -e ".[mcp]" # or with poetry: poetry install --extras mcp Test Tiers and CLI Flags ------------------------ Three custom flags control which test tiers are enabled. .. list-table:: :header-rows: 1 :widths: 25 20 55 * - Flag - Mark applied - What it enables * - *(none)* - — - Offline tests only (catalog, plot guide, Python REPL). No network access required. Runs in ~8 s. * - ``--run-s3`` - ``@pytest.mark.s3`` - Adds live anonymous S3 queries: ``check_dataset_coverage`` and ``introspect_dataset_live`` against the real AODN public bucket. Requires internet access (~10–60 s per test). * - ``--run-notebooks`` - ``@pytest.mark.notebooks`` - Adds notebook tests (S3 also required): **Synthetic unit tests** (``TestMcpValidateNotebook``, fast): verify ``validate_notebook`` itself works on tiny synthetic notebooks. **Scripted-agent generation** (``TestMcpGeneratedNotebook``): calls ``get_dataset_schema``, ``check_dataset_coverage``, and ``execute_python_cell`` in sequence, assembles a NEW ``.ipynb``, and validates it. ~5–10 min. **Real user scenarios** (``TestMcpEndToEnd``): 5 tests driven by real user questions (Coffs Harbour, SA Gulfs, Argo Coral Sea, mooring near Sydney, SST Southern Ocean). Each chains the full MCP tool sequence and generates + validates a fresh notebook. ~10–20 min. * - ``--run-all`` - both - Short-hand for ``--run-s3`` + ``--run-notebooks`` combined. Marked tests are **automatically skipped** unless the corresponding flag is passed — there is no need to use ``-m`` expressions. Running the Tests ----------------- .. code-block:: bash # Fast offline tests (catalog, plot-guide, Python REPL) — no network needed: pytest integration_testing/test_mcp_server.py -v # Include live S3 queries: pytest integration_testing/test_mcp_server.py -v --run-s3 # Scripted-agent notebook generation (also needs --run-s3): pytest integration_testing/test_mcp_server.py -v --run-s3 --run-notebooks # Run everything: pytest integration_testing/test_mcp_server.py -v --run-all Test Classes ------------ .. list-table:: :header-rows: 1 :widths: 35 10 55 * - Class - Tests - Description * - ``TestMcpCatalogOffline`` - 13 - ``list_datasets``, ``search_datasets``, ``get_dataset_info``, ``get_dataset_schema``, ``get_dataset_config``. Includes parent/child config inheritance assertions (e.g. radar child inheriting schema from parent via ``load_dataset_config``). * - ``TestMcpNotebookOffline`` - 8 - ``get_notebook_template``, ``get_plot_guide`` (parquet, zarr, radar), ``get_dataquery_reference``. Verifies safe-date helpers, xarray anti-pattern warnings, and standalone function signatures. * - ``TestMcpExecutePython`` - 8 - ``execute_python_cell``: success output, error tracebacks, magic stripping, session persistence, session isolation, pre-populated symbols (``GetAodn``, ``plot_ts_diagram``), timeout handling. * - ``TestMcpLiveS3`` - 7 - ``check_dataset_coverage`` and ``introspect_dataset_live`` against real S3. Verifies Argo coverage inside/outside Australia, SST zarr coverage, ``JULD`` vs ``TIME`` in Argo, and the ``wind_speed`` ⚠️ flag (present in JSON config but absent from the live Zarr store). * - ``TestMcpValidateNotebook`` - 2 - ``validate_notebook`` on synthetic ``.ipynb`` files: clean notebook returns only ✅ cells; notebook with ``1/0`` returns ❌ + traceback. * - ``TestMcpDatasetSummary`` - 10 - ``get_dataset_summary`` tool: validates data type classification (parquet/zarr/radar), AWS description inclusion, coordinate variables, data variable tables, matching notebook path, and recommended code patterns for different dataset formats. * - ``TestMcpNotebookBuilder`` - 8 - ``start_notebook`` / ``add_notebook_cell`` / ``save_notebook`` lifecycle. Tests: session creation, setup cell auto-execution, code cell validation, markdown cells added unconditionally, rejection of broken cells (with traceback), variable persistence across cells, save to valid ``.ipynb``. * - ``TestMcpGeneratedNotebook`` - 2 - **Scripted-agent notebook generation.** Simulates an AI workflow for Argo (parquet) and SST (zarr): calls ``get_dataset_schema`` to discover real variable names (e.g. ``JULD`` not ``TIME``), calls ``check_dataset_coverage`` to confirm data exists, uses ``execute_python_cell`` to iteratively test each code snippet, then assembles a new ``.ipynb`` and validates it with ``validate_notebook``. Fails if the schema tool returns wrong variable names, proving the MCP tools produce working notebooks. Requires ``--run-s3 --run-notebooks`` (or ``--run-all``). * - ``TestMcpEndToEnd`` - 5 - Five real user-question scenarios. Each chains the full MCP tool sequence (``search_datasets`` → ``check_dataset_coverage`` → ``get_dataset_schema`` → ``execute_python_cell`` → ``validate_notebook``) and generates a fresh ``.ipynb``. See :ref:`mcp-e2e-scenarios` below. .. _mcp-e2e-scenarios: End-to-End Scenarios -------------------- Each scenario encodes a real user question. The test chains MCP tools in the same order an AI assistant would, builds a new notebook from the tool outputs, and validates it executes cleanly — no pre-existing notebooks required. .. list-table:: :header-rows: 1 :widths: 35 65 * - Test / User question - MCP tool chain * - **Coffs Harbour Jan 2020 sea state** - *"A notebook showing radar data at Coffs Harbour in January 2020, compare with Argo and gridded SST."* search → coverage (argo + SST) → schema (discovers ``JULD``) → REPL (load + query both) → generates ``generated_coffs_harbour.ipynb`` → validate. * - **SA Gulfs HAB — April, chlorophyll** - *"Explore AODN datasets in the HAB area (SA Gulfs 134–141.5°E, 34–39.5°S). Compare datasets across April. Add chlorophyll."* search (radar, chlorophyll, argo) → coverage → schema → REPL → generates ``generated_sa_gulfs.ipynb`` → validate. * - **Argo Coral Sea 2018** - *"Argo float T/S profiles in the Coral Sea in 2018."* schema (confirms ``JULD``) → coverage → REPL → generates ``generated_argo_coral_sea.ipynb`` → validate. * - **Mooring temperature near Sydney** - *"Mooring temperature near Sydney 2018–2022, include T/S diagram."* search → schema (``TIME``, ``TEMP``) → coverage → REPL → generates ``generated_mooring_sydney.ipynb`` → validate. * - **SST Southern Ocean Dec–Feb** - *"SST anomaly in the Southern Ocean during austral summer."* search → schema → coverage → REPL → generates ``generated_sst_southern_ocean.ipynb`` → validate. Adding New Tests ---------------- To add a test for a new dataset or scenario: 1. Import the MCP tool functions at the top of ``test_mcp_server.py``:: from aodn_cloud_optimised.mcp.server import check_dataset_coverage, ... 2. Subclass ``_McpAgentMixin, unittest.TestCase`` and decorate with ``@pytest.mark.s3`` + ``@pytest.mark.notebooks``. 3. Follow the 5-step pattern: search → coverage → schema → REPL test → ``_make_notebook`` + ``validate_notebook``. **Do not call boto3, xarray, or pandas directly** — use only MCP tool functions.