Update All Metadata Script

cloud_optimised_update_all_metadata is a command-line utility that validates dataset configuration files and updates the metadata of existing cloud-optimised datasets (Zarr or Parquet) in place, without processing any new data.

Usage

cloud_optimised_update_all_metadata

This script will:

  1. Validate ALL JSON configuration files.

  2. Load the corresponding dataset configuration.

  3. Detect the output format (Zarr or Parquet).

  4. Use the appropriate handler (zarr/parquet) to update global and variable metadata.
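The four steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the two handler functions are hypothetical stand-ins, the key names cloud_optimised_format and dataset_name are assumptions about the configuration layout, and schema validation is omitted.

```python
import json
from pathlib import Path
from typing import Callable, Dict


# Hypothetical stand-ins for the real Zarr and Parquet handlers.
def update_zarr_metadata(config: dict) -> str:
    return f"zarr metadata updated for {config['dataset_name']}"


def update_parquet_metadata(config: dict) -> str:
    return f"parquet metadata updated for {config['dataset_name']}"


# Map each supported output format to its handler.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "zarr": update_zarr_metadata,
    "parquet": update_parquet_metadata,
}


def detect_format(config: dict) -> str:
    """Return the output format named in the configuration (step 3)."""
    # "cloud_optimised_format" is an assumed key name.
    fmt = str(config.get("cloud_optimised_format", "")).lower()
    if fmt not in HANDLERS:
        raise ValueError(f"Unsupported or missing output format: {fmt!r}")
    return fmt


def update_metadata_from_file(json_path: Path) -> str:
    """Load one configuration file and dispatch to the matching handler."""
    config = json.loads(json_path.read_text())  # steps 1-2, validation omitted
    handler = HANDLERS[detect_format(config)]   # step 3
    return handler(config)                      # step 4
```

Keeping the format-to-handler mapping in a dictionary means an unknown format fails early with a clear error instead of falling through silently.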

By default, it processes all files in:

aodn_cloud_optimised/config/dataset/

To process only a subset of datasets, edit the json_files argument in the script so that it lists just the JSON files you want.
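One way to build such a subset is to filter the configuration directory by dataset name. The helper below is a hypothetical sketch: select_config_files is not part of the package, and how the resulting list is handed to json_files depends on the script itself.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def select_config_files(
    config_dir: Path, dataset_names: Optional[Iterable[str]] = None
) -> List[Path]:
    """Return the JSON configuration files in config_dir, sorted by name.

    If dataset_names is given, keep only files whose stem (file name
    without the .json extension) is in that collection.
    """
    all_files = sorted(config_dir.glob("*.json"))
    if dataset_names is None:
        return all_files
    wanted = set(dataset_names)
    return [path for path in all_files if path.stem in wanted]
```

For example, select_config_files(Path("aodn_cloud_optimised/config/dataset"), ["animal_acoustic_tracking_delayed_qc"]) would return at most one path, which could then be used as the json_files value.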

Output

  • Logs validation errors and metadata update failures, for example when a dataset store does not exist yet.

  • Applies changes directly to the cloud-optimised datasets without rewriting the entire store.

Example log output (abridged):

[DEBUG] Initializing logger: animal_acoustic_tracking_delayed_qc
[DEBUG] Adding StreamHandler...
[DEBUG] StreamHandler added.
[DEBUG] Adding FileHandler...
[DEBUG] FileHandler added. Log file path: /tmp/cloud_optimised_animal_acoustic_tracking_delayed_qc_2025-08-04.log
2025-08-04 17:33:44,717 - INFO - CommonHandler.py:343 - validate_json - Successfully validated JSON configuration for dataset animal_acoustic_tracking_delayed_qc against schema_validation_parquet.json.
2025-08-04 17:33:48,630 - INFO - GenericParquetHandler.py:897 - _add_metadata_sidecar - None: Existing Parquet store found at s3://imos-data-lab-optimised/animal_acoustic_tracking_delayed_qc.parquet. Updating Metadata
2025-08-04 17:33:50,947 - INFO - GenericParquetHandler.py:975 - _add_metadata_sidecar - None: Parquet metadata file successfully published to s3://imos-data-lab-optimised/animal_acoustic_tracking_delayed_qc.parquet/_common_metadata

[DEBUG] Initializing logger: animal_ctd_satellite_relay_tagging_delayed_qc
[DEBUG] Adding StreamHandler...
[DEBUG] StreamHandler added.
[DEBUG] Adding FileHandler...
[DEBUG] FileHandler added. Log file path: /tmp/cloud_optimised_animal_ctd_satellite_relay_tagging_delayed_qc_2025-08-04.log
2025-08-04 17:33:50,966 - INFO - CommonHandler.py:343 - validate_json - Successfully validated JSON configuration for dataset animal_ctd_satellite_relay_tagging_delayed_qc against schema_validation_parquet.json.
2025-08-04 17:33:51,873 - INFO - GenericParquetHandler.py:897 - _add_metadata_sidecar - None: Existing Parquet store found at s3://imos-data-lab-optimised/animal_ctd_satellite_relay_tagging_delayed_qc.parquet. Updating Metadata
2025-08-04 17:33:52,087 - INFO - GenericParquetHandler.py:975 - _add_metadata_sidecar - None: Parquet metadata file successfully published to s3://imos-data-lab-optimised/animal_ctd_satellite_relay_tagging_delayed_qc.parquet/_common_metadata

[DEBUG] FileHandler added. Log file path: /tmp/cloud_optimised_satellite_ghrsst_l4_gamssa_1day_multi_sensor_world_2025-08-04.log
2025-08-04 17:35:20,127 - INFO - CommonHandler.py:343 - validate_json - Successfully validated JSON configuration for dataset satellite_ghrsst_l4_gamssa_1day_multi_sensor_world against schema_validation_zarr.json.
2025-08-04 17:35:21,026 - ERROR - GenericZarrHandler.py:646 - _update_metadata - Dataset satellite_ghrsst_l4_gamssa_1day_multi_sensor_world does not exist yet - cannot update metadata

[DEBUG] Initializing logger: satellite_ghrsst_l4_ramssa_1day_multi_sensor_australia
[DEBUG] Adding StreamHandler...
[DEBUG] StreamHandler added.
[DEBUG] Adding FileHandler...
[DEBUG] FileHandler added. Log file path: /tmp/cloud_optimised_satellite_ghrsst_l4_ramssa_1day_multi_sensor_australia_2025-08-04.log
2025-08-04 17:35:21,060 - INFO - CommonHandler.py:343 - validate_json - Successfully validated JSON configuration for dataset satellite_ghrsst_l4_ramssa_1day_multi_sensor_australia against schema_validation_zarr.json.
2025-08-04 17:35:21,966 - INFO - GenericZarrHandler.py:599 - _update_metadata - None: Existing Zarr store found at s3://imos-data-lab-optimised/satellite_ghrsst_l4_ramssa_1day_multi_sensor_australia.zarr. Updating Metadata
2025-08-04 17:35:21,968 - INFO - GenericZarrHandler.py:603 - _update_metadata - Dataset satellite_ghrsst_l4_ramssa_1day_multi_sensor_australia: Updating Global Attributes
2025-08-04 17:35:23,165 - INFO - GenericZarrHandler.py:608 - _update_metadata - Dataset satellite_ghrsst_l4_ramssa_1day_multi_sensor_australia: Updating Variable Attributes
2025-08-04 17:35:23,305 - WARNING - GenericZarrHandler.py:687 - update_store_varattrs_from_schema - None: ⚠️  Type mismatch for 'time': schema says 'timestamp[ns]', Zarr store has 'int32'
2025-08-04 17:35:24,531 - WARNING - GenericZarrHandler.py:687 - update_store_varattrs_from_schema - None: ⚠️  Type mismatch for 'lat': schema says 'float', Zarr store has 'float32'
2025-08-04 17:35:26,358 - WARNING - GenericZarrHandler.py:687 - update_store_varattrs_from_schema - None: ⚠️  Type mismatch for 'lon': schema says 'float', Zarr store has 'float32'
2025-08-04 17:35:29,033 - WARNING - GenericZarrHandler.py:687 - update_store_varattrs_from_schema - None: ⚠️  Type mismatch for 'sea_ice_fraction': schema says 'double', Zarr store has 'float64'
2025-08-04 17:35:31,021 - WARNING - GenericZarrHandler.py:712 - update_store_varattrs_from_schema - None: ⚠️  Variable 'analysed_sst' in schema not found in Zarr store. Skipping.
2025-08-04 17:35:31,163 - WARNING - GenericZarrHandler.py:687 - update_store_varattrs_from_schema - None: ⚠️  Type mismatch for 'analysis_error': schema says 'double', Zarr store has 'float64'
2025-08-04 17:35:32,708 - WARNING - GenericZarrHandler.py:687 - update_store_varattrs_from_schema - None: ⚠️  Type mismatch for 'mask': schema says 'float', Zarr store has 'float64'
2025-08-04 17:35:35,315 - WARNING - GenericZarrHandler.py:712 - update_store_varattrs_from_schema - None: ⚠️  Variable 'crs' in schema not found in Zarr store. Skipping.
2025-08-04 17:35:55,934 - INFO - GenericZarrHandler.py:641 - _update_metadata - None: All expected global attributes successfully updated for dataset 'satellite_ghrsst_l4_ramssa_1day_multi_sensor_australia'.
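Because a full run covers many datasets, it can help to summarise the log rather than read it line by line. The sketch below assumes the line layout seen in the sample output (timestamp, level, source file, function, message, separated by " - "). The regular expression and helper names are illustrative and not part of the tool.

```python
import re
from collections import Counter
from typing import Iterable, List

# Matches lines such as:
# 2025-08-04 17:35:21,026 - ERROR - GenericZarrHandler.py:646 - _update_metadata - message
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - (?P<level>[A-Z]+) - "
    r"(?P<source>\S+) - (?P<func>\S+) - (?P<message>.*)$"
)


def count_levels(lines: Iterable[str]) -> Counter:
    """Count log records per level; lines in another layout are ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[match.group("level")] += 1
    return counts


def messages_at_level(lines: Iterable[str], level: str) -> List[str]:
    """Return the message text of every record logged at the given level."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == level:
            found.append(match.group("message"))
    return found
```

Running messages_at_level over one of the /tmp/cloud_optimised_*.log files with level "ERROR" would list the datasets whose metadata could not be updated.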

Notes

  • Make sure your configuration files are valid before applying updates.

  • Only metadata is modified; no data is changed or re-encoded.

  • For a Parquet dataset, the metadata sidecar file (_common_metadata) is updated.

  • For a Zarr dataset, global and variable attributes are written directly to the store. The dataset is then loaded to verify the update.
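To illustrate why a metadata-only update does not touch the data, the sketch below edits the attributes of a Zarr group stored as a local directory. It assumes the Zarr v2 on-disk layout, in which attributes live in a small .zattrs JSON file next to the chunk files. This is not how the tool's handler works internally (it operates on S3 stores), and update_zattrs is a hypothetical helper.

```python
import json
from pathlib import Path


def update_zattrs(node_dir: Path, new_attrs: dict) -> dict:
    """Merge new_attrs into the .zattrs file of a Zarr v2 group or array.

    Only the small JSON attributes file is rewritten; chunk files in the
    same directory tree are never opened.
    """
    attrs_file = node_dir / ".zattrs"
    attrs = json.loads(attrs_file.read_text()) if attrs_file.exists() else {}
    attrs.update(new_attrs)
    attrs_file.write_text(json.dumps(attrs, indent=4))
    return attrs
```

Stores that use consolidated metadata also keep a copy of the attributes in a .zmetadata file, which this sketch does not refresh.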