Update All Metadata Script
cloud_optimised_update_all_metadata is a command-line utility that validates dataset configuration files and updates metadata in place in cloud-optimised stores (Zarr or Parquet), without processing any new data.
Usage
cloud_optimised_update_all_metadata
This script will:
Validate ALL JSON configuration files.
Load the corresponding dataset configuration.
Detect the output format (Zarr or Parquet).
Use the appropriate handler (zarr/parquet) to update global and variable metadata.
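The steps above can be sketched in a few lines of Python. This is only an illustration of the validate / load / detect / dispatch flow, not the script's actual code: the configuration key cloud_optimised_format and the helper names detect_format and plan_updates are assumptions made for this example, and the real handlers are replaced by a plain string naming the handler kind.

```python
import json
from pathlib import Path

# Assumption: each dataset config names its output format under this key.
FORMAT_KEY = "cloud_optimised_format"
SUPPORTED_FORMATS = ("zarr", "parquet")


def detect_format(config) -> str:
    """Return 'zarr' or 'parquet' for a loaded dataset configuration."""
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")
    fmt = str(config.get(FORMAT_KEY, "")).lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported or missing format: {fmt!r}")
    return fmt


def plan_updates(config_dir: str) -> dict:
    """Map each JSON config file name to the handler kind that would update it.

    Files that cannot be parsed, or that have no recognised format, map to an
    'error: ...' string so that one bad file does not stop the whole run.
    """
    plan = {}
    for path in sorted(Path(config_dir).glob("*.json")):
        try:
            config = json.loads(path.read_text(encoding="utf-8"))
            plan[path.name] = detect_format(config)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            plan[path.name] = f"error: {exc}"
    return plan
```

Calling plan_updates("aodn_cloud_optimised/config/dataset") would return one entry per configuration file, which is a cheap way to preview which datasets would go to the Zarr path and which to the Parquet path before touching any store.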
By default, it processes all files in:
aodn_cloud_optimised/config/dataset/
To specify a subset of JSON files, modify the json_files argument in the script.
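One way to build such a subset is to filter the configuration directory by file-name pattern. The helper below is a hypothetical convenience, not part of the package; whether json_files expects bare file names (as returned here) or full paths is an assumption to check against the script.

```python
from fnmatch import fnmatch
from pathlib import Path


def select_configs(config_dir: str, patterns: list[str]) -> list[str]:
    """Return sorted JSON config file names matching any shell-style pattern."""
    names = sorted(p.name for p in Path(config_dir).glob("*.json"))
    return [n for n in names if any(fnmatch(n, pat) for pat in patterns)]
```

For example, select_configs("aodn_cloud_optimised/config/dataset", ["satellite_ghrsst_*.json"]) would keep only the GHRSST satellite configurations, and the resulting list could then be used as the json_files value.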
Output
Logs validation errors or metadata update failures.
Applies changes directly to the cloud-optimised datasets without rewriting the entire store.
Sample log output from a run:
[DEBUG] Initializing logger: animal_acoustic_tracking_delayed_qc
[DEBUG] Adding StreamHandler...
[DEBUG] StreamHandler added.
[DEBUG] Adding FileHandler...
[DEBUG] FileHandler added. Log file path: /tmp/cloud_optimised_animal_acoustic_tracking_delayed_qc_2025-08-04.log
2025-08-04 17:33:44,717 - INFO - CommonHandler.py:343 - validate_json - Successfully validated JSON configuration for dataset animal_acoustic_tracking_delayed_qc against schema_validation_parquet.json.
2025-08-04 17:33:48,630 - INFO - GenericParquetHandler.py:897 - _add_metadata_sidecar - None: Existing Parquet store found at s3://imos-data-lab-optimised/animal_acoustic_tracking_delayed_qc.parquet. Updating Metadata
2025-08-04 17:33:50,947 - INFO - GenericParquetHandler.py:975 - _add_metadata_sidecar - None: Parquet metadata file successfully published to s3://imos-data-lab-optimised/animal_acoustic_tracking_delayed_qc.parquet/_common_metadata
[DEBUG] Initializing logger: animal_ctd_satellite_relay_tagging_delayed_qc
[DEBUG] Adding StreamHandler...
[DEBUG] StreamHandler added.
[DEBUG] Adding FileHandler...
[DEBUG] FileHandler added. Log file path: /tmp/cloud_optimised_animal_ctd_satellite_relay_tagging_delayed_qc_2025-08-04.log
2025-08-04 17:33:50,966 - INFO - CommonHandler.py:343 - validate_json - Successfully validated JSON configuration for dataset animal_ctd_satellite_relay_tagging_delayed_qc against schema_validation_parquet.json.
2025-08-04 17:33:51,873 - INFO - GenericParquetHandler.py:897 - _add_metadata_sidecar - None: Existing Parquet store found at s3://imos-data-lab-optimised/animal_ctd_satellite_relay_tagging_delayed_qc.parquet. Updating Metadata
2025-08-04 17:33:52,087 - INFO - GenericParquetHandler.py:975 - _add_metadata_sidecar - None: Parquet metadata file successfully published to s3://imos-data-lab-optimised/animal_ctd_satellite_relay_tagging_delayed_qc.parquet/_common_metadata
[DEBUG] FileHandler added. Log file path: /tmp/cloud_optimised_satellite_ghrsst_l4_gamssa_1day_multi_sensor_world_2025-08-04.log
2025-08-04 17:35:20,127 - INFO - CommonHandler.py:343 - validate_json - Successfully validated JSON configuration for dataset satellite_ghrsst_l4_gamssa_1day_multi_sensor_world against schema_validation_zarr.json.
2025-08-04 17:35:21,026 - ERROR - GenericZarrHandler.py:646 - _update_metadata - Dataset satellite_ghrsst_l4_gamssa_1day_multi_sensor_world does not exist yet - cannot update metadata
[DEBUG] Initializing logger: satellite_ghrsst_l4_ramssa_1day_multi_sensor_australia
[DEBUG] Adding StreamHandler...
[DEBUG] StreamHandler added.
[DEBUG] Adding FileHandler...
[DEBUG] FileHandler added. Log file path: /tmp/cloud_optimised_satellite_ghrsst_l4_ramssa_1day_multi_sensor_australia_2025-08-04.log
2025-08-04 17:35:21,060 - INFO - CommonHandler.py:343 - validate_json - Successfully validated JSON configuration for dataset satellite_ghrsst_l4_ramssa_1day_multi_sensor_australia against schema_validation_zarr.json.
2025-08-04 17:35:21,966 - INFO - GenericZarrHandler.py:599 - _update_metadata - None: Existing Zarr store found at s3://imos-data-lab-optimised/satellite_ghrsst_l4_ramssa_1day_multi_sensor_australia.zarr. Updating Metadata
2025-08-04 17:35:21,968 - INFO - GenericZarrHandler.py:603 - _update_metadata - Dataset satellite_ghrsst_l4_ramssa_1day_multi_sensor_australia: Updating Global Attributes
2025-08-04 17:35:23,165 - INFO - GenericZarrHandler.py:608 - _update_metadata - Dataset satellite_ghrsst_l4_ramssa_1day_multi_sensor_australia: Updating Variable Attributes
2025-08-04 17:35:23,305 - WARNING - GenericZarrHandler.py:687 - update_store_varattrs_from_schema - None: ⚠️ Type mismatch for 'time': schema says 'timestamp[ns]', Zarr store has 'int32'
2025-08-04 17:35:24,531 - WARNING - GenericZarrHandler.py:687 - update_store_varattrs_from_schema - None: ⚠️ Type mismatch for 'lat': schema says 'float', Zarr store has 'float32'
2025-08-04 17:35:26,358 - WARNING - GenericZarrHandler.py:687 - update_store_varattrs_from_schema - None: ⚠️ Type mismatch for 'lon': schema says 'float', Zarr store has 'float32'
2025-08-04 17:35:29,033 - WARNING - GenericZarrHandler.py:687 - update_store_varattrs_from_schema - None: ⚠️ Type mismatch for 'sea_ice_fraction': schema says 'double', Zarr store has 'float64'
2025-08-04 17:35:31,021 - WARNING - GenericZarrHandler.py:712 - update_store_varattrs_from_schema - None: ⚠️ Variable 'analysed_sst' in schema not found in Zarr store. Skipping.
2025-08-04 17:35:31,163 - WARNING - GenericZarrHandler.py:687 - update_store_varattrs_from_schema - None: ⚠️ Type mismatch for 'analysis_error': schema says 'double', Zarr store has 'float64'
2025-08-04 17:35:32,708 - WARNING - GenericZarrHandler.py:687 - update_store_varattrs_from_schema - None: ⚠️ Type mismatch for 'mask': schema says 'float', Zarr store has 'float64'
2025-08-04 17:35:35,315 - WARNING - GenericZarrHandler.py:712 - update_store_varattrs_from_schema - None: ⚠️ Variable 'crs' in schema not found in Zarr store. Skipping.
2025-08-04 17:35:55,934 - INFO - GenericZarrHandler.py:641 - _update_metadata - None: All expected global attributes successfully updated for dataset 'satellite_ghrsst_l4_ramssa_1day_multi_sensor_australia'.
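With many datasets, the warnings and errors are easy to lose among the INFO lines. The sketch below pulls them out of captured output. It assumes every record follows the "timestamp - LEVEL - file:line - function - message" layout seen in the sample above; the names LOG_LINE and summarise_log are made up for this example.

```python
import re
from collections import Counter

# Matches records such as:
# 2025-08-04 17:35:21,026 - ERROR - GenericZarrHandler.py:646 - _update_metadata - <message>
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<level>[A-Z]+) - (?P<source>[^ ]+:\d+) - (?P<func>\w+) - (?P<message>.*)$"
)


def summarise_log(lines):
    """Count records per level and collect WARNING/ERROR messages in order."""
    counts = Counter()
    problems = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skips lines in other layouts, e.g. "[DEBUG] Adding StreamHandler..."
        level = match["level"]
        counts[level] += 1
        if level in ("WARNING", "ERROR"):
            problems.append((level, match["message"]))
    return counts, problems
```

The function accepts any iterable of strings, so it can be fed an open file object. The per-dataset log files named in the [DEBUG] lines would be a natural input, provided they use the same record layout as the console output, which is not confirmed here.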
Notes
Make sure your configuration files are valid before applying updates.
Only metadata is modified; no data is changed or re-encoded.
For a Parquet dataset, the metadata sidecar file (_common_metadata) is updated.
For a Zarr dataset, global and variable attributes are written directly to the store. The dataset is then reloaded to check the update.