Quick Start
Once the package is installed, the examples here show how to process a dataset from the command line and how to query data from a notebook.
Core Data Processing
If you installed the core package with make core, you can process a dataset from the command line:
generic_cloud_optimised_creation \
    --dataset-config config/dataset/your_dataset.json \
    --paths "s3://bucket/path/to/data/" \
    --cluster-mode local
This will:
1. Read the dataset configuration from the JSON file
2. Discover files matching the path pattern
3. Convert the data to a cloud-optimized format (Zarr or Parquet)
4. Upload the result to S3
For detailed configuration options, see Dataset Configuration.
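If you need to launch the same conversion from a Python script rather than a shell, one option is to assemble the argument list and hand it to subprocess. The sketch below only builds and prints the command; build_command is a hypothetical helper, not part of the package, and it uses only the three flags shown above with a single path value.

```python
import shlex


def build_command(dataset_config, path, cluster_mode="local"):
    """Assemble the CLI invocation shown above as an argument list.

    Hypothetical helper for illustration; only the flags that appear in
    this guide are used, and exactly one path is passed.
    """
    return [
        "generic_cloud_optimised_creation",
        "--dataset-config", dataset_config,
        "--paths", path,
        "--cluster-mode", cluster_mode,
    ]


cmd = build_command(
    "config/dataset/your_dataset.json",
    "s3://bucket/path/to/data/",
)

# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))

# To actually run it (requires the core package to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list instead of a single string avoids shell quoting problems when a path contains spaces or other special characters.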
Using DataQuery API (Notebooks)
If you installed with notebooks support (make dev or pip install .[notebooks]),
you can query data from Jupyter:
from DataQuery import GetAodn

aodn = GetAodn()

# Query data from a dataset
df = aodn.get_dataset('argo.parquet').get_data(
    date_start='2020-01-01',
    date_end='2020-03-01',
    lat_min=-35,
    lat_max=-27,
    lon_min=150,
    lon_max=158
)

print(df.head())
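To make the query parameters concrete, the sketch below applies the same kind of time-window and bounding-box selection to a few in-memory records, using only the standard library. It is not how DataQuery works internally: the field names (time, lat, lon, temp), the sample values, and the choice to treat every bound as inclusive are assumptions made for illustration.

```python
from datetime import date


def in_query_window(record, date_start, date_end,
                    lat_min, lat_max, lon_min, lon_max):
    """Return True if a record lies inside the time window and bounding box.

    All bounds are treated as inclusive; that is an assumption of this
    sketch, not documented behaviour of get_data().
    """
    start = date.fromisoformat(date_start)
    end = date.fromisoformat(date_end)
    return (
        start <= record["time"] <= end
        and lat_min <= record["lat"] <= lat_max
        and lon_min <= record["lon"] <= lon_max
    )


# Hypothetical records; field names and values are invented for illustration.
records = [
    {"time": date(2019, 12, 31), "lat": -30.0, "lon": 152.0, "temp": 21.3},  # before the window
    {"time": date(2020, 1, 15), "lat": -30.0, "lon": 152.0, "temp": 22.1},   # inside
    {"time": date(2020, 2, 10), "lat": -40.0, "lon": 152.0, "temp": 18.4},   # south of the box
    {"time": date(2020, 2, 20), "lat": -28.5, "lon": 157.0, "temp": 23.0},   # inside
]

selected = [
    r for r in records
    if in_query_window(r, "2020-01-01", "2020-03-01", -35, -27, 150, 158)
]
print([r["temp"] for r in selected])
```

Only the second and fourth records satisfy all three conditions, so they are the ones kept.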
See Usage and individual dataset notebooks in Notebooks for more examples.
Next Steps
Configure a new dataset: Dataset Configuration
Write a Jupyter notebook: Notebooks
Set up the MCP server: MCP Server
Contribute code: Development