!pip install aiohttp requests s5cmd "xarray[io]"
import xarray as xr
Bulk Data Download
This notebook shows how to perform bulk downloads with an S3 command-line tool. This is useful if you want local access to a large subset of the data, or even to download the whole archive!
We can download data in bulk using any command-line tool that supports the S3 protocol. We recommend s5cmd, which can be installed by running:
pip install s5cmd
Now we can download data using the cp command.
In this example, we transfer the Thomson scattering data for shot 30420 to the local machine.
We need to set the endpoint where the bucket is hosted with --endpoint-url (for now: https://s3.echo.stfc.ac.uk), and we need to pass --no-sign-request for anonymous access.
%%capture --no-display
%%bash
s5cmd --no-sign-request --endpoint-url https://s3.echo.stfc.ac.uk cp s3://mast/level2/shots/30420.zarr/thomson_scattering/* ./30420.zarr/thomson_scattering;
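The same command can also be assembled from Python, which may be convenient when the shot number or group name comes from a variable. The following is a minimal sketch, not part of the original notebook: the helper names `build_cp_command` and `download_group` are hypothetical, and the bucket layout is assumed to match the command above.

```python
import subprocess

ENDPOINT_URL = "https://s3.echo.stfc.ac.uk"  # endpoint given in the text above
BUCKET_PREFIX = "s3://mast/level2/shots"     # prefix used in the command above


def build_cp_command(shot_id: int, group: str) -> list[str]:
    """Return the s5cmd argument list that copies one group of one shot.

    Hypothetical helper: it assumes every shot follows the same
    <shot>.zarr/<group> layout as the example command.
    """
    source = f"{BUCKET_PREFIX}/{shot_id}.zarr/{group}/*"
    target = f"./{shot_id}.zarr/{group}"
    return [
        "s5cmd",
        "--no-sign-request",             # anonymous access
        "--endpoint-url", ENDPOINT_URL,  # where the bucket is hosted
        "cp", source, target,
    ]


def download_group(shot_id: int, group: str) -> None:
    """Run the copy; raises CalledProcessError if s5cmd exits non-zero."""
    # The arguments are passed as a list, so no shell is involved and the
    # trailing "*" reaches s5cmd unexpanded.
    subprocess.run(build_cp_command(shot_id, group), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is downloaded here.
    print(" ".join(build_cp_command(30420, "thomson_scattering")))
```

Calling `download_group(30420, "thomson_scattering")` should be equivalent to the bash cell above, provided s5cmd is on the PATH.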
Finally, we can open the downloaded Zarr group locally:
xr.open_zarr('30420.zarr/thomson_scattering', consolidated=False)
/srv/fair-mast/.docs-venv/lib/python3.12/site-packages/zarr/core/group.py:551: UserWarning: Both zarr.json (Zarr format 3) and .zgroup (Zarr format 2) metadata objects exist at file:///srv/fair-mast/docs/30420.zarr/thomson_scattering. Zarr format 3 will be used.
warnings.warn(msg, stacklevel=1)
/srv/fair-mast/.docs-venv/lib/python3.12/site-packages/zarr/core/group.py:3376: UserWarning: Object at .zattrs is not recognized as a component of a Zarr hierarchy.
warnings.warn(
/srv/fair-mast/.docs-venv/lib/python3.12/site-packages/zarr/core/group.py:3376: UserWarning: Object at .zgroup is not recognized as a component of a Zarr hierarchy.
warnings.warn(
<xarray.Dataset> Size: 257kB
Dimensions:       (time: 88, major_radius: 120)
Coordinates:
  * major_radius  (major_radius) float64 960B 0.3 0.31 0.32 ... 1.47 1.48 1.49
  * time          (time) float64 704B -0.0568 -0.0518 -0.0468 ... 0.3732 0.3782
Data variables:
    n_e_core      (time) float64 704B ...
    p_e           (major_radius, time) float64 84kB ...
    n_e           (major_radius, time) float64 84kB ...
    t_e           (major_radius, time) float64 84kB ...
    t_e_core      (time) float64 704B ...
Attributes:
    name:          thomson_scattering
    description:   Thomson scattering diagnostic
    imas:          thomson_scattering
    license_name:  Creative Commons 4.0 BY-SA
    license_url:   https://creativecommons.org/licenses/by-sa/4.0/
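To extend the single-shot download to a larger subset, one option is s5cmd's `run` command, which reads one command per line from a file. The sketch below generates such a file from a list of shot numbers. It is an illustration under assumptions: the helper names and the `commands.txt` filename are hypothetical, and it assumes every shot in the list exposes a `thomson_scattering` group at the same path layout as shot 30420.

```python
from pathlib import Path

GROUP = "thomson_scattering"  # the group downloaded in the example above


def make_run_lines(shot_ids):
    """Build one `cp` line per shot, in the format read by `s5cmd run`.

    Hypothetical helper; the path layout is copied from the single-shot
    command and is assumed to hold for the other shots as well.
    """
    lines = []
    for shot_id in shot_ids:
        source = f"s3://mast/level2/shots/{shot_id}.zarr/{GROUP}/*"
        target = f"./{shot_id}.zarr/{GROUP}"
        lines.append(f"cp {source} {target}")
    return lines


def write_run_file(shot_ids, path="commands.txt"):
    """Write the command lines to a file and return its path."""
    path = Path(path)
    path.write_text("\n".join(make_run_lines(shot_ids)) + "\n")
    return path


if __name__ == "__main__":
    # Only shot 30420 appears in this notebook; add further shot numbers
    # that you know exist in the archive.
    for line in make_run_lines([30420]):
        print(line)
```

After calling `write_run_file(...)`, the file can be passed to s5cmd with the same global flags as before: `s5cmd --no-sign-request --endpoint-url https://s3.echo.stfc.ac.uk run commands.txt`.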