Bulk Data Download

!pip install aiohttp requests s5cmd "xarray[io]"

This notebook shows how to perform bulk downloads with an S3 command-line tool. This is useful if you want local access to a large subset of the data, or even to download the whole archive!

We can download data in bulk using any command-line tool that supports the S3 protocol. We recommend s5cmd, which can be installed by running:

pip install s5cmd

Now we can download data using the cp command.

In this example, we transfer the Thomson scattering data for shot 30420 to the local machine.

We need to set the endpoint where the bucket is hosted with --endpoint-url (for now: https://s3.echo.stfc.ac.uk), and we need to pass --no-sign-request for anonymous access.

%%capture --no-display
%%bash
# Quote the wildcard so that s5cmd, not the shell, expands it.
s5cmd --no-sign-request --endpoint-url https://s3.echo.stfc.ac.uk cp 's3://mast/level2/shots/30420.zarr/thomson_scattering/*' ./30420.zarr/thomson_scattering
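The same transfer can also be launched from Python, which may be convenient when the shot number or group name is a variable. The sketch below is a minimal illustration, not part of the original notebook: the helper names build_cp_command and download_group are hypothetical, and it assumes that s5cmd is on the PATH and that other groups follow the same s3://mast/level2/shots/<shot>.zarr/<group> layout as the example above.

```python
import shutil
import subprocess

# Endpoint and bucket path taken from the command shown above.
ENDPOINT_URL = "https://s3.echo.stfc.ac.uk"
BUCKET_PREFIX = "s3://mast/level2/shots"


def build_cp_command(shot, group, dest_root="."):
    """Return the s5cmd argument list that copies one group of one shot.

    Hypothetical helper: it only assembles the same arguments as the
    shell command above, with the shot and group made configurable.
    """
    source = f"{BUCKET_PREFIX}/{shot}.zarr/{group}/*"
    target = f"{dest_root}/{shot}.zarr/{group}"
    return [
        "s5cmd",
        "--no-sign-request",           # anonymous access
        "--endpoint-url", ENDPOINT_URL,
        "cp", source, target,
    ]


def download_group(shot, group, dest_root="."):
    """Run s5cmd for one shot/group; raises if s5cmd is missing or fails."""
    if shutil.which("s5cmd") is None:
        raise RuntimeError("s5cmd was not found on PATH")
    # Passing a list (no shell) means the '*' reaches s5cmd unexpanded.
    subprocess.run(build_cp_command(shot, group, dest_root), check=True)


# Show the command that would be run for the example in the text.
print(" ".join(build_cp_command(30420, "thomson_scattering")))
```

Calling download_group(30420, "thomson_scattering") would then perform the same copy as the cell above.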

Finally, we can open the downloaded Zarr group locally:

import xarray as xr
xr.open_zarr('30420.zarr/thomson_scattering', consolidated=False)
<xarray.Dataset> Size: 257kB
Dimensions:       (time: 88, major_radius: 120)
Coordinates:
  * time          (time) float64 704B -0.0568 -0.0518 -0.0468 ... 0.3732 0.3782
  * major_radius  (major_radius) float64 960B 0.3 0.31 0.32 ... 1.47 1.48 1.49
Data variables:
    n_e           (time, major_radius) float64 84kB ...
    p_e           (time, major_radius) float64 84kB ...
    t_e_core      (time) float64 704B ...
    n_e_core      (time) float64 704B ...
    t_e           (time, major_radius) float64 84kB ...
Attributes:
    description:  
    imas:         thomson_scattering
    label:        core temperature
    name:         thomson_scattering
    uda_name:     AYC_TE_CORE
    units:        eV
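To extend the single-shot example to a larger subset, one option is to write one cp line per shot into a text file and pass it to s5cmd's run command, which executes the listed commands in a batch. The sketch below is only an illustration under stated assumptions: the helper names batch_lines and write_batch_file and the file name commands.txt are hypothetical, the second shot number is a placeholder that may not exist in the archive, and every shot is assumed to follow the same path layout as shot 30420.

```python
from pathlib import Path

# Bucket path taken from the single-shot command earlier in this notebook.
BUCKET_PREFIX = "s3://mast/level2/shots"


def batch_lines(shots, group, dest_root="."):
    """Return one s5cmd 'cp' line per shot (hypothetical helper)."""
    return [
        f"cp {BUCKET_PREFIX}/{shot}.zarr/{group}/* {dest_root}/{shot}.zarr/{group}"
        for shot in shots
    ]


def write_batch_file(path, shots, group, dest_root="."):
    """Write the cp lines to a file for 's5cmd run'; return the line count."""
    lines = batch_lines(shots, group, dest_root)
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)


# 30420 comes from the example above; 30421 is a placeholder shot number.
n_lines = write_batch_file("commands.txt", [30420, 30421], "thomson_scattering")
print(f"wrote {n_lines} commands to commands.txt")
```

The resulting file could then be executed with the same global flags as before:

s5cmd --no-sign-request --endpoint-url https://s3.echo.stfc.ac.uk run commands.txt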