Use the distributed file format Zarr on Jetstream Swift object storage


March 3, 2018


Zarr is a fairly new file format designed for cloud computing; see the documentation and a webinar for more details.
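To make the "designed for cloud computing" point concrete, here is a minimal sketch of how an array maps to separate chunk objects, assuming the Zarr version 2 layout, where each chunk is stored under a key made of its chunk indices joined by dots. The function name `chunk_keys` and the example shape are hypothetical, chosen only for illustration.

```python
import itertools
import math


def chunk_keys(shape, chunks):
    """List the keys of the chunk objects for an array (Zarr v2 layout assumed)."""
    # Number of chunks along each dimension, rounding up for partial chunks
    counts = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    # Each key is the chunk index along every dimension, joined by dots
    return [
        ".".join(str(i) for i in index)
        for index in itertools.product(*(range(n) for n in counts))
    ]


# Hypothetical 100 x 50 array stored in 40 x 25 chunks
keys = chunk_keys(shape=(100, 50), chunks=(40, 25))
print(keys)
```

Because every chunk is an independent object, several workers can read or write different chunks at the same time, which is what makes the format a good match for object storage.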

Zarr is also supported by Dask, the parallel computing framework for Python, and the Dask team implemented storage backends for Google Cloud Storage and Amazon S3.

Use OpenStack Swift on Jetstream for object storage

Jetstream also offers (currently in beta) access to object storage via OpenStack Swift. This is a separate service from the Jetstream Virtual Machines, so you do not need to spin up a Virtual Machine dedicated to storing the data; you can just use the object storage already provided by Jetstream.

Read Zarr files from object store

If somebody else has already uploaded some files to the object store and set their visibility to “public”, anybody can read them.

See this notebook

Need openstack RC file version 3 from:

pip install python-openstackclient

Source the OpenStack RC file and enter the password when prompted; this is the TACC password, NOT the XSEDE password. I know.

now create ec2 credentials with:

openstack ec2 credentials create -f json > ec2.json

test if we can access this.

I installed this on js-169-169

actually we can skip ec2 credentials and just use openstack:

openstack object list zarr_pangeo

save credentials in ~/.aws/config
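As a rough sketch of this step, the snippet below copies the key pair from `ec2.json` into an AWS-style config file. It assumes the JSON written by `openstack ec2 credentials create -f json` contains `access` and `secret` fields; the helper name `write_aws_config` is hypothetical.

```python
import configparser
import json
from pathlib import Path


def write_aws_config(ec2_json_path, config_path):
    """Store the EC2 access/secret pair in the [default] profile of an AWS config file."""
    # Assumption: the OpenStack JSON output has "access" and "secret" fields
    creds = json.loads(Path(ec2_json_path).read_text())
    config_path = Path(config_path)
    # Disable interpolation so a "%" inside a key is stored verbatim
    config = configparser.ConfigParser(interpolation=None)
    config.read(config_path)  # silently skipped if the file does not exist yet
    config["default"] = {
        "aws_access_key_id": creds["access"],
        "aws_secret_access_key": creds["secret"],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    with config_path.open("w") as f:
        config.write(f)


# Example call, left commented out so nothing in the home folder is overwritten:
# write_aws_config("ec2.json", Path.home() / ".aws" / "config")
```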

import s3fs
fs = s3fs.S3FileSystem(client_kwargs=dict(endpoint_url=""))
# list the objects in the zarr_pangeo container
fs.ls("zarr_pangeo")
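Once the filesystem object works, a Zarr array in the container can be wrapped as a dask array. This is only a sketch under some assumptions: the helper `open_zarr_on_swift` and its arguments are hypothetical, the endpoint URL must be supplied by you, and it relies on a zarr/dask combination that accepts a dict-like store such as `s3fs.S3Map`. Passing `anon=True` asks s3fs for unauthenticated access, which is one way to try reading a public container.

```python
def open_zarr_on_swift(array_path, endpoint_url, anon=False):
    """Open a Zarr array on an S3-compatible object store as a lazy dask array."""
    # Imported here so the sketch can be loaded without the optional packages
    import dask.array as da
    import s3fs

    # With anon=False the credentials come from the usual AWS config locations
    fs = s3fs.S3FileSystem(anon=anon, client_kwargs=dict(endpoint_url=endpoint_url))
    # S3Map exposes the objects under array_path as a dict-like store
    store = s3fs.S3Map(root=array_path, s3=fs, check=False)
    # Only metadata is read here; chunks are fetched when the array is computed
    return da.from_zarr(store)


# Hypothetical usage, with placeholders for the array name and the endpoint:
# arr = open_zarr_on_swift("zarr_pangeo/<array name>", endpoint_url="<endpoint URL>")
# print(arr.mean().compute())
```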

Zarr with dask on 1 node works fine

Need to test:

* access from multiple nodes with distributed
* test read-only access without authentication