import healpy as hp
HEALPix maps are generally stored in FITS format and can be gzipped to save disk space. Reading gzipped maps is natively supported in healpy; however, it takes longer because the maps are decompressed on the fly.
As usual, the best way to assess this is to test. In this case, I am testing on JupyterHub@NERSC, running on a Cori shared CPU node; the maps have about 30% unobserved pixels and are stored in double precision.
In this case, the performance loss due to compression is significant, especially when accessing just a subset of the map's fields. It is therefore hard to justify compression unless storage is a real limiting factor.
This is heavily case-dependent, so a new test (possibly reusing this notebook) should be performed on a different dataset and machine.
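Since the result depends on dataset and machine, a small timing helper makes the test easy to repeat elsewhere. The sketch below is an assumption on my part (not from the notebook); `time_read` is a made-up name, and the example path in the docstring is a placeholder.

```python
import time

def time_read(read_fn, repeats=1):
    """Return the best wall-clock time in seconds over `repeats` calls.

    `read_fn` is any zero-argument callable, for example
    lambda: hp.read_map("test1/my_map.fits", (0, 1, 2))
    (the path is a placeholder for your own map file).
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn()  # perform one full read
        best = min(best, time.perf_counter() - start)
    return best
```

Taking the best of a few repeats reduces the impact of filesystem caching noise on a shared node.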
hp.version.__version__
'1.16.2'
cd /global/cfs/cdirs/cmbs4/dc/dc1
/global/cfs/cdirs/cmbs4/dc/dc1
!rm -r test_readgzip || true
%mkdir test_readgzip
cd test_readgzip
/global/cfs/cdirs/cmbs4/dc/dc1/test_readgzip
Stage data
folders = ["test1", "test2", "test1gz", "test2gz"]
folders_string = " ".join(folders)
!mkdir -p $folders_string
input_folder = "../staging/noise_sim/outputs_rk/coadd/LAT0_CHLAT"
m = "coadd_LAT0_CHLAT_f150_001of001_map.fits"
cov = "coadd_LAT0_CHLAT_f150_001of001_cov.fits"
for f in folders:
    print(f)
    !cp $input_folder/$m $input_folder/$cov $f
test1
test2
test1gz
test2gz
Compression
for f in folders:
    if f.endswith("gz"):
        !time gzip $f/$m
        !time gzip $f/$cov
real 3m24.095s
user 3m13.681s
sys 0m4.717s
real 5m27.430s
user 5m13.233s
sys 0m7.657s
real 3m22.517s
user 3m12.237s
sys 0m4.510s
real 5m21.693s
user 5m7.241s
sys 0m7.664s
ls test1gz
coadd_LAT0_CHLAT_f150_001of001_cov.fits.gz
coadd_LAT0_CHLAT_f150_001of001_map.fits.gz
for each in [m, cov]:
    fits = !stat -c "%s" test1/$each
    each += ".gz"
    gz = !stat -c "%s" test1gz/$each
    print(f"{each} compression factor: {int(gz[0])/int(fits[0]):.0%}")
coadd_LAT0_CHLAT_f150_001of001_map.fits.gz compression factor: 63%
coadd_LAT0_CHLAT_f150_001of001_cov.fits.gz compression factor: 42%
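The same size comparison can be done portably in pure Python, without shelling out to `stat` and `gzip`. This is a sketch using the standard `gzip`, `os`, and `shutil` modules; `compression_factor` is a name I made up, not part of the notebook.

```python
import gzip
import os
import shutil

def compression_factor(path):
    """Gzip `path` into `path + ".gz"` and return compressed/original size."""
    gz_path = path + ".gz"
    # Stream the file through gzip without loading it all into memory
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return os.path.getsize(gz_path) / os.path.getsize(path)
```

A factor of 63% for the map and 42% for the covariance, as measured above, means the covariance compresses better, likely because its columns are smoother.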
Benchmark map access
%%time
_ = hp.read_map(f"test1/{m}", (0,1,2))
CPU times: user 33.2 s, sys: 26.4 s, total: 59.6 s
Wall time: 1min 6s
%%time
_ = hp.read_map(f"test1gz/{m}.gz", (0,1,2))
CPU times: user 1min 36s, sys: 24.7 s, total: 2min 1s
Wall time: 2min 2s
%%time
_ = hp.read_map(f"test2/{m}", 0)
CPU times: user 10.6 s, sys: 9.98 s, total: 20.6 s
Wall time: 27.3 s
%%time
_ = hp.read_map(f"test2gz/{m}.gz", 0)
CPU times: user 1min 12s, sys: 11.8 s, total: 1min 24s
Wall time: 1min 25s
Benchmark noise covariance access
%%time
_ = hp.read_map(f"test1/{cov}", (0,1,2,3,4,5))
CPU times: user 1min 13s, sys: 50 s, total: 2min 3s
Wall time: 2min 23s
%%time
_ = hp.read_map(f"test1gz/{cov}.gz", (0,1,2,3,4,5))
CPU times: user 3min 10s, sys: 54.9 s, total: 4min 5s
Wall time: 4min 6s
%%time
_ = hp.read_map(f"test2/{cov}", 0)
CPU times: user 10.9 s, sys: 13.5 s, total: 24.4 s
Wall time: 44.2 s
%%time
_ = hp.read_map(f"test2gz/{cov}.gz", 0)
CPU times: user 2min 7s, sys: 19.5 s, total: 2min 27s
Wall time: 2min 28s
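Collecting the wall times measured above into slowdown factors (a quick post-hoc summary of the cells above; the case labels are mine) shows the pattern behind the conclusion: the gzip penalty is largest when reading a single field.

```python
# Wall-clock times from the cells above, converted to seconds:
# (uncompressed, gzipped)
wall_times = {
    "map, 3 fields (IQU)": (66.0, 122.0),   # 1min 6s  vs 2min 2s
    "map, 1 field":        (27.3, 85.0),    # 27.3 s   vs 1min 25s
    "cov, 6 fields":       (143.0, 246.0),  # 2min 23s vs 4min 6s
    "cov, 1 field":        (44.2, 148.0),   # 44.2 s   vs 2min 28s
}
for case, (plain, gz) in wall_times.items():
    print(f"{case}: gzip is {gz / plain:.1f}x slower")
```

Reading a single field of an uncompressed FITS file can skip most of the data, while a gzipped file must be decompressed sequentially regardless of how little is requested, hence the roughly 3x penalty for single-field reads versus under 2x for full reads.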