
I have created an HDF5 file in MATLAB containing a matrix of size 1 x 19,000,000. The resulting file was 150 MB.
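As a rough check on the 150 MB figure, here is the uncompressed size. This assumes the matrix holds MATLAB's default double-precision values (8 bytes each) and ignores HDF5 metadata overhead:

```latex
19\,000\,000 \times 8\ \text{bytes} = 152\,000\,000\ \text{bytes} \approx 152\ \text{MB} \approx 145\ \text{MiB}
```

This is close to the observed 150 MB, which suggests the original file was stored essentially uncompressed.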

  1. How do I find the optimal chunk size and deflate (compression) level? After some experimenting, I found that a chunk size of 1 x 1,000,000 with deflate level 7 produces a 100 MB file.

  2. My second problem is that I am unable to read this file in Python.

MATLAB

h5create('Xn.h5','/rawdata',size(data),'ChunkSize',[1 1000000],'Deflate',7)
h5write('Xn.h5','/rawdata',data)  % h5create only defines the dataset; h5write stores the values
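One way to approach the chunk-size question empirically is to write the same array with several chunk lengths and compression levels and then compare file sizes. The sketch below does this from Python with h5py, whose `"gzip"` filter is the same deflate filter that MATLAB's `'Deflate'` option applies. The helper name `compressed_size` and the synthetic random-walk signal are hypothetical stand-ins for the real data. File size is only one criterion. Larger chunks and higher levels usually compress better, but any read that touches a chunk has to decompress the whole chunk.

```python
import os
import tempfile

import h5py
import numpy as np


def compressed_size(data, chunk_len, level):
    """Write a 1-D array to a temporary HDF5 file using the given chunk length
    and gzip (deflate) level, and return the file size in bytes."""
    fd, path = tempfile.mkstemp(suffix=".h5")
    os.close(fd)
    try:
        with h5py.File(path, "w") as f:
            f.create_dataset(
                "rawdata",
                data=data,
                # h5py rejects chunks larger than a fixed-size dataset
                chunks=(min(chunk_len, len(data)),),
                compression="gzip",
                compression_opts=level,  # 0 (fastest) .. 9 (smallest)
            )
        return os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Hypothetical stand-in signal: a random walk of 200,000 doubles.
    rng = np.random.default_rng(0)
    signal = np.cumsum(rng.standard_normal(200_000))
    for chunk_len in (10_000, 100_000):
        for level in (1, 4, 7, 9):
            size = compressed_size(signal, chunk_len, level)
            print(f"chunk={chunk_len:>7} level={level} size={size} bytes")
```

Running this on a representative slice of the real signal, rather than the synthetic one, gives a more meaningful comparison. Gains beyond a moderate level are often small relative to the extra write time.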

Python

import h5py

filename = 'Xn.h5'
with h5py.File(filename, 'r') as f:
    # List the names of the top-level groups/datasets in the file
    print("Keys: %s" % list(f.keys()))
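Before reading any values, it can help to inspect what h5py actually sees in the file. The sketch below walks every dataset with `visititems` and collects its shape, dtype, chunk shape, and compression settings without reading the data. The helper name `describe` is hypothetical. MATLAB stores arrays column-major, and its high-level HDF5 functions write dimensions in reverse order. As a result, a 1 x 19,000,000 MATLAB matrix will typically appear in h5py with shape `(19000000, 1)`.

```python
import h5py


def describe(filename):
    """Map each dataset path in an HDF5 file to its metadata (no data is read)."""
    info = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset below the root
        if isinstance(obj, h5py.Dataset):
            info[name] = {
                "shape": obj.shape,
                "dtype": str(obj.dtype),
                "chunks": obj.chunks,
                "compression": obj.compression,
                "level": obj.compression_opts,
            }

    with h5py.File(filename, "r") as f:
        f.visititems(visit)
    return info
```

For the file above, `describe('Xn.h5')` is expected to return a dictionary with a single `'rawdata'` entry. That entry makes it easy to confirm the shape and chunking before deciding how to read the data.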

I expected Python to handle the data as smoothly as MATLAB does, but that never happened.

  • An HDF5 file of 150 MB is certainly not huge; you should not need to worry about compression at all at this size. Could you post the error message you get when attempting to read it in Python? Commented Feb 11, 2019 at 10:09
  • @FlorianDrawitsch, thanks for your comment. I am not getting an error, but I am also not able to read the dataset inside my HDF5 file. My plan is to load the data and plot it, but Python just keeps running in the background without producing an error or any data. Commented Feb 11, 2019 at 10:31
  • If you are not getting an error, what makes you think the dataset is not being read properly? What is returned for, e.g., f[list(f.keys())[0]]? Commented Feb 11, 2019 at 11:54
  • @FlorianDrawitsch, executing a_group_key = list(f.keys())[0]; data1 = list(f1[a_group_key]) takes 2 hours. Commented Feb 11, 2019 at 11:59
  • Please also note that the data1 = list(f1[a_group_key]) command you issue converts the returned data into a list. Please execute exactly what I suggested: f[list(f.keys())[0]] Commented Feb 11, 2019 at 12:18
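The 2-hour runtime of `list(f1[a_group_key])` is consistent with how h5py iterates a `Dataset`: each step reads one row along the first axis with a separate HDF5 call. If h5py sees the MATLAB matrix as shape `(19000000, 1)`, which is typical because MATLAB's HDF5 functions reverse the dimension order, that means 19 million tiny reads. In addition, a 1,000,000-element chunk of doubles (about 8 MB) is larger than HDF5's default 1 MB chunk cache, so each of those reads may also have to decompress a whole chunk. The sketch below contrasts that access pattern with a single bulk read. The helper names are hypothetical, and the default dataset name `'rawdata'` is taken from the MATLAB call.

```python
import h5py
import numpy as np


def read_slow(dset):
    """Roughly what list(dset) does: one HDF5 read per row along axis 0."""
    return np.array([row for row in dset]).ravel()


def read_fast(dset):
    """Read the whole dataset in a single call, then flatten (N, 1) to (N,)."""
    return dset[()].ravel()


def load_vector(filename, name="rawdata"):
    """Open the file and return the named dataset as a flat NumPy array."""
    with h5py.File(filename, "r") as f:
        return read_fast(f[name])
```

For a file of about 150 MB, a single bulk read such as `load_vector('Xn.h5')` is usually the simplest option. The resulting NumPy array can be passed directly to a plotting library.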

1 Answer


OK, as it turns out, this question is really about "How do I access my data in an HDF5 container in Python?"

You can find a very good quick start guide in the h5py documentation.

The process of accessing your data works like this:

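import h5py

filename = 'Xn.h5'
with h5py.File(filename, 'r') as f:
    key = list(f.keys())[0]   # name of the first top-level object, e.g. 'rawdata'
    dataset = f[key]          # h5py Dataset object; no values are read yet

    # To retrieve, e.g., the first 10 elements of a 1D dataset:
    data = dataset[0:10]      # slicing returns a NumPy array (end index is exclusive)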
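Slicing reads only the requested range, so a dataset that does not fit in memory can be processed block by block. The following is a minimal sketch, assuming a non-empty numeric dataset of shape `(N,)` or `(N, 1)`. The helper names `blockwise_stats` and `file_stats` and the 1,000,000-row fallback block size are illustrative choices, not part of h5py. When the dataset is chunked, the sketch reads one chunk length of rows at a time so that each read lines up with the chunk boundaries along the first axis.

```python
import h5py
import numpy as np


def blockwise_stats(dset, block=None):
    """Return (min, max, mean) of a non-empty dataset, reading `block` rows at a time."""
    n = dset.shape[0]
    if block is None:
        # Use the chunk length along axis 0 if the dataset is chunked, so each
        # read covers whole chunks; otherwise fall back to 1,000,000 rows.
        block = dset.chunks[0] if dset.chunks else 1_000_000
    lo, hi, total = np.inf, -np.inf, 0.0
    for start in range(0, n, block):
        part = dset[start:start + block]  # one HDF5 read per block
        lo = min(lo, part.min())
        hi = max(hi, part.max())
        total += part.sum(dtype=np.float64)
    return lo, hi, total / dset.size


def file_stats(filename, name="rawdata"):
    """Open the file and compute block-wise statistics for one dataset."""
    with h5py.File(filename, "r") as f:
        return blockwise_stats(f[name])
```

The same loop structure works for other per-block work, such as downsampling the signal before plotting it.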