2

I want to load the text file borrowed from here, where each line represents a JSON string like the following:

{"overall": 2.0, "verified": true, "reviewTime": "02 4, 2014", "reviewerID": "A1M117A53LEI8", "asin": "7508492919", "reviewerName": "Sharon Williams", "reviewText": "DON'T CARE FOR IT.  GAVE IT AS A GIFT AND THEY WERE OKAY WITH IT.  JUST NOT WHAT I EXPECTED.", "summary": "CASE", "unixReviewTime": 1391472000}

I would like to extract only the reviewText and overall fields from the dataset using TensorFlow, but I am getting the following error:

AttributeError: in user code:

    <ipython-input-4-419019a35c5e>:9 None  *
        line_dataset = line_dataset.map(lambda row: transform(row))
    <ipython-input-4-419019a35c5e>:2 transform  *
        str_example = example.numpy().decode("utf-8")

    AttributeError: 'Tensor' object has no attribute 'numpy'

My code snippet looks like this:

def transform(example):
    str_example = example.numpy().decode("utf-8")
    json_example = json.loads(str_example)
    overall = json_example.get('overall', None)
    text = json_example.get('reviewText', None)
    return (overall, text)

line_dataset = tf.data.TextLineDataset(filenames = [file_path])
line_dataset = line_dataset.map(lambda row: transform(row))
for example in line_dataset.take(5):
    print(example)

I am using TensorFlow 2.3.0.

2 Answers

3

The input pipeline of a dataset is always traced into a graph (as if you used @tf.function) to make it faster, which means, among other things, that you cannot use .numpy(). You can however use tf.numpy_function to access the data as a NumPy array within the graph:

import json

import numpy as np
import tensorflow as tf

def transform(example):
    # example now arrives as a NumPy value (bytes for a scalar string), not a tf.Tensor
    str_example = example.decode("utf-8")
    json_example = json.loads(str_example)
    overall = json_example.get('overall', None)
    text = json_example.get('reviewText', None)
    # Cast so the value matches the tf.float32 output type declared below
    return (np.float32(overall), text)

line_dataset = tf.data.TextLineDataset(filenames = [file_path])
line_dataset = line_dataset.map(
    # inp is passed as a list of tensors
    lambda row: tf.numpy_function(transform, [row], (tf.float32, tf.string)))
for example in line_dataset.take(5):
    print(example)
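
One thing to watch for, as a hedged aside: the wrapped function has to return a value for every declared output type, so a record that lacks `overall` or `reviewText` (which `.get(..., None)` turns into `None`) would likely fail the conversion. Below is a minimal sketch of a more defensive parser that uses only the standard library. The name `parse_review` and the placeholder defaults are assumptions for illustration, not part of the dataset or of TensorFlow.

```python
import json


def parse_review(line, default_overall=-1.0, default_text=""):
    """Extract (overall, reviewText) from one JSON line given as bytes or str."""
    if isinstance(line, bytes):
        line = line.decode("utf-8")
    record = json.loads(line)
    overall = record.get("overall")
    text = record.get("reviewText")
    # Fall back to the placeholder values when a field is missing or null.
    overall = default_overall if overall is None else float(overall)
    text = default_text if text is None else str(text)
    return overall, text


sample = b'{"overall": 2.0, "reviewText": "JUST NOT WHAT I EXPECTED.", "summary": "CASE"}'
print(parse_review(sample))                     # (2.0, 'JUST NOT WHAT I EXPECTED.')
print(parse_review(b'{"asin": "7508492919"}'))  # (-1.0, '')
```

If this were used behind tf.numpy_function with a tf.float32 output, the float would likely also need converting to np.float32, since a Python float becomes float64 by default.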

1

A bit wordy, but try it like this:

def transform(example):
    str_example = example.numpy().decode("utf-8")
    json_example = json.loads(str_example)
    overall = json_example.get('overall', None)
    text = json_example.get('reviewText', None)
    return (overall, text)

line_dataset = tf.data.TextLineDataset(filenames = [file_path])
line_dataset = line_dataset.map(
    lambda row: tf.py_function(transform, [row], (tf.float32, tf.string))
)
for example in line_dataset.take(5):
    print(example)

This snippet works for any Python function, not only for NumPy functions: tf.py_function runs the wrapped function eagerly, so example.numpy() is available inside transform. So, if you need functions like print, input and so on, you can use this. You don't have to know all the details, but if you are interested, please ask me. :)
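
To make the difference between the two wrappers concrete: as far as I understand it, tf.py_function hands your function an eager tensor (so `.numpy()` works), while tf.numpy_function hands it the already converted value (bytes for a scalar string). Here is a small sketch of a helper that tolerates both, so the same `transform` could sit behind either wrapper. `to_text` and the stand-in `FakeTensor` class are hypothetical names, used only so the example runs without TensorFlow.

```python
def to_text(example):
    """Return a str from an eager-tensor-like object, bytes, or str."""
    # Eager tensors expose .numpy(); plain bytes and str do not.
    raw = example.numpy() if hasattr(example, "numpy") else example
    if isinstance(raw, bytes):
        return raw.decode("utf-8")
    return str(raw)


class FakeTensor:
    """Stand-in for an eager string tensor (illustration only)."""

    def __init__(self, value):
        self._value = value

    def numpy(self):
        return self._value


print(to_text(FakeTensor(b'{"overall": 2.0}')))  # {"overall": 2.0}
print(to_text(b'{"overall": 2.0}'))              # {"overall": 2.0}
```

Inside transform you would then call json.loads(to_text(example)) instead of decoding directly.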

3 Comments

What is the purpose of the decorator used in tf_function?
I modified your snippet to cut it short and my version of your snippet works as well.

def transform(example):
    str_example = example.numpy().decode("utf-8")
    json_example = json.loads(str_example)
    overall = json_example.get('overall', None)
    text = json_example.get('reviewText', None)
    return (overall, text)

line_dataset = tf.data.TextLineDataset(filenames = [file_path])
line_dataset = line_dataset.map(lambda input: tf.py_function(transform, [input], (tf.float32, tf.string)))
for example in line_dataset.take(5):
    print(example)

With your permission, I'll update my answer with your snippet. Your code looks better :)