
Could you please help me with a memory problem?

I got the MemoryError below:

File "pandas\_libs\algos_common_helper.pxi", line 361, in pandas._libs.algos.ensure_int64
MemoryError

Then I printed the memory size of all local variables using the code below:

import sys

def sizeof_fmt(num, suffix='B'):
    '''Format a byte count as a human-readable string.
    By Fred Cirera, https://stackoverflow.com/a/1094933/1870254, modified.'''
    for unit in ['', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi']:
        if abs(num) < 1024.0:
            return "%3.1f %s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f %s%s" % (num, 'Yi', suffix)

# Print the ten largest local variables, as reported by sys.getsizeof()
print('Memory size of each variable:')
for name, size in sorted(((name, sys.getsizeof(value)) for name, value in locals().items()),
                         key=lambda x: -x[1])[:10]:
    print("{:>30}: {:>8}".format(name, sizeof_fmt(size)))

Memory size of each variable:
                      df_baker: 572.6 MiB
                       df_hall: 37.5 MiB
                 df_WSGT_baker: 12.1 KiB
                  df_B12_baker: 12.1 KiB
                  df_WSGT_hall:  7.7 KiB
                   df_B12_hall:  7.7 KiB
                      __file__:  178.0 B
               __annotations__:  136.0 B
                         MyWho:   72.0 B
                    sizeof_fmt:   72.0 B

The largest variable is the pandas DataFrame df_baker at only about 570 MiB, and I have 5 GB of memory, so why do I get a MemoryError? Thanks for your help. I appreciate it.

  • I agree with the answer below; it would be more helpful to see your code to debug this error than a memory dump of your variables. You could be doing something that loads data into memory, but not into a variable you created directly. Commented Jan 8, 2020 at 16:52

3 Answers


I don't think you measure what you think you measure.

To quote the docs for sys.getsizeof():

Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.

Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.

(Emph. mine)

So anything that is not referenced directly from locals(), including everything that functions allocate internally, is not shown here.
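
A minimal sketch of what "directly attributed" means, using a plain list (exact byte counts vary by Python version):

import sys

# 1,000 distinct 10 kB strings held by one list
big = ['x' * 10_000 for _ in range(1_000)]
print(sys.getsizeof(big))                   # a few KiB: only the list's pointer array
print(sum(sys.getsizeof(s) for s in big))   # roughly 10 MB: the strings it refers to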

There are other tools for looking at the Python heap. Understanding memory consumption when the reference graph is not a tree is not easy, and I bet there are lots of multiple links to the same objects on the heap.

Anyway, the ground truth is the RSS size of your Python process (or its equivalent in Windows). This is the amount actually allocated, including everything intermediate, malloc'd by C code (which is plentiful in pandas / numpy), etc.
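
For a quick programmatic look at the same number on Linux or macOS, the standard-library resource module reports the peak RSS without any third-party packages (it is not available on Windows, and the unit differs between platforms):

import resource, sys

# ru_maxrss is the peak resident set size of this process:
# kilobytes on Linux, bytes on macOS.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
kib = peak / 1024 if sys.platform == 'darwin' else peak
print(f'peak RSS: {kib / 1024:.1f} MiB')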


3 Comments

Thanks. How do I get the RSS size of the Python process, as you said?
In your terminal under Linux or macOS, run top, locate the process you're running (likely python plus something), and look at the RSS column. Or run a GUI utility that shows running tasks and do the same there; sorting by RSS makes it easy to find the heaviest processes. RSS is the amount of RAM currently used by the process ("resident size"), as opposed to VSS, which includes virtual memory not necessarily occupying any physical RAM at a particular moment. (I don't know about Windows; Task Manager shows numbers in a way I can't always make sense of, but it should be helpful too, it just needs some reading of the docs.)
Thank you 9000 so much!

To display RSS memory in megabytes at any point in a program, this function can be used after installing psutil:

import os
import psutil

def usage():
    '''Return this process's current RSS, formatted in MiB.'''
    process = psutil.Process(os.getpid())
    rss = process.memory_info().rss  # resident set size in bytes
    return f'{rss / float(2 ** 20):,.1f}' + ' MB'
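
A hypothetical way to use it is to bracket the step you suspect and compare the two readings (the DataFrame below is just a stand-in for the real workload):

import pandas as pd

print('before:', usage())
df = pd.DataFrame({'x': range(10_000_000)})   # stand-in allocation, ~76 MiB of int64
print('after: ', usage())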



Pandas can give you the memory usage you are after, including memory used by Python objects in your columns (commonly strings). As an example:

> df = pd.DataFrame({'a': ['a' * 100] * 100})
> sys.getsizeof(df)
15852
> df.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 1 columns):
a    100 non-null object
dtypes: object(1)
memory usage: 15.5 KB

The call to info() gets you the total memory; you need the memory_usage='deep' argument, since the default gives you the shallow memory usage, which does not account for the strings.
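
If you want the same deep accounting programmatically rather than as printed output, DataFrame.memory_usage returns a Series of per-column byte counts; deep=True again includes the Python string objects:

> df.memory_usage(deep=True)
Index      ...
a          ...
dtype: int64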

1 Comment

Thanks. df.info(memory_usage='deep') gives me the same memory information as sys.getsizeof(df).
