
I would like to subclass a pandas DataFrame, but the subclass will also inherit from a custom class of my own. I want to do this because I plan to create multiple subclassed DataFrames, as well as other subclasses (that are not DataFrames), that all share the properties and methods of this base class.

To begin, my base class is:

import os
import pandas as pd

class thing(object):
    def __init__(self, item_location, name):
        self.name = name
        self.file = item_location
        self.directory = os.path.join(*item_location.split(os.path.sep)[0:-1])

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, val):
        self._name = val

    @property
    def file(self):
        return self._file

    @file.setter
    def file(self, val):
        # store under the same private name that the getter reads
        self._file = val

    @property
    def directory(self):
        return self._directory

    @directory.setter
    def directory(self, val):
        self._directory = val

And here is one of my subclasses, which inherits from both thing and pd.DataFrame:

class custom_dataframe(thing, pd.DataFrame):
    def __init__(self, *args, **kwargs):
        super(custom_dataframe, self).__init__(*args, **kwargs)

    @property
    def _constructor(self):
        return custom_dataframe

I then try to create an empty dataframe, passing only the file location and a name:

custom_dataframe('/foobar/foobar/foobar.html','name')

and I get an error

(I cannot post the entire stack trace, as it's on a computer that is not connected to the internet.)

File "<stdin>", line 1, in <module>
File "<path to file with classes>", line x, in __init__
  self.name = name
<a bunch of stuff going through pandas library>
File "<path to pandas generic.py>", line 4372, in __getattr__
  return object.__getattribute__(self,name)
RecursionError: maximum recursion depth exceeded while calling a Python object

I'm using pandas 0.23.4

Edit: changed item_location.split(os.pathsep)[0:-1] to *item_location.split(os.path.sep)[0:-1].

3 Comments

  • See pandas.pydata.org/pandas-docs/stable/development/…. Pay attention to the warning. Commented Sep 25, 2019 at 13:06
  • I've read that, and as I've said, I don't want to add my thing properties to _metadata, because subclasses other than my dataframe subclasses will also have the same properties. I'd rather not have to change things in multiple places. Commented Sep 25, 2019 at 13:27
  • "not have to change stuff in multiple places": composition should work. It may seem too bulky in simple cases, but that overhead matters less as complexity grows. Anyway, in your case try to avoid simple attribute names like name, file, etc. Commented Sep 25, 2019 at 13:56

1 Answer

You stated in the comment section "I've read that", but your code does not follow it, and that is the source of the problem. That page describes the steps to subclass a pandas DataFrame, including how to define your own properties.

Consider the following modification of your code. The key part is _metadata. I removed all the properties from the thing class because they increase the number of custom attribute names, all of which must be added to _metadata. I also added a __repr__ method to fix another RecursionError. Finally, I removed the directory attribute, as it gave me a TypeError.

import pandas as pd

class thing(object):

    def __init__(self, item_location, name):
        self.name = name
        self.file = item_location

    def __repr__(self):
        return 'dummy_repr'

class custom_dataframe(thing, pd.DataFrame):

    _metadata = ['name', 'file', 'directory']

    def __init__(self, *args, **kwargs):
        super(custom_dataframe, self).__init__(*args, **kwargs)

    @property
    def _constructor(self):
        return custom_dataframe

if __name__ == '__main__':
    cd = custom_dataframe('/foobar/foobar/foobar.html', 'name')
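
The RecursionError in the question is consistent with thing.__init__ assigning attributes before pd.DataFrame.__init__ has run, so that pandas' attribute hooks consult internal state that does not exist yet. The following pure-Python sketch reproduces that failure mode in isolation. LazyFrame and its _columns attribute are hypothetical stand-ins for illustration, not pandas' actual implementation.

```python
class LazyFrame:
    """Hypothetical stand-in for a class that falls back to column lookup."""

    def __init__(self):
        self._columns = ['a', 'b']

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        # If __init__ never ran, self._columns is missing too, so this
        # line re-enters __getattr__ and recurses until Python gives up.
        if name in self._columns:
            return 'column ' + name
        raise AttributeError(name)


def triggers_recursion():
    # Build an instance without running __init__, similar to touching
    # attributes on a DataFrame subclass before DataFrame.__init__ ran.
    obj = LazyFrame.__new__(LazyFrame)
    try:
        obj.name
    except RecursionError:
        return True
    return False


print(triggers_recursion())
print(LazyFrame().a)
```

Listing a name in _metadata sidesteps this kind of fallback, because pandas then treats the name as an ordinary instance attribute.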

EDIT: A slightly enhanced version, though still a rather rough implementation:

import pandas as pd

class thing:

    _metadata = ['name', 'file']

    def __init__(self, item_location, name):
        self.name = name
        self.file = item_location

class custom_dataframe(thing, pd.DataFrame):

    def __init__(self, *args, **kwargs):
        item_location = kwargs.pop('item_location', None)
        name = kwargs.pop('name', None)
        thing.__init__(self, item_location, name)
        pd.DataFrame.__init__(self, *args, **kwargs)

    @property
    def _constructor(self):
        return custom_dataframe

if __name__ == '__main__':

    cd = custom_dataframe(
        {1: [1, 2, 3], 2: [1, 2, 3]},
        item_location='/foobar/foobar/foobar.html',
        name='name')
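
As a variation on the kwargs.pop approach above, the arguments for the two base classes can be separated by declaring the custom ones as keyword-only parameters. This is only a sketch under assumptions: the class names Thing and CustomDataFrame are illustrative, os.path.dirname is swapped in for the split/join expression from the question, and the DataFrame is initialised before the custom attributes are set. It has not been checked against pandas 0.23.4 specifically.

```python
import os
import pandas as pd


class Thing:
    """Shared base class; non-DataFrame subclasses can inherit from it too."""

    # Names listed here are stored by pandas as plain instance attributes.
    _metadata = ['name', 'file', 'directory']

    def __init__(self, item_location=None, name=None):
        self.name = name
        self.file = item_location
        # Assumption: dirname replaces the split/join logic from the question.
        self.directory = os.path.dirname(item_location) if item_location else None


class CustomDataFrame(Thing, pd.DataFrame):

    def __init__(self, *args, item_location=None, name=None, **kwargs):
        # Everything that is not a Thing argument goes to pandas unchanged.
        pd.DataFrame.__init__(self, *args, **kwargs)
        # Set the custom attributes once the DataFrame state exists.
        Thing.__init__(self, item_location, name)

    @property
    def _constructor(self):
        return CustomDataFrame


cd = CustomDataFrame(
    data={'A': [1, 2, 3]},
    item_location='/foobar/foobar/foobar.html',
    name='name')
print(cd.name, cd.file, cd.directory)
```

Because item_location and name are keyword-only, positional and remaining keyword arguments such as data, index, and columns keep their usual pandas meaning.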

8 Comments

I really appreciate your time, but how does this help me avoid having to add my properties/methods from thing to my custom_dataframe metadata? Maybe I'm being dense; I am newish to Python. For the record, I was following this article: dev.to/pj_trainor/extending-the-pandas-dataframe-133l
@MathWannaBe456 Since you inherit from thing, you may move _metadata from custom_dataframe to thing.
So I guess my next question is: how do I handle the inputs to pd.DataFrame? For example, I try cd = custom_dataframe('/foobar/foobar/foobar.html', 'name', data={'A':[1,2,3]}) and I get TypeError: __init__() got an unexpected keyword argument 'data'
@MathWannaBe456 Your next question is about inheritance in Python. It is not an easy topic, especially when you have multiple base classes, so it is worth learning properly. In short, you should separate the arguments for the different base classes and then call each __init__ method with the appropriate set of arguments.
I tried changing the __init__ of custom_dataframe to __init__(self,item_location,name,data=None,index=None,columns=None,copy=None,*args, **kwargs), but I couldn't figure out how to delegate which inputs go to which base class __init__.