
I am currently using this piece of code:

class FileSystem(metaclass=Singleton):
    """File System manager based on Spark"""

    def __init__(self, spark):
        self._path = spark._jvm.org.apache.hadoop.fs.Path
        self._fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(
            spark._jsc.hadoopConfiguration()
        )

    @classmethod
    def without_spark(cls):
        with Spark() as spark:
            return cls(spark)

My object obviously depends on the Spark object (another object that I created; I can add its code if you need to see it, but I do not think it is required for my current issue).

It can be used in two different ways, both resulting in the same behavior:

fs = FileSystem.without_spark()

# OR

with Spark() as spark:
    fs = FileSystem(spark)

My problem is that, even though FileSystem is a singleton, using the class method without_spark makes me enter (__enter__) the context manager of Spark, which leads to a connection to the Spark cluster and takes a lot of time. How can I make the first execution of without_spark do the connection, while the next calls only return the already created instance?

The expected behavior would be something like this:

    @classmethod
    def without_spark(cls):
        if not cls.exists:  # I do not know how to persist this information in the class
            with Spark() as spark:
                return cls(spark)
        else:
            return cls()
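
For context, the Singleton metaclass here is the usual instance-caching pattern, roughly along these lines (a sketch, not necessarily the exact implementation):

class Singleton(type):
    """Metaclass that caches one instance per class (sketch)."""
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]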

  • It seems that your context manager is not really needed, as you leave the context with the return statement and all subsequent usage of the returned FileSystem instance happens outside of it. Commented Nov 19, 2021 at 14:33
  • @user2390182 I know the behavior is a bit strange, but I actually need Spark to be connected when I build my FileSystem. Once disconnected, FileSystem keeps working; it uses Java objects behind the scenes. Commented Nov 19, 2021 at 14:46

2 Answers


I think you are looking for something like

import contextlib

class FileSystem(metaclass=Singleton):
    """File System manager based on Spark"""

    spark = None  # shared Spark instance, set on the first call to without_spark

    def __init__(self, spark):
        self._path = spark._jvm.org.apache.hadoop.fs.Path
        self._fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(
            spark._jsc.hadoopConfiguration()
        )

    @classmethod
    def without_spark(cls):
        if cls.spark is None:
            # First call: create the Spark connection and remember it on the class.
            cm = cls.spark = Spark()
        else:
            # Later calls: wrap the already created Spark in a no-op context manager.
            cm = contextlib.nullcontext(cls.spark)

        with cm as s:
            return cls(s)

The first time without_spark is called, a new instance of Spark is created and used as a context manager. Subsequent calls reuse the same Spark instance and use a null context manager.
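
A quick usage sketch (assuming your Spark connects in __enter__ and that the Singleton metaclass hands back the cached instance on later calls):

fs1 = FileSystem.without_spark()  # first call: creates Spark() and enters its context (connects)
fs2 = FileSystem.without_spark()  # later calls: nullcontext around the cached Spark, no reconnection

assert fs1 is fs2  # same singleton instance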


I believe your approach will work as well; you just need to initialize exists to be False, then set it to True the first (and every, really) time you call the class method.

class FileSystem(metaclass=Singleton):
    """File System manager based on Spark"""

    exists = False

    def __init__(self, spark):
        self._path = spark._jvm.org.apache.hadoop.fs.Path
        self._fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(
            spark._jsc.hadoopConfiguration()
        )

    @classmethod
    def without_spark(cls):
        if not cls.exists:
            cls.exists = True
            with Spark() as spark:
                return cls(spark)
        else:
            return cls()
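
Note that exists must be set on the class (cls.exists = True); assigning self.exists = True inside __init__ would only create an instance attribute that shadows the class attribute. A minimal illustration (hypothetical Demo class):

class Demo:
    exists = False

d = Demo()
d.exists = True        # creates an attribute on the instance d only
print(Demo.exists)     # False -- the class attribute is unchanged
Demo.exists = True     # this is what cls.exists = True does in the classmethod
print(Demo.exists)     # True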

5 Comments

What is the purpose of the context manager when the context is closed with the return statement and all subsequent usages of cls.spark are "out of context"?
I fixed the name error. I'm almost certainly still botching the intended semantics. (I'm answering this from a strictly formal standpoint, which could be completely wrong.) It's not clear to me what the semantics of the context manager are; the OP's example suggests that fs can be used outside the context manager, but maybe that's not the case.
@chepner yes, fs can be used outside the context manager; it just needs to be inside the context manager during __init__. But your answer does not work: it still makes me go through the with block each time I call the class method. I want to do the with only once, the first time I use the method. I made an edit to my post and added a piece of what I would like (but I'm not sure it is easy to understand).
OK, I've updated the answer to use a null context manager on subsequent calls to without_spark, though I think your approach will work as well: you just need to initialize exists = False when the class is defined, then set cls.exists = True when it is first seen to be false.
Haaaa... I tried to do this, but I actually added the statement self.exists = True in __init__, which was not working because it only changed the value on the instance, not on the class. Stupid mistake. But thanks for the help.

Can't you make the constructor argument optional and initialize Spark lazily, e.g. in a property (or functools.cached_property):

from functools import cached_property

class FileSystem(metaclass=Singleton):
    def __init__(self, spark=None):
        self._spark = spark

    @cached_property
    def spark(self):
        # Create the Spark connection lazily on first access, then cache it.
        if self._spark is None:
            self._spark = Spark()
        return self._spark

    @cached_property
    def path(self):
        return self.spark._jvm.org.apache.hadoop.fs.Path

    @cached_property
    def fs(self):
        with self.spark:
            return self.spark._jvm.org.apache.hadoop.fs.FileSystem.get(
                self.spark._jsc.hadoopConfiguration()
            )
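
Usage could then look something like this (a sketch, assuming your Spark wrapper connects when its context is entered):

fs_manager = FileSystem()   # no Spark object is created here
hadoop_fs = fs_manager.fs   # first access: Spark() is created and entered, connecting once
hadoop_fs = fs_manager.fs   # cached_property returns the stored value, no reconnection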

2 Comments

I've never seen cached_property. Need to study a bit to understand.
It's just like a normal property, but only evaluated on the first call, then cached.
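
A tiny illustration of that behavior, independent of Spark (hypothetical Example class):

from functools import cached_property

class Example:
    @cached_property
    def value(self):
        print("computing...")
        return 42

e = Example()
e.value   # prints "computing..." and returns 42
e.value   # returns the cached 42, nothing is printed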
