When we left off last time, I had presented a simplified version of the JavaObject and JavaMember classes in Py4J. In Py4J, when a JavaMember is called, Py4J calls the equivalent method on the JVM, and when a JavaObject is garbage collected, it is dereferenced on the JVM. Then I asked the question: what is wrong with the way garbage collection is handled in the following code?
class JavaObject1(object):
    def __init__(self, id):
        self._id = id
        self._methods = {}
        self._wr = weakref.ref(self, lambda wr : inc1())

    def __getattr__(self, name):
        if name not in self._methods:
            self._methods[name] = JavaMember(name)
        return self._methods[name]

class JavaMember(object):
    def __init__(self, name):
        self.name = name

    def __call__(self, *args):
        j = 0
        for i in xrange(1, 10):
            j += i
The problem comes from the fact that JavaMember does not reference JavaObject1, so in the following statement:
javaObject.method1()
if javaObject is no longer referenced in the Python program, javaObject could be garbage collected before method1() is called. Indeed, the order of the operations could be:
- get attribute method1()
- decrease reference count of javaObject
- garbage collect javaObject on Python VM
- garbage collect javaObject on Java VM
- call method1.__call__() method
- call javaObject.method1 on Java VM
- Error!!! javaObject no longer exists on the Java VM!
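The ordering above can be made visible with a small sketch. It assumes CPython, where reference counting finalizes an object as soon as its last reference disappears, and it replaces inc1() with a hypothetical `released` list that stands in for "dereferenced on the Java VM". The attribute lookup and the call are split into two statements so that each step can be observed.

```python
import weakref

# Hypothetical stand-in for inc1(): records ids "released on the Java VM".
released = []


class JavaMember(object):
    def __init__(self, name):
        self.name = name  # note: no reference back to the owning object

    def __call__(self, *args):
        return 'called %s' % self.name


class JavaObject1(object):
    def __init__(self, id):
        self._id = id
        self._methods = {}
        # The callback captures only the id string, never self.
        self._wr = weakref.ref(self, lambda wr, id=id: released.append(id))

    def __getattr__(self, name):
        if name not in self._methods:
            self._methods[name] = JavaMember(name)
        return self._methods[name]


java_object = JavaObject1('o1')
method1 = java_object.method1            # get attribute method1
del java_object                          # last reference to the object is gone
released_before_call = list(released)    # the finalizer has already run
result = method1()                       # the call happens after the release
```

Under these assumptions, `released_before_call` already contains 'o1' when `method1()` finally runs, which is exactly the situation that would fail against a real JVM.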
One solution is to make sure that javaObject is never garbage collected until all its methods have been garbage collected too. This is done by adding a reference to JavaObject from JavaMember:
class JavaObject1(object):
    def __init__(self, id):
        self._id = id
        self._methods = {}
        self._wr = weakref.ref(self, lambda wr : inc1())

    def __getattr__(self, name):
        if name not in self._methods:
            self._methods[name] = JavaMember(name, self)
        return self._methods[name]

class JavaMember(object):
    def __init__(self, name, container):
        self.name = name
        self.container = container

    def __call__(self, *args):
        j = 0
        for i in xrange(1, 10):
            j += i
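As a quick check of what the back-reference buys us, here is a minimal sketch, again assuming CPython and a hypothetical `released` list in place of inc1(). With the container reference in place, dropping the last direct reference to the object no longer releases it while one of its members is still held.

```python
import weakref

# Hypothetical stand-in for inc1(): records ids "released on the Java VM".
released = []


class JavaMember(object):
    def __init__(self, name, container):
        self.name = name
        self.container = container  # strong reference keeps the owner alive

    def __call__(self, *args):
        return 'called %s on %s' % (self.name, self.container._id)


class JavaObject1(object):
    def __init__(self, id):
        self._id = id
        self._methods = {}
        self._wr = weakref.ref(self, lambda wr, id=id: released.append(id))

    def __getattr__(self, name):
        if name not in self._methods:
            self._methods[name] = JavaMember(name, self)
        return self._methods[name]


java_object = JavaObject1('o1')
method1 = java_object.method1
del java_object                          # the member still references the owner
released_before_call = list(released)    # nothing has been released yet
result = method1()                       # safe: the owner is still alive
```

Here `released_before_call` stays empty, so the call can no longer race with the release. The price is the reference cycle between the object and its members.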
Now, if we try to run the following test, we see some strange results:
def m1():
    for j in xrange(0, 100):
        java_object = JavaObject1('o' + str(j))
        for i in xrange(10000):
            java_object.method1()

if __name__ == '__main__':
    timer(m1,'With JavaObject1: ')

# Returns: With JavaObject1: 1.8906121254 acc1:0
Although there is a circular reference between JavaMember and JavaObject1, this should not be a problem, because weak references, unlike __del__ methods, do not prevent objects from being garbage collected. Right?
Well, from the output of the test, we see that acc1 = 0, so the finalizer of JavaObject1 was never called! The reason, and it took me a while to figure this out, is that the weak reference carrying the finalizer (self._wr) is stored on the JavaObject1 instance itself. It is therefore only reachable from the cycle and gets deleted together with the instance, before the finalizer has a chance to run. Indeed, a weak reference callback is only invoked if the weak reference object is still alive when its referent is finalized.
It follows that the instances are really garbage collected (this can be seen by using the gc module), but the finalizers are never called.
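The following sketch reproduces this observation on a single object. It assumes CPython's cycle collector and uses two hypothetical helpers that are not part of the original code: a `finalized` list in place of inc1(), and an external weak reference `probe` (without a callback) used only to check whether the instance is gone.

```python
import gc
import weakref

# Hypothetical stand-in for inc1(): records ids whose finalizer ran.
finalized = []


class JavaMember(object):
    def __init__(self, name, container):
        self.name = name
        self.container = container  # creates the cycle with JavaObject1


class JavaObject1(object):
    def __init__(self, id):
        self._id = id
        self._methods = {}
        # The weak reference lives on the instance, so it is part of the cycle.
        self._wr = weakref.ref(self, lambda wr, id=id: finalized.append(id))

    def __getattr__(self, name):
        if name not in self._methods:
            self._methods[name] = JavaMember(name, self)
        return self._methods[name]


java_object = JavaObject1('o1')
java_object.method1                # creates the JavaMember and the cycle
probe = weakref.ref(java_object)   # external weak reference, no callback
del java_object
gc.collect()                       # force collection of the cycle

collected = probe() is None        # the instance really is gone
callbacks_run = list(finalized)    # but the finalizer never ran
```

Under these assumptions `collected` is true while `callbacks_run` stays empty: the memory is reclaimed on the Python side, yet nothing would have told the Java VM about it.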
The problem, for Py4J, is that the Python VM must tell the Java VM when an object is garbage collected to avoid creating a memory leak. It turns out that there are only two families of solutions, each family bringing its own trade-offs. This is the topic of the next post.