Thursday, April 4, 2013

Super-wrong

Every year for the last couple of years, at or around PyCon US, I run into code that uses super(type(self), self)(or the equivalent super(self.__class__, self)). I don't go looking for it, it's just coincidence. Every year, at PyCon US, I intend to give a little lightning talk about why this is wrong, and how super-wrong it is, and perhaps shame a bunch of big-name projects that show up in a codesearch. (Originally Google Code Search, but as +Aaron Gallagher, who reviewed this article, pointed out to me, Github's search makes a good enough replacement.) Every year I look at the long, long signup sheet for the lightning talks and I think "oh, why bother, everyone knows this is wrong anyway" and leave my hypothetical spot for something much more awesome.

And every year, shortly after PyCon US, I regret my decision. This year it was at dinner during the sprint days, when I mentioned the super-wrongness of super(type(self), self) and someone who shall remain anonymous said "wait, what? I was always told to do it that way." So, clearly, I need to say this explicitly.


If you see super(type(self), self), the code is doing it wrong. It is a bug. It is super-wrong.


I understand why people do it super-wrong. super(ThisClass, self) is annoyingly verbose and repeats the class name and it doesn't work when you do things like rebind the ThisClass name. But super(type(self), self) has a bigger problem: it does the wrong thing for all subclasses of your class. In fact, if I see code that uses super(type(self), self), it tells me two things: there's a bug in the code and a glaring gap in the tests.

How is it broken, you ask? If you use super() super-wrong, your class will appear to work:

>>> class C(object):
...     def spam(self):
...         pass
...
>>> class D(C):
...     def spam(self):
...         print "In D.spam"
...         super(type(self), self).spam()
...
>>> D().spam()
In D.spam
>>>

But as soon as you involve a subclass, even one that doesn't define the spam method, it will fail:

>>> class E(D):
...     pass
...
>>> E().spam()
In D.spam
In D.spam
[...]
In D.spam
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in spam
  File "<stdin>", line 4, in spam
[...]
  File "<stdin>", line 4, in spam
RuntimeError: maximum recursion depth exceeded while calling a Python object

So why does it break? super() is a way of telling Python "call the method that would have been called had this class not defined it". It does that by giving you a proxy object (an instance of the super type) that continues attribute lookup, starting from just after the class you give as first argument. (If you want the full story on how attribute lookup works, +Guido van Rossum gave a nice explanation a while ago.)

So super(ThisClass, self).spam() says "skipping just past ThisClass, call the spam attribute of self." And super(type(self), self).spam() says "skipping just past the actual class of self, call the spam attribute of self." If type(self) is ThisClass, that is the same thing. If type(self) happens to be a subclass of ThisClass, this will end up calling the same method again, and again, and again, and your code will recurse until Python stops you.

Of course, this is all assuming you're using super() to call the same method in one of the baseclasses. There are other ways of using super() incorrectly. For example, if you're using super() to call another method altogether, you should really ask yourself why that would need to skip just the first class in the method resolution order, regardless of what it is -- and if you come up with a good answer, please tell me; I'd be interested to find cases where this isn't just a plain bug. If you're using super() in situations where you know for a fact that the class will never be subclassed (something I've never managed to know in any of the code I wrote) you should ask yourself why you're using super() at all, and what hoops you want to jump through when you change your mind. I'll bet it's not going to be worth the minor short-term convenience of not repeating a class name.

Bottom line, if you want to use super() -- and you should -- then you can't pass type(self) or self.__class__ as the first argument. If you aren't using Python 3 (in which you can just use super() with no arguments), that means repeating the name of the class. There are things you could do that aren't obviously wrong, like the self.__super trick Guido suggested here, but they're complicated and still fragile and frankly just not worth it. Or you could decide not to use super() at all, but then you're back to repeating the baseclass name(s) instead and nobody wants that. Using super() the most straightforward way, despite the repetition of the class name, is by far the best solution.

Now let's fix all the code, so I won't have a reason to give a lightning talk next year.