Does IronPython need LINQ?
Before I get started, I think I need to clear up some misconceptions that the Python community might have about LINQ. From the posts I’ve read, Python developers have dismissed LINQ as only a series of language enhancements that allow for SQL-like querying of lists. Though it is true that both C# and VB.NET have made changes to their respective languages for working with lists, you can still use LINQ without the aforementioned language enhancements. Python developers would be better served to think of LINQ as wrappers around the map, filter and reduce functions. In fact, LINQ is not much different than the “recipes” found in the official itertools documentation.
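To make that comparison concrete, here is a minimal sketch of how three common LINQ operators line up with Python's functional built-ins. The pairing (Select with map, Where with filter, Aggregate with reduce) is my own illustration, and the variable names are made up for the example. It imports reduce from functools so the same code runs on Python 2.6+ and Python 3.

```python
from functools import reduce  # a builtin on Python 2; must be imported on Python 3

words = "the quick brown fox jumped over the lazy dogs".split()

# Select ~ map: project each word to its length.
lengths = map(len, words)

# Where ~ filter: keep only the lengths greater than 3.
long_lengths = list(filter(lambda n: n > 3, lengths))

# Aggregate ~ reduce: fold the remaining lengths into a single total.
total = reduce(lambda acc, n: acc + n, long_lengths, 0)
```

Nothing here is specific to any one Python implementation, which is the point: the ideas behind LINQ are already in the language.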
So the question still stands: does IronPython need LINQ? In my last post, I described how to use LINQ from an IronPython application and implied that I didn’t think it was the best idea. Don’t get me wrong: I really enjoy writing LINQ statements. I just don’t think LINQ fits well with IronPython. Why not? The first problem I have with the IronPython/LINQ combination is put on prominent display in this code sample.
print Enumerable.Average[object](
    list, Func[object, int](lambda x: x))
If you are a static language aficionado, you are probably wondering what the problem is. If you’re a Python developer, however, the fact that I’m explicitly naming the “object” and “int” types is sending up giant red flags. Python folks don’t really care what type of object they’re using; they only want to know what the object can do and what it contains (i.e., duck typing). It’s much easier, and more readable, to simply write a pure-Python version of the Average function. Consider the following chunk of LINQ code from C#:
List<string> values = new List<string>("the quick brown fox jumped over the lazy dogs".Split());
double avgLengthOfLongWords = values
    .Select(x => x.Length)
    .Where(x => x > 3)
    .Average();
Console.WriteLine(avgLengthOfLongWords);
To convert that to IronPython, we’d be forced to write something like the following:
import clr
# Raw string so the backslashes in the Windows path are taken literally.
clr.AddReferenceToFileAndPath(r"c:\users\darrell\desktop\System.Core.dll")
import System.Linq
from System import Func
from System.Linq import Enumerable

stuff = "the quick brown fox jumped over the lazy dogs".split()

# Helper functions that hide the explicit generic type arguments.
def Select(col, func):
    return Enumerable.Select[object, object](col, Func[object, object](func))

def Where(col, func):
    return Enumerable.Where[object](col, Func[object, bool](func))

def Average(col):
    return Enumerable.Average[object](
        col, Func[object, float](lambda x: x))

avgLengthOfLongWords = Average(Where(Select(stuff, lambda x: len(x)), lambda x: x > 3))
print avgLengthOfLongWords
Before I’m even able to write usable code, I need to add a reference to a DLL, import a namespace and a couple of classes, and then write some helper functions, all so that I can write LINQ statements without the benefits of chaining. That’s right – I have to nest my LINQ statements. No chaining. That’s because when we use IronPython, the Select and Where don’t return an IEnumerable or an IQueryable. Instead they return a WhereSelectEnumerableIterator.
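For comparison, here is a rough sketch of the same calculation in plain Python, with no DLL reference and no type annotations. The average helper is a hypothetical function written for this example, not part of any library, and raising ValueError on empty input is simply a choice made for the sketch.

```python
def average(values):
    """Arithmetic mean of any iterable of numbers (hypothetical helper)."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    if count == 0:
        # Design choice for this sketch: an empty input is an error.
        raise ValueError("average() of an empty sequence")
    return total / count


stuff = "the quick brown fox jumped over the lazy dogs".split()

# Word lengths, keeping only those greater than 3, then averaged.
word_lengths = (len(word) for word in stuff)
avg_length_of_long_words = average(n for n in word_lengths if n > 3)
```

Because average only iterates over its argument, it accepts lists, tuples and generators alike, which is the duck typing the explicit Func[object, float] version works against.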
But do you know what really rankles me about using LINQ in IronPython? It contributes absolutely nothing to the Python community. The only way you can use LINQ is if you are using IronPython, not CPython, not Jython. I want to engage the entire Python community, not just my corner of the world. If ever there was a case where implementing the principle trumps migrating the implementation, this is it.
Enter the itertools module. Itertools is a Python standard library module containing tools for working with iterators. It will also serve as the cornerstone for my implementation of LINQ. When you visit the official itertools documentation at python.org, you’re even presented with code “recipes” that could be described as LINQ-ish. The only problem is that the recipes only accommodate nesting of statements. Why is this a problem? Consider the differences in readability between nested and chained in the following example:
from itertools import ifilter, imap

stuff = "the quick brown fox jumped over the lazy dogs".split()
getLength = lambda x: len(x)
isNotTooLong = lambda x: x < 6
isNotTooShort = lambda x: x > 3

nested = ifilter(isNotTooShort, ifilter(isNotTooLong, imap(getLength, stuff)))
# IterHelper is defined later in this post.
chained = IterHelper(stuff).select(getLength).where(isNotTooLong).where(isNotTooShort)
As is always the case with nesting methods, you have to read the statement populating nested backwards. That is, the outermost function call is the last to be run and you work your way inwards from there to find the origin. This is not a big deal to most of us since this is something we had to get used to very early in our careers. But when you compare that to the statement populating chained, I know I can read it from left to right. My eyes aren’t jumping back and forth to understand what is happening in what should be a relatively simple statement.
So what does IterHelper look like?
import itertools

class IterHelper(object):
    def __init__(self, someIterable):
        # Store an iterator so that lists, tuples and generators all work.
        self.someIterable = iter(someIterable)

    def __iter__(self):
        # The helper is its own iterator; see next() below.
        return self

    def next(self):
        return self.someIterable.next()

    def select(self, func):
        return IterHelper(itertools.imap(func, self.someIterable))

    def where(self, predicate):
        return IterHelper(itertools.ifilter(predicate, self.someIterable))
This extremely simplistic version of IterHelper is really only a wrapper around functions found in the itertools module. This is great news since any application written using Python 2.3 or later can make use of this pattern with little effort. There are no special libraries AND everyone is invited regardless of your flavor of Python.
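As a rough illustration of how the same chaining pattern carries over to newer interpreters, here is a sketch that swaps imap and ifilter for generator expressions so it runs on Python 2.4+ and Python 3. The name PortableIterHelper is hypothetical and this is not the implementation I plan to release; it only shows that the idea does not depend on any one Python version.

```python
class PortableIterHelper(object):
    """Hypothetical Python 2/3-portable variant of the chaining wrapper."""

    def __init__(self, iterable):
        self._iterable = iterable

    def __iter__(self):
        return iter(self._iterable)

    def select(self, func):
        # Lazily apply func to each item, like itertools.imap.
        return PortableIterHelper(func(item) for item in self._iterable)

    def where(self, predicate):
        # Lazily keep the items for which predicate is true, like itertools.ifilter.
        return PortableIterHelper(
            item for item in self._iterable if predicate(item))


stuff = "the quick brown fox jumped over the lazy dogs".split()
chained = (PortableIterHelper(stuff)
           .select(len)
           .where(lambda n: n < 6)
           .where(lambda n: n > 3))

first_pass = list(chained)   # consumes the underlying generators
second_pass = list(chained)  # nothing left to yield the second time
```

Like the itertools functions it stands in for, the chain is lazy and single-use: no work happens until something iterates over it, and a second pass over the same chain yields nothing.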
My plan is to release IterHelper as an open source project sometime by mid-October. I have a well-tested, basic implementation now, but I still have questions regarding licensing, hosting and deployment. I will be doing a bit of research on these topics and posting what I’ve learned in my Notes to Self series.
Labels: .NET, IronPython, Python





