Nick Bastin


Python building update…

January 10th, 2010 · No Comments

It turns out this issue had already been reported, at:

http://bugs.python.org/issue4366

The initial (unaccepted) patch in there did the same thing I did in my source tree to make it work, but it was clearly the wrong thing (it had the “wrong thing” smell from the start, really). After a bunch of digging, the reality is that there are a bunch of POSIX-like platforms for which distutils, if Py_ENABLE_SHARED is set, puts -L. into the LDFLAGS; FreeBSD just isn’t one of them.

So anyhow, as the thread on the issue now states, the seemingly proper fix is to extend the list of platforms which add the current working directory to the list of directories searched by the linker during a build. Barring any last minute complaints, I’ll probably commit this to 2.7 trunk, 3.2 trunk, 2.6 maint and 3.1 maint early next week.
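The shape of that fix can be sketched roughly as below. This is an illustration only, not the actual distutils code: the helper name wants_cwd_on_link_path and the exact platform prefixes are assumptions made up for the example.

```python
# Illustrative prefixes only; the real list lives in distutils and may differ.
PLATFORM_PREFIXES = ("linux", "gnu", "sunos", "freebsd")


def wants_cwd_on_link_path(platform, enable_shared):
    """Return True if '.' should be added to the linker search path.

    Hypothetical helper: 'platform' is a string such as sys.platform and
    'enable_shared' stands in for the Py_ENABLE_SHARED config variable.
    """
    # str.startswith accepts a tuple of candidate prefixes.
    return bool(enable_shared) and platform.startswith(PLATFORM_PREFIXES)
```

The point is only that the decision is a platform check, so "extending the list of platforms" amounts to adding one more prefix.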

Tags: Python · Software · Software Development

Building Python 2.x on FreeBSD 5.3

January 5th, 2010 · No Comments

Ok, really that should be 2.6.4 and 2.7.1a1, but I’d imagine this problem might exist for more than just those versions. I haven’t tried to build 3.x.

I don’t quite know the exact steps that produce this problem (maybe any/all configure option?), but if you do:

./configure --enable-shared

as is my wont to do, Python happily builds and then barfs all over building modules, something like:


building '_struct' extension
gcc -shared build/temp.freebsd-5.3-RELEASE-i386-2.6/u1/Python/Python-2.6.4/Modules/_struct.o -L/usr/local/lib -lpython2.6 -o build/lib.freebsd-5.3-RELEASE-i386-2.6/_struct.so
/usr/bin/ld: cannot find -lpython2.6
building '_ctypes_test' extension
gcc -shared build/temp.freebsd-5.3-RELEASE-i386-2.6/u1/Python/Python-2.6.4/Modules/_ctypes/_ctypes_test.o -L/usr/local/lib -lpython2.6 -o build/lib.freebsd-5.3-RELEASE-i386-2.6/_ctypes_test.so
/usr/bin/ld: cannot find -lpython2.6

…ad nauseam…

This is because the linker isn’t looking where your python2.6 library currently is, which is your current working directory (‘.’). (It’s even worse if you have a 2.6 in /usr/local/lib that was built with different options than the one you’re currently building.) The hackish fix to the problem is to edit setup.py in the build root and add, in detect_modules:

add_dir_to_list(self.compiler.library_dirs, '.')

Do this *after* the line to add /usr/local/lib, so that (perhaps non-intuitively) -L. comes before -L/usr/local/lib on the link command line. The helper inserts at the front of the list, so whatever is added last gets searched first.
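To see why the ordering works out that way, here is a simplified stand-in for the helper. It assumes the behaviour I believe setup.py has (prepend the directory if it exists and isn't already listed); it is not a copy of the real function, and os.sep is used in place of /usr/local/lib only so the example runs anywhere.

```python
import os


def add_dir_to_list(dirlist, directory):
    """Simplified stand-in (assumed behaviour): prepend 'directory'
    if it exists and is not already in 'dirlist'."""
    if directory is not None and os.path.isdir(directory) and directory not in dirlist:
        dirlist.insert(0, directory)


library_dirs = []
add_dir_to_list(library_dirs, os.sep)  # stands in for /usr/local/lib
add_dir_to_list(library_dirs, ".")     # added later, so it ends up first

# The linker flags are generated in list order.
link_flags = ["-L" + d for d in library_dirs]
```

With this ordering the freshly built library in the build directory is found before any older copy installed elsewhere.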

Now if you rebuild, everyone will be happy. There may be some sufficient CFLAGS or LDFLAGS jank to set in your environment that has the same effect, but you shouldn’t have to do this – Python ought to know where to find the library it just built if it needs it for further build machinery.

Tags: Python · Rants · Software Development

Oh, slashdot..

December 20th, 2009 · No Comments

Slashdot simply reposted this blog entry at:

http://developers.slashdot.org/story/09/12/20/1433257/The-Environmental-Impact-of-PHP-Compared-To-C-On-Facebook

Passing on an opportunity to bash slashdot for lacking any sort of edited treatment of these stories (if they advertised themselves as a curated RSS aggregator, I wouldn’t even care about this, but they don’t), the source entry itself is not much short of a marketing pitch. It’s written by the people who write (and sell) a C++ web toolkit known as wt (memories of reviews written by game publishers, anyone?).

They obviously have an agenda, and they get a bit overzealous in their pitch. The fastest C++ is almost certainly faster than the fastest PHP – you probably won’t find anyone who would argue with that. However, they make a couple of rather outlandish claims:

  • “assuming a conservative ratio of 10 for the efficiency of C++ versus PHP code” – using a general computing benchmark which contains no code that spends any time doing things you’d find on the web.
  • “As they only say that “the bulk” is running PHP, let’s assume this to be 25 000 of the 30 000.” – Even if 25000 servers are in fact running PHP, it’s ridiculous to assume that 100% of their CPUs are consumed running PHP scripts. One would certainly hope that, while PHP is used to generate dynamic data, a lot of that data can be forward-cached and served again.

Certainly without more detailed data from Facebook, all of this is speculation at best. We have no idea what the mix of objects served by Facebook is between static/dynamic/cached. I have no doubt that if you started a reasonable cap and trade system for emissions, data centers might start caring a lot more about how efficient their code is, and this is a discussion that needs to be had. However, it needs to be had responsibly, bounded by the data we have, and not filled with conjecture created merely to sell a competing product.

Tags: Rants · Software Development

Consider your audience…

November 24th, 2009 · No Comments

I generally abhor wasted space on web pages, but some situations are less of a problem than others. As with all things, you should consider your audience. For example, if you’re the Mozilla wiki, you probably ought to consider that your users are likely reading your pages while trying to do something else on their computer, and you should be as considerate as possible when consuming their screen real estate.

It’s not that I mind a wiki-wide table of contents (although when compounded with an article-wide table of contents, it does seem a bit ridiculously placed):

It’s that you’ve now created a huge margin of wasted space below that table of contents:

And the wiki is designed such that the margin is fixed regardless of the width of the page:

Resulting in a workflow where I can only make the content smaller and smaller on the page in an effort to also see my terminal window, instead of eliding the (now-giant) margin.

Let this serve as a lesson to future authors of websites tailored towards software users and developers – your users are probably trying to use their computer while reading your content.

Tags: Rants · User Interface

And the letter of the day is…

November 2nd, 2009 · No Comments

Children, when writing Python code, remember, bare except clauses are bad. BAD.
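For anyone wondering why, here's a small sketch. The function names are made up for the example; the behaviour shown is standard Python, where SystemExit and KeyboardInterrupt derive from BaseException rather than Exception.

```python
def swallow_everything(func):
    """Bare except: also traps SystemExit and KeyboardInterrupt."""
    try:
        func()
    except:  # deliberately bad, for demonstration only
        return "swallowed"
    return "ok"


def swallow_errors_only(func):
    """except Exception: lets SystemExit and KeyboardInterrupt propagate."""
    try:
        func()
    except Exception:
        return "swallowed"
    return "ok"


def ask_to_exit():
    raise SystemExit(1)


def ordinary_bug():
    raise ValueError("boom")
```

Calling swallow_everything(ask_to_exit) quietly eats the request to exit, while swallow_errors_only(ask_to_exit) lets it through and still catches ordinary_bug.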

Tags: Python · Software

Hiatus…and trac…

October 20th, 2009 · No Comments

I know I haven’t written in a while. I might start again. I might not. I’m really unreliable like that.

However, I would like to take this moment to state that I really, really, hate Trac. Possibly I hate how people deploy it, but the common deployment for the wiki is quite annoying. The search goes to issues and changesets (and the wiki) by default, so then you have to remind yourself to uncheck those boxes and run your search again. Also, it seems not to rank results by anything meaningful – at least, if you search for the exact name of a page title, that page is NOT guaranteed to be the top result. Why!!!

Tags: Rants · Software · Software Development

Python, DTrace, OSX, Oh my!

September 6th, 2009 · No Comments

I’ve been poking around using dtrace on OS X (Snow Leopard) recently and have some interest in using it with Python. Apple provides probes in Python (although they have yet to make it upstream, for reasons that I can’t fathom, given that they’ve been available for years) but I’m having some issues with them.

There is the excellent DTraceToolkit from OpenSolaris which has some nice Python scripts (in $install-dir/Python/), but it appears that Apple’s dtrace is limited in some unfortunate ways (no #define support, for one). Also, when using ‘python$target:::’, dtrace sees probes, but fails to actually report on them – perhaps I don’t understand how to invoke dtrace properly?

I have:

python$target:::function-entry
{
    @lines[pid, uid, copyinstr(arg0)] = count();
}

dtrace:::END
{
    printf("\n %6s %6s %6s %s\n", "PID", "UID", "FUNCS", "FILE");
    printa(" %6d %6d %@6d %s\n", @lines);
}

But all I get is:

sudo dtrace -s test.d -c "/usr/bin/python test.py"

dtrace: script 'test.d' matched 1 probe
dtrace: pid 20908 has exited
CPU ID FUNCTION:NAME
7 2 :END
PID UID FUNCS FILE

Which is troubling…. Even if I use python*:::function-entry, I still can’t seem to use -c:

sudo dtrace -s test.d -c "/usr/bin/python test.py"

dtrace: script 'test.d' matched 3 probes
dtrace: pid 20900 has exited
CPU ID FUNCTION:NAME
2 2 :END
PID UID FUNCS FILE

Scripts which use python*::: and don’t attempt to attach to a given process (via -c) seem to work fine, but that’s less than optimal.

Tags: Python · Software · Software Development

It’s coming…..!

August 24th, 2009 · No Comments

I finally broke down and ordered a Das Keyboard Professional. It has had several good mentions over at StackOverflow and I have had zero luck walking down the aisle of keyboards at MicroCenter/BestBuy/CircuitCity and finding anything that felt better than plastic depressing into Jello. The standard Apple keyboard on my Mac Pro has good action, but the keys have ridiculously short throw – more like modern laptop keyboards. So, hopefully it’ll arrive on Wednesday and I won’t have an angry rant about how much it sucks…

Tags: Hardware · Software Development

PyOhio from afar…

July 26th, 2009 · No Comments

Sadly I have the flu, so I haven’t been able to go to PyOhio. Word on the mailing list is that things are going well, and I’m disappointed that I’m not there sprinting, speaking, and generally hanging out with other fun python developers, but infecting other people is regarded as bad form.. :-/

I’ve spent the last few days hacking around on my android phone and I’ll probably have some more to say on that in the near future. At the moment, my only solid impression is that the emulator is quite slow on MacOS X – perhaps this is true on all platforms – almost disappointingly so, and using Eclipse makes me cry.

Tags: Python · Software

Why collections.deque is my new favorite data structure

July 15th, 2009 · No Comments

In the process of unrolling some recursive algorithms to be iterative, I’ve started using collections.deque, and I’ve decided it’s my new favorite general-purpose python data structure. This wasn’t a hard decision – lists are the general utility collection type and they suck, so really anything ordered and mutable was probably going to be better (I almost never need the rotational capabilities of deques, so a somewhat lighter structure would actually be better, like a doubly linked list).
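As an illustration of the kind of unrolling I mean (this is a made-up example, not my actual code), here is a recursive walk over nested lists next to the same walk driven by a deque of pending work:

```python
from collections import deque


def flatten_recursive(node):
    """Depth-first walk of nested lists, written recursively."""
    if not isinstance(node, list):
        return [node]
    result = []
    for child in node:
        result.extend(flatten_recursive(child))
    return result


def flatten_iterative(node):
    """The same walk with an explicit deque holding pending work."""
    result = []
    pending = deque([node])
    while pending:
        current = pending.popleft()
        if isinstance(current, list):
            # extendleft pushes one item at a time, which reverses them,
            # so feed it the children reversed to keep their order.
            pending.extendleft(reversed(current))
        else:
            result.append(current)
    return result
```

Both functions visit the leaves in the same order; the iterative one just trades the call stack for cheap pushes and pops on the left end of the deque.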

Here’s the fundamental problem: python lists aren’t lists by any standard computer science definition of the word – they’re arrays. I think this should be a surprise to people who don’t know anything about Python. There are at least a few core python developers who have tried to convince me that making the “list” data structure be an array “is just easier for newbies” (along with some rather condescending comments about how it’s obvious that it’s an array since it supports direct indexing). This is complete horseshit. If someone is new to programming, they really don’t care if you call it an array or a vector or a list – they don’t understand what any of those things are. In fact, calling it a list calls up connotations from real life that are not generally accurate, so best not to confuse them anyhow, in my opinion.

You might just think this is a stupid argument about semantics, but it has much larger implications for when you would use a given data structure (and choosing the right data structure for a given job is something that most programmers, and certainly new ones, do very poorly – let’s not design a language that encourages that practice, eh?). If a list were really a linked list, you would probably take away direct element access (list[35]) – although you might not, since there are ways to make some common cases of direct element access reasonably efficient in linked lists – but you’d leave the API otherwise the same. I’m not really sure who this would really hurt (and of course you’d be free to implement vector() or array() to solve that specific problem), since the most common list use case is iteration over some or all of the objects (a cursory examination of the standard library indicates this is true, but I don’t have hard numbers). However, the list API already offers some truly inefficient members – ones which I think might have more common use (at least in my experience) than direct element access. Notably, that member is list.insert(idx, data) – it allows you to insert anywhere you’d like in the list, although in practice one of the most common places to insert is at position zero. This is incredibly inefficient, since every existing element has to be shifted over by one.

Of course, if you want to always insert on the top of the list then it can be argued that you should just use list.append() – which is very fast – and reverse() or pop off the bottom of the list. The problem is when you want to be adding elements to both ends. List is a VERY bad data structure for double-ended operation, especially if you have a lot of items. list.insert() can cause some serious memmove() action when your lists get bigger than 10000 items (more data below). Of course, for that matter even append() can cause serious problems when your collections get to be more than a couple million elements – you can very easily fail to allocate a few megabytes due to memory fragmentation even when you have plenty of memory free, since arrays have to be contiguous blocks. So, I would call that something other than a list, because it very much behaves differently, and bring it in line with other programming texts that people might pick up. You can still make it your primary data structure if you like, but please, please, call it an array or a vector, and write about performance guarantees in the python documentation (for all data types).

So anyhow, because Python doesn’t actually have an efficient doubly-linked list collection, you’re stuck with deque. The implementation is efficient, though, so it’s plenty serviceable as a doubly-linked list if you need a double-ended data structure with efficient insertion at both ends or need to store millions of rows (deques are allocated in blocks, but those blocks can reside anywhere on the heap, making reallocation a relative non-issue).
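For reference, the double-ended operations in question look like this (a trivial sketch, with placeholder values):

```python
from collections import deque

d = deque()
d.append("b")        # add on the right
d.appendleft("a")    # add on the left, no shifting of existing items
d.append("c")

first = d.popleft()  # remove from the left
last = d.pop()       # remove from the right
```

All four operations work at the ends of the deque, which is exactly where a list only does well on one side.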

I made a few tests that mimic my use case and I’ve put the results below. The test is the same code for both lists and deques – just the container type is changed (and list.insert(0, data) is replaced with deque.appendleft(data)). The test creates an empty structure and inserts 100000 elements into it, alternating every other element between appending and inserting on the top. Timings are from the test loop run 10 times; each test is then run 3 times.

Deque: 0.46s, 0.47s, 0.47s
List: 48.86s, 48.22s, 49.62s

I had originally put a million items into each collection, but the list test did not want to complete in anything resembling reasonable time, even alternating between append and insert (insert(0, data) on a 10000 item vector 10 times takes around a second, while using a 100000 item vector takes around 100 seconds). list.append() is slightly faster than deque.append(), but it’s only a couple of percent (probably due to the efficiency of having a LIST_APPEND opcode), so I continue to use deques anytime I have any kind of medium-to-large data set, even if I’m just appending and iterating.
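A test along the lines described above could be sketched like this. It is a reconstruction for illustration, not the original test script; the function names and the timing helper are assumptions.

```python
import time
from collections import deque


def fill_list(n):
    """Alternate append() and insert(0, ...) on a list."""
    items = []
    for i in range(n):
        if i % 2:
            items.insert(0, i)   # shifts every existing element
        else:
            items.append(i)
    return items


def fill_deque(n):
    """Same pattern on a deque, with appendleft() in place of insert(0, ...)."""
    items = deque()
    for i in range(n):
        if i % 2:
            items.appendleft(i)
        else:
            items.append(i)
    return items


def time_it(func, n, repeats=10):
    """Return the wall-clock seconds taken to run func(n) 'repeats' times."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(n)
    return time.perf_counter() - start
```

Calling time_it(fill_deque, 100000) and time_it(fill_list, 100000) gives the comparison; the absolute numbers will depend heavily on the machine and the Python version, so don't expect them to match mine.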

Tags: Python · Rants · Software