This is it. I've had enough! Seriously, people. OAuth is about maintaining control as a user and everyone wants me to give it up! I'm tired of constantly clicking deny.
What am I complaining about? The constant abuse of Twitter OAuth login. Every site that I've visited that uses Twitter OAuth requires both read and write access to my account. The latest to do this is Paper.li, a service that looks really cool,
but…

So what's the fix? Websites should ask for the minimum amount of information needed to get started. In nearly every single case, the sites are using it for login purposes. Instead of a username and password, you talk to Twitter to verify that you have a legitimate user. Those "Tweet This For Me" buttons are optional add-ons that you can do.
You should handle those automatic cases by performing an upgrade when the user decides they want to allow your application to update for them. Unfortunately, Twitter doesn't allow you to specify which level of access you want when you request a token; you have to do it when you set up your application.
Registering two applications is an easy solution to this problem. You use the read-only application for authentication, then switch to the other app when you're attempting to write. It requires a little overhead when you store the authentication token, but it's trivial to store a flag showing which set of credentials to use.
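Here's a minimal sketch of that bookkeeping in Python. The application keys, the StoredToken record, and both helper functions are hypothetical; it only illustrates storing a flag next to the token and picking the matching set of credentials, and makes no real Twitter API calls.

```python
from dataclasses import dataclass

# Hypothetical credentials for the two registered applications.
APP_CREDENTIALS = {
    "read": {"consumer_key": "READ_KEY", "consumer_secret": "READ_SECRET"},
    "write": {"consumer_key": "WRITE_KEY", "consumer_secret": "WRITE_SECRET"},
}


@dataclass
class StoredToken:
    """A user's OAuth token plus a flag for the app that issued it."""
    user_id: int
    token: str
    access_level: str = "read"  # "read" or "write"


def credentials_for(stored):
    """Return the app credentials that match a stored token."""
    return APP_CREDENTIALS[stored.access_level]


def needs_upgrade(stored, wants_to_write):
    """True when the user asks for a write action with a read-only token."""
    return wants_to_write and stored.access_level != "write"
```

When needs_upgrade returns True, that's the moment to send the user through the second application's authorization flow and replace the stored token and flag.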
Honestly, I know most applications are completely trustworthy, especially those I've found through recommendations of others, but it's still unnerving to give 100% access to my account to a new service for the sheer pleasure of being able to log in and see if I like it. It should be to you too.
The entire subject of two talks and mentioned in several others, MongoDB was definitely the buzz at TekX this year. It's long been in favor in the tech community in Lawrence and has been used for some data crunching for a few projects at the local paper. Even with all of this exposure, I've yet to sit down and actually explore it.
That changed Friday afternoon while I sat at O'Hare waiting on my flight back to Lawrence (which subsequently got canceled). I installed Mongo earlier in the week and opened up a bunch of tabs on the various intros and tutorials available on the Mongo wiki. The rest of this article is a mix of stream-of-consciousness notes from playing around with Mongo for the first time and some of my reflections from this past week.
Note on typefaces
I use both Mongo and mongo throughout this article. The first, the
title-case Mongo refers to the software as a whole. Whenever you see mongo
with a lowercase and in monospace, it's referring to the Mongo client program
you run from the command line.
Installation
On a Mac, it's a breeze. I use Homebrew to manage software on my Mac, so a
quick brew install mongodb was all I needed and a minute later I was ready to
go.
Starting Up the Server
Mongo is run by the mongod process. I don't know if it's pronounced
mongo-d or mon-god though. It's a fun play on words if the latter is the
case.
Brew includes a basic configuration to get up and running, so I use that inside
a screen instance so I can leave it running in the background while I use the
mongo tool to interact with it.
Interacting with Mongo
I started out with the basic tutorial to get going. It looks like that needs some love though. It shows the version in the startup as 0.9.8. Homebrew ships with 1.4.2 and I did find a few things that were out of date. No, I haven't been a good open source community member and submitted fixes yet.
The first thing that's different from a traditional RDBMS with Mongo is that
you don't have to explicitly create a database. Pretty straightforward: from
within mongo, type use <database>. This creates a brand new database for
you and you're off. For the examples below, I'm using use mydb to select
mydb as my database.
It's kind of nice to just be able to connect and go, but it feels odd. Not
good or bad, just odd. Sort of like the first time you run git checkout
inside a repository to switch branches when you're used to Subversion.
The shell feels like a JavaScript console. I don't have access to the source code in my off-line mode, so I can't say for sure that it is one. The syntax seems remarkably similar, so it's at least JavaScript inspired.
Adding Records
Mongo stores documents, not rows of columns. This distinction allows Mongo to ignore schema, continuing the theme of leaving it up to the developer. Those documents can be made up of any number of key-value pairs that look remarkably like JSON. Need to store a new data point? Just add it as a field to a document and you're set.
Here's an example inspired by Mongo's tutorial for adding a few records:
> person = {name: "Travis Swicegood"}
> city = {city: "Lawrence", state: "KS"}
> db.things.save(person)
> db.things.save(city)
Here I created two new objects with various data attached to them, then saved
them all inside the things collection. Collections in Mongo are like a table
inside the SQL world. You don't have to create a collection, you just declare
it on the db object, and you're set.
Comparing this to the same code against a relational database, I've got to say I love this. No boilerplate code to get going. I didn't have to create a database or any tables. I just started using them. This appeals to my laziness (err, I mean desire for efficiency), but it also looks very promising for teaching someone new. Every abstract idea you can remove is one less potential stumbling block for someone starting out.
Back to the data I entered. Notice that the two records don't share any fields.
Documents inside a Mongo collection are made up of a series of keys and values,
and they can be whatever you want them to be. This is perfect for lazy migrations:
migrating the data as it's requested instead of doing it all at once. ming,
a Python wrapper around Mongo, already provides this. This is especially
useful for large sites with lots of data that may or may not ever be
requested again.
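To make the lazy migration idea concrete, here's a small sketch in plain Python. It uses a dict as a stand-in for a collection, and the version field, the name split, and both function names are made up for illustration; it is not how ming does it.

```python
CURRENT_VERSION = 2


def migrate(doc):
    """Upgrade one document (a plain dict) to the current shape, if needed."""
    if doc.get("version", 1) >= CURRENT_VERSION:
        return doc
    upgraded = dict(doc)
    # Hypothetical change between versions: split "name" into two fields.
    first, _, last = upgraded.pop("name", "").partition(" ")
    upgraded["first_name"] = first
    upgraded["last_name"] = last
    upgraded["version"] = CURRENT_VERSION
    return upgraded


def load(store, key):
    """Fetch a document, migrating and writing it back only when it is read."""
    doc = store[key]
    migrated = migrate(doc)
    if migrated is not doc:
        store[key] = migrated
    return migrated
```

Documents that never get requested are never touched, which is the whole point: the cost of the migration is spread across reads instead of paid up front.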
Finding Records
Now that the records are there, it's time to find them. The db.things collection
hands them back now:
> db.things.find()
{ "_id" : ObjectId("4bf9a96b7d04f51b48499011"), "name" : "Travis Swicegood" }
{ "_id" : ObjectId("4bf9a96f7d04f51b48499012"), "city" : "Lawrence", "state" : "KS" }
That gives me everything. The find method takes optional parameters to
filter the results. This is actually a good time to bring up the built-in help
in mongo. Entering only the value of any function (i.e., without calling it)
displays the implementation of the function:
> db.things.find
function (query, fields, limit, skip) {
    return new DBQuery(
        this._mongo, this._db, this, this._fullName,
        this._massageObject(query), fields, limit, skip);
}
Note: I changed the formatting so it's more easily viewable online.
The parameters are optional (like all JavaScript function parameters), so you can pass in
as many or as few as you want. Filtering the results is done by providing a
hash for the query parameter (the first one). For example, to find my
record:
> db.things.find({name: "Travis Swicegood"})
{ "_id" : ObjectId("4bf9a96b7d04f51b48499011"),
"name" : "Travis Swicegood" }
One thing you can't do is full-text searching. I can't ask for all of the
records that begin with Travis or have a portion of my name in it. The
current recommendation (at least via the wiki) is to build your own list of
keywords as an array, then search that array. For example:
> var person2 = {name: "Travis Swicegood",
> name_field: ["Travis", "Swicegood"]};
> db.things.save(person2)
> db.things.find({name_field: "Travis"})
{ "_id" : ObjectId("4bf9afa17d04f51b48499014"),
"name" : "Travis Swicegood",
"name_field" : [ "Travis", "Swicegood" ] }
For something like a name, this can be useful. For full-text searching of an article, it's probably best to delegate searching off to something like Solr and let Mongo focus on storage and retrieval.
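The keyword-array approach is easy to picture outside of Mongo. This is a rough Python sketch using plain dicts and lists; the helper names are hypothetical and the matching is a simplified stand-in for what the find call above does against an array field.

```python
import re


def keywords(text):
    """Split text into the words that will be stored alongside the document."""
    return re.findall(r"\w+", text)


def with_keywords(doc, field, keyword_field):
    """Return a copy of doc with a keyword array derived from one field."""
    enriched = dict(doc)
    enriched[keyword_field] = keywords(doc[field])
    return enriched


def find_by_keyword(docs, keyword_field, word):
    """Return the documents whose keyword array contains word exactly."""
    return [d for d in docs if word in d.get(keyword_field, [])]
```

Note that the match is still exact per keyword: searching for "Trav" finds nothing, which is why this works for names but not as a replacement for real full-text search.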
Querying for sub-objects
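Of course, I had to try sub-objects to see if they would work. After saving a document that embeds person2 and city as sub-objects, I could query on the whole sub-object: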
> db.things.find({person: person2})
{ "_id" : ObjectId("4bf9b02b7d04f51b48499015"),
"person" : { "name" : "Travis Swicegood",
"name_field" : [ "Travis", "Swicegood" ],
"_id" : ObjectId("4bf9afa17d04f51b48499014") },
"city" : { "city" : "Lawrence",
"state" : "KS",
"_id" : ObjectId("4bf9a96f7d04f51b48499012") } }
You can also query using the dot-notation to "reach through" an object and look at its children. This returns the same result as the previous query:
> db.things.find({"person.name_field": "Travis"})
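If the dot-notation feels magical, this Python sketch shows the general idea against nested dicts. It's a simplified illustration with made-up function names, not Mongo's actual matching logic: it walks the path one key at a time, and treats a list at the end of the path as matching when it contains the value.

```python
def resolve(doc, dotted_path):
    """Follow a dotted path such as "person.name_field" through nested dicts."""
    value = doc
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(doc, dotted_path, expected):
    """True if the value at the path equals expected, or is a list containing it."""
    value = resolve(doc, dotted_path)
    if isinstance(value, list):
        return expected in value
    return value == expected
```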
Limiting returned columns
This ability to dynamically add columns to a record definitely provides a
breeding ground for massive documents with lots of keys. Most of the time a
small subset of those keys is all that's needed. The second parameter in find
provides us with that functionality:
> db.things.find({person: person2}, {city:1})
{ "_id" : ObjectId("4bf9b02b7d04f51b48499015"),
"city" : { "city" : "Lawrence",
"state" : "KS",
"_id" : ObjectId("4bf9a96f7d04f51b48499012") } }
Likewise, you can reach through the object and pull out a subfield:
> db.things.find({person: person2}, {"city.state":1})
{ "_id" : ObjectId("4bf9b02b7d04f51b48499015"),
"city" : { "state" : "KS" } }
These examples bring up a syntax thing with Mongo that I'm not crazy about: the use of the number one. It's the standard C style: 1 is true, 0 is false. I'd love to see the client and the libraries adopt an intent-revealing name. Granted, this is a minor niggle, but the little things are what make a good system an amazing one.
Few issues
The docs, being that they are community run and Mongo's still relatively new, are a little loose. I've found a bunch of examples looking through them that don't work the way they were documented.
Another potential issue (or at least something you need to be aware of) is that Mongo's geospatial support isn't 100% yet. They only provide 2d, and the math they use assumes that 1° of longitude covers the same distance at the poles as it does at the equator. For many applications, this isn't a huge issue, but if precision is important, Mongo's not ready for this type of use.
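To see why that flat assumption matters, here's a quick back-of-the-envelope calculation in Python. It treats the Earth as a sphere and uses roughly 111.32 km for one degree of longitude at the equator; both are approximations chosen for illustration, and the function name is hypothetical.

```python
import math

# Approximate length of one degree of longitude at the equator, in kilometres,
# treating the Earth as a sphere.
KM_PER_DEGREE_AT_EQUATOR = 111.32


def km_per_degree_longitude(latitude_degrees):
    """Approximate east-west length of one degree of longitude at a latitude."""
    return KM_PER_DEGREE_AT_EQUATOR * math.cos(math.radians(latitude_degrees))
```

At 60° north a degree of longitude is only half as long as it is at the equator, and it shrinks to nothing at the poles. At about 39° north, roughly where Lawrence sits, it works out to about 86.5 km.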
One thing that I'm looking forward to is Mongo's sharding. That is going to allow Mongo to scale horizontally really well. Some of the initial test results look amazing. What will be really interesting is to see how well it scales down. It's one thing to have over 300,000 ops/sec on a bigger box, another thing to be able to manage it on something like a 1 GB instance on Rackspace Cloudservers.
Two Biggest Issues
First, Mongo's a master-slave system. It appears really robust, but whenever a box takes on a special role I start to get nervous. One of the promises of "NoSQL" is that it provides a tremendous amount of resilience. Any time you start to add special nodes you're taking away from that.
For example, if you're running 5 homogeneous servers and one goes down, the other 4 can pick up the slack—assuming you're not running 5 servers at peak capacity. This makes failure planning easy: figure up the amount of CPU time you need to handle your load, provision that many servers, then add enough servers to be comfortable when they start failing. Need 3 servers, provision 5 and you can have two failures before you peg your machines.
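The arithmetic behind that kind of planning is simple enough to write down. A tiny Python sketch, assuming identical servers and load measured in whatever unit you like (the function names and the units are made up for illustration):

```python
import math


def servers_needed(peak_load, capacity_per_server):
    """Smallest number of identical servers that can carry the peak load."""
    return math.ceil(peak_load / capacity_per_server)


def failures_tolerated(provisioned, peak_load, capacity_per_server):
    """How many servers can fail before the rest are pegged."""
    return provisioned - servers_needed(peak_load, capacity_per_server)
```

With a peak load that needs 3 servers and 5 provisioned, failures_tolerated comes out to 2, matching the example above. The calculation only works this cleanly when every server is interchangeable.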
This isn't to say Mongo can't handle failures. Its current model is
rebalancing the load when one of the servers goes out. mongos is the tool to
read up on for handling this. Unfortunately, I haven't been able to dive into
it yet. The only way to know for sure is to build up a cluster then start
killing servers. Of course, this type of testing is preferred for any data
storage system.
Second, the license. I'm not anti-AGPL, but there's some ambiguity. The Mongo
team has addressed this both on the
wiki and through an in-depth
blog post. According to
that, I can write up a service such as MongoHQ and as long as I don't
actually change the mongod or mongos code I'm fine.
On the other hand, most of the definitions I've read of the AGPL mean that code that talks to it is subject to being hit with the AGPL. I don't have any doubts with 10gen, but if they don't always own the copyright…
Of course, those last two paragraphs are with the caveat I am not a lawyer.
I think Mongo is an amazingly compelling piece of software in the non-standard database realm. With the upcoming sharding and what I would have to imagine is an imminent fix to the geospatial queries, Mongo's definitely worth a look.
Link: http://travis.domain51.com/post/496088464/its-about-the-story-stupid-non-profits-online
Originally posted over at the horribly designed travis.domain51.com (see what happens when you give a programmer CSS access), I thought I'd share this post here too since that blog is just getting underway.
It truly is a shame that so many amazing non-profits are hidden behind horribly thought out websites. Most of these sites deluge their visitors with information, even though great sites such as Charity Navigator exist to provide raw statistics and facts about non-profits. The problem is that most non-profits are missing the point. Their websites are there to tell a story.
Let me say it again: a non-profit's website is there to tell a story. Nothing else.
People are natural storytellers and are drawn to an authentic story. Each of the websites linked to above has an amazing story behind it just waiting to be unleashed. A story that engages their visitors and potential donors. A story that sticks with them while they navigate their life the next few days. A story that ends with another beginning. One the visitor is a part of, where they help choose the ending by getting involved and helping that non-profit reach its goals, whether those goals are getting girls back to school to defeat the spiral of poverty in East Africa; feeding the abandoned, mentally ill of India; or helping people afford water while lifting themselves out of poverty.
When I left Ning a year ago this day I set out to figure out how I could increase my impact on the world. Through a series of fortuitous events, I ended up working with non-profits, helping them tell their story online. I need to make sure I remember that.
Everyone has a story to tell. These organizations are trying to change the world with theirs, and we're there to help them.
Please leave your comments on the original post if you'd like to comment on it.
These past few weeks have been crazy hectic. Business is going crazy, I'm in the final stages of launching multiple websites, and I've had a cold for the better part of two weeks. Unfortunately, these things have been conspiring against me and this past week I had to notify PyCon that I won't be able to attend and speak this year.
Please accept my apologies if you were planning on attending. I have a ton of great information together in various forms and as things get back under control I've got a slew of blog posts about testing and web dev that I'll post up. I plan on tagging the posts with the pycon tag if you're interested in keeping up with them.
I spent an hour yesterday afternoon on a conference call organized by the Sunlight Foundation about open government in Kansas. The Sunlight Foundation is an organization whose self-proclaimed mission is to use "cutting-edge technology and ideas to make government transparent and accountable." It was really encouraging to see the interest in open government, but there's lots to be done.
We have some counties (20 according to the Sunshine Review) that don't even have websites, much less accessible data about their governments. You can't make claims of openness when you're not even presenting basic information about yourself online.
There were also some issues with the Freedom of Information Act requests. The state has to respond within 3 days to tell you when they'll respond, but they haven't been extremely helpful and have even gone so far as to encourage people not to pester them. The old "more flies with honey than vinegar" argument.
One comment I heard in the conversation was an admonition from someone on the ground here in the state (I didn't catch the name) to watch out for shooting for the stars. He said he was tired of seeing data being hidden behind "well, we want to do this right" instead of just getting information out there. He was advocating scanning things like minutes and making them available that way. I applaud the release early, release often mentality of that approach, but as a tech guy, that scares me.
I want raw data. In my ideal world, I get an API key from the state and can query the databases of any branch of government and get the information that they're responsible for. Scanned JPGs don't give me that. I have to run OCR on the images and hope it's a font (or in some cases handwriting) that's recognizable by a computer to get any raw data out of it. Forget the semantics in it; those are almost completely lost without manually vetting all of the data.
I don't think this is a matter of people trying to hide data through obscurity, rather, I think it's more a matter of not fully understanding the issues here. How can you expect those 20 counties to understand the difference between "available" and "accessible" when they don't even think a website is important enough to maintain?
One thing that really makes today's conversation interesting, however, is the recent Secretary of State development. This past Monday, the Kansas Secretary of State gave one week's notice that he was resigning his post to pursue a career in the private sector.
The Secretary of State controls all of the public financial records for campaigns. Voter files, campaign contributions, expenditures, all of that is stored behind the firewall at the Secretary of State's office. There's information on the current site about campaigns and the money they raised, but good luck finding it. I dare you. Give that site to someone who's Internet savvy but not familiar with the site and ask them to find the filings for the Governor's race. I know my way around the site and it still took me 5 minutes to find it. Hint: it's buried under sub-links that are only exposed when you're on certain pages.
What would be amazing to see is someone appointed Secretary of State who gets open government. Who realizes that scanned tiff files aren't "open government." They're a good faith step in the right direction, but they hide the potential that raw data provides. How can I feed that information into a program to analyze it and look for patterns?
I hope the state doesn't squander this opportunity. There's an opportunity to appoint someone who gets open government and who would make it their mission to open the floodgates on the information that the SoS office controls. Once you provide citizens (and journalists) with the raw numbers about campaigns, where money's being spent, where money's coming from, you open yourself up to all sorts of interesting interpretations. You start moving from that "available" column toward the "accessible" column. That's what the open government movement is all about after all.
It's 37° F (2.8° C) and dropping. It's going to hover near freezing tonight, and come midnight... flip a coin - heads it rains, tails it's dry. And he's out there.
I know he's not my responsibility. But isn't he?
There's a big guy that's homeless here in Lawrence. You know him if you've lived in or around downtown Lawrence. The guy's really big. He started hanging around South Park shortly after the Salvation Army closed their shelter. We noticed him hanging out early in the morning and in the evenings. It wasn't long before we put two and two together. He'd taken up residence. At least overnight.
As the temperature started dropping, he and the other few souls that would spend the night in South Park on the benches, in the gazebo, or under the stairs at the gazebo started disappearing. I assumed they'd found shelter somewhere warm, or moved south following the warmer temperatures. Tonight, as it warms up to a balmy 37, he was back.
We noticed him earlier this evening as we walked the dogs. Meg pointed him out as we started into South Park. He was headed down the red-bud path. As he got closer, we could see the two pieces of cardboard he was carrying. My heart went out to him as I realized what he had done. He'd scavenged up some insulation for the night to come.
As we continued on our walk, he milled around South Park some more. As we finished our loop through the park with our dogs, I saw him lumbering off back toward downtown. "Well, maybe he isn't spending the night in the park," I reassured myself and didn't give it another thought. Meg and I carried on with our evening, including some pizza and beer at the Oread and a Jayhawks game.
Walking back down the mountain tonight, I couldn't help but scan the park. The benches were all empty, as was the gazebo. Maybe he had found a spot in the shelter. Then I saw under the stairs. His unmistakable girth, under a pile of blankets, huddled up against the gazebo.
I quietly motioned to Meg. Our banter, lively all the way home, died.
After we had passed the gazebo I said, "I know he's not my responsibility. But if he isn't mine, whose is he?"
I don't know what I could have done. I don't know his story. Could I approach him, or is he unstable? Why's he homeless? Is it a "lifestyle choice", as certain organizations would have you believe of the majority of Lawrence's homeless, or is he one of those guys who just had one too many blows and hasn't been able to get back up?
I wonder though, what is my responsibility? What is our responsibility? If you believe in a higher power, when he comes asking "where's the big guy?", are you going to respond "am I my brother's keeper?", or will you respond "I am my brother's keeper, he's over here"?
Want to help?
Head over to the Lawrence Community Shelter website. They can use cash or supplies.
17 January
Packaging reusable & testable Django apps with virtualenv, pip, and Fabric
Posted by Travis Swicegood
As someone noted the other day on one of my Facebook posts, I've been doing a lot of Python development. I've moved almost entirely to Python for development, web and otherwise. Instead of PHP, I reach for Django when I need to prototype an application quickly.
One of the things I've been struggling with is how to build re-usable
applications that are testable without having the entire Django stack running.
Until recently, I've used buildout to handle this. There's a
djangorecipe for creating a Django repository. I include that, a sample
project, the necessary requirements in a buildout.cfg and away we go.
All was well, until I included two projects that each had a sample project/
directory. The base Django project couldn't figure out what was what, and problems
abounded. There are other solutions. I could have set the project (doesn't
this seem like Mrs. Bennet in Pride and Prejudice?) variable to change
it on a per-app basis, but that still left me with some problems.
Namely, I don't like the default layout of djangorecipe's Django project. I wanted to change it, but after some digging around in buildout's internals, I realized it wasn't going to be a solution I could live with long term. I'd heard a lot of people (by that, I mean James) state their preference for virtualenv and pip. The separation of concerns (one application for isolation, another for installation) instead of the all-in-one approach of buildout felt better to me, so I started exploring.
And I came up empty.
A lot of people talk about using virtualenv and pip together; pip documents how to install into a virtual environment; but no one talks about how to use everything together with Django. Specifically, no one mentions what to include in your repository. Until now. :-)
Setting up the repository
The most important part of this for me was what to store in the repository. It's simple enough, really. First, you need a requirements.txt file. For most simple Django apps, it contains one line. For example, this is what the requirements.txt file for d51.django.apps.tagging looks like:
Django
The next thing I need is a simple .gitignore file. My mantra is to not
commit anything that I can generate. This means all of the files generated by
virtualenv and pip need to be ignored. I also ignore the swap files created by
Vim (hey, I'm not the only one at the company who uses it, so might as well
ignore it) and I ignore all .pyc and .pyo files. The resulting
.gitignore file looks like:
bin/*
include/*
lib/*
.*.swp
*.py[co]
Now we're ready. Of course, you need to have virtualenv and pip installed, but once you've done that, running tests is pretty simple. First, you have to initialize the environment:
prompt> git clone git://github.com/domain51/d51.django.apps.tagging.git
prompt> cd d51.django.apps.tagging
prompt> virtualenv .
prompt> pip install -E . -r requirements.txt
The observant might notice my call to virtualenv. I've left out the parameter
--no-site-packages. Two reasons. First, I don't keep things like Django
installed at the site-packages level. Second, the things I do install, tools
like Fabric, I want access to them while in the virtual environment without
having to re-install them.
Now that the virtualenv has been initialized, you need to activate it:
prompt> source ./bin/activate
(d51.django.apps.tagging)prompt>
Notice that the prompt changes. It's prefixed with the name of the directory you're in to signify that you're inside virtualenv. Now running the tests is dead simple:
(d51.django.apps.tagging)prompt> python ./run_tests.py
Testing Django apps inside virtualenv
I need audio that plays when you get to this line. That screeching record
player coming to a halt. The visual question mark. What's this ./run_tests.py
file you ask? The secret sauce.
Django wants to be set up in order to run. Normally that requires a project,
settings.py, and a partridge in a pear tree. Unless you call
settings.configure. You can use that to mimic the normal Django settings,
tweaking the settings to match your needs for testing.
For d51.django.apps.tagging, the settings are pretty simple. I need to make
sure that my app is available along with django.contrib.contenttypes since
I make use of the generic relationship code. There's also some cargo culting
required, as Django won't run without a DATABASE_ENGINE specified. The end
result looks like this:
from django.conf import settings
from django.core.management import call_command


def main():
    # Dynamically configure the Django settings with the minimum necessary to
    # get Django running tests
    settings.configure(
        INSTALLED_APPS=(
            'django.contrib.contenttypes',
            'd51.django.apps.tagging',
        ),
        # Django replaces this, but it still wants it. *shrugs*
        DATABASE_ENGINE='sqlite3'
    )

    # Fire off the tests
    call_command('test', 'tagging')


if __name__ == '__main__':
    main()
Running without activating virtualenv
This works, but requires that you always have virtualenv activated. For
example, if you deactivate virtualenv and try to run the test, you get
an ImportError:
(d51.django.apps.tagging)prompt> deactivate
prompt> python run_tests.py
Traceback (most recent call last):
File "run_tests.py", line 1, in <module>
from django.conf import settings
ImportError: No module named django.conf
You can programmatically activate virtualenv, however, by including this snippet of code in a .py file located in the root of your repository:
execfile('./bin/activate_this.py',
dict(__file__='./bin/activate_this.py'))
You can add that line to the top of the file and execute run_tests.py without
needing to activate the virtualenv beforehand. The line needs to go before
the from django.conf line to make sure that Python knows where to find Django
and any other requirements of the test.
Without that line, you have to activate the virtual environment prior to running the
test, or have Django installed at the system level. Executing the
bin/activate_this.py file that virtualenv ships with removes the need to
activate and deactivate the environment around test runs.
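The execfile call is Python 2 only. For anyone on Python 3, here's a rough equivalent as a sketch. The activate_virtualenv helper is a hypothetical name, and it assumes the environment was created by virtualenv, which writes bin/activate_this.py (the standard library's venv module does not).

```python
import os


def activate_virtualenv(root="."):
    """Run virtualenv's bin/activate_this.py if it exists under root.

    Returns True when the script was found and executed, False otherwise.
    """
    script = os.path.join(root, "bin", "activate_this.py")
    if not os.path.exists(script):
        return False
    with open(script) as handle:
        code = compile(handle.read(), script, "exec")
    # Equivalent of Python 2's execfile(script, dict(__file__=script)).
    exec(code, {"__file__": script})
    return True
```

As with the original snippet, the call has to happen before anything from the environment is imported.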
Making this reusable
There's a lot of code in that run_tests.py file that is going to be
duplicated for every project. Actually, there are only three lines (well, two
lines and one variable in a line) in it that are unique: the two lines of
INSTALLED_APPS and the app name to test in the call_command line.
To keep from repeating that for every single project, I created a simple harness for initializing tests. Introducing d51.django.virtualenv.test_runner, a very small package for running Django tests inside virtualenv.
Using this, run_tests.py now looks like:
try:
    from d51.django.virtualenv.test_runner import run_tests
except ImportError:
    print "Please install d51.django.virtualenv.test_runner to run these tests"


def main():
    settings = {
        "INSTALLED_APPS": (
            "django.contrib.contenttypes",
            "d51.django.apps.tagging",
        ),
    }
    run_tests(settings, 'tagging')


if __name__ == '__main__':
    main()
The first four lines give the user some input when they run the tests without
the test harness. That's optional, depending on how user friendly you want to be.
After that, all I do is call run_tests with a settings dictionary and the
name of the app I want to test.
There is one downside, though. You have to have it installed outside of your
virtualenv in order to run your tests. Personally, I'm not too worried about it,
as I have it installed, but if you're really paranoid, you could include it in
your requirements.txt file, which would require that the user be inside the
virtualenv to run your tests.
Wrapping it all up in a Fabric cloth
The final step to make all of this bulletproof is creating a Fabric file that handles all of the initialization and running of the tests for me. For good measure, it should also be capable of cleaning up after itself. I don't need a bazillion copies of Django lying around, after all.
The end result looks something like this (using Fabric 1.0a):
from fabric.api import local


def test():
    """
    Run tests for d51.django.apps.schedules
    """
    local("python ./run_tests.py")


def init():
    """
    Initialize a virtualenv in which to run tests against this
    """
    local("virtualenv .")
    local("pip install -E . -r requirements.txt")


def clean():
    """
    Remove the cruft created by virtualenv and pip
    """
    local("rm -rf bin/ include/ lib/")
Now you can initialize the environment, run the tests, and clean up after yourself with three commands:
prompt> fab init
prompt> fab test
prompt> fab clean
One drawback to this method: Fabric's local command swallows the output
of the test. This isn't a problem until you have a failure. local does
contain a capture parameter, but it doesn't display the output from a
failed command. That's fixable, but for the time being, my recommendation
is to use Fabric as your quick sanity check, but rely on straight
python ./run_tests.py for your real testing.
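For what it's worth, the behavior I'd want looks something like the sketch below. This is plain Python using the standard library's subprocess module, not Fabric's API, and run_and_report is a hypothetical helper name; it assumes Python 3.7 or later for the capture_output argument.

```python
import subprocess
import sys


def run_and_report(command):
    """Run command, echoing its captured output only when it fails.

    Returns the command's exit code.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        sys.stdout.write(result.stdout)
        sys.stderr.write(result.stderr)
    return result.returncode
```

A passing run stays quiet, and a failing run shows you everything the tests printed, which is exactly the part that gets lost today.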
Conclusion
That brings us to the end of our quick tour. Hopefully this provides you with the information you need to get started using virtualenv and pip with Django. It's not that complicated. Actually, once you have your bearings, it's downright easy. The problem is more that people who have trodden down this path haven't documented their way. Hopefully, this post helps serve as a rough map.
Thanks
I'd like to thank James Bennett for illuminating a few pieces of this
puzzle for me (in particular, pointing me toward settings.configure) and for
previewing a draft of the article before I posted. I'd also like to thank
Jeff Triplett for his comments on a draft and pointing out an unexplained
inconsistency with the rest of the world's examples of virtualenv. And, as
always, my good buddy Roder for his constant encouragement.
22 December
The problem with Python namespaces modules (or, Python Namespaces. There be dragons this way.)
Posted by Travis Swicegood
Yesterday I lamented the issues with namespaces in Python. It's not really the namespaces, it's the marketing of namespaces. Newbies to the community (something I still consider myself for most purposes) are drawn to modules thinking that there's a one-to-one relationship between file hierarchy and namespaces. And there is. Well, sort of.
You have to read the entire manual or happen to have someone to point out the difference between namespaces and modules to even realize there is a difference. Under most circumstances, you won't even realize they are different until you start to do something slightly complex. Say, for example, building an application with multiple modules inside a similar namespace, each module in a separate repository and its own history. There needs to be a "there be dragons" warning to let people know.
Those dragons are these: you have two paths inside sys.path that contain similar code, as I do with all of the Domain51 code. Package one has foo as a module while package two contains bar. The assumption would be that Python acts like most other languages and would exhaust its sys.path trying to find both packages, but that's not the case. It'll get to the first one, then pretend the second doesn't exist.
The fix for this is explicitness, one of Python's cardinal virtues. You have to declare the namespace in order for it to work. You hardly see this anywhere in Python because Pythonistas feel that namespaces are a bad idea, so here's the code you need to include in your __init__.py file to make it declare itself as a namespace:
import pkg_resources
pkg_resources.declare_namespace(__name__)
That's it. Now Python becomes smart again, and you can have real namespaces with similar directory structures existing side-by-side. Python is perfectly capable of finding them - now. Which raises an interesting question: why does Python scan the entire sys.path looking for files, building up this list of what declares what namespaces and where, only to ignore it later unless they're explicit about it? I haven't dived into the source to be sure, but it seems it would have to scan the __init__.py files in order to know whether there's something there.
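To see the effect end to end, here is a minimal, self-contained sketch. It swaps in the standard library's pkgutil.extend_path for pkg_resources.declare_namespace so it runs without setuptools; the two serve the same purpose here but are not the same call. The package name demo_ns, the helper make_portion, and the temporary directories are all made up for illustration.

```python
import importlib
import os
import sys
import tempfile

# Each __init__.py extends the package's __path__ with every matching
# directory found on sys.path (stdlib stand-in for declare_namespace).
INIT = "from pkgutil import extend_path\n__path__ = extend_path(__path__, __name__)\n"


def make_portion(root, module, value):
    """Create root/demo_ns/<module>.py plus the namespace-declaring __init__.py."""
    package_dir = os.path.join(root, "demo_ns")
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
        handle.write(INIT)
    with open(os.path.join(package_dir, module + ".py"), "w") as handle:
        handle.write("NAME = %r\n" % value)


# Two separate sys.path entries, each holding part of the same package.
first = tempfile.mkdtemp()
second = tempfile.mkdtemp()
make_portion(first, "foo", "foo from the first path")
make_portion(second, "bar", "bar from the second path")

sys.path[:0] = [first, second]
importlib.invalidate_caches()

foo = importlib.import_module("demo_ns.foo")
bar = importlib.import_module("demo_ns.bar")
print(foo.NAME)
print(bar.NAME)
```

Remove the two lines in INIT and the second import fails, because the package found in the first directory is the only one Python consults.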
But I digress. There's a bigger dragon that's not even hinted at. Python's inability to find modules.
Take, for example, my python-stupidity repository on GitHub. Run the test.py file and you can see the error for yourself. There are two barfoo modules within the path, but Python decides to act the village dolt and stop as soon as it hits the first one that might match. This particular case is caused by foobar trying to import a method from barfoo that doesn't exist in foobar.barfoo.
This is, in my opinion, a huge issue. Note that foobar.barfoo declared its namespace. It said loudly, "I am me", and Python ignored that fact in favor of relative includes. Not only that, but it stopped and started pouting as soon as one module that said it was foobar.barfoo couldn't match.
Why not finish looking through the rest of the sys.path? Why not pay attention to that precious declaration Python wants you to add to explicitly become a namespace?
Like almost all problems with programming languages, however, there is a fix. At first glance, I thought it might be the way PHP handled it - just include a separator at the beginning of the import. That didn't work, but in searching for the solution, I found out that Python supports relative imports through its support of intra-package references. The fix within the foobar module is to do from ..barfoo import base_barfoo, but this only covers you if you're in Python 2.5 or later.
According to my understanding of it, you use it to explicitly say "I want a sibling module named X" without having to declare the entire namespace or accidentally picking up a module from the global namespace. Fair enough, but my solution to the problem above is to use the relative import to trick Python into thinking it couldn't find a module named the same.
You can see the code in the d51.django.auth package. I have a d51.django.auth.facebook module which takes precedence over PyFacebook's facebook module, but only inside d51.django.auth.
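For anyone who wants to poke at the explicit relative import in isolation, here is a minimal sketch. The layout is hypothetical (a demo_pkg package holding barfoo and foobar side by side, plus a global barfoo that lacks base_barfoo) and is not the layout of the python-stupidity repository or of d51.django.auth. Keep in mind that Python 3 treats every undotted import as absolute, so the implicit lookup that caused the original problem is specific to Python 2.

```python
import importlib
import os
import sys
import tempfile


def write(path, text=""):
    """Create path (and any missing parent directories) with the given text."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write(text)


root = tempfile.mkdtemp()
# A global barfoo that lacks base_barfoo, standing in for the unwanted match.
write(os.path.join(root, "barfoo.py"), "WHO = 'global barfoo'\n")
write(os.path.join(root, "demo_pkg", "__init__.py"))
write(
    os.path.join(root, "demo_pkg", "barfoo.py"),
    "def base_barfoo():\n    return 'demo_pkg.barfoo'\n",
)
# ".." climbs from demo_pkg.foobar up to demo_pkg, then selects its barfoo.
write(
    os.path.join(root, "demo_pkg", "foobar", "__init__.py"),
    "from ..barfoo import base_barfoo\n",
)

sys.path.insert(0, root)
importlib.invalidate_caches()

foobar = importlib.import_module("demo_pkg.foobar")
print(foobar.base_barfoo())
```

The leading dots spell out which barfoo is meant, so the global module of the same name never enters the picture.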
I'm not saying that namespaces are a bad idea in Python or any other language. I'll gladly take namespaces, in any form I can get them and use them. They provide a great way to segregate code into small, independent, re-usable packages while continuing to say "I'm from over here." They allow facebook to be used as a module in multiple places without causing an issue, other than the ones listed here.
No, my problem is not with namespaces. My problem is with Python's current method for searching for them: its failure to expose namespaces and modules and their differences up front, and its brain-dead way of halting on the first partial hit. I'm amazed that a language that prides itself on explicitness, on not doing anything that's not asked for, decides that it's OK to stop looking for matching code just because it found one thing that doesn't match. It smacks of premature optimization.
Keith Casey's recent post on book recommendations got me thinking. I get asked what books I recommend. I whole-heartedly agree with his first recommendation. The Pragmatic Programmer sits on my desk and is often in my laptop bag. I reach for it if I have five minutes and want to flip through something technical without having to load up my Google Reader.
Not on his list is The Passionate Programmer. I highly recommend this book to anyone working in the industry. I've reviewed it on Amazon; feel free to check that out for more information.
But this post isn't about what I recommend, it's what I don't. There are two books that Keith recommends that I no longer recommend off the bat to programmers. PEAA and Refactoring. Both are excellent books, but neither should be read by people starting out.
Why? The books are both excellent. I've read both. I own both. I do recommend them, some of the time. Both are geared toward programmers that already know how to program. Both contain a wealth of information which, in the right hands, can take the development of the reader to the next level.
But they also contain information that can be abused tremendously. For example, Active Record in its abused, single-word form of ActiveRecord has caused a tremendous amount of scaling issues, all because it was a Fowler-endorsed method for dealing with data in a database. Developers without an understanding of its limitations started copying what Rails did and in the process mixed business logic and database logic into the same object. What happens when your data needs to be stored in a flat file or a NoSQL-style database? Your business logic is now all tied up in your database and it's painful to extract that and retrofit your application to be scalable.
Of course, since you have Refactoring you can fix that. Right? Well, sort of. Patterns and refactoring go hand in hand. Ideally, refactoring code should lead to some more understandable, reusable code; something that's more easily explained to another developer. Patterns fill the bill. Nowhere in Refactoring or Joshua Kerievsky's Refactoring to Patterns will you find anything about pattern hopping. To fix an improper use of ActiveRecord by moving to another pattern, however, you end up doing exactly that.
Couple that with the sweater string issue. We've all had a loose string on a sweater, right? You give it a tug instinctively to get rid of it, and before you know it a hem is gone and the sweater is ruined. Refactoring, in the wrong hands, causes the same thing.
Well, I need to extract this logic out. Next this piece needs to be mutable. I should inject this object so I can replace it out later if I need to. And so on, and so on, ad infinitum.
Both patterns and refactoring are powerful tools, but not in the hands of a novice. Both books should be read, but not when you're starting out. Get your hands on Pragmatic and Passionate, spend some time looking for a mentor to help guide you down the right path. Once you get your feet under you as a programmer, then start learning about the mechanics of patterns and refactoring. Once you're to that point, PEAA and Refactoring are great material.
Wow! Has it already been nearly a week since ZendCon wrapped up?! Time is flying right now, and with IPC and the holidays right around the corner, I don't see it getting any less hectic.
ZendCon was fun. This was my first time attending/presenting at this particular conference. It had a distinctly different feel to it than the conferences put on by Marco and company. ZC felt more corporate than tek does, but that's not necessarily a bad thing. We tend to forget that if it weren't for the business interests at play none of us would have jobs.
I made it to only a handful of talks, but everything I attended was top-notch. I'd highly recommend checking out Josh Holmes' The Lost Art of Simplicity talk. He presented it in the uncon, but accept it as a keynote if he submits it to your conference.
Speaking of Microsoft (Josh works for them) they stepped up in a huge way Wednesday night. The topic of what to do after the Ask Zend session that night and the idea of heading up to San Francisco came up. Josh jumped into gear and not only got us into the Bing party that was going on in conjunction with Web 2.0, but also scored us all copies of Windows 7 and a party bus to get us there and back!
Microsoft catches a lot of crap from the open-source community, some of it warranted, but they're heading in the right direction as long as they keep hiring guys like Josh and making them the face of Microsoft. The company will have no choice but to change if it's made up of guys who are really taking the blue monster to heart.
The cynic in me has to ask if this isn't just a long-view version of embrace, extend, and extinguish via slowly causing liver failure in the main contributors to open source! :-D
I also finally got a Fork You shirt from the guys over at GitHub. I'm thinking of forking the shirt design and updating it to say:
Fork Me
On the Git front, I might have made Elizabeth M. Smith mad enough at the state of the current Git Windows tools to actually do something about it. She started hacking a bit on it at the conference, and I'm sure we'll see something by the end of the year (right, Liz?).
Next up for me (and my final conference of 2009) is the International PHP Conference in Germany. I'll be giving a talk titled Building Real-Time Applications using XMPP. Meg is tagging along and we're going to see some of southern Germany after the conference. Definitely looking forward to the time off on another continent.
