Pragmatic Guide to Git
My latest book on Git is available as a beta book and for pre-order
My latest book on Git, Pragmatic Guide to Git, goes on sale today! I’m really excited about this new book. It’s the culmination of a year’s worth of work on an entirely new format for Pragmatic Bookshelf.
From today’s announcement:
Last summer, Pragmatic author Travis Swicegood proposed a new kind of book covering the popular version control system, Git. We thought it was a good idea–so much so that we’ve launched a whole new series in this format. Check out the details on the series below, as well as Travis’ new book, Pragmatic Guide to Git, now available in beta.
Having a good idea is cool. Having a good idea that you can get other people excited about too is even better. I thought I was onto something interesting when I pitched the idea to Prag, and so did everyone else there. A year later and here we are with an entire new series lined up.
Here’s the quick description of the new book:
Need to learn how to wrap your head around Git, but don’t need a lot of hand holding? Grab this book if you’re new to Git, not to the world of programming. Git tasks displayed on two-page spreads provide all the context you need, without the extra fluff.
Visualization
How to make meaning out of raw numbers
David McCandless gave an excellent TED talk on visualization of data. The take-away quote from this was:
Data is the new soil
I love that line. Data gives rise to new ways of understanding something. Presenting something spatially gives you the ability to put it in context more quickly than trying to explain the context.
Take 20 minutes; this video is well worth the time investment:
Razors and Development
Story of my razors
A few years ago I switched to an old-fashioned safety razor and haven’t looked back. The latest entry into the razor market has reaffirmed my decision as the right one.
The new Pro Glide from Gillette costs less than $10 to purchase. Good deal, right? Nope. The replacement blades cost $3-$4 each! Assuming you get a few weeks out of each blade, you’re looking at paying between $6 and $10 every month to use this razor.
It’s a great deal—for Gillette.
I use a Merkur razor. I paid a lot, comparatively speaking, up front, but I can buy better quality blades for less than $0.75 each. They last a lot longer and I end up with a much better shave.
I view the trade-off here as the same one you have to look at when deciding what framework you choose to develop your code in. There are a lot of frameworks that provide a lot of help getting off of the ground. It almost seems too easy.
Write your own custom blog in 5 minutes? Sure! Want to have a RESTful API? Add a couple of classes, some new routes, mark as complete.
Look at the framework and read some of the comments from its detractors. Those complaining generally have one of two problems:
- They’re going to complain about anything, they’re just ranting. Ignore these people.
- They’ve hit a legitimate pain point in the framework where they deviated too far from the intended use. Pay attention to what these people are talking about.
If your application is significantly complex, no off-the-shelf framework is going to do everything you need it to. Some frameworks may even get in the way. Make sure you realize the trade-offs before you commit.
What makes a good framework?
The best ones serve as scaffolding—in the original meaning.
… a temporary structure used to support people and material in the construction or repair of buildings and other large structures.
Put another way for software development:
… helps you ramp up quickly, then gets out of the way.
Historically, frameworks manage the first part of this well. That’s where they shine. It’s the last part that they’ve had a problem with.
Django manages both of these well. My one complaint with it is that it manages the latter part better than the first. There’s a lot of boilerplate needed to get started, but I can live with that. When my applications outgrow Django, removing Django from the equation is easy with one exception.
Models.
Models are like your razor’s blades. Without blades, your razor doesn’t shave; without models, your application doesn’t have any data to work with. The fix I’ve found works best for me is to keep my models thin and put all of my logic for operating on them in other areas of the code base.
This separation helps me keep my business logic portable. I might be using the cheap route to get started, but the heavy lifting goes with me if I decide something else is a better fit.
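Here's a rough sketch of what that separation can look like. The names (Invoice, total_due, the tax rate) are made up for illustration, and a plain dataclass stands in for a Django model so the example runs on its own. The model holds data only; the business rule lives in a function that knows nothing about the framework.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Invoice:
    """Stand-in for a thin model: fields only, no behavior."""
    subtotal: Decimal
    tax_rate: Decimal


def total_due(invoice):
    """Business logic kept outside the model so it can move with you."""
    return invoice.subtotal * (Decimal("1") + invoice.tax_rate)


# Any object with subtotal and tax_rate attributes works here, whether it
# comes from Django or from whatever replaces it later.
invoice = Invoice(subtotal=Decimal("100.00"), tax_rate=Decimal("0.08"))
amount = total_due(invoice)
```

Because total_due only relies on two attributes, swapping the storage layer later means rewriting Invoice, not the logic.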
The Gamepocalypse
Jesse Schell on our future corporate gaming overlords
Good, or bad? Jesse Schell envisions a future where every aspect of our daily lives is rewarded by corporate gaming overlords.
Honestly, how different is this from the current frequent shopper cards? Our rewards aren’t points (at least not all of the time); they’re lower prices. A lot of our credit cards are already points based. “Get a million points, fly anywhere.”
Stances
Musing on writing style and company cultures
Yesterday (essentially), I woke up and decided to stir the pot a little bit. I had come across this post while looking into the company behind MapBox. As I noted, I do this to get a feel for what the company is doing. What technology is important to them? What do they value? Are they funded, and so on, and so on.
That particular job posting rubbed me the wrong way, however, because of its emphatic no telecommute policy, with them going so far as to call attention to it with a bold font. To automatically shut the door to any potential employee who might be an amazing fit, just not in their particular office, seems extremely one-sided.
I woke up ready to write, and did. Before writing my first book, my post would have been rather mellow. I would have presented the case, but not gone quite so far in making it. This post, I decided to let it rip. Companies that don’t allow telecommuting don’t get it, end of discussion.
And boy did it work. Nearly 90 comments—between Hacker News and my blog—later, I’m still amazed at the fervor with which both sides attacked the issue. Almost to a T, no one was in the middle on this. Everyone had an opinion. They loved it, or they hated it. Everything from management types saying how their teams wouldn’t be the same if IRC was their only interaction to people with severe social anxiety talking about how telecommuting affords them the opportunity to deal with that anxiety by focusing on their job, not their proximity to people.
I like to mine the edges of the conversation to get the bigger picture. My post emphasized the importance of taking a stand. Had I not been so opinionated, I doubt much interest would have been shown in the post.
Honestly, the post is a little harsher than my personal opinion on the subject. Companies that start the conversation with prospective employees by outlining the things they aren’t going to stand for are starting off on the wrong foot, but beyond that I realize that telecommuting does present challenges in some environments and some companies—at their loss—aren’t willing to try to overcome those challenges.
A few months back a recruiter (the flaky kind, it turns out) was talking with me about a gig. I was honest that I wasn’t looking to move, but for the right opportunity everything was on the table. I told him:
I want to be up front with you. I’m not looking to move, but I’m not ruling it out. All I ask for in return is that you approach this with the same open-mindedness. Let’s continue the process, see if it’s a good fit, then figure out if we can make the logistics work.
Needless to say, they didn’t. Companies, particularly in the tech space, are asking their potential employees to take a chance on them. Companies are taking a risk too, but when companies expect the employees to be the only ones giving (broadly disregarding monetary compensation from the companies for the moment), they set the wrong tone.
Telecommuting Culture
Cutting through job postings
A few years back, I’m not even sure when, I started looking at job postings of companies I found interesting. The point wasn’t to find a new job, but to understand the company a bit more.
You can almost always discover what technology a company is using if they’re hiring. I found out Plurk uses Python through this method, that Twitter hasn’t given up on Ruby, and even when it doesn’t make sense Washington DC shops still use Drupal.
That last company is what gave rise to this tweet:

They’re pretty blunt about telecommuting, actually. Going so far as to say “it is not ok to telecommute.” The emphasis is theirs (as seen in this Skitch). This gives me a couple of possible explanations:
- They’ve been burned in the past. They hired someone on who misrepresented himself, his abilities, his capacity, or all of the above. They feel that they gave it a try and it isn’t for them, so they’re not going down that road again. This is the most generous possibility.
- They don’t have their heads screwed on right. They live in a “constantly evolving and fast-paced” environment (as seen on other job postings) which translates roughly into we can’t control our product team, our CEO, or worse, what our sales team promises, so you’re going to have to sit by and wait for the hour-by-hour priorities. Communicating them by any other means than directly from our mouth to your ass, sitting at a desk waiting to turn our ideas into money via that magical electronic device in front of you, is too inefficient.
- A slight variation of the previous option is that they lack the confidence in their abilities. They have the vision, the product specs are nailed down, but they don’t know if they can convey them without “reading” the other person.
- Finally, the last possible option is that they’re so out of touch with reality that they believe the only way for work to be done is with “asses in seats.”
Of these options, shy away (no, run, very quickly, away) from the 2nd and 4th. Those companies don’t get it. The other two provide some hope; it’s up to you whether to stick around (or start) and find out.
Companies that don’t allow telecommuting, especially companies that are up front about it, don’t get it. Telecommuting is no longer a technical challenge. Reliable high-speed Internet is ubiquitous at this point. I can see adding that as a requirement to a job posting, but not where you’re located.
With the technical out of the way, that leaves only the social aspect of the company and the position they’re hiring for. I get the culture that requires someone on site. I also realize I’m not interested in working for a company with that culture.
It’s the culture that specs and requirements are fluid based on the latest hallway session. It says interruptions are a part of the day that you should be used to—we like to chat a lot. It’s the we don’t have any realistic metrics to use to judge your performance, so we can only know if you’re doing work by whether or not you show up. That last one has worked so well in the past.
The world’s a big place. There’s a lot of developers in it. There’s even a few kick ass ones. Most of those don’t live near you, even if “this position is located at our offices in Palo Alto,” which is another euphemism for no telecommute. By being up front about telecommuting not being an option, you’re telling me that the company comes first (we go to the mountain). Hiring someone good is important, but hiring the best you can afford isn’t.
A company that doesn’t go out of its way to build the best team possible, regardless of where they’re located, isn’t a company I want to work for. It’s a matter of priorities. I’ve got mine straight, they don’t have theirs.
Critical Thinking
Musing on the Twitter Generation
By now, I hope we’ve all read Is Google Making Us Stupid from The Atlantic. Here’s the quick version, which ironically makes the author’s point:
People are changing the way they interact with the world, which is changing the way their brains model the world, which is changing their brains. We’re not exercising the critical, reflective part of our brains; instead, we’re using almost exclusively the parts that focus on quick, immediate activities. We’re skimming everything rather than thinking more critically, more reflectively.
The reaction to the Shirley Sherrod case is an excellent example of this. Rather than step back, the NAACP and White House jumped on it and stepped in the heaping pile of burning feces that was laid at their doorstep.
So what’s this got to do with thinking and Twitter and such? Well, it’s the reason I’ve started blogging again. I don’t want to lose my ability, what little it sometimes appears to be, of critical thinking and reasoning so I’m starting to blog more.
I noticed my thought process was becoming much more reactionary and shallow. I was allowing the insignificant to bubble up and take over. Rather than ponder something and try to write it out, I was firing off a quick tweet and calling it a day. Couldn’t fit it in a tweet? Oh well, it must not be that important.
So that’s why I’m coming back to blogging. The first four years I ran this blog, I had over 100 posts each year—closing in on 200 in 2006. Starting in 2008, however, I let it slide. 66, 22, and then only 13 posts so far this year. I’m working on reversing that trend.
Enough Complaining
It's all bad code. Can we move on?
I’ve spent a lot of time working on a lot of different code. I’m fortunate in that I’ve been exposed to all three of the modern scripting languages and their communities. All of them, to varying degrees, bemoan their code. All of them, way too much, bemoan the code of everyone else.
Ruby is too clever, I can never figure out what’s going on or what to expect.
Python developers can’t write a test to save their lives.
PHP developers, well, do I really need to say more?
Everyone’s moaning, from the top to the bottom about something with software. To listen to them, you’d think we had no tools to do anything and that all software sucked.
I’ve got news for you. It all does, so can we please move on now?
This isn’t a new concept. Actually, I’m stealing it mostly from Chris and his keynote at Ruby Midwest, but I’m coming at it from a slightly different point of view.
I had the unfortunate task of talking with a client today about the state of their code and recommending that they put a few small tweaks on hold until a larger revisiting could happen. Ultimately, it’s their decision and I’ll do what they want, but it’s going to cost them more for what can only be a temporary solution now, versus revisiting it in a few months as part of a larger redesign.
Why? Because their current code sucks. Everything is hard coded, with nary a loop in sight. There’s HTML mixed up in their business logic and business logic in their HTML. They have a haphazard layout to their project. Form validation? They got it. To the tune of if statements in functions that are hundreds of lines long. Touching one part invariably breaks two other parts because they depended on that particular HTML tag to exist in that particular place, with those types of siblings, etc., etc., etc.
This code is the definition of spaghetti code. Adding features that are less than 30 minutes on any normal project are taking upwards of 5 to 6 hours, most of it in testing and fixes for other things you break along the way.
I’m relatively new to the project and wasn’t brought on in a code review capacity. I was explaining my reasoning and a knowing grin came across their face. “Well, I’ve been told that most developers hate projects that they come into when they’re new to it because they didn’t code it. It’s not how they would have done it.”
Gee, thanks. Every single developer out there who’s complained because someone used a different bracing system or a different naming convention than they would have, or any one of the hundreds (if not thousands) of other asinine things coders like to complain about, makes it impossible for me to convey to a new client that his code truly is in need of some serious help because “developers hate projects.”
Can we all please shut up?
My point is this. Code sucks. It’s all a compromise to take an idea that we (or our clients) have formed and turn it into something machines can understand. It’s all trade offs. There’s always another way to approach something, a different language it could be written in, a more scalable data back end that could have been used. Always.
So why spend time talking about it? All we’re doing is making it easier for the people who need to be listening to us to tune us out. They don’t know the difference between someone like me complaining about the internal structure of WordPress (which I think is an abomination) and my explaining that it is the wrong choice for their project (which I’d argue is the case for at least half of business deployments of WP). It’s not their job to know and understand the difference, but it is ours. It’s also our job to be able to convey that information to them.
This reminds me of a few tweets from Ivo (if I recall correctly) during Confoo. One of the speakers had tweeted about the wifi and issues they were having with it. Then a bunch of people started “me too” tweeting about how bad it was and Ivo called the original person out on it. What good did it do for a speaker to complain publicly about a situation that was already known (and being worked on, I might add) other than to encourage everyone else to follow suit?
We’re doing the same thing every time we bitch and moan about relatively insignificant things in code. We’re doing everyone a disservice by increasing the noise-to-signal ratio when we complain for no good reason. So please, let’s all put a sock in it.
Show me the Code
How to lose my interest in your project
I’m lazy when it comes to code. Not in a bad way, but in an efficient way. I want to get to the crux of the matter quickly and move on. Truth be told, that’s why I like TDD—I don’t have to remember anything more than I need to know right now. My tests remember everything else I knew, but I digress.
When I start evaluating a new library to see if it’s something I want to use, code is what I want to see. Sure, tell me what it does, briefly, then move into the basic use-case. As an example, consider the front page of Ruby-Lang.org.
You’ve got access to all the information you need, but right up front and center (almost literally) is some code showing you how to use Ruby.
If you’re starting a new library, make sure to put the code right up front. I know there are any number of projects I have on GitHub that fail this test, but I’m not out purchasing domain names and designing custom sites for my projects either. If you bill your project as “easy” and “powerful”, show me, don’t tell me.
Otherwise, you’re no better than the phone commercials that promise the world will bow before me if only I use their phone, without, you know, showing their phone being used.
Old code, new home
Moving some old PHP code to GitHub
Finally got around to converting some old code from SVN to Git and getting it up on GitHub. It’s like looking back through a time-warp actually, as most of the code hasn’t been touched since the summer of 2007.
Nearly all of the code is usable, but it’s all abandoned at this point. If there’s something there that strikes your fancy and you’d be interested in forking it into your own project, feel free.
There are still a few more to go, but you can start checking them out now at the Domain51 GitHub account. Just search for Domain51_ to filter the listing as they’re all named in the old PEAR-style package naming scheme.
New Site is Live
New Site Design; New Blog Engine
This may be premature, but it looks like I’m live with the new site design and new blog engine. The design is HTML5 (i.e., it looks great to me in Chrome, not sure what it’ll be like elsewhere) and the new engine is Jekyll.
What does this mean for you, my loyal reader? Not much, really. I believe my port is transparent.
Actually, the only problem I’m seeing right now is related to Disqus: some of my comments that I know are imported are not showing up yet. I just dumped nearly 2,000 comments into their system for this blog, so my guess is that it’s a caching issue and they’ll catch up. The comment counts are correct inside their admin interface, so I know the comments are in their system somewhere. :-)
If anything looks out of whack, please let me know.
Real-life global hell
Or, why global state is bad by example
Lately I’ve been playing with testing frameworks all over the spectrum of languages. I’ve come to really enjoy using Cucumber for testing web APIs. Since most of my coding lately has been in JavaScript or Python, using Ruby with Cucumber allows me to completely segregate my tests from the system under test (SUT). This separation has worked great until recently when I needed to have the test system running in Python.
I started looking at the Python bridge in Cucumber and that’s when I came across Lettuce. It’s a Cucumber-inspired, Python BDD library. I liked the syntax, and it had built-in Django support and tons of tests (including functional and integration), so I was ready to go.
Then I ran the test suite. And it failed.
A failing test suite is a massive red flag for me with any project. In a testing framework, it’s a nuclear launch siren. I poked around a bit and figured out what was triggering the failure, but not why, so I opened a bug report and found out that the tests were never meant to be run together.
I let it go, but last weekend decided I was going to dig into the framework, figure out what was causing the tests to fail if the functional tests were run before the unit tests, and submit a patch.
I spent the better part of Sunday afternoon cursing at code, trying various paths of exploration, trying to grok the entire framework’s codebase to understand what was happening. I ended up going so far as doing the equivalent of var_dump() debugging (pdb didn’t prove very helpful because of the intense setup required before the tests started running).
Finding the code that was causing the problem was easy: modifications to lettuce.registry.STEP_REGISTRY were causing the failure. Figuring out how to fix that proved more difficult.
The issue, it turned out, was global state. The unit tests assumed that once they were set up, the state wouldn’t change. The functional tests didn’t much care for that and stomped all over the registry of steps. By the time the unit tests rolled around, the steps that were so carefully defined inside the unit test modules were now gone and the test suite was throwing failures.
I finally landed on the solution by redefining all of the steps inside a @with_setup() for the tests that need them in the unit tests. It brings up a couple of interesting learning moments.
First, this shows the need to make no assumptions when writing tests. Need a database connection? Make sure it’s initialized and ready for each test that uses it. Want to make sure a step is defined for the test you’re checking in a BDD framework? Define it immediately before running the test. It’s a good example of defensive coding.
Second, it shows the mess that global state can create. Each of the test modules was being loaded by nose, then the tests were being executed after a global state (the STEP_REGISTRY object) was defined. When other tests changed that state, things started falsely failing.
The “fix” currently is to reset that state to what you expect every time, but this causes issues with tests running in parallel. What happens when two of the same tests both try to reset the state at the same time? Don’t know, I haven’t tried it yet, but I imagine it’s gonna cause some more failures.
My fix now is short-term (and was included) and it gets the job done. Hopefully, this shows you a bit about what developers mean when they say that global state is a bad thing that leads to tricky bugs that are hard to comprehend. In this case, I literally had to understand the entire step definition system in order to comprehend what was happening here.
The test cases at least gave me some guard rails to help guide me toward the solution, but had there not been global state in the first place, these tests would have worked across the board with no problems.
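To make the failure mode concrete, here's a stripped-down sketch. It is not Lettuce's code: the registry, the step names, and the functions are all hypothetical stand-ins, with a module-level dict playing the role of STEP_REGISTRY. It shows one test clobbering shared state and a per-test setup putting it back.

```python
# Hypothetical stand-in for a module-level registry like STEP_REGISTRY.
STEP_REGISTRY = {}


def register_step(pattern, func):
    STEP_REGISTRY[pattern] = func


def functional_test():
    # Stomps on the shared registry, as the functional tests did.
    STEP_REGISTRY.clear()
    register_step("I visit the home page", lambda: "visited")


def define_unit_steps():
    # Setup that redefines exactly the steps the unit test relies on.
    STEP_REGISTRY.clear()
    register_step("I have a number", lambda: 42)


def unit_test():
    return STEP_REGISTRY["I have a number"]()


define_unit_steps()   # steps defined once, up front
functional_test()     # another test changes the global state
step_survived = "I have a number" in STEP_REGISTRY

define_unit_steps()   # per-test setup restores the expected state
result = unit_test()
```

Without the second define_unit_steps() call, unit_test() raises a KeyError, which is roughly the kind of false failure described above. The reset works, but it's still every test fighting over one shared dict.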
Using Twitter OAuth Properly
This is it. I've had enough! Seriously, people. OAuth is about maintaining control as a user and everyone wants me to give it up! I'm tired of constantly clicking deny.
What am I complaining about? The constant abuse of Twitter OAuth login. Every site that I've visited that uses Twitter OAuth requires both read and write access to my account. The latest to do this is Paper.li, a service that looks really cool,
but…

So what's the fix? Websites should ask for the minimum amount of information needed to get started. In nearly every single case, the sites are using it for login purposes. Instead of a username and password, you talk to Twitter to verify that you have a legitimate user. Those "Tweet This For Me" buttons are optional add-ons that you can do.
You should handle those automatic cases by performing an upgrade when the user decides they want to allow your application to update for them. Unfortunately, Twitter doesn't allow you to specify which level of access you want when you request a token; you have to do it when you set up your application.
Registering two applications is an easy solution to this problem. You use the read-only application for authentication, then switch to the other app when you're attempting to write. It requires a little overhead when you store the authentication token, but it's trivial to store a flag showing which set of credentials to use.
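A minimal sketch of that bookkeeping follows. It makes no calls to Twitter at all; the credential values, the APPS table, and both helper functions are placeholders I made up to show the flag idea, and the in-memory dict stands in for wherever you actually store tokens.

```python
# Placeholder credentials for two hypothetical registered applications.
APPS = {
    "read": {"consumer_key": "READ_KEY", "consumer_secret": "READ_SECRET"},
    "write": {"consumer_key": "WRITE_KEY", "consumer_secret": "WRITE_SECRET"},
}


def store_token(tokens, user_id, token, secret, app):
    """Save a user's token along with a flag for the app that issued it."""
    tokens[(user_id, app)] = {"token": token, "secret": secret, "app": app}


def credentials_for(tokens, user_id, needs_write):
    """Return (app credentials, user token), or None if the user must upgrade."""
    app = "write" if needs_write else "read"
    stored = tokens.get((user_id, app))
    if stored is None:
        return None  # send the user through authorization for that app
    return APPS[app], stored


tokens = {}
store_token(tokens, 1, "abc", "shh", app="read")
login = credentials_for(tokens, 1, needs_write=False)
tweet = credentials_for(tokens, 1, needs_write=True)  # no write token yet
```

The user only sees the read/write authorization screen the first time they ask the site to post for them.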
Honestly, I know most applications are completely trustworthy, especially those I've found through recommendations of others, but it's still unnerving to give 100% access to my account to a new service for the sheer pleasure of being able to log in and see if I like it. It should be to you too.
MongoDB: A first look
The entire subject of two talks and mentioned in several others, MongoDB was definitely a buzz at TekX this year. It's long been in favor in the tech community in Lawrence and has been used for some data crunching for a few projects at the local paper. Even with all of this exposure, I've yet to sit down and actually explore it.
That changed Friday afternoon while I sat at O'Hare waiting on my flight back to Lawrence (which subsequently got canceled). I had installed Mongo earlier in the week and opened up a bunch of tabs on the various intros and tutorials available on the Mongo wiki. The rest of this article is a mix of stream-of-consciousness notes from playing around with Mongo for the first time and some of my reflections from this past week.
Note on typefaces
I use both Mongo and mongo throughout this article. The first, the
title-case Mongo, refers to the software as a whole. Whenever you see mongo
in lowercase and in monospace, it's referring to the Mongo client program
you run from the command line.
Installation
On a Mac, it's a breeze. I use Homebrew to manage software on my Mac, so a
quick brew install mongodb was all I needed and a minute later I was ready to
go.
Starting Up the Server
Mongo is run by the mongod process. I don't know if it's pronounced
mongo-d or mon-god though. It's a fun play on words if the latter is the
case.
Brew includes a basic configuration to get up and running, so I use that inside
a screen instance so I can leave it running in the background while I use the
mongo tool to interact with it.
Interacting with Mongo
I started out with the basic tutorial to get going. It looks like that needs some love though. It shows the version in the startup as 0.9.8. Homebrew ships with 1.4.2 and I did find a few things that were out of date. No, I haven't been a good open source community member and submitted fixes yet.
The first thing that's different from a traditional RDBMS with Mongo is that
you don't have to explicitly create a database. It's pretty straightforward: from
within mongo, type use <database>. This creates a brand new database for
you and you're off. For the examples below, I'm using use mydb to select
mydb as my database.
It's kind of nice to just be able to connect and go, but it feels odd. Not
good or bad, just odd. Sort of like the first time you run git checkout
inside a repository to switch branches when you're used to Subversion.
The shell feels like a JavaScript console. I don't have access to the source code in my off-line mode, so I can't say for sure whether it is one. The syntax seems remarkably similar, so it's at least JavaScript inspired.
Adding Records
Mongo stores documents, not rows of columns. This distinction allows Mongo to ignore schema, continuing the theme of leaving it up to the developer. Those documents can be made up of any number of key-value pairs that look remarkably like JSON. Need to store a new data point? Just add it as a field to a document and you're set.
Here's an example inspired by Mongo's tutorial for adding a few records:
> person = {name: "Travis Swicegood"}
> city = {city: "Lawrence", state: "KS"}
> db.things.save(person)
> db.things.save(city)
Here I created two new objects with various data attached to them, then saved
them all inside the things collection. Collections in Mongo are like a table
inside the SQL world. You don't have to create a collection, you just declare
it on the db object, and you're set.
Comparing this to the same code in a relational database, I've got to say I love this. No boilerplate code to get going. I didn't have to create a database or any tables. I just started using them. This appeals to my laziness (err, I mean desire for efficiency), but it also looks very promising for teaching someone new. Every abstract idea you can remove is one less potential stumbling block for someone starting out.
Back to the data I entered. Notice that the two records share no fields.
Documents in a Mongo collection are made up of a series of keys and values, and
those can be whatever you want them to be. This is perfect for lazy migrations:
migrating the data as it's requested instead of doing it all at once. Ming,
a Python wrapper around Mongo, already provides this. This is especially
useful for large sites with lots of data that may or may not ever be
requested again.
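Here's a small sketch of the lazy migration idea. This is not Ming's API; the schema_version field, the migrate and load functions, and the sample records are all assumptions for illustration, with a plain dict standing in for a collection.

```python
def migrate(doc):
    """Upgrade an old-style document to the current shape; no-op if current."""
    if doc.get("schema_version", 1) < 2:
        first, _, last = doc.pop("name").partition(" ")
        doc.update({"first_name": first, "last_name": last, "schema_version": 2})
    return doc


def load(collection, key):
    """Fetch a document, migrating it only at the moment it is requested."""
    doc = migrate(collection[key])
    collection[key] = doc  # write the upgraded form back
    return doc


things = {
    "a": {"name": "Travis Swicegood"},  # old shape, untouched until read
    "b": {"first_name": "Jane", "last_name": "Doe", "schema_version": 2},
}
person = load(things, "a")
```

Records nobody ever asks for again stay in the old shape and cost nothing to migrate.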
Finding Records
Now that the records are there, it's time to find them. Calling find() on
db.things returns them:
> db.things.find()
{ "_id" : ObjectId("4bf9a96b7d04f51b48499011"), "name" : "Travis Swicegood" }
{ "_id" : ObjectId("4bf9a96f7d04f51b48499012"), "city" : "Lawrence", "state" : "KS" }
That gives me everything. The find method takes optional parameters to
filter the results. This is actually a good time to bring up the built-in help
in mongo. Entering only the name of any function (i.e., without calling it)
displays the implementation of the function:
> db.things.find
function (query, fields, limit, skip) {
    return new DBQuery(
        this._mongo, this._db, this, this._fullName,
        this._massageObject(query), fields, limit, skip);
}
Note: I changed the formatting so it's more easily viewable online.
The parameters are optional (as with any JavaScript function), so you can pass in
as many or as few as you want. Filtering the results is done by providing a
hash for the query parameter (the first one). For example, to find my
record:
> db.things.find({name: "Travis Swicegood"})
{ "_id" : ObjectId("4bf9a96b7d04f51b48499011"),
"name" : "Travis Swicegood" }
One thing you can't do is full-text searching. I can't ask for all of the
records that begin with Travis or have a portion of my name in it. The
current recommendation (at least via the wiki) is to build your own list of
keywords as an array, then search that array. For example:
> var person2 = {name: "Travis Swicegood",
> name_field: ["Travis", "Swicegood"]};
> db.things.save(person2)
> db.things.find({name_field: "Travis"})
{ "_id" : ObjectId("4bf9afa17d04f51b48499014"),
"name" : "Travis Swicegood",
"name_field" : [ "Travis", "Swicegood" ] }
For something like a name, this can be useful. For full-text searching of an article, it's probably best to delegate searching off to something like Solr and let Mongo focus on storage and retrieval.
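A minimal Python sketch of the keyword-array idea follows. The build_keywords and matches helpers are hypothetical names; matches only mimics, in plain Python, the behavior shown in the shell session above (a query value matches an array field when it equals one of the elements) and is not how Mongo implements it.

```python
def build_keywords(name):
    """Split a name into the keyword list stored alongside it."""
    return name.split()


def matches(doc, field, value):
    """Mimic the query above: a value matches an array field when it
    equals one of the array's elements, or a scalar field when equal."""
    stored = doc.get(field)
    if isinstance(stored, list):
        return value in stored
    return stored == value


person2 = {"name": "Travis Swicegood"}
person2["name_field"] = build_keywords(person2["name"])
```

Note that a partial keyword such as "Trav" still would not match; only whole elements of the array do.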
Querying for sub-objects
Of course, I had to try sub-objects to see if they would work:
> db.things.find({person: person2})
{ "_id" : ObjectId("4bf9b02b7d04f51b48499015"),
"person" : { "name" : "Travis Swicegood",
"name_field" : [ "Travis", "Swicegood" ],
"_id" : ObjectId("4bf9afa17d04f51b48499014") },
"city" : { "city" : "Lawrence",
"state" : "KS",
"_id" : ObjectId("4bf9a96f7d04f51b48499012") } }
You can also query using the dot-notation to "reach through" an object and look at its children. This returns the same result as the previous query:
> db.things.find({"person.name_field": "Travis"})
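To make the dot-notation concrete, here is a small Python sketch of how a dotted key can be read against a nested document. The reach_through function is a hypothetical helper written for this example; Mongo resolves these paths on the server, and this only illustrates the idea.

```python
def reach_through(doc, dotted_key):
    """Follow a dotted key such as 'person.name_field' into nested dicts.

    Returns None when any part of the path is missing.
    """
    current = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


record = {
    "person": {"name": "Travis Swicegood",
               "name_field": ["Travis", "Swicegood"]},
    "city": {"city": "Lawrence", "state": "KS"},
}
names = reach_through(record, "person.name_field")
```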
Limiting returned columns
The ability to dynamically add columns to a record definitely provides a
breeding ground for massive documents with lots of keys. Most of the time a
small subset of those keys is all that's needed. The second parameter in find
provides us with that functionality:
> db.things.find({person: person2}, {city:1})
{ "_id" : ObjectId("4bf9b02b7d04f51b48499015"),
"city" : { "city" : "Lawrence",
"state" : "KS",
"_id" : ObjectId("4bf9a96f7d04f51b48499012") } }
Likewise, you can reach through the object and pull out a subfield:
> db.things.find({person: person2}, {"city.state":1})
{ "_id" : ObjectId("4bf9b02b7d04f51b48499015"),
"city" : { "state" : "KS" } }
These examples bring up a syntax thing with Mongo that I'm not crazy about: the use of the number one. It's the standard C style: 1 is true, 0 is false. I'd love to see the client and the libraries adopt an intent-revealing name. Granted, this is a minor niggle, but the little things are what make a good system an amazing one.
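One way an intent-revealing name could look on the client side is sketched below in Python. INCLUDE, EXCLUDE, and only are made-up names for illustration; they are not part of any Mongo client library.

```python
# Hypothetical named constants standing in for the bare 1 and 0.
INCLUDE = 1
EXCLUDE = 0


def only(*field_names):
    """Build the second argument to find from a list of field names."""
    return {name: INCLUDE for name in field_names}


fields = only("city.state")
```

The resulting dictionary is the same one used in the query above; only the spelling at the call site changes.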
Few issues
The docs, being that they are community run and Mongo's still relatively new, are a little loose. I've found a bunch of examples looking through them that don't work the way they were documented.
Another potential issue (or at least something you need to be aware of) is that Mongo's geospatial support isn't 100% yet. They only provide 2d and the math they use assumes that 1° of longitude is the same at the poles as it is at the equator. For many applications, this isn't a huge issue, but if precision is important, Mongo's not ready for this type of use.
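To show why that assumption matters, here is a short Python sketch of how the east-west length of one degree of longitude shrinks with latitude. It assumes a perfectly spherical Earth with a mean radius of 6371 km, which is itself an approximation, and the latitude used for Lawrence (about 38.97° N) is approximate as well.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a perfectly spherical Earth


def km_per_degree_of_longitude(latitude_deg):
    """Approximate east-west length of one degree of longitude."""
    one_degree_at_equator = 2 * math.pi * EARTH_RADIUS_KM / 360
    return one_degree_at_equator * math.cos(math.radians(latitude_deg))


equator = km_per_degree_of_longitude(0)       # roughly 111 km
lawrence = km_per_degree_of_longitude(38.97)  # roughly 86 km
near_pole = km_per_degree_of_longitude(89)    # roughly 2 km
```

Treating every degree as if it were measured at the equator overstates east-west distances more and more as you move toward the poles.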
One thing that I'm looking forward to is Mongo's sharding. That is going to allow Mongo to scale horizontally really well. Some of the initial test results look amazing. What will be really interesting is to see how well it scales down. It's one thing to have over 300,000 ops/sec on a bigger box, another thing to be able to manage it on something like a 1 GB instance on Rackspace Cloudservers.
Two Biggest Issues
First, Mongo's a master-slave system. It appears really robust, but whenever a box takes on a special role I start to get nervous. One of the promises of "NoSQL" is that it provides a tremendous amount of resilience. Any time you start to add special nodes you're taking away from that.
For example, if you're running 5 homogeneous servers and one goes down, the other 4 can pick up the slack—assuming you're not running 5 servers at peak capacity. This makes failure planning easy: figure up the amount of CPU time you need to handle your load, provision that many servers, then add enough servers to be comfortable when they start failing. Need 3 servers, provision 5 and you can have two failures before you peg your machines.
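The arithmetic behind that failure planning can be sketched in a few lines of Python. The servers_to_provision function and its parameter names are hypothetical, and the units of "CPU" are whatever you measure your load in.

```python
import math


def servers_to_provision(load_cpu, cpu_per_server, failures_to_survive):
    """Servers needed to carry the load, plus headroom for failures."""
    needed = math.ceil(load_cpu / cpu_per_server)
    return needed + failures_to_survive


# Load that needs three servers, with room for two failures.
total = servers_to_provision(load_cpu=3.0, cpu_per_server=1.0,
                             failures_to_survive=2)
```

This only works because the servers are interchangeable; a special node cannot be covered by simply adding one more identical box.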
This isn't to say Mongo can't handle failures. Its current model is
rebalancing the load when one of the servers goes out. mongos is the tool to
read up on for handling this. Unfortunately, I haven't been able to dive into
it yet. The only way to know for sure is to build up a cluster then start
killing servers. Of course, this type of testing is preferred for any data
storage system.
Second, the license. I'm not anti-AGPL, but there's some ambiguity. The Mongo
team has addressed this both on the
wiki and through an in-depth
blog post. According to
that, I can write up a service such as MongoHQ and as long as I don't
actually change the mongod or mongos code I'm fine.
On the other hand, most of the definitions I've read of the AGPL mean that code that talks to it is subject to being hit with the AGPL. I don't have any doubts with 10gen, but if they don't always own the copyright
Of course, those last two paragraphs are with the caveat I am not a lawyer.
I think Mongo is an amazingly compelling piece of software in the non-standard database realm. With the upcoming sharding and what I would have to imagine is an imminent fix to the geospatial queries, Mongo's definitely worth a look.
It's about the story, stupid! (non-profits online)
Originally posted over at the horribly designed travis.domain51.com (see what happens when you give a programmer CSS access), I thought I'd share this post here too since that blog is just getting underway.
It truly is a shame that so many amazing non-profits are hidden behind horribly thought out websites. Most of these sites deluge their visitors with information, even though great sites such as Charity Navigator exist to provide raw statistics and facts about non-profits. The problem is that most non-profits are missing the point. Their websites are there to tell a story.
Let me say it again: a non-profit's website is there to tell a story. Nothing else.
People are natural storytellers and are drawn to an authentic story. Each of the websites linked to above has an amazing story behind it just waiting to be unleashed. A story that engages their visitors and potential donors. A story that sticks with them while they navigate their life the next few days. A story that ends with another beginning. One the visitor is a part of, where they help choose the ending by getting involved and helping that non-profit reach its goals, whether those goals are getting girls back to school to defeat the spiral of poverty in East Africa; feeding the abandoned, mentally ill of India; or helping people afford water while lifting themselves out of poverty.
When I left Ning a year ago this day I set out to figure out how I could increase my impact on the world. Through a series of fortuitous events, I ended up working with non-profits, helping them tell their story online. I need to make sure I remember that.
Everyone has a story to tell. These organizations are trying to change the world with theirs, and we're there to help them.
Please leave your comments on the original post if you'd like to comment on it.
I won't be attending PyCon 2010
These past few weeks have been crazy hectic. Business is going crazy, I'm in the final stages of launching multiple websites, and I've had a cold for the better part of two weeks. Unfortunately, these things have been conspiring against me and this past week I had to notify PyCon that I won't be able to attend and speak this year.
Please accept my apologies if you were planning on attending. I have a ton of great information together in various forms and as things get back under control I've got a slew of blog posts about testing and web dev that I'll post up. I plan on tagging the posts with the pycon tag if you're interested in keeping up with them.
Open Government in Kansas
I spent an hour yesterday afternoon on a conference call organized by the Sunlight Foundation about open government in Kansas. The Sunlight Foundation is an organization whose self-proclaimed mission is to use "cutting-edge technology and ideas to make government transparent and accountable." It was really encouraging to see the interest in open government, but there's lots to be done.
We have some counties (20 according to the Sunshine Review) that don't even have websites, much less accessible data about their governments. You can't make claims of openness when you're not even presenting basic information about yourself online.
There were also some issues with the Freedom of Information Act requests. The state has to respond within 3 days to tell you when they'll respond, but they haven't been extremely helpful and have even gone so far as to encourage people not to pester them. The old "more flies with honey than vinegar" argument.
One comment I heard in the conversation was an admonition from someone on the ground here in the state (I didn't catch the name) to watch out for shooting for the stars. He said he was tired of seeing data being hidden behind "well, we want to do this right" instead of just getting information out there. He was advocating scanning things like minutes and making them available that way. I applaud the release early, release often mentality of that approach, but as a tech guy, that scares me.
I want raw data. In my ideal world, I get an API key from the state and can query the databases of any branch of government and get the information that they're responsible for. Scanned JPGs don't give me that. I have to run OCR on the images and hope it's a font (or in some cases handwriting) that's recognizable by a computer to get any raw data out of it. Forget the semantics in it, those are almost completely lost without manually vetting all of the data.
I don't think this is a matter of people trying to hide data through obscurity, rather, I think it's more a matter of not fully understanding the issues here. How can you expect those 20 counties to understand the difference between "available" and "accessible" when they don't even think a website is important enough to maintain?
One thing that really makes today's conversation interesting, however, is the recent Secretary of State development. This past Monday, the Kansas Secretary of State gave one week's notice that he was resigning his post to pursue a career in the private sector.
The Secretary of State controls all of the public financial records for campaigns. Voter files, campaign contributions, expenditures, all of that is stored behind the firewall at the Secretary of State's office. There's information on the current site about campaigns and the money they raised, but good luck finding it. I dare you. Give that site to someone who's Internet savvy but not familiar with the site and ask them to find the filings for the Governor's race. I know my way around the site and it still took me 5 minutes to find it. Hint: it's buried under sub-links that are only exposed when you're on certain pages.
What would be amazing to see is someone appointed Secretary of State who gets open government. Who realizes that scanned tiff files aren't "open government." They're a good faith step in the right direction, but they hide the potential that raw data provides. How can I feed that information into a program to analyze it and look for patterns?
I hope the state doesn't squander this opportunity. There's an opportunity to appoint someone who gets open government and who would make it their mission to open the floodgates on the information that the SoS office controls. Once you provide citizens (and journalists) with the raw numbers about campaigns, where money's being spent, where money's coming from, you open yourself up to all sorts of interesting interpretations. You start moving from that "available" column toward the "accessible" column. That's what the open government movement is all about, after all.
Are we our brother's keeper?
It's 37° F (2.7° C) and dropping. It's going to hover near freezing tonight, and come midnight... flip a coin - heads it rains, tails it's dry. And he's out there.
I know he's not my responsibility. But isn't he?
There's a big guy that's homeless here in Lawrence. You know him if you've lived in or around downtown Lawrence. The guy's really big. He started hanging around South Park shortly after the Salvation Army closed their shelter. We noticed him hanging out early in the morning and in the evenings. It wasn't long before we put two and two together. He'd taken up residence. At least overnight.
As the temperature started dropping, he and the other few souls that would spend the night in South Park on the benches, in the gazebo, or under the stairs at the gazebo started disappearing. I assumed they'd found shelter somewhere warm, or moved south following the warmer temperatures. Tonight, as it warms up to a balmy 37, he was back.
We noticed him earlier this evening as we walked the dogs. Meg pointed him out as we started in to South Park. He was headed down the red-bud path. As he got closer, we could see the two pieces of cardboard he was carrying. My heart went out to him as I realized what he had done. He'd scavenged up some insulation for the night to come.
Continuing on our walk, he milled around South Park some more. As we finished our loop through the park with our dogs, I saw him lumbering off back toward downtown. "Well, maybe he isn't spending the night in the park," I reassured myself and didn't give it another thought. Meg and I carried on our evening, including some pizza and beer at the Oread and a Jayhawks game.
Walking back down the mountain tonight, I couldn't help but scan the park. The benches were all empty, as was the gazebo. Maybe he had found a spot in the shelter. Then I saw under the stairs. His unmistakable girth, under a pile of blankets, huddled up against the gazebo.
I quietly motioned to Meg. Our banter, lively all the way home, died.
After we had passed the gazebo I said, "I know he's not my responsibility. But if he isn't mine, whose is he?"
I don't know what I could have done. I don't know his story. Could I approach him, or is he unstable? Why's he homeless? Is it a "lifestyle choice", as certain organizations would have you believe of the majority of Lawrence's homeless, or is he one of those guys who just had one too many blows and hasn't been able to get back up?
I wonder though, what is my responsibility? What is our responsibility? If you believe in a higher power, when he comes asking "where's the big guy?", are you going to respond "am I my brother's keeper?", or will you respond "I am my brother's keeper, he's over here"?
Want to help?
Head over to the Lawrence Community Shelter website. They can use cash or supplies.
Packaging reusable & testable Django apps with virtualenv, pip, and Fabric
As someone noted the other day on one of my Facebook posts, I've been doing a lot of Python development. I've moved almost entirely to Python for development, web and otherwise. Instead of PHP, I reach for Django when I need to prototype an application quickly.
One of the things I've been struggling with is how to build re-usable
applications that are testable without having the entire Django stack running.
Until recently, I've used buildout to handle this. There's a
djangorecipe for creating a Django repository. I include that, a sample
project, the necessary requirements in a buildout.cfg and away we go.
All was well, until I included two projects that each had a sample project/
directory. The base Django project couldn't figure out what was what, and problems
abounded. There are other solutions. I could have set the project (doesn't
this seem like the Misses Bennett in Pride and Prejudice?) variable to change
it on a per-app basis, but that still left me with some problems.
Namely, I don't like the default layout of djangorecipe's Django project. I wanted to change it, but after some digging around in buildout's internals, I realized it wasn't going to be a solution I could live with long term. I'd heard a lot of people (by that, I mean James) state their preference for virtualenv and pip. The separation of concerns (one application for isolation, another for installation) instead of the all-in-one approach of buildout felt better to me, so I started exploring.
And I came up empty.
A lot of people talk about using virtualenv and pip together; pip documents how to install into a virtual environment; but no one talks about how to use everything together with Django. Specifically, no one mentions what to include in your repository. Until now. :-)
Setting up the repository
The most important part of this for me was what to store in the repository. It's simple enough, really. First, you need a requirements.txt file. For most simple Django apps, it contains one line. For example, this is what the requirements.txt file for d51.django.apps.tagging looks like:
Django
The next thing I need is a simple .gitignore file. My mantra is to not
commit anything that I can generate. This means all of the files generated by
virtualenv and pip need to be ignored. I also ignore the swap files created by
Vim (hey, I'm not the only one at the company who uses it, so might as well
ignore it) and I ignore all .pyc and .pyo files. The resulting
.gitignore file looks like:
bin/*
include/*
lib/*
.*.swp
*.py[co]
Now we're ready. Of course, you need to have virtualenv and pip installed, but once you've done that, running tests is pretty simple. First, you have to initialize the environment:
prompt> git clone git://github.com/domain51/d51.django.apps.tagging.git
prompt> cd d51.django.apps.tagging
prompt> virtualenv .
prompt> pip install -E . -r requirements.txt
The observant might notice my call to virtualenv. I've left out the parameter
--no-site-packages. Two reasons. First, I don't keep things like Django
installed at the site-packages level. Second, the things I do install, tools
like Fabric, I want access to them while in the virtual environment without
having to re-install them.
Now that the virtualenv has been initialized, you need to activate the virtual environment:
prompt> source ./bin/activate
(d51.django.apps.tagging)prompt>
Notice that the prompt changes. It's prefixed with the name of the directory you're in to signify that you're inside virtualenv. Now running the tests is dead simple:
(d51.django.apps.tagging)prompt> python ./run_tests.py
Testing Django apps inside virtualenv
I need audio that plays when you get to this line. That screeching record
player coming to a halt. The visual question mark. What's this ./run_tests.py
file you ask? The secret sauce.
Django wants to be set up in order to run. Normally that requires a project,
settings.py, and a partridge in a pear tree. Unless you call
settings.configure. You can use that to mimic the normal Django settings,
tweaking the settings to match your needs for testing.
For d51.django.apps.tagging, the settings are pretty simple. I need to make
sure that my app is available along with django.contrib.contenttypes since
I make use of the generic relationship code. There's also some cargo culting
required, as Django won't run without a DATABASE_ENGINE specified. The end
result looks like this:
from django.conf import settings
from django.core.management import call_command
def main():
    # Dynamically configure the Django settings with the minimum necessary to
    # get Django running tests
    settings.configure(
        INSTALLED_APPS=(
            'django.contrib.contenttypes',
            'd51.django.apps.tagging',
        ),
        # Django replaces this, but it still wants it. *shrugs*
        DATABASE_ENGINE='sqlite3'
    )
    # Fire off the tests
    call_command('test', 'tagging')
if __name__ == '__main__':
    main()
Running without activating virtualenv
This works, but requires that you always have virtualenv activated. For
example, if you deactivate virtualenv and try to run the test, you get
an ImportError:
(d51.django.apps.tagging)prompt> deactivate
prompt> python run_tests.py
Traceback (most recent call last):
File "run_tests.py", line 1, in <module>
from django.conf import settings
ImportError: No module named django.conf
You can programmatically activate virtualenv, however, by including this snippet of code in a .py file located in the root of your repository:
execfile('./bin/activate_this.py',
dict(__file__='./bin/activate_this.py'))
You can add that line to the top of the file and execute run_tests.py without
needing to activate the virtualenv beforehand. The line needs to go before
the from django.conf line to make sure that Python knows where to find Django
and any other requirements of the test.
Without it, you have to activate the virtual environment prior to running the
test, or have Django installed at the system level. Executing the
bin/activate_this.py file that virtualenv ships with removes the need to
activate and deactivate the environment around test runs.
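The execfile built-in used above only exists in Python 2. As a hedged sketch, the same activation can be written so that it also runs on Python 3 and does not depend on the current working directory. The activate_virtualenv helper is a hypothetical name, and the sketch assumes the bin/activate_this.py layout described in this post.

```python
import os


def activate_virtualenv(root):
    """Run bin/activate_this.py from the virtualenv rooted at `root`.

    Returns False (and does nothing) when the file is missing.
    """
    activate_this = os.path.join(root, "bin", "activate_this.py")
    if not os.path.exists(activate_this):
        return False
    with open(activate_this) as handle:
        code = compile(handle.read(), activate_this, "exec")
    # activate_this.py relies on __file__ to locate the environment.
    exec(code, {"__file__": activate_this})
    return True
```

In run_tests.py this would be called as activate_virtualenv(os.path.dirname(os.path.abspath(__file__))) before the from django.conf line.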
Making this reusable
There's a lot of code in that run_tests.py file that is going to be
duplicated for every project. Actually, there's only three lines, well, two
lines and one variable in a line, in it that are unique: the two lines of
INSTALLED_APPS and the app name to test in the call_command line.
To keep from repeating that for every single project, I created a simple harness for initializing tests. Introducing d51.django.virtualenv.test_runner, a very small package for running Django tests inside virtualenv.
Using this, run_tests.py now looks like:
try:
    from d51.django.virtualenv.test_runner import run_tests
except ImportError:
    print "Please install d51.django.virtualenv.test_runner to run these tests"
def main():
    settings = {
        "INSTALLED_APPS": (
            "django.contrib.contenttypes",
            "d51.django.apps.tagging",
        ),
    }
    run_tests(settings, 'tagging')
if __name__ == '__main__':
    main()
The first four lines give the user some feedback when they run the tests without
the test harness. That's optional, depending on how user friendly you want to be.
After that, all I do is call run_tests with a settings dictionary and the
name of the app I want to test.
There is one downside, though. You have to have it installed outside of your
virtualenv in order to run your tests. Personally, I'm not too worried about it,
as I have it installed, but if you're really paranoid, you could include it in
your requirements.txt file, which would require that the user be inside the
virtualenv to run your tests.
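For a sense of what such a harness has to do, here is a rough Python sketch. It is not the actual d51.django.virtualenv.test_runner source; build_settings and DEFAULT_SETTINGS are hypothetical names, and the run_tests body simply mirrors the settings.configure and call_command calls from the run_tests.py shown earlier, for the Django version used in this post.

```python
# NOT the actual d51.django.virtualenv.test_runner source; a guess at the
# shape of such a harness, based on the run_tests.py shown earlier.
DEFAULT_SETTINGS = {
    # Django replaces this during tests, but still wants it set.
    "DATABASE_ENGINE": "sqlite3",
}


def build_settings(overrides):
    """Merge per-project settings over the shared defaults."""
    merged = dict(DEFAULT_SETTINGS)
    merged.update(overrides)
    return merged


def run_tests(overrides, *apps):
    """Configure Django with the merged settings, then run the tests."""
    # Imported here so build_settings stays usable without Django.
    from django.conf import settings
    from django.core.management import call_command

    settings.configure(**build_settings(overrides))
    call_command("test", *apps)
```

Keeping the shared defaults in one place is what lets each project's run_tests.py shrink to its INSTALLED_APPS and app name.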
Wrapping it all up in a Fabric cloth
The final step to make all of this bulletproof is creating a Fabric file that handles all of the initialization and running of the tests for me. For good measure, it should also be capable of cleaning up after itself. I don't need a bazillion copies of Django lying around, after all.
The end result looks something like this (using Fabric 1.0a):
from fabric.api import local
def test():
    """
    Run tests for d51.django.apps.schedules
    """
    local("python ./run_tests.py")
def init():
    """
    Initialize a virtualenv in which to run tests against this
    """
    local("virtualenv .")
    local("pip install -E . -r requirements.txt")
def clean():
    """
    Remove the cruft created by virtualenv and pip
    """
    local("rm -rf bin/ include/ lib/")
Now you can initialize the environment, run the tests, and clean up after yourself with three commands:
prompt> fab init
prompt> fab test
prompt> fab clean
One drawback to this method: Fabric's local command swallows the output
of the test. This isn't a problem until you have a failure. local does
contain a capture parameter, but it doesn't display the output from a
failed command. That's fixable, but for the time being, my recommendation
is to use Fabric as your quick sanity check, but rely on straight
python ./run_tests.py for your real testing.
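If you want a wrapper that does not hide the output, one option is to bypass Fabric for this step and use the standard library directly. The sketch below is an assumption about how that could look, not something Fabric provides; run_and_show and run_tests_script are hypothetical names.

```python
import subprocess
import sys


def run_and_show(command):
    """Run a command, letting its output flow straight to the terminal.

    Returns the command's exit code so callers can detect failures.
    """
    return subprocess.call(command)


def run_tests_script(path="./run_tests.py"):
    """Run the test script with the same interpreter as this process."""
    return run_and_show([sys.executable, path])
```

Because subprocess.call leaves stdout and stderr attached to the terminal, a failing test run prints its traceback as usual and returns a non-zero exit code.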
Conclusion
That brings us to the end of our quick tour. Hopefully this provides you with the information you need to get started using virtualenv and pip with Django. It's not that complicated. Actually, once you have your bearings, it's downright easy. The problem is more that people who have trodden down this path haven't documented their way. Hopefully, this post helps serve as a rough map.
Thanks
I'd like to thank James Bennett for illuminating a few pieces of this
puzzle for me (in particular, pointing me toward settings.configure) and for
previewing a draft of the article before I posted. I'd also like to thank
Jeff Triplett for his comments on a draft and pointing out an unexplained
inconsistency with the rest of the world's examples of virtualenv. And, as
always, my good buddy Roder for his constant encouragement.
The problem with Python namespaces modules (or, Python Namespaces. There be dragons this way.)
Yesterday I lamented the issues with namespaces in Python. It's not really the namespaces, it's the marketing of namespaces. Newbies to the community (something I still consider myself for most purposes) are drawn to modules thinking that there's a one-to-one relationship between file hierarchy and namespaces. And there is. Well, sort of.
You have to read the entire manual or happen to have someone to point out the difference between namespaces and modules to even realize there is a difference. Under most circumstances, you won't even realize they are different until you start to do something slightly complex. Say, for example, building an application with multiple modules inside a similar namespace, each module in a separate repository and its own history. There needs to be a "there be dragons" warning to let people know.
Those dragons are such: you have two paths inside sys.path that contain similar code. Such as I do with all of the Domain51 code. Package one has foo as a module while package two contains bar. The assumption would be that Python acts like most other languages and would exhaust its sys.path trying to find both packages, but that's not the case. It'll get to the first one, then pretend the second doesn't exist.
The fix for this is explicitness, one of Python's cardinal virtues. You have to declare the namespace in order for it to work. As you hardly see this anywhere in Python because Pythonistas feel that namespaces are a bad idea, here's the code you need to include in your __init__.py file to make it declare itself as a namespace:
import pkg_resources
pkg_resources.declare_namespace(__name__)
That's it. Now Python becomes smart again, and you can have real namespaces with similar directory structures existing side-by-side. Python is perfectly capable of finding them - now. Which raises an interesting question: why does Python scan the entire sys.path looking for files, building up this list of what declares what namespaces and where only to ignore it later unless they're explicit about it? I haven't dove into the source to be sure, but it seems it would have to scan the __init__.py files in order to know whether there's something there.
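The difference is easy to reproduce in a throwaway script. The sketch below builds the same package name under two temporary sys.path entries and tries to import a module from each half. To avoid depending on setuptools it swaps in pkgutil.extend_path, the standard-library counterpart to pkg_resources.declare_namespace, so it is an analogous demonstration rather than the exact code above. The package names demo_plain and demo_declared are made up.

```python
import importlib
import os
import sys
import tempfile

# Standard-library counterpart to pkg_resources.declare_namespace.
DECLARE = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def make_package(root, package, module, init_source):
    """Create root/package/__init__.py and root/package/module.py."""
    package_dir = os.path.join(root, package)
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
        handle.write(init_source)
    with open(os.path.join(package_dir, module + ".py"), "w") as handle:
        handle.write("NAME = %r\n" % module)


def can_import_both(package, init_source):
    """Split `package` across two sys.path entries; try importing both halves."""
    roots = [tempfile.mkdtemp(), tempfile.mkdtemp()]
    make_package(roots[0], package, "foo", init_source)
    make_package(roots[1], package, "bar", init_source)
    sys.path[:0] = roots
    try:
        importlib.invalidate_caches()
        importlib.import_module(package + ".foo")
        importlib.import_module(package + ".bar")
        return True
    except ImportError:
        return False
    finally:
        for root in roots:
            sys.path.remove(root)


plain = can_import_both("demo_plain", "")             # first match wins
declared = can_import_both("demo_declared", DECLARE)  # both halves found
```

With an empty __init__.py the second half is never found; with the declaration in place, both imports succeed. On Python 3.3 and later, leaving out __init__.py entirely has a similar effect (PEP 420 implicit namespace packages).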
But I digress. There's a bigger dragon that's not even hinted at. Python's inability to find modules.
Take, for example, my python-stupidity repository on GitHub. Run the test.py file and you can see the error for yourself. There are two barfoo modules within the path, but Python decides to act the village dolt and stop as soon as it hits the first one that might match. This particular case is caused by foobar trying to import a method from barfoo that doesn't exist in foobar.barfoo.
This is, in my opinion, a huge issue. Note that foobar.barfoo declared its namespace. It said loudly, "I am me", and Python ignored that fact in favor of relative includes. Not only that, but it stopped and started pouting as soon as one module that said it was foobar.barfoo couldn't match.
Why not finish looking through the rest of the sys.path? Why not pay attention to that precious declaration Python wants you to add to explicitly become a namespace?
Like almost all problems with programming languages, however, there is a fix. At first glance, I thought it might be the way PHP handled it - just include a separator at the beginning of the import. That didn't work, but in searching for the solution, I found out that Python supports relative imports through its support of intra-package references. The fix within the foobar module is to do from ..barfoo import base_barfoo, but this only covers you if you're in Python 2.5 or later.
According to my understanding of it, you use it to explicitly say I want a sibling module named X without having to declare the entire namespace or accidentally picking up a module from the global namespace. Fair enough, but my solution to the problem above is to use the relative import to trick Python into thinking it couldn't find a module named the same.
You can see the code in the d51.django.auth package. I have a d51.django.auth.facebook module which takes precedence over PyFacebook's facebook module, but only inside d51.django.auth.
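As a small, self-contained illustration of explicit relative imports, the Python sketch below creates a throwaway package whose sibling module shares its name with a top-level module, then imports both side by side. The demo_auth package and its files are invented for the example, and the behavior shown is that of Python 3, where a bare import is always absolute; on Python 2.5 through 2.7 the bare import would still pick the sibling unless the module starts with from __future__ import absolute_import.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
package_dir = os.path.join(root, "demo_auth")
os.makedirs(package_dir)

sources = {
    "__init__.py": "",
    # A sibling module that shares its name with a standard-library module.
    "string.py": "ORIGIN = 'sibling'\n",
    # The leading dot asks for the sibling; the bare import asks for the
    # top-level module of the same name.
    "views.py": "from . import string as sibling\n"
                "import string as top_level\n",
}
for filename, source in sources.items():
    with open(os.path.join(package_dir, filename), "w") as handle:
        handle.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()
views = importlib.import_module("demo_auth.views")
sys.path.remove(root)
```

Inside views, sibling is demo_auth.string and top_level is the standard library's string module, so the two names no longer collide.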
I'm not saying that namespaces are a bad idea in Python or any other language. I'll gladly take namespaces, in any form I can get them and use them. They provide a great way to segregate code into small, independent, re-usable packages while continuing to say "I'm from over here." They allow facebook to be used as a module in multiple places without causing an issue, other than the ones listed here.
No, my problem is not with namespaces. My problem is with Python's current method for searching for them, its lack of exposing namespaces and modules and their differences up front, and its brain-dead way of halting on the first partial hit. I'm amazed that a language that prides itself on explicitness, on not doing anything that's not asked for, decides that it's OK to stop looking for matching code just because it found one thing that doesn't match. It smacks of premature optimization.
