TravisSwicegood.com

Lone Star Hacker, Author

Python for Beginners


Yesterday I attended the PyCon Web Summit, and there was a lot of talk about getting new programmers started in Python. I’ve been thinking about this a lot over the last year, ever since I helped found the Austin Web Python User’s Group, and I think I have a solution.

Success early, success often

One of the key things we need to be able to do is get developers on every platform up and running quickly. An IPython shell is a wonderful place for a newbie. Do you remember the first time you typed code into a REPL and it did what you told it to? 2 + 2 returned 4, and I’ll bet you then tried 2 + 3 just to make sure it wasn’t some trick. That sense of wonder, excitement, and, most importantly, accomplishment needs to be priority number one as we move forward, whether we’re starting someone on raw Python for data manipulation or on Django for full web application development.

The number of steps between “I want to learn to do X” and actually making Python do something for you needs to be minimal. That means the first words can’t be “pip install django” to get Django installed. We need to teach newbies about pip and virtualenv and how to install Python and all of the steps that go in between, but not yet.

The tool

I think an environment built on top of Vagrant is the right solution. We can bootstrap a virtual machine that’s ready to start accomplishing things the second it’s launched. We can’t start teaching Python by telling people they have to go find, download, and install Ruby, RubyGems, VirtualBox, and Vagrant.

The solution to this problem is a one-click installer that gets Vagrant and all of its dependencies installed and presents you with a GUI to select the type of environment you want to create. Need Django, Pyramid, NumPy, SciPy, or hell, even a setup with csvkit? Select that and a few minutes later (assuming a broadband connection), you’re up and running with a prompt that lets you start working.

This is doable. I haven’t done it. I’m not sure that I could (I haven’t programmed for Windows in well over a decade). I want this out there though. I want more people thinking about it and hopefully someone can kick the process off. I’ll help in any way I can and I’ll definitely use it if someone starts the project.

Got a better idea? Let’s hear it.

Deploying TileStream to Heroku


This past week I attended the 2012 IRE conference. Remember all of those #nicar12 tweets you saw from me and a few other programmery/journalisty type people? That’s the conference we were all hanging out at.

Custom maps were one of the big themes. There were a few TileMill talks and they were all packed. TileMill, for those who aren’t familiar, is a tool that lets you create custom map tiles (the images that make up maps like Google Maps) so you can have a map that’s entirely unique. An example of this is the Idaho Unemployment Map by the folks over at StateImpact.

We’ve been talking about using TileMill at the Texas Tribune for months now, but we’ve yet to actually deploy one. A few of us have TileMill locally and have played with it, but the tile serving component is something we haven’t touched.

I came back from the conference and got sick. Yesterday, while trying to kill some time without thinking about anything particularly important, I decided to see what was involved in deploying TileStream.

TileStream is a tile server written in Node by MapBox, the creators of TileMill, to generate and serve the tiles for a map you create. Since tiles are simply PNGs, it seems like you should be able to just generate a whole host of files, upload them to a server, and call it a day. The problem a tile server solves is having to generate all of those tiles at once. Generating and uploading them once is a pain, but what happens if you need to make a change to them?
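To put a rough number on “all of those tiles,” here’s a back-of-the-envelope sketch. It assumes the common XYZ (“slippy map”) scheme used by Google-style web maps, where zoom level z is a 2^z by 2^z grid of tiles; the function names are just for illustration and aren’t part of TileMill or TileStream.

```python
import math

def tiles_at_zoom(z: int) -> int:
    """Tiles needed to cover the whole world at zoom z (a 2^z by 2^z grid)."""
    return 4 ** z

def tiles_through_zoom(max_zoom: int) -> int:
    """Total tiles for every zoom level from 0 through max_zoom."""
    return sum(tiles_at_zoom(z) for z in range(max_zoom + 1))

def tile_for(lat: float, lon: float, z: int) -> tuple:
    """(x, y) of the XYZ tile containing a point, using spherical Web Mercator."""
    n = 2 ** z
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

if __name__ == "__main__":
    for max_zoom in (5, 10, 15):
        print(f"zoom 0-{max_zoom}: {tiles_through_zoom(max_zoom):,} tiles")
```

Zoom levels 0 through 10 alone come to 1,398,101 tiles for the whole world. A Texas-only map needs just a slice of each level, but each additional zoom level has four times as many tiles as the one before it.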

Lately, I’ve been on a “no new servers” kick. I’m tired of seeing the amount of time spent tweaking servers instead of working on code. DevOps is fun, don’t get me wrong, but sysadmins we are not. With that in mind, I decided to take a look at what’s involved in deploying TileStream to Heroku, a “cloud application platform” that supports a whole host of languages—including Node.

Preparing for Deploy

The very first thing you have to do is create a map and export it. That’s a topic unto itself, so I’m not going to cover it here. I created a simple copy of the state of Texas with all of its counties outlined and colored in. I forget where I procured the shapefile, but some Googling should turn it up if you want to follow along.

Make sure to export the map in the MBTiles format. Where you export it to isn’t important right now; just remember where it is.
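An MBTiles file is a SQLite database, so before you upload it anywhere you can peek inside with Python’s built-in sqlite3 module. This is a minimal sketch assuming the metadata and tiles tables described in the MBTiles spec; the file name in the example comment is hypothetical.

```python
import sqlite3

def read_metadata(conn: sqlite3.Connection) -> dict:
    """Return the name/value pairs stored in the MBTiles metadata table."""
    return dict(conn.execute("SELECT name, value FROM metadata"))

def tile_counts(conn: sqlite3.Connection) -> dict:
    """Return {zoom_level: number_of_tiles} from the MBTiles tiles table (or view)."""
    rows = conn.execute(
        "SELECT zoom_level, COUNT(*) FROM tiles GROUP BY zoom_level ORDER BY zoom_level"
    )
    return {zoom: count for zoom, count in rows}

# Example (hypothetical file name):
#   conn = sqlite3.connect("texas-counties.mbtiles")
#   print(read_metadata(conn))
#   print(tile_counts(conn))
#   conn.close()
```

If a zoom level you expected is missing from the counts, it’s cheaper to fix the export settings in TileMill now than after the file has been pushed.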

Next, you have to make sure Heroku is installed. If you already have a working Ruby and RubyGems environment with Git and so on installed, you can run gem install heroku to get the command line client. If you don’t, check out the Heroku Toolbelt for a quick start to get set up. Once you have the command line tools set up, log in to your Heroku account with heroku login and follow the directions.

The next step is to create a new Git repository. Heroku uses Git as its means of tracking files to deploy. You’re going to have to learn at least a little bit of Git if you’re going to use Heroku (side note: I’ve written two books on Git and highly recommend Pragmatic Version Control using Git if you’re new to version control). Once you have a Git repository, run the command heroku create -s cedar inside your working tree. You should see something similar to this:

prompt> heroku create -s cedar
Creating hollow-fire-2448... done, stack is cedar
http://hollow-fire-2448.herokuapp.com/ | git@heroku.com:hollow-fire-2448.git
Git remote heroku added

hollow-fire-2448 is the name of my Heroku application. Yours will be different. Now you have to tell Heroku what to install. For Node applications, Heroku uses a package.json file, the same file Node applications use to declare their dependencies so that everything gets installed. For this server, you just need to declare a simple dependency on tilestream. My package.json file looks like this:

{
  "name": "texas-counties",
  "version": "0.0.1",
  "dependencies": {
    "tilestream": "1.0.0"
  }
}

Add that to the repository using git add followed by git commit. The next step is telling Heroku how to run TileStream. Heroku uses a Procfile to handle starting and stopping applications. The Procfile is run using Foreman and can define all of the processes required to run an application. The format is <name>: <command> and for this application you only need to add one line:

web: tilestream --host hollow-fire-2448.herokuapp.com --uiPort=$PORT --tilePort=$PORT --tiles=./tiles

There are a couple of things going on there. First, notice that I’m explicitly adding a --host name and using the name of the app that Heroku gave me when I called heroku create earlier. TileStream currently only responds to requests on hosts that it recognizes. You’re going to need to change that line to use your own Heroku app’s name.

Next, notice that both --uiPort and --tilePort are set to the value of $PORT. Heroku exposes $PORT as an environment variable to let your application know which port to listen on for incoming connections.
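Since the host has to match your own app name, it can help to generate the Procfile line instead of hand-editing it. This is a small hypothetical helper, not something that ships with TileStream or Heroku; it just reproduces the line above for any app name.

```python
def procfile_line(app_name: str, tiles_dir: str = "./tiles") -> str:
    """Build the web process line for TileStream on Heroku.

    --host must match the app's herokuapp.com name, since TileStream only
    answers requests for hosts it recognizes. $PORT stays literal: it is
    expanded when Heroku runs the command, not when this file is written.
    """
    host = f"{app_name}.herokuapp.com"
    return f"web: tilestream --host {host} --uiPort=$PORT --tilePort=$PORT --tiles={tiles_dir}"

def write_procfile(app_name: str, path: str = "Procfile") -> None:
    """Overwrite the Procfile at `path` with the single TileStream web process."""
    with open(path, "w") as fh:
        fh.write(procfile_line(app_name) + "\n")

# Example: write_procfile("hollow-fire-2448")
```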

Finally, you set the directory for tiles to ./tiles. Commit this, then push to Heroku to verify that everything went according to plan.

prompt> git push heroku master 
Counting objects: 6, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (6/6), 659 bytes, done.
Total 6 (delta 0), reused 0 (delta 0)

-----> Heroku receiving push
-----> Node.js app detected
… and a whole bunch more output …

Go ahead and stand up and stretch. Go grab some coffee or tea or water, whatever your vice. This step takes a few minutes while Heroku installs all of the dependencies and such for TileStream for the first time. It’s kind of awesome, though. Without a single bit of server administration, you’re just a few minutes away from having a fully operational TileStream server.

… waiting on Heroku to finish up …

Ok, done? Now run heroku open. This launches your browser and opens the URL of the Heroku application. If everything went well, you should see the empty TileStream server like this.

[Screenshot: empty TileStream server]

If you don’t get a page like the above, check the logs by running heroku logs to see if it gives you any clues. Another thing to double check is the process list. Run heroku ps to make sure that web.1 has a state of up.

That big error is the non-user-friendly way of saying there’s nothing in the tiles directory to read and display. Remember the mbtiles file you created earlier? Now it’s time to move it into place. Inside your Git repository, create a directory called tiles and copy the mbtiles file into it. Once the file is in place, add it to Git, then push the new commit to Heroku.

This push is going to take a little bit, depending on how fast your connection is. It has to send the entire mbtiles file over the wire to Heroku. Having done this a few times now, it seems like Heroku might throttle large uploads. I start out at a few hundred KB/sec, then it drops down to around 100KB/sec for about 30 seconds before settling in at 80KB/sec. Their business isn’t receiving huge files, so it would make sense if Heroku did throttle to make sure one large upload didn’t take over their entire pipe.

Once the push has finished, reload your browser window and you should see your new map, much like this:

[Screenshot: TileStream with one map]

And now you have a tile server. Heroku is a great fit for this kind of standard news application: you need the ability to handle tons of traffic at launch, then scale back until the project hits maintenance mode, where you only need a skeleton server running.

Heroku gives you one dyno (think of that as one process on a server) for free, with each additional dyno costing $0.05/hour (see the Heroku pricing page). That means you can spin up several dynos to handle the initial flood of traffic, then scale back to a smaller set and only have to pay for the initial spike. All without any additional work on your end setting up or configuring servers.
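Here’s the arithmetic for a launch spike as a quick sketch. It uses the $0.05/hour figure quoted above with a deliberately simplified model (the free dyno is always free, extra dynos are billed for the whole period); the launch numbers are hypothetical, and Heroku’s current pricing may differ.

```python
FREE_DYNOS = 1           # one dyno is free
CENTS_PER_DYNO_HOUR = 5  # $0.05/hour for each additional dyno, per the pricing quoted above

def spike_cost_cents(dynos: int, hours: int) -> int:
    """Cost in cents of running `dynos` web dynos for `hours` hours.

    Simplified model: the free dyno is treated as always free and every
    extra dyno is billed for the full period.
    """
    extra_dynos = max(dynos - FREE_DYNOS, 0)
    return extra_dynos * hours * CENTS_PER_DYNO_HOUR

if __name__ == "__main__":
    # Hypothetical launch: 10 dynos for the first 48 hours, then back to 1.
    launch = spike_cost_cents(dynos=10, hours=48)
    print(f"launch spike: ${launch / 100:.2f}")
```

Nine extra dynos for two days works out to $21.60; once you scale back to the single free dyno, the meter stops. Working in whole cents keeps floating-point rounding out of the totals.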

Now, the one caveat to all of this is that I haven’t actually tried running TileStream like this with a production load. I’m not sure what kind of performance we could get out of it or what limitations there might be. The only way to answer that is to try. Hopefully we’ll be able to pluck one of the projects out of our pipeline and do some custom maps for it using TileMill and TileStream.

Where to from here?

The next thing you need to do is write some JavaScript to interact with the tile server. Leaflet has gained a lot of popularity and seems to be the default choice. I’ve yet to play around with it, but that’s a topic for another blog post.

If you’re interested in seeing what all of the pieces look like together, my Heroku app is still online at hollow-fire-2448.herokuapp.com. I’ll try to leave it spinning, but if I take it down, I’ve posted the repository on GitHub so you can see all of the files in their original state.

Importance of Context


Today I discovered the 99% Invisible podcast on architecture and design. Their latest podcast, Pruitt–Igoe Myth, tackles the problems associated with the Pruitt–Igoe housing project which was built in the 1950s in St. Louis to provide affordable housing in the St. Louis urban core. Due to a variety of reasons, which the podcast explores, it was torn down in the 1970s. From Wikipedia:

[Pruitt-Igoe’s] 33 buildings were torn down in the mid-1970s, and the project has become an icon of urban renewal and public-policy planning failure.

After listening to the podcast, you come away with the impression that this isn’t a fair assessment. It was built at the beginning of white flight, in a part of the city that saw its population decrease instead of the forecast increase of 100,000 every decade. These and other issues contributed to it turning into the very thing it was trying to prevent: a slum.

The project is considered the example of the failure of Modernist architecture as applied to public housing, but if you view it in the context above, you can see that many external factors contributed. It’s easy to pick one piece of the puzzle and lay the blame for the failure on it. It’s much harder to try to understand the complex web of causes behind it.

Applied to Programming

This type of logical error is present in many (not all, but many) of the conversations about what framework or language to use, what methodology should be adopted, or even where to found your startup. It’s easy to point to one success or failure and declare “X is why Z happened, so if I want to duplicate Z, then I must/must not do X.” This type of cargo-cult behavior is dangerous and should be guarded against.

Yesterday I tweeted this:

Whoa! JustinTV is moving from #rails to #django. I’m telling ya, Python & the web with a little Django mixed in is about to blow up.

It gives the impression of just the type of “Y leads to X” thought process that I’m against. To clarify, I wholeheartedly expected what kvogt wrote when explaining why they’re moving to Django. To paraphrase: “it just makes sense right now to be on one platform.” Justin.tv isn’t going to take the world by storm after moving to Django any quicker than they would have if they had moved their Python backend to Ruby.

That said, I stand behind the final point of that tweet. There are tons of shops using Python and Django that aren’t vocal about their use. Python is powering business logic that runs on servers sending me music, tracking my location, displaying my news, and a whole host of other things. Python can do everything from low-level system tasks to scientific and financial analysis to high-level business logic for websites, and everything in between.

I can’t help but think there are going to be more Justin.tv-style announcements this year: shops standardizing on one language, and that one language being Python.

Using Basketweaver With GitHub


Last month I blogged about using Travis CI with Armstrong. Things have been going along fine until the last few weeks. Tests were failing due to network timeouts while talking to PyPI. Never one to take failing tests lightly, I set out to fix it.

From local testing, it appeared that some sort of selective filtering was happening at the server level on PyPI that was causing our tests to fail. All of our tests in the CI environment follow these steps:

  • Install all of the development requirements with pip install -r requirements/dev.txt
  • Install the local package
  • Execute the tests using fab test

I could follow these steps to the letter locally in a fresh virtualenv, but the second they hit the Travis-CI server they would time out while trying to install everything. We’ve seen similar behavior at the Tribune when we roll out new servers. PyPI appears to be up, but installs fail due to timeouts.

Once I confirmed this, I started looking at alternatives to pypi.python.org as our main index for testing. My initial thought was to have a dynamic server that would act as a proxy to PyPI and cache everything locally. This requires the least amount of work long-term—assuming the server stays up. The problem was that nothing worked quite the way I wanted. The closest I found was collective.eggproxy. It felt a little odd and wasn’t very configurable without going the Paster route, so I decided to fall back on basketweaver.

Basketweaver builds a static index suitable for use with pip via the --index-url option. It takes a directory of files, then generates the HTML that pip can scrape to determine whether a package exists. This HTML can be hosted anywhere that can serve static HTML pages, such as GitHub Pages.
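To make the idea concrete, here’s a rough sketch of what a static index like this boils down to: one HTML page per package, each listing links to its archives. This is not basketweaver’s code. The archive naming pattern (name-version.tar.gz and friends) and the layout (./files/ and ./index/ side by side at the site root) are assumptions for illustration.

```python
import html
import re
from collections import defaultdict
from pathlib import Path

# Assumed archive naming: "<name>-<version>.<ext>", e.g. "South-0.7.3.tar.gz".
ARCHIVE = re.compile(r"^(?P<name>.+?)-(?P<version>\d[^-]*?)\.(?:tar\.gz|tar\.bz2|zip)$")

def build_index(files_dir: Path, index_dir: Path) -> dict:
    """Write <index_dir>/<name>/index.html for each package found in files_dir.

    Links are relative and assume files_dir and index_dir are siblings.
    Returns {package_name: [archive filenames]}.
    """
    packages = defaultdict(list)
    for path in sorted(files_dir.iterdir()):
        match = ARCHIVE.match(path.name)
        if match:
            packages[match.group("name")].append(path.name)

    for name, filenames in packages.items():
        page_dir = index_dir / name
        page_dir.mkdir(parents=True, exist_ok=True)
        links = "\n".join(
            f'<a href="../../{files_dir.name}/{html.escape(f)}">{html.escape(f)}</a><br>'
            for f in filenames
        )
        (page_dir / "index.html").write_text(f"<html><body>\n{links}\n</body></html>\n")

    # A top-level page so /index/ shows something in a browser.
    index_dir.mkdir(parents=True, exist_ok=True)
    root_links = "\n".join(
        f'<a href="{html.escape(n)}/">{html.escape(n)}</a><br>' for n in sorted(packages)
    )
    (index_dir / "index.html").write_text(f"<html><body>\n{root_links}\n</body></html>\n")
    return dict(packages)

# Example: build_index(Path("files"), Path("index"))
```

Because every page is plain HTML with relative links, anything that serves static files can host the result.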

Working with GitHub

There are a few hoops to jump through when deploying to GitHub Pages. First, make sure you include an empty .nojekyll file. GitHub assumes everything you publish is a Jekyll site; this file tells GitHub not to process your files with Jekyll.

Next, and I can’t count the number of times I’ve forgotten this, GitHub Pages doesn’t give you directory indexes. Basketweaver generates its index in the /index/ directory, so you can’t hit the plain GitHub Pages URL and expect to see anything more than an error message. Make sure to add /index/ after your GitHub Pages URL to view it once you’ve published your changes.

The next thing I do is change where basketweaver looks for files when building the indexes. I don’t want a directory full of archives cluttering my root directory; instead, I want all of the files stored in the creatively named ./files/ directory. Basketweaver installs a command called makeindex, which I can never remember, so I created a run.py file that remembers it for me.
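The original run.py isn’t shown here, so this is a hypothetical sketch of that kind of wrapper. The big assumption is that makeindex takes the archive paths as positional arguments; check basketweaver’s README for the exact invocation your version expects before relying on it.

```python
"""Hypothetical run.py: rebuild the basketweaver index from ./files/."""
import subprocess
from pathlib import Path

ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".zip", ".egg")

def makeindex_command(files_dir: Path) -> list:
    """Build the makeindex argv for every archive in files_dir.

    Assumption: makeindex accepts archive paths as positional arguments.
    """
    archives = sorted(
        str(p) for p in files_dir.iterdir() if p.name.endswith(ARCHIVE_SUFFIXES)
    )
    return ["makeindex", *archives]

def run(files_dir: Path = Path("files")) -> int:
    """Run makeindex and return its exit status."""
    return subprocess.call(makeindex_command(files_dir))

# Example: run()  (from the root of the index repository)
```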

The last thing to do is to use the newly created index when installing packages. For Armstrong, we do this with:

pip install -i http://armstrong.github.com/pypi.armstrongcms.org/index/ \
    -r requirements.txt

I haven’t gone to the trouble of setting up a CNAME for pypi.armstrongcms.org yet, so we’re using the main github.com-based address.

There’s one final gotcha: PyPI uses routing that treats http://pypi.python.org/pypi/South/ and http://pypi.python.org/pypi/south/ as the same URL. That’s why pip install Django and pip install django both work even though the former is the correct package name. The URL spec is ambiguous as to whether this is correct, but most web servers are case sensitive, including GitHub Pages.

This will get you if you have dependencies on packages that don’t use all lowercase names, such as South, Fabric, or Django. All three of these are dependencies of Armstrong. The fix is to make sure that your install_requires and requirements files have the correct case. The easiest way to determine this is to look at the output of pip freeze and make sure you’re using the same package name as it generates.
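Comparing by hand gets tedious once you have a few requirements files, so here’s a minimal sketch that flags names whose case differs from what pip freeze reports. The function names are hypothetical, and it only looks at the package name before any version specifier.

```python
import re

def canonical_names(freeze_output: str) -> dict:
    """Map lowercased name -> name exactly as `pip freeze` spells it."""
    names = {}
    for line in freeze_output.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith(("#", "-")):
            name = line.split("==", 1)[0]
            names[name.lower()] = name
    return names

def miscased(requirements: str, freeze_output: str) -> list:
    """Return (as_written, as_frozen) pairs whose spelling differs only in case."""
    canonical = canonical_names(freeze_output)
    problems = []
    for line in requirements.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):  # skip blanks and options like -r or -i
            continue
        name = re.split(r"[\s\[<>=!~;]", line, maxsplit=1)[0]
        expected = canonical.get(name.lower())
        if expected is not None and expected != name:
            problems.append((name, expected))
    return problems
```

Feed it the contents of requirements/dev.txt and the output of pip freeze from a working virtualenv; an empty list means the case already matches.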

Conclusion

At the end of the day, this keeps our tests from being held hostage whenever PyPI goes on the fritz or starts randomly filtering requests as it seemed to do this past week. All that said, we’re still borrowing other people’s infrastructure. GitHub had a little blip while I was writing this post, underlining that you get what you pay for.

While you can use Basketweaver and GitHub to create a mirror of sorts for your packages, make sure you control the infrastructure if it’s mission critical that everything always stays up. That, or pay for it so there’s someone to call when it goes down.

Editing Mode


In case you didn’t know, I use computers. A lot. Between working as a programmer, writing books, and the occasional leisure time spent playing on computers, the vast majority of my life is spent with a screen of some sort in front of me. That time means I come across and try a lot of different tools, and some of them actually make my life better.

One such tool I’ve started using extensively while writing my latest book is Notability. It’s a note-taking application that lets you import PDFs and write directly on top of them. This is important because I can’t see the typos from within my text editor.

Switching environments when switching tasks is an important concept I picked up a while back. For me, that switching comes when I build the book and switch over to a PDF version on my iPad to read it. With Notability, I can take the PDF version of my book and change rooms or sometimes just turn the chair around away from the desk, and switch into “editing” mode.

I’m not alone in using an iPad for editing. I hadn’t found an app that worked well for note taking though, so I often switched back to my text editor to write notes. Having to mentally switch context back and forth and back and forth as I physically switched devices hurt my productivity. Being able to do it all in one app has made iPad editing much more feasible.

Once I’ve finished an edit pass and have a whole slew of changes to make, I switch back to my computer with my iPad close by. Notability lets you change the color of the pen you use, so I swap it out for green, and slightly larger for impact, then start slashing through all of the red as I mark edits off. The satisfaction from marking something off with a physical slash can’t be overstated.

I’ve been using Notability for about a month now and don’t know how I managed to edit without it. I highly recommend it if you have an iPad and are doing any type of writing/drafting work.

Question for the Reader

I’ve been considering a series of short posts like this about tools that I use and how they fit into my workflow. I love to watch people work and see how they interact with their systems. Is this something that interests you?

Travis and Python


Today I took my name back and got Armstrong tests running on Travis CI. Travis CI is the distributed, community-run continuous integration server that the Ruby community has put together. It lets you do all manner of fun things, like testing against dozens of different Ruby version configurations.

You’re probably wondering what Armstrong is doing there with all of this talk of Ruby. No, I didn’t rewrite Armstrong in Rails last night. No, I didn’t convert all of our fabfiles over to Rakefiles either. Instead, I subverted it from within.

Travis CI uses a .travis.yml file for all of its configuration. There are two key fields that it gives you that let you do fun things with it: before_script and script.

before_script runs before anything else starts. It’s like setup in the xUnit world, but for your whole environment. Each of the Armstrong components ships a requirements/dev.txt file, so I tell Travis to do a pip install -r of that during setup. That’s right, Travis CI has pip installed!

Next, I’ve set the script to use our test runner, fab test, and we’re set. I had to add a few environment variables to turn off our coverage reports (they don’t provide much value when there’s no one there to view them) and to skip the re-install we do in a local environment.

You can see this in action by checking out the current build status for the armstrong.core.arm_wells component here. Here’s the .travis.yml file’s contents:

rvm:
  - 1.9.3
before_script:
  - sudo pip install -r requirements/dev.txt
  - sudo pip install .
env:
  - SKIP_COVERAGE=1 SKIP_INSTALL=1
script:
  - fab test
notifications:
  email: false
  irc:
    - "irc.freenode.net#armstrongcms"
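Armstrong’s actual fabfile isn’t shown in this post, so here’s a hypothetical plain-Python sketch of how a test task might honor the SKIP_COVERAGE and SKIP_INSTALL variables set in the env: section above. The function names and step descriptions are illustrative only.

```python
import os

def env_flag(name: str, environ=None) -> bool:
    """True when the variable is set to something other than empty/0/false/no."""
    environ = os.environ if environ is None else environ
    return environ.get(name, "").strip().lower() not in ("", "0", "false", "no")

def planned_steps(environ=None) -> list:
    """Hypothetical outline of what a `fab test` task might do given the env vars."""
    steps = []
    if not env_flag("SKIP_INSTALL", environ):
        steps.append("reinstall package into the virtualenv")
    steps.append("run test suite")
    if not env_flag("SKIP_COVERAGE", environ):
        steps.append("write coverage report")
    return steps
```

Locally neither variable is set, so you get the full install-test-coverage cycle; on Travis both are exported by the env: line and only the test suite runs.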

There’s work happening to bring native Python support. Native support means being able to test against multiple versions and such. Be sure to check out the #travis channel on Freenode if you’re interested in helping out.

Elegantly Simple


JavaScript catches a lot of flak for its “ugliness,” but I’m rather fond of the language. Its first-class functions make up for any quirks you have to deal with. Consider this test case:

It generates this output when run with --spec:

I’m using test cases like this throughout my upcoming Programming Node.js book to test output of some of the simple scripts.

Yes, I know you can get some amazingly expressive test cases in other languages, but I dare people who say that JavaScript is an ugly language to find fault with this bit of code.

50 Days


Shh… Don’t tell my editor I’m blogging. I’m procrastinating by writing this blog post instead of working on Programming Node. I’ll still get to that, but this is on the brain right now.

Today marks the 50th straight day of pushing code to GitHub. My work on Armstrong has made a lot of this possible—it’s easy to push code when you’re getting paid to write open source software—but not every day has been Armstrong related code.

During the course of the last 50 days, I’ve rediscovered a few things that I want to share, in case anyone else thinks that they can’t possibly do this without changing jobs.

Keep it small

I’ve written about manageable chunks in writing, but not in contribution. It’s easy to make excuses about why you aren’t pushing code on a daily basis. You need to clean the code up; it’s not good enough yet; or it’s not really significant enough to make a difference.

Excuses. All of them.

Every single piece of code you write has importance. Otherwise you wouldn’t write it. There are exceptions to this rule, but those are outliers. Most of the stuff you and I would write and go to the trouble of committing is going to be useful to someone.

Case in point, earlier this week I helped add some interactivity to a timeline on the Texas Tribune. My contribution was trivial, but it might be useful to someone trying to do something similar, so it’s up on GitHub.

There’s always something

There is always something you can do with 5 minutes. I’ve made a lot of contributions to bash-it. Think of it as your terminal on steroids, with pretty colors. I started out with some minor tweaks, then found some places where code could be better handled, then other devs built on that, and I’ve started refactoring some other parts.

I spend the vast majority of my time looking at a terminal, so it needs to fit like a glove. Working on bash-it means I’m getting more and more familiar with my environment and making some pretty cool enhancements.

Find something that you use, something that would make your life a little bit better if it just had X, then go to town and try to figure out how to do X in it. My bash programming sucks. Seriously, I wouldn’t know where to start to write a real bash program, but I can muck around in the internals and figure it out. Just because you don’t know how to program in a language doesn’t mean you: 1) can’t, 2) shouldn’t, 3) aren’t fully capable of figuring it out as a smart human being which I know at least some of you are.

Just start

It’s really easy to get part of the way through a month and say “oh, I’ll start the first of next month.” No. No you won’t. Well, if you’re me you won’t. I have a horrible tendency to want to go big or go home. Not necessarily a bad thing in and of itself, but not good for just getting shit done.™

It’s especially bad when “going big” is “I’m going to commit code every day in a month” and you’re already into an existing month. Then ya wait and you lose that initial momentum.

So the answer for me is to just start. The raw #s are what matters. Get out there, do something, start tallying it up.

Armstrong on Vagrant


We released our first version of Armstrong this past Wednesday. After taking a quick breather, I set out to get Armstrong set up inside a Vagrant virtual machine to make evaluation easy. I finally got it running. There’s more information about getting started in the README, where it belongs, but I ran into some interesting technical issues while setting it up that I want to document here.

Vagrant + Puppet + pip

I initially wanted to create a full build script inside Vagrant that could be used to set up the entire environment. I used Puppet to start the process and found the puppet-pip provider, so I was even going to be able to install Armstrong easily. Or so I thought.

There’s something happening when Puppet runs pip that causes the installation to fail. I’m a big subscriber to “select isn’t broken,” but in this case I think there’s something odd in the combination of pip and Puppet. My evidence: running pip install armstrong over an SSH connection to the same virtual machine works. After briefly discussing it in #pip on Freenode, I opened ticket #298, which outlines the issues we ran into.

I finally decided to go the pragmatic route. For the time being I have a box that’s been set up by hand, the way you would do it if you had a raw box yourself. It’s not ideal, but our new armstrong box (warning, that’s a 500 MB download) boots up with everything you need to start playing with Armstrong.

Eventually, either I’ll figure out what the issue with pip+Puppet is or I’ll switch to some other method that works. My reason for picking Puppet was pretty simple. The provisioning section of the getting started guide for Vagrant shows you Puppet code and says essentially “Chef is too complex to simply show you how, so just use this prepared stuff.” I like simple. Right or wrong decision, I’m not 100% sure yet.

Django Server on Bootup

The server runs on startup thanks to Upstart in Ubuntu. As far as Ubuntu is concerned, Armstrong is now a service that can be started and stopped with start armstrong, stop armstrong, and so on.

Upstart works on the concept of events. Different tasks emit different events that other tasks can be configured to react to. There’s a startup event and a net-device-up event and so on. I tried all manner of combinations before it dawned on me: the VM boots first, and only then does Vagrant mount the project over NFS.

Once I figured that part out, this recipe helped get things started. A quick task that watches for the config/development.py file, which is mounted after booting, was all I needed to get runserver_plus going on “bootup”. You can check out the Upstart scripts being used in the repository.
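Upstart handles the “wait until the mounted file shows up” part with events. For anyone outside Upstart, here’s a plain-Python polling analog of the same idea; it is not the Upstart scripts from the repository, and the path in the example comment is hypothetical.

```python
import time
from pathlib import Path

def wait_for_file(path, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll until `path` exists; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    target = Path(path)
    while True:
        if target.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Example (hypothetical path): only start the dev server once the share is mounted.
#   if wait_for_file("/vagrant/config/development.py", timeout=300):
#       ...start runserver_plus...
```

Polling is cruder than reacting to an event, but it makes the ordering problem explicit: nothing starts until the mounted file is actually there.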

I chose runserver_plus from django-extensions rather than the built-in runserver because of issue 15880. Since I’m starting the script at startup, there’s no interactive interface and the watcher gets a little wonky. It works out, though, because you get the awesome Werkzeug debugger for development.

Closing

Minus a few oddities in the process, I’m really pleased with the end result. It should be noted that this is meant for development only. As we near our first stable release later this year I hope to be able to create another box that’s more deployment ready, but hopefully this will get you started down the right path.

TekXI Recap


Had a good week at 2011’s version of tek. Thanks to Marco Tabini and his whole crew for putting together another great conference this year. I haven’t professionally developed in PHP for several years now, but I still consider this a must-attend conference. This was my 4th year. The people and the content make it worth attending, even though I’m mostly doing Python work these days.

I gave two talks this year, both on Git. Both went well, but my advanced Git talk needs some tweaking so I can fit it into an hour. I always plan too much material the first time I give a talk, so it needs a little more taken out.

As promised, I am going to get both talks online over the next week. Each of the repositories we walked through in the advanced talk is going to be posted to GitHub in its “before” state so you can play with it. They also include README files that explain what you’re doing and how to do it.

I’ve already posted my amending and rebasing repositories. You can search my GitHub for pres. to see all of the repositories. I’ll post again once I have them all up.

One of the more interesting evenings this year was a late-night hackathon that involved two 5-gallon kegs from Jason Sweat’s personal stash. I went to bed early’ish, but got this image emailed to me around 1:30. I’m told whiskey fueled its creation. :-)

Stealing Swicegood Code