Archive for the ‘code’ Category

The First Commitment

Sunday, May 4th, 2008

In the past few months, I’ve allowed myself to slip. I haven’t been making many public commits, nor discussing much where others can see. It has me feeling like a bodybuilder who hasn’t touched a set of weights in the same amount of time. My work and my writing have atrophied. My ability to maintain code that other people depend upon has suffered, and my ego has, as well. Time to sharpen up.

The first part of this new commitment is that I’ll be making a minimum of 3 commits a week to rFeedParser, no matter how small. This one is a stepping stone to taking on more of a workout, and it gives me time to reacquaint myself with the code base. rFP has weird and hairy parts in it because the problem it was solving was weird and hairy. However, there are a good number of ugly parts that were created because a) I wrote it with the Python version in the next window over, causing me to write with a strong Pythonic accent; and b) I wasn’t as skilled in Ruby as I am now.

The module hierarchy alone proves I was diving in and not giving a fuck. At a certain point, I was just trying to get it to goddamn work and not caring what kind of hack-and-slash maneuvers I had to pull off to make it happen. With the distance from the problem and the clearer head I have now, I can piece together how it should be done.

The second part is a commitment to one commit a week to one of my public side projects. Right now, this consists mainly of the strictly-for-fun-and-I’m-keeping-it-that-way-fuckers framework I’m writing called Recess. Everyone writes a web framework, and I’m going to be That Guy, too.

I’ll try not to be too snooty about it, but if the framework turns out well (or, at all, really), I probably will be. Like I’ve said before, my ego knows no bounds. But, remember! It’s just for fun. Really. Really.

As my plans and projects grow and adapt and interests wax and wane, there will, of course, be a call to change this commitment. This two-part commitment is only the first of what will be a series of changing, and, likely, growing vows to myself. Look to see a lot more work from me.

rFeedParser on GitHub

Friday, May 2nd, 2008

Alright, it’s done. I’ve moved rFeedParser and rchardet to GitHub. Check out the rFeedParser and rchardet pages at GitHub and clone them with these URLs:

git://github.com/jmhodges/rfeedparser.git
git://github.com/jmhodges/rchardet.git

rFeedParser, of course, is a Ruby translation of the Universal Feed Parser in Python and passes 98.8% of its 3000+ unit tests. rchardet is a Ruby translation of chardet in Python and is used quite a bit in rFeedParser.

There are, of course, some things left to be done in both of these projects.

Off the top of my head, rFeedParser needs:

  • to be able to use libxml if the user prefers, instead of the Expat binding
  • to use version 0.4.1 of the character-encodings gem
  • someone to ask People Who Know if the way rfp strips out the bad stuff in the *_crazy.xml tests is acceptable
  • to set up a git submodule for the tests in order to ease the merging in of tests from the feedparser repository
  • a fix up to some of the regexes and lame matching code in it, especially the time parsing code
  • a re-sorting of the incredibly ugly object hierarchy
  • other things I’ve forgotten and am too lazy to look up

rchardet needs:

  • some information on whether using some gem-provided Tuple object instead of the giant Arrays would help the memory usage
  • fix the other encoding bugs that Mark fixed when he released the version of rchardet that cleared up the little endian UTF-16 bug I reported

There’s still a lot of work to do, and I’m listening to your concerns and taking your patches. Hit the mailing list and we can all make this better.

Special Note for People Who Want to Help: Run rake setup in your branch to install all the gems you need to run it.

Ruby and Rails Compete for Love

Wednesday, January 2nd, 2008

A thought: In the beginning, I wrote in Ruby because I liked using Ruby on Rails. But recently, I’m using Ruby on Rails because I like writing in Ruby.

I think it’s time to start looking at the options again.

Rob Pike Knows How To Scratch His Itches

Friday, October 12th, 2007

Found in lex.c of squint, the Unix implementation of Newsqueak (referenced from Rob Pike’s bio):

if(fd<0 && s[0]!='/' && s[0]!='.'){
    sprint(buf, "/usr/rob/src/squint/include/%s", s);
    fd=open(buf, 0);
}

That’s a hilariously awesome way to personalize your tools.

While I’m on the topic, trying to google up a copy of squint (or any implementation of Newsqueak) is a serious pain in the ass. And I’ll be the 50 kajillionth programming nerd to link to Rob’s excellent talk on Concurrency and Message Passing Newsqueak. You might have to watch it a few times to catch all of it, but it’s worth it.

For those of you wanting to play along on Mac OS X, be sure to add the code that Jeff Sickel talks about on the plan9 mailing list.

Oh, and for your information, I’m trying to figure out why squint will ignore the last line in a source file. If you append a blank line to the end of the file, everything runs fine. Very weird.

Building CouchDb on Mac OS X

Friday, September 21st, 2007

I, like Sam, *really* want to play with CouchDb. But I’m on a Mac OS X box that I barely understand after 3 months of ownership.

Install MacPorts and run:

sudo port install erlang icu subversion

Add these two lines to your .bash_profile (or .profile if you’re running another Bourne-style shell; tcsh users will need the setenv equivalents in .tcshrc instead).

export ERLANG_BIN_DIR=/opt/local/bin/
export ERLANG_INCLUDE_DIR=/opt/local/lib/erlang/usr/include/

Run those two commands in your current shell or open a new one. Now, back to the install.

cd ~/projects
svn co http://couchdb.googlecode.com/svn/trunk couchdb
cd couchdb
./build.sh | tee couchdb_svn_build.log
./build.sh --install=$HOME/sys | tee couchdb_svn_install.log
mv couchdb_svn_*.log ~/sys/log

Now, for convenience, we set up an easy way to start the CouchDb server. This assumes that $HOME/sys/bin is in your $PATH. Make a file called couchdb in $HOME/sys/bin containing:

#!/bin/bash
cd $HOME/sys/couchdb && ./bin/startCouchDb.sh

Next, fix its permissions:

chmod +x $HOME/sys/bin/couchdb

Then, start the server:

couchdb

(I follow this up with a ln $HOME/sys/bin/couchdb $HOME/sys/bin/db but that might not be best for you.) Finally, follow the rest of Sam’s post to get a quick introduction to CouchDb.

Bonus Round: Ruby on top of CouchDb.

There are two Ruby gems for working on top of CouchDb, couchobject and CouchDb-Ruby, but couchobject seems the more promising. Why? Well, for one, its repository doesn’t include tests with syntax errors. And, two, it lets you write CouchDb views in Ruby, which is fantastic.

I haven’t gotten a chance to find its limitations, yet, but considering the deep magic involved and the 0.5.0 version number, I’m sure it has a few.

To get it, hit the site for the link to the tarball or grab the repository with:

git clone git://repo.or.cz/couchobject.git

I’m terribly excited. Enjoy!

Update: Now leaving Typo City.

erl_interface is Deprecated and I Hate the Erlang Docs

Sunday, September 2nd, 2007

I’ve been learning Erlang in fits and starts for a few months now, and trying to play with the C interface to it. Unfortunately, it wasn’t until tonight that I learned that the best documented interface, erl_interface, is deprecated in favor of ei. (That link is not all of the ei documentation. See the end of this post).

Oh, but you’ll still have to include erl_interface.h in order to get your code to run since ei requires it. Be careful to put -lerl_interface before -lei.

And all of the docs about interoperability with C have you including both ei.h and erl_interface.h and make no mention of the relationship between them or the deprecation of erl_interface. Hell, they barely mention ei, anyhow.

This is the kind of thing that makes languages on the verge of true popularity spin down until they find themselves in the graveyard of “interesting but irrelevant”. There will be no fiery crash, no awe-inspiring fight to the death, no raging against the dying of the light. Nothing more than the slow frostbite of a crumbling community.

I love what Erlang can do, but if you’re going to make a language that proudly shows its roots in a Prolog interpreter, you’ve got to give the plebeians like me a chance.

Oh, and could we get a decent math library while we’re at it? I shouldn’t have to break out to C just to work with matrices. Combined with a quickly made mnesia database you could have some serious distributed work going.

You would probably have to move to a more lightweight data structure than what mnesia gives, but what a great way to write a proof of concept! Of course, you’ll first have to find documentation for mnesia that isn’t ages old.

Since the mnesia and ei documentation (along with everything else) is pretty much unGooglable in all of those frames, I suggest using the documentation tarball (latest release). In the current release, hit ./otp_doc_html_R11B-5/lib/erl_interface-3.5.5.3/doc/html/application_ei_frame.html for ei and ./otp_doc_html_R11B-5/lib/mnesia-4.3.5/doc/index.html for mnesia.

Blech. The things you do for love.

I’ll be writing up my experiences and some posts to help others with ei as I go along. Let’s hope I can be as productive as I am critical.

OpenURI, Exceptions and HTTP Status Codes

Tuesday, August 7th, 2007

If you’ve needed the numeric HTTP status code from a connection created with either open-uri’s or rest-open-uri’s open method, you’ve probably noticed that OpenURI::HTTPError is raised on anything other than a 2xx or 1xx status code and that the docs don’t really lay out how to get to the status code in that error. Some of you may have hacked up a the_error.to_s[0..2] solution, but that is bad and terrible. Don’t do it. Here’s the right way. (Good luck remembering it after a few weeks away, however.)

require 'open-uri' # or 'rest-open-uri'
begin
  io_thing = open(some_http_uri)

  # The text of the status code is in [1]
  the_status = io_thing.status[0]

rescue OpenURI::HTTPError => the_error
  # some clean up work goes here and then..

  the_status = the_error.io.status[0] # => 3xx, 4xx, or 5xx

  # the_error.message is the numeric code and text in a string
  puts "Whoops got a bad status code #{the_error.message}"
end
do_something_with_status(the_status)

There you go. You’ll notice that neither open-uri nor rest-open-uri uses the Net::HTTP response classes like it claims you should in these cases, but you can map to them with the numeric status codes. All you need are the CODE_CLASS_TO_OBJ and CODE_TO_OBJ hashes defined in Net::HTTPResponse. The latter hash is probably preferable.
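For the curious, here is a rough sketch of that mapping. The helper name response_class_for is made up for illustration. The fallback to CODE_CLASS_TO_OBJ and then to Net::HTTPUnknownResponse is my own assumption about what you would want for codes that aren’t in the exact table.

```ruby
require 'net/http'

# Hypothetical helper: find the Net::HTTPResponse subclass for a status code.
# Both hashes are keyed by strings, so the code is converted first.
def response_class_for(status)
  code = status.to_s
  Net::HTTPResponse::CODE_TO_OBJ[code] ||              # exact match, e.g. "404"
    Net::HTTPResponse::CODE_CLASS_TO_OBJ[code[0, 1]] || # class match, e.g. "4"
    Net::HTTPUnknownResponse                            # nothing matched
end

puts response_class_for('404')
puts response_class_for(200)
puts response_class_for('499')
```

Feed it the_status from the snippet above and you can branch on proper classes instead of comparing strings.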

Update: Edited for stupidity.

How to Get Your Project Moving, or My Ego is Massive and You Should Listen to Me.

Monday, August 6th, 2007

So someone asks how they should go about getting a group of people together to work on a software project. I, with a massive ego propped up by very little talent, ability or experience, decided to answer it. Many of these ideas have been said elsewhere in various forms, but this seems to be a nice compaction of them.

How To Get People To Work On Your Project

Start writing it. No, really. Go start coding. Upload the notes, the sketches, whatever. Put it up on Google Code, Rubyforge or something similar. Haunt the IRC channels and mailing lists for the tools you are using and post news every time you put out a new release. Be sure not to post in places that won’t care, and don’t post too much. Talk about it with friends. Even friends that have no idea what a compiler is.

The nerds will come to you but you’ve got to work your ass off first. No one, absolutely no one, who is any fucking good will come near your project if it’s nothing more than a few airy ideas. Excepting, of course, those close friends that you’ve already had long discussions with. But if you had friends with a clue, you wouldn’t need to ask this question.

I understand. I, too, have a group of friends who don’t share my peccadilloes. And it can be a strength. It teaches you to hone your description of what you are doing down to its simplest core. It teaches you how to tell yourself what you are doing. Clarity of thought is essential.

Most of your friends won’t understand it. Some will love it for terrible reasons. The rest will think it’s dumb. If you have a couple of shitty “friends” you only hang around with for historical reasons (like I do) you will invariably have the conversation swiped from you 20 seconds into your 30 second pitch because.. well.. because they are fucking assholes. Almost all of the opinions you hear will be worthless, even from friends with half a clue or more.

Cling to the positive responses. You and I know they mean nothing in relation to the project, but they mean everything in relation to you. It means they think you have good ideas in the areas they understand, and believe in you enough to have good ideas in the areas they don’t. Knowing that can ride you along when you’re coming back from another day at some shitty job to code for 10 more hours before crashing, or, at least, wishing you could crash as you pace your hallway thinking about what needs to be done.

Insomnia, more than likely, will be ever present.

You don’t have to code every day, if it’s just a side project, but you’ve got to do it damn near. If you aren’t blowing off your friends to work on it at least once a month or more, you aren’t working hard enough on it. If this is really to be a start up, you need to be blowing them off all the time. It sucks. Get used to it. You’ve got a world to change. (Note: I have experience with side projects and have second-hand experience with startups.)

But don’t overwork. Insight comes from your brain hashing together your work and your knowledge and your experiences from elsewhere. That mashup club that you heard about? Go check it out. Read some philosophy. Read some comic books. Read something you sort-of-know-about-but-not-really.

Read a shit ton about programming. programming.reddit and Planet Intertwingly are good places to start. The first for what all the cool kids are talking about and the latter for intelligent debate, and odd viewpoints all mixed together. (Er, I should mention I was recently added on to the blog roll there, but I was a huge fan of it way, way before that. In other words, I’m not a self-promoting jerk, just a regular old jerk.)

Write good code. Go back over older code and rewrite it. Then come back later and rewrite it again. Make it better. But don’t stop coding because you can’t “get it right”. If it’s ugly or sucks or doesn’t pass the tests, put in some placeholder code with a FIXME comment above it. This can be a good place for others to help fill in the gaps but never, ever leave something like that if all you have is the hope that someone will fix it. Ask them, or do it yourself later.

Test. Test a lot. Write tools or use already established tools to make it easier for you. I suggest the latter, though I’ve had to adapt others’ work to test my current “big project”. If your project is different enough or big enough, there’s a chance you’ll be adapting the work of others, too. Testing is what will remind you that you put all of those shitty FIXME comments in your code.

If you stop coding for a couple of days, get pissed at yourself and code angry. Code real fucking angry.

You might break a keyboard, but it’ll be worth it. This is one you have to experience to believe. There is little like coding through your frustration, aggravation and even constipation and finally, finally getting it right.

Anger and love and frustration and elation and sadness and comfort, each of these you will feel when you are coding. Some you’ll feel more than others. All of them will, at some point, make you want to stop. Don’t.

Remember: coding isn’t just putting characters in a text editor, but all those hours you spent thinking don’t count until you punch the fucking keys. Time isn’t your enemy as much as your will to continue on is. And time is a big fucking enemy in a startup.

Go. Go fucking hard. That’s how you get people to come in. Oh, and you’ll probably fail. But it’ll be a good failure. It’ll be the kind of failure that you can turn into a victory later.

People say failure “builds character” or “helps you grow”. That’s pretty much just a bullshit short way of saying this:

You don’t know how you’ll fuck up until you do. The next time you’re about to fuck up, you might see how to not fuck up. You might see a new opportunity because of the way things fucked up, or the state your fuck up left you. Also, some other people with experience or money might see your fuck up and realize that a) you’ve actually got some chops or b) you could have some chops with some help. They will help you. Maybe. Fucking up will be easier next time, except when it isn’t, but if you’ve got chops it all starts coming together. Eventually.

Fail hard. Fail with motherfucking gusto. Succeeding, like flying, is throwing yourself to the ground and missing.

Good luck.

libiconv and rFeedParser

Sunday, July 22nd, 2007

I got a chance to read libiconv’s DESIGN document (found in the tarball) and noticed this passage:

Extensibility

The dlopen(3) approach is good for guaranteeing extensibility if the iconv implementation is distributed without source. (Or when, as in glibc, you cannot rebuild iconv without rebuilding your libc, thus possibly destabilizing your system.)

The libiconv package achieves extensibility through the LGPL license: Every user has access to the source of the package and can extend and replace just libiconv.so.

The places which have to be modified when a new encoding is added are as follows: add an #include statement in iconv.c, add an entry in the table in iconv.c, and of course, update the README and iconv_open.3 manual page.

The upshot of this is that adding new encodings through some iconv-encodings package will be a pain in the ass and would cause breakage in unexpected, fascinating ways. But, there are smarter people than I out there, and maybe something can still be done.

Of course, this also means that we would not get FreeBSD “for free” (though, I imagine xmlparser doesn’t build on it, anyway) and we would have to come up with a solution for it as well.

What a mess.

On rFeedParser

Sunday, July 22nd, 2007

This post is huge but I have not the time to make it smaller. I’m so very tired.

A Quick Introduction

rFeedParser is an RSS/Atom feed parser. It is a translation of Mark Pilgrim’s feedparser from Python to Ruby. It behaves almost exactly the same and passes somewhere near 99% of the tests on an Ubuntu machine. Other platforms suffer from lesser success rates due to differing Iconv installations. The feedparser documentation applies to this work, and almost any deviation from it should be considered a bug. Please file any bugs you find.

This project was inspired by Sam Ruby’s pirate testing idea, one that I hope catches on beyond these feed parsers.

The Basics

require 'rubygems'
require 'rfeedparser'

feed = FeedParser.parse('somefeedurlorfilepath')

first = feed.entries.collect{|e| e['title'] }
second = feed['entries'].collect{|e| e.title }
if first == second
  puts "This is handy when dealing with e['id'], the guid of an item/entry"
end

Installation

Agh. rFeedParser is a monster. Tons of dependencies, some overlapping in areas, and one “not nice” dependency. The “not nice” dependency is on Yoshida Masato’s xmlparser.

You can either install it by hand (be sure to add return in front of stream in saxdriver.rb, line 171), or install through “sudo apt-get install libxml-parser-ruby1.8” if you’re on Ubuntu or another Debian-based Linux, or through the xmlparser gem that I put together that seems to work on only “some” Mac machines but all Linux boxes. xmlparser, of course, depends on the Expat XML parsing library, and be sure to install the -dev, -devel or whatever version has the full headers and libraries available for linking against if you install through MacPorts or by hand.

The Latest and Greatest

The latest version is 0.9.93… Okay, really, the latest version is 0.9.931. There was a minor bug that, if it hadn’t been for the guilt of having put off the user who had brought it to me, I wouldn’t have worried about forgetting in 0.9.93. He/she (no name, just an email address) had been so nice about it.. So, future users, take note: if you see a bug I haven’t fixed yet, guilt seems to work. Also, bribery. Patches certainly don’t hurt.

The 0.9.93 and 0.9.931 updates do a number of things:

  • Fix a horrendous error when handling content:encoded, body, xhtml:body, prodlink and fullitem
  • Added some further support of Yahoo Media RSS. I’ve added support for media:thumbnail and media:content (the latter, only in its “two tag” form). This came directly from a requirement in our project at work. Mark, you should admire my ability to get paid for this.
  • Fixed up the lame ass headers code I had going. I don’t remember what I was on when I wrote it, but it must have been fantastic.
  • py2rtime had some major bugs that I can’t understand how they passed the tests. I will give a dollar to anyone who figures it out, mainly because I don’t want to deal with it. See revision 57, and compare to both revision 58 and the current code in the repository.
  • Use rchardet 1.1. There was a rather serious bug in 1.0. Never use gsub! ever, ever, ever, ever. Maybe sometimes.
  • Some messed up indentation. Neither vim nor Textmate can indent ruby code well, it seems. Or maybe I write weird looking code. Luckily, I’m reading the Dragon book and learning things and I may decide to tackle it.
  • ForgivingURI continues to be something I desperately want to see in the Ruby core libraries. URI.parse shouldn’t puke every time some loser fucks up his syntax. At least, give me something more than “bad URI(is not URI?)” no matter what the problem is. Something I stole from Bob Aman’s FeedTools.
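To give a feel for the ForgivingURI idea in that last bullet, here is a minimal sketch. It is not the ForgivingURI class that ships with rFeedParser or FeedTools. The forgiving_parse name and the guess that unescaped spaces are the usual culprit are both my own assumptions for illustration.

```ruby
require 'uri'

# Hypothetical helper, not rFeedParser's ForgivingURI.
# Try a strict parse first, retry after a small cleanup, and hand back nil
# instead of raising when the string is beyond saving.
def forgiving_parse(raw)
  candidate = raw.to_s.strip
  URI.parse(candidate)
rescue URI::InvalidURIError
  begin
    # Assumption: unescaped spaces are the most common mistake.
    URI.parse(candidate.gsub(' ', '%20'))
  rescue URI::InvalidURIError
    nil
  end
end

puts forgiving_parse(' http://example.com/a b ')
puts forgiving_parse('http://example.com/<bad>').inspect
```

Returning nil lets the caller decide what a broken link means, rather than having one bad href blow up the whole feed.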

Speaking of patches, those interested in helping development can find a bzr repository for rfeedparser on this very site. This is probably dumb, and a bandwidth hog, but I’m too lazy to either a) go to my workplace and log into my Ubuntu box with bzr-svn or b) patch svn on the Mac laptop I’m currently writing on to put it up on rubyforge.

Gotchas, Monkey Patches and Other Disgusting Things

Now, on to the ugly.

As Sam points out me pointing out, the original feedparser tests require the parsed times to be stored in Python’s 9-tuple format. For those of you who aren’t jargon whores, that’s basically a list of 9 integers specifying the date. Unfortunately, Ruby doesn’t have a method in Time that can take that format. The solution, for our purposes, is to use the py2rtime top-level method I wrote that does the (very easy) task of putting the 9-tuple in a form Time.utc can understand. (Also, Sam’s suggestion of naming it feeddate sounds pretty damn good).
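The conversion is roughly this. Treat it as a sketch, not the actual py2rtime source: the method name nine_tuple_to_time is a stand-in, and it assumes the tuple is already in UTC, which is what feedparser hands back for its parsed dates.

```ruby
# Hypothetical stand-in for py2rtime; the real method may differ.
# A Python time tuple is (year, month, day, hour, minute, second,
# weekday, day of year, DST flag). Time.utc only needs the first six,
# and works out the weekday and day of year by itself.
def nine_tuple_to_time(tuple)
  year, month, day, hour, minute, second = tuple.first(6)
  Time.utc(year, month, day, hour, minute, second)
end

parsed = nine_tuple_to_time([2007, 7, 22, 14, 30, 0, 6, 203, 0])
puts parsed.year
puts parsed.yday
```

Note that Python counts weekdays from Monday as 0 while Ruby’s Time#wday counts from Sunday as 0, so the seventh element can’t be compared directly.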

Also, the SGMLParser in HTMLTools is kind of broken. The Regexps don’t really work as intended (which I really need to send in patches for) and it’s really, really not UTF-8 safe. Oh, god. Making it UTF-8 safe involved code so ugly, so treacherous, that I will probably get cancer from it.

The UTF-8 stuff, of course, isn’t the developers’ fault. Ruby’s encoding support sucks so much that it seems quite a few people thought it would make writing a decent feed parser nearly impossible.

So, how did I do it? Through beta software, overlapping dependencies, relying on iconv (which is always terribly configured in any operating system) and a total disregard for passing the encoding tests. That’s right, rfp uses both the character-encodings gem and ActiveSupport and we still have dozens of failures and errors, the number of each depending on what OS we’re on!

So, most of the former Eastern Bloc just won’t get to use rFeedParser for a while. Sorry. (Hey, Hungary, it supports your datetimes! Does that make you feel better?)

If someone could magic up some sort of iconv-encodings gem or tarball that can give us a standard iconv install to work with, we might be able to make the encoding situation better. I would do it, however, I have got shit to do that doesn’t make me want to gather up and shove ballpoint pens into my brainstem. Or slit my wrists with codepoints. (I’m pretty sure I could come up with a physically realizable way to approximate the latter.) Sigh, maybe I’ll get to it later, but I’d love to have some help.

On to the straight-up monkey patches.

There’s a few on Hpricot, but they have very little impact. Maybe making Hpricot load a bit slower on boot due to the huge element lists I put in there. Also, there is a method called Hpricot.scrub, but it is no longer the Hpricot.scrub that you know so well. It originally was, but I needed to do some extra things that added a couple of scans on top of the two already in there and, suddenly, it was a bottleneck. So, apologies for the confusing name.

(Jeff Hodges’ Trivia Time: The guy who wrote Hpricot#scrub, Michael Moen, is the guy who “officially” put Jeff’s name in for the position at ICTV. He and Jeff work together on the same Ruby on Rails application as members of the ActiveMedia Group. When discussing new problems with Michael, Jeff is often boggled by Michael’s clarity of thought.)

Oh, and one more monkey patch. xmlparser doesn’t return the attributes of the XML tags as a Hash, but SGMLParser does and it would have been pretty damn handy if it did, so I made it do that. The code is in better_attributelist.rb (my filenames are full of ego), and it could be done better, but it suits my purpose.

Other ugly things: ForgivingURI (as mentioned above) and the inconsistent naming of methods that came about after a few bad nights of hacking through Ruby’s inheritance problems. I fixed the actual architectural problem long ago, but left the terrible names in there. So, the self.fooThing and _hasDumbPrefix stuff is my bad. Except for the methods in FeedParserMixin that are named after XML tags. Those names are prefixed with ‘_’ (as they are even in the original Python code) in order to work around the differences between the XML parser and SGML parser.

I should also mention the metric ass load of datetime parsing regular expressions I had to write. Another set of patches I need to write, this time to Ruby core. I don’t even want to discuss them. Go look at time_helpers.rb and see how many times I made one problem into two. My code is grody.

The Future of the Tests

Sam brings up the idea of making the tests from the Python feedparser less, er, Pythonic. We could speed up response time. If we change the expectations for dates to some method calling a 9-tuple (or rather, a 9-list or 9-Array, or 9-some-datastructure-with-brackets-not-parentheses), we could get an instant win. I have no idea what I was trying to say here.

Also, the use of u'', u"" and the \unn or \unnnn format for non-ASCII characters in Python had to be hacked around with regular expressions. While the character-encodings gem provides something like the u'' syntax, the \u characters are completely unsupported. It’s really ugly, and kind of painful, esp. if a developer never had much experience with Python. Fortunately, I had a good deal, but probably not enough, considering the amount of time it took to write those Regexps.
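To show the flavor of that hack, here is a small sketch of unwrapping a Python unicode literal and decoding its \unnnn escapes. These are not the Regexps in rFeedParser. The helper name is invented, it only handles the four-digit form, and it assumes a Ruby with native UTF-8 strings (1.9 or later), which the original code did not have.

```ruby
# Hypothetical helper, not the regular expressions rFeedParser actually uses.
def decode_python_unicode_literal(literal)
  # Drop the u'...' or u"..." wrapper if there is one.
  body = literal.sub(/\Au(['"])(.*)\1\z/m, '\2')
  # Turn each \uXXXX escape into the UTF-8 character for that code point.
  body.gsub(/\\u([0-9a-fA-F]{4})/) { [$1.hex].pack('U') }
end

puts decode_python_unicode_literal("u'caf\\u00e9'")
```

Doing this once, up front, when loading a test expectation keeps the comparison code free of Python-isms.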

The XML test files are a huge boon, and making them more general would make it easier to maintain code equivalence across languages and allow those who are more comfortable in one language to help outside of that language’s project. But, this is all just blue sky stuff for the moment.

And Spent

This post is huge and I need to stop writing. I don’t think I’ve talked about everything I wanted to, but I’m shot. rFeedParser is nice and you should use it and tell other people to use it. Questions and comments are welcome.

Update: A few grammar and spelling clean ups. Sucktasia on ice.