rFeedParser, of course, is a Ruby translation of the Universal Feed Parser in Python and passes 98.8% of its 3000+ unit tests.
rchardet is a Ruby translation of chardet in Python and is used quite a bit in rFeedParser.
There are, of course, some things left to be done in both of these projects.
Off the top of my head, rFeedParser needs:
- to be able to use libxml if the user prefers, instead of the Expat binding
- to use version 0.4.1 of the
- someone to ask People Who Know if the way rfp strips out the bad stuff in the *_crazy.xml tests is acceptable
- to set up a git submodule for the tests in order to ease the merging in of tests from the feedparser repository
- a fix-up of some of the regexes and lame matching code in it, especially the time-parsing code
- a re-sorting of the incredibly ugly object hierarchy
- other things I’ve forgotten and am too lazy to look up
- some information on whether using some gem-provided Tuple object instead of the giant Arrays would help the memory usage
- fixes for the other encoding bugs that Mark fixed when he released the version of rchardet that cleared up the little endian UTF-16 bug I reported
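For the libxml item, here is a rough sketch of how "use libxml if the user prefers, otherwise fall back to Expat" could work. This is not how rFeedParser currently loads its parser. The helper `first_loadable` is made up for illustration, and the require names `'libxml'` and `'xml/parser'` are assumptions that should be checked against whichever gems are actually installed.

```ruby
# Try each candidate library in order and return the name of the first
# one that loads, or nil if none of them are installed.
# (Hypothetical helper, not part of rFeedParser.)
def first_loadable(candidates)
  candidates.find do |lib|
    begin
      require lib
      true
    rescue LoadError
      false
    end
  end
end

# Assumed require names; the user's preferred backend goes first.
XML_BACKENDS = ['libxml', 'xml/parser'].freeze

backend = first_loadable(XML_BACKENDS)
puts(backend ? "Using #{backend}" : 'No XML backend found')
```

Reordering the list is all it would take to flip the preference, and a `nil` result gives the caller a chance to fall back to the loose parser instead of blowing up.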
There’s still a lot of work to do, and I’m listening to your concerns and taking your patches. Hit the mailing list and we can all make this better.
Special Note for People Who Want to Help: Run rake setup in your branch to install all the gems you need to run it.