Towards a Pure Perl HTML::FillInForm: 61% of tests passing

I have a goal of distributing Titanium along with all of its dependencies. The vision is a pure-Perl stack, so it’s easy to unpack the distribution and start using it without worrying about binaries tied to a specific platform.

One stumbling block has been HTML::Parser, which requires XS and is a dependency of HTML::FillInForm, which in turn is used by CGI::Application::Plugin::ValidateRM.

I’ve been working with Ron Savage on a pure-Perl replacement for HTML::Parser, and I made good progress over the weekend building on his foundation. He created HTML::Parser::Simple, which is really two major parts in one. First, it provides a pure-Perl HTML parser. Second, it provides a specific use of that parser: storing a representation of an HTML document as a Tree::Simple data structure.

In a future version, I’d like to see the Tree::Simple part split into its own module, both to clarify the parts and to eliminate this dependency for people who don’t use it. For example, the new subclass I created doesn’t use the Tree::Simple object at all.

I set a short-term goal: get all of HTML::FillInForm’s tests to pass after changing one line in HTML::FillInForm so that it “isa” HTML::Parser::Simple::Compat rather than a subclass of HTML::Parser. I then run the HTML::FillInForm test suite and count how many tests pass and fail. Currently our parser emulates enough of HTML::Parser to pass 61 percent of the tests.

The idea behind HTML::Parser::Simple::Compat is to provide an API that is compatible (enough) with the HTML::Parser 2.x API. The approach seems promising, but there is still more to do:

  • I have focused on getting tests to pass, so the documentation is still poor.
  • Perhaps my “Compat” subclass should really be merged into the parent class. I’ll discuss this with Ron.
  • There are few tests of our own, as I’ve been focusing on getting the HTML::FillInForm tests to pass. Perhaps more of those tests could be “ported” into our own test suite.
  • Performance and memory consumption have not been benchmarked.
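For readers unfamiliar with the HTML::Parser 2.x style API that the Compat class aims to emulate, here is a minimal sketch of the subclassing convention: a parser object receives HTML through parse(), is finished with eof(), and calls start, end, text, comment and declaration methods that a subclass overrides. The class names My::TinyCompatParser and My::InputCollector are hypothetical, and the regex tokenizer is a deliberately naive stand-in; this is not the code in HTML::Parser::Simple::Compat. For simplicity it buffers all input and dispatches in eof(), whereas HTML::Parser dispatches as chunks arrive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical minimal pure-Perl parser following the HTML::Parser 2.x
# subclassing convention. NOT the HTML::Parser::Simple::Compat source.
package My::TinyCompatParser;

sub new {
    my $class = shift;
    return bless { buf => '' }, $class;
}

# Accept input in chunks. Simplification: buffer everything and
# dispatch in eof(); HTML::Parser itself dispatches as chunks arrive.
sub parse {
    my ($self, $chunk) = @_;
    $self->{buf} .= $chunk;
    return $self;
}

# Tokenize the buffered HTML with naive regexes and call the handlers.
sub eof {
    my $self = shift;
    my $html = $self->{buf};
    $self->{buf} = '';
    while (length $html) {
        if ($html =~ s{\A<!--(.*?)-->}{}s) {
            $self->comment($1);
        }
        elsif ($html =~ s{\A<!([^>]*)>}{}) {
            $self->declaration($1);
        }
        elsif ($html =~ s{\A(</\s*([A-Za-z][\w:-]*)\s*>)}{}) {
            $self->end(lc $2, $1);
        }
        elsif ($html =~ s{\A(<([A-Za-z][\w:-]*)((?:\s+[^\s=>/]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?)*)\s*/?>)}{}) {
            my ($orig, $tag, $attr_text) = ($1, lc $2, $3);
            my (%attr, @attrseq);
            while ($attr_text =~ /([^\s=>\/]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+)))?/g) {
                my $name  = lc $1;
                # A bare attribute (e.g. "checked") takes its own name as value.
                my $value = defined $2 ? $2 : defined $3 ? $3 : defined $4 ? $4 : $name;
                push @attrseq, $name unless exists $attr{$name};
                $attr{$name} = $value;
            }
            $self->start($tag, \%attr, \@attrseq, $orig);
        }
        elsif ($html =~ s{\A([^<]+|<)}{}) {
            # Plain text, or a stray "<" that did not start a recognised tag.
            $self->text($1);
        }
    }
    return $self;
}

# Default no-op handlers; subclasses override the ones they need.
sub start       { }
sub end         { }
sub text        { }
sub comment     { }
sub declaration { }

# Example subclass: collect the names of <input> fields.
package My::InputCollector;
use parent -norequire, 'My::TinyCompatParser';

sub start {
    my ($self, $tag, $attr, $attrseq, $orig) = @_;
    push @{ $self->{inputs} }, $attr->{name}
        if $tag eq 'input' && defined $attr->{name};
}

package main;

my $p = My::InputCollector->new;
$p->parse('<!DOCTYPE html><form><input type="text" name="email" value="">');
$p->parse('<!-- second chunk --><input name="zip"></form>');
$p->eof;
print join(',', @{ $p->{inputs} }), "\n";    # prints "email,zip"
```

The subclass only overrides start, which mirrors how a form filler can find the fields it cares about while ignoring everything else. A module written against this convention doesn’t care whether the parent class is XS-based or pure Perl, which is what makes swapping the parent class a one-line change.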

If this project interests you, feel free to fork it on GitHub and contribute code or other feedback.
