Results tagged “URI”

This post is about the current state of URI encoding in Perl. This is the problem space of being able to safely pass arbitrary text into and out of a URI format. If you’ve even seen a space in URL represented as “%20”, that’s the topic of the moment.

The best general introduction I’ve found on the topic is the Wikipedia page on Percent-encoding.

RFCs on the topic include the 2005 RFC 3986 that defined the generic syntax of URIs. It replaces RFC 1738 from 1994 which defined Uniform Resource Locators (URLs), and RFC 1808 from 1995 which defined Relative Uniform Resource Locators. Sometimes this transformation is called “URI escaping” and sometimes it’s refered to “URL encoding”. RFC 3986 clarified the naming issue:

“In general, the terms “escaped” and “unescaped” have been replaced with “percent-encoded” and “decoded”, respectively, to reduce confusion with other forms of escape mechanisms.”

Elsewhere it’s clarified that percent encoding applies to all URIs, not just URLs.

I think the Perl community would do well to adopt “percent encode URI” and “percent decode URI” as ways to describe this process that is unambigous and in line with the RFC.

There are two URI percent-encoding solutions in Perl that seem to be in the widest use. Both have a significant deficiency.