A year has passsed since my last post on URIs and URLs andÂ it would seem that some of the concepts are still lost on some folks. With that said, I figured I’d throw up another post that I could try and address some of the questions raised in the comments of both posts.
URLs and URNs are both URIs
This is one point that can’t be stated enough. A URL is a URI and a URN is a URI, plain and simple. It’s really quite challenging to phrase it any other way. It’s like trying to explain that a Human is a Mammal but a Mammal is not always a Human.
Examples of URLs and URNs:
People have also suggested that these posts could have been more helpful if I had provided some examples that illustrate the difference between a URL and a URI. Based on the previous point, one should able conclude that a URL is a URI and therefore there’s no reasonable way to present an example that distingushes the two. However, we can provide examples that distingush a URL from a URN:
|All of these Examples are URIs:
|Examples of URLs:
|Examples of URNs:
Again, all of the examples above are all valid examples of URIs. You’ll note of course that all of the URNs are prefixed with “urn:”.
URIs are Opaque Identifiers
There’s a very informative page by Tim Berners-Lee that provides a lot of good deails on Uniform Resource Identifiers. One very important point is the notion of URI opacity, which states:
“The only thing you can use an identifier for is to refer to an object. When you are not dereferencing, you should not look at the contents of the URI string to gain other information.”
When you followed the link to this page, you didn’t have to do anything other than clicking it. Your browser only had to look at the URI scheme and the domain in order to resolve this particular document. Everything after “damnhandy.com” is defined to be opaque to the client.Â This point may seem orthogonal to the original post, but it’s a very important aspect of URIs. I bring this point up because the following question was asked in the comments:
Can we say that:
is an URI
and that the “http://www.domain.tld/somepath/file.php” part is an URL?
No. No you cannot. Both are URLs, which are also URIs. The idea behind URI Opacity is that you should not look at the string to make any inferences as to what is at the other end. The presence of a query string does not distinguish a URL from a URI. Both strings are URIs and URLs.
Another commenter also asserted the following:
URI is whole address of a resource but resource extension is not mentioned. in URL we also mention the extension.
This is also incorrect. Being pedantic, neither is a syntactically correct URI or URL. But that aside, the presence of a file extention also does not distinguish a URI from a URL. Again, they are both valid URLs as well as URIs. Going back to URI opacity, you also cannot conclude that the two URLs reference the same resource or the same representation. The URI is an opaque string that identifies a resource. They’re still both URIs.
HTTP URIs Can Identify Non-Document Resources
This the one point that I think hits at the crux of everyone’s confusion on the matter. Things get all meta-meta when we start using URIs to identify things that are not documents on the web. It is perfectly acceptable to use to URIs to identify concepts, or non-document resources. If a URI happens to express scheme that suggests that it can be dereferenced, it is not a requirement that a document resource is actually dereferenced by that URI.
Most people’s expectation of HTTP URIs is that they it can always be dereferenced. There is a common assumption that an HTTP URI must point to a document. If it document resources is not available at that URI, that is you get a 404 error, then the URI is somehow bad. Just because the server could not locate document or a representation for the URI, does not mean that the HTTP URI is an invalid identifier.
The finer bits of this issue are summed up in httpRange-14 as well as this article by Xiaoshu Wang. The concepts around httpRange-14 are deep , but I think it’s these types of ideas that trips people up a lot. IMHO, it’s one of those concepts that gets muddled in the great internet telephone game that causes more confusion. For example, it seems that people are under the impression that if a URL does not express a file extension, then it represents a concept and therefore is a URI, but it just doesn’t work that way.
A URL is a URI and a URI can be used to represent anything. Furthermore, each URI is unique. This is why you cannot assume that http://www.abc/home is the same as http://www.abc/home.html just by looking at the URI. These are both distinct URIs that may or may not represent the same resource. Because URIs are opaque, you as the client should not be attempting to make any decisions about the URI.