A year has passed since my last post on URIs and URLs, and it would seem that some of the concepts are still lost on some folks. With that said, I figured I’d throw up another post to try and address some of the questions raised in the comments on both posts.
URLs and URNs are both URIs
This is one point that can’t be stated enough. A URL is a URI and a URN is a URI, plain and simple. It’s really quite challenging to phrase it any other way. It’s like trying to explain that a Human is a Mammal but a Mammal is not always a Human.
Examples of URLs and URNs
People have also suggested that these posts would have been more helpful if I had provided some examples that illustrate the difference between a URL and a URI. Based on the previous point, one should be able to conclude that a URL is a URI, so there’s no reasonable way to present an example that distinguishes the two. However, we can provide examples that distinguish a URL from a URN:
| Kind (all of these examples are URIs) | Example |
|---|---|
| URL | mailto:someone@example.com |
| URL | http://www.damnhandy.com/ |
| URL | https://github.com/afs/TDB-BDB.git |
| URL | file:///home/someuser/somefile.txt |
| URN | urn:mpeg:mpeg7:schema:2001 |
| URN | urn:isbn:0451450523 |
| URN | urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C |
| URN | urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66 |
Again, all of the examples above are valid URIs. You’ll note, of course, that all of the URNs are prefixed with “urn:”.
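To make the table concrete, here is a minimal Python sketch that uses the standard library’s `urllib.parse.urlsplit` to pull out each example’s scheme. Every example has a scheme, so every example is a URI first; the URNs are singled out only by the `urn` scheme. The helper name `kind_of` is hypothetical, and treating every non-`urn` scheme as a URL is a simplification that holds for the schemes in the table, not a rule from any specification.

```python
from urllib.parse import urlsplit

# The examples from the table above.
EXAMPLES = [
    "mailto:someone@example.com",
    "http://www.damnhandy.com/",
    "https://github.com/afs/TDB-BDB.git",
    "file:///home/someuser/somefile.txt",
    "urn:mpeg:mpeg7:schema:2001",
    "urn:isbn:0451450523",
    "urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C",
    "urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66",
]


def kind_of(uri: str) -> str:
    """Hypothetical helper: label a URI as 'URN' or 'URL' by its scheme alone.

    Simplification: any scheme other than 'urn' is treated as a URL, which
    holds for the examples above but not for every registered scheme.
    """
    scheme = urlsplit(uri).scheme  # urlsplit lowercases the scheme
    if not scheme:
        raise ValueError(f"not an absolute URI: {uri!r}")
    return "URN" if scheme == "urn" else "URL"


for uri in EXAMPLES:
    # Every example has a scheme, so it is a URI before it is anything else.
    print(f"{kind_of(uri)}: {uri}")
```

Note that the sketch never looks past the scheme; everything after the first colon is left alone.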
URIs are Opaque Identifiers
There’s a very informative page by Tim Berners-Lee that provides a lot of good details on Uniform Resource Identifiers. One very important point is the notion of URI opacity, which states:
“The only thing you can use an identifier for is to refer to an object. When you are not dereferencing, you should not look at the contents of the URI string to gain other information.”
When you followed the link to this page, you didn’t have to do anything other than click it. Your browser only had to look at the URI scheme and the domain in order to resolve this particular document. Everything after “damnhandy.com” is defined to be opaque to the client. This point may seem orthogonal to the original post, but it’s a very important aspect of URIs. I bring it up because the following question was asked in the comments:
Can we say that:
“http://www.domain.tld/somepath/file.php?mykey=somevalue”
is an URI
and that the “http://www.domain.tld/somepath/file.php” part is an URL?
No. No, you cannot. Both are URLs, which are also URIs. The idea behind URI opacity is that you should not look at the string to make any inferences about what is at the other end. The presence of a query string does not distinguish a URL from a URI. Both strings are URIs and URLs.
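A minimal Python sketch, using the standard library’s `urllib.parse` module, shows why the query string changes nothing about what kind of identifier this is. The scheme and the authority tell the client where to send the request; the path and query are just further components that get sent along to the server. Dropping the query produces a different string, and therefore a different URI, but it is still an `http` URL with the same scheme and authority.

```python
from urllib.parse import urlsplit, urlunsplit

with_query = "http://www.domain.tld/somepath/file.php?mykey=somevalue"
parts = urlsplit(with_query)

# The client uses the scheme and authority to decide where to send the request.
print(parts.scheme)   # http
print(parts.netloc)   # www.domain.tld

# The path and query are sent to the server in the request; the client
# draws no conclusions from them.
print(parts.path)     # /somepath/file.php
print(parts.query)    # mykey=somevalue

# Removing the query yields a different string, i.e. a different URI,
# but it is still an http URL with the same scheme and authority.
without_query = urlunsplit(parts._replace(query=""))
print(without_query)                 # http://www.domain.tld/somepath/file.php
print(without_query == with_query)   # False
```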
Another commenter also asserted the following:
URI is whole address of a resource but resource extension is not mentioned. in URL we also mention the extension.
for example:URI: http://www.abc/home
URL: http://www.abc/home.html
This is also incorrect. Being pedantic, neither is a syntactically correct URI or URL. But that aside, the presence of a file extension also does not distinguish a URI from a URL. Again, they are both valid URLs as well as URIs. Going back to URI opacity, you also cannot conclude that the two URLs reference the same resource or the same representation. The URI is an opaque string that identifies a resource. They’re still both URIs.
HTTP URIs Can Identify Non-Document Resources
This is the one point that I think hits at the crux of everyone’s confusion on the matter. Things get all meta-meta when we start using URIs to identify things that are not documents on the web. It is perfectly acceptable to use URIs to identify concepts, or non-document resources. If a URI happens to use a scheme that suggests it can be dereferenced, there is no requirement that dereferencing it actually returns a document.
Most people expect that HTTP URIs can always be dereferenced. There is a common assumption that an HTTP URI must point to a document, and that if no document is available at that URI (that is, you get a 404 error), then the URI is somehow bad. Just because the server could not locate a document or a representation for the URI does not mean that the HTTP URI is an invalid identifier.
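One way to see the separation: whether a string can serve as an identifier is decided without ever contacting a server. The Python sketch below checks only the scheme rule from RFC 3986 (a letter, then letters, digits, "+", "-" or ".", then a colon) and deliberately makes no network request, so a 200 or a 404 plays no part in the answer. The function name `looks_like_absolute_uri` and the URI `http://example.com/ns/Person` are hypothetical, and the check is nowhere near a full RFC 3986 validator.

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":".
# Only this prefix is checked; the rest of the URI grammar is not validated.
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def looks_like_absolute_uri(s: str) -> bool:
    """Hypothetical check: does the string start with a syntactically valid scheme?

    No request is made, so what a server would answer is irrelevant.
    """
    return _SCHEME.match(s) is not None


# A hypothetical HTTP URI naming a concept is still an identifier,
# whether or not anything can be retrieved from it.
print(looks_like_absolute_uri("http://example.com/ns/Person"))  # True
print(looks_like_absolute_uri("urn:isbn:0451450523"))           # True
print(looks_like_absolute_uri("example.com/ns/Person"))         # False
```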
The finer bits of this issue are summed up in httpRange-14 as well as this article by Xiaoshu Wang. The concepts around httpRange-14 are deep, but I think it’s these types of ideas that trip people up a lot. IMHO, it’s one of those concepts that gets muddled in the great internet telephone game, which only causes more confusion. For example, it seems that people are under the impression that if a URL does not include a file extension, then it represents a concept and is therefore a URI, but it just doesn’t work that way.
A URL is a URI, and a URI can be used to identify anything. Furthermore, each distinct URI is a distinct identifier. This is why you cannot assume that http://www.abc/home is the same as http://www.abc/home.html just by looking at them. These are two distinct URIs that may or may not identify the same resource. Because URIs are opaque, you as the client should not attempt to draw any conclusions from the URI string itself.
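As a rough illustration, assuming a client that treats URIs purely as opaque strings, the Python sketch below keys a hypothetical registry by URI. The two addresses from the comment above end up as two separate entries, because they are two different strings; nothing in either key says whether they identify the same resource. This is intentionally naive: it applies none of the normalization rules RFC 3986 describes (for example, case-insensitive schemes and hosts).

```python
# Hypothetical registry mapping identifiers to what they identify.
# Keys are compared as opaque strings: no parsing, no guessing.
registry = {}

registry["http://www.abc/home"] = "the home page, as a concept"
registry["http://www.abc/home.html"] = "an HTML document"

# Two different strings give two different entries. Nothing in either key
# tells the client whether they identify the same underlying resource.
print(len(registry))                                        # 2
print("http://www.abc/home" == "http://www.abc/home.html")  # False
```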
Draw a Venn diagram.
The light is on now, and I think I am home. Your post clears up most of my confusion about URIs. Thanks
Wow, I’m still confused. I still say URL …
Most comprehensive data I’ve read on this interesting subject
You are right! When I first learnt to use web browsers, I learnt to type the addresses using http: or ftp: to specify the protocol. Then I learnt that people refer to these addresses as URLs. After many years, I heard a new term: URI. I accepted it. It was fine until I tried to learn C#. This is where I got confused by the term URI. Namespace? But it looks like I can type it in my browser. Will my program always refer to the MS site to download the libraries? And lots of other confused questions. I turned to Dr Google to find answers, only to find more confusion. It’s like one stupid teaching another. Your answer this time is spot on. The sentence that says a URL is a URI and a URN is a URI connects every bit of information that I had. Thanks.
I think it finally made sense to me in the context of XML Namespaces which are URIs (well, IRIs now). This means you can use “urn:isbn:0-486-27557-4” (URN) or “file:///home/username/RomeoAndJuliet.pdf” (URL) for the namespace which makes the namespace unique rather than pointing at a particular document on the web.
See also the wikipedia entry: http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
Even in Wikipedia the concepts URI, URL, and URN are not made clear. The image http://commons.wikimedia.org/wiki/File:URI_Venn_Diagram.svg is used in several articles. What does the color mean? why are URL and URN blurred? A clean Venn diagram looks different.