W3C Linked Enterprise Data Workshop and more about URLs vs. URIs

Last week, I had the pleasure of attending the W3C workshop on Linked Enterprise Data Patterns. My colleague Ora Lassila and I gave a presentation on the issues of identity in Linked Data. You can read our position paper and view our presentation here.

It would seem that many share the same frustrations that Ora and I have with the confusion between URLs and URIs, and the idea that URLs are queries and URIs are identifiers. Readers of this blog will note that over the past few years I have been trying to clarify the difference between a URL and a URI. I think it’s safe to say that even after multiple, detailed posts on the topic, the distinction between the two remains murky for some. Hopefully we can start to clear things up.


Amazon Makes the Kindle Fire an Easy Target for a Thief

Having received a Kindle Fire this year, I was really surprised at the unpacking process and minimal security. While the device is very nice, the experience made me realize that the Kindle Fire is a package thief’s dream. Here are a few reasons why:

Packaging
One thing that makes stealing a Kindle Fire easy is the packaging: Amazon makes it quite clear what is in the box. When it arrives at your door, or your building’s mail room, the package is clearly marked with the words “Kindle Fire” all over it. In my case, the package sat in our building lobby clearly advertising what’s inside.

This makes it really easy to identify which package has the 7-inch tablet inside.

Automatic Sign-In
When I opened the package and turned on the Kindle for the first time, it skipped past the sign-in screen since my wife’s credentials were already entered and she was automatically logged into Amazon. At first, this seemed like a nice touch for folks like my grandmother who don’t quite get sign-in procedures. However, this approach has far more cons than pros.

1-Click Enabled by Default
The real kicker is that 1-Click is enabled by default. When you purchase content, you are never prompted for a password. This gives potential gift takers an added bonus: the ability to purchase Amazon content on your dime. And if you have kids, they can snag games as often as they want.

So if you’re considering getting an Amazon Kindle for someone this year, be sure to check the box that says “this is a gift,” which ensures the device doesn’t sign you in automatically when it arrives. Even if you’re getting it for yourself, this is probably a good idea. Second, if you live in an apartment building or condo, you might want to consider shipping the box to your work or someplace where it isn’t in plain sight.

 


URL vs. URI vs. URN: The Confusion Continues

A year has passed since my last post on URIs and URLs, and it would seem that some of the concepts are still lost on some folks. With that said, I figured I’d throw up another post in which I try to address some of the questions raised in the comments of both posts.

URLs and URNs are both URIs

This is one point that can’t be stated enough. A URL is a URI and a URN is a URI, plain and simple. It’s really quite challenging to phrase it any other way. It’s like trying to explain that a human is a mammal but a mammal is not always a human.

Examples of URLs and URNs:

People have also suggested that these posts could have been more helpful if I had provided some examples that illustrate the difference between a URL and a URI. Based on the previous point, one should be able to conclude that a URL is a URI, and therefore there’s no reasonable way to present an example that distinguishes the two. However, we can provide examples that distinguish a URL from a URN:

All of these examples are URIs:

Examples of URLs:

  • mailto:someone@example.com
  • http://www.damnhandy.com/
  • https://github.com/afs/TDB-BDB.git
  • file:///home/someuser/somefile.txt

Examples of URNs:

  • urn:mpeg:mpeg7:schema:2001
  • urn:isbn:0451450523
  • urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C
  • urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66

Again, all of the examples above are valid URIs. You’ll note, of course, that all of the URNs are prefixed with “urn:”.
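To make the relationship concrete, here is a small Python sketch (standard library only, my own illustration rather than anything from the original post) that parses each of the examples above. Every one of them parses cleanly as a URI; the URNs are simply the ones whose scheme is urn:

```python
from urllib.parse import urlparse

# The URL and URN examples from above -- every one of them is a URI.
examples = [
    "mailto:someone@example.com",
    "http://www.damnhandy.com/",
    "https://github.com/afs/TDB-BDB.git",
    "file:///home/someuser/somefile.txt",
    "urn:mpeg:mpeg7:schema:2001",
    "urn:isbn:0451450523",
    "urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C",
    "urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66",
]

def is_urn(uri: str) -> bool:
    """A URN is just a URI whose scheme is 'urn'."""
    return urlparse(uri).scheme == "urn"

for uri in examples:
    kind = "URN" if is_urn(uri) else "URL"
    print(f"{kind}  scheme={urlparse(uri).scheme:7s}  {uri}")
```

Note that the classifier never has to do anything beyond reading the scheme, which is exactly the point: everything in the list is a URI first and foremost.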
URIs are Opaque Identifiers

There’s a very informative page by Tim Berners-Lee that provides a lot of good details on Uniform Resource Identifiers. One very important point is the notion of URI opacity, which states:

“The only thing you can use an identifier for is to refer to an object. When you are not dereferencing, you should not look at the contents of the URI string to gain other information.”

When you followed the link to this page, you didn’t have to do anything other than click it. Your browser only had to look at the URI scheme and the domain in order to resolve this particular document. Everything after “damnhandy.com” is defined to be opaque to the client. This point may seem orthogonal to the original post, but it’s a very important aspect of URIs. I bring it up because the following question was asked in the comments:

Can we say that:

“http://www.domain.tld/somepath/file.php?mykey=somevalue”

is an URI

and that the “http://www.domain.tld/somepath/file.php” part is an URL?

No. No you cannot. Both are URLs, which are also URIs. The idea behind URI Opacity is that you should not look at the string to make any inferences as to what is at the other end. The presence of a query string does not distinguish a URL from a URI. Both strings are URIs and URLs.
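The point can be demonstrated with a short Python sketch (mine, not the commenter’s): both strings parse as http URIs, and the only thing a client may legitimately do with them is compare them as opaque strings:

```python
from urllib.parse import urlparse

a = "http://www.domain.tld/somepath/file.php?mykey=somevalue"
b = "http://www.domain.tld/somepath/file.php"

def scheme_of(uri: str) -> str:
    """The scheme is the one part a client must look at to act on a URI."""
    return urlparse(uri).scheme

# The query string has no bearing on what kind of identifier this is:
# both strings are http URIs, and therefore both are URLs.
assert scheme_of(a) == scheme_of(b) == "http"

# Opacity: the client compares URIs as whole, opaque strings and makes
# no inference about the resources behind them. Different strings are
# simply different identifiers.
assert a != b
```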

Another commenter also asserted the following:

URI is whole address of a resource but resource extension is not mentioned. in URL we also mention the extension.
for example:

URI: www.abc/home
URL: www.abc/home.html

This is also incorrect. Being pedantic, neither is a syntactically correct URI or URL. But that aside, the presence of a file extension also does not distinguish a URI from a URL. Again, they are both valid URLs as well as URIs. Going back to URI opacity, you also cannot conclude that the two URLs reference the same resource or the same representation. A URI is an opaque string that identifies a resource. They’re still both URIs.

HTTP URIs Can Identify Non-Document Resources

This is the one point that I think hits at the crux of everyone’s confusion on the matter. Things get all meta-meta when we start using URIs to identify things that are not documents on the web. It is perfectly acceptable to use URIs to identify concepts, or non-document resources. Even if a URI expresses a scheme that suggests it can be dereferenced, there is no requirement that a document resource actually be dereferenceable from that URI.

Most people’s expectation of HTTP URIs is that they can always be dereferenced. There is a common assumption that an HTTP URI must point to a document, and that if a document resource is not available at that URI (that is, you get a 404 error), then the URI is somehow bad. Just because the server could not locate a document or a representation for the URI does not mean that the HTTP URI is an invalid identifier.

The finer bits of this issue are summed up in httpRange-14, as well as in this article by Xiaoshu Wang. The concepts around httpRange-14 are deep, but I think it’s these types of ideas that trip people up a lot. IMHO, it’s one of those concepts that gets muddled in the great internet telephone game, which causes more confusion. For example, it seems that people are under the impression that if a URL does not express a file extension, then it represents a concept and therefore is a URI, but it just doesn’t work that way.

A URL is a URI and a URI can be used to represent anything. Furthermore, each URI is unique. This is why you cannot assume that http://www.abc/home is the same as http://www.abc/home.html just by looking at the URI. These are both distinct URIs that may or may not represent the same resource. Because URIs are opaque, you as the client should not be attempting to make any decisions about the URI.


The AppleTV is an Anomaly

I have to admit that the new $99 AppleTV caught my attention when it was announced a few days ago. Small, cheap, works with iTunes, Netflix built in, and all for $99. Sounds pretty cool, right? I don’t think so anymore.

See, in my home everything is wireless and there are only laptops and, soon, a NAS. I want to have access to all of our files and media assets in one spot and be able to access those files on multiple devices. Most decent NAS solutions provide the storage and retrieval functionality I need. What’s missing are the devices to play it back on. Clearly, I can play back stuff on my Mac and PC laptops and other devices. Getting media onto my TV is another story.

The new AppleTV has no internal storage and only streams media files from another source. In order to do this, a Mac or PC has to be running with iTunes open to stream the media files. This is rather inefficient, since one of my laptops would have to be running while I’m watching TV. Now, one could use a NAS to stream media files. Most NAS devices can stream audio via some type of iTunes-friendly media server. Video likely won’t work on such NAS devices, since some content would be DRM’d with FairPlay. Since Apple does not support DLNA and has gone the proprietary route, there isn’t really a good way to stream video from a centralized source other than a desktop running iTunes.

Apple appears to be hanging on to the “digital hub” mentality, whereby the Mac is still the center of “your digital life.” Actually, my so-called “digital life” resides in the data itself and not so much in the mechanisms that I use to access it. At $99, the Apple TV isn’t a bargain once you start to realize the extras you’ll need in order to make it participate in a complete solution. For me, this complete solution doesn’t appear to exist yet. The closest thing that comes to it doesn’t bear an Apple logo, but rather a Windows one.


Protocol Buffers are not very “RESTish”.

There’s been a lot of activity recently around highly optimized, binary serialization formats such as Google Protocol Buffers and Thrift. There have been some attempts to include formats such as Protocol Buffers in “REST APIs” because of perceived performance benefits, without consideration of the downsides of an IDL-based format with respect to REST.

To illustrate why Protocol Buffers inherently adulterate the REST architectural style, I’ve compiled the following points:

Imposes a Tight Coupling

In order to work with protobufs efficiently, both the client and server must have code generated from the .proto definition. This tight coupling is considered a REST anti-pattern and should be avoided. Protobuf over HTTP is arguably a form of RPC-URI tunneling; it is not REST.

Strongly Typed Resources

A key point made by Roy T. Fielding in one of his more informative blog posts is that:

A REST API should never have “typed” resources that are significant to the client. Specification authors may use resource types for describing server implementation behind the interface, but those types must be irrelevant and invisible to the client. The only types that are significant to a client are the current representation’s media type and standardized relation names.

I’m not totally sure I’ve got this one 100% correct, but it would appear that the use of protobufs also exposes typed resources to the client. Additionally, there is no means to specify which type to ask for. Which leads me to my next point:

Ambiguous Media Type

In a post about a Protobuf provider for JAX-RS, Subbu Allamaraju has pointed out that the protobuf media type doesn’t convey enough information to indicate what message type is in the protobuf representation. Protobuf doesn’t have an official media type, but folks generally use one of the following:

     application/x-protobuf
     application/vnd.google.protobuf

Protobufs are not generic, and the representation contains data for a very specific message. Furthermore, the serialized form contains no information as to what type(s) it represents. So although you may have specified the media type in the request, there is no mechanism to specify which protobuf message we actually want. One suggestion as to how this might be resolved is to specify the .proto package and message name using a parameter:

     application/vnd.google.protobuf;proto=mypkg.MyMessage

The approach seems reasonable enough. But as Subbu also points out, this solution requires that the .proto be shared between the client and server in order to establish the binding. Clearly there are a number of options that could be applied to share the .proto definition, but all of them involve some type of external configuration in order to resolve the .proto, since the protobuf wire format is not self-describing.
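As a sketch of what a client or server would have to do with such a media type, here is a minimal Python parser for the proposed (never standardized) proto parameter. This is my own naive illustration: it assumes unquoted parameter values and ignores the finer points of RFC 2045 parameter syntax.

```python
def parse_media_type(value: str):
    """Split a media type like
    'application/vnd.google.protobuf;proto=mypkg.MyMessage'
    into the type itself and a dict of its parameters.

    Naive sketch: assumes unquoted, well-formed parameter values.
    """
    parts = [p.strip() for p in value.split(";")]
    params = dict(p.split("=", 1) for p in parts[1:])
    return parts[0], params

media_type, params = parse_media_type(
    "application/vnd.google.protobuf;proto=mypkg.MyMessage"
)
print(media_type)        # application/vnd.google.protobuf
print(params["proto"])   # mypkg.MyMessage
```

Even with the message type in hand, the client still can’t do anything with the payload unless it already holds code generated from the same .proto, which is exactly the coupling problem described above.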

RESTful Representations Must Be Self-Describing

Protocol Buffers are not self-describing by design. The Google documentation even states that:

“Protocol Buffers do not contain descriptions of their own types. Thus, given only a raw message without the corresponding .proto file defining its type, it is difficult to extract any useful data.”

This isn’t necessarily a bad thing in general, but it does violate a core tenet of RESTful architecture. Section 5.2.1.2 of Roy T. Fielding’s dissertation states that a proper representation consists of the following:

  • The data
  • Metadata that describes the data
  • Optional metadata about the metadata, usually used for integrity checking.

Protocol Buffers only handle the data. Google has presented an option for making protobuf messages self-describing. While this works, it still has three fundamental flaws:

  1. The SelfDescribingMessage itself is not self-describing ;)
  2. There’s still a tight coupling between client and server for the SelfDescribingMessage as generated code is still required on both client and server.
  3. It doesn’t address dependency resolution. That is, the message you’re sending over might extend or depend on other .proto definitions. How you resolve those dependencies is a challenge.

The dependency resolution is no small challenge and becomes a major headache if you’re extending messages from other parties.

Application State is not Hypermedia driven

Protocol Buffers do not have a means to describe links to external messages. Most applications I’ve encountered that use protobufs either send the entire message tree in a single call, or include some type of ID value that the client has to dereference somehow. The latter ultimately forces the client to infer and generate URLs, which is yet another REST anti-pattern.

Wrapping Up

This isn’t to say that Protocol Buffers are inherently bad. Google has proven that protobufs have value when used as intended. Outside of an RPC system, however, protocol buffers really don’t seem appropriate for web applications that claim to be RESTful. Of course, I could be completely wrong here, and I’m sure the internets will be certain to let me know.


URL vs. URI vs. URN, in More Concise Terms

Without a doubt, the URL vs. URI post is by far the most visited page on this blog. Even so, there’s still a lot of confusion on the topic, so I thought I’d break it down in fewer words. The original post was slightly misleading in that I attempted to compare URI to URL, when in fact it should have defined the relationship between URI, URL, and URN. In this post, I hope to clear that up in more concise terms. But first, here’s a pretty picture:

[Class diagram: URL and URN as subclasses of URI]

Remember that URI stands for Uniform Resource Identifier, which is used to identify some “thing”, or resource, on the web. Both URLs and URNs are specializations (or subclasses, if you will) of URI. You’d be correct to refer to either a URL or a URN as a URI. In application terms: if your application only calls for an identifier, you are free to use either one. For brevity, you can state that the application simply requires a URI, and either a URL or a URN would satisfy that requirement.

Now, if your application needs a URI that is dereferenceable over the web, you need to be aware of the difference between a URN and a URL. A URL is location bound and defines the mechanism for retrieving the resource over the web. A URN is just a name and isn’t bound to a network location. For example, you may have a URN for a book’s ISBN in the form urn:isbn:0451450523. That URN is still a valid URI, but you cannot dereference it; it’s just a name used to provide identity. To put it in simpler terms:

  • A URI is used to define the identity of some thing on the web
  • Both URL and URN are URIs
  • A URN only defines a name; it provides no details about how to get the resource over a network.
  • A URL defines how to retrieve the resource over the web.
  • You can get a “thing” via a URL; you can’t get anything with a URN.
  • Both URL and URN are URIs, as they both identify a resource
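The bullets above can be sketched in a few lines of Python. The set of schemes here is my own illustrative assumption (not an exhaustive list): whether a URI can be fetched over a network is decided entirely by its scheme.

```python
from urllib.parse import urlparse

# Illustrative, non-exhaustive set of schemes that define a retrieval
# mechanism -- i.e., schemes that make a URI a URL in the sense above.
RETRIEVAL_SCHEMES = {"http", "https", "ftp", "file"}

def is_dereferenceable(uri: str) -> bool:
    """A URL tells you *how* to get the resource; a URN only names it."""
    return urlparse(uri).scheme in RETRIEVAL_SCHEMES

# A URL: you can get a "thing" with it.
assert is_dereferenceable("http://www.damnhandy.com/")

# A URN: a perfectly valid URI, but there is nothing to fetch.
assert not is_dereferenceable("urn:isbn:0451450523")
```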

There are some other items that need clarification based on some comments I’ve received on the original post:

  • Elements of a URI such as the query string, file extension, etc. have no bearing on whether or not a URL is a URI. If the URI defines how to get a resource over the web, it’s a URL.
  • A URL is not limited to HTTP. There are many other protocol schemes that can be plugged into a URL.
  • If a URL uses a scheme other than HTTP, it does not magically stop being a URL. Whether the scheme is HTTP, FTP, SMB, etc., the URL still defines how to get the resource, so it’s still a URL. And because the URL identifies a resource, it’s a URI as well.

Yeah, I’ve probably repeated myself a few times, but I wanted to stress a few points.

There’s also been some confusion about when to use the term URI. As explained above, it depends on what you’re doing. If everything your application does involves accessing data over the web, you’re most likely using URLs exclusively. In that case, it wouldn’t be a bad thing to use the term URL. Now, if the application can use either a network location or a name, then URI is the proper term. For example, XML namespaces are usually declared using a URI. The namespace may be just a name, or a URL that references a DTD or XML Schema. So if a value might serve as both an identifier and a locator, it’s probably best to use the term URI.


Adobe XMP Packet Extraction for the Aperture Framework

When it comes to manipulating photographs, I live in Photoshop. One feature of all Adobe products that I like is the ability to annotate images and other documents using their eXtensible Metadata Platform, or XMP. XMP is a collection of RDF statements embedded into a document that describe many facets of that document. I’ve always wanted to be able to somehow get that data out of these files and do something with it for application purposes.

There are projects like JempBox, which can manipulate XMP data but offer no facilities to extract the XMP packet from image files. The Apache XML Graphics Commons library is more the ticket I was looking for. It includes an XMP parser that works by scanning a file for the XMP header. The approach works quite well and supports pretty much every format covered by the XMP specification. The downside of XML Graphics Commons is that it doesn’t properly read all of the RDF statements: some of the data is skipped or missed completely. To top it off, neither framework allows you to get at the raw RDF data.

What I really wanted to do was get the XMP packet in its entirety and load it into a triple store like Sesame or Virtuoso. This of course means that you want to have the data available as RDF. Rather than inventing my own framework to do all of this, I found the Aperture Framework. Aperture is a simply amazing framework that can extract RDF statements from just about anything. Of course, the one thing that is missing is XMP support. So I set out on implementing my own Extractor that can suck out the entire XMP packet as RDF. It’s based on the work started in the XML Graphics Commons project, but modified significantly so that it pulls out the RDF data. Once extracted, it’s very easy to store the statements in a triple store and execute SPARQL queries on them.
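The core scanning technique is easy to sketch. The real extractor is Java, so treat this Python version as an illustration of the approach rather than the actual code: per the XMP specification, packets are wrapped in the <?xpacket begin= and <?xpacket end= processing instructions, so a raw byte scan works regardless of the container format.

```python
def extract_xmp_packet(data: bytes):
    """Scan raw file bytes for an embedded XMP packet.

    The XMP spec delimits a packet with the <?xpacket begin= ... ?>
    and <?xpacket end= ... ?> processing instructions, so a plain
    byte scan finds it without understanding the container format.
    Returns the packet bytes, or None if no packet is present.
    """
    start = data.find(b"<?xpacket begin=")
    if start == -1:
        return None
    end = data.find(b"<?xpacket end=", start)
    if end == -1:
        return None
    close = data.find(b"?>", end)  # close of the trailing instruction
    if close == -1:
        return None
    return data[start:close + 2]
```

The RDF/XML inside the returned packet can then be handed to any RDF parser and loaded into a triple store.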

Right now, this XMPExtractor can read XMP from the following formats:

  • JPEG Images (image/jpeg)
  • TIFF Images (image/tiff)
  • Adobe DNG (image/x-adobe-dng)
  • Portable Network Graphic (image/png)
  • PDF (application/pdf)
  • EPS, PostScript, and Adobe Illustrator files (application/postscript)
  • Quicktime (video/quicktime)
  • AVI (video/x-msvideo)
  • MPEG-4 (video/mp4)
  • MPEG-2 (video/mpeg)
  • MP3 (audio/mpeg)
  • WAV Audio (audio/x-wav)

On the downside, I’ve found that if you use the XMPExtractor with a Crawler, you’ll run into some problems with Adobe Illustrator files: the PDFExtractor mistakes these files for PDFs and then fails. But as long as you’re not using Illustrator files, you should be OK. There are also a few nitpicks with JPEG files and the JpgExtractor, in that the sample files included in the XMP SDK are flagged as invalid JPEG files. However, every JPEG file I created from Photoshop and iPhoto seems to work fine. After a little more testing, I’ll look at offering it up as a contribution to the project.


Semantic Web Icon and Logo Stencil for OmniGraffle

I’ve been doing a lot of diagrams related to some of the projects that I have been doing with RDF and other Semantic Web technologies. Rather than cut and paste PNG icons into OmniGraffle, I decided to start putting together a stencil. Here’s what it looks like so far:

[Semantic Web Icons stencil preview]

It’s not much right now, but I’ll try and keep it up to date as I add more icons from the W3C site and other sources. It should be up on GraffleTopia soon.

Update: And here’s the direct link to the stencil.


You can’t do anything “over REST”

Sometimes you can let things slide, but there are other times when terms are used so incorrectly that it has to be called out. One thing that always gets me is the gross misuse of the term REST. For those who know what REpresentational State Transfer (REST) means, you know that although the REST architectural style is commonly used with HTTP, it is not bound to any specific protocol.

One of the things that starts my head spinning is seeing how the term “REST” is so often used in place of what people really mean, just to toss out buzzwords. When I’m involved in technical discussions or read articles on the web, I start to feel like Inigo Montoya when I hear the term “REST”. More often than not, someone is probably referring to HTTP, or even HTTPS, but you can never be too sure. Here are a few of my favorite statements:

We’ll send it over REST

Oh no you won’t! Given that REST is not a protocol, I find this kind of statement simply mind-boggling. One can assume that someone would like to return data over HTTP. However, it is entirely possible to create a RESTful application over other protocols such as XMPP, RMI, or something else. It helps to be specific when you’re involved in a technical discussion.

We’ll make a REST request

Are you sure? What exactly does a REST request look like? If you can’t request data from a URI like rest://example.com/foo, then you’re not making a “REST request.” As stated above, be specific as to what protocol you’re using.

We’ll return it as a REST object

This one pains me more than the other two. Seriously, what kind of “objects” are RESTful? Is it XML, JSON, binary, what? Again, there is no such thing. There are only resources and representations, and it’s the representations of those resources you need to be specific about. What, exactly, are you sending over the wire?

We’ll just add some methods to our REST server

OMFG! For real, a REST server? Even though Facebook claims to have one of those, that doesn’t make improper use of the term valid. You can’t serve “REST,” plain and simple.

Just so that I can continue beating the horse: you can’t send jack over REST. REST is not HTTP and HTTP is not REST. If you have a web API that you’re exposing over HTTP in a RESTful fashion, why can’t it just be called an HTTP service or API server? Correct use of the term REST is just as important as implementing a RESTful application correctly. Sadly, the same folks who use the term REST incorrectly are often also not creating applications that can claim to be RESTful.


Why free software shouldn’t depend on Richard M. Stallman’s advice

There’s been a long-running rant about how using Mono is, um, bad. But I just don’t get it. Now we have Richard M. Stallman coming out against Mono and C# with an argument that sounds kinda like “we shouldn’t use it just because we shouldn’t.” Hmm, OK. [OK, that is way too much of an oversimplification and takes some things out of context. However, I’m still not sure what’s bad: C#, Mono, or both?]

The odd thing about the post is that it focuses on C#, but none of the other languages that the Mono CLR supports. Second, he goes on to state that “If we lose the use of C#, we will lose them [the applications] too.” Given that C# is an ECMA standard (as is the CLR itself), I think the concerns about not being able to use C# are unwarranted. If we have to worry that ECMA would allow Microsoft to pull rank on C#, then web developers should be rethinking their use of JavaScript.

But the weird thing is that Stallman doesn’t make the same point about any other language that the Mono CLR supports. For example, if Tomboy were written in the Boo programming language but remained on the Mono CLR, would everything be OK? Why is there such a profound hatred of C# and not of other languages supported by the CLR? Why not come out against the use of CIL? Or is Stallman just not making his point clear enough?

As someone who uses Ubuntu 9.04 on a daily basis, I can appreciate what Mono has to offer from an end-user perspective. I’m a HUGE fan of GNOME Do, which has turned out to be a better implementation of Quicksilver than Quicksilver. Then of course there’s Banshee, which is blossoming into an excellent media player. And there’s also F-Spot for photo management. I could go on, but the point is that there are a lot of really great applications for GNOME that happen to be built on Mono.

Overall, I find that the post is weak on sound technical and legal arguments and high as a kite on FUD. Where’s the meat? Specifically, what can Microsoft go after that’s not GNOME if people start rewriting Mono applications in C++? Jo Shields has a lengthy, but excellent, post called Why Mono Doesn’t Suck. Jo’s post makes a lot of really good points about Mono if you don’t have a short attention span.

In the end, I think that Mono is ultimately a good thing for Linux on the desktop. Anything that gives developers better productivity and more choice is a good thing. Part of being free is being able to make a choice: we should be free to choose whether or not we actually want to use applications developed with Mono.
