Protocol Buffers are not very “RESTish”.

There’s been a lot of activity recently around highly optimized, binary serialization formats such as Google Protocol Buffers and Thrift. Some developers have tried to include formats such as Protocol Buffers in their “REST API” because of perceived performance benefits, without considering the downsides of an IDL-based format with respect to REST.

To illustrate why Protocol Buffers inherently adulterate the REST architectural style, I’ve compiled the following points:

Imposes a Tight Coupling

In order to work with protobufs efficiently, both the client and server must have code generated from the same .proto definition. This tight coupling is considered a REST anti-pattern and should be avoided. Protobuf over HTTP is probably a form of RPC-URI tunneling; it is not REST.

Strongly Typed Resources

A key point made by Roy T. Fielding in one of his more informative blog posts is that:

A REST API should never have “typed” resources that are significant to the client. Specification authors may use resource types for describing server implementation behind the interface, but those types must be irrelevant and invisible to the client. The only types that are significant to a client are the current representation’s media type and standardized relation names.

I’m not totally sure I’ve got this one 100% correct, but it would appear that the use of protobufs is now also exposing typed resources to the client. Additionally, there is no means to specify which type to ask for. Which leads me to my next point:

Ambiguous Media Type

In a post about a Protobuf provider for JAX-RS, Subbu Allamaraju pointed out that the protobuf media type doesn’t convey enough information to indicate what message type the protobuf representation contains. Protobuf doesn’t have an official media type, but folks generally use one of the following media types:

     application/x-protobuf
     application/vnd.google.protobuf

Protobufs are not generic: the representation contains data for a very specific message. Furthermore, the serialized form contains no information about what type(s) it represents. So although you may have specified the media type in the request, there is no mechanism to specify which protobuf message you actually want. One suggestion for resolving this is to specify the .proto package and message name using a parameter:

     application/vnd.google.protobuf;proto=mypkg.MyMessage

The approach seems reasonable enough. But as Subbu also points out, this solution still requires that the .proto be shared between the client and server in order to establish the binding. Clearly there are a number of ways to share the .proto definition, but all of them involve some type of external configuration to resolve the .proto, since the protobuf wire format is not self-describing.
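A rough sketch of what a provider would have to do with that parameter, written as plain Java so it runs without JAX-RS or protobuf on the classpath. The class and method names (`ProtoMediaType`, `protoParameter`, `canDecode`) are hypothetical, and the `knownMessageTypes` set stands in for whatever registry the client and server would populate from a shared .proto. Parsing the parameter is the easy part; it only ever yields a name.

```java
import java.util.Optional;
import java.util.Set;

/** Sketch: read a "proto" parameter from a protobuf media type (hypothetical convention). */
class ProtoMediaType {

    /** Returns the value of the "proto" parameter, if present. */
    static Optional<String> protoParameter(String mediaType) {
        String[] parts = mediaType.split(";");
        for (int i = 1; i < parts.length; i++) {          // parts[0] is type/subtype
            String[] kv = parts[i].trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("proto")) {
                String value = kv[1].trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);  // drop optional quotes
                }
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    /**
     * The parameter is only a name: decoding still requires that this side was
     * given the same .proto (modelled here as a set of known message names).
     */
    static boolean canDecode(String mediaType, Set<String> knownMessageTypes) {
        return protoParameter(mediaType)
                .map(knownMessageTypes::contains)
                .orElse(false);
    }
}
```

Note that `canDecode` returns false for a perfectly well-formed media type whenever the receiving side hasn't been handed the same .proto, which is exactly the coupling described above.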

Self-Describing formats must be Self-Describing

Protocol Buffers are not self-describing by design. The Google documentation even states that:

Protocol Buffers do not contain descriptions of their own types. Thus, given only a raw message without the corresponding .proto file defining its type, it is difficult to extract any useful data.

This isn’t necessarily a bad thing in general, but it does violate a core tenet of RESTful architecture. Section 5.2.1.2 of Roy T. Fielding’s dissertation states that a proper representation consists of the following:

  • The data
  • Metadata that describes the data
  • Optional metadata about the metadata, usually used for integrity checking.
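To see what “only the data” means in practice, here is a minimal decoder that walks protobuf wire-format bytes without a .proto. It is a sketch: it handles only varint (wire type 0) and length-delimited (wire type 2) fields, and the class name `RawProtobufDump` is made up. The input bytes used below follow the small examples in Google's encoding documentation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch: walk raw protobuf wire-format bytes with no .proto available. */
class RawProtobufDump {

    /** Decodes a base-128 varint starting at pos[0] and advances pos[0]. */
    static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;   // low 7 bits carry the payload
            if ((b & 0x80) == 0) return result;     // high bit clear: last byte
            shift += 7;
        }
    }

    /** One line per field: a number, a wire type and raw bytes are all we can recover. */
    static List<String> dump(byte[] buf) {
        List<String> out = new ArrayList<>();
        int[] pos = {0};
        while (pos[0] < buf.length) {
            long key = readVarint(buf, pos);
            int field = (int) (key >>> 3);          // field number, not a field name
            int wireType = (int) (key & 0x7);
            switch (wireType) {
                case 0 -> out.add("field " + field + " varint " + readVarint(buf, pos));
                case 2 -> {
                    int len = (int) readVarint(buf, pos);
                    // Could be a string, raw bytes or a nested message; the bytes don't say.
                    String s = new String(buf, pos[0], len, StandardCharsets.UTF_8);
                    pos[0] += len;
                    out.add("field " + field + " bytes \"" + s + "\"");
                }
                default -> throw new IllegalArgumentException(
                        "wire type " + wireType + " not handled in this sketch");
            }
        }
        return out;
    }
}
```

Calling `dump` on the bytes `{0x08, 0x96, 0x01}` yields `field 1 varint 150`. Whether field 1 is an id, a price or a timestamp, and whether a length-delimited field holds text or a nested message, is information that lives only in the .proto.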

Protocol Buffers only handle the data. Google has presented an option for making protobuf messages self-describing. While this works, it still has three fundamental flaws:

  1. The SelfDescribingMessage itself is not self-describing ;)
  2. There’s still a tight coupling between client and server for the SelfDescribingMessage as generated code is still required on both client and server.
  3. It doesn’t address dependency resolution. That is, the message you’re sending might extend or depend on other .proto definitions. How you resolve those dependencies is a challenge.

The dependency resolution is no small challenge and becomes a major headache if you’re extending messages from other parties.

Application State is not Hypermedia driven

Protocol Buffers do not have a means to describe links to external messages. Most applications I’ve encountered that use protobufs either send the entire message tree in a single call or include some type of ID value that the client has to dereference somehow. The latter ultimately forces the client to infer and generate URLs, which is yet another REST anti-pattern.
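A tiny illustration of the difference, using a hypothetical `Order` representation that carries named links (the class, record and URLs are all made up for the example). Nothing here is protobuf-specific; the point is that the first method hard-codes the server's URL layout into the client, while the second only needs to know the relation name.

```java
import java.util.Map;
import java.util.Optional;

/** Sketch contrasting ID-based URL guessing with following a server-supplied link. */
class LinkFollowing {

    /** A decoded representation: some data plus named links (hypothetical shape). */
    record Order(String id, Map<String, String> links) {}

    /** Anti-pattern: the client bakes the server's URL layout into its own code. */
    static String guessCustomerUrl(String customerId) {
        return "http://example.com/customers/" + customerId;
    }

    /** Hypermedia: the client only knows the relation name and asks the representation. */
    static Optional<String> follow(Order order, String rel) {
        return Optional.ofNullable(order.links().get(rel));
    }
}
```

If the server later moves customers to a different path, `guessCustomerUrl` silently breaks while `follow` keeps working, because the link travels with the representation.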

Wrapping Up

This isn’t to say that Protocol Buffers are inherently bad. Google has proven that protobufs have value when used as intended. Outside of an RPC system, however, protocol buffers really don’t seem appropriate for web applications that claim to be RESTful. Of course, I could be completely wrong here and I’m sure the internets will be certain to let me know.


URL vs. URI vs. URN, in More Concise Terms

Without a doubt, the URL vs. URI post is by far the most visited page on this blog. Even so, there’s still a lot of confusion on the topic, so I thought I’d break it down in fewer words. The original post was slightly misleading in that I attempted to compare URI to URL, when in fact it should have defined the relationship between URI, URL, and URN. In this post, I hope to clear that up in more concise terms. But first, here’s a pretty picture:

[Figure: URI class diagram showing URL and URN as subclasses of URI]

Both URLs and URNs are specializations, or subclasses, of URI. You can refer to both URLs and URNs as URIs. In application terms: if your application only calls for a URI, you should be free to use either one.
Now, here’s the big difference between URN and URL: a URL is location bound and dereferenceable over the web. A URN is just a name and isn’t bound to a network location. However, BOTH are still valid URIs. Now, if the application requires a URI that is bound to a network location, you must use the specialization of URI called URL.

Remember that URI stands for Uniform Resource Identifier, which is used to identify some “thing”, or resource, on the web. Both URLs and URNs are specializations (or subclasses, if you will) of URI. You’d be correct in referring to either a URL or a URN as a URI. In application terms: if your application only calls for an identifier, you should be free to use either a URL or a URN. For brevity, you can state that the application simply requires a URI, and the use of a URL or URN would satisfy that requirement.

Now if your application needs a URI that is dereferenceable over the web, you should be aware of the difference between URN and URL. A URL is location bound and defines the mechanism for retrieving the resource over the web. A URN is just a name and isn’t bound to a network location. For example, you may have a URN for a book’s ISBN in the form of urn:isbn:0451450523. The URN is still a valid URI, but you cannot dereference it; it’s just a name used to provide identity. So to put it in simpler terms:

  • A URI is used to define the identity of some thing on the web
  • Both URL and URN are URIs
  • A URN only defines a name; it provides no details about how to get the resource over a network.
  • A URL defines how to retrieve the resource over the web.
  • You can get a “thing” via a URL; you can’t get anything with a URN
  • Both URL and URN are URIs as they both identify a resource

There are some other items that need clarification based on some comments I’ve received on the original post:

  • Elements of a URI such as the query string, file extension, etc. have no bearing on whether or not a URI is a URL. If the URI defines how to get a resource over the web, it’s a URL.
  • A URL is not limited to HTTP. There are many other protocol schemes that can be plugged into a URL.
  • If a URL defines a scheme other than HTTP, it does not magically stop being a URL and become just a URI. The URL defines how to get the resource; whether it uses HTTP, FTP, SMB, etc., it’s still a URL. But because the URL identifies a resource, it’s a URI as well.
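For the Java-minded, `java.net.URI` parses all of these forms, which is a handy way to see that URLs and URNs are both URIs. The classification rule below (scheme `urn` means URN, a host component means URL) is my own rough heuristic, not something taken from the RFCs, and `UriKind` is a hypothetical name; it would misfile locators such as `mailto:` or `file:///`.

```java
import java.net.URI;

/** Sketch: a rough URN/URL split using java.net.URI (a heuristic, not a spec). */
class UriKind {

    static String classify(String text) {
        URI uri = URI.create(text);                 // every input here parses as a URI
        if ("urn".equalsIgnoreCase(uri.getScheme())) {
            return "URN";                           // a name only; nothing says where to fetch it
        }
        if (uri.getHost() != null) {
            return "URL";                           // scheme + host: a location to dereference
        }
        return "URI";                               // an identifier this heuristic can't place
    }
}
```

Note that `ftp://` and `smb://` come back as URLs, and a query string or file extension doesn't change the answer.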

Yeah, I’ve probably repeated myself a few times, but I wanted to stress a few points.

There’s also been some confusion about when to use the term URI. As I stated in the original post, it depends on what you’re doing. If everything your application does involves accessing data over the web, you’re most likely using URLs exclusively. In that case, it wouldn’t be a bad thing to use the term URL. Now, if the application can use either a network location or a name, then URI is the proper term. For example, XML namespaces are usually declared using a URI. The namespace may just be a name, or a URL that references a DTD or XML Schema. So if you’re using a URL for both identity and retrieval, it’s probably best to use the term URI.


Adobe XMP Packet Extraction for the Aperture Framework

When it comes to manipulating photographs, I live in Photoshop. One feature of all Adobe products that I like is the ability to annotate images and other documents using their eXtensible Metadata Platform, or XMP. XMP is a collection of RDF statements, embedded in a document, that describe many facets of that document. I’ve always wanted to be able to somehow get that data out of these files and do something with it for application purposes.

There are projects like JempBox, which manipulate XMP data but offer no facilities to extract the XMP packet from image files. The Apache XML Graphics Commons is more the ticket I was looking for. The library includes an XMP parser that works by scanning a file for the XMP header. The approach works quite well and supports pretty much every format covered by the XMP specification. The downside of XML Graphics Commons is that it doesn’t properly read all of the RDF statements. Some of the data is skipped or missed completely. To top it off, neither framework allows you to get at the raw RDF data.
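The header-scanning approach can be sketched in a few lines. This is not the XML Graphics Commons code, just a minimal illustration assuming the packet uses the standard `<?xpacket begin=...?>` / `<?xpacket end=...?>` wrapper and is UTF-8 encoded; `XmpPacketScanner` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Sketch: find an embedded XMP packet by scanning for its xpacket wrapper. */
class XmpPacketScanner {

    private static final String BEGIN = "<?xpacket begin=";
    private static final String END = "<?xpacket end=";

    static Optional<String> extract(byte[] file) {
        // ISO-8859-1 maps each byte to one char, so string indexes equal byte offsets.
        String raw = new String(file, StandardCharsets.ISO_8859_1);
        int start = raw.indexOf(BEGIN);
        if (start < 0) return Optional.empty();
        int endMarker = raw.indexOf(END, start);
        if (endMarker < 0) return Optional.empty();
        int close = raw.indexOf("?>", endMarker);
        if (close < 0) return Optional.empty();
        int stop = close + 2;                       // include the closing "?>"
        // Assumption: the packet is UTF-8; UTF-16 packets are not handled here.
        return Optional.of(new String(file, start, stop - start, StandardCharsets.UTF_8));
    }
}
```

Returning the raw packet, rather than a parsed model, is what makes it possible to hand the complete RDF to another parser.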

What I really wanted to do was to get the XMP packet in its entirety and load it into a triple store like Sesame or Virtuoso. This of course means that you want to have the data available as RDF. Rather than inventing my own framework to do all of this, I found the Aperture Framework. Aperture is simply an amazing framework that can extract RDF statements from just about anything. Of course, the one thing that is missing is XMP support. So, I set out to implement my own Extractor that can suck out the entire XMP packet as RDF. It’s based on the work started in the XML Graphics Commons project, but modified significantly so that it pulls out the RDF data. Once extracted, it’s very easy to store the statements in a triple store and execute SPARQL queries on them.

Right now, this XMPExtractor can read XMP from the following formats:

  • JPEG Images (image/jpeg)
  • TIFF Images (image/tiff)
  • Adobe DNG (image/x-adobe-dng)
  • Portable Network Graphic (image/png)
  • PDF (application/pdf)
  • EPS, PostScript, and Adobe Illustrator files (application/postscript)
  • Quicktime (video/quicktime)
  • AVI (video/x-msvideo)
  • MPEG-4 (video/mp4)
  • MPEG-2 (video/mpeg)
  • MP3 (audio/mpeg)
  • WAV Audio (audio/x-wav)

On the downside, I’ve found that if you use the XMPExtractor with a Crawler, you’ll run into some problems with Adobe Illustrator files. The problem is that the PDFExtractor mistakes these files for PDFs and then fails. But as long as you’re not using Illustrator files, you should be OK. There are also a few nitpicks with JPEG files and the JpgExtractor, in that the sample files included in the XMP SDK are flagged as invalid JPEG files. However, every JPEG file I created from Photoshop and iPhoto seems to work fine. After a little more testing, I’ll look at offering it up as a contribution to the project.
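My guess, which I haven't confirmed against the Aperture source, is that the mix-up comes from content sniffing: Illustrator files saved with PDF compatibility begin with the same `%PDF-` header as a real PDF. A minimal magic-byte sniffer (the hypothetical `MagicSniffer`, covering just four signatures) shows how such a file would be labeled.

```java
/** Sketch: guess a MIME type from a file's leading "magic" bytes. */
class MagicSniffer {

    static String sniff(byte[] head) {
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) return "image/jpeg";
        if (startsWith(head, 0x89, 'P', 'N', 'G')) return "image/png";
        if (startsWith(head, '%', 'P', 'D', 'F', '-')) return "application/pdf";
        if (startsWith(head, '%', '!', 'P', 'S')) return "application/postscript";
        return "application/octet-stream";          // unknown: let something else decide
    }

    /** Compares unsigned byte values against the expected signature. */
    private static boolean startsWith(byte[] data, int... magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }
}
```

Handed an Illustrator file with a `%PDF-` header, this returns `application/pdf`, so a crawler that dispatches on the sniffed type would route the file to the PDF extractor.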


You can’t do anything “over REST”

Sometimes you can let things slide, but there are other times when terms are used so incorrectly that they have to be called out. One thing that always gets me is the gross misuse of the term REST. Those who know what REpresentational State Transfer (REST) means know that, although the REST architectural style is commonly used with HTTP, it is not bound to any specific protocol.

One of the things that starts my head spinning is how often the term “REST” is used in place of what people really mean, just to toss out a buzzword. When I’m involved in technical discussions or read articles on the web, I start to feel like Inigo Montoya when I hear the term “REST”. More often than not, someone is probably referring to HTTP, or even HTTPS, but you can never be too sure. Here are a few of my favorite statements:

We’ll send it over REST

Oh no you won’t! Given that REST is not a protocol, I find this kind of statement simply mind-boggling. One can assume that someone would like to return data over HTTP. However, it is entirely possible to create a RESTful application over other protocols such as XMPP, RMI, or something else. It helps to be specific when you’re involved in a technical discussion.

We’ll make a REST request

Are you sure? What exactly does a REST request look like? If you can’t request data from a URI like rest://example.com/foo, then you’re not making a “REST request.” As stated above, be specific as to what protocol you’re using.

We’ll return it as a REST object

This one pains me more than the other two. Seriously, what kind of “objects” are RESTful? Is it XML, JSON, binary, what? Again, there is no such thing. There are only resources and representations, and it’s the representations of those resources you need to be specific about. What, exactly, are you sending over the wire?
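Being specific is easy in code, because the code has to be. Here is what a “REST request” for an XML representation usually is, expressed with the JDK's `java.net.http.HttpRequest`: an HTTP GET with an `Accept` header. The class name, URI and media type are placeholders, and the request is only built, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Sketch: what a "REST request" usually is once you name the protocol and representation. */
class SayWhatYouMean {

    static HttpRequest getXmlRepresentation(String uri) {
        return HttpRequest.newBuilder(URI.create(uri))
                .header("Accept", "application/xml")   // the representation we want
                .GET()                                   // the protocol method we use
                .build();                                // built, not sent: no network needed
    }
}
```

The protocol (`http`), the method (`GET`) and the representation (`application/xml`) are all spelled out; nothing in the request is “REST”.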

We’ll just add some methods to our REST server

OMFG! For real, a REST server? Even though Facebook claims to have one of those, that doesn’t make the improper use of the term valid. You can’t serve “REST,” plain and simple.

Just so that I can continue beating the horse: you can’t send jack over REST. REST is not HTTP and HTTP is not REST. If you have a web API that you’re exposing over HTTP in a RESTful fashion, why can’t it just be called an HTTP service or API server? Correct use of the term REST is just as important as implementing a RESTful application correctly. Sadly, the same folks who use the term REST incorrectly are also not creating applications that can claim to be RESTful.


Semantic Web research publications need to be more “webbish”

Over the past few weeks, I’ve been taking a deep dive into the Semantic Web. As some will tell you, a number of scalability and performance issues with the Semantic Web frameworks have not been fully addressed. While that is true to some extent, there’s a large amount of quality research out there, but it is actually somewhat difficult to find. Part of the reason is that much of this research is published on the web as PDF. To add insult to injury, these PDFs have no associated metadata and no hyperlinks to related research (which is not to say that prior research isn’t properly cited).

Does this seem slightly ironic? The Semantic Web is built on the hypertext-driven nature of the web. Even though PDF is a readable and relatively open format, it’s still a rather sucktastic hypertext and on-screen format (especially when it’s published in a two-column layout). PDFs are an agnostic means of publishing a document that was authored in something like Microsoft Word or LaTeX. They are generated as an afterthought, and most authors do not take the time to properly hyperlink these PDFs to external sources. Why is this bad? Given that we’re using the document web (or rather the “Page Ranked” web) of today, this makes it a bit more challenging to locate this kind of information. In a sense, this is akin to not eating your own dog food. If knowledge isn’t properly linked into the web as it works today, it effectively doesn’t exist. Guys like Roy T. Fielding get this, and it’s most likely why his dissertation is available as HTML as well as in PDF variants.

As a friendly suggestion to the Semantic Web research community: please consider using XHTML to publish your research. Or even better, XHTML with RDFa. Additionally, leverage hyperlinks to cite related or relevant works. It’s no longer enough to use a purely textual bibliography or footnotes. The document web needs hyperlinks.

There’s a lot of cool and important stuff being worked on but this knowledge is not being properly disseminated. No doubt, in some cases publishing to two different formats can be a lot of work. But in the long term the payoffs are that information is widely available and you’re leveraging the Semantic Web as it was meant to be.


RESTEasy and Seam

There has been some discussion lately regarding integrating JBoss Seam with RESTEasy. Jay Balunas recently posted his thoughts on the subject, so I thought I’d post some of mine. For the record, I am a huge fan of Seam and I think there’s definitely a place for Seam in RESTEasy, so here are a few of my musings on the subject:

Seam Managed Persistence Contexts

This is a really great feature of Seam and it adds a lot of value to a framework like RESTEasy. If you have a resource that returns an entity with a complex object graph that will be marshalled to XML, you have a high potential for hitting a LazyInitializationException. This is especially true if you’re calling an SLSB to get your data, because the marshalling process is handled on the web tier, outside of the EJB transaction. If the entity was not fully initialized, your object graph will be incomplete and you will get a LazyInitializationException. In the initial version of RESTEasy, I used a Seam-managed persistence context in order to successfully marshal a complex entity using JAXB without getting a LazyInitializationException. I could have written a filter similar to Spring’s OpenSessionInViewFilter to span the transaction over the entire HTTP request, but Seam made this problem go away transparently, in a very elegant manner.
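The timing problem can be simulated in plain Java. This is not Hibernate's or Seam's API: `Session`, `LazyList` and `handleRequest` are hypothetical stand-ins, and the simulation throws `IllegalStateException` where Hibernate would throw `LazyInitializationException`. The only thing that changes between the two runs is whether the persistence context stays open until marshalling finishes.

```java
import java.util.List;
import java.util.function.Supplier;

/** Plain-Java simulation of lazy loading; not Hibernate's or Seam's API. */
class LazyLoadingDemo {

    /** Hypothetical stand-in for a persistence context. */
    static final class Session {
        boolean open = true;
    }

    /** A collection fetched on first access, and only while the session is open. */
    static final class LazyList<T> {
        private final Session session;
        private final Supplier<List<T>> loader;
        private List<T> loaded;

        LazyList(Session session, Supplier<List<T>> loader) {
            this.session = session;
            this.loader = loader;
        }

        List<T> get() {
            if (loaded == null) {
                if (!session.open) {
                    // Hibernate would throw LazyInitializationException here.
                    throw new IllegalStateException("failed to lazily initialize: session closed");
                }
                loaded = loader.get();
            }
            return loaded;
        }
    }

    /** Stand-in for JAXB marshalling: touching the collection triggers the lazy load. */
    static String marshal(LazyList<String> addresses) {
        return "<addresses>" + String.join(",", addresses.get()) + "</addresses>";
    }

    /**
     * requestScoped = false mimics an SLSB whose transaction (and session) ends
     * before the web tier marshals; true mimics a context that spans the request.
     */
    static String handleRequest(boolean requestScoped) {
        Session session = new Session();
        LazyList<String> addresses = new LazyList<>(session, () -> List.of("home", "work"));
        if (!requestScoped) {
            session.open = false;                  // EJB call returned, context closed
        }
        try {
            return marshal(addresses);             // runs on the web tier
        } finally {
            session.open = false;                  // end of the HTTP request
        }
    }
}
```

`handleRequest(true)` returns the marshalled XML, while `handleRequest(false)` fails during marshalling, just like the SLSB case above.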

Transactions/Conversations

Conversations are one feature of Seam that causes a lot of folks to raise concerns about the stateful nature of the framework with regard to REST. However, I think some of the conversational aspects of Seam can, in some instances, be utilized in a RESTful design. Take this thread on the JSR-311 mailing list from Bill regarding transactions in JAX-RS:

No, I don’t want JAX-RS to have a transaction model :)

One pattern I’ve seen in REST is how they solve distributed transactions. The pattern seems to be

/transactions/{tx-id}/.../whatever/your/real/resources/are

So really the transaction resource is allowed to contain any resource the server supports. (I hope you are following me, if not I’ll expand).

IMHO, this is something I think Seam could lend a hand with. Taking both Seam’s support for conversations and jBPM, you could conceivably use a long-running conversation to implement such a feature whereby you might end up with something like:

/transactions/{conversation-id}/.../whatever/your/real/resources/are

It’s not a fully baked idea, nor might it be quite the same thing that Bill is talking about. However, it could potentially be RESTful if implemented properly and Seam already provides the plumbing for this out of the box.
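On the URI side, the pattern is easy to route. A minimal sketch, assuming the `{tx-id}` (or Seam conversation id) is a single path segment; `TransactionPath` and `Parsed` are hypothetical names, and mapping the id onto an actual Seam conversation is left out.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: split /transactions/{id}/... into the transaction id and the wrapped resource path. */
class TransactionPath {

    record Parsed(String transactionId, String resourcePath) {}

    // Group 1: the id segment; group 2: everything after it (optional).
    private static final Pattern PATTERN = Pattern.compile("^/transactions/([^/]+)(/.*)?$");

    static Optional<Parsed> parse(String path) {
        Matcher m = PATTERN.matcher(path);
        if (!m.matches()) return Optional.empty();
        String rest = m.group(2) == null ? "/" : m.group(2);
        return Optional.of(new Parsed(m.group(1), rest));
    }
}
```

A request for `/transactions/42/contacts/12345` would be handled as `/contacts/12345` within transaction `42`.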

Entity Resources

The Seam framework (as in org.jboss.seam.framework) offers a lot of convenience features that make working with JPA and Hibernate a breeze. I had a silly idea a while back about how to expose entity beans as a resource without the need for a dedicated resource class. The idea wasn’t fully baked (and in hindsight, those details are actually kinda crappy and overly verbose), but the general idea was to make it easy to navigate elements of an object graph. For example, if you were to access the following URI:

http://myhost/contacts/12345

You’d end up with the full object graph as XML for contact ID 12345. If you just wanted to look at one address for that contact, you should be able to call:

http://myhost/contacts/12345/addresses/home

And you would get just the XML element for the contact’s “home” address. What I don’t want to do is create a separate resource class for each element and each collection type in the object graph. You shouldn’t have to create a ContactsResource, ContactResource, AddressesResource, AddressResource, etc. Ideally, you should be able to have one class, or just the entity itself, that can represent the resource. I’m currently working to refine this idea for RESTEasy and much of the Seam framework API could be useful in making this a reality.
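The navigation itself is a simple walk, one path segment at a time. In this sketch, nested `Map`s stand in for an entity graph (a real implementation would presumably read entity properties via reflection or JPA metadata), and `GraphNavigator` is a hypothetical name.

```java
import java.util.Map;
import java.util.Optional;

/** Sketch: resolve /contacts/{id}/addresses/{kind} by walking a nested map "object graph". */
class GraphNavigator {

    /** Walks one path segment at a time; each segment is a key into the current node. */
    static Optional<Object> resolve(Map<String, Object> root, String path) {
        Object node = root;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) continue;          // skip the leading "/"
            if (!(node instanceof Map<?, ?> map) || !map.containsKey(segment)) {
                return Optional.empty();              // no such property or element
            }
            node = map.get(segment);
        }
        return Optional.of(node);
    }
}
```

With a contact whose `addresses` map has a `home` entry, resolving `/contacts/12345/addresses/home` returns just that address, and an unknown id yields an empty result rather than an exception.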

I could go on, but I think there’s a lot of value that Seam can bring to RESTEasy.


URI vs. URL: What’s the Difference?

Updated (1/18/2011): Because there’s still a lot of confusion, I’ve created a third post that attempts to resolve a lot of the questions from the comments on the last two posts. The new post is here.

Updated (8/27/2009): I’ve created another post that attempts to make the distinctions a bit more clear. The new post is here.

What is the difference between a URL and URI and why does it matter? This topic is confusing to some (myself included) and I thought I’d share my understanding of the two concepts. I’m hoping this post will give you a better understanding about how the two differ and why it matters to some.

Note: The goal of this post is to simplify the distinction between URI and URL. If you feel that something was lost in the summarization process, or that something is simply incorrect, please post a comment and the information will be corrected. I only ask that any comments/criticism be constructive.

Update: Thanks to some constructive, and not-so-constructive, feedback from some readers, I have updated this post to correct many of my own misunderstandings. Of which, there were many.

URI

A URI identifies a resource by location, by name, or both. More often than not, most of us use URIs that define a location to a resource. The fact that a URI can identify a resource by both name and location has led to a lot of the confusion, in my opinion. A URI has two specializations known as URL and URN.

URN

A URI that identifies a resource by name in a given namespace, but does not define how the resource may be obtained, is called a URN. You may see URNs used in XML Schema documents to define a namespace, usually using a syntax such as:

<xsd:schema xmlns="http://www.w3.org/2001/XMLSchema"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:example">
  <!-- element and type declarations go here -->
</xsd:schema>

Here the targetNamespace uses a URN. It defines an identifier for the namespace, but it does not define a location.

URL

A URL is a specialization of URI that defines the network location of a specific resource. Unlike a URN, the URL defines how the resource can be obtained. We use URLs every day in the form of http://damnhandy.com, etc. But a URL doesn’t have to be an HTTP URL, it can be ftp://damnhandy.com, smb://damnhandy.com, etc.

The Difference Between Them

So what is the difference between URI and URL? It’s not as clear cut as I would like, but here’s my stab at it:

A URI is an identifier for some resource, but a URL gives you specific information as to how to obtain that resource. A URL is a URI, and as one commenter pointed out, it is now considered incorrect to use URL when describing applications. Generally, if the identifier describes both the location and name of a resource, the term to use is URI. Since this is generally the case most of us encounter every day, URI is the correct term.


Not enough is RESTful in RestFaces

I just came across a quick article on The Server Side about a JSF framework called RESTFaces. My initial reaction was “oh cool, a JSF framework that might adhere to RESTful principles.” Sadly, there isn’t much more than HTTP GET support that is “RESTful” about RESTFaces. RESTFaces describes itself this way:

RESTfaces for JavaServer™ Faces Technology make it possible to write bookmarkable pages using JavaServer™ Faces.

In a nutshell: RESTFaces allows you to invoke actions via HTTP GET as opposed to just POST actions. JBoss Seam has a similar feature, and its docs describe it as a means of making RESTful applications that can be bookmarked. To be fair, Seam does not claim to be a full-on REST framework. Now, I am a huge fan of JBoss Seam, so I don’t mean to come off as pooh-poohing that effort. But I wonder, is just providing GET support enough to be considered RESTful? At a low level, probably yes. But there could be so much more.

This could all change as the specifications for JSF and Web Beans mature. JSR-311 is a thriving work in progress and JSR-314 is also still getting ramped up. As these specs mature, some nice integration points might be:

Support for URI templates.

JSR-311 is currently defining support for this, but if this were integrated into JSF, URIs would be not just bookmarkable but also human readable. For example, instead of:

http://somehost/blog/entry.action?id=1234

You could have something a bit cleaner:

http://somehost/blog/entries/1234

Your entry ID is now a component of the URI, which makes it a lot easier on the eyes. If you ever have had to deal with marketing applications, you can appreciate the value of this.
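JSR-311 expresses this with templates such as `@Path("/blog/entries/{id}")`. The expansion side can be sketched as below; it handles only simple `{name}` variables, inserts values verbatim without percent-encoding, and `SimpleUriTemplate` is a hypothetical name.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: expand simple {name} variables in a URI template (values inserted verbatim). */
class SimpleUriTemplate {

    private static final Pattern VAR = Pattern.compile("\\{([A-Za-z0-9_]+)\\}");

    static String expand(String template, Map<String, String> values) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no value for " + m.group(1));
            }
            // quoteReplacement stops '$' or '\' in values being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Expanding `http://somehost/blog/entries/{id}` with `id = 1234` produces the cleaner URI shown above.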

Support for multiple representations via HTTP Content Negotiation

Again this is something that JSR-311 is defining, but if this is integrated into JSF, or Web Beans for that matter, it would allow a single URI to deliver different media types. Using the example above, the same URI:

http://somehost/blog/entries/1234

Could deliver the content as a PDF, or an HTML format that is more suitable for a mobile device and the decision is made by the framework. The user would not have to execute a specific URL for each type:

http://somehost/blog/entries/1234.html
http://somehost/blog/entries/1234.pdf
http://somehost/blog/entries/1234.wml

Or worse:

http://somehost/blog/entry.action?id=1234&type=application/pdf
http://somehost/blog/entry.action?id=1234&type=text/html

Through content negotiation, the user just needs the URI and the application will take care of delivering the proper response. If you’re looking for an example of a present-day implementation, look no further than Apache HTTPD. JBoss Seam already supports a number of ways to generate other media types such as PDF, charts and graphs, and others, so I think it’s in a great position to deliver this kind of functionality. On the plus side, at least these two frameworks do allow one to use HTTP GET, which is a big help.
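A deliberately simplified sketch of server-side negotiation: parse the `Accept` header, order the ranges by `q` value, and return the first available type that matches. It ignores the specificity rules of the HTTP spec and only skips `q=0` ranges rather than treating them as exclusions; `ContentNegotiator` is a hypothetical name, and `text/vnd.wap.wml` is used as the WML media type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Sketch: pick a representation from an Accept header using q-values only. */
class ContentNegotiator {

    record Range(String type, double q) {}

    /** Parses "type;q=x, type2" into ranges sorted by descending q (stable for ties). */
    static List<Range> parseAccept(String accept) {
        List<Range> ranges = new ArrayList<>();
        for (String part : accept.split(",")) {
            String[] pieces = part.trim().split(";");
            double q = 1.0;
            for (int i = 1; i < pieces.length; i++) {
                String p = pieces[i].trim();
                if (p.startsWith("q=")) q = Double.parseDouble(p.substring(2));
            }
            ranges.add(new Range(pieces[0].trim(), q));
        }
        ranges.sort(Comparator.comparingDouble(Range::q).reversed());
        return ranges;
    }

    /** Exact match, "*/*", or a "type/*" wildcard. */
    static boolean matches(String range, String type) {
        if (range.equals("*/*") || range.equals(type)) return true;
        return range.endsWith("/*") && type.startsWith(range.substring(0, range.length() - 1));
    }

    static Optional<String> choose(String accept, List<String> available) {
        for (Range r : parseAccept(accept)) {
            if (r.q() <= 0) continue;
            for (String type : available) {
                if (matches(r.type(), type)) return Optional.of(type);
            }
        }
        return Optional.empty();
    }
}
```

For `Accept: application/pdf;q=0.9, text/html;q=0.5` and the HTML, PDF and WML representations above, this picks `application/pdf`.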
