Semantic Web research publications need to be more “webbish”

Over the past few weeks, I’ve been taking a deep dive into the Semantic Web. As some will tell you, a number of scalability and performance issues with Semantic Web frameworks have not been fully addressed. That is true to some extent, but there is a large amount of quality research out there; it is just somewhat difficult to find. Part of the reason is that much of this research is published on the web as PDF. To add insult to injury, these PDFs carry no associated metadata and no hyperlinks to related research (which is not to say that prior research isn’t properly cited).

Does this seem slightly ironic? The Semantic Web is built on the hypertext-driven nature of the web. Even though PDF is a readable and relatively open format, it’s still a rather sucktastic hypertext and on-screen format (especially when it’s published as a two-column layout). PDF is an agnostic means of publishing a document that was authored in something like Microsoft Word or LaTeX. PDFs are often generated as an afterthought, and most authors don’t take the time to hyperlink them to external sources. Why is this bad? The web we use today is the document web (or rather, the “PageRanked” web), where findability depends heavily on links, so unlinked research is much harder to locate. In a sense, this is akin to not eating your own dog food. If knowledge isn’t properly linked into the web as it works today, it effectively doesn’t exist. Guys like Roy T. Fielding get this, and it’s most likely why his dissertation is available in HTML as well as PDF.

As a friendly suggestion to the Semantic Web research community: please consider using XHTML to publish your research. Or even better, XHTML with RDFa. Additionally, use hyperlinks to cite related or relevant works. A purely textual bibliography or footnotes are no longer enough. The document web needs hyperlinks.

There’s a lot of cool and important stuff being worked on, but this knowledge is not being properly disseminated. No doubt, in some cases publishing in two different formats can be a lot of work. But in the long term, the payoff is that the information is widely available and you’re leveraging the Semantic Web as it was meant to be used.

