Semantic Web research publications need to be more “webbish”

Over the past few weeks, I’ve been taking a deep dive into the Semantic Web. As some will tell you, a number of scalability and performance issues with Semantic Web frameworks have not been fully addressed. While that’s true to some extent, there is a large amount of quality research out there; it’s just somewhat difficult to find. Part of the reason is that much of this research is published on the web as PDF. To add insult to injury, these are PDFs without associated metadata or hyperlinks to related research (which is not to say that prior research isn’t properly cited).

Does this seem slightly ironic? The Semantic Web is built on the hypertext-driven nature of the web. Even though PDF is a readable and relatively open format, it’s still a rather sucktastic hypertext and on-screen format (especially when it’s published as a two-column layout). PDFs are a format-agnostic means of publishing a document that was authored in something like Microsoft Word or LaTeX. They are generated as an afterthought, and most authors do not take the time to properly hyperlink them to external sources. Why is this bad? Given that we’re using the document web (or rather the “Page Ranked” web) of today, unlinked documents are harder for search engines to rank and for readers to locate. In a sense, this is akin to not eating your own dog food. If knowledge isn’t properly linked into the web as it works today, it effectively doesn’t exist. Guys like Roy T. Fielding get this, and it’s most likely why his dissertation is available as HTML as well as in PDF variants.

As a friendly suggestion to the Semantic Web research community: please consider using XHTML to publish your research. Or even better, XHTML with RDFa. Additionally, use hyperlinks to cite related or relevant works. It’s not enough anymore to use a purely textual bibliography or footnotes. The document web needs hyperlinks.

There’s a lot of cool and important stuff being worked on but this knowledge is not being properly disseminated. No doubt, in some cases publishing to two different formats can be a lot of work. But in the long term the payoffs are that information is widely available and you’re leveraging the Semantic Web as it was meant to be.

Hibernate and the “Found two representations of same collection” error

Last night I spent an extended work day trying to track down the source of a Hibernate exception that I had never encountered before. The application in question is a simple data loader that reads in an XML file and populates a Hibernate object graph. The data in the XML file changes from day to day, and if an element exists in the XML file one day and not the next, the corresponding entity is removed from the object graph. All had been working fine until suddenly I started seeing this in my error logs:


Caused by: org.hibernate.HibernateException: Found two representations of same collection: ...

The odd thing was that there can never be more than one representation of this collection in the application, so I had to wonder: WTF? I then came across this thread on the Hibernate Forums, but I wasn’t doing anything with Session.clear(). In fact, we weren’t doing any manual session management (i.e. flush, etc.).

To make a long story short, the issue was traced to a mapping error. The object hierarchy is as follows:

parent +
       + Component +
                   + component attribute

The removals were being performed on the component by removing a component attribute from the Component.componentAttributes collection. The component attribute maintains a bi-directional relationship with its owning component. However, the mapping for the component attribute’s parent was as follows:

@NaturalId
@ManyToOne(fetch = FetchType.LAZY,
           cascade = { CascadeType.PERSIST,
                       CascadeType.MERGE,
                       CascadeType.REMOVE })
private Component parentComponent;

Note the CascadeType.REMOVE. This meant that when this child was removed, the removal cascaded up to its parent as well, hence the duplicate collection. The issue was resolved once the mapping was changed to:

@NaturalId
@ManyToOne(fetch = FetchType.LAZY,
           cascade = { CascadeType.PERSIST,
                       CascadeType.MERGE })
private Component parentComponent;
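
To see why that one cascade entry matters, here is a toy model in plain Java. It is not how Hibernate implements cascading; it only walks the same object shape and reports what a delete of one component attribute would reach. The class name CascadeSketch, the method deleteReach, and the sample names are all made up for illustration, and the sketch assumes the parent cascades removals down to its own children.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: plain Java objects, no Hibernate involved.
class CascadeSketch {

    static class Component {
        final String name;
        final List<ComponentAttribute> componentAttributes = new ArrayList<>();

        Component(String name) {
            this.name = name;
        }
    }

    static class ComponentAttribute {
        final String name;
        final Component parentComponent;

        ComponentAttribute(String name, Component parentComponent) {
            this.name = name;
            this.parentComponent = parentComponent;
            // Keep both sides of the bi-directional relationship in sync.
            parentComponent.componentAttributes.add(this);
        }
    }

    // Names of everything that deleting `attribute` would reach.
    // cascadeRemoveToParent stands in for CascadeType.REMOVE on the child's @ManyToOne.
    static Set<String> deleteReach(ComponentAttribute attribute, boolean cascadeRemoveToParent) {
        Set<String> reached = new LinkedHashSet<>();
        reached.add(attribute.name);
        if (cascadeRemoveToParent) {
            Component parent = attribute.parentComponent;
            reached.add(parent.name);
            // Assumption: the parent cascades removals down to its children.
            for (ComponentAttribute sibling : parent.componentAttributes) {
                reached.add(sibling.name);
            }
        }
        return reached;
    }

    public static void main(String[] args) {
        Component component = new Component("component");
        ComponentAttribute attrA = new ComponentAttribute("attrA", component);
        new ComponentAttribute("attrB", component);

        System.out.println(deleteReach(attrA, false)); // prints [attrA]
        System.out.println(deleteReach(attrA, true));  // prints [attrA, component, attrB]
    }
}
```

With the flag off, only the attribute goes away, which is what the loader intended. With it on, the owning component gets dragged into the delete while its collection is still being modified.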

Of course this could have been caught sooner had some unit tests been a little better, but the cause of this issue sure was a bitch to find. In the end it was a subtle mapping issue that got introduced and wreaked havoc on my day. Hopefully this post can spare someone else some lost hours.
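
For what it’s worth, the kind of unit test that might have caught this is a reflection check over the mapped fields. The sketch below is only an illustration: to keep it compilable without Hibernate on the classpath it declares its own stand-in ManyToOne annotation and CascadeType enum, and the class names (MappingGuardSketch, BrokenAttribute, FixedAttribute) are hypothetical. In a real project you would point the same loop at the actual JPA annotation on your entity classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MappingGuardSketch {

    // Stand-ins for the JPA types, declared here only so the sketch is self-contained.
    enum CascadeType { PERSIST, MERGE, REMOVE, ALL }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ManyToOne {
        CascadeType[] cascade() default {};
    }

    static class Component { }

    // Mirrors the mapping that caused the problem.
    static class BrokenAttribute {
        @ManyToOne(cascade = { CascadeType.PERSIST, CascadeType.MERGE, CascadeType.REMOVE })
        private Component parentComponent;
    }

    // Mirrors the corrected mapping.
    static class FixedAttribute {
        @ManyToOne(cascade = { CascadeType.PERSIST, CascadeType.MERGE })
        private Component parentComponent;
    }

    // Returns the names of many-to-one fields that would cascade a remove to their target.
    static List<String> fieldsCascadingRemoveToParent(Class<?> entity) {
        List<String> offenders = new ArrayList<>();
        for (Field field : entity.getDeclaredFields()) {
            ManyToOne mapping = field.getAnnotation(ManyToOne.class);
            if (mapping == null) {
                continue; // not a many-to-one field
            }
            List<CascadeType> cascades = Arrays.asList(mapping.cascade());
            if (cascades.contains(CascadeType.REMOVE) || cascades.contains(CascadeType.ALL)) {
                offenders.add(field.getName());
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        System.out.println(fieldsCascadingRemoveToParent(BrokenAttribute.class)); // prints [parentComponent]
        System.out.println(fieldsCascadingRemoveToParent(FixedAttribute.class));  // prints []
    }
}
```

A test asserting that the returned list is empty for every child entity is cheap, and it fails the moment someone adds REMOVE (or ALL) on the child-to-parent side.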