Adobe XMP Packet Extraction for the Aperture Framework

When it comes to manipulating photographs, I live in Photoshop. One feature of all Adobe products that I like is the ability to annotate images and other documents using their eXtensible Metadata Platform, or XMP. XMP is a collection of RDF statements, embedded in a document, that describe many facets of that document. I’ve always wanted to be able to get that data out of these files and do something with it for application purposes.

There are projects like Jempbox, which works on manipulating XMP data but offers no facilities to extract the XMP packet from image files. The Apache XML Graphics Commons is closer to what I was looking for. The library includes an XMP parser that works by scanning a file for the XMP header. The approach works quite well and supports pretty much every format supported by the XMP specification. The downside of XML Graphics Commons is that it doesn’t properly read all of the RDF statements. Some of the data is skipped or missed completely. To top it off, neither library allows you to get at the raw RDF data.
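The scanning idea can be sketched in a few lines of plain Java. This is a minimal sketch, not the XML Graphics Commons code: the class and method names (XmpPacketScanner, findPacket) are made up for illustration, and it assumes the packet is stored uncompressed, in one contiguous run of bytes, and encoded as UTF-8. The only things taken from the XMP specification are the wrapper markers, which start with <?xpacket begin= and finish with <?xpacket end=...?>. It also uses Optional, so it needs a newer JVM than Java 6.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper: finds the first XMP packet in a raw byte array.
final class XmpPacketScanner {
    // Packet wrapper markers as defined by the XMP specification.
    private static final byte[] BEGIN = "<?xpacket begin=".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] END = "<?xpacket end=".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PI_CLOSE = "?>".getBytes(StandardCharsets.US_ASCII);

    private XmpPacketScanner() {}

    /** Returns the first packet (wrapper included), or empty if none is found. */
    static Optional<String> findPacket(byte[] data) {
        int start = indexOf(data, BEGIN, 0);
        if (start < 0) return Optional.empty();
        // Look for the trailer only after the header, then for the "?>" that closes it.
        int trailer = indexOf(data, END, start + BEGIN.length);
        if (trailer < 0) return Optional.empty();
        int close = indexOf(data, PI_CLOSE, trailer + END.length);
        if (close < 0) return Optional.empty();
        int end = close + PI_CLOSE.length;
        // Assumption: the packet is UTF-8 (the spec also allows UTF-16 and UTF-32).
        return Optional.of(new String(data, start, end - start, StandardCharsets.UTF_8));
    }

    /** Naive byte-pattern search; returns -1 when the needle is absent. */
    static int indexOf(byte[] haystack, byte[] needle, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }
}
```

Feeding it the bytes of a file (for example via Files.readAllBytes) gives back the whole packet as a string, raw RDF included. A brute-force scan like this will miss packets that are compressed or split across several segments, so treat it as a starting point only.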

What I really wanted to do was to get the XMP packet in its entirety and load it into a triple store like Sesame or Virtuoso. This of course means that the data has to be available as RDF. Rather than inventing my own framework to do all of this, I found the Aperture Framework. Aperture is a simply amazing framework that can extract RDF statements from just about anything. Of course, the one thing that is missing is XMP support. So, I set out to implement my own Extractor that can suck out the entire XMP packet as RDF. It’s based on the work started in the XML Graphics Commons project, but modified significantly so that it pulls out the RDF data. Once extracted, it’s very easy to store the statements in a triple store and execute SPARQL queries against them.

Right now, the XMPExtractor can read XMP from the following formats:

  • JPEG Images (image/jpeg)
  • TIFF Images (image/tiff)
  • Adobe DNG (image/x-adobe-dng)
  • Portable Network Graphic (image/png)
  • PDF (application/pdf)
  • EPS, PostScript, and Adobe Illustrator files (application/postscript)
  • QuickTime (video/quicktime)
  • AVI (video/x-msvideo)
  • MPEG-4 (video/mp4)
  • MPEG-2 (video/mpeg)
  • MP3 (audio/mpeg)
  • WAV Audio (audio/x-wav)

On the downside, I’ve found that if you use the XMPExtractor with a Crawler, you’ll run into some problems with Adobe Illustrator files. The problem is that the PDFExtractor mistakes these files for PDFs and then fails. But as long as you’re not using Illustrator files, you should be ok. There are also a few nitpicks with JPEG files and the JpgExtractor, in that the sample files included in the XMP SDK are flagged as invalid JPEG files. However, every JPEG file I created from Photoshop and iPhoto seems to work fine. After a little more testing, I’ll look at offering it up as a contribution to the project.

Eclipse on Mac Java 6 Reveals More SWT Shortcomings

Two years ago, I raised a few points about some of the shortcomings of SWT. Because of its native bindings, SWT makes the Java mantra of “write once, run anywhere” quite a bit more daunting. For the most part, SWT’s cross-platform support is actually quite good, and it is decent in terms of performance. And if it weren’t for SWT’s existence, we probably wouldn’t have seen Sun address Swing’s performance issues like they did in Java 6. Unfortunately, when a minority platform like OS X makes some steep architectural changes, SWT-based applications end up with more work on their hands.

As most folks know, Java 6 on Mac OS X 10.5 was a long time coming. It took Apple over a full year after the initial release of Java 6 to get it running on Mac OS X. Now that it’s here and working pretty much “ok”, I decided it was time to start running Java 6 as my default JVM. Then the surprise: Eclipse won’t run under Java 6 on the Mac. Why? Because Java 6 under Leopard is 64-bit. The current version of SWT on OS X relies on Carbon, which is 32-bit and we won’t be seeing 64-bit Carbon anytime soon. Support for 32-bit Cocoa is planned for later next year, but I didn’t see word on when 64-bit Cocoa or even just Java 6 support might arrive.
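If you want to confirm which kind of JVM you are actually launching, a quick check like the one below can help. It is a rough sketch with made-up names (JvmBitness, detect, current). It relies on the sun.arch.data.model system property, which Sun-derived JVMs set but which is not part of the Java specification, and falls back to a guess based on the os.arch string when that property is missing.

```java
// Hypothetical helper: best-effort guess at whether the running JVM is 32- or 64-bit.
final class JvmBitness {
    private JvmBitness() {}

    /** Returns 64 or 32, or 0 when neither value gives any hint. */
    static int detect(String dataModel, String osArch) {
        // Preferred source: the vendor-specific data model property.
        if ("64".equals(dataModel)) return 64;
        if ("32".equals(dataModel)) return 32;
        if (osArch == null) return 0;
        // Heuristic fallback: names such as x86_64 or amd64 contain "64".
        return osArch.contains("64") ? 64 : 32;
    }

    /** Applies the check to the JVM this code is running in. */
    static int current() {
        return detect(System.getProperty("sun.arch.data.model"),
                      System.getProperty("os.arch"));
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: " + current() + "-bit");
    }
}
```

A result of 64 means the JVM cannot load 32-bit native libraries, which is the situation the Carbon-based SWT runs into here.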

Eclipse is still a great IDE even if I have to continue to run it under Java 5. However, this is one of those things that gets annoying each time a platform needs to make significant changes. But this time, you can’t put all of the blame on the Eclipse crew. Apple did an absolutely terrible job keeping the Java community abreast of what their plans were for Java 6. In fact, it almost seemed that Java 6 would never appear on Leopard. On top of that, Java 6 was going to be 64-bit only, and Carbon was not going to see 64-bit support. Long story short, SWT, and therefore Eclipse, is always going to be hindered by OS changes to a greater degree than, say, NetBeans or IDEA.

Hibernate and the “Found two representations of same collection” error

Last night I spent an extended work day trying to track down the source of a Hibernate exception that I had never encountered before. The application in question is a simple data loader that reads in an XML file and populates a Hibernate object graph. The data in the XML file changes from day to day, and if an element exists in the XML file one day and not the next, the corresponding entity is removed from the object graph. All had been working fine until suddenly I started seeing this in my error logs:

Caused by: org.hibernate.HibernateException: Found two representations of same collection: ...

The odd thing was that there can never be more than one representation of this collection in the application, so I had to wonder, WTF? I then came across this thread on the Hibernate Forums, but I wasn’t doing anything with Session.clear(). In fact, we weren’t doing any manual session management (i.e. flush, etc.).

To make a long story short, the issue was traced down to a mapping error. The object hierarchy is as follows:

parent +
       + Component +
                   + component attribute

The removals were being performed on the component by removing a component attribute from the Component.componentAttributes collection. The component attribute maintains a bi-directional relationship with its owning component. However, the mapping for the component attribute’s parent was as follows:

@ManyToOne(fetch = FetchType.LAZY,
           cascade = { CascadeType.PERSIST,
                       CascadeType.REMOVE })
private Component parentComponent;

Note the CascadeType.REMOVE. This meant that when the child was removed, its parent would also be removed, hence the duplicate collection. The issue was resolved once the mapping was changed to:

@ManyToOne(fetch = FetchType.LAZY,
           cascade = { CascadeType.PERSIST,
                       CascadeType.MERGE })
private Component parentComponent;
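To see why a REMOVE cascade pointing from child to parent is so destructive, here is a toy model in plain Java. It is not Hibernate code and does not reproduce the exception. The class and method names (CascadeDemo, scheduledForDelete) are invented, and it assumes the parent in turn cascades removal down to all of its children, which may or may not match your mapping. All it does is walk the relationships and list what would be scheduled for deletion.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of cascade propagation; illustrative only, not how Hibernate works internally.
final class CascadeDemo {
    static final class Component {
        final String name;
        final List<Attribute> attributes = new ArrayList<>();
        Component(String name) { this.name = name; }
    }

    static final class Attribute {
        final String name;
        final Component parent;
        Attribute(String name, Component parent) {
            this.name = name;
            this.parent = parent;
            parent.attributes.add(this); // keep both sides of the relationship in sync
        }
    }

    private CascadeDemo() {}

    /** Names of everything scheduled for deletion when attr is removed. */
    static Set<String> scheduledForDelete(Attribute attr, boolean cascadeRemoveToParent) {
        Set<String> deleted = new LinkedHashSet<>();
        deleted.add(attr.name);
        if (cascadeRemoveToParent) {
            // The child drags its parent along...
            deleted.add(attr.parent.name);
            // ...and (assumption) the parent cascades removal to every child it owns.
            for (Attribute sibling : attr.parent.attributes) {
                deleted.add(sibling.name);
            }
        }
        return deleted;
    }
}
```

With the cascade switched on, removing a single attribute takes the component and all of its other attributes with it. With it switched off, only the attribute itself goes away, which is what the loader wanted in the first place.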

Of course, this could have been caught sooner had some of the unit tests been a little better, but the cause of this issue sure was a bitch to find. In the end, it was a subtle mapping issue that got introduced and wreaked havoc on my day. Hopefully this post can spare someone else some lost hours.