Getting Ready for ESWC 2008

May 31st, 2008

We have just managed to complete our posters for ESWC in time before we leave. Our flight is on Sunday and we should land in Tenerife in the early evening. Travelling late means that we will probably miss the Linking Open Data Gathering, but I’m sure there will be plenty of time to meet up and chat during the conference.

The rkbexplorer.com team is well represented at the conference, with a paper in IRSW, two posters and a demo. We missed out on a paper in the main conference, but hopefully that will be rectified at ISWC.

I am really looking forward to the Identity and Reference Workshop. All the main players should be there, and I quite like the workshop format of having all ten speakers talk for eight minutes each, followed by an extended period for discussion and debate. I have just finished the slides for our presentation, and I think certain proposals, both from ourselves and from the OKKAM team, may prove quite controversial. A list of discussion points to be debated at the workshop has also been circulated; some of the more interesting ones include:

  • Should instances and concepts be treated the same way with respect to reference on the Semantic Web?
  • What criteria are needed in order to assert that two URIs reference the same entity?
  • Should we have a global set of URIs or let people choose their own?
  • Should we have a centralised or decentralised approach to managing reference on the Semantic Web?

I am sure there will be a lot of heated debate, with opinions on both sides of each issue. Let’s just hope we can find some common ground and plan a way forward that suits all parties.

I will try to put as much as I can from the workshop and the conference on this blog. If you are going to Tenerife, we hope to see you there.

Who needs owl:sameAs?

May 22nd, 2008

There has been much discussion over the past month on the W3C semweb mailing list on the issue of coreference. More particularly, how can we provide an infrastructure for managing URI identity on the Semantic Web that goes beyond making basic owl:sameAs linkages?

The problem with owl:sameAs has been highlighted in one of our papers and was also mentioned in the discussion on the mailing list. Basically, owl:sameAs treats the subject and object that it links as exactly the same entity with respect to all properties. These strong semantics mean that if, for example, a person has a URI at one institution with a set of statements containing their address, telephone number, email and so on, and a URI at another institution with another set of similar statements, making an owl:sameAs link between the URIs will merge all the statements and make the URIs indistinguishable. Nor can it be used to link things that are similar but not quite the same, such as a hardback and a softback version of the same book.
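To make that concrete, here is a minimal sketch of the ‘smushing’ that owl:sameAs licenses. The URIs and properties are made up for illustration; this is not code from any real system:

# Minimal sketch of owl:sameAs "smushing" (illustrative URIs, not real data).
# Once two URIs are declared owl:sameAs, every statement about either one
# applies to both, and the two URIs become indistinguishable.
triples = [
    ("http://inst-a.example/person/42", "foaf:mbox", "mailto:jane@inst-a.example"),
    ("http://inst-b.example/staff/jane", "foaf:phone", "tel:+44-23-8059-0000"),
]

def smush(triples, uri_a, uri_b):
    """Rewrite every occurrence of uri_b as uri_a, merging the two nodes."""
    rewrite = lambda term: uri_a if term == uri_b else term
    return [(rewrite(s), p, rewrite(o)) for s, p, o in triples]

merged = smush(triples,
               "http://inst-a.example/person/42",
               "http://inst-b.example/staff/jane")
# Both statements now hang off one URI; afterwards there is no way to ask
# which institution said what, or to keep the two descriptions apart.
for triple in merged:
    print(triple)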

At rkbexplorer.com we have developed our own method for managing coreference, called the Consistent Reference Service, or CRS. The basic idea is that each knowledge base (KB) has one or more CRSes that manage coreference between identifiers in its own KB and hold links to identifiers in other CRSes. Similar URIs are grouped together in a bundle, which means ‘this CRS has deemed these URIs to refer to the same entity in a given context’. Try looking up Hugh Glaser in DBLP for an example. It is then up to an application to decide whether to use any, some or all of the duplicates provided by a CRS. Our infrastructure has been in place in various forms for over two years now, and we currently have over 20 distributed KBs containing over 60 million triples, each with its own CRS.
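Roughly speaking, a bundle can be pictured as a named set of URIs plus pointers to equivalent bundles held by other CRSes. Here is a toy model in Python; the names and structure are illustrative, not the actual rkbexplorer.com schema:

# Toy model of a CRS (illustrative names, not the real rkbexplorer.com schema).
# Each bundle groups the URIs this CRS deems coreferent in a given context,
# and may point at equivalent bundles held by other CRSes.
crs_dblp = {
    "bundle:dblp-hugh-glaser": {
        "uris": {
            "http://dblp.rkbexplorer.com/id/people-hugh-glaser",
            "http://dblp.rkbexplorer.com/id/people-h-glaser",
        },
        "see_also": {"crs-acm:bundle:acm-hugh-glaser"},  # bundle in another CRS
    },
}

def duplicates(crs, bundle_id):
    """The URIs a CRS asserts are coreferent; the application then decides
    whether to use any, some or all of them."""
    return crs[bundle_id]["uris"]

The point of keeping bundles separate from the KB proper is that equivalence stays contextual: the CRS only reports candidate duplicates, and the application chooses which ones to trust.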

The distributed nature of our approach exhibits features that others are only now beginning to recognise as necessary for managing the Semantic Web. Trust, provenance and security could all be enabled by granting varying degrees of ‘authority’ to a particular CRS. Yesterday, I implemented an algorithm for following links around CRSes to find the ‘equivalence closure’ for a URI, or in other words: ‘find me all duplicates of this URI’. Each URI in our KBs carries a predicate giving the URI of a bundle within that KB’s CRS. Following these links, and then repeating the procedure for each duplicate URI in turn, yields a set of equivalent URIs, and it all takes O(n) time with respect to the number of CRSes. Therefore, even if there were thousands of CRSes, computing the equivalence closure would incur minimal cost. Compare this to trying to compute the closure of owl:sameAs links (which struggles once more than a handful of links are involved) and you can see that we already have a scalable, robust and distributed solution for managing coreference.
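A minimal sketch of that walk, building on the toy model above; the two lookup functions are hypothetical stand-ins for queries against the distributed CRSes:

from collections import deque

def equivalence_closure(uri, bundle_of, bundle_uris):
    """Breadth-first walk over bundle links to collect every duplicate of
    `uri`. `bundle_of(uri)` returns the bundle a URI belongs to, and
    `bundle_uris(bundle)` the URIs in that bundle; both are hypothetical
    stand-ins for lookups against the distributed CRSes."""
    seen = {uri}
    queue = deque([uri])
    while queue:
        current = queue.popleft()
        for dup in bundle_uris(bundle_of(current)):
            if dup not in seen:   # each URI is expanded at most once
                seen.add(dup)
                queue.append(dup)
    return seen

Because each URI, and hence each CRS, is expanded at most once, the cost of the walk grows linearly with the number of CRSes involved.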

As Hugh answered each point in the mailing list thread in turn, the amount of correspondence dropped considerably, prompting one person to remark ‘suddenly so quiet here :)’. Whether that was because people were stunned into silence or simply didn’t bother to respond, we will find out at IRSW2008.

Interview with New Scientist on Linked Data

May 19th, 2008

Last week, I was sent an email by Jim Giles, a freelance writer for the New Scientist based in San Francisco. He asked if I could spare a few minutes to give my views on Linked Data and DBpedia and, in particular, on whether these initiatives will bring the Semantic Web to the masses. I was reluctant at first, not least because there are other people in the Linked Data world who could enthuse and evangelise a lot more than I could. However, after deferring to Hugh, we decided we should respond and that Hugh would do the interview.

I then found out that the interview would be on Skype, so I could join in the conversation, even as a passive participant. Amazingly, I had never used Skype before and was worried about not having a proper headset for my laptop. As it turned out, the call quality was very good, and we ended up having over an hour’s discussion, conducted mainly between Jim Giles and Hugh, although I was able to contribute in places.

Hugh started off by setting the Linked Data initiative in the context of Semantic Web research that has been ongoing for many years. He commented that he thought the Semantic Web would arrive without everyone necessarily knowing that it had, and that Semantic Web researchers would be slightly annoyed that all their efforts had not got the recognition they deserve. This situation is quite likely, since so-called semweb apps such as Twine, Garlik and Powerset don’t come across to the average user as anything other than a standard Web or Web 2.0 application.

The interview then moved on to the topic of coreference and how it is a problem that needs to be overcome for Linked Data clients to successfully gather and use the knowledge that is out there. It took some explaining, but in the end Jim understood how the LOD bubbles link together and where coreference arises. Jim then asked for an example of an application that could use Linked Data and provide value to a user beyond what Web 2.0 apps already offer. I pointed to the DBpedia Mobile application that was presented at the LDOW workshop. The ability to mesh data from different sources in a standard way, without being tied to a specific application, is really the main selling point of Linked Data. Having explained the advantages, we were then asked how this technology will become mainstream. I gave links to the BBC’s Linked Data work and the London Gazette to show that Linked Data is already being used in big companies and organisations, and the number will only continue to grow. Jim was impressed by this and said he would chase up the people at the BBC to investigate their work.
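For anyone wondering what ‘using Linked Data’ means mechanically, the basic client move is to dereference a URI with an Accept header asking for RDF rather than HTML. A minimal sketch against DBpedia, assuming its content negotiation behaves as advertised (parsing omitted for brevity):

import urllib.request

def fetch_rdf(uri):
    """Dereference a Linked Data URI, asking for RDF/XML via content
    negotiation rather than the human-readable HTML page."""
    request = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})
    with urllib.request.urlopen(request) as response:
        return response.read()

# DBpedia redirects to an RDF document describing the resource.
data = fetch_rdf("http://dbpedia.org/resource/Tenerife")
print(len(data), "bytes of RDF about Tenerife")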

Hugh’s final comment highlighted the divergence in Semantic Web research. We can see two main camps emerging: the ontology, or ‘formalist’, camp and the technology, or ‘pragmatist’, camp, and it seems increasingly clear that each may view the other with a certain amount of disdain. This comes up in our weekly Semantic Web Interest Group meetings at ECS.

We have no idea what the final article will look like, or how much of our conversation will be used, but it was good to see someone outside the usual Semantic Web Gang getting interested in Linked Data.

Triplification failure

May 14th, 2008

I was trying to add the Triplify add-on for Wordpress to this blog. Unfortunately, things didn’t go quite according to plan. Firstly, the documentation on the website is not very detailed, particularly with regard to installation; the instructions in the readme file in the download package were more helpful.

Secondly, there is no mention of the add-on only being compatible with PHP 5.1.x and higher. The problem lies in the PDO package, which is installed as standard in PHP 5 but not in PHP 4, and is used to manage connections to a range of databases, including MySQL and PostgreSQL. After hunting down a PHP 4-compatible version of PDO, I thought things might fit together. However, after hacking around with a PHP 4 PDO class, I started getting curious Hebrew error messages:

PHP Parse error: parse error, unexpected T_PAAMAYIM_NEKUDOTAYIM, expecting ',' or ';' in /home/.../triplify/PDOStatement_mysql.class.php on line 42

‘Paamayim Nekudotayim’, it turns out, is Hebrew for ‘double colon’, a token name left in by PHP’s Israeli authors.

So my attempt to add this blog to the Linked Data world was unsuccessful. It is also hard to upgrade to PHP 5.2, since rkbexplorer.com runs Red Hat Enterprise Linux and there is currently no RPM available for this distro. At least the SIOC plugin seems to be working, which should be useful if I decide to export this blog’s data manually as Linked Data.

Hello Blogosphere

May 13th, 2008

Looks like Wordpress installed successfully on rkbexplorer.com. I have also added the Sindice SIOC widget and the SIOC plugin. It should be interesting to add to the Giant Global Graph of Linked Data via these blog posts.