Ivan's Blog

Fri, 22 Dec, 2006

Music Ontologies

Frederick Giasson just posted a blog on the renewed Music Ontology Specification, trying to make it as close to the MusicBrainz project as possible. It will also be possible to query the MusicBrainz database via SPARQL endpoints. Really great stuff!

While we are at it, however… Personally, I listen mostly to classical music, and I’m always frustrated that most of these ontologies, databases, etc., are not really prepared for that at all. I own an iPod; it is all right as a device, but the terminology it uses does not fit classical music (think of the terms “song” or “album”, for example). But yesterday, on #swig, Frederick also gave me the reference for a classical music ontology: the “Music Vocabulary” defined by Kanzaki Masahide. For that minority of us caring about classical music, this is a good resource to have…

Category: /WorkRelated/SemanticWeb; Posted at: 09:56 UTC; Permalink

Mon, 11 Dec, 2006

SADIe

Read through the paper of Bechhofer & al[1] from ISWC. Interesting stuff: it uses the Semantic Web to help with accessibility problems. Per se, this is not new, but the approach of Bechhofer & al. seems promising. What they do is (1) extract CSS class names from an HTML page (not necessarily XHTML…) and (2) use ontologies to categorize the role of each element (as referred to by CSS). Using the knowledge gained from the ontology a tool can rearrange the page, remove unnecessary elements, etc. The ontologies consist of two parts: on the one hand a general ontology describing the possible roles of various elements in general (e.g., menus) and, on the other hand, specific ontologies for specific classes of pages. E.g., ontologies are available for all the CNN.com pages, or for the blogger.com blogs. (There is a much better description on the project’s home page.)
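To make the two steps a bit more concrete, here is a minimal Python sketch of the idea, not of SADIe itself: it collects CSS class names from a (possibly sloppy) HTML page and looks up a role for each one. The ROLE_OF_CLASS table is a made-up stand-in for the site-specific ontology; the class and role names are assumptions for illustration only.

```python
from html.parser import HTMLParser

# Hypothetical site-specific mapping from CSS class names to general roles.
# These names are invented for illustration; they are not SADIe's vocabulary.
ROLE_OF_CLASS = {
    "sidebar": "menu",
    "post-body": "main-content",
    "ad-banner": "removable",
}


class ClassCollector(HTMLParser):
    """Collect (tag, class name) pairs; HTMLParser tolerates non-XHTML input."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                for css_class in value.split():
                    self.found.append((tag, css_class))


def categorize(html):
    """Return (tag, class, role) for every class the mapping knows about."""
    collector = ClassCollector()
    collector.feed(html)
    return [
        (tag, css_class, ROLE_OF_CLASS[css_class])
        for tag, css_class in collector.found
        if css_class in ROLE_OF_CLASS
    ]


page = '<div class="sidebar nav"><p>Links</p></div><div class="post-body">Text</div>'
roles = categorize(page)
```

A tool could then use the resulting roles to decide which elements to move or drop; a real ontology would of course carry far more structure than a flat lookup table.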

Having some contacts with the Web accessibility community (eg, via my contact with the Accessibility.nl group here in the Netherlands) I know how really difficult these things are. Also, the fact that the Web designers’ community has not really adopted XHTML over HTML makes the development of various tools even more difficult. The advantage of SADIe seems to be that it annotates the (now ubiquitous) CSS tags rather than relying on the content of the (X)HTML page; also, the ontology can be defined for a specific Web site (or family of sites) independently so that no extra information has to be added to the site itself.

Although the project is still in research, I was curious, so I tried the on-line facility for some sites. I was not very lucky with the CNN home page; there was hardly any difference. But, when I randomly chose a blog on blogger.com, the difference with the site generated by the tool is noticeable (e.g., the menus are rearranged):

[Two screenshots: the original web page, with lots of images and colours; the transformed web page]

I did not compare this with the output of other tools (like Opera’s small screen rendering which does similar things based on very different techniques). But it is certainly promising.

It is interesting that Danny Ayers just published a blog on turning CSS structures into RDF. The current tool of SADIe is a JavaScript goody analysing the DOM tree of the page, and using the CSS ontology knowledge to massage the DOM tree. However, if the CSS structure can be turned into RDF, this can then be mashed up with the ontology defined by SADIe; ie, other tools can do nice things with those data, too (for example, specialized queries could extract the meaningful information from the page, stuff like that). Could become even more interesting…

  1. “SADIe: Semantic Annotation for Accessibility”, by Sean Bechhofer, Simon Harper, and Darren Lunn
Category: /WorkRelated/SemanticWeb; Posted at: 15:38 UTC; Permalink

Thu, 30 Nov, 2006

Book mashup, SW book list…

I regularly follow the updates on the “Books on Semantic Web” wiki page. I just found out that for most of the books a new entry has been added: a reference to the book mashup service of Chris Bizer & friends. It is really a great service, thanks Chris!

The only disadvantage is that the various RDF references are spread over the wiki page, so it is not easy to make, for example, SPARQL queries over the whole list of books. To make this easier I made a small Python program that simply collects all mashup data for the books on the wiki, merges them into one, and puts the result back on the Web as one RDF resource. The script is run once a day, so there might be a bit of delay when updating the book list. (I could have added it as a CGI script, and I may do it at some point; the problem is that collecting all the data takes quite a long time, so it is not ideal as a service…)

Category: /WorkRelated/SemanticWeb; Posted at: 21:52 UTC; Permalink

Wed, 29 Nov, 2006

Literals as subject in RDF and internationalization

I had one of those “aha” moments today. I am at the AC meeting of W3C and listened to a presentation on the Internationalization Tag Set work done at W3C. Essentially, the group tries to define a set of generic (XML) attributes and elements relevant and important for internationalization. One of the possible examples is the proper handling of right-to-left scripts embedded in left-to-right ones (like Arabic or Hebrew put in an English text): how should one ensure that the text: “פעילות הבינאום, W3C” is properly displayed (i.e., with the ‘W3C’ at the left of the Hebrew characters). It so happens that this requires an extra attribute (called ‘dir’ in HTML). HTML happens to have this attribute, but what the ITS group is working on is to define a generic set that could be used in other XML dialects, too. Another example is an attribute to denote whether a specific text should be translated or not.

The question is: what about RDF literals? How can one ensure that the right information is set on literals? At the moment we have only the language attribute, which is necessary, but may not be enough. And my “aha” moment was: if we did not have the restriction in the RDF model which says that literals cannot be subjects of triples then this would be way easier. The current terminologies used in ITS could have a variant in the form of an RDF vocabulary and be used to characterize literals. This may not cover all internationalization issues (like Ruby markup, for example), but may be o.k. for most.
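A toy model may make the restriction tangible. The Python below is a sketch under loud assumptions: it ignores blank nodes and datatypes, and the ITS_DIR property (with its example.org URI) is purely hypothetical, standing in for an RDF variant of an ITS term that does not exist as such. It only shows which of two statements the current model accepts.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    text: str
    lang: Optional[str] = None  # the only i18n information a literal carries today


Term = Union[URI, Literal]


def allowed_in_current_rdf(subject: Term, predicate: URI, obj: Term) -> bool:
    """The current RDF model rejects any triple whose subject is a literal."""
    return not isinstance(subject, Literal)


TITLE = URI("http://purl.org/dc/elements/1.1/title")
ITS_DIR = URI("http://example.org/its-rdf#dir")  # hypothetical vocabulary term
page = URI("http://example.org/page")
title = Literal("פעילות הבינאום, W3C", lang="he")

# A literal as object is fine today.
ordinary = allowed_in_current_rdf(page, TITLE, title)
# Saying something *about* the literal (its direction) needs it as a subject.
about_literal = allowed_in_current_rdf(title, ITS_DIR, Literal("rtl"))
```

With the restriction lifted, the second statement would simply be one more triple in the graph.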

Of course, one has to be very, very cautious in changing the RDF model and semantics, i.e., there might be some hidden mines here. But the internationalization issue may certainly be a good use case for looking at this restriction again…

(There is a really nice overview on the bidirectional issue above on the I18N site of W3C, if you are interested.)

Category: /WorkRelated/SemanticWeb; Posted at: 06:06 UTC; Permalink

Thu, 23 Nov, 2006

Tabulate bookmarklet

As a typical example of programming by example, I have copied and changed the foaf explorer bookmarklet to make a tabulator bookmarklet. It looks for a link header element referring to application/rdf+xml, and sends the corresponding URI to the (latest version of the) tabulator. The tabulator will automatically load that RDF file, instead of starting up with its default values. Nothing fancy, but maybe useful. (Caveat: if there are several such link entries, the bookmarklet finds the first one only…)

(I wanted to put the bookmarklet itself, ie, the javascript: URI into this blog directly, but blosxom had problems displaying it. Oh well…)
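For what it is worth, the logic is easy to sketch outside the browser, too. The Python below is only an illustration, not the bookmarklet (which is JavaScript): it finds the first link element whose type is application/rdf+xml and builds the address to hand over. The TABULATOR address and its uri parameter are made-up placeholders, since the real tabulator URI is not reproduced here.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urljoin

# Hypothetical address and parameter name, used as a placeholder only.
TABULATOR = "http://example.org/tabulator/?uri="


class RdfLinkFinder(HTMLParser):
    """Remember the href of the first <link> whose type is application/rdf+xml."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return  # not a link element, or a match was already found
        attributes = dict(attrs)
        if attributes.get("type") == "application/rdf+xml" and attributes.get("href"):
            self.href = attributes["href"]


def tabulator_url(page_url, html):
    """Return the tabulator address for the page's first RDF link, or None."""
    finder = RdfLinkFinder()
    finder.feed(html)
    if finder.href is None:
        return None
    rdf_uri = urljoin(page_url, finder.href)  # resolve relative hrefs
    return TABULATOR + quote(rdf_uri, safe="")


html = (
    '<html><head>'
    '<link rel="meta" type="application/rdf+xml" href="foaf.rdf">'
    '<link rel="alternate" type="application/rdf+xml" href="other.rdf">'
    '</head><body></body></html>'
)
url = tabulator_url("http://example.org/people/ivan.html", html)
```

As with the bookmarklet, only the first matching link is used; the second one in the example is ignored.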

Category: /WorkRelated/SemanticWeb; Posted at: 15:49 UTC; Permalink

Fri, 10 Nov, 2006

ISWC Conference (ISWC Day 5)

Great keynote from Rudi Studer. An essential part of his keynote was the relationships between the Semantic Web and other communities (KR, Database, Software Engineering, Natural Language Processing, Machine Learning). One aspect that I really appreciated is that Rudi did not use his keynote to present what he (or his team) did and does, but gave an overall view instead (I have seen too many of those self-centric keynotes…). My only respectful criticism is that the Web aspect was a bit missing. I think it was Chris Welty who said somewhere that the new thing in the Semantic Web is the Web; very often, when analysing the relationships, Rudi did not really make it clear what the Semantic Web’s relationship is to a specific area as opposed to Semantic’s (ie, essentially, knowledge representation’s) relationships.

The most interesting part for me was what he said about the relationship to the Database community. Some new things I will have to learn, that is for sure (eg, he referred to the term “dataspaces”, which seems to be the new buzzword in that community). And, clearly, there is some extra outreach to be done in that space; I just read the blogs of Ian and Danny on the keynote of Mårten Mickos at the Web 2.0 Summit, which does not look very good. We already had a hallway discussion on trying to organize an event around the relationship between SW and databases in 2007 (eg, at W3C); maybe the topic should be a bit larger than I originally thought. To be followed up…

I had to give a talk in the next session, and I spent the rest of the day chatting with different people; ie, no more sessions. But I went to the closing ceremony. I was pleased to see the paper on DartGrid (see my previous blog) win the best paper prize in the use cases category. And it was also a pleasure to see the team of Amsterdam win the Semantic Web challenge with their MultimediaN E-Culture demonstrator. What I particularly liked in their demonstration, beyond the technical content, was the beautifully designed user interface with nice, pastel colours and a clear, clean, and non-cluttered screen. On the other hand, I can only repeat what I said elsewhere: I had to come to the other side of the globe to learn what people are doing about 20 meters away from my office at CWI (and I also ride past the Vrije Universiteit on the tram every time I go to the office). Shame on me. Something should be done about it…

It was a great, albeit exhausting week (I also took part in a face-to-face meeting of the Rule Interchange Format Working Group last week). And I have a bunch of things to read in the proceedings still…

Category: /WorkRelated/SemanticWeb; Posted at: 11:50 UTC; Permalink

Thu, 09 Nov, 2006

ISWC Conference (ISWC Day 4)

Only two papers really stuck with me on the second day; I missed a bunch of other sessions, sadly…

The paper of Motik & al[1] is one of the papers which explore the connections of Description Logic and Rules. Obviously, I will have to read the details in the proceedings, but my first understanding is that this is also based on rules, with the DL side being some sort of a black box. The only unusual aspect is the introduction of what they call “autoepistemic extension of DLs”. I must admit, this is the first time I have heard about this thing, so I still have to understand it. But all this should be interesting. The paper is theoretical for now; my understanding is that there is no implementation yet, though the feeling I got from the presentation is that the authors have a clear idea on how to implement it, so it should be a matter of time only.

The paper on DartGrid[2] is impressive. These guys did a huge amount of work that covers a large palette of achievements: a system to map relational database content to RDF, with a nice user interface to control the details of that mapping; a query rewriter from SPARQL to the necessary SQL queries using those mappings; a semantic query system (backed up by ontologies) to search through those databases; user interfaces to generate queries, etc. And an application combining circa 50 databases in one coherent system, with each database containing between 70,000 and 100,000 records. This application is deployed in the Chinese Academy of Traditional Medicine and is used by researchers there. Another application on neural research is under development at Yale. (By the way, the home page of the DartGrid project is in English, so you can look it up…) The paper was part of a session on eScience use cases; the other papers were also pretty nice!
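To give a feel for the first item on that list, here is a minimal Python sketch of a relational-to-RDF mapping. It is emphatically not DartGrid’s design: the table, the MAPPING structure, the example.org URIs, and the data are all invented for illustration. Each row becomes a subject, and each mapped column becomes a property with the cell value as a literal.

```python
import sqlite3

# Hypothetical mapping: table name, key column, and column-to-property URIs.
MAPPING = {
    "table": "herb",
    "key": "id",
    "subject_prefix": "http://example.org/herb/",
    "columns": {
        "name": "http://example.org/vocab/name",
        "effect": "http://example.org/vocab/effect",
    },
}


def rows_to_triples(connection, mapping):
    """Yield (subject, predicate, literal) triples for every non-NULL cell."""
    columns = list(mapping["columns"])
    # Identifiers come from the trusted mapping above, never from user input.
    query = "SELECT {key}, {cols} FROM {table} ORDER BY {key}".format(
        key=mapping["key"], cols=", ".join(columns), table=mapping["table"]
    )
    for row in connection.execute(query):
        subject = mapping["subject_prefix"] + str(row[0])
        for column, value in zip(columns, row[1:]):
            if value is not None:  # NULL cells produce no triple
                yield (subject, mapping["columns"][column], value)


# Made-up in-memory data, standing in for a legacy database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE herb (id INTEGER PRIMARY KEY, name TEXT, effect TEXT)")
connection.executemany(
    "INSERT INTO herb VALUES (?, ?, ?)",
    [(1, "herb A", "effect X"), (2, "herb B", None)],
)
triples = list(rows_to_triples(connection, MAPPING))
```

A query rewriter works in the opposite direction: it uses the same kind of mapping to turn a pattern over the properties into SQL over the columns, so that the data never has to be converted wholesale.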

There was also a panel on Web 2.0 in the evening. It was not really controversial and the goal was really to see how the Semantic Web could, in some ways, help the further development of Web 2.0. I liked Benjamin Grosof’s characterization, who said something like “it is better to be the child riding the elephant than the ant crushed by it”…

  1. “Can OWL and Logic Programming Live Together Happily Ever After?”, by Boris Motik, Ian Horrocks, Riccardo Rosati, Ulrike Sattler
  2. “From Legacy Relational Databases to the Semantic Web: an In-Use Application for Traditional Chinese Medicine”, by Chunyin Zhou, Yimin Wang, Heng Wang, Jinmin Tang, Zhaohui Wu, Ainin Yin, Huajun Chen, Yuxin Mao, Meng Cui
Category: /WorkRelated/SemanticWeb; Posted at: 12:38 UTC; Permalink

Wed, 08 Nov, 2006

ISWC Conference (ISWC Day 3)

The problem with such a conference is that one spends as much time talking to other people as listening to the presentations. Great hallway talks, technical discussion, work… it is all good and important, but I always feel a bit guilty when I miss a session. Oh well…

I quite liked Tom Gruber’s keynote on “Social Web”. Tom tried to avoid the controversial Web 2.0 term and talked rather of the collective intelligence of folksonomies, tagging, blogging, etc. It was good to hear a talk that avoids the unnecessary controversy on the relationship between Web 2.0 and, say, the Semantic Web. Tom also talked about an attempt to give a more coherent ontological model for tagging, though it seems that this work is stalled for lack of people to work on it (see also an earlier blog he had on this for some more details). Would be good to pick this up…

The survey of the Web Ontology Landscape[1] was interesting. They survey a bunch of OWL and RDFS files trying to characterize them in terms of what level of OWL they use, what are the frequencies of usage of various facilities, etc. Although there were some criticisms on whether the sample they use was fully o.k., the conclusions were still interesting. Two things stuck with me: that a large number of OWL ontologies use a very “light” level of functionalities (which leads to the issue of having light ontologies and how important those are); and that a number of ontologies “slip” into OWL Full out of OWL DL due to some very small additional features they use. Bijan told me afterwards that if OWL 1.1 were used, then those ontologies would actually remain OWL DL, which is interesting.

I also quite liked the presentation of Fresnel[2], a language to express how one wants to see RDF data (a distant analogy is like CSS to HTML). The demonstrations shown by Emmanuel were really nice, and one would hope that more tools were at least testing this thing. It is really promising. I talked to Emmanuel later, by the way: it seems that IsaViz is one of the tools that does use this, though not the “stable” version at W3C. He said he would make a new stable version with Fresnel soon; we can then announce it on the SW Activity News page.

The same session included a presentation on /facet[3], a faceted user interface to RDF data. It looks really nice, though one can only really get a “feel” for a tool like that by trying it. So I hope to install it on my machine soon to play with it (it requires SWI-Prolog; well, that should not be a big problem…). The only slight caveat with this project: I had to come to the other side of the globe to learn what Michiel & Co. are doing, although their office is around 20 meters away from me at CWI… Sad.

  1. “A Survey of the Web Ontology Landscape”, by Taowei Wang, Bijan Parsia, Jim Hendler
  2. “Fresnel: A Browser-Independent Presentation Vocabulary for RDF”, by Christian Bizer, Emmanuel Pietriga, David Karger, Ryan Lee
  3. “/facet: A Browser for Heterogeneous Semantic Web Repositories”, by Michiel Hildebrand, Jacco van Ossenbruggen, Lynda Hardman
Category: /WorkRelated/SemanticWeb; Posted at: 12:35 UTC; Permalink
