Ivan’s private site

November 16, 2008

Classical Music and Improvisation (re: Gabriela Montero)

It was the French Classical Music channel (France Musique) that made me discover the name of Gabriela Montero. A great classical pianist from Venezuela (if a person like Martha Argerich says “I have rarely come across a talent like Gabriela’s. She is a unique artist.”, well, that means something). But what really caught my attention is that Montero does something very rare: she does classical music improvisation.

Musical improvisation has a strange history in Western classical music. There was a time when it dominated classical music: Bach was considered to be an incredible improviser at the keyboard, and so were most of his contemporaries, like Händel or Telemann. In fact, tradition claims that some of Bach’s keyboard pieces are just the written record of his improvisations (the best known example is “The Musical Offering”, which includes a three-voice fugue for keyboard that was probably Bach’s improvisation when he visited Frederick II of Prussia). And, in fact, even when playing “published” music a baroque artist was not only allowed to improvise a little bit here and there, but it was, sort of, expected from him.

But this tradition has disappeared. (I am not talking about jazz here. That is different.) Today a classical musician is supposed to follow the notes, the dynamics, the speed, etc, exactly as written down by the composer. Of course, this is not 100% true; musicians do have great freedom of expression, otherwise any machine would do. But it is certainly not allowed to deviate from the notes as written down in the music. Improvisation is not expected by the public, rarely practiced by musicians, not taught at conservatories. Actually, if an artist still does it, the “established” community of musicians will very often consider this as not “serious”, not worthy of a real classical musician… It takes some guts for a serious performer to do it in public.

Well… and Gabriela Montero has the guts. And that is why it is worth remembering her name. There are a bunch of videos on YouTube; maybe the one I prefer is a baroque style improvisation on Debussy’s Clair de Lune. Quite amazing: improvising counterpoint on the fly… She also has some CD-s where she recorded improvisations she made on Bach tunes (“Bach and Beyond”) or Baroque tunes in general. If you are interested in classical music but you also want to hear something a bit… unusual, then it is worth checking out!

B.t.w., she also has a web site (of course :-) where one can submit a tune to her; she will improvise on it and send back the result as an MP3. I might check this out sometime…

November 14, 2008

Calais Release 4 and the Linking Data cloud…

Just got to this news via Yves’s blog: Reuters’ Open Calais service comes with a new release in January, and this will bind to the Linked Data cloud. To quote the official blog of Reuters:

Release 4 of Calais will be a big deal. In that release we’ll go beyond the ability to extract semantic data from your content. We will link that extracted semantic data to datasets from dozens of other information sources, from Wikipedia to Freebase to the CIA World Fact Book. In short – instead of being limited to the contents of the document you’re processing, you’ll be able to develop solutions that leverage a large and rapidly growing information asset: the Linked Data Cloud.

Ie: when analyzing a text, Open Calais will return URIs into DBPedia, Freebase, Musicbrainz… Thereby opening up the possibility for a variety of applications that would not be possible (or would be fairly complicated) without. One more step to make it possible to reuse all those data on the Web… Yey!

B.t.w.: I write these lines using WordPress and I have Zemanta’s Firefox plugin running to generate the tags. However, as far as I know (I may be wrong!), the Zemanta service does not provide those URI-s yet (they do provide some URI-s in their return format, but I am not sure those are LOD URIs). Maybe some day?

(Thanks to Yves for drawing my attention to this…)

(Note after the original publication of the blog: it seems I was wrong and Zemanta does have a similar feature, see Andraz’ comment.)

November 9, 2008

Open Archive Initiative’s aggregation vocabulary

A few days ago I read through the OAI’s aggregation vocabulary (officially: “Object Reuse and Exchange”). One of those very targeted, relatively small RDF vocabularies that may, nevertheless, become important in practice. Instead of getting into a description in my own words, let me be lazy :-) and copy from the introduction of the ORE Primer:

In the physical world we create, use, and refer to aggregations of things all the time. We collect pictures in a photo album, read journals that are collections of articles, and burn CDs of our favorite songs. In this physical world these aggregations are frequently tangible - we can hold the photo album, journal, and CD […].

This practice of aggregating extends to the Web. We accumulate URL’s in bookmarks or favorites lists in our browser, collect photos into sets in popular sites like Flickr […]. Despite our frequent use of these aggregations, their existence on the Web is quite ephemeral. One reason for this is that there is no standard way to identify an aggregation. We often use the URI of one page of an aggregation to identify the whole aggregation. For example, we use the URI of the first page of a multi-page Web document to identify the whole document, or we use the URI of the HTML page that provides access to a Flickr set to identify the entire set of images. But those URIs really just identify those specific pages, and not the union of pages that makes up the whole document, or the union of all images in a Flickr set, respectively. In essence, the problem is that there is no standard way to describe the constituents or boundary of an aggregation, and this is what OAI-ORE aims to provide.

ORE therefore introduces a strict separation between an Aggregation (which is supposed to be a non Informational Resource, in TAG speak) and a separate Resource Map, which is an informational resource describing a specific Aggregation. Ie, the core idea of the ORE spec is to follow the Linked Data principles (and to be in line with the Cool URIs for the Semantic Web note) to describe abstract aggregations.

I tried out ORE in practice. A typical example is my talks. When I make a presentation (like the one I gave in Ghent a few months ago) I publish the slides in different formats (ODP, PDF, and HTML in this case). In this sense the talk itself, which is itself an “abstract” thing, is also an aggregation of the three specific slide sets. The following statements describe the situation in the ORE sense:

<http://www.w3.org/2008/Talks/0822-Ghent-IH/> a ore:ResourceMap ;
     dc:title "Detailed introduction into RDF and the Semantic Web" ;
     ore:describes  <http://www.w3.org/2008/Talks/0822-Ghent-IH/#talk>.

<http://www.w3.org/2008/Talks/0822-Ghent-IH/#talk> a cc:Work, ore:Aggregation ;
     cc:license  a ore:AggregatedResource ;
     ...
     ore:aggregates <http://www.w3.org/2008/Talks/0822-Ghent-IH/HTML/>,
                    <http://www.w3.org/2008/Talks/0822-Ghent-IH/Slides.odp>,
                    <http://www.w3.org/2008/Talks/0822-Ghent-IH/Slides.pdf> . 

<http://www.w3.org/2008/Talks/0822-Ghent-IH/Slides.pdf> a ore:AggregatedResource ;
     dc:format "application/pdf" .
...

The …08-22-Ghent/#talk is the URI for the “abstract” thing, ie, the talk. When dereferenced, that URI yields ...08-22-Ghent/ which is the “resource map” in ORE talk (and is an informational resource that returns, depending on the required format, HTML or RDF). The RDF above is actually encoded using RDFa, with Apache set up to deliver the different formats. The resource map portion is conveniently put into the header of the HTML file, and the body describes the real aggregation, ie, the talk. (The full RDF content encoded in the HTML file can be accessed directly either in RDF/XML or in Turtle; it contains additional information on the talk, not directly relevant for this blog.)
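For readers who prefer code to Turtle, here is a minimal Python sketch of the same separation between resource map and aggregation, built as plain triples and printed one statement per line. It is only an illustration: the helper names `describe_talk` and `to_ntriples` are made up for this post, the `ore:` namespace URI is what I understand the ORE terms namespace to be (check it against the spec), and the licence, title, and format statements are left out.

```python
# Sketch only: models an ORE resource map and its aggregation as plain triples.
ORE = "http://www.openarchives.org/ore/terms/"   # assumed ORE terms namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe_talk(base, parts):
    """Return triples for a resource map at `base` describing `base + '#talk'`."""
    resource_map = base              # informational resource (the document)
    aggregation = base + "#talk"     # the "abstract" thing, ie, the talk itself
    triples = [
        (resource_map, RDF_TYPE, ORE + "ResourceMap"),
        (resource_map, ORE + "describes", aggregation),
        (aggregation, RDF_TYPE, ORE + "Aggregation"),
    ]
    for part in parts:
        triples.append((aggregation, ORE + "aggregates", base + part))
    return triples


def to_ntriples(triples):
    """Serialize URI-only triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    base = "http://www.w3.org/2008/Talks/0822-Ghent-IH/"
    print(to_ntriples(describe_talk(base, ["HTML/", "Slides.odp", "Slides.pdf"])))
```

The point of the sketch is simply that the map and the aggregation get two different URIs, and that only the aggregation carries the `ore:aggregates` statements.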

Time will tell whether this vocabulary will catch on; some of its design decisions can also lead to further discussions. But it certainly is an interesting and potentially important addition to the overall vocabulary landscape!

P.S.: the reason I used the example of my presentation in Ghent is that this is where I first heard about this vocabulary in more detail, thanks to a short tutorial given by Herbert Van de Sompel, from Los Alamos National Laboratory, one of the co-editors of the ORE spec.

November 4, 2008

Semantic Web for dummies…

Filed under: Semantic Web, Work Related — Ivan Herman @ 20:14

A possible sign that a technology is getting into the mainstream is when a book is published in the “XXX for dummies” series. Well, I just realized today that the “Semantic Web for dummies”, by Jeff Pollock, is in the publication pipeline, to be out in the bookshops in March! We are getting there…

October 31, 2008

ISWC2008, Karlsruhe

ISWC2008 has just finished (I am still at the hotel, leaving for home in a few hours). As usual, it is very difficult to give an exhaustive overview of the whole conference, not only because there were way too many parallel things going on, but also because everyone’s interests are different… These are just a few impressions. I still have to find time to read through some of the papers in more detail.

Great keynote by John Giannandrea from Metaweb, ie, freebase. Freebase has always been an exciting project but the great news from the Semantic Web community’s point of view is that freebase has opened its database to the rest of the World in RDF, too. As such, freebase will soon become part of the Linking Open Data cloud (I guess there are still some details to be ironed out, and I saw John and Chris Bizer starting to discuss these). Actually, it was also interesting to hear again and again from John that the internal structure of freebase is based on a directed, labeled graph model, because that was the only viable option for them to build up what they needed. Sounds familiar?

An interesting point of the keynote was when John was wondering whether Metaweb is therefore a Semantic Web company or not. He thought that yes, it is, because the internal structure is compatible with RDF, it relies on identifiers with URIs, and is Web based. But he also thought that, well, it is not because… no description logic is in use, nor ontologies. Sigh… This still reflects the erroneous view that one must use description logic to be on the Semantic Web. Wrong! So I went up to the mike and welcomed Metaweb in the growing club of Semantic Web companies…

Among the many papers I was interested in, let me refer to the one of Eyal Oren et al., “Anytime Query Answering in RDF through evolutionary algorithms” and, actually, a related submission from the same research group to the Billion Triple Challenge, called MaRVIN. In both cases the issue is that while handling very large datasets one might not necessarily want or be interested in _all_ solutions to a given query (or inferences, in case of MaRVIN) but, rather, whatever can be reached within a reasonable time. Ie, essentially, trading completeness for responsiveness. Whether genetic algorithms are the answer, as explored by Eyal and friends, or some other techniques, nobody knows; as Eyal clearly acknowledged, these are first attempts and we have to wait a few more years and further results to get a feeling where it will lead. But the direction is really interesting.
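To give a feeling for what “trading completeness for responsiveness” means in code, here is a tiny Python sketch. Let me be clear that this is not the evolutionary approach of the paper: it is just a time-boxed scan that returns whatever matches it has found when the budget runs out, plus a flag saying whether the answer is complete. The function name and its parameters are my own invention for illustration.

```python
import time


def anytime_answers(candidates, matches, budget_seconds, clock=time.monotonic):
    """Collect matching candidates until the time budget runs out.

    Returns (answers, complete); `complete` is False if we stopped early.
    `clock` is injectable so the behaviour can be tested deterministically.
    """
    deadline = clock() + budget_seconds
    answers = []
    for candidate in candidates:
        if clock() >= deadline:
            return answers, False      # out of time: partial result
        if matches(candidate):
            answers.append(candidate)
    return answers, True               # the whole dataset was scanned


if __name__ == "__main__":
    found, complete = anytime_answers(range(1_000_000), lambda n: n % 7 == 0, 0.01)
    print(len(found), "answers, complete:", complete)
```

The caller always gets an answer back within (roughly) the budget; what it gives up is the guarantee that the answer is exhaustive.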

This actually leads to what was, for me, the highlight of the conference, namely the SW Challenge, both the traditional Open Call as well as the new Billion Triple Challenge (there are more details on both on the challenge’s web site). The entries were really impressive. As Peter Mika said in his closing comments on the challenge, long gone are the days when a challenge was some techie keyboard manipulation; the entries all had great user interface design, with real regard for non-expert end users who may or may not know (and probably do not care) that the underlying technology is Semantic Web.

Among the finalists in the open call, Chris Bizer presented DBPedia Mobile (see also their site), ie, a system to access the full power of DBPedia (and, actually, the LOD cloud in general) from an iPhone via a proxy somewhere on the Web. The proxy is actually a hugely powerful environment, making use of Falcon and Sindice, and a bunch of query engines distributed over the network, all peeking into the LOD cloud and, actually, adding items to it, eg, photos taken on the iPhone. A few years ago all this would have had a SciFi edge to it, and now it was running at the conference…

Eero Hyvönen showed their HealthFinland portal (see also their site), soon to be deployed by the Finnish health authorities. Half of the system is, shall we say, more “traditional” (hm, well, what this means is that it would have been revolutionary two years ago:-), a number of serious ontologies governing health related data integration and search into the data. However, what I found exciting is the other half. Indeed, Eero and friends realized that search facets derived from serious ontologies are not really ideal for everyday end users. Therefore, they made a survey among users, derived a number of terms to be used on the user interface level, and bound these terms internally to the ontology. The result is a much more friendly system that still has the power offered by ontology directed search.

Actually, having Eero’s and Chris’ system presented side by side was also interesting from another point of view, namely to show that there are cases when using serious ontologies is important and there are cases when it isn’t. When I use an iPhone to navigate in a city and get information about, say, historical buildings then a bit of scruffiness is really not a problem. Speed, interaction, richness of data are more important. However, when it comes to, e.g., health issues, I must admit that I am prepared to wait a bit if I am sure that the results go through the rigorous inference and checking processes that one can achieve through the usage of formal ontologies. This is not the place where one should tolerate scruffiness. The stack (or, to quote Eric Miller, the “menu”) of Semantic Web technologies is rich enough to allow for both; choose what you need! All those discussions of description logic vs. Semantic Web in general are futile in my view…

And then came benji’s paggr system (which actually won the Challenge in the Open Call track). Are you a user of netvibes, iGoogle, or the new Yahoo user interface? Then you know what it means to quickly build up a Web page using small widgets accessing RSS feeds, stock quotes, clocks, etc. Now imagine that each of these widgets is in fact a small SPARQL query with some wrapper to present the result properly. Package that into a nice user interface that benji has always been a master of, and you get paggr. Not yet public, but I already signed up to play with it as soon as it is… This will really be cool!
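I have no idea how paggr is built internally, but the “widget = small SPARQL query + wrapper” idea is easy to sketch in Python. Everything below (the `Widget` class, the `show` helper, the example query) is hypothetical; the only thing taken from a real standard is the shape of the SPARQL JSON result document, and fetching it from an endpoint is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass
class Widget:
    title: str
    query: str                          # SPARQL text to be sent to some endpoint
    render: Callable[[List[Row]], str]  # wrapper that turns rows into display text


def rows_from_sparql_json(result):
    """Flatten a SPARQL JSON result document into a list of {var: value} rows."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


def show(widget, result):
    """Render one widget from an already fetched SPARQL JSON result."""
    return f"[{widget.title}]\n" + widget.render(rows_from_sparql_json(result))


if __name__ == "__main__":
    talks = Widget(
        title="Recent talks",
        query="SELECT ?title WHERE { ?talk <http://purl.org/dc/elements/1.1/title> ?title } LIMIT 5",
        render=lambda rows: "\n".join("- " + row["title"] for row in rows),
    )
    # Canned result standing in for what an endpoint would return.
    canned = {
        "head": {"vars": ["title"]},
        "results": {"bindings": [{"title": {"type": "literal", "value": "Intro to RDF"}}]},
    }
    print(show(talks, canned))
```

Keeping the query and the renderer together in one small object is what would make such widgets easy to rearrange on a page, which is my guess at what makes the approach attractive.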

As for the Billion Triples challenge: I already referred to MaRVIN, but there were a bunch of others like SearchWebDB or SemaPlorer, or SAOR. In some cases massively parallel storage approaches, not only offering near real time (federated) SPARQL query possibilities, but, in some cases, preprocessing the data with a lower level RDFS or OWL fragment inferencing. All that done starting with millions of triples integrating all kinds of public datasets, yielding stores going beyond the 1 Billion triple mark. And let us not forget that this mark had already been reached by companies such as Talis or OpenLink, so these new architectures just add to the lot… These were also particularly interesting with an eye to the new OWL RL profile that is being defined in the W3C OWL Working group and which aims at exactly such setups.
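What “preprocessing with lower level RDFS inferencing” boils down to can be shown with a toy Python sketch: apply a couple of RDFS rules over and over until nothing new comes out, and store the result. I do not know how the challenge entries actually did it, and this naive loop would obviously never scale to a billion triples; the prefixed names are just strings used for illustration.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(triples):
    """Forward-chain two RDFS rules until no new triple appears.

    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    closed = set(triples)
    while True:
        subclass_pairs = [(s, o) for s, p, o in closed if p == SUBCLASS]
        new = set()
        for a, b in subclass_pairs:
            for s, p, o in closed:
                if p == SUBCLASS and s == b:
                    new.add((a, SUBCLASS, o))      # rdfs11: transitivity
                elif p == TYPE and o == a:
                    new.add((s, TYPE, b))          # rdfs9: type propagation
        if new <= closed:
            return closed                          # fixpoint reached
        closed |= new


if __name__ == "__main__":
    data = {
        ("ex:Fugue", SUBCLASS, "ex:Composition"),
        ("ex:Composition", SUBCLASS, "ex:Work"),
        ("ex:bwv1079", TYPE, "ex:Fugue"),
    }
    for triple in sorted(materialize(data)):
        print(triple)
```

Once the inferred triples are stored alongside the asserted ones, queries need no reasoning at run time, which is the attraction of this kind of preprocessing for very large stores.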

Let me finish with another remarkable entry, although this one did not win a prize. i-MoCo created a small navigation system over a triple store containing “only” 250 million triples. So what is the big deal, you might say? Well, all the triples were stored on… an iPhone! So the next challenge will probably be to get, say, 10 billion triples or more on your phone. Just wait a few years…

October 15, 2008

Semantic Web and uncertainty

The issue of uncertainty on the Semantic Web has been around for a while now, although it is still largely a research issue. (Though not only: C&P has an extension of their Pellet tool to handle a particular probabilistic extension of OWL, but I am not aware of any other commercial system of the kind.) Ken and Kathryn Laskey and Paulo Costa have been organizing a series of workshops (the URSW series) on the subject for several years now (there will be one at the coming ISWC2008 conference, too!), and there was also a W3C Incubator Group on the subject that issued a report not long ago. But still a lot to be done…

The reason I remembered all that is that I found a survey by Thomas Lukasiewicz and Umberto Straccia[1] that is worth reading if you are interested in the subject. The survey gives a separate description of probabilistic, possibilistic, and fuzzy extensions of the DL dialect that is at the basis of OWL DL, together with further references if one wants to dig deeper (156 of those!). It is not an easy read at all, and I couldn’t say I understood all the details described there, far from it… But, as all good surveys do, it gives you an idea of, or refreshes your memory on, what is happening in the area. And that is always incredibly useful.

The approaches described in the paper are fairly high level in the sense that (as the authors emphasize, too) the extensions are all on top of SHOIN(D), ie, OWL DL. That makes the constructions sometimes quite complex (mainly in the probabilistic case) and they are probably difficult to use for an average user. (Although, who knows. They are certainly complex to implement, but maybe the usage is not that bad. I am not sure.) However, the authors themselves refer to alternative approaches on top of simpler DL dialects (without giving too many details). It would be nice to have a survey on the extensions of a level corresponding to simpler OWL profiles like OWL RL that the OWL WG at W3C is also working on now. Just as OWL RL might be a good “entry point” for a large family of users into the world of OWL, an uncertainty extension of that level might be of great interest, too…

Reading this survey also reminded me of a short paper by Fensel and van Harmelen[2], “Unifying Reasoning and Search to Web Scale”, on which I had a very short blog a while ago. I just wonder whether fuzzy or probabilistic reasoning may not be a good approach to the problems they describe there… Although this is clearly still a long way off.

Anyway. I learned something today…

  1. Lukasiewicz, Thomas, and Umberto Straccia. “Managing uncertainty and vagueness in description logics for the Semantic Web.” Journal of Web Semantics: Science, Services and Agents on the World Wide Web 6, no. 4 (2008). Available on line as a pre-print.
  2. “Unifying Reasoning and Search to Web Scale”, by Dieter Fensel and Frank van Harmelen, IEEE Internet Computing, Volume 11, No. 2, March/April 2007.

October 4, 2008

Internationalization and smart phones: an unhappy marriage?

Filed under: General, Private, Work Related — Ivan Herman @ 18:01

I recently went through the process of renewing my mobile contract which (in Europe) is usually a good opportunity to update one’s phone. Although my previous (smart) phone, a Nokia 9300i, served me well, an upgrade to a newer model is always a good idea. However, it turned out to be more complicated than I thought…

The complication is that I am a little bit off the beaten track, so to say. I live in the Netherlands, but I usually work using English, and I have text (addresses, data) on my smart phone in Hungarian. This also means using characters specific to this language (ie, ű, ő). Ie, I need a system in English, but with the possibility to, somehow, type in those characters, too. I have lists of all my books, CD-s, etc, that I have been maintaining for many years and I’d like to have around on my smart phone. I would not think this is too much to ask…

Of course, following the hype, I looked at the iPhone. Although I must admit I do not really sympathize with the business approach taken by Apple for iPhones and its applications, I thought I would have a look nevertheless. But… Apple doesn’t speak Hungarian. Neither does it speak Czech, Croatian, and other Central European languages for that matter, except for Polish. This means that there is no way one can type in those characters (and I am not sure it could display them all right). With all the hype around the user friendliness of Apple I was shocked to see them forgetting about ca. 30-35 million people who would simply want to use their own language properly. Exit Apple’s iPhone…

Next stage was Windows Mobile based smart phones; after all, it claims to be Unicode based! And there are some very sexy models out there these days (like the HTC Touch Pro or Samsung’s Omnia), which try to compete with the iPhone. So I had a look. Using an English model the system gives you the possibility to use a virtual keyboard, and this indeed gives the option of using a “symbol” pad containing all kinds of characters including my Hungarian ones. A little bit awkward but, well, one can live with it. So, for a moment, I thought I was sold! But then came the shock: there is no way one can get a Windows Mobile phone with an English operating system in the Netherlands! Providers can give you Dutch systems only. To add insult to injury, for some reason or other, the Dutch system does not include that extra symbol key pad. (Why?) Ie, even if I agreed to use a Dutch system, it would not be usable. Exit all Windows Mobile devices…

My next target was Nokia’s E90. A slightly older concept than this sexy new breed of smart phones, no touch screen, no animation but, after all, who really cares if otherwise it does the job? It is sold as an upgrade of the old 9300i (where I had no problem with those characters), so I expected to have all features I was looking for without any problems. Wrong…:-( The E90 (ie, Symbian S60, the operating system) indeed offers you a way to type in accented characters. But, as a default, only the Western ones… Ie, no problem typing in œ, or ç, but no ű or ő (or characters like ř, č, ł, to refer to non-Hungarian ones, too). Ie, the E90 is actually a step back compared to its predecessor, where typing in all these characters was not a problem.
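The “Western only” split is easy to see with a few lines of Python. I am not claiming that Symbian uses these code pages; I simply take the legacy Windows-1252 repertoire as a stand-in for “Western” characters and ISO 8859-2 as a stand-in for “Central European” ones, and check which of the characters I mentioned fit where. The helper names are mine.

```python
import unicodedata


def encodable(char, codec):
    """True if `char` can be represented in the given legacy codec."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def report(chars, codecs=("cp1252", "iso8859_2")):
    """Map each character to its Unicode name and its per-codec coverage."""
    return {
        ch: (unicodedata.name(ch), {codec: encodable(ch, codec) for codec in codecs})
        for ch in chars
    }


if __name__ == "__main__":
    # The characters from the text: œ ç ű ő ř č ł (written as escapes to be safe)
    sample = "\u0153\u00e7\u0171\u0151\u0159\u010d\u0142"
    for ch, (name, coverage) in report(sample).items():
        print(ch, name, coverage)
```

In Unicode itself all of these are ordinary characters; whether one can type them is purely a matter of which repertoire the input method was built around.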

Dead end? Well, almost. Thanks to my colleague, Steven Pemberton, we found out that Symbian gives you the possibility to switch languages via what it calls “writing aids”. This changes the available character set. The models sold in the Netherlands have English, Dutch, and… Romanian. Why Romanian I have no idea. But I was lucky: although the Romanian language does not use ű or ő, it so happens that there is a significant Hungarian minority living in Romania, so the character set for Romanian included those two characters, too. Ie, I was off the hook, but that was sheer luck, not design. If I want to type in a, say, Czech character (eg, if I buy a new CD of Dvořák) then, well, I will have to do some copy paste:-( But I had no choice so, after all, I decided to live with that, and I am now the happy owner of a Nokia E90. Story ends.

Don’t get me wrong. For a bunch of other things the E90 is a very very good smart phone, has a much faster processor than the 9300i, Web access is really a breeze (it uses Safari, afaik), it looks and feels great. Ie, it serves my purpose after all. But I dream of a time when internationalization is not a pain but a natural part of these devices (or any other device, for that matter)…

September 28, 2008

ESTC2008, Vienna

I had the pleasure, last week, to be at the 2nd European Semantic Technology Conference in Vienna, Austria. I had a really good week…

The conference was not all that big, ca. 200 participants, 70-75% from industry, plus a small exhibition. It would be all too easy to dismiss the conference due to its size, when compared with SemTech which had more than 1000 participants. But that would really be unfair: first of all, despite the best efforts of the European Commission, Europe is still very much a divided market (one piece of feedback I heard was “Austria is really too small as a market”) which does affect attendance and, as Mark Greaves reminded us at the closing panel, SemTech was also a smaller event a few years ago and it is only lately that it made a huge jump in attendance. Ie, there is room and good prospects for ESTC to grow!

It is always difficult to give a good overview of such a conference; due to the parallel sessions one is bound to miss most of it:-(. Just some highlights from my own perspective.

A passage in an old palace in Vienna

One interesting aspect was the high profile of technologies aiming at extracting structures from unstructured content, typically text. Three out of the four keynotes (Peter Jackson’s from Reuters, Hans Uszkoreit’s from DFKI, and Hugo Zaragoza’s from Yahoo!) were either fully or partially concentrating on this. A number of other presentations also touched upon this as part of developing Semantic Web applications, and there were also hallway conversations on the usage of public services like Open Calais or Zemanta. Although these services have not been developed exclusively for the Semantic Web, they are clearly extremely useful for that application area, too. It is also interesting to see that Wikipedia (and, by extension, DBPedia) URI-s begin to play a more and more important role, through these services, as reference URI-s. (I had a blog a while ago which also generated a modest discussion, if you are interested).

Another recurring topic was the “long tail” (shame on me but I must admit I did not know this business term). Orestis Terzidis, from SAP, gave a nice keynote showing, through some SAP case studies, how Semantic Web technologies can be very useful in exploiting the business opportunities in this “long tail” through the flexibility, the possibilities for adaptation and personalization, etc, that they can provide. A good example was the presentation given by Liberté Crozon from Discotheka outlining the plan to exploit SW techniques to build really good archiving and search services for classical music (as a fan of classical music I know all too well what a stepchild it is on the music related web sites…).

What else? I had discussions, chats with people from well known tool vendors like Franz Inc, Aduna, or Ontotext; it is always nice to see what they do, what new and cool things they come up with (and they do come up with new things, check out the presentations like the ones of …). The talk on the NeOn project (by Mathieu d’Aquin) was interesting; the NeOn toolkit has the potential to become a major player as a tool to develop various ontologies. Not only OWL-DL, but also other dialects of OWL (OWL Full, OWL RL, etc), SKOS, RDFS, and all that in a possibly distributed setting. Cool stuff in the making! Leo Sauermann gave a nice presentation on Semantic Desktops; although there is a SW public Use Case on it already (beyond a bunch of papers), it is always good to have an additional insight. Raphael Volz announced a new tool (still in alpha, though) on managing information on persons for persons. Finally, let me quote here one of the slides of David Norheim (who presented a nice application for Norwegian public schools):

- New standards (e.g. SPARQL), proposals for standardization (e.g. SPARUL), new tools (e.g. Jena), open source (e.g. Tomcat, Apache), lack of good documentation all say high risk!!!!

- However, the support and maintenance from the W3C community and open source developers (e.g. Jena team) has been impressive, the support through IRC channels, mailing lists etc has been invaluable for the project.

I take that as a compliment for the SW community at large!

Finally, on a more private note: there was a time when I had to go to Vienna quite frequently for business reasons but, nevertheless, the city never ceases to amaze me with all the things you can see there. I discovered this lovely passage just across the conference site…

There will be an ESTC2009: the dates are already set (2009-09-30 to 2009-10-02) although the location isn’t. But it is worth adding this to your agenda…

September 1, 2008

Just acting as a go-between: Dave’s blog on XBRL

Filed under: Semantic Web, Work Related — Ivan Herman @ 17:59

Dave Raggett’s blog may not be regularly read by SW people, so I thought it would be worth raising some attention. If you are interested in XBRL and the Semantic Web, his blog on the subject may be of interest…

July 5, 2008

Low hanging… dogfood?

Filed under: Semantic Web, Work Related — Ivan Herman @ 10:52

This should be, actually, a comment on Péter’s comment on my previous blog, but it really becomes a separate topic. Ie, I decided to put it into a separate blog. Besides, it is a bit too long for a comment…

To summarize, the JWS journal has a pre-print service running, as a back end, the openacademia software developed by Péter and his friends. Which also means that the JWS data should be accessible in RDF, probably following the SWC ontology (although I have not found a pointer on the JWS site).

But, if so, don’t we have a low hanging, hm, dogfood here for the SW community? We begin to have most of the recent SW publications in RDF somewhere on the net. Beyond the JWS papers the Semantic Web Conference Corpus site not only includes the RDF data for ISWC, ESWC, ASWC, and some related workshops, but it also has a SPARQL endpoint. I know that Daniel Schwabe is working on getting the WWW2008 conference material into a similar format and, hopefully, we can have the material for the WWW200X conferences available somewhere on the Web. I maintain a list of books on a wiki (well, hopefully, the community maintains it…) but I also keep the same list on Bibsonomy, and the list is therefore available in RDF, too (again, using the SWC ontology). And there might be other resources that I do not know about.

So… the easy thing to do is to integrate all this RDF data via some SPARQL endpoint. Because the data is already in RDF, that does not cost anything (although I am not 100% sure all the data follow the same vocabulary, so querying might be a bit tricky). But what I would love to see is to have a general service with a nice user interface on top of the data. I want to be able to search easily through the data without writing SPARQL queries or diving into the RDF graph directly with an RDF browser. The scale can be tricky. A few weeks ago David Huynh created a nice exhibit page for the ESWC2008 data. It really looks great and helps a lot in searching the data. However… as an experiment I copied his file, and added a few more datasets from the SW Corpus. Well… it turned out to be too much for Exhibit (I may have made a mistake somewhere, of course, but I do not believe Exhibit is good enough for that amount of data). Ie, a more dedicated interface should be created to provide this service for end users (maybe along the lines of openacademia?).
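The vocabulary worry can be made concrete with a small Python sketch: merging RDF data is just a set union of triples, and the only real work is mapping equivalent predicates onto one another before searching. The data, the prefixed names, and both function names below are invented for illustration; a real service would of course sit on a triple store rather than on Python sets.

```python
def merge_datasets(datasets, predicate_aliases):
    """Union several triple collections, rewriting aliased predicates."""
    merged = set()
    for triples in datasets:
        for s, p, o in triples:
            merged.add((s, predicate_aliases.get(p, p), o))
    return merged


def titles_matching(merged, title_predicate, keyword):
    """Tiny stand-in for a search UI: subjects whose title contains `keyword`."""
    keyword = keyword.lower()
    return sorted(
        s for s, p, o in merged
        if p == title_predicate and keyword in o.lower()
    )


if __name__ == "__main__":
    conference = [("paper:1", "dc:title", "Anytime Query Answering in RDF")]
    journal = [
        ("paper:1", "dcterms:title", "Anytime Query Answering in RDF"),
        ("paper:2", "dcterms:title", "Managing uncertainty in description logics"),
    ]
    merged = merge_datasets([conference, journal], {"dcterms:title": "dc:title"})
    print(titles_matching(merged, "dc:title", "query"))
```

Because triples are a set, the paper described in both sources collapses into a single statement once the two title predicates are aligned; without the alias table the same search would silently miss half of the data.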

And, of course, it is easy to have nice ideas on how to add new features with all the data around… For example, the book wiki page has references to Chris Bizer’s bookmashup data via the ISBN numbers. We could use DBpedia and Geonames to access information on conference cities, FOAF data on authors and editors… We could use some good service (like MOAT) to have a uniform tagging system for the papers’ topic, or use Ed Summers’ Library of Congress Subject headings in SKOS… In other words, this could become a nice LOD application, too! (Hm, maybe it is not such a low hanging dogfood after all?)

What I would really like is to get a comment on this blog saying “you uninformed fool, this already exists here and here!”. I would humbly stand corrected, and would happily use the service. Anyone with this comment?
