A Practical Look at the Semantic Web

The faint murmurs I’ve been hearing related to the Semantic Web are becoming louder. Is it the Next Big Web Thing? Or just another vision that will never be fully realized – another solution looking desperately for a problem?

I must admit I’ve been a bit puzzled by the hype – but then I’ve never been one to jump on a bandwagon until I know just where it’s headed. So I started to dig a little deeper into this one, and I’m starting to see how it will be useful. I won’t prognosticate here as to how soon. Rather, I’ll spend a little time explaining what it is and what “problem” it solves, and let you do your own crystal-ball-gazing.

Before I start, a bit about my background. I’m an old-school computer guy. Learned BASIC on a Commodore VIC-20. One of my college profs had kids (nick)named COBOL and FORTRAN. I knew databases before they were relational, and languages before they were object-oriented. So I’ve seen a few major technology shifts (mainframes to PCs to networked PCs to the Internet, flat files to B-trees to relational databases to OODBs, etc.).

I don’t expect the Semantic Web to be as game-changing as any of these. But I do think it’ll be useful and catch fire at some point – and will be of benefit to both consumers of technology and companies that make money through innovative use of technology.

The Semantic Web – What Is It?

I like to explain things through analogy. If I miss your frame of reference here, sorry – I tried. Remember the good old days when lots of data sat around in “flat files”? Maybe they were “positional” in nature, with the first 10 characters representing a social security number, the next 20 a first name, the next 20 a last name, etc. Or maybe they were delimited, with commas, tabs or other characters separating fields. At any rate, they all had one thing in common – a person (let alone a computer program) couldn’t interpret them without the “magic key” – some form of metadata that explained how the contents were arranged. For mainframe files, the “magic key” was often a COBOL file layout. You get the idea.

Fast forward a few years to the advent of relational databases. Suddenly, information was well-organized, and – here’s the kicker – self-explanatory. Not only could you easily run a query and get rows of neatly organized data with explanatory headings, you could also query database metadata – the schema – to inquire about the contents of the database.

Okay. Now, fast forward another decade and a half or so, to the dawn of the Web. Suddenly we have massive amounts of information available for perusal, and it’s (usually) pretty understandable by human readers. But don’t ask software to make heads or tails of an HTML document. A web page might contain information about an individual – name, address, phone number, freckle count, favorite ’80s hair metal band, etc. – but that information isn’t organized in such a way that a software parser can pick it apart and know what it has (see below).

<html>
<body>
<p>My first name is Jeff.</p>
<p>My last name is Shurts.</p>
<p>My phone number is 555-1212.</p>
</body>
</html>
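
To make the pain concrete, here is a rough Python sketch of what scraping that page looks like. It is only an illustration: the `ParagraphCollector` class, the `scrape_person` function and the regular expressions are all names and patterns I made up for this example, and they only work because they are hard-wired to the exact English sentences above.

```python
import re
from html.parser import HTMLParser

PAGE = """<html>
<body>
<p>My first name is Jeff.</p>
<p>My last name is Shurts.</p>
<p>My phone number is 555-1212.</p>
</body>
</html>"""


class ParagraphCollector(HTMLParser):
    """Collects the text content of every <p> element."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraphs[-1] += data


def scrape_person(html):
    """Guess at person fields by matching sentences (hypothetical patterns)."""
    collector = ParagraphCollector()
    collector.feed(html)
    # These patterns match only this exact phrasing; reword the page and they fail.
    patterns = {
        "firstName": r"My first name is (\w+)\.",
        "lastName": r"My last name is (\w+)\.",
        "phoneNumber": r"My phone number is ([\d-]+)\.",
    }
    person = {}
    for text in collector.paragraphs:
        for field, pattern in patterns.items():
            match = re.fullmatch(pattern, text.strip())
            if match:
                person[field] = match.group(1)
    return person
```

Calling `scrape_person(PAGE)` recovers all three fields, but a page that says “Call me Jeff.” yields nothing at all. The HTML tells the parser where the paragraphs are, not what they mean.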

Enter the Semantic Web. If you were asked to write a piece of software that would dig through a file and display someone’s first name, last name and phone number, wouldn’t you be happier if that file’s content looked like this?

<?xml version="1.0" encoding="utf-16" ?>
<person>
<firstName>Jeff</firstName>
<lastName>Shurts</lastName>
<phoneNumber>555-1212</phoneNumber>
</person>

Ah, much better. Give me a DOM or SAX parser and I’m golden. No ambiguity, no parsing tricks. But wait – XML has been around forever – that can’t be the essence of some Next Big Thing, right?
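
For comparison, here is a minimal sketch of reading that document with Python’s standard `xml.etree.ElementTree` module. The `read_person` function name is my own, and I’ve left the XML declaration out of the sample string so it can be parsed straight from a Python string.

```python
import xml.etree.ElementTree as ET

# Same content as the sample document, minus the XML declaration.
PERSON_XML = """<person>
<firstName>Jeff</firstName>
<lastName>Shurts</lastName>
<phoneNumber>555-1212</phoneNumber>
</person>"""


def read_person(xml_text):
    """Return the three fields of a <person> document as a dict."""
    root = ET.fromstring(xml_text)
    # findtext returns None if a child element is missing.
    return {
        "firstName": root.findtext("firstName"),
        "lastName": root.findtext("lastName"),
        "phoneNumber": root.findtext("phoneNumber"),
    }
```

No regular expressions and no guessing at sentence structure: each value is looked up by the name of its element.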

Right. Plain old XML is great for use in applications where there are a limited number of interested parties, all of whom agree on the format (schema or DTD) of the data being exchanged. And XML has been used very effectively in this way – especially in Web Services.

But back to our assignment from a few paragraphs ago. What if we were given a trickier problem… say, to write a web crawler that would scan millions of web pages looking for information about people – specifically first and last names, and phone numbers. We aren’t given a sample XML file or an XML schema. We just know we want information about people.

Now we have a nightmare on our hands. We might be able to look for XML documents (giving up on HTML altogether), and look for tags like <firstName>Jeff</firstName>. But we don’t have any guarantees that the tags were used in a manner consistent with their apparent meaning. Wouldn’t it be great if there were a way that people could (and would) encode data such that it was very clear, in a globally-understood manner, what the data means?

Enter RDF (Resource Description Framework). Based on XML, but forging on to solve bigger and more sophisticated problems, RDF provides a syntactic means of encoding very specific intent along with information. Here is an example:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/" >
<foaf:Person>
<foaf:firstName>Jeff</foaf:firstName>
<foaf:lastName>Shurts</foaf:lastName>
<foaf:phone rdf:resource="tel:555-1212"/>
</foaf:Person>
</rdf:RDF>

This document not only teases out the interesting bits into discrete fields, but also denotes that these fields are very specific attributes that are defined by a particular well-known model (in this case, FOAF, or Friend of a Friend). In a way, RDF does for web-hosted data what WSDL (Web Services Description Language) did for web-hosted services – it provides a well-understood description that can be programmatically interrogated and understood.
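
Here is a hedged sketch of how a program might take advantage of that. It uses only Python’s standard `xml.etree.ElementTree`, not a real RDF toolkit, so it understands just one shape of RDF/XML: a `foaf:Person` element wrapping the properties, under the FOAF namespace URI `http://xmlns.com/foaf/0.1/`, with the phone given as a `tel:` resource. The `find_people` function and both sample strings are my own inventions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

FOAF_DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:firstName>Jeff</foaf:firstName>
    <foaf:lastName>Shurts</foaf:lastName>
    <foaf:phone rdf:resource="tel:555-1212"/>
  </foaf:Person>
</rdf:RDF>"""

# Same tag names, but bound to a made-up namespace: not FOAF at all.
LOOKALIKE_DOC = FOAF_DOC.replace(
    "http://xmlns.com/foaf/0.1/", "http://example.org/not-foaf/"
)


def find_people(rdf_xml):
    """Return one dict per foaf:Person element found in the document."""
    root = ET.fromstring(rdf_xml)
    people = []
    for person in root.findall("foaf:Person", NS):
        phone = person.find("foaf:phone", NS)
        people.append({
            "firstName": person.findtext("foaf:firstName", namespaces=NS),
            "lastName": person.findtext("foaf:lastName", namespaces=NS),
            "phone": phone.get(RDF_RESOURCE) if phone is not None else None,
        })
    return people
```

`find_people(FOAF_DOC)` finds Jeff, while `find_people(LOOKALIKE_DOC)` finds nobody, even though the tags look identical on the page. The match is on the namespace URI, not on the spelling of the tag, which is what lets a crawler tell “firstName as FOAF defines it” from “something that happens to be called firstName.”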

Okay, I get it. But how is this a game-changer?

Well, so far, it’s not. But it might be, and its chances are getting better. Earlier this year, Yahoo announced plans to include Semantic Web features in its search engine, to provide richer, more intelligent searches, as well as enhanced output from search results.

For those of you who know SQL, wouldn’t it be pretty nice to be able to run the equivalent of this query – not against a local Oracle table, but against the Web?

SELECT firstName, lastName
FROM People
WHERE lastName = 'Shurts'
AND homeCity = 'Saint Charles'

Voila – the Web as a giant database! Starting to see how this might be a game-changer? Finding just the information you are looking for with today’s search engines is an art – you have to find just the right set of keywords, and then hope for the best. In a semantically-rich future, you’ll be able to scour the web with the precision of a database query.
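
To give a feel for how that could work under the hood, here is a toy Python sketch. RDF data boils down to (subject, predicate, object) triples, and the W3C’s query language for RDF is SPARQL; this sketch is neither a real triple store nor SPARQL, just a few lines that answer the query above against a handful of triples. The `TRIPLES` data, the `person:` identifiers and the `select_people` function are all hypothetical.

```python
# Hypothetical triples, as a crawler might have collected them from many pages.
TRIPLES = [
    ("person:1", "firstName", "Jeff"),
    ("person:1", "lastName", "Shurts"),
    ("person:1", "homeCity", "Saint Charles"),
    ("person:2", "firstName", "Pat"),
    ("person:2", "lastName", "Shurts"),
    ("person:2", "homeCity", "Chicago"),
    ("person:3", "firstName", "Lee"),
    ("person:3", "lastName", "Jones"),
    ("person:3", "homeCity", "Saint Charles"),
]


def select_people(triples, **criteria):
    """Return (firstName, lastName) for every subject matching all criteria."""
    # Gather each subject's properties into one dict.
    by_subject = {}
    for subject, predicate, obj in triples:
        by_subject.setdefault(subject, {})[predicate] = obj
    rows = []
    for props in by_subject.values():
        if all(props.get(key) == value for key, value in criteria.items()):
            rows.append((props.get("firstName"), props.get("lastName")))
    return rows
```

`select_people(TRIPLES, lastName="Shurts", homeCity="Saint Charles")` plays the role of the WHERE clause: only the one person who matches both conditions comes back.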

And, to quote the old Ronco commercials I used to watch when I wasn’t writing BASIC code on that VIC-20: “But wait, there’s more!” Not only will information searches be more precise, the results of those searches will be presented in a richer, semantically meaningful way. Instead of the ubiquitous three-liner (page title, URL, and a snippet) for each match, you will one day be presented with information that makes sense given the context of your query. You searched for people? You get business-card-like search results. You searched for CDs? You get track listings and cover art for each CD that matched your query (e.g. select * from albums where backgroundSinger includes 'Michael McDonald').

In conclusion…

Well, there it is. That’s as much of the Semantic Web as I’ve been able to wrap my head around so far. So what do you think? Is it a game changer? Or just incremental improvement over current Web technologies? Will it take off, or are there barriers to adoption? (If I’m an online merchandiser, do I really want to help web crawlers snarf my prices so they can include them in comparison-shopping sites?) I’d love to hear from you!
