OAI-PMH and Syndication

Overview

Open Archives Initiative - Protocol for Metadata Harvesting

What's a Protocol?

In computing, a protocol is a convention or standard that controls or enables the connection, communication, and data transfer between two computing endpoints. In its simplest form, a protocol can be defined as the rules governing the syntax, semantics, and synchronization of communication.

http://en.wikipedia.org/wiki/Protocol_(computing)

What's syndication?

http://en.wikipedia.org/wiki/Web_syndication

What's this presentation about?

Protocols for web syndication.

So HTTP serves as the transport.

But what else could we get from HTTP?

Characteristics of OAI-PMH

Data model and terminology:

Items, optionally grouped into sets, provide metadata about resources; that metadata is disseminated as records. (Sets may overlap, so they are a grouping rather than a strict partition.)

OAI-PMH and HTTP

Service-oriented as opposed to resource-oriented: every request goes to a single base URL and names its operation in a verb argument.

http://www.openarchives.org/OAI/openarchivesprotocol.html

OAI-PMH and HTTP (2)

Here's what it looks like:

http://wals.info/register/oai?verb=Identify

GET /register/oai?verb=Identify HTTP/1.1
Host: wals.info
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.18) Gecko/20081113 Ubuntu/7.10 (gutsy) Firefox/2.0.0.18
Accept: application/rdf+xml, application/xhtml+xml;q=0.3, text/xml;q=0.2, application/xml;q=0.2, text/html;q=0.3, text/plain;q=0.1, text/n3, text/rdf+n3;q=0.5, application/x-turtle;q=0.2, text/turtle;q=1
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: UTF-8,*
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Date: Fri, 05 Dec 2008 11:56:36 GMT
Server: Apache
Keep-Alive: timeout=15, max=98
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/xml; charset="utf-8"
----------------------------------------------------------

The verbs

http://www.openarchives.org/OAI/openarchivesprotocol.html#ProtocolMessages
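To make the request style concrete, here is a minimal sketch of how a harvester might build request URLs for the six OAI-PMH 2.0 verbs. The helper name build_oai_url and the from_ keyword are assumptions made for this illustration; only the verb names and the verb query argument come from the protocol.

```python
from urllib.parse import urlencode

# The six request types ("verbs") defined by OAI-PMH 2.0.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_oai_url(base_url, verb, **arguments):
    """Return the GET URL for an OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    # The protocol argument is called "from", which is a Python keyword,
    # so this sketch accepts "from_" and renames it.
    if "from_" in arguments:
        arguments["from"] = arguments.pop("from_")
    query = [("verb", verb)] + sorted(arguments.items())
    return f"{base_url}?{urlencode(query)}"


print(build_oai_url("http://wals.info/register/oai", "Identify"))
print(build_oai_url("http://wals.info/register/oai", "ListRecords",
                    metadataPrefix="oai_dc"))
```

Note that every verb shares the same base URL; only the query string changes.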

The Metadata Formats

While OAI-PMH allows disseminating any XML-based format that is described by an XML Schema, there is one format that every data provider has to support:

oai_dc: unqualified Dublin Core metadata

A minimal consensus on metadata quality.
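As an illustration of what a harvester does with oai_dc, the sketch below pulls the Dublin Core fields out of a single record. The sample record, its identifier, and the function name parse_oai_dc are made up for this example; the namespace URIs are those of OAI-PMH 2.0, oai_dc, and Dublin Core.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample data, shaped like a <record> from a GetRecord response.
SAMPLE_RECORD = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:1</identifier>
    <datestamp>2008-12-05</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example resource</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
      <dc:creator>Roe, Richard</dc:creator>
    </oai_dc:dc>
  </metadata>
</record>
"""


def parse_oai_dc(record_xml):
    """Return header fields and Dublin Core values of one record."""
    record = ET.fromstring(record_xml)
    fields = {}
    for element in record.findall("oai:metadata/oai_dc:dc/*", NS):
        # Tags look like "{namespace}title"; keep only the local name.
        name = element.tag.split("}", 1)[1]
        # Every Dublin Core element is repeatable, so collect lists.
        fields.setdefault(name, []).append((element.text or "").strip())
    return {
        "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
        "datestamp": record.findtext("oai:header/oai:datestamp", namespaces=NS),
        "dc": fields,
    }


print(parse_oai_dc(SAMPLE_RECORD))
```

Because unqualified Dublin Core makes every element optional and repeatable, the consumer cannot assume any particular field is present, which is one reading of "minimal consensus".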

Problems with OAI-PMH

How about Atom?

Atom is a feed or syndication format.

Mapping OAI-PMH to Atom

Details

http://blogs.law.harvard.edu/pkeane/2008/06/26/oai-ore-atom/
http://jakoblog.de/2007/10/19/archiving-weblogs-with-atom-and-rfc-5005-an-alternative-to-oai-pmh/
http://wwmm.ch.cam.ac.uk/blogs/downing/?p=101
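One possible mapping, sketched here purely as an assumption and not taken from the posts above: the OAI-PMH identifier becomes atom:id, the datestamp becomes atom:updated, and dc:title becomes atom:title. The function name record_to_atom_entry is hypothetical, and a complete Atom entry would also need an author (on the entry or the feed).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialize Atom elements in the default namespace (no prefix).
ET.register_namespace("", ATOM_NS)


def record_to_atom_entry(identifier, datestamp, title):
    """Build a minimal atom:entry from OAI-PMH record data (assumed mapping)."""
    # atom:updated needs a full date-time; OAI-PMH datestamps may have
    # day granularity only, so pad those to midnight UTC.
    if len(datestamp) == 10:
        datestamp += "T00:00:00Z"
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = identifier
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = datestamp
    return entry


entry = record_to_atom_entry("oai:example.org:1", "2008-12-05",
                             "An example resource")
print(ET.tostring(entry, encoding="unicode"))
```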

Atom and HTTP

Requesting an Atom feed:

http://blog.livingreviews.org/feed/atom/

GET /feed/atom/ HTTP/1.1
Host: blog.livingreviews.org
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.18) Gecko/20081113 Ubuntu/7.10 (gutsy) Firefox/2.0.0.18
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: UTF-8,*
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Date: Mon, 08 Dec 2008 17:15:31 GMT
Server: Apache/2.2.0 (Linux/SUSE)
X-Powered-By: PHP/5.1.2
X-Pingback: http://blog.livingreviews.org/xmlrpc.php
Last-Modified: Tue, 11 Nov 2008 15:07:00 GMT
Etag: "2424f04ad8a82391e74aa577200f9733"
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: application/atom+xml; charset=UTF-8
----------------------------------------------------------
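The Atom response above carries Last-Modified and Etag headers, which the OAI-PMH response did not. A client can hand these validators back on its next request and get a 304 Not Modified instead of the full feed. The following is a minimal sketch of such a conditional GET using only the Python standard library; the function names are assumptions for this example.

```python
import urllib.error
import urllib.request


def conditional_request(url, etag=None, last_modified=None):
    """Build a GET request carrying the validators from an earlier response."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return the feed body, or None if the server answers 304 Not Modified."""
    request = conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        # urllib reports 304 as an HTTPError; treat it as "nothing new".
        if error.code == 304:
            return None
        raise


# Build (but do not send) a request using the values from the dump above.
request = conditional_request(
    "http://blog.livingreviews.org/feed/atom/",
    etag='"2424f04ad8a82391e74aa577200f9733"',
    last_modified="Tue, 11 Nov 2008 15:07:00 GMT",
)
print(request.header_items())
```

This is ordinary HTTP caching, so it works with any feed reader or proxy without a protocol-specific extension.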

What about unAPI?

http://unapi.info/

Conclusion

Content Syndication

OAI-ORE vs. Media RSS:
http://en.wikipedia.org/wiki/Media_RSS
http://search.yahoo.com/mrss

General aggregations vs. media:group