<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>RKB Blog</title>
	<atom:link href="http://www.rkbexplorer.com/blog/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.rkbexplorer.com/blog</link>
	<description>For RKBExplorer, sameAs.org and related stuff.</description>
	<lastBuildDate>Tue, 24 Nov 2009 12:30:21 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Good citizenship on the Web of Data</title>
		<link>http://www.rkbexplorer.com/blog/?p=33</link>
		<comments>http://www.rkbexplorer.com/blog/?p=33#comments</comments>
		<pubDate>Thu, 06 Aug 2009 10:14:21 +0000</pubDate>
		<dc:creator>hg</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.rkbexplorer.com/blog/?p=33</guid>
		<description><![CDATA[&#60;executive_summary&#62;
&#8220;If you consume open data you should publish as open data.&#8221;
I&#8217;m going to call this Principle 5, as things need to be named, and also of course give it a URI: http://www.rkbexplorer.com/blog/?p=33
What am I talking about?
In the bright new world, there will be lots of data from government (and elsewhere) out there for others to consume. [...]]]></description>
			<content:encoded><![CDATA[<p>&lt;executive_summary&gt;</p>
<p>&#8220;If you consume open data you should publish as open data.&#8221;<br />
I&#8217;m going to call this Principle 5, as things need to be named, and also of course give it a URI: <a href="http://www.rkbexplorer.com/blog/?p=33">http://www.rkbexplorer.com/blog/?p=33</a></p>
<p>What am I talking about?</p>
<p>In the bright new world, there will be lots of data from government (and elsewhere) out there for others to consume. Much of the discussion on this list is about advising the owners of that data to publish it in a way that makes it easily accessible.<br />
However, we should remember that every time some of that data gets consumed, new data is generated. The people who generate that new data should feel at least as much obligation to publish it as open data as the publishers of the data they consumed.</p>
<p>One might say: &#8220;What&#8217;s sauce for the goose is sauce for the gander.&#8221;</p>
<p>&lt;/executive_summary&gt;</p>
<p>Of course many people already do this, but I think it is worth pointing out as people start to build more systems that consume.<br />
It is quite easy to build a system that publishes, as long as publishing is designed in from the start; on the other hand, once an intricate web page has been built, adding data publishing facilities later can be very time-consuming and difficult.</p>
<p>As a simple example of what I mean, let&#8217;s take a site that consumes education data, and tells you a little bit about it, perhaps by showing a map where I can click on an area and find out how many kids go to school there.<br />
Does the site publish URIs for each of these statistics?<br />
Is it easy to find these things, or do I need some complex API?<br />
Can I get a dump of the whole dataset?<br />
What formats are offered?<br />
Are there interesting HTML fragments that someone else might use?<br />
Is the license clear?<br />
Is there even a SPARQL or other query endpoint?</p>
<p>I think quite often a consumer who does something like this doesn&#8217;t really think they have generated much data, and so doesn&#8217;t engage with publishing; but each step along the way adds value, and they should celebrate the fruits of their labours by making them easily accessible.<br />
Even taking some data and doing a nice HTML rendering can be really useful to someone who just wants to add something interesting to their own page.</p>
<p>This leads to another issue, concerning dataset directories.<br />
We should not consider it satisfactory just to list the source datasets: we should consider everything a source, and so try to record a graph of dataset derivation.</p>
<p>Finally, why call it Principle 5?<br />
That relates to Linked Data, which has four principles at the moment. I happen to think that this one is so important that Tim might decide to add it as a fifth principle: &#8220;If you consume Linked Data, you should publish as Linked Data&#8221;.</p>
<p>As they say, my 2 cents worth.<br />
Best<br />
Hugh</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rkbexplorer.com/blog/?feed=rss2&amp;p=33</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How do we generate the sitemap.xml and submit to the search engines?</title>
		<link>http://www.rkbexplorer.com/blog/?p=25</link>
		<comments>http://www.rkbexplorer.com/blog/?p=25#comments</comments>
		<pubDate>Sun, 02 Aug 2009 11:45:02 +0000</pubDate>
		<dc:creator>hg</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.rkbexplorer.com/blog/?p=25</guid>
		<description><![CDATA[#!/usr/bin/php -q
&#60;?php
require_once "/usr/lib/rkb/functions-utf.inc.php";
/**
generates sitemap.xml for a linked data site that has
a triplestore which resolves URIs to provide a Symmetric Concise Bounded Description,
as well as a SPARQL endpoint,
and RDF files which are the source that populated the triplestore.
Now submits to some search engines:
SWSE
*/
$usage = "Usage: {$argv[0]} sub_domain_name\n";
if(!isset($argv[1])) die($usage);
$base_domain = "rkbexplorer.com";
$sub_domain = $argv[1];
$domain = $sub_domain.".".$base_domain;
$outfile = "../$domain/sitemap.xml";
$file [...]]]></description>
			<content:encoded><![CDATA[
<div>#!/usr/bin/php -q</div>
<div>&lt;?php</div>
<div>require_once "/usr/lib/rkb/functions-utf.inc.php";</div>
<div>/**</div>
<div>Ian Millard and Hugh Glaser</div>
<div>generates sitemap.xml for a linked data site that has</div>
<div>a triplestore which resolves URIs to provide a Symmetric Concise Bounded Description,</div>
<div>as well as a SPARQL endpoint,</div>
<div>and RDF files which are the source that populated the triplestore.</div>
<div>Now submits to search engines: SWSE, Sindice and Ping the Semantic Web.</div>
<div>*/</div>
<div>$usage = "Usage: {$argv[0]} sub_domain_name\n";</div>
<div>if(!isset($argv[1])) die($usage);</div>
<div>$base_domain = "rkbexplorer.com";</div>
<div>$sub_domain = $argv[1];</div>
<div>$domain = $sub_domain.".".$base_domain;</div>
<div>$outfile = "../$domain/sitemap.xml";</div>
<div>$file = fopen($outfile, "w");</div>
<div>$slicing = "subject-object";</div>
<div>// Optional per-dataset metadata, read from ../$domain/about/ when present</div>
<div>$name = ""; if (file_exists("../$domain/about/name.txt")) $name = trim(entities2accents(file_get_contents("../$domain/about/name.txt")));</div>
<div>$typical = ""; if (file_exists("../$domain/about/typical.txt")) $typical = trim(file_get_contents("../$domain/about/typical.txt"));</div>
<div>$updated = array(); exec("/var/www/vhosts/wildcard.rkbexplorer.com/repositories/tools/rkb-utils last-update-w3c ".escapeshellarg($sub_domain), $updated);</div>
<div>$changefreq = "monthly"; if (file_exists("../$domain/about/changefreq.txt")) $changefreq = trim(file_get_contents("../$domain/about/changefreq.txt"));</div>
<div>// Write the sitemap, describing the dataset with the sc: sitemap extension</div>
<div>fwrite($file,"&lt;?xml version=\"1.0\" encoding=\"UTF-8\"?&gt;\n");</div>
<div>fwrite($file,"&lt;urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\"\n");</div>
<div>fwrite($file,"        xmlns:sc=\"http://sw.deri.org/2007/07/sitemapextension/scschema.xsd\"&gt;\n");</div>
<div>fwrite($file,"  &lt;sc:dataset&gt;\n");</div>
<div>fwrite($file,"    &lt;sc:linkedDataPrefix slicing=\"$slicing\"&gt;http://$domain/id/&lt;/sc:linkedDataPrefix&gt;\n");</div>
<div>fwrite($file,"    &lt;sc:sparqlEndpointLocation&gt;http://$domain/sparql/&lt;/sc:sparqlEndpointLocation&gt;\n");</div>
<div>// List every RDF dump file found in the models directory</div>
<div>$models = opendir("../$domain/models");</div>
<div>while (false !== ($model = readdir($models))) {</div>
<div>if (preg_match('/\.(rdf|ttl|n3|turtle|ntriples)$/', $model))</div>
<div>fwrite($file,"    &lt;sc:dataDumpLocation&gt;http://$domain/models/$model&lt;/sc:dataDumpLocation&gt;\n");</div>
<div>}</div>
<div>closedir($models);</div>
<div>fwrite($file,"    &lt;sc:datasetURI&gt;http://$domain/&lt;/sc:datasetURI&gt;\n");</div>
<div>fwrite($file,"    &lt;sc:datasetURI&gt;http://$domain/id/void&lt;/sc:datasetURI&gt;\n");</div>
<div>if ($name != "") fwrite($file,"    &lt;sc:datasetLabel&gt;$name RDF dataset from RKBExplorer.com&lt;/sc:datasetLabel&gt;\n");</div>
<div>if ($typical != "") fwrite($file,"    &lt;sc:sampleURI&gt;$typical&lt;/sc:sampleURI&gt;\n");</div>
<div>fwrite($file,"    &lt;lastmod&gt;$updated[0]&lt;/lastmod&gt;\n");</div>
<div>fwrite($file,"    &lt;changefreq&gt;$changefreq&lt;/changefreq&gt;\n");</div>
<div>fwrite($file,"  &lt;/sc:dataset&gt;\n");</div>
<div>fwrite($file,"&lt;/urlset&gt;\n");</div>
<div>fclose($file);</div>
<div>$sitemap_url = "http://$domain/sitemap.xml";</div>
<div>// Submit to SWSE</div>
<div>print "Submitting $domain to SWSE: \n";</div>
<div>$ch = curl_init("http://swse.deri.org/ping?sitemap=$sitemap_url");</div>
<div>curl_exec($ch);</div>
<div>print "\n";</div>
<div>curl_close($ch);</div>
<div>// Submit POST request to Sindice</div>
<div>print "Submitting $domain to Sindice: \n";</div>
<div>$data = "url=".urlencode($sitemap_url);</div>
<div>$fp = fsockopen("sindice.com", 80);</div>
<div>fputs($fp, "POST /api/v1/sitemap HTTP/1.0\r\n");</div>
<div>fputs($fp, "Host: sindice.com\r\n");</div>
<div>fputs($fp, "Content-type: application/x-www-form-urlencoded\r\n");</div>
<div>fputs($fp, "Content-length: ". strlen($data) ."\r\n\r\n");</div>
<div>fputs($fp, $data);</div>
<div>//  read result back from the sindice server</div>
<div>$result = '';</div>
<div>while(!feof($fp)) $result .= fgets($fp, 128);</div>
<div>fclose($fp);</div>
<div>
<div></div>
<div>//  report server response</div>
<div>$status = substr($result, 0, strpos($result, "\n"));</div>
<div>preg_match('@&lt;h1&gt;(.*?)&lt;/h1&gt;@', $result, $matches);</div>
<div>print "\t$status\n\t{$matches[1]}\n\n";</div>
<div></div>
<div>//  submit to Ping the Semantic Web</div>
<div>passthru("./ptsw.py ".escapeshellarg($sitemap_url));</div>
<div></div>
<div>?&gt;</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.rkbexplorer.com/blog/?feed=rss2&amp;p=25</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to do the 303 redirect easily?</title>
		<link>http://www.rkbexplorer.com/blog/?p=11</link>
		<comments>http://www.rkbexplorer.com/blog/?p=11#comments</comments>
		<pubDate>Sun, 05 Jul 2009 15:12:46 +0000</pubDate>
		<dc:creator>hg</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.rkbexplorer.com/blog/?p=11</guid>
		<description><![CDATA[

Create a web-accessible directory with all your .rdf, .ttl, .ntriples and .html files in it. 
Copy lodpub.php and path.php into it. 
Access path.php from your web server. 
Follow the instructions it shows and paste the text it gives into .htaccess 
You can then remove path.php; it was only there to help you get the .htaccess right.


lodpub.php
path.php

]]></description>
			<content:encoded><![CDATA[<ol>
<li>Create a web-accessible directory with all your .rdf, .ttl, .ntriples and .html files in it.</li>
<li>Copy lodpub.php and path.php into it.</li>
<li>Access path.php from your web server.</li>
<li>Follow the instructions it shows and paste the text it gives into .htaccess.</li>
<li>You can then remove path.php; it was only there to help you get the .htaccess right.</li>
</ol>
<p><a href="http://www.rkbexplorer.com/blog/wp-content/uploads/2009/07/lodpub.php.gz">lodpub.php</a><br />
<a href="http://www.rkbexplorer.com/blog/wp-content/uploads/2009/07/path.php.gz">path.php</a></p>
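<p>For readers who want a feel for what a 303 redirect script has to do before downloading anything, here is a minimal sketch. It is not the actual lodpub.php: the function names, the media-type table and the name.ext file layout are all assumptions made for illustration, and q-values in the Accept header are ignored to keep it short.</p>

```php
<?php
// Hypothetical sketch of 303 content negotiation; NOT the actual lodpub.php.

// Assumed mapping from media type to the file extension that serves it.
const REPRESENTATIONS = [
    'application/rdf+xml'   => 'rdf',
    'text/turtle'           => 'ttl',
    'application/n-triples' => 'ntriples',
    'text/html'             => 'html',
];

// Pick the extension for the first acceptable media type, in the order the
// client listed them (q-values are deliberately ignored in this sketch).
function choose_extension(string $accept, array $available): ?string
{
    foreach (explode(',', $accept) as $part) {
        $type = strtolower(trim(explode(';', $part)[0]));
        $ext = REPRESENTATIONS[$type] ?? null;
        if ($ext !== null && in_array($ext, $available, true)) {
            return $ext;
        }
    }
    // Fall back to HTML when nothing listed matched (for example "*/*").
    return in_array('html', $available, true) ? 'html' : null;
}

// Answer a request for a resource name with "303 See Other", pointing at
// whichever document in $dir best matches the Accept header.
function redirect_303(string $name, string $accept, string $dir): void
{
    $name = basename($name); // never let the name escape the directory
    $available = [];
    foreach (REPRESENTATIONS as $ext) {
        if (file_exists("$dir/$name.$ext")) {
            $available[] = $ext;
        }
    }
    $ext = choose_extension($accept, $available);
    if ($ext === null) {
        http_response_code(404);
        return;
    }
    header("Location: $name.$ext", true, 303);
}
```

<p>A dispatcher reached through the .htaccess rewrite might then call something like redirect_303($_GET['id'], $_SERVER['HTTP_ACCEPT'] ?? '', __DIR__), where the id parameter name is again only an assumption.</p>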
]]></content:encoded>
			<wfw:commentRss>http://www.rkbexplorer.com/blog/?feed=rss2&amp;p=11</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What we do to work out co-reference.</title>
		<link>http://www.rkbexplorer.com/blog/?p=6</link>
		<comments>http://www.rkbexplorer.com/blog/?p=6#comments</comments>
		<pubDate>Sat, 04 Jul 2009 22:10:28 +0000</pubDate>
		<dc:creator>hg</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.rkbexplorer.com/blog/?p=6</guid>
		<description><![CDATA[François Scharffe &#60;francois.scharffe@inria.fr&#62; asked us, so this is what I responded.
Here is a description of what we do in English.
The implementation may vary, but this is the rough idea.
It is really quite simple, I guess.

We are primarily concerned with organisations, people, publications, projects, research areas.
Research areas we have done by hand against relatively fixed ontologies.
Organisations [...]]]></description>
			<content:encoded><![CDATA[<p>François Scharffe &lt;francois.scharffe@inria.fr&gt; asked us, so this is what I responded.</p>
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">Here is a description of what we do in English.</p>
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">The implementation may vary, but this is the rough idea.</p>
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">It is really quite simple, I guess.</p>
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier; min-height: 17.0px;">
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">We are primarily concerned with organisations, people, publications, projects, research areas.</p>
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">Research areas we have done by hand against relatively fixed ontologies.</p>
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">Organisations and projects work in similar ways to publications, but I will describe the publications part.</p>
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier; min-height: 17.0px;">
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">1) To start with, there is absolutely no linkage, so we do a &#8220;coldstart&#8221;, and this is done on paper titles only.</p>
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">Extracting all the string/URI pairs from all the KBs, we map each title to a lower-case string of its alphanumerics; if the result is sufficiently long (&gt;=20 characters) and identical to another, then the URIs are considered the same (&#8220;smushed&#8221;).</p>
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier; min-height: 17.0px;">
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">2) Now we can work on authors (string matching out of context would be too liberal). For the same (co-reffed) papers, the authors are fuzzy matched (cross product).</p>
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier; min-height: 17.0px;">
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">3) For each author name string, we find the co-authorship sets for each paper (we do this by starting with each unique name, to make it easier).</p>
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">If there is an overlap of two or more co-author strings between different sets, then these authors are smushed.</p>
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">The matching of names for this is not fuzzy, but does match name variants, as identified by previous co-ref work on the URI for the author name we are looking at.</p>
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">(Another way of looking at it is that if we find three authors of the same name as paper authors, we smush them.)</p>
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">Also, if there are exactly two authors with similar names, then we smush them.</p>
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier; min-height: 17.0px;">
<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">4) The rest is dynamic. As users browse the data at rkbexplorer.com, we compute networks (communities of practice of closely related entities by domain-specific weighted RDF predicate). If strings are similar in the network, then they are smushed.</p>
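<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">Steps 1 and 3 above can be sketched roughly as follows. This is an illustration only, not our implementation: the function names and array shapes are invented for the example, only ASCII alphanumerics are kept, and name-variant matching is left out.</p>

```php
<?php
// Illustrative sketch of the "coldstart" title match (step 1) and the
// co-author overlap test (step 3). All names here are hypothetical.

// Reduce a title to its lower-case ASCII alphanumerics,
// e.g. "Linked Data!" becomes "linkeddata".
function title_key(string $title): string
{
    return (string) preg_replace('/[^a-z0-9]/', '', strtolower($title));
}

// Group publication URIs whose title keys are identical and at least
// $minLength characters long. $titles maps URI => title string.
// Returns a list of URI groups; each group would be smushed.
function coldstart_groups(array $titles, int $minLength = 20): array
{
    $byKey = [];
    foreach ($titles as $uri => $title) {
        $key = title_key($title);
        if (strlen($key) >= $minLength) {
            $byKey[$key][] = $uri;
        }
    }
    // Only keys shared by two or more URIs give a co-reference.
    return array_values(array_filter($byKey, function ($uris) {
        return count($uris) > 1;
    }));
}

// Two same-named authors are smushed when their co-author sets share
// at least $minOverlap distinct name strings (exact match only here).
function coauthors_overlap(array $setA, array $setB, int $minOverlap = 2): bool
{
    return count(array_intersect(array_unique($setA), $setB)) >= $minOverlap;
}
```

<p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier;">The length threshold is what keeps short, generic titles from being merged by accident; the two-name overlap plays the same role for authors.</p>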
]]></content:encoded>
			<wfw:commentRss>http://www.rkbexplorer.com/blog/?feed=rss2&amp;p=6</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New RKB Blog</title>
		<link>http://www.rkbexplorer.com/blog/?p=3</link>
		<comments>http://www.rkbexplorer.com/blog/?p=3#comments</comments>
		<pubDate>Sat, 04 Jul 2009 22:01:10 +0000</pubDate>
		<dc:creator>hg</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.rkbexplorer.com/blog/?p=3</guid>
		<description><![CDATA[This sort of should be the first posting, I guess.
A new blog where we can put stuff about RKB, sameAs.org, etc.
]]></description>
			<content:encoded><![CDATA[<p>This sort of should be the first posting, I guess.</p>
<p>A new blog where we can put stuff about <a href="http://www.rkbexplorer.com/">RKB</a>, <a href="http://sameas.org">sameAs.org</a>, etc.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rkbexplorer.com/blog/?feed=rss2&amp;p=3</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
