Semantic Queries

Clerk can be very helpful when exploring any kind of data, including the sorts of things for which we might turn to the Semantic Web. To give a sense of what that's like, this notebook gives some examples of querying WikiData for facts about the world.

First, we bring in Clerk, the Clerk viewer helpers, Mundaneum (a WikiData wrapper that uses a Datomic-like syntax), and Arrowic (to draw graphviz-style box-and-arrow graphs).

(ns semantic
  (:require [clojure.string :as str]
            [nextjournal.clerk :as clerk]
            [nextjournal.clerk.viewer :as v]
            [applied-science.mundaneum.properties :refer [wdt]]
            [applied-science.mundaneum.query :refer [describe entity label query]]
            [arrowic.core :as arr]))

Now we can ask questions, like "what is James Clerk Maxwell famous for having invented or discovered?"

(query `{:select [?what]
         :where [[?what ~(wdt :discoverer-or-inventor) ~(entity "James Clerk Maxwell")]]})
[{:what :wd/Q1080745}]

The WikiData internal ID :wd/Q1080745 doesn't immediately mean much to a human, so we'll try again, appending Label to the end of the ?what logic variable so that we get a human-readable label for that item:

(query `{:select [?whatLabel]
         :where [[?what ~(wdt :discoverer-or-inventor) ~(entity "James Clerk Maxwell")]]})
[{:whatLabel "unified field theory"}]

Ah, better. 😊 This ceremony is required because WikiData uses a language-neutral data representation internally, leaving us with an extra step to get readable results. This can be a little annoying, but it does have benefits. For example, we can ask for an entity's label in every language for which it has been specified in WikiData:

(query `{:select [?what ?label]
         :where [[?what ~(wdt :discoverer-or-inventor) ~(entity "James Clerk Maxwell")]
                 [?what :rdfs/label ?label]]})
[{:label "نظرية الحقل الموحد" :what :wd/Q1080745}
 {:label "teoría del campu unificáu" :what :wd/Q1080745}
 {:label "Адзіная тэорыя поля" :what :wd/Q1080745}
 {:label "Единна теория на полето" :what :wd/Q1080745}
 {:label "damkan an dachenn unvanet" :what :wd/Q1080745}
 {:label "teoria de camp unificat" :what :wd/Q1080745}
 {:label "بیردۆزی بواری یەکگرتوو" :what :wd/Q1080745}
 {:label "damcaniaeth maes cyffredinol" :what :wd/Q1080745}
 {:label "Samlet feltteori" :what :wd/Q1080745}
 {:label "Einheitliche Feldtheorie" :what :wd/Q1080745}
 {:label "unified field theory" :what :wd/Q1080745}
 {:label "Unified field theory" :what :wd/Q1080745}
 {:label "Unified field theory" :what :wd/Q1080745}
 {:label "unuiĝinta kampoteorio" :what :wd/Q1080745}
 {:label "teoría del campo unificado" :what :wd/Q1080745}
 {:label "eremu-teoria bateratu" :what :wd/Q1080745}
 {:label "نظریه میدان واحد" :what :wd/Q1080745}
 {:label "Théorie du champ unifié" :what :wd/Q1080745}
 {:label "Teoiric an aonréimse" :what :wd/Q1080745}
 {:label "תאוריית השדה המאוחד" :what :wd/Q1080745}
 19 more elided]

One of the nice things about data encoded as a knowledge graph is that we can ask questions that are difficult to pose any other way, then receive answers as structured data for further processing.

Here, for instance, is a query asking for things discovered or invented by anyone whose occupations include "physicist":

(def inventions-and-discoveries
  (query `{:select [?whatLabel ?whomLabel]
           :where [[?what ~(wdt :discoverer-or-inventor) ?whom]
                   [?whom ~(wdt :occupation) ~(entity "physicist")]]
           :limit 500}))
[{:whatLabel "World Wide Web" :whomLabel "Tim Berners-Lee"}
 {:whatLabel "HTTP" :whomLabel "Tim Berners-Lee"}
 {:whatLabel "HTML" :whomLabel "Tim Berners-Lee"}
 {:whatLabel "Semantic Web" :whomLabel "Tim Berners-Lee"}
 {:whatLabel "WorldWideWeb" :whomLabel "Tim Berners-Lee"}
 {:whatLabel "Io" :whomLabel "Galileo Galilei"}
 {:whatLabel "Callisto" :whomLabel "Galileo Galilei"}
 {:whatLabel "Europa" :whomLabel "Galileo Galilei"}
 {:whatLabel "Ganymede" :whomLabel "Galileo Galilei"}
 {:whatLabel "Trapezium Cluster" :whomLabel "Galileo Galilei"}
 {:whatLabel "Galilean transformation" :whomLabel "Galileo Galilei"}
 {:whatLabel "solar variation" :whomLabel "Galileo Galilei"}
 {:whatLabel "square-cube law" :whomLabel "Galileo Galilei"}
 {:whatLabel "Q1535340" :whomLabel "Galileo Galilei"}
 {:whatLabel "Galileo's escapement" :whomLabel "Galileo Galilei"}
 {:whatLabel "Galilean micrometer" :whomLabel "Galileo Galilei"}
 {:whatLabel "tholin" :whomLabel "Carl Sagan"}
 {:whatLabel "Carbon chauvinism" :whomLabel "Carl Sagan"}
 {:whatLabel "Encyclopedia Galactica" :whomLabel "Carl Sagan"}
 {:whatLabel "fluorine" :whomLabel "André-Marie Ampère"}
 480 more elided]
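Because the results come back as plain Clojure maps, ordinary sequence functions are enough for further processing. Here is a minimal sketch that counts how many entries each person has and ranks them. The sample rows are copied from the output above, and `top-people` is a hypothetical helper name, not part of Mundaneum or Clerk.

```clojure
;; A few rows copied from the query output above (the real result has 500).
(def sample-results
  [{:whatLabel "World Wide Web" :whomLabel "Tim Berners-Lee"}
   {:whatLabel "HTTP" :whomLabel "Tim Berners-Lee"}
   {:whatLabel "Io" :whomLabel "Galileo Galilei"}
   {:whatLabel "Callisto" :whomLabel "Galileo Galilei"}
   {:whatLabel "Europa" :whomLabel "Galileo Galilei"}
   {:whatLabel "tholin" :whomLabel "Carl Sagan"}])

(defn top-people
  "Returns the n [person count] pairs with the most entries,
  highest count first, ties broken alphabetically by name."
  [rows n]
  (->> rows
       (map :whomLabel)
       frequencies
       ;; negate the count so larger counts sort first
       (sort-by (fn [[whom cnt]] [(- cnt) whom]))
       (take n)))

(top-people sample-results 2)
;; => (["Galileo Galilei" 3] ["Tim Berners-Lee" 2])
```

In the notebook, `(top-people inventions-and-discoveries 10)` would rank people across the full result. Since the query stops at 500 rows, those counts describe that sample rather than everything in WikiData.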

Tabular data

It's great that we can retrieve this information as a sequence of maps that we can explore interactively in Clerk, but sometimes it's more pleasant to display data organized in a table view:

(clerk/table inventions-and-discoveries)
:whatLabel | :whomLabel
World Wide Web | Tim Berners-Lee
HTTP | Tim Berners-Lee
HTML | Tim Berners-Lee
Semantic Web | Tim Berners-Lee
WorldWideWeb | Tim Berners-Lee
Io | Galileo Galilei
Callisto | Galileo Galilei
Europa | Galileo Galilei
Ganymede | Galileo Galilei
Trapezium Cluster | Galileo Galilei
Galilean transformation | Galileo Galilei
solar variation | Galileo Galilei
square-cube law | Galileo Galilei
Q1535340 | Galileo Galilei
Galileo's escapement | Galileo Galilei
Galilean micrometer | Galileo Galilei
tholin | Carl Sagan
Carbon chauvinism | Carl Sagan
Encyclopedia Galactica | Carl Sagan
fluorine | André-Marie Ampère
480 more elided

Once we see how a given table looks, we might decide that it would be better if, for example, these inventions were grouped by inventor. This is just the sort of thing that Clojure sequence functions can help us do:

(clerk/table (->> inventions-and-discoveries
                  (group-by :whomLabel)
                  (mapv (fn [[whom whats]]
                          [whom (str/join " ; " (map :whatLabel whats))]))))
Enrico Fermi | Monte Carlo method ; Fermi–Pasta–Ulam–Tsingou problem ; Fermi resonance ; Metrop… (58 more elided)
Al-Biruni | pycnometer
Leonardo da Vinci | Leonardo's robot ; sfumato ; Coulomb friction ; Leonardo's crossbow ; Leonardo's… (59 more elided)
Albert Einstein | general relativity ; special relativity ; mass–energy equivalence ; theory of re… (253 more elided)
Luigi Galvani | galvanic cell ; galvanotherapy
Carl Sagan | tholin ; Carbon chauvinism ; Encyclopedia Galactica
André-Marie Ampère | fluorine ; Ampère's force law ; Q3733125
Blaise Pascal | mathematical induction ; Pascal's Wager ; Pascal's calculator ; Pascal's barrel
Galileo Galilei | Io ; Callisto ; Europa ; Ganymede ; Trapezium Cluster ; Galilean transformation… (91 more elided)
Carl Friedrich Gauss | least squares method ; discrete logarithm ; Gauss's principle of least constrain… (212 more elided)
Edward Teller | Metropolis–Hastings algorithm
Pierre Bouguer | Bouguer photometer
Archimedes | Archimedean spiral ; claw of Archimedes ; Burning glass
Max Planck | Planck constant
Dmitri Mendeleev | periodic table ; Mendeleev's predicted elements ; periodic trends
Nikola Tesla | remote control ; three-phase electric power ; Tesla coil ; Tesla valve ; radio r… (472 more elided)
Marie Curie | polonium ; radium
Tim Berners-Lee | World Wide Web ; HTTP ; HTML ; Semantic Web ; WorldWideWeb
Augustin-Louis Cauchy | root test ; Burnside's lemma ; infinitesimal deformation theory
Georges Lemaître | Big Bang ; Hubble–Lemaître law
20 more elided
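`group-by` returns a hash map, so the grouped table above comes out in no particular order. Here is a sketch of the same transformation as a standalone function that also sorts the rows by person. The name `whats-by-whom` and the alphabetical sort are additions for illustration, and the sample rows are copied from the earlier output.

```clojure
(require '[clojure.string :as str])

(def sample-results
  [{:whatLabel "World Wide Web" :whomLabel "Tim Berners-Lee"}
   {:whatLabel "HTTP" :whomLabel "Tim Berners-Lee"}
   {:whatLabel "Io" :whomLabel "Galileo Galilei"}
   {:whatLabel "Callisto" :whomLabel "Galileo Galilei"}
   {:whatLabel "Europa" :whomLabel "Galileo Galilei"}
   {:whatLabel "tholin" :whomLabel "Carl Sagan"}])

(defn whats-by-whom
  "Groups rows by :whomLabel, joins each person's :whatLabel values
  with \" ; \" (keeping their original order), and sorts by name."
  [rows]
  (->> rows
       (group-by :whomLabel)
       (map (fn [[whom whats]]
              [whom (str/join " ; " (map :whatLabel whats))]))
       (sort-by first)
       vec))
```

`(clerk/table (whats-by-whom inventions-and-discoveries))` would then render the grouped rows alphabetically by inventor.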

Geospatial data

Some data are more naturally viewed in other ways, of course. In this example we find every instance of "human settlement" or any of its subclasses (village, town, city, and so on) in Germany whose German-language place name ends in -ow or -itz, both suffixes that typically indicate the place was originally named by speakers of a Slavic language.
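Before sending the query below to WikiData, the suffix pattern it filters on can be tried out locally in plain Clojure. Java regexes and SPARQL's `regex` function agree on a simple pattern like `(ow|itz)$`. The helper name `slavic-suffix?` is hypothetical, and the sample names are taken from the results further down plus two counterexamples.

```clojure
;; The same pattern the query passes to SPARQL's regex filter.
(def slavic-suffix #"(ow|itz)$")

(defn slavic-suffix?
  "True when the place name ends in -ow or -itz (case-sensitive)."
  [place-name]
  (boolean (re-find slavic-suffix place-name)))

(filter slavic-suffix? ["Bannewitz" "Ragow" "Dresden" "Groß Lieskow" "Berlin"])
;; => ("Bannewitz" "Ragow" "Groß Lieskow")
```

Names such as Dresden or Berlin that lack these endings are not matched, whatever their origin, so the query finds only the places this particular suffix test can detect.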

(def slavic-place-names
  (->> `{:select *
         :where [{?ort {(cat ~(wdt :instance-of) (* ~(wdt :subclass-of))) #{~(entity "human settlement")}
                        ~(wdt :country) #{~(entity "Germany")}
                        :rdfs/label #{?name}
                        ~(wdt :coordinate-location) #{?lonlat}}}
                 [:filter (= (lang ?name) "de")]
                 [:filter (regex ?name "(ow|itz)$")]]
         :limit 1000}
       query
       ;; clean up lon-lat formatting for the map plot
       (mapv #(let [[lon lat] (-> %
                                  :lonlat
                                  :value
                                  (str/replace #"[^0-9 \.]" "")
                                  (str/split #" "))]
                {:name (:name %) :latitude lat :longitude lon}))))
[{:latitude "50.993055555" :longitude "13.716666666" :name "Bannewitz"}
 {:latitude "51.022777777" :longitude "13.7425" :name "Zschertnitz"}
 {:latitude "54.516388888" :longitude "13.641111111" :name "Sassnitz"}
 {:latitude "51.2475" :longitude "14.2275" :name "Caseritz"}
 {:latitude "51.2469" :longitude "14.1819" :name "Miltitz"}
 {:latitude "51.3556" :longitude "14.4333" :name "Driewitz"}
 {:latitude "51.233333333" :longitude "14.691666666" :name "Jerchwitz"}
 {:latitude "51.7258" :longitude "13.9558" :name "Werchow"}
 {:latitude "51.6214" :longitude "14.1061" :name "Lubochow"}
 {:latitude "51.3128" :longitude "14.1989" :name "Neu-Schmerlitz"}
 {:latitude "51.100533333" :longitude "14.423877777" :name "Kleinpostwitz"}
 {:latitude "51.511111111" :longitude "14.498611111" :name "Mulkwitz"}
 {:latitude "51.089444444" :longitude "14.156666666" :name "Großdrebnitz"}
 {:latitude "51.2864" :longitude "14.2508" :name "Naußlitz"}
 {:latitude "51.3231" :longitude "14.235" :name "Cunnewitz"}
 {:latitude "51.2339" :longitude "14.2628" :name "Prautitz"}
 {:latitude "51.8872" :longitude "13.9039" :name "Ragow"}
 {:latitude "51.79291667" :longitude "14.43713889" :name "Groß Lieskow"}
 {:latitude "51.233333333" :longitude "14.640277777" :name "Groß Saubernitz"}
 {:latitude "51.145" :longitude "14.7167" :name "Kleinradmeritz"}
 980 more elided]

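The :coordinate-location in this query is the longitude/latitude position of each place, returned in a somewhat unfortunate string format. The mapv at the end strips everything except the two numbers from each lonlat string and stores them under :longitude and :latitude keys so that Vega can plot the points on a map. Because the query stops at 1,000 results, the map shows at most that many places, but it still gives a fairly clear picture of which parts of Germany were settled by Slavic speakers before the medieval German eastward expansion (the Ostsiedlung):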
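The regex cleanup above deletes every character that is not a digit, space, or period, which also deletes minus signs. That is harmless for Germany, where both coordinates are positive, but it would misplace points west of Greenwich or south of the equator, and it leaves the values as strings. Here is a sketch of a stricter parser, assuming the :value strings are WKT literals of the form Point(lon lat) with plain decimal numbers, which is how WikiData's SPARQL endpoint returns coordinates. `parse-point` is a hypothetical helper, and the London-style example point is illustrative only.

```clojure
(defn parse-point
  "Parses a WKT literal like \"Point(14.2275 51.2475)\" into numeric
  :longitude and :latitude values, keeping minus signs.
  Returns nil when the string does not match that shape.
  Assumes plain decimals (no exponent notation)."
  [wkt]
  (when-let [[_ lon lat] (re-find #"Point\(\s*(-?[0-9.]+)\s+(-?[0-9.]+)\s*\)" wkt)]
    {:longitude (Double/parseDouble lon)
     :latitude  (Double/parseDouble lat)}))

(parse-point "Point(14.2275 51.2475)")
;; => {:longitude 14.2275, :latitude 51.2475}

(parse-point "Point(-0.1275 51.5072)")
;; => {:longitude -0.1275, :latitude 51.5072}
```

Inside the notebook's mapv, `(merge {:name (:name %)} (parse-point (-> % :lonlat :value)))` could replace the let form, and Vega would then receive numbers rather than strings.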

(v/vl {:width 650 :height 650
       :config {:projection {:type "mercator" :center [10.4515 51.1657]}}
       :layer [{:data {:url "https://raw.githubusercontent.com/deldersveld/topojson/master/countries/germany/germany-regions.json"
                       :format {:type "topojson" :feature "DEU_adm2"}}
                :mark {:type "geoshape" :fill "lightgray" :stroke "white"}}
               {:encoding {:longitude {:field "longitude" :type "quantitative"}
                           :latitude {:field "latitude" :type "quantitative"}}
                :mark "circle"
                :data {:values slavic-place-names}}]})

Sometimes the data needs a more customized view. Happily, we can write arbitrary hiccup to be rendered in Clerk. We'll use this query to fetch a list of different species of Apodiformes (swifts and hummingbirds), returning a name, image, and map of home range for each one.

(->> (query `{:select-distinct [?item ?itemLabel ?pic ?range]
              :where [[?item (* ~(wdt :parent-taxon)) ~(entity "Apodiformes")]
                      [?item ~(wdt :taxon-rank) ~(entity "species")]
                      [?item :rdfs/label ?englishName]
                      [?item ~(wdt :image) ?pic]
                      [?item ~(wdt :taxon-range-map-image) ?range]
                      [:filter (= (lang ?englishName) "en")]]
              :limit 11})
     (mapv #(vector :tr
                    [:td.w-32 (:itemLabel %)]
                    [:td [:img.w-80 {:src (:pic %)}]]
                    [:td [:img.w-80 {:src (:range %)}]]))
     (into [:table])
     clerk/html)
[Table of 11 rows with columns for species name, photo, and range map: Lucifer Sheartail, Lucifer Sheartail, Andean emerald, Ruby-throated Hummingbird, Ruby-throated Hummingbird, Spot-throated Hummingbird, Bronzy Inca, Coeligena helianthea, Blue-capped Hummingbird, Costa's Humming bird, Costa's Humming bird]
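Since hiccup is plain data, the table can also be assembled by ordinary functions and checked without rendering anything. Some species appear twice in the table above, most likely because a species with several photos or range maps in WikiData yields one distinct row per combination under :select-distinct. Here is a sketch that pulls the row construction out of the pipeline and keeps only the first row per species name. The function names, the header row, and the example URLs in the usage note are hypothetical.

```clojure
(defn species-row
  "One hiccup table row for a result map with :itemLabel, :pic and :range."
  [row]
  [:tr
   [:td.w-32 (:itemLabel row)]
   [:td [:img.w-80 {:src (:pic row)}]]
   [:td [:img.w-80 {:src (:range row)}]]])

(defn first-per-species
  "Keeps the first row for each :itemLabel, preserving the original order."
  [rows]
  (second
   (reduce (fn [[seen kept] row]
             (if (contains? seen (:itemLabel row))
               [seen kept]
               [(conj seen (:itemLabel row)) (conj kept row)]))
           [#{} []]
           rows)))

(defn species-table
  "Wraps one row per species in a :table, with a header row added for illustration."
  [rows]
  (into [:table [:tr [:th "Species"] [:th "Photo"] [:th "Range map"]]]
        (map species-row)
        (first-per-species rows)))
```

With the query results bound to a name such as `results`, `(clerk/html (species-table results))` would render the deduplicated table. Keying on :itemLabel assumes no two species share an English label; keying on :item would avoid that assumption.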

Network diagrams

Another useful technique when dealing with semantic or graph-shaped data is to visualize the results as a graph. Here we gather all the languages influenced by Lisp, either directly or through a chain of languages that were themselves influenced by Lisp (a transitive query across the graph), and draw them in one big network diagram.

Because Clerk's html viewer also understands SVGs, we can just plug in an existing graph visualization library and send the output to Clerk.

The graph is really huge, so you'll need to scroll around a bit to see all the languages.

(-> (clerk/html
     (let [data (query `{:select [?itemLabel ?influencedByLabel]
                         :where [[?item (* ~(wdt :influenced-by)) ~(entity "Lisp")]
                                 [?item ~(wdt :influenced-by) ?influencedBy]
                                 [?influencedBy (* ~(wdt :influenced-by)) ~(entity "Lisp")]]})]
       (arr/as-svg
        (arr/with-graph (arr/create-graph)
          (let [vertex (->> (mapcat (juxt :itemLabel :influencedByLabel) data)
                            distinct
                            (reduce #(assoc %1 %2 (arr/insert-vertex! %2)) {}))]
            (doseq [edge data]
              (when (:influencedByLabel edge)
                (arr/insert-edge! (vertex (:influencedByLabel edge))
                                  (vertex (:itemLabel edge))))))))))
    (assoc :nextjournal/width :full))
[Network diagram: programming languages influenced, directly or transitively, by Lisp, with an arrow from each language to the languages it influenced.]
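The `(* ~(wdt :influenced-by))` property path in the query matches chains of zero or more influenced-by links, so the starting language itself is included along with everything downstream of it. Here is a sketch of the same reachability computation as a breadth-first search over direct edges shaped like the query's result maps. It can help when reasoning about what the query returns. The edge list is a small hand-made illustration, not WikiData's actual data, and `influenced-by-closure` is a hypothetical name.

```clojure
;; Each map means: :itemLabel was directly influenced by :influencedByLabel.
(def edges
  [{:itemLabel "Scheme"     :influencedByLabel "Lisp"}
   {:itemLabel "Clojure"    :influencedByLabel "Scheme"}
   {:itemLabel "JavaScript" :influencedByLabel "Scheme"}
   {:itemLabel "TypeScript" :influencedByLabel "JavaScript"}
   {:itemLabel "C++"        :influencedByLabel "C"}])

(defn influenced-by-closure
  "Returns the set of languages that reach `root` by zero or more
  influenced-by links, like the SPARQL path (* influenced-by)."
  [edges root]
  ;; index: influencer -> set of languages it directly influenced
  (let [influenced (reduce (fn [m {:keys [itemLabel influencedByLabel]}]
                             (update m influencedByLabel (fnil conj #{}) itemLabel))
                           {}
                           edges)]
    ;; FIFO frontier gives breadth-first order; `seen` prevents revisits
    (loop [frontier [root]
           seen     #{root}]
      (if-let [lang (first frontier)]
        (let [fresh (remove seen (get influenced lang))]
          (recur (into (vec (rest frontier)) fresh)
                 (into seen fresh)))
        seen))))

(influenced-by-closure edges "Lisp")
;; => a set of Lisp, Scheme, Clojure, JavaScript and TypeScript (C and C++ are unreachable)
```

The notebook's third :where clause applies the same test to the influencer side, so only edges whose source is also downstream of Lisp make it into the diagram.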

I hope this gives you some ideas about things you might want to try!