Semantic Queries

Clerk can be very helpful when exploring any kind of data, including the sorts of things for which we might turn to the Semantic Web. To give a sense of what that's like, this notebook gives some examples of querying WikiData for facts about the world.

First, we bring in Clerk, the Clerk viewer helpers, Mundaneum (a WikiData wrapper that uses a Datomic-like syntax), and Arrowic (to draw graphviz-style box-and-arrow graphs).

(ns semantic
  (:require [clojure.string :as str]
            [nextjournal.clerk :as clerk]
            [nextjournal.clerk.viewer :as v]
            [applied-science.mundaneum.properties :refer [wdt]]
            [applied-science.mundaneum.query :refer [describe entity label query]]
            [arrowic.core :as arr]))

Now we can ask questions, like "what is James Clerk Maxwell famous for having invented or discovered?"

(query `{:select [?what]
         :where [[?what ~(wdt :discoverer-or-inventor) ~(entity "James Clerk Maxwell")]]})
[{:what :wd/Q1080745}]

The WikiData internal ID :wd/Q1080745 doesn't immediately mean much to a human, so we'll try again, appending Label to the end of the ?what logic variable so that we see a human-readable label for that item:

(query `{:select [?whatLabel]
         :where [[?what ~(wdt :discoverer-or-inventor) ~(entity "James Clerk Maxwell")]]})
[{:whatLabel "unified field theory"}]

Ah, better. 😊 This ceremony is required because WikiData uses a language-neutral data representation internally, leaving us with an extra step to get readable results. This can be a little annoying, but it does have benefits. For example, we can ask for an entity's label in every language for which it has been specified in WikiData:

(query `{:select [?what ?label]
         :where [[?what ~(wdt :discoverer-or-inventor) ~(entity "James Clerk Maxwell")]
                 [?what :rdfs/label ?label]]})
[{:label "نظرية الحقل الموحد" :what :wd/Q1080745}
 {:label "teoría del campu unificáu" :what :wd/Q1080745}
 {:label "Адзіная тэорыя поля" :what :wd/Q1080745}
 {:label "Единна теория на полето" :what :wd/Q1080745}
 {:label "damkan an dachenn unvanet" :what :wd/Q1080745}
 {:label "teoria de camp unificat" :what :wd/Q1080745}
 {:label "بیردۆزی بواری یەکگرتوو" :what :wd/Q1080745}
 {:label "damcaniaeth maes cyffredinol" :what :wd/Q1080745}
 {:label "Samlet feltteori" :what :wd/Q1080745}
 {:label "Einheitliche Feldtheorie" :what :wd/Q1080745}
 {:label "unified field theory" :what :wd/Q1080745}
 {:label "Unified field theory" :what :wd/Q1080745}
 {:label "Unified field theory" :what :wd/Q1080745}
 {:label "unuiĝinta kampoteorio" :what :wd/Q1080745}
 {:label "teoría del campo unificado" :what :wd/Q1080745}
 {:label "eremu-teoria bateratu" :what :wd/Q1080745}
 {:label "نظریه میدان واحد" :what :wd/Q1080745}
 {:label "Théorie du champ unifié" :what :wd/Q1080745}
 {:label "Teoiric an aonréimse" :what :wd/Q1080745}
 {:label "תאוריית השדה המאוחד" :what :wd/Q1080745}
 19 more elided]

One of the nice things about data encoded as a knowledge graph is that we can ask questions that are difficult to pose any other way, then receive answers as structured data for further processing.

Here, for instance, is a query asking for things discovered or invented by anyone who has as one of their occupations "physicist":

(def inventions-and-discoveries
  (->> (query `{:select [?whatLabel ?whomLabel]
                :where [[?what ~(wdt :discoverer-or-inventor) ?whom]
                        [?whom ~(wdt :occupation) ~(entity "physicist")]]
                :limit 500})))
[{:whatLabel "World Wide Web" :whomLabel "Tim Berners-Lee"}
 {:whatLabel "Hypertext Transfer Protocol" :whomLabel "Tim Berners-Lee"}
 {:whatLabel "HyperText Markup Language" :whomLabel "Tim Berners-Lee"}
 {:whatLabel "Semantic Web" :whomLabel "Tim Berners-Lee"}
 {:whatLabel "WorldWideWeb" :whomLabel "Tim Berners-Lee"}
 {:whatLabel "Io" :whomLabel "Galileo Galilei"}
 {:whatLabel "Callisto" :whomLabel "Galileo Galilei"}
 {:whatLabel "Europa" :whomLabel "Galileo Galilei"}
 {:whatLabel "Ganymede" :whomLabel "Galileo Galilei"}
 {:whatLabel "Trapezium Cluster" :whomLabel "Galileo Galilei"}
 {:whatLabel "Galilean transformation" :whomLabel "Galileo Galilei"}
 {:whatLabel "solar variation" :whomLabel "Galileo Galilei"}
 {:whatLabel "Square-cube law" :whomLabel "Galileo Galilei"}
 {:whatLabel "Q1535340" :whomLabel "Galileo Galilei"}
 {:whatLabel "Galilean micrometer" :whomLabel "Galileo Galilei"}
 {:whatLabel "tholin" :whomLabel "Carl Sagan"}
 {:whatLabel "carbon chauvinism" :whomLabel "Carl Sagan"}
 {:whatLabel "Encyclopedia Galactica" :whomLabel "Carl Sagan"}
 {:whatLabel "fluorine" :whomLabel "André-Marie Ampère"}
 {:whatLabel "Ampère's force law" :whomLabel "André-Marie Ampère"}
 480 more elided]

Tabular data

It's great that we can retrieve this information as a sequence of maps that we can explore interactively in Clerk, but sometimes it's more pleasant to display data organized in a table view:

(clerk/table inventions-and-discoveries)
:whatLabel                     :whomLabel
World Wide Web                 Tim Berners-Lee
Hypertext Transfer Protocol    Tim Berners-Lee
HyperText Markup Language      Tim Berners-Lee
Semantic Web                   Tim Berners-Lee
WorldWideWeb                   Tim Berners-Lee
Io                             Galileo Galilei
Callisto                       Galileo Galilei
Europa                         Galileo Galilei
Ganymede                       Galileo Galilei
Trapezium Cluster              Galileo Galilei
Galilean transformation        Galileo Galilei
solar variation                Galileo Galilei
Square-cube law                Galileo Galilei
Q1535340                       Galileo Galilei
Galilean micrometer            Galileo Galilei
tholin                         Carl Sagan
carbon chauvinism              Carl Sagan
Encyclopedia Galactica         Carl Sagan
fluorine                       André-Marie Ampère
Ampère's force law             André-Marie Ampère
480 more elided

Once we see how a given table looks, we might decide that it would be better if, for example, these inventions were grouped by inventor. This is just the sort of thing that Clojure sequence functions can help us do:

(clerk/table (->> inventions-and-discoveries
                  (group-by :whomLabel)
                  (mapv (fn [[whom whats]]
                          [whom (apply str (interpose " ; " (map :whatLabel whats)))]))))
Julius Plücker                  Plücker surface
Paul Dirac                      Dirac equation ; Dirac spinor ; Dirac large numbers hypothesis
Hans Geiger                     Geiger counter
Karl Weierstraß                 (ε, δ)-definition of limit ; Q111354260
Friedrich Paschen               Paschen series
Pierre Curie                    polonium ; radium ; piezoelectricity ; Curie temperature
Alexander Lippisch              variometer
Friedrich Kohlrausch            Kohlrausch bridge
Johannes Stark                  Stark effect
Karl Schwarzschild              Schwarzschild telescope ; Schwarzschild effect
Johann Christian Poggendorff    potentiometer
J. J. Thomson                   electron ; electromagnetic waveguide ; discovery of electrons
Hendrik Lorentz                 Lorentz transformation
Hans Christian Ørsted           aluminium ; Oersted's law
Leonardo da Vinci               Leonardo's robot ; sfumato ; Coulomb friction ; Leonardo's crossbow ; Leonardo's
                                45 more elided
Otto von Guericke               Magdeburg hemispheres ; baroscope
Anaximander                     spontaneous generation
Pierre-Simon Laplace            nebular hypothesis
Albert Einstein                 general relativity ; special relativity ; mass–energy equivalence ; theory of re
                                157 more elided
David Hilbert                   epsilon calculus ; Hilbert's nineteenth problem
67 more elided
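
The same grouping idea extends to other summaries. As a minimal sketch (the names sample-rows and discoveries-by-count are hypothetical, and the sample is a hand-copied subset of the rows shown earlier rather than a fresh query), here is one way to count discoveries per person and sort the most prolific first:

```clojure
(require '[clojure.string :as str])

;; A small literal sample shaped like the query results above, so the
;; sketch runs without a network connection.
(def sample-rows
  [{:whatLabel "World Wide Web" :whomLabel "Tim Berners-Lee"}
   {:whatLabel "Io"             :whomLabel "Galileo Galilei"}
   {:whatLabel "tholin"         :whomLabel "Carl Sagan"}
   {:whatLabel "Semantic Web"   :whomLabel "Tim Berners-Lee"}
   {:whatLabel "Europa"         :whomLabel "Galileo Galilei"}
   {:whatLabel "Callisto"       :whomLabel "Galileo Galilei"}])

(defn discoveries-by-count
  "Groups rows by :whomLabel and returns one map per person, sorted by
  the number of discoveries in descending order."
  [rows]
  (->> rows
       (group-by :whomLabel)
       (map (fn [[whom whats]]
              {:whom  whom
               :count (count whats)
               :whats (str/join " ; " (map :whatLabel whats))}))
       (sort-by :count >)
       vec))

(discoveries-by-count sample-rows)
```

Because the result is again a sequence of maps, it should be possible to pass it to clerk/table in the same way as inventions-and-discoveries.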

Geospatial data

Some data are more naturally viewed in other ways, of course. In this example we find every instance of any subclass of "human settlement" (village, town, city, and so on) in Germany that has a German-language place name ending in -ow or -itz, endings that typically indicate that the place was originally named by speakers of a Slavic language.

(def slavic-place-names
  (->> `{:select *
         :where [{?ort {(cat ~(wdt :instance-of) (* ~(wdt :subclass-of))) #{~(entity "human settlement")}
                        ~(wdt :country) #{~(entity "Germany")}
                        :rdfs/label #{?name}
                        ~(wdt :coordinate-location) #{?lonlat}}}
                 [:filter (= (lang ?name) "de")]
                 [:filter (regex ?name "(ow|itz)$")]]
         :limit 1000}
       query
       ;; cleanup lon-lat formatting for map plot!
       (mapv #(let [[lon lat] (-> %
                                  :lonlat
                                  :value
                                  (str/replace #"[^0-9 \.]" "")
                                  (str/split #" "))]
                {:name (:name %) :latitude lat :longitude lon}))))
[{:latitude "53.98055556" :longitude "11.25138889" :name "Erprobungsstelle Tarnewitz"}
 {:latitude "51.5" :longitude "12.6833" :name "Flugplatz Mörtitz"}
 {:latitude "52.474444444" :longitude "13.138055555" :name "Flugplatz Gatow"}
 {:latitude "52.5128" :longitude "13.0515" :name "Flugplatz Döberitz"}
 {:latitude "49.648080555" :longitude "8.742275" :name "Lauten-Weschnitz"}
 {:latitude "51.1494" :longitude "13.7972" :name "Gomlitz"}
 {:latitude "53.49538" :longitude "13.37613" :name "Dewitz"}
 {:latitude "53.7089" :longitude "13.1319" :name "Burgwall Schossow"}
 {:latitude "53.4433" :longitude "12.3197" :name "Burgwall Zislow"}
 {:latitude "50.216667" :longitude "11.694722" :name "Gemarkung Wüstenselbitz"}
 {:latitude "53.279785" :longitude "11.553129" :name "Grabow"}
 {:latitude "53.8061" :longitude "13.2828" :name "Burgwall Groß Below"}
 {:latitude "53.8164" :longitude "13.2572" :name "Burgwall Hohenbüssow"}
 {:latitude "53.7594" :longitude "13.3794" :name "Burgwall Janow"}
 {:latitude "53.7967" :longitude "13.3056" :name "Burgwall Klempenow"}
 {:latitude "49.6588" :longitude "8.84065" :name "Weschnitz"}
 {:latitude "51.0175" :longitude "13.6589" :name "Niederpesterwitz"}
 {:latitude "51.136944444" :longitude "14.958611111" :name "Biesnitz"}
 {:latitude "51.0714" :longitude "13.6708" :name "Briesnitz"}
 {:latitude "50.8125" :longitude "12.9139" :name "Altchemnitz"}
 980 more elided]

The :coordinate-location in this query is the longitude/latitude position of each of these places in a somewhat unfortunate string format. The mapv at the end converts these lonlat strings into key/value pairs so Vega can plot the points on a map. This gives us a very clear picture of which parts of Germany were Slavic-speaking before the medieval eastward expansion of German settlement:

(v/vl {:width 650 :height 650
       :config {:projection {:type "mercator" :center [10.4515 51.1657]}}
       :layer [{:data {:url "https://raw.githubusercontent.com/AliceWi/TopoJSON-Germany/master/germany.json"
                       :format {:type "topojson" :feature "states"}}
                :mark {:type "geoshape" :fill "lightgray" :stroke "white"}}
               {:encoding {:longitude {:field "longitude" :type "quantitative"}
                           :latitude {:field "latitude" :type "quantitative"}}
                :mark "circle"
                :data {:values slavic-place-names}}]})
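
The coordinate cleanup in slavic-place-names strips every character except digits, spaces, and dots, which works for Germany but would silently drop the minus sign on western or southern coordinates, and it leaves the values as strings. A slightly more defensive sketch follows. It assumes the raw value is a WKT-style literal of the form "Point(longitude latitude)", which is how WikiData's query service usually serializes coordinates; the helper name parse-point is hypothetical and not part of Mundaneum.

```clojure
(defn parse-point
  "Parses a string such as \"Point(11.25 53.98)\" into a map of doubles.
  Returns nil when the string does not match the assumed format."
  [s]
  (when-let [[_ lon lat] (re-matches
                          #"(?i)\s*Point\(\s*(-?[0-9.]+)\s+(-?[0-9.]+)\s*\)\s*"
                          s)]
    {:longitude (Double/parseDouble lon)
     :latitude  (Double/parseDouble lat)}))

;; Longitude comes first in the assumed format, as in the destructuring
;; [lon lat] used in the notebook's own cleanup step.
(parse-point "Point(11.25138889 53.98055556)")
```

Returning numbers rather than strings also matches the "quantitative" field type declared in the Vega-Lite encoding. Coordinates written in scientific notation would not match this pattern and would come back as nil.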

Sometimes the data needs a more customized view. Happily, we can write arbitrary hiccup to be rendered in Clerk. We'll use this query to fetch a list of different species of Apodiformes (swifts and hummingbirds), returning the name in English and Japanese, an image of the bird itself, and a map of that bird's home range for each one.

(->> (query `{:select-distinct [?englishName ?japaneseName ?pic ?range]
              :where [[?item (* ~(wdt :parent-taxon)) ~(entity "Apodiformes")]
                      [?item ~(wdt :taxon-rank) ~(entity "species")]
                      [?item :rdfs/label ?englishName]
                      [?item :rdfs/label ?japaneseName]
                      [?item ~(wdt :image) ?pic]
                      [?item ~(wdt :taxon-range-map-image) ?range]
                      [:filter (= (lang ?englishName) "en")]
                      [:filter (= (lang ?japaneseName) "ja")]]
              :limit 9})
     (mapv #(vector :tr
                    [:td.w-32 (:englishName %)]
                    [:td.w-32 (:japaneseName %)]
                    [:td [:img.w-80 {:src (:pic %)}]]
                    [:td [:img.w-80 {:src (:range %)}]]))
     (into [:table
            [:tr
             [:th "English"]
             [:th "Japanese"]
             [:th "Photo"]
             [:th "Range"]]])
     clerk/html)
English                         Japanese                  Photo    Range
Bee Hummingbird                 マメハチドリ
Bee Hummingbird                 マメハチドリ
Sapphire-bellied Hummingbird    ルリハラハチドリ
Blue-fronted Lancebill          スミレビタイヤリハチドリ
Broad-tailed Hummingbird        フトオハチドリ
Blue-capped Puffleg             ズアオワタアシハチドリ
Broad-billed Hummingbird        アカハシハチドリ
Band-tailed Barbthroat          オビオヒゲハチドリ
Hoary Puffleg                   ハイイロアシゲハチドリ
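
The hiccup in this example is ordinary Clojure data, so the table-building step can be pulled out into a small pure function and checked at the REPL before handing the result to clerk/html. This is only a sketch: rows->hiccup-table is a hypothetical helper, not something Clerk provides, and it leaves out the image cells and CSS classes used in the notebook.

```clojure
(defn rows->hiccup-table
  "Builds a hiccup [:table ...] vector from a seq of header strings, a
  function that turns one row into a seq of cell values, and the rows."
  [headers row->cells rows]
  (into [:table (into [:tr] (map (fn [h] [:th h]) headers))]
        (map (fn [row]
               (into [:tr] (map (fn [cell] [:td cell]) (row->cells row))))
             rows)))

;; One row copied from the output above; the rest of the query result
;; would be handled the same way.
(rows->hiccup-table ["English" "Japanese"]
                    (juxt :englishName :japaneseName)
                    [{:englishName "Bee Hummingbird"
                      :japaneseName "マメハチドリ"}])
```

Keeping the data-to-hiccup step separate from the query makes it easy to try different layouts on a few literal rows without calling WikiData again.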

Network diagrams

Another useful technique when dealing with semantic or graph-shaped data is to visualize the results as a graph. Here we gather all the languages influenced by Lisp or by languages influenced by Lisp (a transitive query across the graph), and visualize them in a big network diagram.

Because Clerk's html viewer also understands SVGs, we can just plug in an existing graph visualization library and send the output to Clerk.

The graph is really huge, so you'll need to scroll around a bit to see all the languages.

(-> (clerk/html
     (let [data (query `{:select [?itemLabel ?influencedByLabel]
                         :where [[?item (* ~(wdt :influenced-by)) ~(entity "Lisp")]
                                 [?item ~(wdt :influenced-by) ?influencedBy]
                                 [?influencedBy (* ~(wdt :influenced-by)) ~(entity "Lisp")]]})]
       (arr/as-svg
        (arr/with-graph (arr/create-graph)
          (let [vertex (->> (mapcat (juxt :itemLabel :influencedByLabel) data)
                            distinct
                            (reduce #(assoc %1 %2 (arr/insert-vertex! %2)) {}))]
            (doseq [edge data]
              (when (:influencedByLabel edge)
                (arr/insert-edge! (vertex (:influencedByLabel edge))
                                  (vertex (:itemLabel edge))))))))))
    (assoc :nextjournal/width :full))
[Network diagram: programming languages influenced, directly or transitively, by Lisp]
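
Before drawing anything, it can help to look at the edge list as plain data. The sketch below mirrors the vertex-collection step from the diagram code and adds an index from each language to the languages it influenced. The sample-edges rows are illustrative stand-ins shaped like the query result, not actual output, and the function names are hypothetical.

```clojure
(def sample-edges
  ;; Hand-written rows in the same shape as the query result.
  [{:itemLabel "Scheme"      :influencedByLabel "Lisp"}
   {:itemLabel "Common Lisp" :influencedByLabel "Lisp"}
   {:itemLabel "Clojure"     :influencedByLabel "Common Lisp"}
   {:itemLabel "Clojure"     :influencedByLabel "Scheme"}])

(defn vertices
  "Every distinct label that appears at either end of an edge."
  [edges]
  (->> edges
       (mapcat (juxt :itemLabel :influencedByLabel))
       (remove nil?)
       distinct
       vec))

(defn influenced-index
  "Maps each influencing language to the set of languages it influenced."
  [edges]
  (reduce (fn [acc {:keys [itemLabel influencedByLabel]}]
            (if influencedByLabel
              (update acc influencedByLabel (fnil conj #{}) itemLabel)
              acc))
          {}
          edges))

(vertices sample-edges)
(influenced-index sample-edges)
```

An index like this could also be shown with clerk/table, or used to trim the graph to a few languages of interest before passing the remaining edges to Arrowic.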

I hope this gives you some ideas about things you might want to try!