Semantic Queries

Clerk can be very helpful when exploring any kind of data, including the sorts of things for which we might turn to the Semantic Web. To give a sense of what that's like, this notebook gives some examples of querying WikiData for facts about the world.

First, we bring in Clerk, the Clerk viewer helpers, Mundaneum (a WikiData wrapper that uses a Datomic-like syntax), and Arrowic (to draw graphviz-style box-and-arrow graphs).

(ns semantic
  (:require [clojure.string :as str]
            [nextjournal.clerk :as clerk]
            [nextjournal.clerk.viewer :as v]
            [applied-science.mundaneum.properties :refer [wdt]]
            [applied-science.mundaneum.query :refer [describe entity label query]]
            [arrowic.core :as arr]))

Now we can ask questions, like "what is James Clerk Maxwell famous for having invented or discovered?"

(query `{:select [?what]
         :where  [[?what ~(wdt :discoverer-or-inventor) ~(entity "James Clerk Maxwell")]]})
[{:what :wd/Q1080745}]

The WikiData internal ID :wd/Q1080745 doesn't immediately mean much to a human, so we'll try again, appending Label to the end of the ?what logic variable so that we see a human-readable label for that item:

(query `{:select [?whatLabel]
         :where  [[?what ~(wdt :discoverer-or-inventor) ~(entity "James Clerk Maxwell")]]})
[{:whatLabel "unified field theory"}]

Ah, better. 😊 This ceremony is required because WikiData uses a language-neutral data representation internally, leaving us with an extra step to get readable results. This can be a little annoying, but it does have benefits. For example, we can ask for an entity's label in every language for which it has been specified in WikiData:

(query `{:select [?what ?label]
         :where  [[?what ~(wdt :discoverer-or-inventor) ~(entity "James Clerk Maxwell")]
                  [?what :rdfs/label ?label]]})
[{:label "نظرية الحقل الموحد" :what :wd/Q1080745}
 {:label "teoría del campu unificáu" :what :wd/Q1080745}
 {:label "Адзіная тэорыя поля" :what :wd/Q1080745}
 {:label "Единна теория на полето" :what :wd/Q1080745}
 {:label "damkan an dachenn unvanet" :what :wd/Q1080745}
 {:label "teoria de camp unificat" :what :wd/Q1080745}
 {:label "بیردۆزی بواری یەکگرتوو" :what :wd/Q1080745}
 {:label "damcaniaeth maes cyffredinol" :what :wd/Q1080745}
 {:label "Samlet feltteori" :what :wd/Q1080745}
 {:label "Einheitliche Feldtheorie" :what :wd/Q1080745}
 {:label "unified field theory" :what :wd/Q1080745}
 {:label "Unified field theory" :what :wd/Q1080745}
 {:label "Unified field theory" :what :wd/Q1080745}
 {:label "unuiĝinta kampoteorio" :what :wd/Q1080745}
 {:label "teoría del campo unificado" :what :wd/Q1080745}
 {:label "eremu-teoria bateratu" :what :wd/Q1080745}
 {:label "نظریه میدان واحد" :what :wd/Q1080745}
 {:label "Théorie du champ unifié" :what :wd/Q1080745}
 {:label "Teoiric an aonréimse" :what :wd/Q1080745}
 {:label "תאוריית השדה המאוחד" :what :wd/Q1080745}
 19 more elided]

One of the nice things about data encoded as a knowledge graph is that we can ask questions that are difficult to pose any other way, then receive answers as structured data for further processing.

Here, for instance, is a query asking for things discovered or invented by anyone who has as one of their occupations "physicist":

(def inventions-and-discoveries
  (query `{:select [?whatLabel ?whomLabel]
           :where  [[?what ~(wdt :discoverer-or-inventor) ?whom]
                    [?whom ~(wdt :occupation) ~(entity "physicist")]]
           :limit  500}))
[{:whatLabel "World Wide Web" :whomLabel "Tim Berners-Lee"}
 {:whatLabel "Hypertext Transfer Protocol" :whomLabel "Tim Berners-Lee"}
 {:whatLabel "HyperText Markup Language" :whomLabel "Tim Berners-Lee"}
 {:whatLabel "Semantic Web" :whomLabel "Tim Berners-Lee"}
 {:whatLabel "WorldWideWeb" :whomLabel "Tim Berners-Lee"}
 {:whatLabel "Io" :whomLabel "Galileo Galilei"}
 {:whatLabel "Callisto" :whomLabel "Galileo Galilei"}
 {:whatLabel "Europa" :whomLabel "Galileo Galilei"}
 {:whatLabel "Ganymede" :whomLabel "Galileo Galilei"}
 {:whatLabel "Trapezium Cluster" :whomLabel "Galileo Galilei"}
 {:whatLabel "Galilean transformation" :whomLabel "Galileo Galilei"}
 {:whatLabel "solar variation" :whomLabel "Galileo Galilei"}
 {:whatLabel "Square-cube law" :whomLabel "Galileo Galilei"}
 {:whatLabel "Q1535340" :whomLabel "Galileo Galilei"}
 {:whatLabel "Galilean micrometer" :whomLabel "Galileo Galilei"}
 {:whatLabel "tholin" :whomLabel "Carl Sagan"}
 {:whatLabel "carbon chauvinism" :whomLabel "Carl Sagan"}
 {:whatLabel "Encyclopedia Galactica" :whomLabel "Carl Sagan"}
 {:whatLabel "fluorine" :whomLabel "André-Marie Ampère"}
 {:whatLabel "Ampère's force law" :whomLabel "André-Marie Ampère"}
 480 more elided]

Tabular data

It's great that we can retrieve this information as a sequence of maps that we can explore interactively in Clerk, but sometimes it's more pleasant to display data organized in a table view:

(clerk/table inventions-and-discoveries)
:whatLabel                   :whomLabel
World Wide Web               Tim Berners-Lee
Hypertext Transfer Protocol  Tim Berners-Lee
HyperText Markup Language    Tim Berners-Lee
Semantic Web                 Tim Berners-Lee
WorldWideWeb                 Tim Berners-Lee
Io                           Galileo Galilei
Callisto                     Galileo Galilei
Europa                       Galileo Galilei
Ganymede                     Galileo Galilei
Trapezium Cluster            Galileo Galilei
Galilean transformation      Galileo Galilei
solar variation              Galileo Galilei
Square-cube law              Galileo Galilei
Q1535340                     Galileo Galilei
Galilean micrometer          Galileo Galilei
tholin                       Carl Sagan
carbon chauvinism            Carl Sagan
Encyclopedia Galactica       Carl Sagan
fluorine                     André-Marie Ampère
Ampère's force law           André-Marie Ampère
480 more elided

Once we see how a given table looks, we might decide that it would be better if, for example, these inventions were grouped by inventor. This is just the sort of thing that Clojure sequence functions can help us do:

(clerk/table (->> inventions-and-discoveries
                  (group-by :whomLabel)
                  (mapv (fn [[whom whats]]
                          [whom (str/join " ; " (map :whatLabel whats))]))))
Enrico Fermi           Monte Carlo method ; Fermi–Pasta–Ulam–Tsingou problem ; Fermi resonance ; Metrop [58 more elided]
Al-Biruni              pycnometer
Leonardo da Vinci      Leonardo's robot ; sfumato ; Coulomb friction ; Leonardo's crossbow ; Leonardo's [45 more elided]
Joseph Fourier         Fourier–Motzkin elimination ; heat equation
Albert Einstein        general relativity ; special relativity ; mass–energy equivalence ; theory of re [157 more elided]
Luigi Galvani          galvanic cell ; galvanotherapy
Carl Sagan             tholin ; carbon chauvinism ; Encyclopedia Galactica
André-Marie Ampère     fluorine ; Ampère's force law ; Q3733125
Blaise Pascal          mathematical induction ; Pascal's Wager ; Pascal's calculator ; Pascal's barrel
Galileo Galilei        Io ; Callisto ; Europa ; Ganymede ; Trapezium Cluster ; Galilean transformation [68 more elided]
Carl Friedrich Gauss   least squares method ; discrete logarithm ; Gauss's principle of least constrain [159 more elided]
Edward Teller          Metropolis–Hastings algorithm
Archimedes             Archimedean spiral ; claw of Archimedes
Max Planck             Planck constant
Dmitri Mendeleev       periodic table ; Mendeleev's predicted elements ; periodic trends
Nikola Tesla           remote control ; three-phase electric power ; Tesla coil ; Tesla valve ; radio r [44 more elided]
Marie Curie            polonium ; radium
Tim Berners-Lee        World Wide Web ; Hypertext Transfer Protocol ; HyperText Markup Language ; Seman [22 more elided]
Thomas Alva Edison     incandescent light bulb ; phonograph ; phonograph cylinder ; Tasimeter ; phonome [51 more elided]
Augustin-Louis Cauchy  Root test ; Burnside's lemma ; infinitesimal deformation theory
20 more elided
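
The same sequence functions can answer other questions about these rows. As a rough sketch, here is one way to count discoveries per person and sort the busiest inventors first. The names sample-rows and discoveries-per-person are made up for this illustration, and sample-rows is a small hand-copied subset of the results above rather than fresh query output.

```clojure
;; Hypothetical sample: a few rows copied by hand from the results above.
(def sample-rows
  [{:whatLabel "World Wide Web" :whomLabel "Tim Berners-Lee"}
   {:whatLabel "Io"             :whomLabel "Galileo Galilei"}
   {:whatLabel "Callisto"       :whomLabel "Galileo Galilei"}
   {:whatLabel "tholin"         :whomLabel "Carl Sagan"}
   {:whatLabel "Europa"         :whomLabel "Galileo Galilei"}
   {:whatLabel "Semantic Web"   :whomLabel "Tim Berners-Lee"}])

(defn discoveries-per-person
  "Returns a vector of {:whom name :count n} maps, highest count first,
  with ties broken alphabetically by name."
  [rows]
  (->> rows
       (map :whomLabel)
       frequencies
       ;; negate the count so that sort-by orders it descending
       (sort-by (juxt (comp - val) key))
       (mapv (fn [[whom n]] {:whom whom :count n}))))

(discoveries-per-person sample-rows)
```

Because the result is again a sequence of maps, it could be handed to clerk/table in the same way as inventions-and-discoveries.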

Geospatial data

Some data are more naturally viewed in other ways, of course. In this example we find every instance of any subclass of "human settlement" (village, town, city, and so on) in Germany that has a German-language place name ending in -ow or -itz, both of which indicate that it was originally named by speakers of a Slavic language.

(def slavic-place-names
  (->> `{:select *
         :where  [{?ort {(cat ~(wdt :instance-of) (* ~(wdt :subclass-of))) #{~(entity "human settlement")}
                         ~(wdt :country) #{~(entity "Germany")}
                         :rdfs/label #{?name}
                         ~(wdt :coordinate-location) #{?lonlat}}}
                  [:filter (= (lang ?name) "de")]
                  [:filter (regex ?name "(ow|itz)$")]]
         :limit  1000}
       query
       ;; clean up the lon/lat formatting for the map plot
       (mapv #(let [[lon lat] (-> %
                                  :lonlat
                                  :value
                                  (str/replace #"[^0-9 \.]" "")
                                  (str/split #" "))]
                {:name (:name %) :latitude lat :longitude lon}))))
[{:latitude "52.5128" :longitude "13.0515" :name "Flugplatz Döberitz"}
 {:latitude "51.5" :longitude "12.6833" :name "Flugplatz Mörtitz"}
 {:latitude "53.98055556" :longitude "11.25138889" :name "Erprobungsstelle Tarnewitz"}
 {:latitude "52.474444444" :longitude "13.138055555" :name "Flugplatz Gatow"}
 {:latitude "49.648080555" :longitude "8.742275" :name "Lauten-Weschnitz"}
 {:latitude "51.0175" :longitude "13.6589" :name "Niederpesterwitz"}
 {:latitude "49.6588" :longitude "8.84065" :name "Weschnitz"}
 {:latitude "53.279785" :longitude "11.553129" :name "Grabow"}
 {:latitude "50.3084831" :longitude "11.2381361" :name "Schierschnitz"}
 {:latitude "51.1494" :longitude "13.7972" :name "Gomlitz"}
 {:latitude "53.8061" :longitude "13.2828" :name "Burgwall Groß Below"}
 {:latitude "53.8164" :longitude "13.2572" :name "Burgwall Hohenbüssow"}
 {:latitude "53.7594" :longitude "13.3794" :name "Burgwall Janow"}
 {:latitude "53.7967" :longitude "13.3056" :name "Burgwall Klempenow"}
 {:latitude "53.7089" :longitude "13.1319" :name "Burgwall Schossow"}
 {:latitude "53.4433" :longitude "12.3197" :name "Burgwall Zislow"}
 {:latitude "51.612" :longitude "12.7844" :name "Gniebitz"}
 {:latitude "49.4733" :longitude "11.2842" :name "Grüne Au bei Röthenbach an der Pegnitz"}
 {:latitude "53.49538" :longitude "13.37613" :name "Dewitz"}
 {:latitude "51.0256" :longitude "13.7422" :name "Räcknitz/Zschertnitz"}
 980 more elided]

The :coordinate-location in this query is the longitude/latitude position of each of these places in a somewhat unfortunate string format. The mapv at the end converts these lonlat strings into key/value pairs so Vega can plot the points on a map. This gives us a very clear picture of which parts of present-day Germany were once settled by speakers of Slavic languages:

(v/vl {:width 650 :height 650
       :config {:projection {:type "mercator" :center [10.4515 51.1657]}}
       :layer [{:data {:url "https://raw.githubusercontent.com/deldersveld/topojson/master/countries/germany/germany-regions.json"
                       :format {:type "topojson" :feature "DEU_adm2"}}
                :mark {:type "geoshape" :fill "lightgray" :stroke "white"}}
               {:encoding {:longitude {:field "longitude" :type "quantitative"}
                           :latitude {:field "latitude" :type "quantitative"}}
                :mark "circle"
                :data {:values slavic-place-names}}]})
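
The cleanup step in slavic-place-names strips every character that is not a digit, space, or dot, which leaves the coordinates as strings and would silently drop the minus sign of a negative coordinate. That is harmless for Germany, but a stricter parser might be useful elsewhere. The sketch below assumes the coordinate strings look like "Point(13.0515 52.5128)", with longitude first, as the cleanup above implies. The name parse-point is made up for this example and is not part of Mundaneum.

```clojure
(defn parse-point
  "Parses a string such as \"Point(13.0515 52.5128)\" into a map of
  numeric :longitude and :latitude. Returns nil if the string does not
  have the assumed shape."
  [s]
  (when-let [[_ lon lat]
             (re-find #"(?i)point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)"
                      (str s))]
    {:longitude (Double/parseDouble lon)
     :latitude  (Double/parseDouble lat)}))

(parse-point "Point(13.0515 52.5128)")
(parse-point "Point(-70.5 -33.4)")   ; minus signs survive
(parse-point "not a point")          ; nil instead of garbage
```

Returning numbers rather than strings also matches the "quantitative" type declared in the Vega encoding.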

Sometimes the data needs a more customized view. Happily, we can write arbitrary hiccup to be rendered in Clerk. We'll use this query to fetch a list of different species of Apodiformes (swifts and hummingbirds), returning a name, image, and map of home range for each one.

(->> (query `{:select-distinct [?item ?itemLabel ?pic ?range]
              :where [[?item (* ~(wdt :parent-taxon)) ~(entity "Apodiformes")]
                      [?item ~(wdt :taxon-rank) ~(entity "species")]
                      [?item :rdfs/label ?englishName]
                      [?item ~(wdt :image) ?pic]
                      [?item ~(wdt :taxon-range-map-image) ?range]
                      [:filter (= (lang ?englishName) "en")]]
              :limit 11})
     (mapv #(vector :tr
                    [:td.w-32 (:itemLabel %)]
                    [:td [:img.w-80 {:src (:pic %)}]]
                    [:td [:img.w-80 {:src (:range %)}]]))
     (into [:table])
     clerk/html)
Green-tailed Goldenthroat
Lucifer Sheartail
Lucifer Sheartail
Buffy Hummingbird
Andean emerald
Santa Marta Woodstar
Ruby-throated Hummingbird
Ruby-throated Hummingbird
Buff-tailed Coronet
Spot-throated Hummingbird
Bronzy Inca

Network diagrams

Another useful technique when dealing with semantic or graph-shaped data is to visualize the results as a graph. Here we gather all the languages influenced by Lisp or by languages influenced by Lisp (a transitive query across the graph), and visualize them in a big network diagram.

Because Clerk's html viewer also understands SVGs, we can just plug in an existing graph visualization library and send the output to Clerk.

The graph is really huge, so you'll need to scroll around a bit to see all the languages.

(-> (clerk/html
     (let [data (query `{:select [?itemLabel ?influencedByLabel]
                         :where [[?item (* ~(wdt :influenced-by)) ~(entity "Lisp")]
                                 [?item ~(wdt :influenced-by) ?influencedBy]
                                 [?influencedBy (* ~(wdt :influenced-by)) ~(entity "Lisp")]]})]
       (arr/as-svg
        (arr/with-graph (arr/create-graph)
          (let [vertex (->> (mapcat (juxt :itemLabel :influencedByLabel) data)
                            distinct
                            (reduce #(assoc %1 %2 (arr/insert-vertex! %2)) {}))]
            (doseq [edge data]
              (when (:influencedByLabel edge)
                (arr/insert-edge! (vertex (:influencedByLabel edge))
                                  (vertex (:itemLabel edge))))))))))
    (assoc :nextjournal/width :full))
[Figure: network diagram of the languages influenced, directly or transitively, by Lisp. Node labels include Scheme, Common Lisp, Clojure, Emacs Lisp, Smalltalk, Ruby, JavaScript, and Haskell, among many others.]
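
Before drawing a graph this large, it can help to look at the same rows as plain data. The sketch below mirrors the vertex-collection step from the code above and also builds an adjacency map. The names sample-edges, vertex-labels, and influences are invented for this illustration, and sample-edges is a tiny hand-written stand-in in the shape the query returns, not actual query output.

```clojure
;; Hypothetical sample rows, written by hand for illustration only.
(def sample-edges
  [{:itemLabel "Scheme"      :influencedByLabel "Lisp"}
   {:itemLabel "Clojure"     :influencedByLabel "Common Lisp"}
   {:itemLabel "Clojure"     :influencedByLabel "Scheme"}
   {:itemLabel "Common Lisp" :influencedByLabel "Lisp"}])

(defn vertex-labels
  "Every distinct label that appears on either end of an edge,
  in order of first appearance."
  [edges]
  (->> edges
       (mapcat (juxt :itemLabel :influencedByLabel))
       (remove nil?)
       distinct
       vec))

(defn influences
  "Map from a language to the sorted set of languages it influenced."
  [edges]
  (reduce (fn [m {:keys [itemLabel influencedByLabel]}]
            (update m influencedByLabel (fnil conj (sorted-set)) itemLabel))
          {}
          edges))

(vertex-labels sample-edges)
(influences sample-edges)
```

A map like this is easy to explore in Clerk's default viewer, and it can be filtered down before anything is handed to the graph library.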

I hope this gives you some ideas about things you might want to try!