Datalog for json munging

Hello all, my name is tommy and I love clojure. You can find me on github or email me. This article was created with clerk and published on clerk garden.

I am working on a gameserver plugin for my favorite fighting game MGE, and want to be able to run tournaments automatically on a weekly basis, similar to how many melee groups run low-stakes weekly tournaments.

Thankfully, the company challonge (owned now by logitech) hosts software that lets you create and run your own tournaments. It displays the bracket, handles tournament state, player signups, multistage tournaments (swiss/round-robin/group-stage), winners/losers brackets, and much more. It also lets you embed the (realtime updating) bracket as an iframe.

It also has an API. So let's get started. Below is an example tournament that I've created to decide which color is the best color. A couple of matches have already been played (apparently crimson is favorable over rust by 15 points...)

Below is the parsed json we get back from the challonge api after asking for all the matches of the pictured tournament. Challonge shapes its responses according to a standard called json-api. Json-api describes itself as something to stop bikeshedding questions about how json responses should be formatted.

Its uniformity somewhat enables the rest of what happens in this post.

^{::clerk/opts {:auto-expand-results? true}}
(def matches (clojure.edn/read-string (slurp "matches.edn")))
{:data [{:attributes {:identifier "A"
                      :pointsByParticipant [{:participantId 185178776 :scores [20]}
                                            {:participantId 185178777 :scores [3]}]
                      :round 1
                      :scoreInSets [[20 3]]
                      :scores "20 - 3"
                      :state "complete"
                      :suggestedPlayOrder 1
                      :timestamps {:createdAt "2022-11-19T22:13:54.469Z"
                                   :startedAt "2022-11-19T22:13:54.728Z"
                                   :underwayAt nil
                                   :updatedAt "2022-11-19T22:17:43.687Z"}
                      :winners 185178776}
         :id "296494105"
         :relationships {:attachments {:data [] :links {:meta {:count 0} :related "https://api.challonge.com/v2/tournaments/noewg2h2/matches/296494105/attachments.4 more elided"}}
                         :player1 {:data {:id "185178776" :type "participant"}
                                   :links {:related "https://api.challonge.com/v2/tournaments/noewg2h2/participants/185178776.json"}}
                         :player2 {:data {:id "185178777" :type "participant"}
                                   :links {:related "https://api.challonge.com/v2/tournaments/noewg2h2/participants/185178777.json"}}}
         :type "match"}
        {:attributes {:identifier "B"
                      :pointsByParticipant [{:participantId 185178775 :scores [20]}
                                            {:participantId 185178778 :scores [5]}]
                      :round 1
                      :scoreInSets [[20 5]]
                      :scores "20 - 5"
                      :state "complete"
                      :suggestedPlayOrder 2
                      :timestamps {:createdAt "2022-11-19T22:13:54.478Z"
                                   :startedAt "2022-11-19T22:13:54.753Z"
                                   :underwayAt nil
                                   :updatedAt "2022-11-19T22:17:53.837Z"}
                      :winners 185178775}
         :id "296494106"
         :relationships {:attachments {2 more elided}
                         :player1 {2 more elided}
                         :player2 {2 more elided}}
         :type "match"}
        {4 more elided}
        {4 more elided}
        {4 more elided}
        {4 more elided}
        {4 more elided}
        {4 more elided}
        {4 more elided}
        {4 more elided}]
 :included [11 more elided]
 :links {3 more elided}
 :meta {1 more elided}}

I asked for matches and got a very complicated piece of data. The :data key has a vector of 10 resources, each with a type, id, some attributes, and some relationships.

For each match in the tournament, its relationships are the two opponents facing off in that match. However, the participant nested in the match (get-in matches [:data 0 :relationships :player1]) carries very little information: just an id, a type, and a link. If you want to see any attributes of a player, you must look in the toplevel :included vector and find the resource with the matching id.

The :included vector contains all 11 participant resources (colors) in this tournament.
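As a minimal sketch of that lookup, assuming a trimmed-down response of the same shape: the sample data, its ids and names, and the helper name participant-by-id are all made up for illustration and are not part of the challonge API.

```clojure
;; Hypothetical, trimmed-down response with the same shape as the real one.
(def sample-response
  {:data     [{:id "m1" :type "match"
               :relationships {:player1 {:data {:id "p1" :type "participant"}}
                               :player2 {:data {:id "p2" :type "participant"}}}}]
   :included [{:id "p1" :type "participant" :attributes {:name "scarlet"}}
              {:id "p2" :type "participant" :attributes {:name "rust"}}]})

(defn participant-by-id
  "Find the full resource in :included that a resource identifier
   (a map with :id and :type) points at. Returns nil if there is none."
  [response {:keys [id type]}]
  (first (filter #(and (= id (:id %)) (= type (:type %)))
                 (:included response))))

;; The identifier nested in the match only has :id and :type ...
(def player1-ref
  (get-in sample-response [:data 0 :relationships :player1 :data]))

;; ... the attributes live on the included resource.
(get-in (participant-by-id sample-response player1-ref) [:attributes :name])
;; => "scarlet"
```

This is a linear scan per lookup, which is fine for a handful of participants; the join map built later in the post does the same job in one pass.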

All resources have a uniform shape.

(distinct (concat (map keys (:data matches))
(map keys (:included matches))))
((:id :type :attributes :relationships))

In each participant, there is a :misc key in the attributes map. This is a bit of data specific to my domain that challonge allows me to attach to the resource. In my case, it is the id of each player in my gameserver.

^{::clerk/opts {:auto-expand-results? true}}
(-> matches :included first)
{:attributes {:finalRank nil
              :groupId nil
              :icon nil
              :misc "acd53907-8d3b-4b56-948c-f5ef17bd3a9d"
              :name "scarlet"
              :seed 8
              :states {:active true}
              :timestamps {:createdAt "2022-11-19T22:17:15.521Z"
                           :updatedAt "2022-11-19T22:17:15.521Z"}
              :tournamentId 12116017
              :username nil}
 :id "185178776"
 :relationships {:invitation {} :tournament {}}
 :type "participant"}

At first glance this structure feels heavy, but it's well thought out and handles many edge cases. I personally would prefer to expose my data using something like pathom3, but I prefer the standardized json-api approach over having arbitrary shapes that are different for every company.

Answering questions about the tournament

Now that we have all this data, we have to answer questions about it to run the gameserver logic.

To run my tournament, I need to know which matches are active (not completed, but have two defined participants), so I can enable the correct arenas ingame. Each match has a "state" attribute that defines this, so we can just filter on that. Easy enough.

(defn matches->pending [matches]
  (->> matches
       :data
       (filter #(= "open" (:state (:attributes %))))))
^{::clerk/opts {:auto-expand-results? true}}
(matches->pending matches)
({:attributes {:identifier "C"
               :pointsByParticipant [{:participantId 185178774 :scores []}
                                     {:participantId 185178779 :scores []}]
               :round 1
               :scoreInSets []
               :scores "0 - 0"
               :state "open"
               :suggestedPlayOrder 3
               :timestamps {:createdAt "2022-11-19T22:13:54.485Z"
                            :startedAt "2022-11-19T22:13:54.770Z"
                            :underwayAt nil
                            :updatedAt "2022-11-19T22:13:54.770Z"}}
  :id "296494107"
  :relationships {:attachments {:data [] :links {:meta {:count 0} :related "https://api.challonge.com/v2/tournaments/noewg2h2/matches/296494107/attachments.4 more elided"}}
                  :player1 {:data {:id "185178774" :type "participant"}
                            :links {:related "https://api.challonge.com/v2/tournaments/noewg2h2/participants/185178774.json"}}
                  :player2 {:data {:id "185178779" :type "participant"}
                            :links {:related "https://api.challonge.com/v2/tournaments/noewg2h2/participants/185178779.json"}}}
  :type "match"}
 {:attributes {:identifier "D"
               :pointsByParticipant [{:participantId 185178768 :scores []}
                                     {:participantId 185178776 :scores []}]
               :round 2
               :scoreInSets []
               :scores "0 - 0"
               :state "open"
               :suggestedPlayOrder 5
               :timestamps {:createdAt "2022-11-19T22:13:54.493Z"
                            :startedAt "2022-11-19T22:17:43.718Z"
                            :underwayAt nil
                            :updatedAt "2022-11-19T22:17:43.718Z"}}
  :id "296494108"
  :relationships {:attachments {:data [] :links {:meta {:count 0} :related "https://api.challonge.com/v2/tournaments/noewg2h2/matches/296494108/attachments.4 more elided"}}
                  :player1 {:data {2 more elided} 1 more elided}
                  :player2 {2 more elided}}
  :type "match"}
 {4 more elided}
 {4 more elided})

This filters my 10 matches down to four, so now I know to allocate four match arenas in the game.

I also need to know which players to allow in which arenas, so I need the corresponding uuids stored in the :misc attribute.

Let's start by getting the players embedded in each match's :relationships map.

(defn matches->participating-user-ids [matches]
  (let [pending-matches (matches->pending matches)]
    (->> pending-matches
         (map :relationships)
         (map (juxt :player1 :player2))
         flatten)))
^{::clerk/opts {:auto-expand-results? true}}
(first (matches->participating-user-ids matches))
{:data {:id "185178774" :type "participant"}
 :links {:related "https://api.challonge.com/v2/tournaments/noewg2h2/participants/185178774.json"}}

Nice! But we're still not done. We have the player ids, but not the entire resource. The data we want lives in the included resources. Let's try again, but first build a join from challonge id to gameserver uuid...

(defn matches->participating-users [matches]
  (let [challongeid->myid (->> matches
                               :included
                               (map (juxt :id (comp :misc :attributes)))
                               (into {}))
        pending-matches   (matches->pending matches)]
    (->> pending-matches
         (map :relationships)
         (map (juxt :player1 :player2))
         flatten
         (map (comp :id :data))
         (map challongeid->myid))))
(matches->participating-users matches)
("90e9ec35-eaa8-46f8-85ec-ee6e2fd1114e"
 "71b73727-0c20-4a47-8a54-c307cee7ff71"
 "7557dfdf-953e-4687-a343-2a15097207a6"
 "acd53907-8d3b-4b56-948c-f5ef17bd3a9d"
 "6d1de091-8c63-43e7-a5fa-9e5bd1e4405f"
 "b5dd463a-6ba4-4772-ad6e-5848f822043e"
 "6983ea29-d4b8-44df-a452-2a7260ea6348"
 "17249836-9790-436a-b974-768c4e00c622")

Great! Now we have a list of uuids I can send to my gameserver. Slinging maps around like this is one of clojure's strengths, so this felt fairly natural to write.

Problems

This code works, but what about other questions we could ask of our api response? We went from match to gameserver uuid. What if we had a gameserver uuid and wanted to know which matches they are a part of? (the player runs a chat command to display their upcoming matches)

We would have to write more or less the same amount of code (~30 lines), this time building the join in the other direction.

Every new question we ask of the data requires a new set of functions to traverse the resource graph from question to answer, with very little code reuse, producing code that is hard to read.
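To make that concrete, here is a rough sketch of what the reverse lookup might look like in the same map-slinging style. The sample data and the names reverse-sample and uuid->pending-match-ids are hypothetical; only the response shape is taken from the api output above.

```clojure
;; Hypothetical response: p1 plays in one open and one completed match.
(def reverse-sample
  {:data     [{:id "m1" :type "match"
               :attributes {:state "open"}
               :relationships {:player1 {:data {:id "p1" :type "participant"}}
                               :player2 {:data {:id "p2" :type "participant"}}}}
              {:id "m2" :type "match"
               :attributes {:state "complete"}
               :relationships {:player1 {:data {:id "p1" :type "participant"}}
                               :player2 {:data {:id "p3" :type "participant"}}}}]
   :included [{:id "p1" :type "participant" :attributes {:misc "uuid-a"}}
              {:id "p2" :type "participant" :attributes {:misc "uuid-b"}}
              {:id "p3" :type "participant" :attributes {:misc "uuid-c"}}]})

(defn uuid->pending-match-ids
  "Ids of the open matches that the player with this gameserver uuid is in."
  [response uuid]
  (let [;; the same join as before, built in the opposite direction
        myid->challongeid (->> (:included response)
                               (map (juxt (comp :misc :attributes) :id))
                               (into {}))
        challonge-id      (myid->challongeid uuid)]
    (->> (:data response)
         (filter #(= "open" (get-in % [:attributes :state])))
         (filter (fn [match]
                   (and challonge-id
                        (some #(= challonge-id %)
                              [(get-in match [:relationships :player1 :data :id])
                               (get-in match [:relationships :player2 :data :id])]))))
         (map :id))))

(uuid->pending-match-ids reverse-sample "uuid-a")
;; => ("m1")
```

Almost nothing is shared with matches->participating-users except the filter on "open", which is the code-reuse problem described above.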

A better way

Enter xtdb, the graph database from JUXT. We are receiving a normalized graph of resources from the json-api endpoint and computing queries on it. Why not have xtdb run the queries for us?

Let's start an xtdb node. Instead of persisting the data in one of its pluggable backends, we keep everything in plain in-memory Java data structures by passing an empty map as the configuration.

^{::clerk/visibility {:code :show :result :hide}
:nextjournal.clerk/no-cache true}
(def node (xt/start-node {}))
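The snippets in this post rely on a few aliases (xt, clerk) that are never shown. A plausible namespace declaration, as a sketch: the namespace name is inferred from the printed function names, and the dependency coordinates in the comment (com.xtdb/xtdb-core for XTDB 1.x and io.github.nextjournal/clerk) are my assumption about the project setup, not something the notebook states.

```clojure
;; Assumed deps.edn entries (versions left for you to pick):
;;   com.xtdb/xtdb-core          -> provides xtdb.api
;;   io.github.nextjournal/clerk -> provides nextjournal.clerk
(ns jsonapi-xtdb.core
  (:require [clojure.edn]                      ; read-string for matches.edn
            [clojure.set]                      ; rename-keys
            [clojure.string]                   ; join
            [nextjournal.clerk :as clerk]      ; ::clerk/opts, ::clerk/visibility
            [xtdb.api :as xt]))                ; xt/start-node, xt/q, ::xt/put
```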

Xtdb is schemaless so we don't have to worry about defining any attributes ahead of time.

Although you can keep the documents nested and query them in datalog (you could run (get-in _ [:attributes :timestamps :startedAt]) directly in the query if you wanted), it's cleaner to flatten the data before ingesting it.

Let's use this (12-year-old!!) function to recursively flatten keys, joining nested keys with a "."

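(defn flatten-keys
  "adapted from http://blog.jayfields.com/2010/09/clojure-flatten-keys.html"
  ([m] (flatten-keys {} [] m))
  ([a ks m]
   (if (map? m)
     ;; start the reduce from {} so the result is always a map, even when
     ;; an empty nested map (which contributes no keys) comes first
     (reduce into {} (map (fn [[k v]] (flatten-keys a (conj ks k) v)) (seq m)))
     ;; leaf value: join the key path with "." into a single keyword
     (assoc a (keyword (clojure.string/join "." (map name ks))) m))))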

Let's see a before and after of just one resource.

^{::clerk/opts {:auto-expand-results? true}}
(->> matches :data first)
{:attributes {:identifier "A"
              :pointsByParticipant [{:participantId 185178776 :scores [20]}
                                    {:participantId 185178777 :scores [3]}]
              :round 1
              :scoreInSets [[20 3]]
              :scores "20 - 3"
              :state "complete"
              :suggestedPlayOrder 1
              :timestamps {:createdAt "2022-11-19T22:13:54.469Z"
                           :startedAt "2022-11-19T22:13:54.728Z"
                           :underwayAt nil
                           :updatedAt "2022-11-19T22:17:43.687Z"}
              :winners 185178776}
 :id "296494105"
 :relationships {:attachments {:data [] :links {:meta {:count 0} :related "https://api.challonge.com/v2/tournaments/noewg2h2/matches/296494105/attachments.4 more elided"}}
                 :player1 {:data {:id "185178776" :type "participant"}
                           :links {:related "https://api.challonge.com/v2/tournaments/noewg2h2/participants/185178776.json"}}
                 :player2 {:data {:id "185178777" :type "participant"}
                           :links {:related "https://api.challonge.com/v2/tournaments/noewg2h2/participants/185178777.json"}}}
 :type "match"}
^{::clerk/opts {:auto-expand-results? true}}
(flatten-keys (->> matches :data first))
{:attributes.identifier "A"
 :attributes.pointsByParticipant [{:participantId 185178776 :scores [20]}
                                  {:participantId 185178777 :scores [3]}]
 :attributes.round 1
 :attributes.scoreInSets [[20 3]]
 :attributes.scores "20 - 3"
 :attributes.state "complete"
 :attributes.suggestedPlayOrder 1
 :attributes.timestamps.createdAt "2022-11-19T22:13:54.469Z"
 :attributes.timestamps.startedAt "2022-11-19T22:13:54.728Z"
 :attributes.timestamps.underwayAt nil
 13 more elided}

Much better. We don't need to, but we could do the reverse of this operation because section 7.8.2 of the json-api spec forbids "." in key names.
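For the curious, a minimal sketch of that reverse operation. The name unflatten-keys is mine, and it assumes un-namespaced keys whose only dots are the ones introduced by flattening; nested maps that were empty before flattening left no keys behind, so they are not restored.

```clojure
(require '[clojure.string :as str])

(defn unflatten-keys
  "Rough inverse of flatten-keys: split each key on dots and rebuild
   the nesting with assoc-in."
  [m]
  (reduce-kv (fn [acc k v]
               (assoc-in acc (map keyword (str/split (name k) #"\.")) v))
             {}
             m))

(unflatten-keys {:id "296494105"
                 :attributes.round 1
                 :attributes.timestamps.underwayAt nil})
;; => {:id "296494105", :attributes {:round 1, :timestamps {:underwayAt nil}}}
```

Note that name drops a keyword's namespace, so a key like :xt/id would come back as :id; that would need separate handling if you wanted a faithful round trip of ingested documents.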

Now let's deal with toplevel responses.

Because of json-api's uniformity, we can have just one function that handles every possible api response. In particular, we use these properties:

  • Every toplevel response has :data and :included keys. They contain either a single resource, or a list of them.
  • Each resource has an :id key.

This function takes every resource from the response, flattens it, and puts it into our database node.

Notice how we transform the challonge notion of :id to :xt/id, which every xtdb document needs.

(defn ingest-jsonapi-response [node resp]
  (let [payload (->> resp
                     ((juxt :data :included))
                     (map (fn [res] (cond
                                      (map? res) [res]
                                      (vector? res) res)))
                     flatten
                     (map flatten-keys)
                     (map #(clojure.set/rename-keys % {:id :xt/id})))]
    (xt/await-tx node (xt/submit-tx node (for [doc payload]
                                           [::xt/put doc])))))
(ingest-jsonapi-response node matches)
{:xtdb.api/tx-id 0 :xtdb.api/tx-time #inst "2024-11-16T16:47:11.042-00:00"}
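The "single resource or list" normalization can be pulled out of the ingest function and exercised without a database. A sketch under the assumption that a response might also omit :included or carry nil there, which is stricter than what the function above handles; the names resources and resource->doc are hypothetical.

```clojure
(require '[clojure.set :as set])

(defn resources
  "Every resource in a response as one flat seq, whether :data and
   :included hold a single map, a vector of maps, or nothing at all."
  [resp]
  (mapcat (fn [x]
            (cond
              (map? x)        [x]    ; a single resource
              (sequential? x) x      ; a list of resources
              :else           []))   ; nil or missing key
          ((juxt :data :included) resp)))

(defn resource->doc
  "Rename the json-api :id to the :xt/id that xtdb expects."
  [res]
  (set/rename-keys res {:id :xt/id}))

(map resource->doc
     (resources {:data     {:id "1" :type "match"}
                 :included [{:id "2" :type "participant"}]}))
;; => ({:type "match", :xt/id "1"} {:type "participant", :xt/id "2"})
```

Using mapcat instead of flatten keeps the step from descending into anything other than the two toplevel collections.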

Datalog is beautiful

Now let's ask all the same questions as before, this time in a declarative style.

(defn xtdb-matches->pending
  "see which matches are pending (need to and can be played)"
  [node]
  (map first (xt/q (xt/db node) '{:find [matchid]
                                  :where [[matchid :type "match"]
                                          [matchid :attributes.state "open"]]})))

If you have never seen datalog before, don't fret. This query can be read as

find the id of every document that has the key :type with a value of "match", and the key :attributes.state with a value of "open".

Each triple in the :where clause defines a rule/restriction of the shape [entity attribute value] that the returned documents must comply with.

The effect of declaring that an attribute must match a data literal, as in [match :attributes.state "open"], is to filter documents. This line is like my matches->pending function above, which took my 10 matches down to only 4 open ones. This is nice declarative filtering, but nothing mind-shattering.

The magic comes when you put a variable in the value position of the triple. Now any time you use this variable again (in entity or value position of another rule), the values must match up.

^{::clerk/opts {:auto-expand-results? true}}
(xtdb-matches->pending node)
("296494107"
 "296494108"
 "296494110"
 "296494109")
(defn xtdb-matches->participating-users
  "see which matches are pending, and which users are in those matches"
  [node]
  (xt/q (xt/db node) '{:find [match p1uuid p1name p2uuid p2name]
                       :where [[match :type "match"]
                               [match :attributes.state "open"]
                               [match :relationships.player1.data.id p1]
                               [match :relationships.player2.data.id p2]
                               [p1 :attributes.misc p1uuid]
                               [p1 :attributes.name p1name]
                               [p2 :attributes.misc p2uuid]
                               [p2 :attributes.name p2name]]}))

Here we extract the p1 and p2 values from the match relationships, then use those values as entity ids to extract information about the player.

^{::clerk/opts {:auto-expand-results? true}}
(xtdb-matches->participating-users node)
#{["296494107" "90e9ec35-eaa8-46f8-85ec-ee6e2fd1114e" "coral" "71b73727-0c20-4a47-8a54-c307cee7ff71" "salmon"]
  ["296494108" "7557dfdf-953e-4687-a343-2a15097207a6" "red" "acd53907-8d3b-4b56-948c-f5ef17bd3a9d" "scarlet"]
  ["296494109" "6d1de091-8c63-43e7-a5fa-9e5bd1e4405f" "orange" "b5dd463a-6ba4-4772-ad6e-5848f822043e" "maroon"]
  ["296494110" "6983ea29-d4b8-44df-a452-2a7260ea6348" "green" "17249836-9790-436a-b974-768c4e00c622" "crimson"]}
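If the way a shared variable joins documents still feels opaque, one way to read such a query is as a nested comprehension in plain Clojure. This is only a mental model over hypothetical flattened documents (the docs vector and open-match-names are made up for illustration) and says nothing about how xtdb actually evaluates queries.

```clojure
;; Hypothetical flattened documents, shaped like the ingested ones.
(def docs
  [{:xt/id "m1" :type "match" :attributes.state "open"
    :relationships.player1.data.id "p1"
    :relationships.player2.data.id "p2"}
   {:xt/id "m2" :type "match" :attributes.state "complete"
    :relationships.player1.data.id "p1"
    :relationships.player2.data.id "p3"}
   {:xt/id "p1" :type "participant" :attributes.name "red"}
   {:xt/id "p2" :type "participant" :attributes.name "scarlet"}
   {:xt/id "p3" :type "participant" :attributes.name "coral"}])

(defn open-match-names
  "Each :when mirrors one clause; reusing p1/p2 means the value found on
   the match must equal the id of the participant document."
  [docs]
  (set
   (for [match docs
         :when (= "match" (:type match))             ; [match :type "match"]
         :when (= "open" (:attributes.state match))  ; [match :attributes.state "open"]
         p1 docs                                     ; [match :relationships.player1.data.id p1]
         :when (= (:xt/id p1) (:relationships.player1.data.id match))
         p2 docs                                     ; [match :relationships.player2.data.id p2]
         :when (= (:xt/id p2) (:relationships.player2.data.id match))]
     [(:xt/id match) (:attributes.name p1) (:attributes.name p2)])))

(open-match-names docs)
;; => #{["m1" "red" "scarlet"]}
```

The difference is that the datalog version does not care which direction you start from, whereas this comprehension hard-codes match first, players second.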

Now the reverse operation, (find match ids from gameserver uuid), is trivial.

(defn xtdb-uuid->matches
  "find the challonge match ids that a given player is currently allowed to play in"
  [node uuid]
  (xt/q (xt/db node) '{:find [match]
                       :where [[p1 :attributes.misc uuid]
                               (or [match :relationships.player1.data.id p1]
                                   [match :relationships.player2.data.id p1])
                               [match :attributes.state "open"]]
                       :in [uuid]}
        uuid))
(xtdb-uuid->matches node "90e9ec35-eaa8-46f8-85ec-ee6e2fd1114e")
#{["296494107"]}

Or we can ask, given two player names, which matches they are in.

(defn xtdb-names->match
  "find the open challonge match ids in which the two named players face each other"
  [node p1name p2name]
  (xt/q (xt/db node) '{:find [match]
                       :where [[p1 :type "participant"]
                               [p2 :type "participant"]
                               [p1 :attributes.name p1name]
                               [p2 :attributes.name p2name]
                               [match :type "match"]
                               [match :attributes.state "open"]
                               (or (and
                                    [match :relationships.player1.data.id p1]
                                    [match :relationships.player2.data.id p2])
                                   (and
                                    [match :relationships.player1.data.id p2]
                                    [match :relationships.player2.data.id p1]))]
                       :in [p1name p2name]}
        p1name
        p2name))
(xtdb-names->match node "red" "scarlet")
#{["296494108"]}

Datalog can also run arbitrary code inside queries. Here, the challonge api gives us the winner of a match as an integer, even though the primary keys are all strings. This is easily rectified by calling str on the key before using it further.

(xt/q (xt/db node) '{:find [winnername]
                     :where [[match :type "match"]
                             [match :attributes.winners winner']
                             [(str winner') winner]
                             [winner :type "participant"]
                             [winner :attributes.name winnername]]})
#{["crimson"] ["scarlet"]}

We can also run arbitrary predicates on logic variables (not=) and access nested documents (get-in).

(xt/q (xt/db node) '{:find [points participant]
                     :where [[match :type "match"]
                             [match :attributes.pointsByParticipant points']
                             [(get-in points' [:scores 0]) points]
                             [(get-in points' [:participantId]) participant]
                             [(not= nil points)]]})
#{[3 185178777] [5 185178778] [20 185178775] [20 185178776]}

There is much more you can do with datalog; almost any question you could want to ask is answerable. See here for more examples.

Conclusion

Thanks to json-api, every response to further api calls moving the tournament forward can be run through ingest-jsonapi-response and be otherwise forgotten.

Before xtdb, I had to write code to answer each of my questions. After xtdb, every question I can ask of my data is already answered, I just have to phrase the question!