Building a URL shortener with Redis & Clojure

I have recently developed an anonymous URL shortener service which you can find at encog.io. It can store a URL and assign it an autogenerated ID or the user can optionally choose an alias for it. In this post I'll talk about how it uses Redis for storage. The code is open source and you can find it on GitHub.

URL Shortener

A URL shortener is a web service that takes URLs and assigns them an alphanumeric ID. Visiting the URL associated with the ID redirects to the original. Short and meaningful links such as encog.io/code look better for sharing them with the world.

A key-value store such as Redis fits our use case considering our storage needs: given a URL we need to store it associated with an alphanumeric ID and then want to be able to retrieve the original URL given the alphanumeric ID.

Redis

Redis is an in-memory data structure key-value store that can be used as a database, cache or message broker. The strings are the most fundamental Redis data type and the only one we will need.

Even though is a key-value store and doesn't have a query language, Redis supports many data types with a rich set of commands, transactions and LUA scripting.

Clients interact with a Redis server using a protocol called RESP: a simple, human-readable protocol with a request-response model. The redis-cli command-line client can be used to interact with a Redis server from the console.

$ redis-cli
127.0.0.1:6379> PING
PONG
127.0.0.1:6379>

For talking to Redis from Clojure we use the excellent Carmine library. It provides a wcar macro that acquires a connection to Redis and inside its body we can use functions from the taoensso.carmine namespace that correspond to Redis commands.

(ns shortener
  (:require
   [taoensso.carmine :as car :refer [wcar]]))

(def redis {:spec {:host "127.0.0.1"}})

(wcar redis
  (car/ping))
;;=> "PONG"

ID generation

Redis allows to treat strings as 64-bit signed integers provided that the string can be represented as an integer. Commands like INCR can be used to manipulate keys containing numerical values.

Redis stores integers in their integer representation, so for string values that actually hold an integer, there is no overhead for storing the string representation of the integer.

Using INCR on a key that doesn't exist sets it to 0 before performing the increment, so we can use INCR to treat a key as a counter.

(defn unique-id
  [conn]
  (wcar conn
    (car/incr "unique-id.counter")))

(unique-id redis)
;;=> 1
(unique-id redis)
;;=> 2
(unique-id redis)
;;=> 3

Since we want the IDs to be short, human-readable and URL friendly we'll use base-encoding. The most popular base encoding for URL strings is base 64, but encog.io uses a custom alphabet which includes the letter Ñ.

(def alphabet
  "abcdefghijklmnñopqrstuvwxyzABCDEFGHIJKLMNÑOPQRSTUVWXYZ1234567890_-")

(def base (count alphabet))
;;=> 66

It has a couple more characters than the base-64 alphabet so we'll have to write the base encoding function for the custom alphabet.

(defn base-encode
  [input]
  (loop [n input
         res ""]
    (cond
      (zero? input) (subs alphabet 0 1)
      (zero? n) res
      :else
      (recur (quot n base)
             (str (nth alphabet (rem n base)) res)))))

(base-encode 0)
;;=> "a"

(base-encode 1)
;;=> "b"

(base-encode 9999999999999)
;;=> "b1-gJÑ4j"

Storing URLs

With the ID generation and encoding solved we can write the functions for storing and retrieving URLs. The store-url! function takes a Redis connection and a URL, returning its ID after storing it in Redis.

(def urls-prefix "url-shortener.id:")

(defn store-url!
  [conn url]
  (let [id (unique-id conn)
        id-key (base-encode id)
        redis-key (str urls-prefix id-key)]
    (wcar conn
      (car/set redis-key url))
    id-key))

(store-url! redis "http://alejandro.run")
;;=> "b"

(store-url! redis "http://github.com/encogio/encogio")
;;=> "c"

For retrieval we use the GET command, returning nil when the given ID does exist.

(defn get-url!
  [conn id]
  (let [redis-key (str urls-prefix id)]
    (wcar conn
      (car/get redis-key))))

(get-url! redis "b")
;;=> "http://alejandro.run"

(get-url! redis "c")
;;=> "http://github.com/encogio/encogio"

(get-url! redis "d")
;;=> nil

Aliases

The basics for the URL shortener are in place but we haven't taken into account that we want users to be able to provide a custom alias. Since our IDs are already alphanumeric we'll treat aliases as IDs and use them as keys for storing aliased URLs.

Once a key in our URLs keyspace is picked we don't want to overwrite it so we have to make sure that we only write them once.

Atomic set

Redis supports running LUA scripts transactionally with the EVAL command so we can guarantee the atomicity of a set operation by doing it inside a LUA script.

Redis guarantees that a script is executed in an atomic way: no other script or Redis command will be executed while a script is being executed.

There are two types of arguments we pass to LUA scripts: key arguments (assumed to be used as Redis keys) and regular arguments. They are available in the KEYS and ARGV special variables respectively. Note that LUA uses 1-based indexing.

The Redis EVAL command takes a LUA script, the number of key arguments and the script arguments and returns the result returned by the script. Here's a few examples of how we can use it from Clojure:

(wcar redis
  (car/eval "return {KEYS[1]};" 1 "a-key"))
;;=> ["a-key"]

(wcar redis
  (car/eval "return {KEYS[1], ARGV[1]};" 1 "a-key" 42))
;;=> ["a-key" "42"]

Redis commands can be called from Lua using redis.call.

(wcar redis
  (car/eval "return redis.call('SET', KEYS[1], ARGV[1]);" 1 "a-key" 42))
;;=> "OK"

(wcar redis
  (car/get "a-key"))
;;=> "42"

With that in mind we write a LUA script that will result in an error if the key we are trying to set already exists.

-- Receives a key (KEYS[1]) and a value (ARGV[1]) and atomically 
-- sets the key to value if it doesn't exist.
--
-- Returns "OK" if successful, "duplicate key" if not.
--
local exists = tonumber(redis.call("EXISTS", KEYS[1]));
if exists == 0 then
   return redis.call("SET", KEYS[1], ARGV[1]);
else
   return redis.status_reply("duplicate key");
end;

Since we'll be using this script quite frequently, we'll use Carmine's eval* helper which optimistically tries the Redis EVALSHA command to save bandwidth.

(def atomic-set-lua "
  local exists = tonumber(redis.call('EXISTS', KEYS[1]));
  if exists == 0 then
     return redis.call('SET', KEYS[1], ARGV[1]);
  else
     return redis.status_reply('duplicate key');
  end;")

(defn atomic-set!
  [conn k v]
  (let [set? (wcar conn
               (car/eval* atomic-set-lua 1 k v))]
    (if (= set? "OK")
      {:key k :value v}
      {:error  :conflict})))

When trying to set a key using the atomic-set! function it will only be written if it doesn't exist already.

(atomic-set! redis "write-once" 42)
;;=> {:key "write-once" :value 42}

(atomic-set! redis "write-once" 42)
;;=> {:error :conflict}

atomic-set! can be used now to support aliasing URLs.

(defn alias-url!
  [conn url alias]
  (let [redis-key (str urls-prefix alias)
        result (atomic-set! conn redis-key url)]
    (when-not (:error result)
      alias)))

(get-url! redis "blog")
;;=> nil

(alias-url! redis "http://alejandro.run" "blog")
;;=> "blog"

(get-url! redis "blog")
;;=> "http://alejandro.run"

(alias-url! redis "http://alejandro.run" "blog")
;;=> nil

Finally the store-url! function needs to use atomic-set! as well and try another autogenerated ID when running into a conflict.

(defn store-url!
  [conn url]
  (let [id (unique-id conn)
        id-key (base-encode id)
        redis-key (str urls-prefix id-key)
        result (atomic-set! conn redis-key url)]
    (if-not (:error result)
      id-key
      (recur conn url)))

Conclusion

We have used Redis and Clojure to implement the basis of an URL shortener service that supports both autogenerated IDs and user supplied aliases. In future posts we'll be looking at how to expose our URL shortener to the world through HTTP as well as protecting it from bad actors by rate limiting.

Happy hacking!