There Is Only One Logarithm
When you start learning about algorithms, you very quickly learn about algorithm complexity, which describes how execution time grows with increasingly large inputs. For instance, you might learn that binary search on an ordered list takes \(O\left(\log(n)\right)\) time, and a sorting algorithm like merge sort takes \(O\left(n\log(n)\right)\) time. This information is quite useful, as it describes how your algorithms behave at scale.
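To see this growth for yourself, here is a minimal sketch in Python (which I'll use for examples throughout): a standard binary search instrumented to count its comparisons, so you can watch the step count track \(\log_2(n)\).

```python
import math

def binary_search_steps(sorted_list, target):
    """Standard binary search that also counts its comparisons."""
    lo, hi = 0, len(sorted_list) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid, steps
        elif sorted_list[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

# The step count stays near log2(n), even as n grows by a factor of 1000.
for n in (1_000, 1_000_000):
    _, steps = binary_search_steps(list(range(n)), n - 1)
    print(f"n = {n:>9}: steps = {steps}, log2(n) = {math.log2(n):.1f}")
```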
However, you might at some point wonder about one key detail: which \(\log\) am I referring to? As you are probably aware, the logarithm gives the power to which a certain base must be raised to produce a number; that is, \(x = \log_a(b)\) satisfies \(a^x = b\). In other words, each base \(a\) gives us a different logarithm function. So, which base should we use?
Some people say that, since powers of \(2\) are so important in computer science, we should treat \(\log\) as \(\log_2\). Others might say that, when the base of a logarithm is left unspecified, it should refer to \(\log_{10}\) or even \(\log_e\).
I say it doesn't matter; when it comes to big-\(O\), all logarithms are equal.

First, A Point of Clarification
For the rest of this article, I will be setting aside big-\(O\), since it will be more instructive to talk about its lesser-known brother, big-\(\Theta\) (or big-theta). It's sometimes forgotten that big-\(O\) describes an upper bound on performance, namely that the execution time of the algorithm will, at most, grow at the given rate. It is therefore not incorrect to say that quicksort runs in \(O(x^3)\) time. Big-\(\Theta\), on the other hand, also includes a lower bound, and is therefore more suitable for this discussion. Mathematically speaking, we say that \(f(x) = \Theta(g(x))\) for real functions \(f,g\) if there exist positive constants \(c, C \in \mathbb{R}\) and some \(K \in \mathbb{R}\) such that \(cg(x) \leq f(x) \leq Cg(x)\) for all \(x \geq K\). The equivalent definition for big-\(O\) only has the upper bound \(f(x) \leq Cg(x)\).
Why should we care about big-\(\Theta\)? Well, if \(f(x) = \Theta(g(x))\), then we can replace \(g(x)\) with \(f(x)\) whenever we do big-\(O\). Since \(x + 208 = \Theta(x)\), if an algorithm has complexity \(O(x)\) we can equally say it has complexity \(O(x + 208)\). If you're not too familiar with big-\(\Theta\), don't worry: a lot of the ideas from big-\(O\) apply to big-\(\Theta\) as well, including the ignoring of constant multipliers and non-dominant terms.
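To make the definition concrete, here is a quick numeric check of \(x + 208 = \Theta(x)\), using the witnesses \(c = 1\), \(C = 2\), and \(K = 208\) (one valid choice among many):

```python
# Sanity check for x + 208 = Θ(x): with c = 1, C = 2, and K = 208,
# the sandwich 1*x <= x + 208 <= 2*x holds for every x >= K.
for x in (208, 500, 10_000, 10**9):
    assert 1 * x <= x + 208 <= 2 * x
print("1*x <= x + 208 <= 2*x for all tested x >= 208")
```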
One more important thing to mention is that big-\(\Theta\) is symmetric: if \(f(x) = \Theta(g(x))\), then \(g(x) = \Theta(f(x))\). This means we only have to do our analysis one way, which saves us a little time.
So, all logarithms are the same?
This entire point stems from a simple formula, which we derive as follows: consider a positive real number \(x\), and bases \(b\) and \(c\). We can write \(x = b^{\log_b(x)} = (c^{\log_{c}(b)})^{\log_b(x)}\). Then, by simply applying \(\log_c\) to both sides, we get \(\log_c(x) = \log_c(b)\log_b(x)\). Now, note that if we fix \(b\) and \(c\), then \(\log_c(b)\) is constant, hence \(\log_b(x)\) and \(\log_c(x)\) are equal up to a constant! This statement alone means several things: for instance, if we have the graph of one logarithm, we can obtain the graph of any other by simply stretching, shrinking, or flipping our original graph. Another implication, in terms of big-\(\Theta\), is that \(\log_b(x) = \Theta(\log_c(x))\).

This immediately solves our problem: no matter which logarithm we use, whether \(\log_2\), \(\log_{10}\) or \(\log_e\), we get the same big-\(O\) class.
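You can watch this constant ratio in action with a short sketch (my own illustration, with arbitrarily chosen sample points):

```python
import math

# The ratio log_2(x) / log_10(x) should equal the constant log_2(10)
# for every x, by the change-of-base formula.
for x in (10.0, 1_000.0, 1e6, 1e12):
    print(f"x = {x:>8g}: ratio = {math.log2(x) / math.log10(x):.10f}")
print("log_2(10)   =", math.log2(10))
```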
When the base doesn't matter
So we've seen the base doesn't matter for \(\Theta(\log)\). Where else can we apply this idea? Well, since we can often ignore constant multipliers, we can already point to a bunch of classes where the bases don't matter.
For instance, since \((c\log(x))^p=c^p\log(x)^p\), we can therefore already conclude that \(\log_a(x)^p = \Theta(\log_b(x)^p)\) for any base \(a,b\) and power \(p\).
Similarly, since \(f(x)(c\log(x)) = cf(x)\log(x)\), we get \(f(x)\log_a(x) = \Theta(f(x)\log_b(x))\) for any base \(a,b\) and real function \(f\).
In particular, we can combine them and get \(f(x)\log_a(x)^p = \Theta(f(x)\log_b(x)^p)\) for any base \(a,b\), real function \(f\) and power \(p\).
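Here is a quick numeric check of the combined claim, with the arbitrary choices \(f(x) = x\), \(p = 3\), and bases \(2\) and \(10\):

```python
import math

# (x * log_2(x)**3) / (x * log_10(x)**3) should be the constant
# log_2(10)**3, independent of x.
p = 3
for x in (100.0, 1e4, 1e8):
    ratio = (x * math.log2(x) ** p) / (x * math.log10(x) ** p)
    print(f"x = {x:>6g}: ratio = {ratio:.6f}")
print("log_2(10)**3 =", math.log2(10) ** 3)
```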
What about more complicated examples, say \(\log(\log(x))\)? Well, in this case, it's easy: the base of the outer \(\log\) doesn't matter, since the constant factor disappears immediately; as for the inner \(\log\), since \(\log(c\log(x)) = \log(c) + \log(\log(x))\), and clearly the constant \(\log(c)\) is non-dominant, meaning it grows at a slower pace than \(\log(\log(x))\), we can ignore it, and so \(\log(\log_a(x)) = \Theta(\log(\log_b(x)))\).
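We can check this numerically too. Since \(x = 10^e\) overflows a float long before the convergence becomes visible, the sketch below works with the exponent directly (using \(\log_2(x) = e\log_2(10)\) and \(\log_{10}(x) = e\)):

```python
import math

# Compare log(log_2(x)) with log(log_10(x)) for x = 10**e, without
# ever constructing the astronomically large x itself.
for e in (10, 10**3, 10**6, 10**12):
    ratio = math.log(e * math.log2(10)) / math.log(e)
    print(f"x = 10^{e}: ratio = {ratio:.6f}")
# The ratio creeps toward 1: changing the inner base only adds the
# constant log(log_2(10)).
```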
Okay, but what about \(\log(\log(\log(x)))\), or \(\log(\log(\log(\log(x))))\), or any tower of \(\log\)s for that matter? There is a simple result that tells us that, yes, none of the bases of these \(\log\)s matter:
Let \(f(x)\) be concave, positive and increasing over \((a,\infty)\) for some real \(a\), and let \(c\) be a positive constant real number. Then, \(f(x+c) = \Theta(f(x))\).

Note that without the concavity restriction, this statement is generally not true. It's possible to construct many functions that stay relatively flat most of the time, and then suddenly shoot up over a range of length \(c\).
With concavity though, the problem becomes easy to solve: \(f(x)\) being increasing immediately gives us \(f(x) \leq f(x + c)\) when \(x \geq a\). Then, when \(x - c \geq a \), we get
\begin{align}f(x) &= f\left(\frac{1}{2}(x-c) + \frac{1}{2}(x+c)\right) \\ &\geq \frac{1}{2}f(x-c) + \frac{1}{2}f(x+c) \\ &\geq \frac{1}{2}f(x+c). \end{align}
Therefore, for \(x \geq a + c\), we get \(f(x) \leq f(x+c) \leq 2f(x)\), and thus \(f(x+c) = \Theta(f(x))\).
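As a quick sanity check of the lemma, take \(f = \log \circ \log\) (concave, positive and increasing for large enough \(x\)) and the arbitrary shift \(c = 1000\):

```python
import math

# f(x) = log(log(x)) is concave, positive and increasing for x > e,
# so f(x + c) / f(x) should stay trapped between 1 and 2.
f = lambda x: math.log(math.log(x))
c = 1000.0
for x in (1e3, 1e6, 1e50, 1e300):
    print(f"x = {x:g}: f(x+c)/f(x) = {f(x + c) / f(x):.6f}")
```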
Since the composition of increasing concave functions is still an increasing concave function, consider a tower \(\log(\log(\log(\dots)))\) in which we want to change the base of one of the \(\log\)s, say \(f(\log(c\log(g(x))))\) where \(f\) and \(g\) are towers of zero or more \(\log\)s. This is equal to \(f(\log(c) + \log(\log(g(x))))\), which, by the result above, is \(\Theta(f(\log(\log(g(x)))))\). Thus, the bases of the \(\log\)s don't matter.
When the base does matter
So, I've shown you several ways in which the base of the \(\log\) doesn't matter. Is this true in all cases though?
The answer, naturally, is no. Consider \(e^{\log_a(x)}\). We can write this as \(e^{\log_a(e)\log_e(x)}\), which is equal to \(x^{\log_a(e)}\). Clearly then, the power of \(x\) in this expression depends on our choice of base \(a\). And, while it is certainly unusual to see \(e^{\log(x)}\), it's still an important reminder to pay attention to what you're doing, and not to oversimplify too early.
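A quick illustration, with arbitrarily chosen \(x\) and bases:

```python
import math

# e ** log_a(x) equals x ** log_a(e): the exponent, and hence the
# growth rate, genuinely depends on the base a.
x = 10_000.0
for a in (2.0, math.e, 10.0):
    value = math.e ** math.log(x, a)
    exponent = math.log(math.e, a)  # log_a(e)
    print(f"a = {a:8.5f}: e**log_a(x) = {value:12.4f} = x**{exponent:.4f}")
```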
Another example is as follows: for \(x \geq 1\), let \(f(x) = n!\) where \(n\) is the largest integer such that \(n! \leq x\). This naturally gives us \(f(x) \leq x\), so \(f(x) = O(x)\). However, we have \(f(n!) = n!\) and \(f(\frac{n!}{2}) = (n-1)!\) for \(n \geq 2\), thus for any integer \(n\) greater than \(2\), \(nf(\frac{n!}{2}) = f(n!)\). In particular, there is no constant \(C\) satisfying \(f(x) \leq Cf(\frac{x}{2})\) for all \(x\) greater than any constant \(K \in \mathbb{R}\), so \(f(\frac{x}{2}) \neq \Theta(f(x))\). What this means is that, if \(\log_a(x) = \frac{1}{2} \log_b(x)\) for two bases \(a,b\) (which happens when \(a = b^2\)), then \(f(\log_a(x)) \neq \Theta(f(\log_b(x)))\), so specifying \(a\) and \(b\) does matter.
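Here is that counterexample in code (my own sketch; note the integer division, which sidesteps float precision issues with large factorials):

```python
import math

def f(x):
    """Largest n! that does not exceed x."""
    n = 1
    while math.factorial(n + 1) <= x:
        n += 1
    return math.factorial(n)

# At x = n!, the ratio f(x) / f(x/2) equals n, so it is unbounded:
# no constant C can satisfy f(x) <= C * f(x/2) for all large x.
for n in (5, 10, 15, 20):
    x = math.factorial(n)
    print(f"n = {n:>2}: f(x) / f(x/2) = {f(x) // f(x // 2)}")
```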
Final Thoughts
In any case, the reason I've had to use such unnatural examples in the previous section is simply because, in most of the cases algorithm complexity is concerned with, constant multipliers don't really matter all that much. So, in the future, feel free to use your favorite logarithm, since it (probably) won't make a difference!