We are all now aware of what the so-called Big O notation actually means. Big O is an asymptotic upper bound: a cap on how fast a function's running time can grow. If we say that a function is O(1), its running time does not grow with the input at all; it is bounded by a constant. If, on the other hand, we say that a function is O(N), then when N doubles, the function's running time at most doubles. It may grow less, but never more. That's the upper bound of an algorithm.

Let's start with a simple program and calculate its time complexity.

Count the total number of fundamental operations the algorithm performs, relative to some value N.
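The original snippet isn't shown here, but a minimal C version matching the description below (an iterative factorial using a variable `ret` and a while loop; the name `fact` is assumed) might look like:

```c
/* A sketch of the function the analysis below describes:
   compute n! iteratively with a while loop. */
unsigned long long fact(unsigned int n)
{
    unsigned long long ret = 1; /* one operation: set ret to 1 */

    while (n > 0) {  /* condition tested 1 + n times in total */
        ret *= n;    /* one operation: multiply ret by n */
        --n;         /* one operation: decrement n */
    }

    return ret;
}
```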

The function sets the value of ret to 1. This is one operation.

Then the function tests the condition of the while loop exactly 1 + n times (it is tested once more when it fails and the loop exits). It runs the body of the while loop exactly n times, and the body performs exactly 2 fundamental operations: 'multiply ret by n' and 'decrement n'. The condition itself is one fundamental operation, a comparison. Calling the function counts as a fundamental operation, too.

So we have the total number of fundamental operations being

1 + 1 + (1 + n) + 2 * n. (The sum is composed of the operations for calling the function, setting ret = 1, performing a total of 1 + n tests in the condition of the while loop, and a total of 2 * n operations in the body of the while loop.)

This simplifies to 3 + 3 * n fundamental operations, which is **O(n)**.

**------------------------------------------**

Now, even though Big O is the most common notation, it is not always a tight description. It's like saying that every program is O(infinity), because every program will certainly take less than *infinite* time to complete. True, but not very informative.

Take sequential search as an example:
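The search routine itself isn't reproduced here; a minimal C sketch (the name `seq_search` and the int-array signature are assumptions) could be:

```c
/* A sketch of sequential search: scan the array from the front
   until the key is found or the array is exhausted. */
int seq_search(const int a[], int n, int key)
{
    for (int i = 0; i < n; i++) {
        if (a[i] == key)
            return i; /* found: return the index */
    }

    return -1; /* not found */
}
```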

This algorithm is clearly O(N) because it has just one loop that depends on the size of the array (the number of iterations doubles as the size of the array doubles). However, that's the worst case. On average, only half of the array needs to be searched before the item is found (assuming the item is equally likely to be anywhere in the array). So while the running time can reach N steps, it's usually less.

Okay, what if we do a binary search instead of a sequential search? If the array is sorted, we can make the search faster by splitting the array in half at each comparison and only searching the half where the item could lie. That's how **binary search** works. Here's the code for binary search:
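The original listing was lost; a standard iterative C sketch (the name `binary_search` is an assumption) might be:

```c
/* A sketch of iterative binary search on a sorted int array. */
int binary_search(const int a[], int n, int key)
{
    int lo = 0, hi = n - 1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2; /* avoids overflow of lo + hi */

        if (a[mid] == key)
            return mid;        /* found: return the index */
        else if (a[mid] < key)
            lo = mid + 1;      /* search the upper half */
        else
            hi = mid - 1;      /* search the lower half */
    }

    return -1; /* not found */
}
```

Each pass through the loop discards half of the remaining range, which is where the logarithmic step count discussed below comes from.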

Calling this algorithm O(N) would not be wrong, because its running time never exceeds N steps. But because the array is halved at each step, the number of steps is never more than about the base-2 logarithm of N, which is considerably less than N. Therefore, a more accurate claim is O(log₂ N), an upper bound we are sure never to exceed.

We can also be interested in a lower bound, that is, the smallest time complexity we can expect. In our example, a correct binary search is guaranteed to finish within about log₂ N steps, and in the worst case it needs all of them, so the lower bound is also logarithmic. We can say that the lower bound for binary search is Ω(log₂ N).

Now, the upper and lower bounds are the same! This means we have an accurate bound on the time complexity of binary search. To represent this asymptotic tight bound, we say that binary search is Θ(log₂ N). O(log₂ N) is still correct, and you'll see it more often than any of the other notations, but Θ(log₂ N) is more accurate.

Note: *In the best case, the first item we examine (the middle of the array) would be the one we're searching for, making the search effectively O(1), so why is the lower bound Ω(log₂ N)? The reason is that we're using only one variable, the size of the array, to derive the time complexity. If we used other variables, such as the contents of the array and the item being searched for, we could say that the lower bound is Ω(1). Without them, we cannot make that assumption. Therefore, both the upper and lower bounds describe the worst case, which is logarithmic.*

**------------------------------------------**

Okay, let's now look at some sorts, starting with **selection sort**. The algorithm repeatedly finds the largest item and moves it to the back of the array. Each time an item is moved to the back, the size of the array under consideration shrinks by one, so you don't continually choose from the items that have already been selected:
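The listing didn't survive extraction; a C sketch matching the description above (the name `selection_sort` is an assumption) might be:

```c
/* A sketch of selection sort as described: repeatedly find the
   largest item, swap it to the back, and shrink the range. */
void selection_sort(int a[], int n)
{
    while (n > 1) {
        int max = 0;

        /* find the index of the largest item in a[0..n-1] */
        for (int i = 1; i < n; i++) {
            if (a[i] > a[max])
                max = i;
        }

        /* move it to the back and shrink the array */
        int tmp = a[max];
        a[max] = a[n - 1];
        a[n - 1] = tmp;
        --n;
    }
}
```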

This algorithm has two loops, one nested inside the other. Both depend on the size of the array, so the algorithm is clearly O(N * N), more commonly written O(N²) and referred to as quadratic. The fact that the inner range shrinks with each step of the outer loop is irrelevant unless you want a tight bound, and even then the total work is N + (N - 1) + ... + 1 = N(N + 1) / 2, which is still quadratic. But that doesn't matter much, because the upper bound is really all we care about for an existing sorting algorithm.

Let's look at a faster sort. The **heap sort** algorithm uses a tree-based structure to make the selection process faster. The selection part of selection sort is Θ(N), and the outer loop it's nested in is O(N), which is why selection sort is O(N²). But with a heap, selecting the largest item is O(1) and fixing the heap afterwards is Θ(log₂ N), so we can bring the cost of each selection down to Θ(log₂ N). Let's see:
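The original listing is missing; a C sketch using the two function names the analysis below refers to (`do_heap` as a sift-down on a max-heap, `heapsort` as the driver; the exact signatures are assumptions) might be:

```c
/* A sketch of heap sort. do_heap sifts a[i] down through a
   max-heap of n items stored in the array (children of i are
   at 2i + 1 and 2i + 2). */
static void do_heap(int a[], int i, int n)
{
    int tmp = a[i];

    while (2 * i + 1 < n) {
        int child = 2 * i + 1;

        /* pick the larger of the two children */
        if (child + 1 < n && a[child + 1] > a[child])
            ++child;
        if (tmp >= a[child])
            break; /* heap property restored */
        a[i] = a[child];
        i = child;
    }

    a[i] = tmp;
}

void heapsort(int a[], int n)
{
    /* first loop: build a max-heap, starting at the last parent */
    for (int i = n / 2 - 1; i >= 0; i--)
        do_heap(a, i, n);

    /* second loop: repeatedly move the largest item to the back,
       then fix the heap on the items that remain */
    for (int i = n - 1; i > 0; i--) {
        int tmp = a[0];
        a[0] = a[i];
        a[i] = tmp;
        do_heap(a, 0, i);
    }
}
```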

Because the heap is structured like a tree, **do_heap** is Θ(log₂ N). The first loop in **heapsort** runs N / 2 times, but because the second loop runs N times and dominates it, we can ignore the first loop. So we have an O(N) loop that calls a Θ(log₂ N) function, which gives an upper bound of O(N log₂ N) for heap sort. And because the lower bound of heap sort is also Ω(N log₂ N), for the same reasons as with binary search, we can safely say that heap sort has a time complexity of Θ(N log₂ N).

References: Eternally Confuzzled

