
A Starter on Sorting Algorithms


Computers search through information constantly, whether it is stored on disk or out on the network, and those searches are only efficient if the data being searched is sorted. It's often said that computers spend more time sorting data than doing anything else, so it makes sense to understand the different sorting methods and their relative strengths and weaknesses.

There are many generic sorting routines already available, such as C's qsort or C++'s std::sort and std::partial_sort. These libraries are ready to use, but they are not always the best option. Why, you may ask?

There are two problems. First, these library algorithms are designed to work for as many cases as possible, but for your specific case they may not be practical or efficient enough. For example, the standard C qsort will give you a sorted output, but it may make far more comparisons (each one an indirect call through your comparison function) than a tailored algorithm would, and hence be too slow. How do you fix the problem? You can find another library, but how long can you keep doing that?

Second, libraries are designed as black boxes, so you have no idea what algorithm is actually being used. The name might suggest quicksort, but in reality it could be a bubble sort, or worse, a poor implementation of quicksort that works for many cases but degrades badly on common worst cases such as already-sorted input.

So you are left with the simple and sane option of writing a sorting algorithm yourself, tailored to your needs.

Sorting Considerations

Now, because sorting an unordered array or list is time consuming even when done efficiently, it makes sense to avoid sorting when you can. This usually means using a better data structure, such as a hash table (where sorting probably isn't necessary) or a binary search tree (where items are kept in sorted order by construction). And sometimes you don't need to sort at all: it can be just as fast, or faster, to brute force your way through things. For example, if you compare the time taken by a sequential search on an array of about 50 items with that of sorting it first and then using a binary search, there is hardly a noticeable difference.
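
To make that concrete, here is a minimal C sketch of the two approaches (the 50-item size and the random data are illustrative), using the standard library's qsort and bsearch:

```c
#include <stdio.h>
#include <stdlib.h>

/* comparison callback shared by qsort and bsearch */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* plain sequential search: no sorting required */
static const int *seq_search(const int *a, int n, int key)
{
    for (int i = 0; i < n; i++)
        if (a[i] == key)
            return &a[i];
    return NULL;
}

int main(void)
{
    int a[50];
    for (int i = 0; i < 50; i++)
        a[i] = rand() % 100;
    int key = a[25];   /* pick a value we know is present */

    /* option 1: just scan the unsorted array */
    printf("sequential: %s\n", seq_search(a, 50, key) ? "found" : "missing");

    /* option 2: sort once, then binary search */
    qsort(a, 50, sizeof a[0], cmp_int);
    printf("binary:     %s\n",
           bsearch(&key, a, 50, sizeof a[0], cmp_int) ? "found" : "missing");
    return 0;
}
```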

There are two kinds of sorting algorithms - stable and unstable. The only difference between them is how they treat duplicate items.

In a stable algorithm, if the two items being compared are equal, they retain their relative order from the original unsorted array or list. Why does that even make a difference? When the two compared items are both just the value 2, it really doesn't matter which 2 is placed before the other in the final sorted sequence. In the more common situation, however, items consist of more than one value (as in a linked list whose nodes each hold a serial number, an age, and a name), and only one of those values (for example, the age) is used as the sort key. Let's take an example, writing each item as a (key:tag) pair:

|(5:4)|(6:6)|(3:1)|(5:5)|(1:0)|(3:2)|(4:3)|(6:7)|

For a stable sort, when sorting on the key (the first value), the result must retain the original relative order of items with equal keys. The tags above were assigned to make this easy to check: after a correct stable sort, the tags come out in ascending order, and if they don't, the algorithm isn't stable:

|(1:0)|(3:1)|(3:2)|(4:3)|(5:4)|(5:5)|(6:6)|(6:7)|

An unstable sort will not be bound by any such restriction.
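
As a minimal sketch (the struct and function names are illustrative), here is an insertion sort over those (key:tag) pairs; using a strict greater-than comparison is exactly what keeps it stable:

```c
#include <stdio.h>

struct pair { int key; int tag; };

static void insertion_sort(struct pair *a, int n)
{
    for (int i = 1; i < n; i++) {
        struct pair save = a[i];
        int j = i;
        /* strict '>' never moves an equal key past its twin,
           which is what makes the sort stable */
        while (j > 0 && a[j - 1].key > save.key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = save;
    }
}

int main(void)
{
    struct pair a[] = {
        {5,4}, {6,6}, {3,1}, {5,5}, {1,0}, {3,2}, {4,3}, {6,7}
    };
    int n = (int)(sizeof a / sizeof a[0]);

    insertion_sort(a, n);

    for (int i = 0; i < n; i++)   /* prints the stable result shown above */
        printf("|(%d:%d)", a[i].key, a[i].tag);
    printf("|\n");
    return 0;
}
```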

Following is a list of algorithms which are naturally stable and others which can be made stable at the cost of using more space or taking more time:

  • Selection sort is stable if you shift instead of swap (see the sketch after this list).
  • Insertion sort is stable.
  • Bubble sort is stable.
  • Counting sort is stable.
  • Merge sort is stable if the merge algorithm is stable.
  • LSD radix sort must be stable to work correctly.
  • MSD radix sort is stable if the underlying sort is stable.
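
As a minimal sketch of that first point, here is a selection sort made stable by shifting: instead of swapping the minimum into place (which can jump an equal item over its twin), the elements in between slide one slot to the right:

```c
#include <stdio.h>

/* selection sort made stable: find the first occurrence of the
   minimum, then shift the block a[i..min-1] right by one instead
   of swapping, so equal items keep their relative order */
static void selection_sort_stable(int *a, int n)
{
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[min])
                min = j;

        int save = a[min];
        for (int j = min; j > i; j--)   /* shift, don't swap */
            a[j] = a[j - 1];
        a[i] = save;
    }
}

int main(void)
{
    int a[] = {5, 6, 3, 5, 1, 3, 4, 6};
    int n = (int)(sizeof a / sizeof a[0]);

    selection_sort_stable(a, n);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);            /* 1 3 3 4 5 5 6 6 */
    printf("\n");
    return 0;
}
```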

Performance

The complexities you will see most often are O(N²), O(N log N), and O(N). Strictly speaking there is no upper bound: an algorithm can be arbitrarily slow and still count as a sort, as long as it manages to order the items eventually. A natural worst case to keep in mind is O(N!), because sorting can be thought of as picking the one sorted permutation out of the N! possible permutations of the elements.*

For the lower bound, the best that most sorting algorithms can achieve is Ω(N log N). This is because most sorting algorithms use item comparisons to determine the relative order of items, and each comparison yields only limited information. A comparison tree for the three numbers 1, 2, and 3 is easy to construct:

[Figure: comparison tree for sorting the three items 1, 2, and 3]

Each path from the root to a leaf results in a valid permutation of the three items, and the height of the tree determines the lower bound on the number of comparisons. Because a correct algorithm needs at least as many leaves as there are permutations (N! of them), the smallest possible height of the comparison tree is log N!, which is Ω(N log N).
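
That last equivalence deserves a line of justification. Without invoking the full Stirling approximation, a quick standard bound (sketched here) is to keep only the larger half of the terms in the sum:

```latex
\log_2 N! \;=\; \sum_{i=1}^{N} \log_2 i
          \;\ge\; \sum_{i=\lceil N/2 \rceil}^{N} \log_2 \frac{N}{2}
          \;\ge\; \frac{N}{2} \log_2 \frac{N}{2}
          \;=\; \Omega(N \log N)
```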

It is possible to beat this bound and sort in O(N) time, but only by stepping outside the comparison model, which is exactly what the counting and radix sorts do.
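
As a minimal sketch (the key range and array contents are illustrative), here is a counting sort for small integer keys; note that it never compares two items against each other:

```c
#include <stdio.h>

#define RANGE 10   /* assumption: all keys lie in [0, RANGE) */

static void counting_sort(const int *in, int *out, int n)
{
    int count[RANGE] = {0};

    for (int i = 0; i < n; i++)       /* tally each key */
        count[in[i]]++;

    int pos = 0;                      /* prefix sum: count[k] becomes */
    for (int k = 0; k < RANGE; k++) { /* the first output index for k */
        int c = count[k];
        count[k] = pos;
        pos += c;
    }

    for (int i = 0; i < n; i++)       /* walking the input in order   */
        out[count[in[i]]++] = in[i];  /* makes the placement stable   */
}

int main(void)
{
    int a[] = {5, 6, 3, 5, 1, 3, 4, 6};
    int b[8];

    counting_sort(a, b, 8);
    for (int i = 0; i < 8; i++)
        printf("%d ", b[i]);          /* 1 3 3 4 5 5 6 6 */
    printf("\n");
    return 0;
}
```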

*The canonical awful algorithm, called bogosort, is probably the worst possible sorting algorithm. Bogosort is equivalent to throwing a deck of cards in the air, picking them up at random, and then testing whether they're sorted. Because of its random nature, bogosort could potentially never end.
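
For completeness, a minimal bogosort sketch (kept to a tiny array, since the expected running time grows with N!):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static int is_sorted(const int *a, int n)
{
    for (int i = 1; i < n; i++)
        if (a[i - 1] > a[i])
            return 0;
    return 1;
}

static void shuffle(int *a, int n)   /* Fisher-Yates shuffle */
{
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

static void bogosort(int *a, int n)  /* shuffle until sorted; may,  */
{                                    /* in principle, never return  */
    while (!is_sorted(a, n))
        shuffle(a, n);
}

int main(void)
{
    int a[] = {3, 1, 2};
    srand((unsigned)time(NULL));
    bogosort(a, 3);
    printf("%d %d %d\n", a[0], a[1], a[2]);
    return 0;
}
```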

