For many programmers, their first introduction to the theory behind code performance and efficiency is Big O notation. Without diving too deep into its definition, Big O notation is often introduced as the maximum time a program takes to complete as a function of the size of the data set n, as n becomes larger and larger (approaching infinity).
It may seem intuitive to understand the Big O of an algorithm by examining the code written to solve the algorithm. If the code contains a single for loop that repeats as many times as there are elements in the data set, then it’s safe to assume the time taken to run that code grows at the same rate that the data set grows (known as O(n)). However, as algorithms become more and more complex, it’s harder to understand how time-efficient it will be at a glance.
A misconception about Big O notation, particularly among newbies, is that it’s the only aspect of a program that matters; the “end all be all” of understanding code efficiency with respect to time. With this view, one may look at some known values of Big O for different algorithms and ask “why is algorithm A more efficient than algorithm B, if they share the same Big O?”
To answer this question, let’s examine bubble sort and selection sort. They share a Big O of O(n^2). The cProfile Python library will record the time taken for each algorithm to finish and output some data to the terminal. cProfile is a software profiler, designed to measure how the time needed to run Python programs and give developers insight on how to optimize their code. The sorting algorithms will be given a randomly generated list of integers of the same length to crack. The library NumPy has a useful method for this purpose.
Here is the output from using bubble sort to sort a list of one hundred random integers in the range of 0 to 999. The second column, tot time, represents “total internal time,” or the time taken to finish executing the function bubble_sort within the script comparing_sorts.py. We can see that bubble_sort took 0.005 seconds (5ms) to complete its job.
Increasing the length of the list by a factor of ten increases the time taken for the sort to finish by a factor of one hundred. Upon increasing the list to 10,000 values, bubble sort took 48.842 seconds to complete, continuing the trend.
The numbers speak for themselves. Even though these sorts are O(n^2), they vary drastically in the time needed to finish. This is where the first assumption about Big O notation comes in. Even though selection sort is more than four times as fast as bubble sort for the same array, they are both O(n^2) because Big O notation describes the time taken as a function of n as n approaches infinity.
In other words, bubble sort and selection sort have the same Big O because the time taken for both algorithms grew at the same rate as more elements were added. Both bubble sort and selection sort took around 100 times longer to finish on a list of 10,000 elements than on a list of 1,000 integers, and that general relationship is what Big O notation is for.
I hope this extremely brief analysis of two algorithms shed some light on Big O notation or at least piqued your interest in learning more. I have included a link to my Python code here, written for Python 3.6.4. You will need to install NumPy through pip before running the script if you have not installed NumPy already. Happy coding!
Why Learn Python?
Python is easy to learn, write and read. It’s a great starting programming language for beginners and is easy to pick up for experienced developers. Despite its simplicity, Python’s growing job demand and technical versatility are everything but ordinary.