Unit 05: Algorithm efficiency
This week's stuff:
You can read through / play the audio for each slide at your own pace, or
Processing power continues increasing as time moves forward. Many things process so quickly that we barely notice it, if at all. That doesn't mean we can just write code as inefficiently as possible and just let it run because "computers are fast enough, right?" - as technology evolves, we crunch more and more data, and as "big data" becomes a bigger field, we need to work with this data efficiently - because it doesn't matter how powerful a machine is, processing large quantities of data with an inefficient algorithm can still take a long time.
And some computers today are slow. Not usually the computers that the average person is using (Even though it might feel that way if they're running Windows). Some computers that handle systems are pretty basic, or they're small, and don't have the luxury of having great specs. Fitness trackers, dedicated GPS devices, a thermostat, etc. For systems like these, processing efficiency is important.
Let's say our algorithm's time to execute increases quadratically.
That means, as we increase the amount of items to process (n
), the time to process that data goes up quadratically.
For processing 1 piece of data, it takes 1 unit of time (we aren't focused so much on the actual unit of time, since each machine runs at different speeds) - that's fine. Processing 2 pieces of data? 4 units of time. 8 pieces of data? 64 units of time. We don't just double the amount of time it takes when we double the data - we square it.
How much data do you think is generated every day on a social media website that has millions of users? Now imagine the processing time for an algorithm with a quadratic increase…
From a design standpoint we also need to know what the efficiency of different algorithms are (such as the algorithm to find data in a Linked List or the algorithm to resize a dynamic array) in order to make design decisions on what is the best option for our software. |
|
As an example, you shouldn't use a recursive algorithm to generate a Fibonacci sequence (Each number F[n]
in the sequence is
equal to the sum of the two previous items: F[n] = F[n-1] + F[n-2]
). It's just not as efficient! Let's look at generating the string of numbers:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55
The most efficient way to do this would be with an iterative approach using a loop. However, we could also figure it out with a recursive approach, though the recursive approach is much less efficient.
Fibonacci sequence, iterative:
int GetFib_Iterative(int n) { if (n == 0 || n == 1) { return 1; } int n_prev = 1; int n_prevprev = 1; int result = 0; for (int i = 2; i <= n; ++i) { result = n_prev + n_prevprev; n_prevprev = n_prev; n_prev = result; } return result; }
This algorithm has a simple loop that iterates n
times to find a given number of the Fibonacci sequence.
The amount of time the loop iterates based on n
is:
n | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | … | 20 |
iterations | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | … | 19 |
Fibonacci sequence, recursive:
int GetFib_Recursive(int n) { if (n == 0 || n == 1) { return 1; } return GetFib_Recursive(n - 1) + GetFib_Recursive(n - 2); }
The way this version is written, any time we call this function with any value n
,
it has to compute the Fibonacci number for Fib(n-1)
, Fib(n-2)
, Fib(n-3)
… Fib(1)
, Fib(0)
… twice.
This produces duplicate work because it effectively doesn't "remember" what Fib(3)
was after it computes it,
so for Fib(n)
it has to recompute Fib(n-1)
and Fib(n-2)
each time…
n | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | … | 20 |
iterations | 2 | 4 | 7 | 12 | 20 | 33 | 54 | 88 | … | 10,945 |
Red solid line = exponential growth;
Blue dotted line = linear growth
int GetFib_Iterative(int n) { if (n == 0 || n == 1) { return 1; } int n_prev = 1; int n_prevprev = 1; int result = 0; for (int i = 2; i <= n; ++i) { result = n_prev + n_prevprev; n_prevprev = n_prev; n_prev = result; } return result; }
The growth rate of the iterative version ends up being linear: As n
rises linearly, the amount of iterations also goes up linearly.
int GetFib_Recursive(int n) { if (n == 0 || n == 1) { return 1; } return GetFib_Recursive(n - 1) + GetFib_Recursive(n - 2); }
The growth rate of the recursive function is exponential: As n
rises linearly, the amount of iterations goes up exponentially.
Y scale from 0 to 25 (the 0th, to 25th Fibonacci number to generate). |
Y scale from 0 to 11,000 (the 11,000th Fibonacci number to generate). |
For small amounts this might not be too bad. However, if we were generating a Fibonacci number further in the list, it would continue getting even slower…
If we are trying to generate the 11,000th item in the sequence, the iterative approach
requires 11,000 iterations in a loop. That linear increase is still
much faster than each Fib(n)
calling Fib(n-1)
and Fib(n-2)
recursively.
For the most part, we don't care much about the exact amount
of times a loop runs. After all, while the program is running,
n
could be changing depending on user input, or how much data is stored,
or other scenarios. Instead, we are more interested in looking
at the big picture: the generalized growth rate of the algorithm.
We use something called "Big-O notation" to indicate the growth rate of an algorithm. Some of the rates we care about are:
O(1) | O(log(n)) | O(n) | O(n^2) | O(n^3) | O(2^n) |
Constant | Logarithmic | Linear | Quadratic | Cubic | Exponential |
int F(int x) { return 3 * x + 2; }
We think of any single command as being constant time.
The operation |
int Search(int l, int r, int search, int arr[]) { while ( l <= r ) { int m = l + (r-l) / 2; if ( arr[m] == search ) { return m; } else if ( arr[m] < search ) { l = m+1; } else if ( arr[m] > search ) { r = m-1; } } return -1; } Having an algorithm that halves its range each iteration of the loop will result in a logarithmic growth rate. |
int Sum( int n ) { int sum = 0; for (int i=1; i<=n; i++) { sum += n; } return sum; } Having a single loop that iterates over a range will be a linear time increase. |
void TimesTables(int n) { for (int y=1; y<=n; y++) { for (int x=1; x<=n; x++) { cout << x << "*" << y << "=" << x*y << "\t"; } cout << endl; } }
Quadratic time comes into play when you have
one loop iterating |
void CubicThing(int n) { for (int z=1; z<=n; z++) { for (int y=1; y<=n; y++) { for (int x=1; x<=n; x++) { cout << x << " " << y << " " << z << endl; } } } }
Just like nesting two loops iterating |
int Fib(int n) { if (n == 0 || n == 1) return 1; return Exponential_Fib(n-2) + Exponential_Fib(n-1); }
With an exponential function, each step increases the complexity
of the operation. The Fibonacci example is a good illustration:
Figuring out |