This method looks as the simplest one as it doesn't require any implicit or explicit conditional operators or arithmetic operations:
Unfortunately, simplicity of this method doesn't mean it's the fastest one. Quite the opposite - the method is the slowest one, on Linux system it's at least 40% slower than any other method posted on this page. On my Windows system this method was 8 times slower than on my Linux box. Granted, my Windows system doesn't have as much memory as Linux box - only 1 GB of memory, but that amount of RAM should be enough to handle array with 1,500,000 elements that I used for testing :)
It's not clear why the method above is very slow - it could be due to the fact that the map function needs to build an anonymous array which is twice as large as the original one, or it could be because "=" operation is not very efficient when arrays are assigned to hashes, or a combination of both. Let's eliminate all inefficient parts by replacing the "map" function with foreach loop and see if it improves things:
This code executes 40% faster under Linux or 8 times faster under Windows.
Let's use a little different approach. Rather than getting list of all keys from a hash which we used to store all array values, we are going to use the hash to flag the values we've seen already, and store all un-seen values in a separate array:
Surprisingly enough, this code performs better than the previous example. It seems that the keys function is not that fast after all. Speed gain over previous example: 12% on Linux and 55% on Windows.
Perl has very useful function grep that loops implicitly over all array elements and returns only elements that satisfy certain condition. This function looks like a good fit for our task. Let's try to use it:
Well, the code looks very simple right now, and it executes 10% faster on Linux than the previous example. There is no speed advantage of using this code on Windows.
Using grep method is so simple that it's not clear how it can be improved any further. But still, let's try completely different approach. What if we eliminate "already seen" hash completely? We can do it if we sort the array first, then loop through all array elements and store only values that differ from previous value from the same array:
This method turns out to be the fastest from all methods from this page. The speed of this method depends on how many duplicate records are in the original array. The fewer duplicates records are in the original array, the faster this method is when compared to grep method. The speed of both sort and grep methods is about the same when the original array has 6 - 8 duplicate values for each unique value. The grep method becomes faster when the original array contains more than 8 duplicate values for each unique value. Below are a few benchmarks that I did:
Back to "Perl programming tricks"
| (c) Copyright 2007 Gennadiy Shvets |