Uniq with Hashing
Usually a simple command line is enough to extract the unique lines from a file:
cat data.txt | sort | uniq
Unfortunately the sort is required, because uniq only removes adjacent duplicate lines and so needs sorted input to catch every repeat. The bigger the data set gets, the more painful that sort operation becomes. I am surprised that ‘uniq’ has no option to use hashing instead. But a short Perl script also does the trick:
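#!/usr/bin/perl
use strict;
use warnings;

my %seen;    # lines that have already been printed
while (<>) {
    # Print a line only the first time it appears; the post-increment then marks it as seen.
    print $_ unless $seen{$_}++;
}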
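The same hashing idea can also be sketched in awk, which is usually installed wherever sort and uniq are. This is only a minimal sketch: the function name uniq_hash is made up for illustration, and the sample input is invented.

```shell
# uniq_hash: print each distinct line once, in first-seen order, without sorting.
# (uniq_hash is a hypothetical helper name, not a standard tool.)
uniq_hash() {
    # seen[$0]++ evaluates to 0 the first time a line appears, so
    # !seen[$0]++ is true only then, and awk's default action prints the line.
    awk '!seen[$0]++' "$@"
}

# Example: read from standard input; a file name such as data.txt works too.
printf 'b\na\nb\nc\na\n' | uniq_hash
```

Like the Perl version, this keeps every distinct line in memory, so it trades the cost of sorting for memory proportional to the number of unique lines. Unlike sort | uniq, it preserves the order in which lines first appear.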
Any better suggestions?


