Uniq with Hashing
Usually a simple command line is enough to extract the unique lines from a file:
cat data.txt | sort | uniq
Unfortunately the sort is required, because uniq only removes adjacent duplicate lines, and the bigger the data set gets, the more painful that sort operation becomes. I am surprised that ‘uniq’ has no option to use hashing instead. A short Perl script does the trick, though:
#!/usr/bin/perl
%seen = ();
while (<>) {
    print $_ unless $seen{$_}++;
}
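For comparison, the same first-occurrence filter can be sketched in awk, whose associative arrays behave much like the Perl hash. This is only a sketch: the function name uniq_hash is made up for illustration, and like the Perl version it keeps every distinct line in memory, so it helps most when the number of distinct lines is small compared to the total.

```shell
# uniq_hash (hypothetical name): print each distinct line once, in
# first-seen order, without sorting the input.
# seen[$0]++ evaluates to 0 the first time a line appears, so the
# negated expression is true only then and awk prints the line.
uniq_hash() {
    awk '!seen[$0]++' "$@"
}

# Small demonstration on unsorted input with repeated lines.
printf 'b\na\nb\nc\na\n' | uniq_hash
# prints:
# b
# a
# c
```

Unlike sort | uniq, this keeps the lines in their original order instead of sorted order, which may or may not be what you want.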
Any better suggestions?