An Algorithm for Finding Common Elements Across Data Sets
Wednesday 12 July 2006 @ 8:08 am

While working on a search feature for a Flash piece recently, I needed an algorithm that ran a separate search for each keyword against an index I had created, recorded the number of occurrences for each keyword, and then found which elements were common to all of the keywords. I came up with a way to do this in linear time: it makes only one pass through the data, so it scales well to large data sets. The gist is that if you have three sets of data, any element must appear three times across the combined search results to be common to all of them. So all that is needed is a tally of how many times each element appears; anything with a tally of three is in the intersection of the data sets.
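To make the tally idea concrete, here is a minimal sketch in Python. It is not the Flash code from this post: the function name `common_elements` and the sample data are made up for illustration, and it assumes the elements are hashable so they can be used as keys in the tally.

```python
from collections import Counter


def common_elements(data_sets):
    """Return the elements that appear in every data set.

    Hypothetical helper for illustration. Assumes no element is
    repeated within a single data set.
    """
    tally = Counter()
    # Single pass over all the data: count each element once per appearance.
    for data_set in data_sets:
        for element in data_set:
            tally[element] += 1

    needed = len(data_sets)
    # An element is common to all data sets when its tally equals
    # the number of data sets.
    return [element for element, count in tally.items() if count == needed]


results = [
    ["a", "b", "c"],
    ["b", "c", "d"],
    ["c", "b", "e"],
]
print(common_elements(results))  # ['b', 'c']
```

The tally is keyed by element, so each lookup is a dictionary access rather than a scan of the other data sets, which is what keeps the whole thing to one pass.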

Note: this only works if no element is duplicated within any single data set, which fortunately is not too hard a condition to impose.
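A short Python sketch shows why duplicates matter and one possible way to impose the condition. The function `tally_intersection` and its `dedupe` flag are hypothetical names for illustration, not part of the original Flash code. It assumes the elements are hashable and sortable (the result is sorted only to make the output predictable).

```python
from collections import Counter


def tally_intersection(data_sets, dedupe=False):
    """Tally-based intersection; hypothetical helper for illustration."""
    tally = Counter()
    for data_set in data_sets:
        # With dedupe=True, each element counts at most once per data set.
        items = set(data_set) if dedupe else data_set
        tally.update(items)
    needed = len(data_sets)
    return sorted(e for e, count in tally.items() if count == needed)


# "a" appears twice in one data set and never in the other, yet its
# tally reaches 2, so it is wrongly reported as common.
print(tally_intersection([["a", "a"], ["b"]]))               # ['a']
print(tally_intersection([["a", "a"], ["b"]], dedupe=True))  # []

# Duplicates can also hide a real match: here the tally for "a" is 3, not 2.
print(tally_intersection([["a", "a"], ["a"]]))               # []
print(tally_intersection([["a", "a"], ["a"]], dedupe=True))  # ['a']
```

Converting each data set to a set first is only one option. If the index already guarantees unique results per keyword, the extra step is unnecessary.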

Here is some code representing the algorithm:
read more…

