I'm working on a market basket analysis problem using the Apriori algorithm.
I have some lists, each representing a supermarket transaction, like these:
[Bread, Milk]
[Bread, Milk, Cereal, Coffee]
[Bread, Cereal, Coffee]
[Milk, Cereal, Coffee]
[Cereal, Coffee]
[Bread, Coffee]
The goal is to find the itemsets whose frequency is above a user-specified threshold.
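For reference, the level-wise idea behind Apriori fits in a few lines of Python. This is a minimal sketch, not a tuned implementation. It assumes the threshold is an absolute support count (a number of transactions), and the function name `apriori` is only illustrative. Each pass joins frequent (k-1)-itemsets into k-item candidates, prunes any candidate that has an infrequent (k-1)-subset, and then counts the survivors.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset whose
    support count is >= min_support (assumed to be an absolute count)."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count how many transactions contain each candidate, keep frequent ones.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    current = count({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


transactions = [
    ["Bread", "Milk"],
    ["Bread", "Milk", "Cereal", "Coffee"],
    ["Bread", "Cereal", "Coffee"],
    ["Milk", "Cereal", "Coffee"],
    ["Cereal", "Coffee"],
    ["Bread", "Coffee"],
]
result = apriori(transactions, min_support=3)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
# ['Bread'] 4
# ['Cereal'] 4
# ['Coffee'] 5
# ['Milk'] 3
# ['Bread', 'Coffee'] 3
# ['Cereal', 'Coffee'] 4
```

With a threshold of 3 out of 6 transactions, {Bread, Cereal, Coffee} is never counted: its subset {Bread, Cereal} appears only twice, so the prune step discards it. A fractional threshold such as 0.5 can be converted to a count with `math.ceil(0.5 * len(transactions))`.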
I'm writing my own implementation of Apriori, but I need some way to test whether it works correctly.
I've found this website, http://ift.tt/OcUkl7, which contains datasets for frequent itemset mining.
These datasets use numbers instead of product names for privacy reasons, but that's fine.
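Reading one of those numeric datasets is simple if the files use the common layout of one transaction per line with whitespace-separated integer item IDs. That layout is an assumption here, so check it against the files you actually download. The helper name `read_transactions` is hypothetical.

```python
import io


def read_transactions(lines):
    """Parse transactions from an iterable of text lines.

    Assumed format: one transaction per line, item IDs as whitespace-separated
    integers. Duplicate IDs on a line collapse, since an itemset is a set.
    """
    transactions = []
    for line in lines:
        items = line.split()
        if items:  # skip blank lines
            transactions.append(frozenset(int(x) for x in items))
    return transactions


# 1 = Bread, 2 = Milk, 3 = Cereal, 4 = Coffee (the blank line is skipped).
sample = io.StringIO("1 2\n1 2 3 4\n1 3 4\n\n2 3 4\n3 4\n1 4\n")
print(read_transactions(sample))
```

The sample encodes the same six transactions listed earlier. For a real file, pass an open file object instead: `with open(path) as f: transactions = read_transactions(f)`.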
Of course I could feed these datasets to my Apriori implementation, but how can I tell whether the results are correct?
Are there any datasets that also list their frequent itemsets, so I can check whether my program works?
Alternatively, can you suggest a tool that takes a dataset as input and outputs the frequent itemsets above a user-specified threshold?
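One way to get trusted answers without a labeled dataset is to compare against a brute-force miner: count every subset of every transaction and keep the ones that meet the threshold. It is exponential in transaction length, so it only suits small test inputs, but it is simple enough to trust. The sketch below also runs randomized comparisons against your own code. `my_apriori` is a hypothetical callable, assumed to take `(transactions, min_support)` with an absolute count threshold and to return a dict mapping frozensets to support counts.

```python
import random
from collections import Counter
from itertools import combinations


def brute_force_frequent(transactions, min_support):
    """Reference answer: count every non-empty subset of every transaction.
    Exponential in transaction length, so use it only on small test data."""
    counts = Counter()
    for t in transactions:
        items = list(set(t))
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                counts[frozenset(subset)] += 1
    return {s: n for s, n in counts.items() if n >= min_support}


def diff(expected, actual):
    """List human-readable mismatches; an empty list means the results agree."""
    problems = []
    for s in expected.keys() - actual.keys():
        problems.append(f"missing {sorted(s)} (support {expected[s]})")
    for s in actual.keys() - expected.keys():
        problems.append(f"unexpected {sorted(s)}")
    for s in expected.keys() & actual.keys():
        if expected[s] != actual[s]:
            problems.append(f"wrong support for {sorted(s)}: {actual[s]} != {expected[s]}")
    return problems


def random_check(my_apriori, trials=100, seed=0):
    """Compare my_apriori with the brute-force miner on random small inputs.
    Returns None if all trials agree, else the first failing case."""
    rng = random.Random(seed)
    for _ in range(trials):
        n_items = rng.randint(1, 8)
        transactions = [
            {i for i in range(n_items) if rng.random() < 0.5}
            for _ in range(rng.randint(1, 15))
        ]
        min_support = rng.randint(1, len(transactions))
        problems = diff(brute_force_frequent(transactions, min_support),
                        my_apriori(transactions, min_support))
        if problems:
            return transactions, min_support, problems
    return None


# Usage, with your own (hypothetical) implementation:
# print(random_check(my_apriori))
```

A `None` result means every random case agreed; otherwise the returned transactions and threshold form a small counterexample that can be checked by hand.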
Thanks!