Friday, 16 December 2016

Supermarket dataset for the Apriori algorithm, with known results for checking

I'm working on the market basket analysis problem using the Apriori algorithm.

I have some lists (which represent supermarket transactions) that look like this:

[Bread, Milk]
[Bread, Milk, Cereal, Coffee]
[Bread, Cereal, Coffee]
[Milk, Cereal, Coffee]
[Cereal, Coffee]
[Bread, Coffee]

The goal is to discover which itemsets are frequent, that is, which ones occur at least as often as a user-specified threshold.
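
For a dataset as small as the six transactions above, one way to get reference results is to count every possible itemset by brute force. The following is only a sketch in Python, not the Apriori algorithm itself: the function name `frequent_itemsets` and the threshold of 3 transactions are assumptions chosen for illustration, and the approach is exponential in the number of distinct items, so it is only practical for tiny inputs.

```python
from itertools import combinations

# The six example transactions, stored as sets for fast subset tests.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Milk", "Cereal", "Coffee"},
    {"Bread", "Cereal", "Coffee"},
    {"Milk", "Cereal", "Coffee"},
    {"Cereal", "Coffee"},
    {"Bread", "Coffee"},
]


def frequent_itemsets(transactions, min_count):
    """Return {itemset tuple: support count} for every itemset that
    appears in at least min_count transactions (hypothetical helper,
    exhaustive enumeration with no pruning)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support count = number of transactions containing all items.
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_count:
                result[candidate] = count
    return result


# Assumed threshold: an itemset must appear in at least 3 transactions.
frequent = frequent_itemsets(transactions, min_count=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, count)
```

With a threshold of 3, this reports the four single items (Bread 4, Cereal 4, Coffee 5, Milk 3) plus two pairs: (Bread, Coffee) with 3 and (Cereal, Coffee) with 4. An Apriori implementation run on the same input with the same threshold should return exactly these itemsets and counts.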

I'm writing my own implementation of Apriori, but I need some way to test whether it works correctly.

I've found this website, http://ift.tt/OcUkl7, which contains datasets for frequent itemset mining.

These datasets use numbers instead of product names for privacy reasons, but that's fine.

Of course I could feed this data to my Apriori implementation, but how can I tell whether the results are correct?

Are there any datasets that also come with their most frequent itemsets, so I can check whether my program works correctly?

Alternatively, can you suggest a tool that takes a dataset as input and outputs the most frequent itemsets above a user-specified threshold?

Thanks!
