The main steps of the Apriori algorithm for mining association rules.
2023. 3. 2. 17:29ㆍMachine Learning
The Apriori algorithm is a classic algorithm for mining frequent itemsets and discovering association rules in large datasets. Here are the main steps of the Apriori algorithm:
- Determine the support threshold: The support threshold is the minimum support an itemset must have to be considered frequent, given either as a number of transactions or as a fraction of all transactions. This value is typically set by the user.
- Generate frequent 1-itemsets: Scan the database to count the frequency of each item, and select the items that meet the support threshold as frequent 1-itemsets.
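- Generate frequent k-itemsets: Generate candidate k-itemsets by joining the set of frequent (k-1)-itemsets with itself, and prune every candidate that has an infrequent (k-1)-subset (by the Apriori property, such a candidate cannot be frequent). Then scan the database to count the remaining candidates and keep those that meet the support threshold.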
- Repeat the previous step until no more frequent itemsets can be generated: Increase k by one on each pass, generating frequent (k+1)-itemsets from frequent k-itemsets, and stop when a pass yields no frequent itemsets.
- Generate association rules: From the frequent itemsets found in the previous steps, generate association rules that meet a minimum confidence threshold. An association rule is of the form X → Y, where X and Y are disjoint, non-empty itemsets. The confidence of a rule X → Y is the fraction of the transactions containing X that also contain Y, that is, support(X ∪ Y) / support(X).
- Evaluate the rules and prune: Evaluate the generated rules based on a user-defined interestingness measure, such as lift or conviction, and prune the uninteresting or redundant rules.
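The steps above can be sketched in plain Python. This is a minimal illustration rather than an optimized implementation: the function names `apriori` and `association_rules`, the use of an absolute transaction count for `min_support`, and the toy shopping baskets are all assumptions made for this example. Lift is computed as one possible interestingness measure.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets.

    min_support is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Frequent 1-itemsets: count every item, keep those meeting the threshold.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count the surviving candidates with one pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, n_transactions, min_confidence):
    """Return a list of (X, Y, confidence, lift) for rules X -> Y."""
    rules = []
    for itemset, count in frequent.items():
        # Try every non-empty proper subset of the itemset as the antecedent X.
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                x = frozenset(antecedent)
                y = itemset - x
                confidence = count / frequent[x]
                if confidence >= min_confidence:
                    lift = confidence / (frequent[y] / n_transactions)
                    rules.append((x, y, confidence, lift))
    return rules


# Toy data, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(transactions, min_support=3)
rules = association_rules(frequent, len(transactions), min_confidence=0.8)
for x, y, confidence, lift in rules:
    print(sorted(x), "->", sorted(y), confidence, lift)
```

With these baskets, the only rule that reaches a confidence of 0.8 is {beer} → {diapers}: every basket containing beer also contains diapers. Looking up `frequent[x]` and `frequent[y]` is safe because every subset of a frequent itemset is itself frequent, so it was already counted in an earlier pass.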
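The Apriori algorithm is a widely used approach for discovering association rules in large datasets, and its pruning step keeps the number of candidates manageable in many practical cases. However, it needs one pass over the database for each itemset size, and the number of candidates can grow very quickly, so it can still be computationally expensive for very large datasets, datasets with many distinct items, or low support thresholds.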
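To get a feel for why the number of distinct items matters, note that n distinct items allow 2^n - 1 non-empty itemsets. The short sketch below (the helper name `itemset_search_space` is made up for this example) prints that upper bound for a few values of n. Apriori's pruning usually explores far fewer candidates, but this is the space it has to cut down.

```python
def itemset_search_space(n_items):
    """Number of non-empty itemsets over n_items distinct items: 2**n - 1."""
    return 2 ** n_items - 1


for n in (10, 20, 30):
    print(n, itemset_search_space(n))
# 10 1023
# 20 1048575
# 30 1073741823
```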