Advanced Data Mining: Clustering Lets You Find New Customers Hidden in Your Data
- by Dr. Andre Skusa, syskoplan AG, Germany
- April 15, 2007
After you master basic data mining methods such as ABC classification, you can try more advanced techniques such as clustering with Analysis Process Designer (APD). See how you can create a clustering model and view the reports that APD provides for clustering data.
Clustering is a statistical method applied to a set of objects to identify groups of similar objects. You define a distance measure to determine the similarity of the objects. Then, an algorithm compares the objects according to their distances and produces the clusters (i.e., the groups of similar objects). For example, you could use clustering to find out whether you can partition your customers into a number of distinct groups and whether these classifications influence customers’ buying behaviors. You can use the clustering results to analyze the structure in a set of objects. Then you can use the resulting criteria to predict which group new data belongs to.
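To make the idea of a distance measure and cluster assignment concrete, here is a minimal sketch in Python. It uses a simple k-means procedure with Euclidean distance, which is an assumption chosen for illustration and not necessarily the algorithm APD uses. The customer figures (orders per year, annual revenue) and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Distance measure: straight-line distance between two objects."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centroids):
    """Return the index of the cluster whose center is closest to the point."""
    return min(range(len(centroids)), key=lambda i: euclidean(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Group points into k clusters and return the cluster centers."""
    # Assumption: start from the first k points so the result is repeatable.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        new_centroids = []
        for i, group in enumerate(groups):
            if group:
                # New center is the mean of each attribute over the group.
                new_centroids.append(tuple(sum(c) / len(group) for c in zip(*group)))
            else:
                # Keep the old center if no point was assigned to it.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids


# Hypothetical customers: (orders per year, annual revenue)
customers = [
    (1.0, 200.0), (2.0, 250.0), (1.5, 220.0),
    (12.0, 3000.0), (11.0, 2800.0), (13.0, 3100.0),
]

centroids = kmeans(customers, k=2)

# Predict the group of new, previously unseen customers.
print(nearest((1.2, 210.0), centroids))    # 0 (low-volume group)
print(nearest((10.0, 2900.0), centroids))  # 1 (high-volume group)
```

Note that the attributes are not rescaled in this sketch, so revenue dominates the distance. In practice, how you weight or normalize the attributes is part of defining the distance measure.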
Analysis Process Designer (APD) offers advanced data mining capabilities such as clustering, decision trees, association analysis, and scoring models. These capabilities differ from simple transformations such as filters or ABC classification, which classifies data according to a simple key figure filter. Advanced data mining tools are more complex because you must configure them before you can use them; this is called the training phase. In this phase you configure the system to recognize data, and then you keep adjusting that configuration until the system does what you want. After the training phase, you can use the training results immediately to learn more about the structure of the data. These results also serve as a basis for predicting classifications or clusters of new and unknown data.
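For contrast, the following sketch shows why ABC classification counts as a simple transformation: it needs no training phase, only a key figure and fixed thresholds. This is one common variant based on cumulative revenue share. The 80% and 95% thresholds, the function name, and the customer data are assumptions for illustration, not APD settings.

```python
def abc_classify(revenue_by_customer, a_share=0.80, b_share=0.95):
    """Assign class A, B, or C by cumulative share of a key figure.

    Assumption: customers are ranked by revenue, and the thresholds
    apply to the running share of total revenue.
    """
    total = sum(revenue_by_customer.values())
    if total == 0:
        return {}
    classes = {}
    cumulative = 0.0
    ranked = sorted(revenue_by_customer.items(), key=lambda kv: kv[1], reverse=True)
    for customer, revenue in ranked:
        cumulative += revenue
        share = cumulative / total
        if share <= a_share:
            classes[customer] = "A"
        elif share <= b_share:
            classes[customer] = "B"
        else:
            classes[customer] = "C"
    return classes


# Hypothetical key figure: annual revenue per customer
revenue = {"C1": 500, "C2": 250, "C3": 150, "C4": 60, "C5": 40}
print(abc_classify(revenue))
# {'C1': 'A', 'C2': 'A', 'C3': 'B', 'C4': 'C', 'C5': 'C'}
```

Unlike the clustering case, nothing here is learned from the data: changing the result means changing the thresholds by hand.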
However, you cannot use the training results immediately as a transformation in an APD model. (A transformation is any object that connects data sources and targets — it receives data, transforms it, and outputs it again.) First, you create a trained data mining model with a closed APD process (i.e., without open branches and including at least one data source and one data target) that contains the desired data mining object as the data target.
I will briefly describe how to set up clustering to give you a general idea of how to configure this advanced data mining process. A full description of clustering (how the actual algorithm works, which kinds of data you should apply it to, and how the different parameter settings influence the results) is outside the scope of this article. However, with this article you should be able to create an advanced data mining process and explore the parameters and their effects. To follow along, you should have SAP BW 3.5 or higher and SAP CRM 4.0 or higher, with the CRM business partners extracted into BW. Typically, both the BW and CRM teams are involved in this process. If you’d like to read more about the process, refer to the sidebar “Additional Resources.”