Simply stated, data mining refers to extracting or “mining” knowledge from large amounts of data. The term is actually a misnomer. Remember that the mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining. Thus, “data mining” should have been more appropriately named “knowledge mining from data”, which is unfortunately somewhat long. “Knowledge mining”, a shorter term, may not reflect the emphasis on mining from large amounts of data. Nevertheless, mining is a vivid term characterizing the process that finds a small set of precious nuggets from a great deal of raw material (Figure 1.3). Thus, this misnomer, which carries both “data” and “mining”, became a popular choice. There are many other terms carrying a similar or slightly different meaning to data mining, such as knowledge mining from databases, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.
Many people treat data mining as a synonym for another popularly used term, “Knowledge Discovery in Databases”, or KDD. Alternatively, others view data mining as simply an essential step in the process of knowledge discovery in databases. Knowledge discovery as a process is depicted in Figure 1.4, and consists of an iterative sequence of the following steps:
data cleaning (to remove noise or irrelevant data),
data integration (where multiple data sources may be combined),
data selection (where data relevant to the analysis task are retrieved from the database),
data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance),
data mining (an essential process where intelligent methods are applied in order to extract data patterns),
pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures; Section 1.5),
and
knowledge presentation (where visualization and knowledge representation techniques are used to present