Multi-Relational Data Mining
by Arno Jan Knobbe
Publisher: IOS Press 2006
Number of pages: 130
This thesis is concerned with Data Mining: extracting useful insights from large and detailed collections of data. As companies and institutions have become able to gather data cheaply and efficiently, this subject has become increasingly important. This interest has inspired a rapidly maturing research field, with developments on both a theoretical and a practical level, the latter reflected in the availability of a range of commercial tools. Unfortunately, the widespread application of this technology has been limited by an important assumption in mainstream Data Mining approaches. This assumption, that all data resides or can be made to reside in a single table, prevents the use of these Data Mining tools in certain important domains, or requires considerable massaging and altering of the data as a pre-processing step. This limitation has spawned a relatively recent interest in richer Data Mining paradigms that do allow structured data, as opposed to the traditional flat representation.
Over the last decade, we have seen the emergence of Data Mining techniques that cater to the analysis of structured data. These techniques are typically upgrades from well-known and accepted Data Mining techniques for tabular data, and focus on dealing with the richer representational setting. Within these techniques, which we will collectively refer to as Structured Data Mining techniques, we can identify a number of paradigms or “traditions”, each of which is inspired by an existing and well-known choice for representing and manipulating structured data. For example, Graph Mining deals with data stored as graphs, whereas Inductive Logic Programming builds on techniques from the logic programming field. This thesis specifically focuses on a tradition that revolves around relational database theory: Multi-Relational Data Mining (MRDM).
Building on relational database theory is an obvious choice, as most data-intensive applications of industrial scale employ a relational database for storage and retrieval. But apart from this pragmatic motivation, there are more substantial reasons for having a relational database view on Structured Data Mining. Relational database theory has a long and rich history of ideas and developments concerning the efficient storage and processing of structured data, which should be exploited in successful Multi-Relational Data Mining technology. Concepts such as data modelling and database normalisation may help to properly approach an MRDM project, and guide the effective and efficient search for interesting knowledge in the data. Recent developments in dealing with extremely large databases and managing query-intensive analytical processing will aid the application of MRDM in larger and more complex domains.
To a degree, many concepts from relational database theory have their counterparts in other traditions that have inspired other Structured Data Mining paradigms. As such, MRDM has elements that are variations of those in approaches that may have a longer history. Nevertheless, we will show that the clear choice for a relational starting point, which has been the inspiration behind many ideas in this thesis, is a fruitful one, and has produced solutions that have been overlooked in “competing” approaches.