KNAW

Publicatie

Patterns that matter (2010) Open access

Pagina-navigatie:
Titel Patterns that matter
Auteur Leeuwen, M. van
Promotor Siebes, A.P.J.M.
Universiteit University Utrecht
Datum 2010-02-09
Trefwoord(en) pattern mining, mdl, compression, itemsets, dissimilarity, data generation, privacy, clustering, stream mining
Taal Engels
Type proefschrift
Uitgever Utrecht University
Samenvatting Pattern mining is one of the best-known concepts in Data Mining. A big problem in pattern mining is that humongous amounts of patterns can be mined even from small datasets. This makes it hard for domain experts to discover knowledge using pattern mining, for example in the field of Bioinformatics. In this thesis we address the pattern explosion using compression. We argue that the best pattern set is that set of patterns that compresses the data best. Based on an analysis from MDL (Minimum Description Length) perspective, we introduce a heuristic algorithm, called Krimp, which finds the best set of patterns. High compression ratios and good classification scores confirm that Krimp selects patterns that are very characteristic for the data. After this, we proceed with a series of well-known problems in Knowledge Discovery, which we each unravel with our compression approach. We propose a database dissimilarity measure and show how compression can be used to characterise differences between databases. We present an algorithm that generates synthetic data that is virtually indiscernible from the original data, but can also be used to preserve privacy. Changes in data streams are detected by using a Krimp compressor to check whether the data distribution has been changed or not. Finally, compression is used to identify the components of a database and to find interesting groups in a database. In each chapter, we provide an extensive experimental evaluation to show that the proposed methods perform well on a large variety of datasets. In the end, we conclude that having less, but more characteristic patterns is key to successful Knowledge Discovery and that compression is very useful in this respect. Not as goal in itself, but as means to an end: compression picks the patterns that matter.
Publicatie http://igitur-archive.library.uu.nl/dissertations/2010-0126-200131/UUindex.html
Persistent Identifier URN:NBN:NL:UI:10-1874-39597
Metadata XML
Repository Universiteit Utrecht
Universiteit Utrecht

Omhoog
Ga terug naar de inhoud
Ga terug naar de site navigatie