Sfb 288 Differential Geometry and Quantum Physics

Segmentation and coding of complex geometric objects


Leader(s): Prof. Dr. Ruedi Seiler

The major topic of this project is natural images and their segmentation and efficient coding, mainly for the purpose of lossless image compression. Compression algorithms for image data storage and transmission are a central theme in the multimedia industry, but lossless methods are indispensable whenever even the slightest modification is unacceptable, for example when

·  the highest quality demands must be met,

·  the picture contains sensitive structures (object structures, watermarks, signatures, steganography),

·  picture quality must be maintained after multiple copying.

This situation is characteristic of medicine, science, satellite and space applications, and technology.

Lossless compression rates are much lower than those common in the lossy case; factors of 2 or, sometimes, 3 can be expected. Correspondingly, even substantial improvements typically increase the compression rate by no more than a few percent.

The project aims at developing mathematical models for the description and segmentation of natural images, and at implementing highly effective compression algorithms based on these investigations. The spectrum of mathematical techniques ranges from ergodic theory and information theory to complexity theory and the investigation of models from classical and quantum statistical mechanics capable of reflecting relevant properties of natural pictures.

There are plenty of lossless data compression algorithms, but they are mainly aimed at linear (i.e. one-dimensional) data streams. They perform well at exploiting correlations in the linear order, but they do not work particularly well for images scanned line by line. It turns out that some of the most popular and effective linear compression schemes (e.g. Lempel-Ziv, Burrows-Wheeler) do not have an obvious and natural two-dimensional counterpart. An important aspect of this project is therefore the search for intrinsically two-dimensional data compression algorithms.
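As a minimal illustration of the problem (not part of the project's algorithms, and using a synthetic image rather than real data), the Python sketch below compares the empirical zero-order entropy of horizontal and vertical pixel differences: both directions carry comparable redundancy, yet after a line-by-line scan the vertical neighbours lie a full image width apart in the linear stream, out of reach of purely local one-dimensional models.

```python
# Illustrative sketch only: compares the empirical zero-order entropy of
# horizontal and vertical pixel differences on a synthetic grey-scale image.
# After a raster (line-by-line) scan, vertically adjacent pixels are separated
# by a full image width, so a purely linear compressor cannot exploit that
# correlation directly.
import numpy as np

def zero_order_entropy(values):
    """Empirical entropy in bits per symbol of a 1D integer array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Synthetic stand-in for a natural grey-scale image (smooth ramp plus noise).
rng = np.random.default_rng(0)
h, w = 256, 256
yy, xx = np.mgrid[0:h, 0:w]
img = ((xx + yy) // 4 + rng.integers(0, 8, size=(h, w))) % 256

raw = img.reshape(-1)                                   # raster-scanned pixels
d_horiz = np.diff(img.astype(int), axis=1).reshape(-1)  # neighbours 1 apart in the scan
d_vert = np.diff(img.astype(int), axis=0).reshape(-1)   # neighbours w apart in the scan

print("raw pixels:             %.2f bits/pixel" % zero_order_entropy(raw))
print("horizontal differences: %.2f bits/pixel" % zero_order_entropy(d_horiz))
print("vertical differences:   %.2f bits/pixel" % zero_order_entropy(d_vert))
```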

From a mathematical point of view, a picture can be modelled as a two-dimensional random field which, in some idealisation, can be thought of as (nearly) stationary, at least after applying de-correlating transformations that take local tendencies in the image into account, and after segmentation into homogeneous structures. In this approach, compressibility is due to the fact that a natural image does not have full entropy (compared with the uniform distribution over all possible pixel values). The mathematical background for this is the Shannon-McMillan-Breiman theorem, which is valid in any dimension. The main problem is then to find good predictions of the current pixel value based on the knowledge of a previously known part of its local neighbourhood, and to find good segmentation criteria leading to maximum de-correlation and statistical homogeneity.
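For illustration only, the following sketch uses the standard causal median predictor (known from JPEG-LS) as an example of predicting a pixel from the already scanned part of its neighbourhood; the project's segment-dependent prediction functions are not reproduced here. Since the prediction uses only pixels the decoder already knows, the transform is invertible and the scheme stays lossless.

```python
# Minimal sketch of causal prediction on a grey-scale image given as a NumPy array.
# The JPEG-LS median (MED) predictor is used purely as an example; out-of-image
# neighbours are treated as 0 for simplicity.
import numpy as np

def med_predict_residuals(img):
    """Return the prediction residuals of the causal median predictor."""
    img = img.astype(int)
    res = np.zeros_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            a = img[y, x - 1] if x > 0 else 0                 # left neighbour
            b = img[y - 1, x] if y > 0 else 0                 # upper neighbour
            c = img[y - 1, x - 1] if x > 0 and y > 0 else 0   # upper-left neighbour
            if c >= max(a, b):
                pred = min(a, b)
            elif c <= min(a, b):
                pred = max(a, b)
            else:
                pred = a + b - c
            res[y, x] = img[y, x] - pred
    return res
```

The residuals are strongly peaked around zero, so feeding them to a standard entropy coder exploits the reduced empirical entropy that the Shannon-McMillan-Breiman theorem quantifies asymptotically.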

The figures below illustrate this approach. Starting from the well-known Lenna benchmark picture (left), the second picture shows a segmentation of the image area into small segments with similar prediction function, depending on the main orientation of the local gradient. The third picture is the significantly de-correlated image obtained by applying the prediction transform; it can be losslessly compressed by standard entropy methods. In this case we achieve a compression percentage of about 53 %, which is quite good compared with standard picture packing programs.

 

Fig. 1: original Lenna benchmark picture.  Fig. 2: segmentation into small segments with similar prediction function (main orientation of the local gradient).  Fig. 3: de-correlated picture after the prediction transform.
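A rough sketch of this kind of segmentation (with arbitrary illustrative choices of block size and number of orientation bins, not the project's actual parameters) could label small blocks by the dominant orientation of the local gradient:

```python
# Illustrative sketch: cut the image into small blocks and label each block by the
# dominant orientation of its local gradient, quantised into a few direction classes.
import numpy as np

def orientation_segments(img, block=8, n_bins=4):
    """Label each block x block tile by its dominant gradient orientation class."""
    img = img.astype(float)
    gy, gx = np.gradient(img)                       # finite-difference gradient
    angle = np.mod(np.arctan2(gy, gx), np.pi)       # orientation folded into [0, pi)
    weight = np.hypot(gx, gy)                       # gradient magnitude as weight
    h, w = img.shape
    labels = np.zeros((h // block, w // block), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            sl = (slice(by * block, (by + 1) * block),
                  slice(bx * block, (bx + 1) * block))
            hist, _ = np.histogram(angle[sl], bins=n_bins, range=(0.0, np.pi),
                                   weights=weight[sl])
            labels[by, bx] = int(hist.argmax())     # dominant orientation class
    return labels
```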

Another interesting question is whether natural images can, under certain circumstances, be considered as almost-zero-entropy objects. This is most probably not true in the vast majority of cases, but it might hold if the relevant content of the picture could somehow be specified and separated from the 'noise' (an approach which is already typical for lossy compression). From a theoretical point of view it is not at all obvious what an efficient compression scheme would look like in the zero-entropy situation. The majority of known compression algorithms perform rather badly in this case, even if one has good bounds on the typical n-block probabilities.

Furthermore, it turned out that the Shannon-McMillan-Breiman theorem cannot be generalised in a straightforward way to this case by simply substituting the normalising factor n² (typical for the two-dimensional positive-entropy case) with some n^γ, γ < 2. We conjecture that, for a stationary field, the validity of a modified Shannon-McMillan-Breiman theorem with a scale exponent γ (saying, roughly speaking, that the logarithm of the probability of large blocks of size n behaves like -C n^γ) implies that γ is an integer. It is even unclear whether one can expect, in the generic case of a stationary zero-entropy system, good bounds in the sense that the logarithms of the typical n-block probabilities lie between -C n^γ and -C' n^γ. Nevertheless, as in fractal dimension theory, one can still define a unique critical exponent γ (such that normalising with a smaller exponent gives infinite and with a larger exponent zero entropy). So one could still hope to design compression algorithms which compress n-blocks at least down to a size of order n^γ.
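One way to state the scaling discussed above more formally (a sketch only; the block notation Λ_n and the block entropy H(n) are introduced here purely for convenience):

```latex
% Sketch only: \Lambda_n = \{0,\dots,n-1\}^2 is the n x n block, X_{\Lambda_n} the
% stationary field restricted to it, and H(n) = H(X_{\Lambda_n}) its block entropy.
\begin{align*}
  \text{positive entropy (SMB):}\qquad
    & -\frac{1}{n^{2}}\,\log P\bigl(X_{\Lambda_n}\bigr)
      \;\xrightarrow[n\to\infty]{}\; h \;>\; 0, \\
  \text{conjectured zero-entropy scaling:}\qquad
    & -\log P\bigl(X_{\Lambda_n}\bigr) \;\sim\; C\,n^{\gamma},\quad \gamma < 2
      \;\;\Longrightarrow\;\; \gamma \in \mathbb{Z}, \\
  \text{critical exponent:}\qquad
    & \gamma^{*} \;=\; \inf\Bigl\{\gamma \ge 0 \,:\,
        \lim_{n\to\infty} \tfrac{H(n)}{n^{\gamma}} = 0 \Bigr\}.
\end{align*}
```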

A second group of problems relates to non-stationary sources, for which essentially no theory is available. Here one of the main tasks is to find proper segmentation schemes for partitioning a given sample path of the source such that, on each partition element, the string (or field segment) is close to a stationary process.
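As a toy illustration of such a segmentation scheme (with an arbitrary window length, threshold, and distance measure, none of which are prescribed by the project), a one-dimensional symbol stream can be cut greedily wherever the empirical statistics of adjacent windows drift apart:

```python
# Toy sketch: cut a 1D symbol stream wherever the empirical distributions of two
# adjacent windows differ too much (measured here, arbitrarily, by total variation).
import numpy as np

def quasi_stationary_segments(stream, alphabet_size, window=256, threshold=0.25):
    """Return cut points splitting the stream into quasi-stationary pieces."""
    def hist(chunk):
        h = np.bincount(chunk, minlength=alphabet_size).astype(float)
        return h / max(h.sum(), 1.0)

    cuts = [0]
    for pos in range(window, len(stream) - window, window):
        left = hist(stream[pos - window:pos])
        right = hist(stream[pos:pos + window])
        if 0.5 * np.abs(left - right).sum() > threshold:   # total variation distance
            cuts.append(pos)
    cuts.append(len(stream))
    return cuts

# Example: a stream that switches its symbol statistics halfway through.
rng = np.random.default_rng(1)
stream = np.concatenate([rng.choice(4, 4096, p=[0.7, 0.1, 0.1, 0.1]),
                         rng.choice(4, 4096, p=[0.1, 0.1, 0.1, 0.7])])
print(quasi_stationary_segments(stream, alphabet_size=4))
```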


Copyright © 1999 Sfb 288, Mathematics 8-5, Strasse des 17 Juni 136, TU-Berlin, 10623 Berlin