CSC/MATH 870

Computational Discrete Geometry

Spring 2007

Student Research Projects

 Anastasia Chavez & Chris O'Neill Ehrhart quasipolynomials I, II & III Brendan Colloran Proximity graphs I, II & III Philipp Richter Alpha shapes I & II Ido Heskia Image segmentation I & II Arash Farahmand Frobenius problem Connie Phong Reconstructing time series I, II & III Tim Lee Protein sequencing I, II & III Yelena Gartsman Hausdorff distance I, II & III

Schedule of final presentations

May 3: Tim, Brendan, Nacha, and Chris
May 10: Philipp, Yelena, Connie, and Ido

Here are some instructions and suggestions for the final presentations.

Suggested Projects

• Efficient K-Nearest Neighbors

The K-Nearest Neighbor technique (K-NN) is a powerful tool in pattern analysis and non-parametric classification.

General Goal: This project will study the algorithms used in K-NN classification and focus on specific approaches to reduce the complexity of this technique without impacting on the quality of results.

Specific Goals:

1. Study and implement algorithms for efficient NN-classification.
2. Provide an analysis of the complexity and correctness of the selected implementation.
3. Conduct experiments to analyze the technique for complexity and correctness both for data sizes and for number of classes.
4. Can these ideas be extended to design an efficient K-means algorithm?

Next Step: Read the paper "Proximity Graphs for Nearest-Neighbor Decision Rules", by G. Toussaint, Section 1-3.

• Proximity Graphs

Proximity graphs describe neighborhood relationships between entities, for example, points on an Euclidean plane.

General goal: Study the following graphs: Nearest Neighbor Graph, Minimum Spanning Tree, Relative Neighborhood Graph, Gabriel Graph, and Delaunay Triangulation. Explore the relationship that exists between these graphs.

Specific goals:

1. Study and understand efficient algorithms for creating NNG, MST, RNG, GG, and DT. Code an efficient implementation of each of these algorithms.
2. Prove the relationship that exists between these graphs. For 10 examples of random point distributions in the plane, confirm the relationship.
3. For randomly generated graphs of increasing size (complexity), verify the performance of your implementation.
4. Design an interface (e.g. an applet) to specify arbitrary point distributions and output an appropriate proximity graph.
5. Implement the Principal curve algorithm. Investigate the proximity relationships induced by it and relate them to the NNG, MST, RNG, GG, DT, and shape skeletons.

Next steps:

1. An excellent project report is available here.
2. Refer to "Relative Neighborhood Graphs and their Relatives", J. Jaromczyk and G. Toussaint, Proc. of IEEE, 1992, and references therein.
3. For Principal curves see: T. Hastie and W. Stuetzle, "Principal Curves", J. of Am. Stat. Assoc., Vol 84, pp. 502-516. >li> Also see: R. Singh, V. Cherkassky, and N. P. Papanikolopoulos, "Self-Organizing Maps for the Skeletonization of Sparse Shapes", IEEE Transactions on Neural Networks, Vol. 11, No. 1, pp. 241-248, 2000.

• Geometric Approaches for Reconstructing Time-Series Data

Many processes, such as a biological process, generates time-sequenced data. However, due to a variety of factors such as sampling-rate, synchronization etc., establishing the time-series for an unordered or poorly ordered series of data points is complex. This project seeks to study the role of geometric structures to solve this problem.

Specific Goals:

1. Analyze the MST-based algorithm of Magwene, Lizardi, and Kim.
2. Implement this algorithm on the data sets indicated in the above paper.
3. Using other, empirically derived data-sets, comment on the robustness of this algorithm.
4. Explore ways to improve this algorithm by using ideas from proximity-graphs and the notion of principal-curves (both in terms of quality and complexity).
5. Experimentally test the improved algorithm.

• T. Hastie and W. Stuetzle, "Principal Curves", J. of Am. Stat. Assoc., Vol 84, pp. 502-516.
• R. Singh, V. Cherkassky, and N. P. Papanikolopoulos, "Self-Organizing Maps for the Skeletonization of Sparse Shapes", IEEE Transactions on Neural Networks, Vol. 11, No. 1, pp. 241-248, 2000.
• B. Kegl et.al., "Learning and Design of Principal Curves", IEEE Trans. on PAMI, Vol 22, 2000.

• Alpha Shapes and Molecular Representations

Different methods have been used to study the shapes of small molecules and proteins. Alpha-shapes are a generalized shape descriptor which seems to show potential in this context.

Specific Goals:

1. Study the alpha shape algorithm and implementations.
2. Apply to different point sets in 2D and 3D.
3. Explore the problem of applying these to molecular coordinates from publicly available datasets such as PDB and ChemDB.

• Investigating the Hausdorff Distance

Felix Hausdorff (1868?-1942) devised a metric function between subsets of a metric space. By definition, the Hausdorff distance is the maximum distance of a set to the nearest point in the other set. Typically, when we talk of distances, we mean the "smallest" distance. For example, if a point X is at distance d from some polygon P, we mean that the distance of X from the nearest point in P is d. This idea can be unsatisfactory, however.

Specific Goals:

1. Study the Hausdorff distance.
2. Compute the Hausdorff distance between point sets. What is the effect of outliers and noise on the distance? Compare with other distances.
3. Compute the Hausdorff distance between polygons.
4. Application of Hausdorff distance: on image data sets or data from other application domains.

• Characteristic polynomials of hyperplane arrangements

A hyperplane arrangement is simply a finite set of hyperplanes in Rd. (A hyperplane is a set of the form {x in Rd: a.x = b}, for some vector a and some scalar b.) The characteristic polynomial of a hyperplane arrangement encodes the intersection properties of the arrangement, i.e., all possible interesections, ordered by set inclusions. Specific Goals:

1. Study hyperplane arrangements and compute a few simple examples by hand.
2. Devise an efficient algorithm that computes a characteristic polynomial of a given hyperplane arrangement.
3. Experimentally test your algorithm, using some classic combinatorial problems.

• Periods of degree-2 Ehrhart quasipolynomials

Given a rational polygon P (i.e., the vertices of P have rational coordinates), the Ehrhart quasipolynomial of P is the function E(t) that counts integer points in dilates of P, i.e., if we dilate P by a factor t, then E(t) records the number of integer points in tP. This function is a degree-2 quasipolynomial, namely, E(t) is of the form c2t2 + c1(t) t + c0, where c1(t) and c0(t) are periodic functions in t. Suppose c1(t) has period p1 and c0(t) has period p0. Which pairs (p1, p0) can occur? Specific Goals:

1. Study the paper "The Minimum Period of the Ehrhart Quasi-polynomial of a Rational Polytope" by Tyrrell McAllister and Kevin Woods, and basic Ehrhart theory (see, e.g., Chapter 2 & 3 of "Computing the Continuous Discretely", by M. Beck and S. Robins).
2. Familiarize yourself with the software LattE.
3. Implement an algorithm that computes many examples of degree-2 Ehrhart quasipolynomials.

• Transportation polytopes

A transportation polytope consists of all nonnegative matrices with a fixed margin, that is all matrices with a fixed prescribed sum for each row and column. These polytopes and their higher-dimensional analogs (multi-way contingency tables) have important applications to statistics, for example, disclosure limitation procedures. Many of these applications involve integer tables, that is, matrices with integer entries. In this case, we can study the associated counting function, with the margins as parameters. This function turns out to be a piecewise-defined polynomial, which is hard to compute.

Specific Goals:

1. Study the algorithm proposed in "The partial-fractions method for counting solutions to integral linear systems" by M. Beck.
2. Implement the partial-fraction algorithm for the case of transportation polytopes.
3. Experimentally test your algorithm using small transportation polytopes.

Next Step: Read Section 5 of "The Ehrhart polynomial of the Birkhoff polytope", by M. Beck & D. Pixton, for a definition of transportation polytopes and their associated counting functions.