Computational Discrete Geometry

**
**
Spring 2007

Student Research Projects

Anastasia Chavez & Chris O'Neill | Ehrhart quasipolynomials I, II & III |

Brendan Colloran | Proximity graphs I, II & III |

Philipp Richter | Alpha shapes I & II |

Ido Heskia | Image segmentation I & II |

Arash Farahmand | Frobenius problem |

Connie Phong | Reconstructing time series I, II & III |

Tim Lee | Protein sequencing I, II & III |

Yelena Gartsman | Hausdorff distance I, II & III |

**Schedule of final presentations**

May 3: Tim, Brendan, Nacha, and Chris

May 10: Philipp, Yelena, Connie, and Ido

Here are some instructions and suggestions for the final presentations.

**Suggested Projects**

**Efficient K-Nearest Neighbors**The

*K-Nearest Neighbor technique*(K-NN) is a powerful tool in pattern analysis and non-parametric classification.General Goal: This project will study the algorithms used in K-NN classification and focus on specific approaches to reduce the complexity of this technique without impacting on the quality of results.

Specific Goals:

- Study and implement algorithms for efficient NN-classification.
- Provide an analysis of the complexity and correctness of the selected implementation.
- Conduct experiments to analyze the technique for complexity and correctness both for data sizes and for number of classes.
- Can these ideas be extended to design an efficient K-means algorithm?

Next Step: Read the paper "Proximity Graphs for Nearest-Neighbor Decision Rules", by G. Toussaint, Section 1-3.

**Proximity Graphs***Proximity graphs*describe neighborhood relationships between entities, for example, points on an Euclidean plane.General goal: Study the following graphs: Nearest Neighbor Graph, Minimum Spanning Tree, Relative Neighborhood Graph, Gabriel Graph, and Delaunay Triangulation. Explore the relationship that exists between these graphs.

Specific goals:

- Study and understand efficient algorithms for creating NNG, MST, RNG, GG, and DT. Code an efficient implementation of each of these algorithms.
- Prove the relationship that exists between these graphs. For 10 examples of random point distributions in the plane, confirm the relationship.
- For randomly generated graphs of increasing size (complexity), verify the performance of your implementation.
- Design an interface (e.g. an applet) to specify arbitrary point distributions and output an appropriate proximity graph.
- Implement the Principal curve algorithm. Investigate the proximity relationships induced by it and relate them to the NNG, MST, RNG, GG, DT, and shape skeletons.

Next steps:

- An excellent project report is available here.
- Refer to "Relative Neighborhood Graphs and their Relatives", J. Jaromczyk and G. Toussaint, Proc. of IEEE, 1992, and references therein.
- For Principal curves see: T. Hastie and W. Stuetzle, "Principal Curves", J. of Am. Stat. Assoc., Vol 84, pp. 502-516. >li> Also see: R. Singh, V. Cherkassky, and N. P. Papanikolopoulos, "Self-Organizing Maps for the Skeletonization of Sparse Shapes", IEEE Transactions on Neural Networks, Vol. 11, No. 1, pp. 241-248, 2000.

**Geometric Approaches for Reconstructing Time-Series Data**Many processes, such as a biological process, generates time-sequenced data. However, due to a variety of factors such as sampling-rate, synchronization etc., establishing the time-series for an unordered or poorly ordered series of data points is complex. This project seeks to study the role of geometric structures to solve this problem.

Specific Goals:

- Analyze the MST-based algorithm of Magwene, Lizardi, and Kim.
- Implement this algorithm on the data sets indicated in the above paper.
- Using other, empirically derived data-sets, comment on the robustness of this algorithm.
- Explore ways to improve this algorithm by using ideas from proximity-graphs and the notion of principal-curves (both in terms of quality and complexity).
- Experimentally test the improved algorithm.

Further Reading:

- T. Hastie and W. Stuetzle, "Principal Curves", J. of Am. Stat. Assoc., Vol 84, pp. 502-516.
- R. Singh, V. Cherkassky, and N. P. Papanikolopoulos, "Self-Organizing Maps for the Skeletonization of Sparse Shapes", IEEE Transactions on Neural Networks, Vol. 11, No. 1, pp. 241-248, 2000.
- B. Kegl et.al., "Learning and Design of Principal Curves", IEEE Trans. on PAMI, Vol 22, 2000.

**Alpha Shapes and Molecular Representations**Different methods have been used to study the shapes of small molecules and proteins. Alpha-shapes are a generalized shape descriptor which seems to show potential in this context.

Specific Goals:

- Study the alpha shape algorithm and implementations.
- Apply to different point sets in 2D and 3D.
- Explore the problem of applying these to molecular coordinates from publicly available datasets such as PDB and ChemDB.

Further Reading:

- Connolly, Introduction to molecular surfaces, Kavraki, Talks about alpha shapes.
- Introduction to alpha shapes (intro + software + links to other papers), also see this web site.
- A nice set of slides have been made available by Celkik Marjan.

**Investigating the Hausdorff Distance**Felix Hausdorff (1868?-1942) devised a metric function between subsets of a metric space. By definition, the Hausdorff distance is the maximum distance of a set to the nearest point in the other set. Typically, when we talk of distances, we mean the "smallest" distance. For example, if a point X is at distance d from some polygon P, we mean that the distance of X from the nearest point in P is d. This idea can be unsatisfactory, however.

Specific Goals:

- Study the Hausdorff distance.
- Compute the Hausdorff distance between point sets. What is the effect of outliers and noise on the distance? Compare with other distances.
- Compute the Hausdorff distance between polygons.
- Application of Hausdorff distance: on image data sets or data from other application domains.

Further Reading:

**Characteristic polynomials of hyperplane arrangements**A

*hyperplane arrangement*is simply a finite set of hyperplanes in R^{d}. (A hyperplane is a set of the form {x in R^{d}: a^{.}x = b}, for some vector a and some scalar b.) The*characteristic polynomial*of a hyperplane arrangement encodes the intersection properties of the arrangement, i.e., all possible interesections, ordered by set inclusions. Specific Goals:- Study hyperplane arrangements and compute a few simple examples by hand.
- Devise an efficient algorithm that computes a characteristic polynomial of a given hyperplane arrangement.
- Experimentally test your algorithm, using some classic combinatorial problems.

**Periods of degree-2 Ehrhart quasipolynomials**Given a rational polygon P (i.e., the vertices of P have rational coordinates), the

*Ehrhart quasipolynomial*of P is the function E(t) that counts integer points in dilates of P, i.e., if we dilate P by a factor t, then E(t) records the number of integer points in tP. This function is a degree-2*quasipolynomial*, namely, E(t) is of the form c_{2}t^{2}+ c_{1}(t) t + c_{0}, where c_{1}(t) and c_{0}(t) are periodic functions in t. Suppose c_{1}(t) has period p_{1}and c_{0}(t) has period p_{0}. Which pairs (p_{1}, p_{0}) can occur? Specific Goals:- Study the paper "The Minimum Period of the Ehrhart Quasi-polynomial of a Rational Polytope" by Tyrrell McAllister and Kevin Woods, and basic Ehrhart theory (see, e.g., Chapter 2 & 3 of "Computing the Continuous Discretely", by M. Beck and S. Robins).
- Familiarize yourself with the software LattE.
- Implement an algorithm that computes many examples of degree-2 Ehrhart quasipolynomials.

**Transportation polytopes**A

*transportation polytope*consists of all nonnegative matrices with a fixed*margin*, that is all matrices with a fixed prescribed sum for each row and column. These polytopes and their higher-dimensional analogs (multi-way contingency tables) have important applications to statistics, for example, disclosure limitation procedures. Many of these applications involve*integer tables*, that is, matrices with integer entries. In this case, we can study the associated counting function, with the margins as parameters. This function turns out to be a piecewise-defined polynomial, which is hard to compute.Specific Goals:

- Study the algorithm proposed in "The partial-fractions method for counting solutions to integral linear systems" by M. Beck.
- Implement the partial-fraction algorithm for the case of transportation polytopes.
- Experimentally test your algorithm using small transportation polytopes.

Next Step: Read Section 5 of "The Ehrhart polynomial of the Birkhoff polytope", by M. Beck & D. Pixton, for a definition of transportation polytopes and their associated counting functions.