Carnegie Mellon University

Summer Undergraduate Applied Mathematics Institute

May 28 - July 16, 2013

Projects

► Introduction to Subword Pattern Avoidance, Munira Shahir

Advisor:  Irina Gheorghiciuc
Abstract: The student defined correlation polynomials and autocorrelation polynomials of words. She used them to find two generating functions that count words avoiding given patterns. She then used one of these generating functions to find the expected time of occurrence of a given word in a probabilistic experiment.

► Building Words from Exactly the Right Set of Subwords, Carson Sestili

Advisor:  Irina Gheorghiciuc
Abstract: The student worked on a problem that was both a containment and avoidance problem. He had to find the number of words of any given lengths than avoided a given pattern and was built from other patterns. This problem was completely solved using a result of Guibas and Odlysko. Then he worked on the distribution of words with the same subword complexity. He found the generating function of this distribution using two different methods. The first method was the result of Guibas and Odlysko combined with the inclusion-exclusion method. The second method used de Bruijn graphs and its subgraphs.

► Asymptotic Self-Similar 2D Coarsening Models, Daniel Chupin, Zachary Greenberg, Jourdain Lamperski, Alexander Voet

Advisor:  Nicholas Leger
Abstract:  The goal of this project was to investigate the existence of self-similar scaling limits for random coarsening and redistribution processes. More specifically, the group considered a coarsening process induced by the removal and repositioning of nodes in a two-dimensional, periodic Voronoi diagram. At each step in the process, a random pair of adjacent nodes are removed and replaced with a single node located at the midpoint of the removed pair. The Voronoi diagram is then reconstructed for the new set of the nodes and the process is repeated. In effect, adjacent cells merge, but then shed some fraction of their total area, which is redistributed to the surrounding cells.

The first part of the project was to conduct numerical simulations of the process using MATLAB. The simulations suggested that the distribution of cell sizes in the Voronoi diagram approached a self-similar form as the number of steps in the process became very large. The second part of the project was to develop a mean-field model for the process which exhibited scaling limits with the same qualitative properties as the Voronoi process. The idea was to model a simplified process without the geometric constraints of a Voronoi diagram, but with similar redistribution properties. For the new model, we established, analytically, the existence of self-similar solutions which deviate only slightly from the self-similar form observed in the numerical Voronoi computations. A new and interesting feature of the analysis is the derivation of mean-field equations which have a structure connected with delay differential equations.

► Using Regression Models to Predict Redshifts, Cathleen Gillette, Cary Morgan, Elizabeth Spencer

Advisor:  Chad Schafer
Abstract: In this project, we developed and deployed statistical regression models for predicting the redshift of a galaxy using only five images of that galaxy, taken through a set of five filters.

Redshift estimation using such data (called the "photometric redshift estimation problem") is a critical challenge facing researchers in astronomy and cosmology. The redshift of a galaxy is a proxy for distance to that galaxy, and also a proxy for the time at which that galaxy is being observed. By studying the changes of galaxies with redshift, one is studying the evolution of the Universe with time. It is clear that this can only occur with sufficiently accurate redshift estimates.

We worked with three well-established regression techniques: linear models, generalized additive models, and regression trees. Linear models are classic tools, and their consideration for this purpose is largely to illustrate the basic concepts; it was immediately clear that these simple models are inadequate for the complexity of the relationship between redshift and the photometric data. Generalized additive models are built on regression splines, and hence are a flexible, nonparametric approach to estimation. Regression trees arise largely from the machine learning literature, and are another nonparametric approach to regression.

A big issue in this work was the construction of relevant predictor variables. The raw data comes in the form of five images per galaxy; these images must be translated into real-valued quantities prior to their use in the models.

This required careful consideration of the most important information encoded in each image. For example, it is natural to first consider that the image intensity towards the center of the galaxy holds more important information than the edge of the image. But, the irregular shapes of galaxies complicates the construction of these predictors.

Another issue in this work is varying data quality. Looking through the thousands of galaxy images that comprise that data set, it is immediately clear that some of the images are of higher quality than others. Some of the images appear to be purely noise, with little signal from the galaxy itself. Other images appear to actually include two galaxies. It is important that low quality images be downweighted in the analysis. Hence, a significant amount of effort was put into quantifying the quality of an image, translating quality into a weight that could be attached to each galaxy.

A final theme that was stressed throughout the work was the risk of overfitting regression models. It is critical that any model that is constructed perform well at predicting redshifts of galaxies that were not included in the fitting of the model. This is, after all, the point of constructing the regression model. Overfitting was avoided by the use of cross-validation techniques, and separating the original data into training and test sets.

Finally, the results showed that the generalized additive models were the most effective at predicting redshifts. Predictors that were functions of higher wavelength bands seemed to be the most important to the models.

► Searching for Combinatorial Interpretations Using the Bhargava Defined Factorial, Joseph Gault, Samantha Davies

Advisor:  Gregory Johnson
Abstract: The goal of this project was to understand Bhargava's generalization of the standard factorial function and to investigate the corresponding generalization of the binomial coefficient. Bhargava showed that the binomial coefficient was integral using any Bhargava factorial, but there is no known significance of the binomial coefficient itself. We focused on the case where the factorial is defined over a geometric progression with initial term 1 and common ratio a power of a prime and showed that n!, in this case, was equal to the order of $GL_n(F_q)$ (where $q$ is the common ratio). They then proceeded to investigate Pascal's triangle of the generalized binomial coefficients and discovered (and proved) several interesting properties about the binomial coefficients.

► Generalization of the Giuga Number and Some of its Properties for Rings of Algebraic Integers, Jamaris Burns, Katey Casey

Advisor:  Gregory Johnson
Abstract: We read through several papers on Giuga's 1950 primality conjecture and several of the major theorems related to it. Their project was to investigate Giuga's conjecture in a more general setting by generalizing the conjecture to rings of integers of arbitrary number fields. In this broader setting, we defined notions of weak and strong Giuga ideals and proved several theorems analogous to the major theorems on Giuga's conjecture in the integers.

Rose-Hulman Undergraduate Mathematics Journal: Vol. 18 : Iss. 1 , Article 5.

Students

  • Brennan, Catherine, University of Hartford
  • Burns, Jamaris, Johnson C. Smith University
  • Casey, Katie, Cornell University
  • Chupin, Daniel, University of Texas, Austin
  • Davies, Samantha, Carnegie Mellon University
  • Gault, Joseph, Duquesne University
  • Gillette, Cathleen, University of Pittsburgh
  • Greenberg, Zachary, Carnegie Mellon University
  • Lamperski, Jourdain, University of Pittsburgh
  • Morgan, Carey, Allegheny College
  • Sestili, Carson, Carnegie Mellon University
  • Shahir, Munira, University of Maryland, Baltimore
  • Spencer, Elizabeth, University of Maryland, College Park
  • Voet, Alexander, University of California, Berkeley

Faculty

Tim Flaherty

Tim Flaherty, Associate Teaching Professor
Course: Fractals and Wavelets

E-mail:  tim@andrew.cmu.edu
Wean Hall 8216
412-268-8171

Handron

Dave Handron, Associate Teaching Professor
Course: Symbolic Programming in Mathematics

E-mail:  handron@andrew.cmu.edu
Wean Hall 6214
412-268-5583

Gheorghiciuc

Irina Gheorghiciuc, Assistant Teaching Professor

Project Director

E-mail:  gheorghi@andrew.cmu.edu
Wean Hall 8126
412-268-3023

Gregory Johnson

Gregory Johnson, Assistant Teaching Professor

Project Director

E-mail:  greggo@andrew.cmu.edu
Wean Hall 8122
412-268-1504

Nicholas Leger

Nicholas Leger, Postdoctoral Associate

Project Director

E-mail:  nleger@andrew.cmu.edu
Wean Hall 7124
412-268-4732

Chad Schafer

Chad Schafer, Associate Professor

Project Director

E-mail:  cschafer@cmu.edu
Baker Hall 228D
412-268-3967