Academic Research

Before my industry work (2012-present), I was a graduate student and then a postdoctoral researcher in astronomy from 2000 to 2012. This page describes my main research projects and results from that period. This information remains relevant today because most of the work had a data science component: for example, my first project, in 2000, used neural networks to classify galaxies.

For archival interest, my research website from that time remains available.

Completed work, with publication in refereed academic journals

Galaxy Classification with Artificial Neural Networks (2000-2004)

  • First study to provide detailed morphological classifications of a large sample of galaxies in a digital sky survey
  • Used artificial neural networks to assign Hubble morphological types to 26,536 galaxies, with the same accuracy as human experts
  • Provided spectral types and photometric redshifts within the same framework
  • Used a more advanced training algorithm (Levenberg-Marquardt) than had previously been used in astronomy
  • Sample more than 10x larger than existing classified samples

Bivariate Luminosity Functions in the Sloan Digital Sky Survey (SDSS) (2001-2006)

  • The luminosity function is the number density of galaxies in space as a function of their brightness
  • First study of the SDSS bivariate galaxy luminosity function using an extensive set of galaxy properties, including morphological type
  • Revealed a wealth of new detail, including clear variation in the luminosity function with absolute magnitude (intrinsic brightness) and with various second parameters
  • Consistent with the now-standard bimodal galaxy population: early-type/bright/concentrated/red/quiescent vs. late-type/dim/diffuse/blue/star-forming
  • The bimodal distribution is not well fit by a single underlying function (a Schechter-Gaussian, a.k.a. the Choloniewski function)
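The Schechter-Gaussian (Choloniewski) form can be sketched in a few lines. This is a minimal illustration under assumptions: it uses one common parameterization, a Schechter function in absolute magnitude M multiplied by a Gaussian in a second parameter x (for example, surface brightness) whose mean shifts linearly with M. The function names, the parameter names (phi_star, M_star, alpha, x_star, beta, sigma), and the example values are hypothetical, not fitted values from this work.

```python
import math

LN10 = math.log(10.0)


def schechter_mag(M, phi_star, M_star, alpha):
    """Schechter luminosity function in absolute magnitude M (hypothetical helper).

    phi(M) = 0.4 ln(10) phi* 10^(0.4 (alpha + 1) (M* - M)) exp(-10^(0.4 (M* - M)))
    """
    lum_ratio = 10.0 ** (0.4 * (M_star - M))  # L / L*
    return 0.4 * LN10 * phi_star * lum_ratio ** (alpha + 1.0) * math.exp(-lum_ratio)


def choloniewski(M, x, phi_star, M_star, alpha, x_star, beta, sigma):
    """Bivariate Schechter-Gaussian: Schechter in M times a normalized Gaussian in x.

    Assumed form: the Gaussian mean moves linearly with magnitude, x_star + beta * (M - M_star).
    """
    mean = x_star + beta * (M - M_star)
    gauss = math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)
    return schechter_mag(M, phi_star, M_star, alpha) * gauss


# Hypothetical parameter values, for illustration only.
params = dict(phi_star=0.01, M_star=-20.5, alpha=-1.1, x_star=21.0, beta=0.3, sigma=0.5)
print(choloniewski(-20.0, 21.2, **params))
```

Because the Gaussian integrates to one over x, integrating this function over the second parameter recovers the ordinary Schechter function. A single component like this has one peak in x at each magnitude, which is why a bimodal population is poorly described by it.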

Galaxy Colour, Morphology and Environment in the SDSS (2001-2004)

  • First study of the galaxy morphology-density relation using detailed morphologies, with a comparison to the color-density relation
  • When the sample is split by density and luminosity, the distributions of color, Sérsic index (brightness profile), and concentration index are each well described by a sum of two Gaussians, but the distribution of Hubble type is much less clear-cut
  • When morphology is removed, a residual color-density relation remains; when color is removed, no residual morphology-density relation remains
  • This implies that either the morphology is simply a byproduct of color, or that a single galaxy “type” plus density constitutes insufficient information
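The "sum of two Gaussians" description can be illustrated with a small fitting sketch. This is an assumption-laden illustration, not the fitting procedure used in the study: it swaps in expectation-maximization (EM) for a one-dimensional, two-component Gaussian mixture, the function name `fit_two_gaussians` is hypothetical, and the synthetic "color" values are made up for demonstration.

```python
import numpy as np


def fit_two_gaussians(x, n_iter=200):
    """Fit a 1D two-component Gaussian mixture to samples x with EM (hypothetical helper).

    Returns (weights, means, sigmas), each an array of length 2.
    """
    x = np.asarray(x, dtype=float)
    # Initialize the two components at the lower and upper quartiles.
    means = np.percentile(x, [25.0, 75.0])
    sigmas = np.full(2, x.std())
    weights = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: weighted density of each component at each sample,
        # normalized into responsibilities.
        dens = (
            weights
            * np.exp(-0.5 * ((x[:, None] - means) / sigmas) ** 2)
            / (np.sqrt(2.0 * np.pi) * sigmas)
        )
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and widths from the responsibilities.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        sigmas = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk)

    return weights, means, sigmas


# Synthetic example (made-up values): a blue and a red population in a single color index.
rng = np.random.default_rng(42)
colors = np.concatenate([
    rng.normal(1.5, 0.3, 6000),   # blue, star-forming population
    rng.normal(2.6, 0.15, 4000),  # red, quiescent population
])
w, mu, sd = fit_two_gaussians(colors)
print(np.round(mu, 2), np.round(sd, 2), np.round(w, 2))
```

A well-separated two-component fit like this is what a bimodal color distribution looks like; a quantity such as Hubble type, whose distribution is less clear-cut, would not be summarized as cleanly.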

Robust Machine Learning Applied to Astronomical Datasets I-III (2005-2009)

I: Star-Galaxy Separation in the SDSS Data Release 3
  • First application of machine learning to an astronomical dataset of order 100 million objects (the SDSS)
  • Improved the SDSS star-galaxy separation to a probabilistic measure that enables cosmological datasets to be selected according to the desired precision and recall
  • Classified each object within a three-class framework (star, galaxy, or neither-star-nor-galaxy) to highlight astrophysically interesting objects, e.g., quasars
  • Created a state-of-the-art sample of quasars
  • Demonstrated the efficacy of the method when blind-tested on the 2dFGRS and 2QZ sky surveys
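A probabilistic classification lets users trade precision against recall by choosing a probability cut. The sketch below illustrates only that selection step; the function name `precision_recall_at` is hypothetical, and the probabilities and true labels are made-up placeholders rather than SDSS data.

```python
def precision_recall_at(p_galaxy, is_galaxy, threshold):
    """Precision and recall of selecting objects with P(galaxy) >= threshold (hypothetical helper).

    p_galaxy:  classifier probabilities that each object is a galaxy
    is_galaxy: true labels (e.g. from spectroscopy) for a validation set
    """
    selected = [p >= threshold for p in p_galaxy]
    tp = sum(s and t for s, t in zip(selected, is_galaxy))  # true positives
    n_selected = sum(selected)
    n_true = sum(is_galaxy)
    # Convention: precision (recall) is 1.0 when nothing is selected (nothing is true).
    precision = tp / n_selected if n_selected else 1.0
    recall = tp / n_true if n_true else 1.0
    return precision, recall


# Made-up validation set: classifier probabilities and true classes.
probs = [0.98, 0.91, 0.85, 0.72, 0.64, 0.55, 0.40, 0.22, 0.10, 0.03]
truth = [True, True, True, False, True, True, False, False, True, False]

for cut in (0.5, 0.8, 0.95):
    p, r = precision_recall_at(probs, truth, cut)
    print(f"P(galaxy) >= {cut:.2f}: precision={p:.2f}, recall={r:.2f}")
```

A stricter cut yields a purer but less complete sample; which cut is appropriate depends on whether a given cosmological analysis is more sensitive to contamination or to incompleteness.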
II: Photometric Redshifts for Quasars in the SDSS Data Release 5
  • Photometric redshifts estimate object distances from imaging alone, when detailed spectroscopic information is not available
  • Spectroscopic redshifts can be used as a training set for photometric redshifts
  • Improved quasar photometric redshifts using nearest-neighbor machine learning
  • Demonstrated dramatically improved redshifts for objects cross-matched to the Galaxy Evolution Explorer (GALEX) ultraviolet sky survey
  • Because the results are blind-tested, they are realistic for application to further surveys
III: Probabilistic Photometric Redshifts for Galaxies and Quasars in the SDSS and GALEX
  • Provided full probability density functions (PDFs) in redshift for SDSS and SDSS+GALEX galaxies and quasars
  • Generation of the PDFs takes into account the errors on the observations
  • Selecting quasars with single-peaked PDFs virtually eliminates “catastrophic failures” in distance, giving a sample of quasars with much better distance estimates than were previously available
  • Poor quasar photometric redshifts are likely caused by reddening, degeneracy, emission lines mimicking other lines, and lines crossing the edges of filter bandpasses
  • The method requires a large number of distance calculations; conventional or novel supercomputing hardware can accelerate these, allowing direct extension to petascale datasets
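The nearest-neighbor redshifts and error-aware PDFs described in II and III can be sketched together. This is a minimal illustration under stated assumptions, not the published pipeline: the function names are hypothetical, the features are plain colors compared with Euclidean distance, the point estimate is the mean spectroscopic redshift of the k nearest training objects, and the PDF is built by perturbing the test object's colors by their errors (assumed Gaussian and independent) and histogramming the resulting estimates. The training data are synthetic.

```python
import numpy as np


def knn_photoz(train_colors, train_z, colors, k=10):
    """Point photo-z: mean spectroscopic z of the k nearest training objects (hypothetical helper).

    Assumes k is smaller than the number of training objects.
    """
    d2 = np.sum((train_colors - colors) ** 2, axis=1)  # squared Euclidean distances
    nearest = np.argpartition(d2, k)[:k]  # indices of the k smallest distances
    return train_z[nearest].mean()


def knn_photoz_pdf(train_colors, train_z, colors, color_errs, k=10,
                   n_draws=200, z_max=4.0, n_bins=40, seed=0):
    """Redshift PDF from Monte Carlo perturbation of the colors by their errors (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Each row is one perturbed realization of the object's colors.
    draws = colors + rng.normal(size=(n_draws, colors.size)) * color_errs
    estimates = [knn_photoz(train_colors, train_z, c, k) for c in draws]
    bins = np.linspace(0.0, z_max, n_bins + 1)
    pdf, edges = np.histogram(estimates, bins=bins, density=True)
    return pdf, edges


# Synthetic training set: two colors that vary smoothly with redshift, plus noise.
rng = np.random.default_rng(1)
train_z = rng.uniform(0.0, 3.0, 5000)
train_colors = np.column_stack([train_z, 0.5 * train_z]) + rng.normal(0.0, 0.05, (5000, 2))

test_colors = np.array([1.2, 0.6])
z_hat = knn_photoz(train_colors, train_z, test_colors)
pdf, edges = knn_photoz_pdf(train_colors, train_z, test_colors, np.array([0.1, 0.1]))
print(round(float(z_hat), 3), edges[np.argmax(pdf)])
```

Each Monte Carlo draw repeats a full nearest-neighbor search, so the cost grows roughly as draws times training-set size; this is the kind of distance-heavy workload that benefits from the hardware acceleration mentioned above.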

International Journal of Modern Physics D Review (2009-2010): Data Mining and Machine Learning in Astronomy

One of the world’s first major invited, peer-reviewed review articles on this subject (61 pages).

Photometric Redshift Estimation for the Next Generation Virgo Cluster Survey (NGVS) (2009-2012)

  • Coauthor of Raichoor et al. 2014, ApJ
  • Compiled the spectroscopic redshift catalog for the NGVS collaboration
  • K-nearest-neighbor-based and template-based photometric redshifts

Completed, outside of journal publications

Geological mapping of the Northern Montana Thrust Belt (1999): Published at age 20 in conference proceedings (Moore et al. 2000)

Astroinformatics white paper for the Canadian Long Range Plan in Astronomy (2010): Explains the vital role of astroinformatics in Canadian astronomy for 2010-20

CANFAR+Skytree (2012-13): Skytree on CANFAR, one of the world’s first cloud computing data mining systems for astronomy. Scalable to very large datasets; presented in multiple conference posters, ADASS conference proceedings, and a focus demo

KDD-IG Guide (2010-12): Guide to Data Mining in Astronomy for the Knowledge Discovery in Databases Interest Group of the International Virtual Observatory. Some chapters by Sabine McConnell (Trent University)

Canadian Astronomy Data Centre community project page with classifications from machine learning (2009-2012): 143 million probabilistic star/galaxy/quasar classifications are available

Contributor to Designing for Emerging Technologies: UX for Genomics, Robotics, and the Internet of Things, via Hunter Whitney, Chapter 7

Initial Work Completed

As with my industry work, a wide-ranging research brief results in some projects that are started but not completed. Some are listed here for interest.

  • Canadian Astronomy Data Centre (CADC) Operations (2013): Use CADC computer logs to predict required resources
  • Kuiper Belt Objects (2012): Classify candidates using supervised learning to supplement CosmoQuest project results
  • Canada-France-Hawaii Telescope Legacy Survey Luminosity Function (2009-12): Assigned photometric redshift PDFs to several million galaxies
  • NGVS Cluster Members (2009-12): Photometric redshifts, background subtraction, K-means clustering; machine learning was the only method that solved the membership problem
  • Luminosity-Redshift Degeneracy (2008-10): Break the degeneracy using cross-correlation of galaxies and active galactic nuclei; American Astronomical Society poster
  • The Luminosity and Mass Functions of Baryonic Structures in the Virgo Cluster (2009-12; planned as Ball N.M., Ferrarese L.F., et al., 2013): Co-led the SWG; wrote code to fit the functions
  • Missing Satellites and the Luminosity Function in the Core of the Virgo Cluster (Ferrarese L.F., et al., 2013)
  • Evolution of Galaxies from Low to High Redshift Within a Unified Framework I: The Morphology-Density-Luminosity Relation Between 0 < z < 1
  • The Real-Space Projected Clustering Signal of Quasars in the COSMOS Field (Ball N.M., Myers A.D., Brunner R.J., et al., 2011)
  • Probabilistic Classifications and Photometric Redshifts in COSMOS Using 30-band Photometry (Ball N.M., Myers A.D., Brunner R.J., et al.)
  • The Spectroscopic-Photometric Clustering of Quasars in the SDSS (Myers A.D., White M., Richards G.T., Ball N.M., et al.)