
Posts

High-fidelity plot image representations for machine learning classification

Apr 9, 2021

As computers become faster and data sets grow larger, machine learning is playing an ever-increasing role in astronomy research. One application that I find very compelling is in the field of stellar pulsations, where Marc Hon has led advancements in “Detecting Solar-like Oscillations in Red Giants with Deep Learning” based on image representations of their data; specifically, their classifier can determine whether (and where) solar-like oscillations are present in a power spectrum plot. Here I detail one contribution I’ve made to this work that might be helpful for other image-based classification efforts: I present code that quickly turns a line plot into a 2D numpy array representation while aiming to preserve the fidelity of the original data. This aspect of machine learning is called “feature engineering” and is concerned with providing the most informative data on which a classifier can base its decisions. This is the area where domain knowledge really gives the astronomer an advantage over the generic data scientist.
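
To give a flavor of what turning a line plot into an array can look like, here is a minimal sketch. It is not the code from the post: the function name `line_to_image`, the binary pixel encoding, and the `oversample` parameter are all assumptions made for illustration. It assumes `x` is strictly increasing with at least two points, and it keeps the original data points in the sampling so that narrow peaks are not lost.

```python
import numpy as np

def line_to_image(x, y, shape=(128, 128), oversample=4):
    """Rasterize the line plot y(x) into a binary 2D array (row 0 is the top).

    Hypothetical helper for illustration; assumes x is strictly increasing.
    """
    n_rows, n_cols = shape
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Sample the piecewise-linear curve densely enough that every pixel column
    # receives at least one sample (oversample >= 2), and keep the original
    # points so extrema of the data survive the rasterization.
    xs = np.union1d(np.linspace(x.min(), x.max(), n_cols * oversample), x)
    ys = np.interp(xs, x, y)

    # Map data coordinates to integer pixel indices.
    frac_x = (xs - x.min()) / (x.max() - x.min())
    cols = np.minimum((frac_x * n_cols).astype(int), n_cols - 1)
    span = y.max() - y.min()
    if span == 0:
        rows = np.full(xs.size, n_rows // 2)
    else:
        rows = np.round((y.max() - ys) / span * (n_rows - 1)).astype(int)

    # In each column, fill every pixel between the lowest and highest row
    # that the line visits, so steep segments are drawn as vertical runs.
    image = np.zeros(shape, dtype=np.uint8)
    for c in range(n_cols):
        r = rows[cols == c]
        image[r.min():r.max() + 1, c] = 1
    return image

x = np.linspace(0, 1, 50)
image = line_to_image(x, x, shape=(10, 10))
```

Filling the full vertical extent per column is one simple way to avoid dropping sharp features that a naive one-pixel-per-column resampling would miss.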

Fitting features in a power spectrum with least squares and MLE

May 4, 2020

I have come to confess my statistical sin. I have used least squares to fit features in the power spectra of variable stars. In particular, I believe I was the first to use least squares to fit Lorentzians to the signals from incoherent pulsation modes in the power spectrum of a pulsating white dwarf star observed by the Kepler spacecraft. This has rightfully alarmed some colleagues, since least squares intrinsically assumes that measurement noise is Gaussian (normally) distributed, which is emphatically not true of noise in a power spectrum. I explore here what effect this might have on the best-fit parameters returned by this technically invalid method. I also demonstrate how to treat the statistics more appropriately with maximum likelihood estimation (MLE).
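
For readers who want to see the two approaches side by side, here is a rough sketch on simulated data. It is not the analysis from the post. It assumes the standard result that the power in each frequency bin is the underlying model multiplied by exponentially distributed noise (chi-squared with two degrees of freedom), so the negative log-likelihood is the sum of ln M + P/M over bins. The Lorentzian parameterization, the simulated values, and all names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit, minimize

def lorentzian(freq, height, center, hwhm, background):
    """Lorentzian profile on a flat background."""
    return height / (1.0 + ((freq - center) / hwhm) ** 2) + background

def neg_log_likelihood(params, freq, power):
    """Negative log-likelihood for exponentially distributed power."""
    model = lorentzian(freq, *params)
    if np.any(model <= 0):
        return np.inf  # the model power must be positive
    return np.sum(np.log(model) + power / model)

# Simulate a power spectrum: the true profile times exponential noise.
rng = np.random.default_rng(42)
freq = np.linspace(0, 100, 2000)
truth = (20.0, 50.0, 2.0, 1.0)
power = lorentzian(freq, *truth) * rng.exponential(1.0, freq.size)

guess = (10.0, 49.0, 3.0, 2.0)

# Least squares: implicitly assumes Gaussian noise of equal size in every bin.
ls_params, _ = curve_fit(lorentzian, freq, power, p0=guess)

# Maximum likelihood: uses the exponential noise statistics directly.
mle = minimize(neg_log_likelihood, guess, args=(freq, power),
               method="Nelder-Mead")

print("least squares:", ls_params)
print("MLE:          ", mle.x)
```

Rerunning this with different random seeds is a quick way to see how the scatter of the recovered parameters differs between the two methods.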

Fitting a sine wave with a grid search, or how to get probability density from goodness of fit

Apr 25, 2019

This is a stupid way to fit a sinusoid to data.
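
As a rough illustration of the idea in the title (and not the code from the post), the sketch below evaluates chi-squared on a brute-force grid of frequency, amplitude, and phase, then converts it to probability. It assumes Gaussian measurement errors of known size `sigma` and flat priors over the grid, in which case the probability is proportional to exp(-chi²/2). The simulated data and grid ranges are made up for the example.

```python
import numpy as np

# Simulated data: a sinusoid with Gaussian noise of known size.
rng = np.random.default_rng(0)
sigma = 0.3
t = np.sort(rng.uniform(0, 10, 40))
y = 1.0 * np.sin(2 * np.pi * 0.5 * t + 1.0) + rng.normal(0, sigma, t.size)

# Brute-force parameter grid.
freqs = np.linspace(0.4, 0.6, 41)
amps = np.linspace(0.5, 1.5, 21)
phases = np.linspace(0, 2 * np.pi, 36, endpoint=False)
F, A, P = np.meshgrid(freqs, amps, phases, indexing="ij")

# Chi-squared of the model at every grid point (last axis runs over the data).
model = A[..., None] * np.sin(2 * np.pi * F[..., None] * t + P[..., None])
chi2 = np.sum(((y - model) / sigma) ** 2, axis=-1)

# Goodness of fit to probability; subtracting the minimum avoids underflow.
prob = np.exp(-0.5 * (chi2 - chi2.min()))
prob /= prob.sum()

# Marginalize over amplitude and phase, then convert the probability per
# grid cell into a density by dividing by the frequency step.
freq_prob = prob.sum(axis=(1, 2))
freq_pdf = freq_prob / (freqs[1] - freqs[0])
best_freq = freqs[np.argmax(freq_pdf)]
```

The cost grows with the product of the grid sizes, which is exactly why this is not an efficient way to fit a sinusoid.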

Strategically timing observations to avoid frequency aliases

Jan 27, 2019

In time domain research, it is well understood that the pattern of observations in time manifests itself in the frequency domain as a set of aliases that can confuse your identification of intrinsic frequencies. Conversely, if you want to avoid confusion in your frequency measurements, you should time your observations strategically.
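
The alias pattern produced by a set of observation times can be inspected through the spectral window, W(ν) = |(1/N) Σ exp(−2πi ν t_j)|². Below is a minimal sketch that compares strictly nightly sampling with a staggered schedule. The function name, the ten-night baseline, and the particular offsets are illustrative assumptions, not the strategy from the post.

```python
import numpy as np

def spectral_window(times, freqs):
    """Spectral window of a set of observation times, normalized to 1 at zero frequency."""
    phase = -2j * np.pi * np.outer(freqs, times)
    return np.abs(np.exp(phase).mean(axis=1)) ** 2

# Ten nights observed at exactly the same time each night (times in days).
regular = np.arange(10.0)

# The same nights with each observation shifted by a fraction of a day.
offsets = np.array([0.0, 0.1, -0.15, 0.2, -0.05, 0.15, -0.2, 0.05, -0.1, 0.12])
staggered = regular + offsets

freqs = np.linspace(0, 2, 2001)  # cycles per day
w_regular = spectral_window(regular, freqs)
w_staggered = spectral_window(staggered, freqs)

# Strictly nightly sampling produces a full-strength alias at 1 cycle/day;
# staggering the observation times suppresses it.
alias = np.argmin(np.abs(freqs - 1.0))
print(w_regular[alias], w_staggered[alias])
```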

Flagging multiple conditions in Matplotlib scatter plots

Aug 2, 2018

Always seeking more complicated plots (i.e., to encode more information into simple, interpretable visual representations), I have written a script to make scatter plots that indicate the settings of multiple point-by-point binary flags: scatterflags.py. While I’m somewhat dubious about including such figures in refereed publications, I find them very helpful for data exploration (especially in interactive matplotlib windows that I base on the data_browser example).
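
One simple way to mark several binary flags on a single scatter plot is to draw a hollow ring of a different size and color around each flagged point. The sketch below illustrates that general idea only; it does not reproduce scatterflags.py, and the function name `scatter_with_flags`, the flag labels, and the data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

def scatter_with_flags(ax, x, y, flags, base_size=20):
    """Plot points, then ring each flagged point once per flag that is set.

    `flags` maps a label to a boolean array with one entry per point.
    Hypothetical helper for illustration.
    """
    ax.scatter(x, y, s=base_size, color="k", zorder=3)
    for i, (label, mask) in enumerate(flags.items()):
        mask = np.asarray(mask, dtype=bool)
        # Larger hollow rings for later flags keep overlapping flags visible.
        ax.scatter(x[mask], y[mask], s=base_size * (i + 2) ** 2,
                   facecolors="none", edgecolors=f"C{i}", label=label)
    ax.legend()
    return ax

x = np.arange(10.0)
y = x ** 2
flags = {
    "saturated": x > 6,       # made-up flag
    "cloudy": x % 2 == 0,     # made-up flag
}
fig, ax = plt.subplots()
scatter_with_flags(ax, x, y, flags)
fig.savefig("flags_example.png")
```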



Keaton Bell

Astronomy, statistics, programming in Python, data visualization, data science, and other great icebreakers.

