Keaton's Blog
RSS Feed

Posts

  • Three statistical tests for average spacing among numbers

    The problem I’m interested in today is whether a set of values is distributed with some regularity in their spacing, and how to identify that average spacing. The values may be an incomplete set of measurements from an evenly spaced pattern, in the way that 16, 17, 19, 23, 24, 28 all belong to a set of numbers evenly spaced by 1. The values need not be strictly evenly spaced: individual values may scatter about the average spacing, and the set could contain a mix of values, only some of which follow an even spacing. A small illustrative sketch follows below.

    Read More »
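
    The post develops three specific tests; as a flavor of the kind of statistic involved, here is a minimal Python sketch of a Rayleigh-style test against a candidate spacing (my illustration, not necessarily one of the three tests from the post):

        import numpy as np

        def rayleigh_power(values, spacing):
            # Near 1 when all values align on a grid with this spacing;
            # near 1/N for randomly placed values.
            phases = 2 * np.pi * np.asarray(values) / spacing
            return np.abs(np.exp(1j * phases).mean()) ** 2

        values = np.array([16, 17, 19, 23, 24, 28])
        # Scan candidate spacings; note that sub-multiples of a true
        # spacing also score highly, so start above a sensible minimum.
        spacings = np.linspace(0.8, 3.0, 441)
        powers = [rayleigh_power(values, s) for s in spacings]
        print(f"{spacings[np.argmax(powers)]:.3f}")  # 1.000 for this set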

  • Confidence intervals for 2D Gaussian mixture models with contours

    You’re likely familiar with the 68–95–99.7 rule that gives the percentage of a Gaussian distribution contained within 1, 2, and 3 standard deviations. It’s more of a mnemonic for remembering these useful values, which are often used in rule-of-thumb significance estimation. Gaussians come up all the time in practice, often as approximations to probability distributions. In significance testing, one often wants to know how likely a random value is to have been drawn as far out in the exponential tail of a Gaussian distribution as an observed value. This characteristic of a distribution is referred to as the “confidence interval” or the “credible interval,” depending on philosophy. See this post from Jake VanderPlas for a discussion of the different interpretations. I won’t be particularly careful about my language here. A quick numerical illustration follows below.

    Read More »
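
    As background for the single-Gaussian case the post builds on, here is a quick Python check (my sketch, not code from the post) of how the familiar 1D percentages change for contours of an isotropic 2D Gaussian, where the squared distance from the center in units of sigma is chi-squared distributed with 2 degrees of freedom:

        import numpy as np
        from scipy import stats

        for n in (1, 2, 3):
            # 1D: fraction within n standard deviations (68-95-99.7)
            frac_1d = stats.norm.cdf(n) - stats.norm.cdf(-n)
            # 2D: fraction inside the n-sigma contour,
            # equal to 1 - exp(-n**2 / 2)
            frac_2d = stats.chi2.cdf(n**2, df=2)
            print(f"{n} sigma: 1D {frac_1d:.4f}, 2D {frac_2d:.4f}")

    The 2D fractions come out to roughly 39%, 86%, and 99%, which is why the 1D mnemonic should not be applied directly to contour plots.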

  • What's the expected average value of a noisy amplitude spectrum?

    I occasionally find myself working out the relationship between the noise in time series data and the noise in the periodogram represented as an amplitude spectrum (Fourier transform), so I’m writing it down somewhere I won’t lose it (a quick numerical check follows below). I agree with the statement from “Asteroseismic Data Analysis: Foundations and Techniques” by Sarbani Basu and Bill Chaplin (Section 5.1.4)

    Read More »
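
    For evenly sampled white Gaussian noise with standard deviation sigma in N points, the standard result is that the mean level of the amplitude spectrum is sigma * sqrt(pi / N), for a spectrum normalized so that a sinusoid of amplitude A peaks at A. A quick Monte Carlo check in Python (my sketch, not the post’s derivation):

        import numpy as np

        rng = np.random.default_rng(42)
        N, sigma, ntrials = 1024, 2.0, 500

        means = []
        for _ in range(ntrials):
            x = rng.normal(0.0, sigma, N)
            # normalized so a sinusoid of amplitude A peaks at A
            amp = (2.0 / N) * np.abs(np.fft.rfft(x))
            means.append(amp[1:-1].mean())  # skip the DC and Nyquist bins

        print(np.mean(means))              # measured mean noise level
        print(sigma * np.sqrt(np.pi / N))  # predicted: ~0.1108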

  • High-fidelity plot image representations for machine learning classification

    As computers become faster and data sets grow larger, machine learning is playing an ever-increasing role in astronomy research. One application that I find very compelling is in the field of stellar pulsations, where Marc Hon has led advancements in “Detecting Solar-like Oscillations in Red Giants with Deep Learning” based on image representations of the data; specifically, their classifier can determine whether (and where) solar-like oscillations are present in a power spectrum plot. Here I detail one contribution I’ve made to this work that might be helpful for other image-based classification efforts: I present code that quickly turns a line plot into a 2D numpy array representation while aiming to preserve the fidelity of the original data (a minimal sketch of the idea follows below). This aspect of machine learning is called “feature engineering,” and it is concerned with providing the most informative data for a classifier to base its decisions on. This is the area where domain knowledge really gives the astronomer an advantage over the generic data scientist.

    Read More »
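
    The core idea can be sketched as follows (my reconstruction of the approach; the post’s actual implementation may differ): map each sample to a pixel column and fill the span between consecutive samples, so that rapid excursions are not lost when many samples share a column.

        import numpy as np

        def rasterize(y, width=128, height=64):
            """Render a 1D series as a binary line-plot image (row 0 = minimum y)."""
            y = np.asarray(y, dtype=float)
            cols = np.linspace(0, width - 1, y.size).astype(int)
            lo, hi = y.min(), y.max()
            rows = ((y - lo) / (hi - lo + 1e-12) * (height - 1)).astype(int)
            img = np.zeros((height, width), dtype=np.uint8)
            for i in range(y.size - 1):
                c0, c1 = sorted((cols[i], cols[i + 1]))
                r0, r1 = sorted((rows[i], rows[i + 1]))
                img[r0:r1 + 1, c0:c1 + 1] = 1  # connect consecutive samples
            return img

        # e.g., a noisy sine curve becomes a 64x128 array for a classifier
        rng = np.random.default_rng(0)
        img = rasterize(np.sin(np.linspace(0, 20, 2000)) + 0.1 * rng.normal(size=2000))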

  • Fitting features in a power spectrum with least squares and MLE

    I have come to confess my sin. I have used least squares to fit features in the power spectra of variable stars. In particular, I believe I was the first to use least squares to fit Lorentzians to the signals from incoherent pulsation modes in the power spectrum of a pulsating white dwarf star observed by the Kepler spacecraft. This has rightfully alarmed some colleagues, since least squares intrinsically assumes that measurement noise is Gaussian (normally) distributed, which is emphatically not true of noise in a power spectrum. I explore here what effect this might have on the best-fit parameters returned by this technically invalid method. I also demonstrate how to treat the statistics more appropriately with maximum likelihood estimation (MLE); a minimal sketch follows below.

    Read More »
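
    A minimal Python sketch of the MLE approach: for chi-squared, 2-degrees-of-freedom (exponential) noise, the probability of observing power P given model M is (1/M) exp(-P/M), so one minimizes sum(ln M + P/M) instead of squared residuals. The model, parameter values, and simulated data below are my own illustration:

        import numpy as np
        from scipy.optimize import minimize

        def lorentzian(f, height, f0, hwhm, bg):
            return height / (1.0 + ((f - f0) / hwhm) ** 2) + bg

        def neg_log_like(theta, f, power):
            # negative log likelihood for exponentially distributed power
            model = lorentzian(f, *theta)
            if np.any(model <= 0):
                return np.inf
            return np.sum(np.log(model) + power / model)

        # simulate: exponential noise multiplying a Lorentzian limit spectrum
        rng = np.random.default_rng(1)
        f = np.linspace(0.0, 10.0, 2000)
        truth = (5.0, 4.0, 0.3, 1.0)
        power = lorentzian(f, *truth) * rng.exponential(1.0, f.size)

        fit = minimize(neg_log_like, x0=(3.0, 3.8, 0.5, 0.8),
                       args=(f, power), method="Nelder-Mead")
        print(fit.x)  # compare with truth: (5, 4, 0.3, 1)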