Basic Statistics (Part 4)#

# Let's start with importing our packages
import numpy as np
import scipy
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

# We can beautify our plots by changing the matplotlib settings a little
plt.rcParams['font.size'] = 18
matplotlib.rcParams['axes.linewidth'] = 2
matplotlib.rcParams['font.family'] = "serif"

Fitting astronomical data#

Earlier, we looked at data from the Gaia space telescope. Gaia carefully observes the positions and properties of stars in our galaxy. The code below loads in the Gaia data for all stars within 15 parsec of the Sun.

# Let's load in the data
import os
from google.colab import drive
drive.mount('/content/drive/')
os.chdir('/content/drive/Shareddrives/AST207/data')

# path to the table
path = 'gaia_15pc.csv'

# loading the table; this function assumes a 'comma separated file (aka csv)'
gaia = pd.read_csv(path)

# simple errors
gaia.loc[:,'color']     = gaia.gmag - gaia.rmag
gaia.loc[:,'color_err'] = 0.04
gaia.loc[:,'teff_err']  = 10. # K

Exercise 1

In a previous activity, we searched for which properties of a star are correlated (like how football players heights and weights are correlated). Let’s revisit that activity here. Below, plot teff versus color for the Gaia data (teff is the temperature of the star and color is the difference between the stars’ g and r magnitudes). Be sure to label your plot.

# Your plot here

Cool! We have a clear observational result: color and temperature of a star are strongly correlated.

However, we still don’t know how color and temperature are related. Is the relation linear, quadratic, or perhaps much more complex? Answering this question will greatly increase our understanding of the stars in our catalog (it will also help our theorist friends try to explain what’s happening here).

Exercise 2

Like we did for the football data, fit the teff and color data with a linear function. Use the “grid search” method we used for the football data. Try fitting the data by hand first to choose appropriate ranges of m,b to search.

Use your objective_function from basic_stats_3.ipynb.

Print out the best fit m,b and plot the best-fit line against the observed data.

Hint: the slope (m) will be small. Try m = -1e-4 to start.

# Your answer

We just fit a model to astronomical data! This is a critical tool in modern astronomy. But so far, we can only find the best solution by a time consuming “grid search.” Let’s find a better way.

Many algorithms have been developed to find the parameters that minimize a function (our grid search is one example). Let’s use scipy.optimize.minimize.

Exercise 3

Run the code below to use scipy.optimize.minimize. Then plot the best fit linear function from scipy.optimize.minimize alongside the observed data and your “grid search” fit.

def general_objective_function(theta, fn, x, y):

  y_model = fn(x,*theta)

  return np.sum(np.power(y-y_model,2) / np.abs(y) )

out = scipy.optimize.minimize(general_objective_function, [-1e-4, 2], bounds = [[-1e-3,0], [0,10]], args = (linear, gaia.teff, gaia.color) )
m_fit, b_fit = out.x
# Your plot here

We can now perform a linear fit and even can do it quicky using scipy. However, the above plots show that a linear fit is not a great match to the data. So what is?

Fortunately, general_objective_function (which we provided for the activity above) allows us to change the function we are fitting…

Exercise 4

Let’s try a model where color is inversely proportional to temperature: $\( \mathrm{color} = m / T + b \)$ Write a function def inverse(x, m, b) that returns the inverse function. Then use scipy to find the best fit parameters. Plot the best fit inverse function alongside the observed data and the best fit linear function.

def inverse(x, m, b):
  # your code

# use scipy to find the best fit parameters
# Your plot