Experimental Data Analysis Lab

PHYS 391 - Fall 2020
Lab 1 - Python Intro

Updated Sunday September 27, 2020

Lab Goals

The goals of this lab are to find a working copy of python which you can regularly use and start to gain some familiarity with this program, including reading in data files and producing plots. Figuring out where to save your work and send your completed files is as much a part of this assignment as learning the python syntax. We also want to start building our skills with the command line.

Lab Manifest

Sources for python

See the Jupyter Notebooks page for more information about installing Jupyter on your own machine. More general information about python and links to some useful tutorials can be found on the Python Information page.

Lab Instructions

This first lab is significantly less formal than the labs that follow. I just want you to figure out a few things about navigating the command line, get a Jupyter Notebook running, and start learning how to use it. The capstone of this lab is to read in data from a text file and make a few plots. We will expand on these basic skills in the remaining labs in this course. You will need to work out the answers to the command-line questions for the first part, and turn in a completed Jupyter notebook answering the questions posed there. No other lab report will be required. You may work in small groups to finish this lab, but everyone needs to turn in their own notebook. Please make sure you are learning this material yourself. If you don't start learning some of these basics, you will struggle all quarter long.

Lab Rubric

I would encourage you to try to get through the command line questions and most of the python basics in your first week session. If this seems to be taking you a lot of time, I would recommend that you try to find some help.

A note on Markdown

You will be turning in all of your lab work in Jupyter notebooks. Jupyter allows you to write python code that you can execute, but also provides text entry through Markdown cells. A brief primer on useful Markdown commands can be found here or there are many other tutorials on the web. In addition, Jupyter allows you to enter LaTeX syntax which is very useful for writing equations. Some pointers on using LaTeX can be found here. A bit more information on Jupyter notebooks is available at the Jupyter Notebook page.

Navigating the Command Line

Unix-based systems are the foundation of scientific computing, and the command line (also known as a shell) is often needed to do real data analysis on a computer. Having a rudimentary working knowledge of how to use these tools is important. If you have never touched the command line before, I would recommend this tutorial. If you are a little rusty, this cheat sheet will probably be helpful.

Answer the following questions by providing the command you would enter to achieve each action. Your answers need to be entered into your Jupyter notebook in a Markdown text cell. Please note that your notebook does not need to execute these commands; you just need to answer the questions. Finding answers to these questions on the web is encouraged, as this is one of the most common ways to figure out how to do this kind of thing. If you have a PC, you will need to log into a Unix system to finish this part of the lab. The machine shell.uoregon.edu is one possibility that you can access remotely. From a PC you will need a terminal emulator such as PuTTY, or with Windows 10 you can use the built-in OpenSSH client (although you may need to enable it first).

  1. How do you find the absolute path of your present directory?
  2. How do you list the contents of a directory with the newest files listed first?
  3. Suppose you have 500 files in a directory. How would you list these files such that you can view one screen at a time?
  4. How do you copy a file myfile.txt to the directory above your current directory (i.e.: the parent directory)?
  5. How do you print the contents of a file myfile.txt to the screen (standard output)?
  6. How do you create a sub-directory mydirectory in your present directory?
  7. How do you change the permissions on a file so that you can execute it (this can be important for scripts...)?
  8. How do you print the first 5 lines of myfile.txt to the screen (standard output)?
  9. How do you print the first 30 lines of myfile.txt to another new file myotherfile.txt?
  10. Name one method for creating and/or editing a file myfile.txt. (There are many correct answers for this.) Make sure to try this!

Python Basics

Download the file Lab1Template.ipynb to the computer you are working on and rename it to include your name, keeping the same extension. In other words, something like Lab1_FirstnameLastname.ipynb. Put this in a location where the Jupyter Notebook file browser can find it and open the file within Jupyter Notebook. If your web browser insists on saving this with a .txt extension, just rename the file and remove this extra extension. Go through the first set of questions and fill in each code cell to demonstrate that you can answer the question. Be sure to describe where you are doing this work. If you have never used python before, read the Python page first, and it is probably worth working through at least one python tutorial.

Vectors and Arrays

One of the nice features of python is the ability to extend the basic language to add new functionality. One useful add-on for scientific analysis is the numpy package, which makes it considerably easier to do operations with vectors and arrays. Continue working through the questions involving vector and array manipulation with numpy. The numpy tutorial is probably helpful for this part.
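As a rough illustration of what numpy adds, the sketch below builds a couple of small arrays and runs a few element-wise operations on them. The variable names and values are made up for this example and are not taken from the lab template.

```python
import numpy as np

# Build a vector from a Python list and another from a range of integers.
a = np.array([1.0, 2.0, 3.0])
b = np.arange(3)                 # values 0, 1, 2

# Arithmetic acts element by element, with no explicit loop.
total = a + b                    # 1+0, 2+1, 3+2
scaled = 2.0 * a                 # every element doubled

# A two-dimensional array (2 rows, 3 columns) and some common operations.
m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m.shape)                   # number of rows and columns
print(m[:, 0])                   # first column
print(m.sum(axis=0))             # sum down each column
print(np.dot(a, a))              # dot product of a with itself
```

Slicing with `m[:, 0]` and reducing with `axis=0` are the two idioms that tend to come up most often when working with columns of data.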

Making Plots

Finally, we will explore the matplotlib package for making a few simple plots.
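A minimal sketch of a matplotlib line plot is shown here, assuming the common `fig, ax = plt.subplots()` style. The function being plotted and the label text are arbitrary choices for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample a sine curve at 100 evenly spaced points between 0 and 2*pi.
x = np.linspace(0.0, 2.0 * np.pi, 100)
y = np.sin(x)

# Create a figure with a single set of axes and draw the curve on it.
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")

# Label the axes and add a legend so the plot is readable on its own.
ax.set_xlabel("x (radians)")
ax.set_ylabel("sin(x)")
ax.set_title("A first matplotlib plot")
ax.legend()
```

In a Jupyter notebook the figure is normally drawn below the cell once the cell finishes running. In a standalone script you would typically add `plt.show()` at the end.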

Putting it all together

The final part of the lab is to read in and display some data. To make this a bit more interesting, we will use data on a selection of over 3,000 galaxies within 300 megaparsecs of Earth (galaxydata.txt) provided by Dr. Elsa Johnson. If you open this file and look at the contents, you will see that it has text and numbers separated by commas. This is called a CSV (Comma-Separated Values) file, and is a common format for input and output of data made popular by spreadsheet programs. We are going to load this data into python using pandas and do some exploratory data analysis on it. Don't worry too much about not understanding precisely what these various values mean.

Each column of this file is labelled in the header row with a terse descriptive text identifier. Briefly, the columns contain the following information.

First, read in the values from the text file. There are a number of ways to do this, and I have given you an example of doing this using pandas. Make sure you have read all the data correctly by comparing a few of the values read in to the original text file.
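The read-and-check step might look something like the sketch below. To keep it self-contained it reads a tiny in-memory stand-in rather than the real file. The rows are made up, and the column names (`name`, `RA`, `Dec`, `v`) are assumptions based on the quantities mentioned in this lab, so check them against the actual header row.

```python
import io
import pandas as pd

# Made-up stand-in for galaxydata.txt. These values are NOT from the real
# file, and the column names are assumptions for illustration.
sample_text = """name,RA,Dec,v
GalaxyA,1.5,-10.0,1200
GalaxyB,13.25,45.5,3400
GalaxyC,23.9,5.0,560
"""

# In the lab you would pass the file name instead:
#     data = pd.read_csv("galaxydata.txt")
data = pd.read_csv(io.StringIO(sample_text))

# Quick checks that the file was parsed as expected.
print(data.shape)              # (number of rows, number of columns)
print(data.columns.tolist())   # column names taken from the header row
print(data.head())             # first few rows, to compare with the file

# Spot-check one value against the same row and column in the text file.
print(data.loc[1, "RA"])
```

Comparing `data.shape` with the number of lines in the text file is a quick way to catch a header row that was read as data, or rows that were silently dropped.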

Plot Dec vs. RA positions as a scatter plot to produce a 'sky map'. Be sure to convert RA to degrees from hours. RA values should fall between 0 and 360 degrees. Make sure to label the axes of your plot appropriately.
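The hours-to-degrees conversion follows from 24 hours of RA spanning a full 360 degrees, so one hour is 15 degrees. The sketch below applies that to a handful of made-up positions. In the lab these arrays would come from the RA and Dec columns of the data you read in.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up positions for illustration only (RA in hours, Dec in degrees).
ra_hours = np.array([0.0, 6.0, 12.5, 23.9])
dec_deg = np.array([-30.0, 10.0, 45.0, 80.0])

# 24 hours of RA correspond to 360 degrees, so 1 hour = 15 degrees.
ra_deg = ra_hours * 15.0

# Scatter plot of the positions with labelled axes.
fig, ax = plt.subplots()
ax.scatter(ra_deg, dec_deg, s=5)
ax.set_xlabel("RA (degrees)")
ax.set_ylabel("Dec (degrees)")
ax.set_xlim(0, 360)
```

Checking `ra_deg.min()` and `ra_deg.max()` after the conversion is a cheap way to confirm the values really fall between 0 and 360.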

Plot the (ut-bt) vs. (vt-kt) color. These are ratios of flux, but since magnitudes are on a log scale you simply subtract them. In astronomy, this is called a color-color plot. Small values of both colors mean the galaxy contains a lot of hot, bright, young, massive stars and has lots of ongoing star formation. Galaxies in the opposite corner contain mostly older stars, or have a large fraction of metal content (metallicity).
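To see why subtracting magnitudes gives a flux ratio, recall the standard definition of a magnitude, m = -2.5 log10(F / F0). The short numerical check below uses made-up fluxes and an arbitrary reference flux F0, and shows that the difference of two magnitudes depends only on the ratio of the fluxes.

```python
import numpy as np

# Hypothetical fluxes in arbitrary units (not taken from the galaxy data).
flux_1 = 400.0
flux_2 = 100.0
zero_point = 1.0   # arbitrary reference flux F0; it cancels in the difference

# Standard magnitude definition: m = -2.5 * log10(F / F0).
mag_1 = -2.5 * np.log10(flux_1 / zero_point)
mag_2 = -2.5 * np.log10(flux_2 / zero_point)

# A "color" is a difference of magnitudes ...
color = mag_1 - mag_2

# ... which equals -2.5 * log10 of the flux ratio.
from_ratio = -2.5 * np.log10(flux_1 / flux_2)

print(color, from_ratio)
```

Note the sign convention: the brighter source has the smaller magnitude, so a color such as (ut-bt) is small when the first band is bright relative to the second.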

Plot the (bt-kt) color against the surface brightness (brief) for Sc galaxies and E galaxies on the same plot using different color points. Describe briefly what you observe. It is fine if you include variants of each (e.g. Scd or E?) as the difference in population should be clear.
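One possible way to pick out the two populations, variants included, is to match on the start of the morphological type string. The sketch below uses a made-up table, and the column names `type`, `color`, and `sb` are hypothetical placeholders, so substitute the names from the real header row.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up table for illustration. Column names and values are hypothetical.
data = pd.DataFrame({
    "type":  ["Sc", "Scd", "E", "E?", "Sb", "S0"],
    "color": [3.1, 2.9, 4.2, 4.0, 3.5, 3.9],       # stands in for (bt-kt)
    "sb":    [23.5, 23.8, 21.0, 21.4, 22.5, 21.9], # stands in for surface brightness
})

# Boolean masks: True where the type string starts with "Sc" or with "E".
# na=False treats missing type entries as non-matches.
is_sc = data["type"].str.startswith("Sc", na=False)
is_e = data["type"].str.startswith("E", na=False)

# Plot both selections on the same axes in different colors.
fig, ax = plt.subplots()
ax.scatter(data.loc[is_sc, "sb"], data.loc[is_sc, "color"],
           color="tab:blue", label="Sc")
ax.scatter(data.loc[is_e, "sb"], data.loc[is_e, "color"],
           color="tab:red", label="E")
ax.set_xlabel("surface brightness")
ax.set_ylabel("bt - kt")
ax.legend()
```

Matching on the prefix keeps variants such as Scd and E? in the selection while leaving out types like Sb and S0.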

Finally, make a histogram of redshift (v) for Sc galaxies and E galaxies. These can either be on the same or a different plot. Choose at least 15 bins for your histogram. Describe any difference you see in the shapes of the distributions.
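A sketch of overlaying two histograms with a shared set of bins follows. The two samples are random numbers generated purely for illustration. In the lab they would be the v column for the Sc and E selections.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up samples standing in for the v values of the two populations.
rng = np.random.default_rng(0)
v_sc = rng.normal(5000.0, 1500.0, 500)
v_e = rng.normal(6000.0, 2000.0, 300)

# Use the same bin edges for both samples so the shapes are comparable.
# 21 edges spanning the full range of the data give 20 bins.
lo = min(v_sc.min(), v_e.min())
hi = max(v_sc.max(), v_e.max())
bins = np.linspace(lo, hi, 21)

# Semi-transparent bars let both distributions show through.
fig, ax = plt.subplots()
counts_sc, _, _ = ax.hist(v_sc, bins=bins, alpha=0.5, label="Sc")
counts_e, _, _ = ax.hist(v_e, bins=bins, alpha=0.5, label="E")
ax.set_xlabel("v")
ax.set_ylabel("number of galaxies")
ax.legend()
```

If the two samples have very different sizes, passing `density=True` to `ax.hist` normalizes each histogram, which can make differences in shape easier to see.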

Finishing Up

You will be turning in your Jupyter notebook, and I will run all of the cells one at a time to check your output. Please make sure this works before turning in your notebook! When you are done, save and close your notebook, and then re-open your notebook in a new browser window and select Run All from under the Cells menu. This will run each cell in the order they are entered in the file (which may not be exactly how you ran them when you were developing your answers). Check for any errors and fix them before turning this in. This is how your notebook will be graded, so please make sure it works! Also, please make sure your notebook assumes that the galaxydata.txt file is in the same directory as your ipynb file.
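One way to keep the notebook portable is to refer to the data file by a relative path rather than a full path on your own machine. The check below is only a suggestion and is not part of the lab template.

```python
from pathlib import Path

# A relative path is resolved against the directory the notebook runs in,
# so it keeps working when the notebook and data file are moved together.
data_path = Path("galaxydata.txt")

if data_path.exists():
    print(f"Found {data_path} in {Path.cwd()}")
else:
    print(f"{data_path} not found in {Path.cwd()}; "
          "place it in the same directory as the notebook")
```

Running this as the first cell after a Run All makes a missing or misplaced data file obvious before any later cell fails.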

Closing Remarks

This dataset was meant to be somewhat abstract. I don't expect you to have any knowledge about what the various quantities mean, but we can already see how exploring data graphically can tell us things and illustrate potential problems and biases. One major challenge of doing a proper analysis is to try to anticipate problems, look for evidence of those problems in your data, and take measures to either mitigate those problems, or estimate uncertainties on your measurements as a result. We will start discussing these topics more in Lab 2.