Introduction to biological data analysis in python
Stilianos Louca (2022), self-published.

This book introduces Python programming as a tool for biological data analysis. The following is a non-exhaustive list of covered topics:
  • Variables, in particular strings and floats
  • String formatting
  • If statements, for loops
  • Custom functions
  • Lists, sets, dictionaries, list comprehension
  • Reading and writing files (including FASTA and CSV/TSV files)
  • Batch-processing multiple files
  • NumPy arrays
  • Plotting with pyplot (scatterplots, curves, bar plots, histograms, heatmaps, box plots, maps)
  • Boolean arrays and boolean indexing
  • Pandas dataframes
  • Randomness, hypothesis testing
  • Basic time series analysis
  • Regular expressions
The book targets undergraduate and graduate students in biology with a strong interest in computational methods, but could also be of interest to students in other sciences. Numerous examples and exercises are included throughout the book, many based on realistic datasets from the scientific literature (exercise solutions are not made public since the book may be used for university courses). Examples and exercises cover a broad range of topics, including neuroscience, cell biology, genetics, ecology, microbiology, physiology, epidemiology and conservation. References to the scientific literature are provided throughout for the interested reader. This book is suitable as reading material in related university courses as well as for self-teaching.

Where to get the book:
The book is available on the Google Play Bookstore (GGKEY:UTJQTJSHTD3). If you are an educator wishing to use this book in your course and need the solutions to the exercises, please email the author with your request.

Jupyter notebooks:

Supporting datasets (for examples and exercises):

Primate body masses by sex and species.

Boolean indexing of arrays.

Basal metabolic rates vs. body masses for different species.

Time series smoothing via moving average.

Detecting peaks in beaver population time series.

Night-time sea surface temperature across the world's oceans.

Migratory route of a peregrine falcon.

Control diagram of program flow.

A Python function is like a useful black box.

Histogram of gull egg lengths by year.

Row names for NumPy arrays.

Relative abundances of microbial metabolic functions in different environmental samples.

Genome size differences between animal-associated and non-animal-associated Enterobacterales species.

Illustration of d-dimensional array indexing.

Simplified random route of a grazing animal.

Loading a subset of columns from a file.

Boxplot of bird wing lengths, separated by environment.

Boolean array operations.

Local tetranucleotide frequency variances and GC contents across the Mycoplasma pneumoniae genome.

Benthic faunal density measurements across the world's oceans.

Louca lab. Department of Biology, University of Oregon, Eugene, USA
© 2025 Stilianos Louca. All rights reserved.