Introduction to biological data analysis in python
Stilianos Louca (2022), self-published.

This book introduces Python programming as a tool for biological data analysis. Topics covered include:
  • Variables, in particular strings and floats
  • String formatting
  • If statements, for loops
  • Custom functions
  • Lists, sets, list comprehension
  • Reading and writing files (including fasta and CSV/TSV files)
  • Batch-processing multiple files
  • NumPy arrays
  • Plotting with pyplot (scatterplots, curve plots, bar plots, histograms, heatmaps, box plots)
  • Boolean arrays and boolean indexing
  • Pandas dataframes
  • Randomness, hypothesis testing
  • Basic time series analysis
  • Regular expressions
The book targets undergraduate and graduate students in biology with a strong interest in computational methods, but could also be of interest to students in other sciences. Examples cover a broad range of topics, including neuroscience, cell biology, genetics, ecology, microbiology, physiology, epidemiology and conservation. This book may be freely used as reading material in related university courses or for self-teaching. The book is still under heavy development.

Numerous programming exercises are included throughout the book. Example solutions are not made public, since the book may be used in university courses. References to literature covering additional material are also provided for the interested reader.

Prerequisites: Familiarity with high school biology and mathematics is recommended. Readers also need access to a recent version of Python, for example on their personal computer.

Free book download:
Book_BioDataAnalysisPython.pdf

Supporting datasets (for examples and exercises):
albatross_foraging_routes_Dodge2013.htsv
amphibian_upper_temperature_limits_Pottier2022.htsv
animal_movement_2D.tsv
animal_species_names.txt.gz
animal_species_names_1000.txt
AVONET_bird_traits.htsv
Bacillus_subtilis_genomes.zip
bird_nesting_sites.htsv
bromeliad_OTU_overlaps_Louca2016_nonames.tsv
captured_lobsters_Atlantic_Koepper2022.htsv
cetacean_abundances_Branch2001.htsv
cetacean_species_names.txt
coastline_coordinates_50m.tsv.gz
Copepod_population_density_GPDD.htsv
deep_ocean_temperature_Scotese2021.tsv
EEG_IRF_Kanda1996.htsv
EEG_S03EC_resting_state_Azar2020.tsv.gz
Enterobacterales_genome_sizes.htsv
Escherichia_coli_eggNOG_table_Louca2022.htsv.gz
Escherichia_coli_genes_KOfam_GCF_008124005.1.txt
Escherichia_coli_genes_KOfam_GCF_015831325.1.txt
Escherichia_coli_ZZb4_GCF_013403045.1.fasta
Eugene_airport_hourly_temperatures_NOAA_2022.01.01.txt
falcon_migration_route_Burnham2012.htsv
falcon_migration_route_Burnham2012.tsv
FAPROTAX_function_table.htsv
fin_whale_route_Silva2013.htsv
forest_patch_properties.tsv
gene_expression_data_Fujita2007.tsv
gene_expression_Hughes2009.tsv
gull_egg_and_chick_properties_Yurlov2022.htsv
gull_egg_lengths_Yurlov2022.txt
HapMap_human_relatedness_matrix_Gross2017.tsv
HIV1_genome_NC_001802.1.txt
HOTS_monthly_pp.csv
HOTS_monthly_pp.htsv
HOTS_monthly_pp.tsv
Hudson_Bay_fur_bearing_animals.tsv
Krebs2011_hare.htsv
Krebs2011_lynx.htsv
mass_metabolic_rate_Hatton2019.htsv
mass_metabolic_rate_Hatton2019_mammals.htsv
mass_vs_growth_rate_Hatton2019.htsv
mass_vs_growth_rate_Hatton2019_mammals.htsv
mouse_microbiomes_OTU_table_Marino2014.htsv
mouse_microbiomes_OTU_taxonomies_Marino2014.htsv
multiple_albatross_routes_Dodge2013.zip
Mycobacterium_lepraemurium_Hawaii_genome_CP021238.1.txt.gz
primate_body_masses.tsv
prokaryote_genome_sizes.htsv
red_grouse_shot_annually_Potts1984.htsv
RefSeq_prokaryotic_genome_properties.htsv
sei_whale_route_Silva2013.htsv
Staphylococcus_aureus_A69_GCF_000698045.1.fasta
Strep_mutans_AND_vs_eggNOG_differences.htsv
subterranean_spider_traits_Mammola2022.tsv
WHO_COVID19_US_daily_counts.htsv
WHO_COVID19_US_weekly_cumulative_counts.htsv
WWTP_AOB_abundances_Ofiteru2010.htsv

Primate body masses by sex and species.

Boolean indexing of arrays.

Basal metabolic rates vs. body masses for different species.

Time series smoothing via moving average.

Detecting peaks in beaver population time series.

Migratory route of a peregrine falcon.

Flow diagram of program flow.

A python function is like a useful black box.

Histogram of gull egg lengths by year.

Row names for NumPy arrays.

Relative abundances of microbial metabolic functions in different environmental samples.

Genome size differences between animal-associated and non-animal-associated Enterobacterales species.

Flow chart illustrating program flow.

Simplified random route of a grazing animal.

Loading a subset of columns from a file.

Boxplot of bird wing lengths, separated by environment.

Louca lab. Department of Biology, University of Oregon, Eugene, USA
© 2022 Stilianos Louca. All rights reserved.