Introduction to biological data analysis in python
Stilianos Louca (2022), self-published.

This book introduces Python programming as a tool for biological data analysis. The following is a non-exhaustive list of covered topics:
  • Variables, in particular strings and floats
  • String formatting
  • If statements, for loops
  • Custom functions
  • Lists, sets, dictionaries, list comprehension
  • Reading and writing files (including fasta and CSV/TSV files)
  • Batch-processing multiple files
  • NumPy arrays
  • Plotting with pyplot (scatterplots, curves, bar plots, histograms, heatmaps, box plots, maps)
  • Boolean arrays and boolean indexing
  • Pandas dataframes
  • Randomness, hypothesis testing
  • Basic time series analysis
  • Regular expressions
The book targets undergraduate and graduate students in biology with a strong interest in computational methods, but could also be of interest to students in other sciences. Numerous examples and exercises are included throughout the book, many based on realistic datasets from the scientific literature (exercise solutions are not made public since the book may be used for university courses). Examples and exercises cover a broad range of topics, including neuroscience, cell biology, genetics, ecology, microbiology, physiology, epidemiology and conservation. References to the scientific literature are provided throughout for the interested reader. This book is suitable as reading material in related university courses as well as for self-teaching.
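
To give a flavor of the topics listed above, the following minimal sketch combines NumPy arrays, boolean indexing and a moving average. It is not taken from the book: the counts are made-up numbers and all variable names are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical yearly population counts (made-up values, not one of the book's datasets)
counts = np.array([12.0, 15.0, 30.0, 22.0, 18.0, 40.0, 35.0, 20.0])

# Boolean indexing: build a True/False mask, then use it to select entries
above_mean = counts > counts.mean()
high_counts = counts[above_mean]

# Moving average over a sliding window of 3 points;
# mode="valid" keeps only positions where the window fits entirely
window = 3
smoothed = np.convolve(counts, np.ones(window) / window, mode="valid")

print(high_counts)
print(smoothed)
```

With `mode="valid"` the smoothed series is shorter than the input by `window - 1` points, which avoids edge artifacts at the cost of losing the first and last values.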

Where to get the book:
The book is available on the Google Play Bookstore (GGKEY:UTJQTJSHTD3). If you are an educator wishing to use this book in your course and need the solutions to the exercises, please email the author with your request.

Supporting datasets (for examples and exercises):
albatross_foraging_routes_Dodge2013.htsv
amphibian_upper_temperature_limits_Pottier2022.htsv
animal_movement_2D.tsv
animal_species_names.txt.gz
animal_species_names_1000.txt
animal_species_names_200000.txt.gz
animal_traits_Herberstein2022.htsv
Australian_ocean_zooplankton_biomass_McEnnulty2020.htsv.gz
AVONET_bird_traits.htsv
AVONET_bird_traits_with_nans.htsv
Bacillus_subtilis_genomes.zip
BenBioDen_benthic_biomasses_Stratmann2020.htsv.gz
BenBioDen_benthic_densities_Stratmann2020.htsv.gz
bird_nesting_sites.htsv
bromeliad_OTU_overlaps_Louca2016_nonames.tsv
captured_lobsters_Atlantic_Koepper2022.htsv
cetacean_abundances_Branch2001.htsv
cetacean_species_names.txt
coastline_coordinates_50m.tsv.gz
codon_table.csv
Copepod_population_density_GPDD.htsv
deep_ocean_temperature_Scotese2021.tsv
EEG_IRF_Kanda1996.htsv
EEG_S03EC_resting_state_Azar2020.tsv.gz
Enterobacterales_genome_sizes.htsv
Escherichia_coli_eggNOG_table_Louca2022.htsv.gz
Escherichia_coli_genes_KOfam_GCF_008124005.1.txt
Escherichia_coli_genes_KOfam_GCF_015831325.1.txt
Escherichia_coli_ZZb4_GCF_013403045.1.fasta
Eugene_airport_hourly_temperatures_NOAA_2022.01.01.txt
falcon_migration_route_Burnham2012.htsv
falcon_migration_route_Burnham2012.tsv
FAPROTAX_function_table.htsv
fin_whale_route_Silva2013.htsv
forest_patch_properties.tsv
gene_expression_data_Fujita2007.tsv
gene_expression_Hughes2009.htsv.gz
global_chlA_Martinez2020_2010.06.15_scatter.htsv.gz
global_chlA_Martinez2020_multiple_months.zip
gull_egg_and_chick_properties_Yurlov2022.htsv
gull_egg_lengths_Yurlov2022.txt
Haemophilus_influenzae_GCF_020731605.1.fasta.gz
HapMap_human_relatedness_matrix_Gross2017.tsv
HIV1_genome_NC_001802.1.txt
HOTS_monthly_pp.csv
HOTS_monthly_pp.htsv
HOTS_monthly_pp.tsv
Hudson_Bay_fur_bearing_animals.tsv
Krebs2011_hare.htsv
Krebs2011_lynx.htsv
MAG_subset_AAIs_ANIs_GCDs.htsv.gz
MAG_subset_taxonomies.htsv
mass_metabolic_rate_Hatton2019.htsv
mass_metabolic_rate_Hatton2019_mammals.htsv
mass_vs_growth_rate_Hatton2019.htsv
mass_vs_growth_rate_Hatton2019_mammals.htsv
mouse_microbiomes_OTU_table_Marino2014.htsv
mouse_microbiomes_OTU_taxonomies_Marino2014.htsv
multiple_albatross_routes_Dodge2013.zip
museum_inventory.txt
Mycobacterium_lepraemurium_Hawaii_genome_CP021238.1.txt.gz
Mycoplasma_pneumoniae_GCF_001455795.1.fasta.gz
primate_body_masses.tsv
prokaryote_genome_sizes.htsv
red_grouse_shot_annually_Potts1984.htsv
RefSeq_prokaryotic_genome_properties.htsv
sei_whale_route_Silva2013.htsv
sequences.fasta
SST_NOAA_2023.01.01_grid.htsv.gz
SST_NOAA_2023.01.01_scatter.htsv.gz
Staphylococcus_aureus_A69_GCF_000698045.1.fasta
Strep_mutans_AND_vs_eggNOG_differences.htsv
subterranean_spider_traits_Mammola2022.tsv
WHO_COVID19_US_daily_counts.htsv
WHO_COVID19_US_weekly_cumulative_counts.htsv
WWTP_AOB_abundances_Ofiteru2010.htsv

Primate body masses by sex and species.

Boolean indexing of arrays.

Basal metabolic rates vs. body masses for different species.

Time series smoothing via moving average.

Detecting peaks in beaver population time series.

Night-time sea surface temperature across the world's oceans.

Migratory route of a peregrine falcon.

Control diagram of program flow.

A Python function is like a useful black box.

Histogram of gull egg lengths by year.

Row names for numpy arrays.

Relative abundances of microbial metabolic functions in different environmental samples.

Genome size differences between animal-associated and non-animal-associated Enterobacterales species.

Illustration of d-dimensional array indexing.

Simplified random route of a grazing animal.

Loading a subset of columns from a file.

Boxplot of bird wing lengths, separated by environment.

Boolean array operations.

Local tetranucleotide frequency variances and GC contents across the Mycoplasma pneumoniae genome.

Benthic faunal density measurements across the world's oceans.

Louca lab. Department of Biology, University of Oregon, Eugene, USA
© 2024 Stilianos Louca. All rights reserved.