Projects #

Some of my projects, more or less related to my work, which I hope might interest you.

FileFinder #

Github PyPi

Find files based on the structure of their filenames, using a simple yet powerful syntax. Useful for working with databases made up of many different files. For example, files with different dates and depths:

from filefinder import Finder

finder = Finder('/root_folder/', '%(Y)/SST_%(Y)%(m)%(d)_%(depth:fmt=.1f).nc')
files = finder.get_files()
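To give an idea of what matching on a filename structure involves, here is a minimal sketch using only the standard library. It is not FileFinder's actual implementation: the `GROUP_REGEX` table and the `pattern_to_regex` helper are hypothetical names made up for this illustration, and the real package supports far more than these four groups.

```python
import re

# Hypothetical mapping from group names to regular expressions.
# FileFinder defines its own, richer set; this is only an illustration.
GROUP_REGEX = {
    "Y": r"\d{4}",         # four-digit year
    "m": r"\d{2}",         # two-digit month
    "d": r"\d{2}",         # two-digit day
    "depth": r"\d+\.\d",   # number with one decimal, e.g. 10.0
}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a '%(name)' or '%(name:options)' pattern into a compiled regex."""
    parts = []
    pos = 0
    for match in re.finditer(r"%\((\w+)(?::[^)]*)?\)", pattern):
        # Escape the literal text found before this group.
        parts.append(re.escape(pattern[pos:match.start()]))
        # Replace the group by a capturing sub-expression.
        parts.append(f"({GROUP_REGEX[match.group(1)]})")
        pos = match.end()
    # Escape whatever literal text remains after the last group.
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


regex = pattern_to_regex("%(Y)/SST_%(Y)%(m)%(d)_%(depth:fmt=.1f).nc")
match = regex.fullmatch("2007/SST_20070115_10.0.nc")
values = match.groups() if match else None
```

Here `values` holds one string per group in the pattern (the year appears twice because `%(Y)` is used twice), and a filename that does not follow the structure yields no match at all.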

Tol-colors #

Github PyPi

A package that supplies color schemes for lines and maps, all color-blind safe. They were designed by Paul Tol; I merely made them available on PyPI to make them easier to install and use. For example, one of the available schemes (‘vibrant’):

[Image: color samples from the vibrant color scheme]

Data-assistant #

Gitlab

Help jump-start a data analysis project:

  • Obtain your parameters from a configuration file or command line arguments. Validate them against a structured specification that is easy to write, extensible, and lets you document every parameter.
  • Declare datasets in a flexible way to manage varying parameters and multiple files, read and write the data, etc.
  • Set up Dask either on a local machine or distributed on a cluster (using dask-jobqueue).
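The first point (parameters coming from a file or the command line, checked against a documented specification) can be sketched with the standard library alone. This is not Data-assistant's API: the `Parameters` class, its fields, and the `load_parameters` function are hypothetical and only illustrate the general idea.

```python
import argparse
import json
from dataclasses import dataclass, fields


@dataclass
class Parameters:
    """Hypothetical specification: one documented field per parameter."""

    region: str = "global"    # name of the region to analyse
    year: int = 2007          # year of the data to load
    threshold: float = 0.5    # detection threshold, must lie in [0, 1]

    def __post_init__(self):
        for f in fields(self):
            # Coerce each value to its declared type; raises if conversion fails.
            setattr(self, f.name, f.type(getattr(self, f.name)))
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError(f"threshold must be in [0, 1], got {self.threshold}")


def load_parameters(argv, config_text=None):
    """Merge defaults, the content of a JSON config file, and CLI arguments."""
    values = json.loads(config_text) if config_text else {}
    parser = argparse.ArgumentParser()
    for f in fields(Parameters):
        parser.add_argument(f"--{f.name}", type=f.type, default=None)
    parsed = vars(parser.parse_args(argv))
    # Only keep arguments actually given, so they override the config file.
    values.update({k: v for k, v in parsed.items() if v is not None})
    return Parameters(**values)


params = load_parameters(
    ["--year", "2010"],
    config_text='{"region": "gulf_stream", "year": 2005}',
)
```

In this sketch the command line takes precedence over the configuration file, which in turn takes precedence over the defaults; that ordering is a design choice of the example, not a statement about the library.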

Heterogeneity-Index #

Gitlab

A Python library to compute the Heterogeneity Index, as defined in Haëck et al. (2023) and Liu & Levine (2016), along with some associated diagnostics (front detection, statistics of variables inside and outside fronts).

It can be viewed as an example of a complex front detection algorithm that is implemented in Python yet remains competitive in speed thanks to Numba. It can run on NumPy, Dask, or xarray arrays, and the same structure could support other front detection methods.

XArray-histogram #

Github

I use histograms a lot as intermediate results to reduce the volume of data to analyse. I explored (a little bit) ways to construct histograms efficiently on large datasets. This is an attempt to use Boost Histogram and its Dask counterpart, dask-histogram.

I measured an increase in speed, but I am not yet sure whether it scales properly to larger datasets.
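The reason histograms work well as intermediate results is that they are additive: each chunk of data can be reduced to a small array of counts, and the counts are simply summed. The sketch below shows this in plain Python for brevity. It does not use Boost Histogram or dask-histogram, and the `histogram` and `merge` helpers are hypothetical names for this illustration.

```python
def histogram(values, n_bins, low, high):
    """Count values falling in n_bins equal-width bins over [low, high)."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for x in values:
        if low <= x < high:
            # Clamp the index to guard against floating-point rounding.
            index = min(int((x - low) / width), n_bins - 1)
            counts[index] += 1
    return counts


def merge(a, b):
    """Add two histograms that share the same bins."""
    return [x + y for x, y in zip(a, b)]


# Three chunks of data, processed independently (e.g. one per file).
chunks = [[0.1, 0.4, 2.5], [1.2, 1.9], [2.9, 0.0, 3.5]]

total = [0] * 3
for chunk in chunks:
    total = merge(total, histogram(chunk, n_bins=3, low=0.0, high=3.0))

# Same result as binning everything at once, without holding all data in memory.
all_values = [x for chunk in chunks for x in chunk]
reference = histogram(all_values, n_bins=3, low=0.0, high=3.0)
```

Because only the counts need to be kept and combined, the per-chunk step can run in parallel, which is the property a Dask-based approach relies on.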

VisibleEarth Homepage #

Github

Because having the latest image from NASA VisibleEarth as a homepage, in full resolution and full-screen, is really nice.

Dateloop #

Github

A simple Bash command to create lists of dates. Useful for operations on sets of dates.

$> dateloop 20010227 20010301 -f %Y-%m-%d
2001-02-27 2001-02-28 2001-03-01
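For comparison, a rough Python counterpart of the call above could look like the following. It is a sketch under assumptions read off the example output (a one-day step, both bounds included, input dates as `YYYYMMDD`); the `date_range` function is a hypothetical name and is not part of Dateloop.

```python
from datetime import datetime, timedelta


def date_range(start, end, fmt="%Y%m%d", in_fmt="%Y%m%d"):
    """List every day from start to end (both included), formatted with fmt."""
    current = datetime.strptime(start, in_fmt)
    last = datetime.strptime(end, in_fmt)
    dates = []
    while current <= last:
        dates.append(current.strftime(fmt))
        current += timedelta(days=1)  # assumed step: one day
    return dates


dates = date_range("20010227", "20010301", fmt="%Y-%m-%d")
print(" ".join(dates))  # prints "2001-02-27 2001-02-28 2001-03-01"
```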