Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.

Pages

About

About Will High

Will High

Machine learning, causal inference, AI, ML engineering

Now

Will High now

Publications

Will High’s Publications

Sitemap

Tools

Tools I use

Posts

The Mightyohm Geiger Counter on Raspberry Pi 4

4 minute read

I slogged my way to a working Mightyohm Geiger Counter integration with a Raspberry Pi 4.

Machine Learning Likelihood, Loss, Gradient, and Hessian Cheat Sheet

6 minute read

Cheat sheet for likelihoods, loss functions, gradients, and Hessians.

Deploy Custom Shiny Apps to AWS Elastic Beanstalk

5 minute read

How I tricked AWS into serving R Shiny with my local custom applications using rocker and Elastic Beanstalk.

Debugging Metaflow Jobs

3 minute read

The combination of an IDE, a Jupyter notebook, and some best practices can radically shorten the Metaflow development and debugging cycle.

Metaflow Best Practices for Machine Learning

8 minute read

Some of these are specific to Metaflow, some are more general to Python and ML.

Machine Learning Model Selection with Metaflow

7 minute read

Configurable, repeatable, parallel model selection using Metaflow, including randomized hyperparameter tuning, cross-validation, and early stopping.

Where Those Loss Constants Come From

9 minute read

Here’s where that n and that 2 come from in the square-loss objective function, in gory detail.

Parallel Grep and Awk

4 minute read

I get a nearly 6x speedup over standard grep by using GNU parallel.

Hacking a Serverless Machine-Learning Scoring Microservice with AWS Lambda

5 minute read

In this post I’ll attempt to hack a scikit-learn model prediction microservice with AWS Lambda.

Guaranteeing k Samples in Streaming Sampling Without Replacement

8 minute read

If you need $k$ samples out of $N$ in Hive or Pig, typically you’d naively choose $p = k/N$, but this only gives you $k$ on average.

The Streaming Distributed Bootstrap

14 minute read

The streaming distributed bootstrap is a really fun solution, and I’ve mocked up a Python package to test it out.

Fast and Lean Ad Hoc Binary Classifier Evaluation

9 minute read
