Studying people and what they purchase

Or “Serpent people”.

The old saying goes something along the lines of “you are what you eat”. In modern societies, the sentiment shifts towards “you are what you buy”. Which may include food, of course.

One important difference compared with a decade ago is how much can be measured about the way a user consumes a product. Lately, it seems like every little thing that we do can be used to build a projection of ourselves from our habits. And in the end, that projection can be used to poke the reptilian parts of your reward circuitry so they release the right cocktail of hormones. A cocktail that makes you choose bright red over dull gray, reach for your wallet or click “I accept the terms and conditions”.

Through the years, recipes for such cocktails have been perfected by different disciplines. As an educated consumer, actively tasting those recipes in modern products can be as interesting as wine tasting, minus the inebriation. This is the first of a series of posts intended to help you be more aware of your own reward circuitry by using interpretations that different algorithms build from observing your measurable actions.

Another intention of this series of blog posts is to show that these methods are not necessarily:

  • Absolute
  • Inherently objective
  • Infallible

And they are definitely NOT suitable for blind use, e.g. “press the Analytics Button and have the neural network tell me everything”*. If anyone promises that without disclosing any assumptions, make sure to ask LOTS of questions.

The “Serpent people” series will present some textbook representations suitable for modeling this problem, discuss which aspects each of them reflects best, and try out different open-source libraries on artificially generated models of “people according to what we know about them”.

Some of that material is already available, in a very fluid shape, in this notebook if you can’t wait to play with it yourself :). The representation in there is simply what I considered natural for the problem itself. I plan on elaborating that representation with classics such as Frequent Itemset Mining and Associative Classification. For those, you can start by checking out Chapter 10 of “Data Mining” by Mehmed Kantardzic.
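If you want a first taste of what frequent itemset mining looks like in code, here is a minimal sketch; it assumes the mlxtend library and a made-up basket dataset, neither of which comes from the notebook above:

```python
# A minimal sketch of frequent itemset mining, assuming the mlxtend library
# and a hypothetical basket dataset (not part of the post's notebook).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# Hypothetical purchase baskets, one list of items per person.
baskets = [
    ["bread", "milk", "apples"],
    ["bread", "milk"],
    ["milk", "apples", "coffee"],
    ["bread", "coffee"],
    ["bread", "milk", "coffee"],
]

# One-hot encode the baskets into a boolean item matrix.
encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit(baskets).transform(baskets),
                      columns=encoder.columns_)

# Itemsets appearing in at least 40% of the baskets.
frequent = apriori(onehot, min_support=0.4, use_colnames=True)
print(frequent)
```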

And that’s the teaser for what is to come. For now, I will leave you with this David Bowie song. Granted, it’s “Cat People” instead of “Serpent People”, but it’s pretty cool still.

See you next post!

* This sentence is partially credited to one of my friends while discussing technology sales gore.

Looking into Maximum Spacing Estimation (MSP) & ML.

Maximum spacing estimation (MSE or MSP) is one of those not-so-well-known statistical tools that are good to have in your toolbox if you ever bump into a misbehaving ML estimation. Finding material about it is a bit tricky, because if you search for MSE, “Mean Squared Error” will come up as one of the top hits. The Wikipedia page will give you a pretty good idea, so click here to check it while you are at it.

Here is a summary for the lazy: MSE, or maximum spacing estimation, is about choosing the parameters of a DF so that the geometric mean of the “spacings” in the data is maximized. Such “spacings” are the differences between the values of the cumulative distribution function at neighbouring data points. This is also known as maximum product of spacings estimation, or MSP, because that is exactly how you calculate it. The idea is to choose the parameter values that make the observed data as uniform as possible, for a specific quantitative measure of uniformity.
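In symbols, if $x_{(1)} \leq \dots \leq x_{(n)}$ is the ordered sample and we set $F(x_{(0)};\theta) = 0$ and $F(x_{(n+1)};\theta) = 1$, the MSP estimate maximizes the average log-spacing:

$\hat{\theta} = \arg\max_{\theta} \frac{1}{n+1} \sum_{i=1}^{n+1} \ln \left( F(x_{(i)};\theta) - F(x_{(i-1)};\theta) \right)$

which is equivalent to maximizing the geometric mean of the spacings.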

So we can explain what happens here by means of an exercise, for which we made a notebook on our github (click here). Suppose we have a distribution function. We start with the assumption of a random variable X with a CDF $F(x;\theta_0)$, where $\theta_0 \in \Theta$ is an unknown parameter to be estimated, and from which we can take iid random samples. The spacings over which we will estimate the geometric mean, $D_i$, are the differences between $F(x_{(i)};\theta)$ and $F(x_{(i-1)};\theta)$, for $i \in [1, n+1]$. For the giggles, let’s say our DF is a Pareto I, with shape parameter α=3.0 and a left limit at 1. This α is the θ parameter that we intend to estimate. We can draw some samples from our distribution and construct a CDF out of the samples. We can do a similar process for other values of the shape, and plot those CDFs together, which will end up looking like this:

[Figure: empirical CDFs of Pareto I samples for several values of the shape parameter α]
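Here is a minimal sketch (not the exact notebook code) of how such a plot can be produced with scipy and matplotlib:

```python
# A minimal sketch of drawing Pareto I samples and plotting their empirical
# CDFs for a few values of the shape parameter alpha.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

xs = np.linspace(1, 10, 500)

for i, alpha in enumerate([1.0, 2.5, 3.0, 5.0]):
    # Pareto I with shape alpha and left limit at 1 (scipy's default scale).
    samples = stats.pareto.rvs(b=alpha, size=1000, random_state=i)
    # Empirical CDF: the fraction of samples at or below each x.
    ecdf = np.array([(samples <= x).mean() for x in xs])
    plt.plot(xs, ecdf, label=f"alpha = {alpha}")

plt.xlabel("x")
plt.ylabel("empirical CDF")
plt.legend()
plt.show()
```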

The first thing you will observe is that the bigger the alpha, the “closer to each other” these distributions look. For instance, the CDFs for α=5.0 and α=2.5 are much closer to each other than the CDFs for α=2.5 and α=1.0. That is an interesting fact to consider when using this estimator: it will probably be easier to get a “confused” result the higher the α parameter gets. So α=3.0 is not exactly the easiest choice of value for “messing around and looking at how our estimator behaves”, but it is not too difficult either.

Now, back to the estimator. In this post we made a very simple exercise to look at how ML and MSE behave when estimating the α parameter of a Pareto I distribution. The choice of shape parameter and distribution is completely arbitrary; in fact, I encourage you to take my code and try other distributions and other values yourself. Also, to make my small laptop’s life easier, I selected a subset of α values over which the search was made. Then, we obtained the best scores of each method for different sample sizes:

10, 50, 100, 300, 500 and 10000

For each sample size and each method, we repeated the estimation 400 times. A box-and-whisker plot of these estimations, together with a scatterplot for each sample size and each method, can be seen here:

[Figure: box-and-whisker plots and scatterplots of the estimated α for each sample size and each method]
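If you want a feel for the two estimators without opening the notebook, here is a minimal sketch (not the exact code behind the plots) that grid-searches candidate α values and compares the ML and MSE scores on one Pareto I sample:

```python
# A minimal sketch comparing maximum likelihood (ML) with maximum spacing
# estimation (MSE/MSP) for the shape of a Pareto I with left limit 1,
# using a grid search over a subset of candidate alpha values.
import numpy as np
from scipy import stats

def ml_score(samples, alpha):
    # Log-likelihood of Pareto I(alpha) with scale 1.
    return np.sum(stats.pareto.logpdf(samples, b=alpha))

def mse_score(samples, alpha):
    # Log of the product of spacings: differences of the CDF at the sorted
    # samples, padded with 0 and 1 at the ends.
    cdf = stats.pareto.cdf(np.sort(samples), b=alpha)
    spacings = np.diff(np.concatenate(([0.0], cdf, [1.0])))
    return np.sum(np.log(np.clip(spacings, 1e-12, None)))

candidates = np.linspace(1.0, 6.0, 101)  # subset of alpha values to search
samples = stats.pareto.rvs(b=3.0, size=100, random_state=0)

alpha_ml = candidates[np.argmax([ml_score(samples, a) for a in candidates])]
alpha_mse = candidates[np.argmax([mse_score(samples, a) for a in candidates])]
print(f"ML estimate: {alpha_ml:.2f}, MSE estimate: {alpha_mse:.2f}")
```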

The dotted line at α=3.0 marks the shape parameter of each original sample. As you can see here, ML behaved much better than MSE for small sample sizes. In both cases, for small numbers of samples there was some skewness towards bigger values, which is expected since the CDFs of this distribution get closer the higher the value of the shape parameter. ML also behaved better than MSE for 50, 100 and 300 samples. Now, this was only one result, for this distribution and for α=3.0. A more definitive quantitative evaluation would require looking at more distributions, and at different points of those distributions. There was a paper suggesting that “J”-shaped distributions are the strong point of MSE. I guess a more thorough quantitative evaluation of MSE vs ML would be in order for a decision here. And it would also be worthy of a journal paper, in addition to a small blog post such as this one ;). You can cite me too if you like to use my code ;).

When I first found this estimator, I have to admit that it caused a bit of infatuation in me. Some mathematical concepts carry themselves with such beauty that you can’t help feeling an attachment, and you see everything about them through rose-tinted lenses. It made me wonder why on earth this concept is not as well known as ML. It is not that much more expensive, depending on how you calculate it. For ML you need a density function, while for MSE all you need is a cumulative distribution function. And that alone is a powerful point, because all random variables have a CDF, but not all of them have a PDF. Granted, most distributions you will work with will probably have a PDF. Granted, ML is the workhorse of estimators: many toolboxes have it implemented, it is super easy to teach to undergrads, and it works well. In fact, it worked better than MSE for this particular example. So in spite of all the beauty, maybe the fitness function for mathematical concepts to last for posterity is not beauty or elegance, but applicability.

BONUS POINTS IF YOU ARE LOOKING TO WORK WITH US: Blow my mind. Try this exercise in other distributions. Make other comparisons. Make me stop loving MSE. Or make me love it even more. I have to do something about the butterflies in my stomach! You are my only hope!

As always, you can find the code for generating the plots of this post in our notebook (click here).

Image taken from stocksnap.io.

Special Announcement: We are hiring!

image from stocksnap.io/photo/DDYC9U7O2P

Hi all! You may have been following the blog for some time, or this may be your first visit. But just the fact that you are here, taking some time to read and learn new things, is awesome. Well, guess what: we want to hire awesome people who enjoy learning new things, more specifically about data science.

Now, what do we mean by “data science”? There are a number of tasks that every person working with data must have done: at some point you must have decided what kind of data you needed for your task; you must have had an idea of how to collect and transform that data into a form you can apply some nice math/statistics to; and finally you must have found a way to summarize interesting aspects of the whole ordeal.

Some people find different parts of this process more comfortable than others. Some people are better at communicating and telling stories, while others are more into applying math to the data and seeing what comes out; others just love designing and optimizing experiments; others love building tools for data extraction and conditioning. And the thing is: all of this is data science. All of it. So when a company wants to hire a person in data science, it is important to specify what they want, because these tasks require different sets of abilities which are very difficult to find in a single individual.

We are putting together a team. The engineers we are looking for thrive as team players and have a genuine interest in data analysis, statistics and mathematics. If you are one of them, you have built a sufficiently good programming base (preferably in Python and/or R) that allows you to learn and test ideas by yourself. We want you to be free to mold your mind into what you want, and have a good time.

The Team Roles: 

Data Analyst: Is concerned with looking at the data and the context around the data, and telling a story by analyzing both. This engineer knows which type of visualization is best to use, for which audience, and how to build it. This engineer can also choose the best statistics to summarize a dataset, and can help organizations build suitable Key Performance Indicators.

Data Engineer: Is concerned with extracting and molding data, recommending and building data hygiene methods, and choosing computing frameworks to work with data. Data nowadays can be found in all kinds of formats and can be stored in different ways; it can be streaming or offline. A client’s needs may be satisfied by optimizing how your programs process the data, or they may push the envelope of what can be accomplished by CPUs, possibly making this engineer look into GPUs. This engineer is responsible for providing quality data over which data analysis and data science can be applied.

Data Scientist: Is concerned with building quantitative models out of data, setting and verifying the assumptions for the models, testing and maintaining models, and choosing data collection strategies and instruments. This engineer knows which things can go south very quickly when the underlying assumptions of a model are no longer valid, and is responsible for clearly communicating to the team what those assumptions are.

Must-have qualifications:

  • M.Sc. in engineering with a focus on one or more of the following disciplines: statistics, mathematics, computer science, applied physics, mechatronics, electrical/electronic/nuclear engineering.
  • Fluent English.
  • Experience in Python or R.

The Attitude:

  • You like having fun!
  • You are friendly!
  • You respect and trust your team as much as your own knowledge.
  • You shoot for the stars, yet can gracefully land on the moon if needed.
  • You learn on your own, yet you ask for help when you need to.
  • You don’t take all statements for facts: when reason exists, you verify ground truths and communicate your findings to those concerned.

Nice-to-have:

  • Experience from working in teams
  • Customer-oriented experience
  • Experience with community projects (Github, CRAN, StackExchange community, etc)

If this sounds a lot like you, do not hesitate to apply with your CV and cover letter here:

https://www.linkedin.com/jobs2/view/254519957

Kolmogorov-Smirnov for comparing samples (plus, sample code!)

The Kolmogorov-Smirnov test (KS test) allows you to compare two univariate, continuous distributions by looking at their CDFs. Both CDFs can be empirical (two-sample KS), or one can be empirical and the other built parametrically (one-sample KS).

Client: Good Evening.

Bartender: Good evening. Rough day?

Client: I should have stayed in bed…

Bartender: Maybe we have just the right thing for you. How about a Kolmogorov-Smirnov?

Client: Make it two-sample, please.

The null hypothesis for the one-sample case is that the empirical distribution is drawn from the reference distribution (which is usually parametric). For the two-sample test, the null hypothesis is that the two samples were drawn from the same distribution.

The actual value of the KS statistic is the largest of all the differences between the CDFs in the test. An expression for this statistic is:

$K_n = \sup_{x} |F_{n}(x) - F(x)|$ (1)

In the literature, it is not uncommon to see it expressed like this:

$K_n = \sqrt{n} \, \sup_{x} |F_{n}(x) - F(x)|$ (2)

Some of you may have spotted the similarity between these expressions and the Glivenko-Cantelli theorem (a.k.a. the fundamental theorem of probability to some people). To refresh it a bit, here is Glivenko-Cantelli for you:

$\|F_{n} - F\|_{\infty} = \sup_{x \in \mathbb{R}} |F_{n}(x) - F(x)| \rightarrow 0$ almost surely.

And notice the “almost surely”. And “almost surely” will have to do, because this theorem is such a cornerstone of statistics. Other people have made some interesting discoveries around Glivenko-Cantelli. For instance, the DKW inequality draws bounds on the convergence of Glivenko-Cantelli by bounding the probability that $F_{n}$ differs from $F$ by more than a given constant $\epsilon > 0$. This result carries over to the KS statistic, and we get an estimate for its tail. And then some people start building even more interesting bounds; for instance, take these guys.
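For reference, the DKW inequality (with the tight constant due to Massart) states that for every $\epsilon > 0$:

$P\left( \sup_{x \in \mathbb{R}} |F_{n}(x) - F(x)| > \epsilon \right) \leq 2 e^{-2 n \epsilon^{2}}$

which is exactly the kind of tail estimate mentioned above.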

And well, if you simply want to start playing with the KS statistic, there is a short code snippet in our notebook that you can use to start comparing samples to each other, and samples to the DFs contained in the stats package of scipy.
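As a starting point, here is a minimal sketch, using only scipy’s built-in tests, of the two flavours on some made-up normal samples:

```python
# A minimal sketch of the two flavours of the KS test with scipy.stats:
# two-sample, and one-sample against a parametric reference distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=300)
b = rng.normal(loc=0.2, scale=1.0, size=300)

# Two-sample KS: are a and b drawn from the same distribution?
stat_2s, p_2s = stats.ks_2samp(a, b)

# One-sample KS: is a drawn from a standard normal reference distribution?
stat_1s, p_1s = stats.kstest(a, "norm", args=(0.0, 1.0))

print(f"two-sample: D = {stat_2s:.3f}, p = {p_2s:.3f}")
print(f"one-sample: D = {stat_1s:.3f}, p = {p_1s:.3f}")
```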

Enjoy!

The featured image was taken from here.

Trying out Copula packages in Python – II

And here we go with the copula package in (the sandbox of) statsmodels! You can look at the code first here.

I am in love with this package. I was in love with statsmodels already, but this tiny little copula package has everything one can hope for!

[Image: “suddenly the world seems such a perfect place”, summarizing my feelings about this package]

First Impressions

At first I was not sure about it. It looks deceptively raw, so one can understand why it would not be fair to compare it with other packages in statsmodels. After googling for examples, I could not find any, not even in the documentation of statsmodels. In fact, you had to dig deep to find that this piece of code even existed.

There are no built-in methods to calculate the parameters of the Archimedean copulas, and no methods for elliptical copulas (they are not implemented). However, elliptical copulas are quite vanilla and you can implement the methods yourself. We missed the convenience of selecting a method for transforming your data into uniform marginals, but you can also implement that yourself: either fit the parameters of a scipy distribution and then use the CDF method of that distribution on your samples, or work with an empirical CDF. Both methods are implemented in our notebook.
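For illustration, here is a minimal sketch of both routes; the gamma distribution and its parameters are made up for this example and are not taken from our notebook:

```python
# A minimal sketch of two ways to transform a sample into (approximately)
# uniform marginals before feeding it to a copula: via a fitted scipy
# distribution's CDF, or via an empirical CDF based on ranks.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=1.5, size=500)  # hypothetical marginal data

# Parametric route: fit a candidate distribution, then push the samples
# through its CDF.
params = stats.gamma.fit(x)
u_parametric = stats.gamma.cdf(x, *params)

# Empirical route: ranks scaled into (0, 1), which avoids exact 0s and 1s.
u_empirical = stats.rankdata(x) / (len(x) + 1)

print(u_parametric[:5], u_empirical[:5])
```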

So, in order to actually use the functions in this package, you have to write your own code for getting the parameters of your Archimedean copula (we borrowed some code from the copulalib package for that purpose), for transforming your variables into uniform marginals, and for actually doing anything with the copula. However, as it is, it is quite flexible. It is good that the developers decided to keep it anyway.

Hands on!

Alright, check out our notebook at github.