Binomial proportions in the wild! Data Quality on Humongous Data

<Disclaimer: my opinions are my own. Situations depicted in this blog are not intended to depict any former, current or future employer, or any particular person or minority> Nature is beautiful. Can we make sense of the forest while still keeping track of the sick trees? (Forest photo by Jeroen Bendeler on StockSnap.) Tuesday morning. […]

Looking into Maximum Spacing Estimation (MSP) & ML.

Maximum spacing estimation (MSE or MSP) is one of those lesser-known statistical tools that are good to have in your toolbox if you ever bump into a misbehaving maximum likelihood (ML) estimation. Finding information about it is a bit tricky, because if you search for MSE, you will find “Mean Squared Error” as one of the […]

Kolmogorov-Smirnov for comparing samples (plus, sample code!)

The Kolmogorov-Smirnov test (KS test) lets you compare two univariate, continuous distributions by looking at their CDFs. Both CDFs can be empirical (two-sample KS), or one can be empirical and the other built parametrically (one-sample KS). Client: Good Evening. Bartender: Good evening. Rough day? Client: I should […]
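To make the two-sample case in the excerpt above concrete, here is a minimal sketch of the KS statistic itself: the largest vertical gap between two empirical CDFs. It is not the code from the post. The function name `ks_two_sample_statistic` and the toy samples are assumptions made up for illustration, and the sketch only computes the statistic, not a p-value. For real work a library routine such as `scipy.stats.ks_2samp` is the safer choice.

```python
import random
from bisect import bisect_right


def ks_two_sample_statistic(a, b):
    """Largest vertical gap between the empirical CDFs of samples a and b.

    Hypothetical helper for illustration; it returns the statistic only.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both ECDFs are step functions that only jump at observed points,
    # so evaluating the gap at every observed point is enough.
    for v in xs + ys:
        f_a = bisect_right(xs, v) / n  # fraction of a that is <= v
        f_b = bisect_right(ys, v) / m  # fraction of b that is <= v
        d = max(d, abs(f_a - f_b))
    return d


# Toy data (assumed): two draws from the same normal, one from a shifted normal.
rng = random.Random(0)
same_1 = [rng.gauss(0, 1) for _ in range(500)]
same_2 = [rng.gauss(0, 1) for _ in range(500)]
shifted = [rng.gauss(1, 1) for _ in range(500)]

d_same = ks_two_sample_statistic(same_1, same_2)
d_shift = ks_two_sample_statistic(same_1, shifted)
print(f"same distribution: D = {d_same:.3f}")
print(f"shifted by one sd: D = {d_shift:.3f}")
```

The statistic lives in [0, 1]: it is 0 when the two empirical CDFs coincide and 1 when the samples do not overlap at all, so the shifted pair should show a clearly larger gap than the matched pair.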

Trying out Copula packages in Python – II

And here we go with the copula package in (the sandbox of) statsmodels! You can look at the code first here. I am in love with this package. I was in love with statsmodels already, but this tiny little copula package has everything one can hope for! First Impressions First I was not sure about […]

Trying out Copula packages in Python – I

You may ask, why copulas? We do not mean these copulas. We mean the mathematical concept. Simply put, copulas are joint distribution functions with uniform marginals. The kicker is that they allow you to study dependencies separately from marginals. Sometimes you have more information on the marginals than on the joint function of a dataset, […]
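The idea of "uniform marginals, dependence kept separate" from the excerpt above can be sketched with a Gaussian copula using only the standard library. This is an illustration, not the code from the post and not any of the packages it reviews. The names `sample_gaussian_copula` and `apply_marginals`, the correlation value, and the choice of marginals are all assumptions.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(n, rho, seed=None):
    """Draw n pairs (u, v) from a bivariate Gaussian copula.

    Hypothetical helper: each coordinate is uniform on [0, 1], and the
    dependence between them is controlled by rho alone.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        # Correlated standard normals built from two independent draws.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Pushing each one through the normal CDF makes the marginals uniform.
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def apply_marginals(pairs, inv_cdf_x, inv_cdf_y):
    """Give the copula sample arbitrary marginals via their quantile functions."""
    return [(inv_cdf_x(u), inv_cdf_y(v)) for u, v in pairs]


# Assumed example: strong positive dependence, then two unrelated marginals.
pairs = sample_gaussian_copula(2000, rho=0.8, seed=42)
exponential_quantile = lambda u: -math.log(1.0 - u) / 2.0  # rate 2 (assumed)
normal_quantile = NormalDist(mu=10.0, sigma=2.0).inv_cdf   # assumed marginal
data = apply_marginals(pairs, exponential_quantile, normal_quantile)
```

The dependence structure is fixed in the first step and the marginals are chosen in the second, which is the separation the excerpt describes: swapping the quantile functions changes the marginals without touching how the two coordinates move together.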