How might we deal with a jumble of jigsaw puzzle pieces? A tried-and-tested strategy is to start by spreading out the pieces so they can be taken in at a glance. Then we sort them into recognizable groups, first the edge pieces and then by color. Grouping the pieces according to some pre-defined shared attribute helps us perceive how they may relate to one another, and facilitates solving the puzzle. Clustering in data science follows a similar process.
Clustering seeks to find groups of objects such that the objects in a group are similar to one another, yet different from…
Chernobyl (2019) is a mesmerizing drama of human incompetence, ingenuity and courage in the face of disaster. The show’s analytical examination of the swirling confusion and haphazard response in the aftermath of an unprecedented catastrophe also offers valuable lessons for data scientists facing a dynamic situation amid uncertainty and incomplete information. Moreover, the show cast a scientist (a physicist no less!) as the main hero, and put the science and scientific reasoning front and center.
Discussions of data science all too often focus on the “how to do” of analytics and modelling, but neglect how we should make decisions when…
Financial markets are a cornucopia of delights for data scientists, disgorging endless reams of data. Yet that temptation can be a siren song for the unwary because financial market statistics are slippery and treacherous. Oftentimes they seem almost normal (as in distribution), but what appears as solid ground can dissolve swiftly into quicksand, as was seen earlier this year.
The most important stock market index in the world is the S&P 500 Index (SPX), which is widely regarded as the bellwether of the overall US equity market, and even of the state of the American economy. The SPX suffered a…
The gift of ultra-low interest rates in the pandemic
A few days ago investors lined up to pay the British government for the privilege of lending it money. Nearly $5 billion of UK Gilt bonds were issued at a yield of -0.003% on May 20. Let that sink in: a negative interest rate. If anyone had even voiced this possibility out loud before 2008, they would have been laughed out of town.
It’s still remarkable, but actually no longer a wholly novel situation given that the German and Japanese governments have been issuing negative-yielding debt for some time now. …
The 2012 news story that Target could predict its customers’ pregnancies was arguably a watershed moment in machine learning’s rise to mass consciousness. Apocryphal or not, the story went viral, swathed in those layers of wonder and anxiety that embody the popular view of AI. Most businesses do not require such intimate insights into customer behavior, but data science has revolutionized customer analytics.
The recency-frequency-monetary (RFM) customer segmentation model is one of the fundamental customer analytics frameworks. The three facets of the model are: Recency: How recent was the last transaction (usually measured in days)? Frequency: How frequent were the…
Zen and the Art of Motorcycle Maintenance was one of my favorite books in college. Set amidst a father-son motorcycle journey across the United States, the book considers how to lead a meaningful life. Arguably, the key message expounded by the author, Robert Pirsig, is that we achieve excellence only when we are fully engaged, heart and mind, with the task at hand. If something is worth doing, then it is worth doing well.
At about the same time, research design and statistical inference courses drilled the importance of interpretability and parsimony into me. Communication is indeed widely stated (hoped?)…
The feature richness of the Ames housing dataset (2011) is alluring and bewildering in equal measure. It is easy to become entangled in its bountiful features while trying to uncover its patterns. It is first and foremost useful to understand that the Ames dataset fits into the long-established hedonic pricing method of analyzing housing prices.
I had previously studied statistics/econometrics in some detail. Cognizant of the “big data” revolution and intrigued by its promise, I have immersed myself in coding and machine learning these past few months. It was in this process that I encountered the Ames housing dataset…
Financier by profession. Economist by training. Data scientist & essayist by inclination.