# Enterprise Data Science. It’s complicated.

In 2017 I wrote a blog post on how data science efforts should be about building representations and nothing else. Now in 2021, four years older, I have different thoughts. I have come to realize that my thoughts four years ago were too myopic, too technically oriented. As it turns out, enterprise data science, like many things in life, is complicated.

# On Time-varying Markov Systems

In this article I consider a discrete time Markov system which has a transition matrix that is not constant over time. The derivation of the forward and backward Chapman-Kolmogorov equations is then shown.

In Markovian systems, the transition matrix determines the behaviour of the system. The Chapman-Kolmogorov equation is important because it relates the transition probabilities over a long interval to those over the shorter intervals that make it up, even when the transition matrix is not constant over time. This allows us to move through the system forwards and backwards in time and explore its behaviour fully.

Such systems are often encountered in real-world processes. For example, the transition probability of commuters on a train system will depend on the time of the day and day of the week.
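As a rough illustration (not the derivation from the article), the Chapman-Kolmogorov relation for a time-varying chain can be checked numerically: the transition matrix from time `s` to time `t` should equal the product of the matrices from `s` to an intermediate time `u` and from `u` to `t`. The two-state chain and the `one_step` matrices below are hypothetical values chosen only for the sketch.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def one_step(t):
    """Hypothetical one-step transition matrix at time t (each row sums to 1)."""
    if t % 2 == 0:
        return [[0.9, 0.1], [0.5, 0.5]]  # assumed "peak" behaviour
    return [[0.6, 0.4], [0.2, 0.8]]      # assumed "off-peak" behaviour


def transition(s, t):
    """Transition matrix from time s to time t >= s, as a product of one-step matrices."""
    result = [[1.0, 0.0], [0.0, 1.0]]  # identity: no time has passed
    for k in range(s, t):
        result = matmul(result, one_step(k))
    return result


def max_abs_diff(a, b):
    """Largest elementwise difference between two matrices of the same shape."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


direct = transition(0, 5)
via_midpoint = matmul(transition(0, 2), transition(2, 5))
print(max_abs_diff(direct, via_midpoint))  # expected to be at floating-point noise level
```

Extending the product on the right by one more `one_step` matrix corresponds to stepping forwards in time, and prepending one on the left corresponds to stepping backwards.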

# Basic Structure of Markovian Population Models

When considering Markov systems, we are often interested in the behaviour of the population of individuals traversing through the graph rather than that of an individual. For example, we might be interested in how crowds move through a shopping mall when entering from different entrances. In such a situation, we would be interested in the crowd density that can be attributed to a particular entrance at some store and time.
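A minimal sketch of the population view, assuming a hypothetical mall with three locations (entrance A, entrance B and a store) and a made-up transition matrix that is held constant for brevity. Each entrance's crowd is propagated separately, so the expected number of people at the store can be attributed to the entrance they came from.

```python
# Locations: 0 = entrance A, 1 = entrance B, 2 = store (all hypothetical).
# Row i gives the probability of moving from location i to each location in one step.
MOVES = [
    [0.5, 0.0, 0.5],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],  # assumed: people who reach the store stay there
]


def step(counts, matrix):
    """Expected head count at each location after one time step."""
    n = len(matrix)
    return [sum(counts[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def propagate(counts, steps, matrix=MOVES):
    """Apply `step` repeatedly to get expected counts after `steps` time steps."""
    for _ in range(steps):
        counts = step(counts, matrix)
    return counts


from_a = propagate([100.0, 0.0, 0.0], steps=2)  # 100 people start at entrance A
from_b = propagate([0.0, 50.0, 0.0], steps=2)   # 50 people start at entrance B
print("at store, from A:", from_a[2])
print("at store, from B:", from_b[2])
```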

# On Growth Trajectories

There are many ways to think about growth in various aspects of one's life. A useful mental model is that of a Sigmoid curve.

$$\text{growth} = \frac{A}{1+\exp(-B t + C)}$$

$B$ will determine how steep your progress is, $C$ will determine when that steep progression happens and $A$ determines the level at which saturation happens. All the above parameters are functions of your trajectory.
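To make the roles of the parameters concrete, here is a small sketch of the curve above. The function name `growth` and the parameter values are illustrative assumptions. Note that the steepest part of the curve sits at $t = C/B$, where growth equals $A/2$.

```python
import math


def growth(t, a, b, c):
    """Sigmoid growth curve: a / (1 + exp(-b * t + c))."""
    return a / (1.0 + math.exp(-b * t + c))


# Illustrative parameters: saturation level 10, steepness 1, midpoint at t = c / b = 5.
a, b, c = 10.0, 1.0, 5.0
for t in (0, 5, 10, 20):
    print(t, round(growth(t, a, b, c), 3))
```

With these values the curve starts near zero, passes through half of the saturation level at `t = 5`, and flattens out close to `a` for large `t`.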

# Apache SPARK: Setting up and Use with IPython Notebooks

When it comes to exploring and analysing large amounts of data, few tools beat the Apache Spark - IPython Notebook combination. However, in my journey to set up and use Apache Spark for my work, I often had to go through a lot of trial and error as the instructions often assume certain knowledge of Spark's internal settings. I had to plough through documentation in various places before I could get a reasonably working Spark + IPython setup going. So here I chronicle my experience in setting up Apache Spark for use with IPython notebooks, attempting at each step to explain the rationale and the settings used.

# Log-Sum-Exp Trick

The Log-Sum-Exp trick is a really cool computational trick to prevent overflow or underflow when computing the log of the sum of exponentials (exactly as its name suggests!). I got to know about it while trying to code up mixture density networks, which required me to calculate the log of the sum of a bunch of Gaussian distributions for their log-likelihood.
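A minimal sketch of the trick, using only the standard library and assuming a non-empty list of finite-or-infinite floats. The idea is to subtract the maximum before exponentiating, so the largest term becomes `exp(0) = 1`, and then add the maximum back outside the log.

```python
import math


def logsumexp(values):
    """Compute log(sum(exp(v) for v in values)) without overflow or underflow."""
    m = max(values)
    if math.isinf(m):
        # All values are -inf (result -inf) or some value is +inf (result +inf).
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


# A naive math.log(math.exp(1000) + math.exp(1000)) raises OverflowError,
# and math.exp(-1000) underflows to 0.0, so its log is undefined.
print(logsumexp([1000.0, 1000.0]))
print(logsumexp([-1000.0, -1000.0]))
```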

# Mixture Density Networks: Basics

## Mixture Density Networks

### Background

I got interested in Mixture Density Networks while reading Bishop's book on machine learning. His original paper can be found here.

Such networks are useful in problems where a single input can map to multiple output values. This is where traditional discriminative neural networks fail.
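As a sketch of the standard formulation (the symbols below are my notation, assuming $K$ Gaussian components), the network outputs the parameters of a mixture rather than a single prediction, so the conditional density of the target $y$ given the input $x$ can have several modes:

$$p(y \mid x) = \sum_{k=1}^{K} \pi_k(x)\, \mathcal{N}\!\left(y \mid \mu_k(x), \sigma_k^2(x)\right), \qquad \sum_{k=1}^{K} \pi_k(x) = 1$$

Here $\pi_k(x)$, $\mu_k(x)$ and $\sigma_k(x)$ are the mixing coefficients, means and standard deviations produced by the network for input $x$.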

# On the Futility of Machine Learning "Projects"

In recent years, there has been a great buzz around big data and analytics, and the potential of a confluence of computing capabilities and intelligent algorithms to deliver massive business efficiencies and insights. Everyone looks to companies like Google and Facebook and says to themselves, "We should be like them!" In Singapore, it seems as if, in the past couple of years, every other company has been setting up an analytics team. The demand for data scientists has never been higher, even though it is not clear that everyone means the same thing when they say "data scientist". In a way, it is reminiscent of the dot-com bubble that drove an over-investment in intercontinental fibre-optic cable. However, as noted by Thomas Friedman in The World is Flat, new platforms, techniques or methods cannot achieve their full potential unless they are combined with new ways of conducting business. For example, the invention of the light bulb was not able to light up households and allow productive activities to carry on at night until efficient electricity generation and delivery was widespread. Likewise, many companies are, in my opinion, grappling with how to embed data science into their core business practices, and in many cases they might be grappling in the wrong direction by applying old thinking to new ways.

In this post, I offer some of my thoughts on how data science needs to change in order for businesses to derive actual benefit from it.

# Evolution of the Singapore Population

## Executive Summary

• A quick and cursory look at the evolution of Singapore's population, based on a publicly available dataset
• Singapore aged dramatically from 2000 to 2016
• A comparison of the population in 2000 with that of 2016 shows that a shortage in the working-age population had to be made up with permanent residents or naturalised citizens
• The proportion of working-age Singapore residents has been decreasing since 2011 and looks likely to continue to do so in the next few years, due to the lack of a younger population projected to enter the workforce
• The ethnic mix of Singapore remained relatively constant from 2000 to 2016

## Motivation

Singapore has always been population-challenged. The fact that Singapore will always depend on the quality of its people in order to survive in this globalised world has been known since its founding in 1965.

Its size represents a glass ceiling limiting economic potential and, in turn, societal stability. Without a high-quality labour force, Singapore would find it hard to attract investors, talent and business interest when compared to neighbours such as Indonesia, which has a large market to which companies can sell.

As Singapore develops and acquires some of the characteristics of developed economies, such as an aging labour force and the need to move towards a consumer/knowledge-driven economy (if that is even possible in Singapore's context), demographics will play a crucial role in the future of Singapore. The effects of demographic evolution are magnified in Singapore since, unlike other countries, we have neither natural resources to cushion the effects of global economic vagaries nor a hinterland for us to be self-sustaining in terms of key needs such as water and food.

Thus, to satisfy my curiosity about how Singapore's demographics have changed over the years, I decided to make a quick and cursory analysis of publicly available data from data.gov.sg.