Thoughts on the Innovation Process

Innovation is a messy endeavour. You never know whether you are heading in the right direction and you fail all the time. A lot, if not most, of your ideas are useless, impractical and just fluff. To an outsider, it looks like you are wasting a lot of resources and meandering around aimlessly. In an organization, the sense that innovation is just a giant waste of time and a drag on the rest of the organization can feel acute.

Innovation is also necessary, both in our personal lives and the organizations we work in. How else would one break new ground? How can an organization solve problems better, faster and cheaper in ways that they don't already know? How can we discover new green fields to play in if we don't lift our heads up and look around and beyond?

The question is, how do we manage the tension between the chaos of creative forces and the orderly march towards realizing real world benefits, both in our personal lives and our organizations? How do we meander effectively?

Read more…

Enterprise Data Science. It’s complicated.

In 2017 I wrote a blog post on how data science efforts should be about building representations and nothing else. Now in 2021, four years older, I have different thoughts. I have come to realize that my thoughts four years ago were too myopic, too technically oriented. As it turns out, enterprise data science, like many things in life, is complicated.

Read more…

On Time-varying Markov Systems

In this article I consider a discrete time Markov system which has a transition matrix that is not constant over time. The derivation of the forward and backward Chapman-Kolmogorov equations is then shown.

In Markovian systems, the transition matrix determines the behaviour of the system. The Chapman-Kolmogorov equation is important because it describes how the transition probability matrix, which may not be constant, carries the system across time. It allows us to move through the system forwards and backwards in time and so explore the system's behaviour fully.

Such systems are often encountered in real-world processes. For example, the transition probability of commuters on a train system will depend on the time of the day and day of the week.
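
As a rough illustration of the commuter example, the sketch below builds a multi-step transition matrix from per-step matrices that change over time and checks the Chapman-Kolmogorov relation $P(s,t) = P(s,u)\,P(u,t)$ numerically. The two-station matrices `PEAK` and `OFF_PEAK`, and the helper names `matmul` and `transition`, are assumptions made up for this example and are not taken from the article.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transition(step_matrices, s, t):
    """Multi-step transition matrix P(s, t) = P_s P_{s+1} ... P_{t-1}."""
    n = len(step_matrices[0])
    # Start from the identity, which is P(s, s).
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(s, t):
        result = matmul(result, step_matrices[k])
    return result


# Hypothetical two-station system: rows are "from", columns are "to".
PEAK = [[0.2, 0.8], [0.6, 0.4]]
OFF_PEAK = [[0.7, 0.3], [0.3, 0.7]]

# The one-step matrix depends on the time index.
steps = [PEAK, PEAK, OFF_PEAK, OFF_PEAK]

# Going from time 0 to time 4 directly ...
direct = transition(steps, 0, 4)
# ... matches going via the intermediate time 2.
via_2 = matmul(transition(steps, 0, 2), transition(steps, 2, 4))
```

Because the per-step matrices differ, the order of multiplication matters here, unlike the constant-matrix case where $P(s,t)$ is simply a matrix power.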

Read more…

Basic Structure of Markovian Population Models

When considering Markov systems, we are often interested in the behaviour of the population of individuals traversing through the graph rather than that of an individual. For example, we might be interested in how crowds move through a shopping mall when entering from different entrances. In such a situation, we would be interested in the crowd density that can be attributed to a particular entrance at some store and time.
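
One way to picture the shopping mall example is to track a separate head count for each entrance cohort and push every cohort through the same transition matrix. The sketch below does this for a hypothetical three-location mall; the locations, the matrix `P`, the cohort sizes and the function names are all invented for illustration.

```python
def step(counts, P):
    """Advance expected head counts by one time step: n(t+1) = n(t) P."""
    n = len(P)
    return [sum(counts[i] * P[i][j] for i in range(n)) for j in range(n)]


def propagate(cohorts, P, n_steps):
    """Propagate each entrance cohort independently for n_steps steps."""
    result = {}
    for name, counts in cohorts.items():
        current = list(counts)
        for _ in range(n_steps):
            current = step(current, P)
        result[name] = current
    return result


# Hypothetical locations, in the order used by the rows and columns of P.
LOCATIONS = ["entrance_a", "entrance_b", "store"]

# Assumed transition probabilities; each row sums to 1.
P = [
    [0.5, 0.0, 0.5],
    [0.0, 0.2, 0.8],
    [0.1, 0.1, 0.8],
]

# Initial head counts, split by the entrance people came in through.
cohorts = {
    "entrance_a": [100.0, 0.0, 0.0],
    "entrance_b": [0.0, 50.0, 0.0],
}

after_three = propagate(cohorts, P, 3)
# Expected number of people at the store who entered via entrance A.
store_from_a = after_three["entrance_a"][LOCATIONS.index("store")]
```

Keeping the cohorts separate is what lets the crowd at the store be attributed back to a particular entrance; summing the cohorts recovers the total crowd.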

Read more…

On Growth Trajectories

There are many ways to think about growth in various aspects of one's life. A useful mental model is that of a Sigmoid curve.

$$\text{growth} = \frac{A}{1+\exp(-B t + C)}$$

$B$ will determine how steep your progress is, $C$ will determine when that steep progression happens and $A$ determines the level at which saturation happens. All the above parameters are functions of your trajectory.
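
To get a feel for the three parameters, here is a small sketch of the curve above. The function name `growth` and the sample values for $A$, $B$ and $C$ are assumptions chosen purely for illustration.

```python
import math


def growth(t, A, B, C):
    """Sigmoid growth curve: A / (1 + exp(-B * t + C))."""
    return A / (1.0 + math.exp(-B * t + C))


# Illustrative parameter values only.
A, B, C = 10.0, 2.0, 6.0

# The steepest part of the curve sits at t = C / B, where growth is A / 2.
midpoint = C / B
half_way = growth(midpoint, A, B, C)

# Well past the midpoint, growth flattens out just below the saturation level A.
late = growth(midpoint + 10.0, A, B, C)
```

With these values the midpoint is at $t = C/B = 3$: a larger $B$ makes the rise around that point sharper, while a larger $C$ pushes it later.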

Read more…

Apache SPARK: Setting up and Use with IPython Notebooks

When it comes to exploring and analysing large amounts of data, few tools beat the Apache Spark - IPython Notebook combination. However, in my journey to set up and use Apache Spark for my work, I often had to go through a lot of trial and error as the instructions often assume certain knowledge of Spark's internal settings. I had to plough through documentation in various places before I could get a reasonably working Spark + IPython setup going. So here I chronicle my experience in setting up Apache Spark for use with IPython notebooks, attempting at each step to explain the rationale and the settings used.

Read more…

Log-Sum-Exp Trick

The Log-Sum-Exp trick is a really cool computational trick to prevent overflow or underflow when computing the log of the sum of exponentials (exactly as its name suggests!). I got to know about it while trying to code up mixture density networks which required me to calculate the log of the sum of a bunch of Gaussian distributions for its log-likelihood.
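
A minimal sketch of the idea, using only the Python standard library: subtract the largest value before exponentiating, then add it back after taking the log. The function name `log_sum_exp` is an illustrative choice, and the sketch assumes a non-empty list of finite inputs.

```python
import math


def log_sum_exp(values):
    """Compute log(sum(exp(v) for v in values)) without overflow or underflow.

    Uses the identity log(sum(exp(v))) = m + log(sum(exp(v - m))) with
    m = max(values), so the largest exponent evaluated is exp(0) = 1.
    Assumes values is non-empty and contains only finite numbers.
    """
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


# math.exp(1000.0) on its own raises OverflowError, but the shifted
# version only ever evaluates exp(0.0).
large = log_sum_exp([1000.0, 1000.0])

# math.exp(-1000.0) underflows to 0.0, so the naive log would fail;
# the shifted version is unaffected.
small = log_sum_exp([-1000.0, -1000.0])
```

In both cases the result is the shared value plus $\log 2$, which is what the exact expression gives.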

Read more…

On the Futility of Machine Learning "Projects"

In recent years, there has been a great buzz around big data and analytics, and the potential of a confluence of computing capabilities and intelligent algorithms to deliver massive business efficiencies and insights. Everyone looks to companies like Google and Facebook and says to themselves, "we should be like them!". In Singapore, it seems as if in the past couple of years, every other company is setting up an analytics team, and the demand for data scientists has never been higher, even though it is not clear that everyone means the same thing when they say "data scientist". In a way, it is reminiscent of the dot-com bubble that drove an over-investment in intercontinental fibre-optic cable. However, as noted by Thomas Friedman in The World is Flat, new platforms, techniques or methods cannot achieve their full potential unless they are combined with new ways of conducting business. For example, the invention of the light bulb was not able to light up households and allow productive activities to carry on at night until efficient electricity generation and delivery was widespread. Likewise, many companies are, in my opinion, grappling with how to embed data science into their core business practices, and in many cases they might be grappling in the wrong direction by applying old thinking to new ways.

In this post, I offer some of my thoughts on how data science needs to change in order for businesses to derive actual benefit from it.

Read more…