Markov Chains with Matlab

Jesse Dorrestijn

8 July 2016

This web page can be used as an extra resource for understanding the stochastic parameterization schemes in the Ph.D. thesis
"Stochastic Convection Parameterization" by Jesse Dorrestijn, which will be defended on 8 September 2016 at Delft University of Technology in the Netherlands. In the thesis, Markov chains are used to construct stochastic convection parameterizations
from data of convection and clouds. In the thesis and the published articles, the method is described as clearly as possible; the codes,
however, have not been published. This page therefore focuses on Matlab codes, written for researchers and students who would like to use, or learn
about, data-driven Markov chains. The page presents a series of examples of increasing complexity, each with its own Matlab code.

Contents of this web page:

Example 1: Markov chains with 2 small-scale states

Example 2: Markov chains with N small-scale states

Example 3: Markov chains conditioned on a large-scale variable

Example 4: Markov chains conditioned on a large-scale variable on two time instances

Example 5: Clustering of observations

Example 6: Simultaneous clustering of 2 observations

Example 7: Extra simple example of 2D clustering

Example 8: Cross-correlation analysis

A finite-state Markov chain has a finite number of states and switches
between these states with certain probabilities. These probabilities can
be collected in a matrix P. If the Markov chain has 2 states, this transition
probability matrix has size 2 x 2. The diagonal holds the
probabilities that the state does not change in one time step, from t to
t+1; the switching probabilities are off the diagonal. For example, the
probability of switching from state 1 to state 2 is entry (1,2) of
the matrix. Together with an initial value, the Markov chain can then produce
sequences.
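As an illustration of this mechanism (sketched here in Python for brevity; the actual codes on this page are Matlab files, and the matrix values below are made up), a 2-state chain can be simulated from P and an initial state:

```python
import random

# Transition probability matrix of a 2-state Markov chain:
# P[i][j] is the probability of moving from state i to state j
# in one time step. The probability of switching from state 1 to
# state 2 -- entry (1,2) -- is P[0][1] with 0-based indexing.
P = [[0.9, 0.1],
     [0.2, 0.8]]

def simulate(P, state, n_steps, rng):
    """Produce a sequence of n_steps states, starting from `state`."""
    seq = [state]
    for _ in range(n_steps - 1):
        u, cum = rng.random(), 0.0
        for j, p in enumerate(P[state]):  # sample the next state
            cum += p
            if u < cum:
                state = j
                break
        seq.append(state)
    return seq

y = simulate(P, 0, 1000, random.Random(1))
```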

A Markov chain can be used to mimic a certain process. If, for example, a
process has only two states and a long observed sequence is available, the
transition probabilities of the Markov chain can be estimated from this sequence.

**Example 1: Markov chains with 2 small-scale states**

In the following example, we first construct a sequence with only two
states. This sequence, which we call the observations y_obs, is used
to estimate the transition probabilities of a Markov chain, collected in a
matrix P_MC. Finally, we use the Markov chain to construct a sequence y_MC that is
similar to the original y_obs sequence, and we plot the two sequences. The length L of the original sequence can be adjusted, as can the transition probabilities, and the estimated matrix P_MC can be compared to P_obs.
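The linked Matlab file contains the actual example; the workflow can be sketched in Python as follows (the helper names simulate and estimate_P, and all parameter values, are made up for this sketch):

```python
import random

def simulate(P, state, n, rng):
    """Generate a state sequence of length n from transition matrix P."""
    seq = [state]
    for _ in range(n - 1):
        u, cum = rng.random(), 0.0
        for j, p in enumerate(P[state]):
            cum += p
            if u < cum:
                state = j
                break
        seq.append(state)
    return seq

def estimate_P(seq, n_states):
    """Estimate transition probabilities by counting observed transitions."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    # Normalize each row of counts into probabilities.
    return [[c / max(sum(row), 1) for c in row] for row in counts]

rng = random.Random(42)
P_obs = [[0.95, 0.05],
         [0.10, 0.90]]                    # 'true' transition probabilities
y_obs = simulate(P_obs, 0, 10000, rng)    # the observations
P_MC = estimate_P(y_obs, 2)               # estimated matrix, compare to P_obs
y_MC = simulate(P_MC, 0, 10000, rng)      # Markov-chain mimic of y_obs
```

For a long sequence, the rows of P_MC sum to 1 and approach the corresponding rows of P_obs.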

-->Click here to get the example 1 Matlab file.<--

**Example 2: Markov chains with N small-scale states**

Let us do almost the same as in Example 1, but now consider Markov
chains with more than 2 states: let N > 2 be the number of states. You can adjust this number.
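The estimation step generalizes directly to an N x N matrix; a minimal Python sketch (the sequence below is placeholder random data standing in for real observations, and the names are invented here):

```python
import random
from collections import Counter

N = 4                    # number of states; adjust freely
rng = random.Random(0)
# Placeholder observations: a sequence over N states.
y_obs = [rng.randrange(N) for _ in range(5000)]

# Count transitions (i -> j) and normalize each row.
pairs = Counter(zip(y_obs, y_obs[1:]))
totals = Counter(y_obs[:-1])
P_MC = [[pairs[(i, j)] / totals[i] for j in range(N)] for i in range(N)]
```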

-->Click here to get the example 2 Matlab file.<--

**Example 3: Markov chains conditioned on a large-scale variable**

Let us now introduce conditioning. Imagine that the transition
probabilities depend on a certain variable X. The Markov chain then
becomes a conditional Markov chain, because it is conditioned on X. The
observational data now consist of a sequence y_obs (as in Examples 1
and 2) plus an additional sequence X_obs. After construction of
the conditional Markov chain, a sequence X has to be
available to drive it, e.g. X = X_obs.

In the context of climate or weather models, the large-scale variable X can for example be the average surface temperature in an area: a variable that is known to the model. The small-scale variable y, for example the convective area fraction, is not known to the model and can be represented by a Markov chain.
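The key change is that one transition matrix is estimated per large-scale state. A Python sketch with synthetic placeholder data (the generating rule for y_obs below is invented purely so that the switching rate depends on X):

```python
import random

rng = random.Random(1)
T, NX = 20000, 3   # sequence length and number of X-states

# Placeholder data: the large-scale state X_obs drives how often the
# 2-state small-scale sequence y_obs switches.
X_obs = [rng.randrange(NX) for _ in range(T)]
y_obs = [0]
for t in range(1, T):
    p_switch = 0.05 + 0.15 * X_obs[t - 1]   # larger X -> more switching
    y_obs.append(1 - y_obs[-1] if rng.random() < p_switch else y_obs[-1])

# One 2x2 transition-count matrix per conditioning state m = X(t).
counts = {m: [[0, 0], [0, 0]] for m in range(NX)}
for t in range(T - 1):
    counts[X_obs[t]][y_obs[t]][y_obs[t + 1]] += 1

# Normalize rows: P_CMC[m][k][l] estimates P(y(t+1)=l | y(t)=k, X(t)=m).
P_CMC = {m: [[c / max(sum(row), 1) for c in row] for row in cm]
         for m, cm in counts.items()}
```

The estimated switching probability grows with the conditioning state, recovering the dependence built into the synthetic data.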

-->Click here to get the example 3 Matlab file.<--

**Example 4: Markov chains conditioned on a large-scale variable on two time instances**

Let us now additionally condition on X(t+1). In the previous example, the
transition probabilities were conditioned only on X(t). The transition
probability of the conditional Markov chain to switch from state k to
state l now becomes P(Y_CMC(t+1)=l | Y_CMC(t)=k & X(t)=m & X(t+1)=n).
For each combination of X(t) and X(t+1) there is a separate transition matrix
for Y_CMC.
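In code, the counting is the same as before, but keyed on the pair (m, n); a Python sketch with placeholder random data (all names and sizes invented here):

```python
import random
from collections import defaultdict

rng = random.Random(2)
T, NY, NX = 20000, 2, 2
X_obs = [rng.randrange(NX) for _ in range(T)]    # large-scale states
y_obs = [rng.randrange(NY) for _ in range(T)]    # placeholder small-scale states

# One NY x NY transition-count matrix per pair (m, n) = (X(t), X(t+1)).
counts = defaultdict(lambda: [[0] * NY for _ in range(NY)])
for t in range(T - 1):
    counts[(X_obs[t], X_obs[t + 1])][y_obs[t]][y_obs[t + 1]] += 1

# P_CMC[(m, n)][k][l] estimates
# P(Y_CMC(t+1)=l | Y_CMC(t)=k & X(t)=m & X(t+1)=n).
P_CMC = {mn: [[c / max(sum(row), 1) for c in row] for row in cm]
         for mn, cm in counts.items()}
```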

-->Click here to get the example 4 Matlab file.<--

**Example 5: Clustering of observations**

To use a finite-state Markov chain, the observational sequence needs to
be classified into a finite number of states, and the same holds for the
variable X. We use k-means clustering.
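The linked Matlab file uses Matlab's built-in k-means; as a self-contained illustration of the idea, here is a minimal Lloyd's-algorithm k-means for a 1-D sequence in Python (synthetic data, invented names):

```python
import random

def kmeans_1d(xs, k, iters=50, seed=0):
    """Classify a 1-D sequence into k states with Lloyd's algorithm."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)          # initial centers: k data points
    for _ in range(iters):
        # Assign every value to its nearest center...
        labels = [min(range(k), key=lambda j: abs(x - centers[j]))
                  for x in xs]
        # ...and move each center to the mean of its members.
        for j in range(k):
            members = [x for x, l in zip(xs, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers

# Two well-separated groups of values -> two recovered states.
rng = random.Random(3)
xs = ([rng.gauss(0.0, 0.5) for _ in range(200)]
      + [rng.gauss(5.0, 0.5) for _ in range(200)])
labels, centers = kmeans_1d(xs, 2)
```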

-->Click here to get the example 5 Matlab file.<--

**Example 6: Simultaneous clustering of 2 observations**

Same as the previous example, but now with two variables to condition on, X and Z. We use k-means for 2D clustering of (X, Z).

-->Click here to get the example 6 Matlab file.<--

**Example 7: Extra simple example of 2D clustering**

Using k-means for 2D clustering of two normally distributed variables.
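A minimal Python sketch of the same idea (again Lloyd's algorithm, now on 2-D points; the two Gaussian clouds and all names are invented for the sketch):

```python
import random

def kmeans_2d(pts, k, iters=50, seed=0):
    """Cluster 2-D points into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centers = rng.sample(pts, k)
    for _ in range(iters):
        # Assign each point to the center with smallest squared distance.
        labels = [min(range(k),
                      key=lambda j: (p[0] - centers[j][0]) ** 2
                                    + (p[1] - centers[j][1]) ** 2)
                  for p in pts]
        # Recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(pts, labels) if l == j]
            if members:
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return labels, centers

rng = random.Random(4)
# Two normally distributed 2-D clouds, around (0, 0) and (4, 4).
pts = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(150)]
       + [(rng.gauss(4, 0.5), rng.gauss(4, 0.5)) for _ in range(150)])
labels, centers = kmeans_2d(pts, 2)
```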

-->Click here to get the example 7 Matlab file.<--

**Example 8: Cross-correlation analysis**

Using cross-correlation analysis to determine which variable is best to condition on.
In the example figure, X_ori displays a much stronger correlation than Z_ori; therefore, X_ori
should be chosen as the conditioning variable rather than Z_ori.
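The lag-0 part of such an analysis can be sketched in Python as follows (the data are synthetic: X_ori is constructed to be strongly related to y and Z_ori is independent noise, so this is an illustration, not the example's data):

```python
import math
import random

def corr(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb)

rng = random.Random(5)
y = [rng.gauss(0, 1) for _ in range(1000)]        # small-scale variable
X_ori = [v + rng.gauss(0, 0.3) for v in y]        # strongly related to y
Z_ori = [rng.gauss(0, 1) for _ in range(1000)]    # unrelated noise

# Pick the candidate with the stronger (absolute) correlation with y.
best = max([("X_ori", corr(y, X_ori)), ("Z_ori", corr(y, Z_ori))],
           key=lambda t: abs(t[1]))
```

For a full cross-correlation analysis one would also evaluate the correlation at nonzero time lags.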

-->Click here to get the example 8 Matlab file.<--