Thursday, August 18, 2016

GSoC 2016 #Final

Prediction and forecasting

The last challenge I faced during GSoC was implementing Kim prediction and forecasting. At first it looked quite difficult, because I had to deal with both mathematical and architectural issues. Here's a short overview of the subject.

Prediction maths

Kalman filter in Statsmodels has three modes of prediction:
  • Static prediction. This is prediction of the next observation based on the current one. This type of prediction is equivalent to the usual Kalman filtering routine.
  • Dynamic prediction. I still don't quite get its purpose.
  • Forecasting. Mathematically speaking, the forecast of an out-of-sample observation is its expectation conditional on the known observations.
My goal was to implement two of these types - static prediction and forecasting - in the case of switching regimes, i.e. to construct a Kim prediction and forecasting device.
I didn't have any problems with static prediction, because it's simply equivalent to Kim filtering. But forecasting is not covered in [1], so I had to come up with a mathematical routine myself to calculate the expectation and covariance of future data, conditional on the known observations. Basically it's just the Kim filter, but without the Hamilton filtering step and with the underlying Kalman filters working in prediction mode.
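To make that concrete, here is a minimal sketch of one out-of-sample forecast step, assuming collapsed regime-conditional moments as inputs. The names and array shapes are my own choices for illustration, not those of the actual _predict implementation:

```python
import numpy as np

def kim_forecast_step(probs, P, a, V, F, Q):
    """One out-of-sample Kim forecast step - a minimal sketch of the idea,
    not the code from kim_filter.KimPredictionResults._predict.
    probs: (k,) Pr[S_T = i | observations]; P: (k, k) regime transition
    matrix; a, V: (k, m), (k, m, m) regime-conditional state moments;
    F, Q: (k, m, m) per-regime transition matrices and noise covariances."""
    # Hamilton *prediction* of regime probabilities; there is no Hamilton
    # update step, because no new observation arrives out of sample.
    joint = probs[:, None] * P              # Pr[S_T = i, S_{T+1} = j]
    probs_next = joint.sum(axis=0)          # Pr[S_{T+1} = j]

    # Kalman filters working in prediction mode, for every pair (i, j).
    a_pred = np.einsum('jab,ib->ija', F, a)                      # (k, k, m)
    V_pred = np.einsum('jab,ibc,jdc->ijad', F, V, F) + Q[None]   # (k, k, m, m)

    # Approximation (collapsing) step, as in the Kim filter: average the
    # pair-conditional moments over the previous regime i.
    w = joint / np.maximum(probs_next, 1e-300)
    a_next = np.einsum('ij,ijm->jm', w, a_pred)
    d = a_pred - a_next[None]
    V_next = np.einsum('ij,ijab->jab',
                       w, V_pred + np.einsum('ija,ijb->ijab', d, d))
    return probs_next, a_next, V_next
```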

Prediction code

Laziness pushed me toward writing less code and reusing what already exists. The idea was to somehow reuse the Kalman filter prediction, but once I came up with the correct forecast routine, I understood that this wasn't possible. So I had to implement the whole routine myself; it is located in the kim_filter.KimPredictionResults._predict method. Luckily, the prediction code is much more compact than the filter's. I also didn't have to care as much about optimality, since prediction doesn't take part in likelihood optimization.

Some nice pics

Since no test data is available for forecasting, I used IPython notebooks as a sort of visual testing.
I added pictures of static (one-step-ahead) prediction and forecasting to the Lam's model and MS-AR model notebooks; they look sensible, and my mentor liked them:
(this is for Lam's model)
(and that's for MS-AR)
The forecast variance is constant because it reaches its stationary value quickly.

GSoC: Summary

I'm proud to say that I've completed almost all items of my proposal, except for constructing a generic Kim filter with an arbitrary number r of previous states to track. This is due to performance problems whose solution would require more time than GSoC permits: in particular, the implemented pure-Python r=2 case works slowly and needs to be rewritten in Cython.
Still, a nice advantage of my Kim filter implementation, as my mentor noted, is that it uses logarithms to work with probabilities. This gives a significant improvement in precision, as I conclude from testing.
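A toy illustration of the underflow problem that log-space probabilities avoid (a demonstration with NumPy/SciPy, not code from the PR):

```python
import numpy as np
from scipy.special import logsumexp

# The Hamilton filter multiplies many small densities; in linear space the
# per-regime densities below are already exp(-900) == 0.0 in float64.
log_prev = np.log(np.array([0.7, 0.3]))       # log Pr[S_{t-1} = i]
log_P = np.log(np.array([[0.95, 0.05],
                         [0.10, 0.90]]))      # log transition probabilities
log_lik = np.array([-900.0, -905.0])          # per-regime log-density of y_t

log_joint = log_prev[:, None] + log_P + log_lik[None, :]
log_filtered = logsumexp(log_joint, axis=0)   # marginalize the previous regime
log_filtered -= logsumexp(log_filtered)       # normalize
print(np.exp(log_filtered))                   # ~[0.997 0.003], no NaNs
```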
A broad report on what's completed and what's not can be found here in the GitHub comments.

GSoC: My impressions

GSoC has surely increased my level of self-confidence. I've done a lot of good work, written 10k lines of code (much more than I expected, to be honest), and met many nice people and students.
I have to admit that GSoC turned out to be easier than I thought. The most difficult and nerve-racking part of GSoC was building a good proposal. I remember learning a lot of new material on a very tight schedule - I even had to read books about state space models during vacation, from my smartphone, sitting on a plane or a subway train.
I also started working on my project quite early - at the beginning of May. So I had about 60% of my project completed by the midterm, even though I wasn't working full time yet, because my classes and exams only finished at the end of June.
I worked hard during July, spending days in the local library, but still, I don't think I ever worked eight hours a day. By the beginning of August I had completed almost everything; the only thing left was prediction and forecasting, discussed earlier in this post.
I had dreamed about taking part in GSoC since I was a sophomore, and I'm glad I finally did it. The GSoC code I produced is definitely the best work I've ever done, but I hope to do more in the future.
Thanks for reading my blog! It was created for GSoC needs only, but I think I will continue writing whenever I have something interesting to tell.

Literature

[1] "State-space Models With Regime Switching" by Chang-Jin Kim and Charles R. Nelson.

Saturday, August 6, 2016

GSoC 2016 #5

Dynamic Factor Model

There is one more item in my proposal which I haven't yet mentioned in my reports, although I worked on it before the refactoring phase and the TVP model implementation. This is the Markov switching dynamic factor model; we will use the following specification:
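The specification was shown as an image in the original post; here is a plausible reconstruction in LaTeX, following the notation of [1] (the symbol choices H, phi, psi are mine):

```latex
\begin{align}
y_t &= H f_t + e_t \tag{1} \\
f_t &= \mu_{S_t} + \phi_1 f_{t-1} + \dots + \phi_p f_{t-p} + v_t \tag{2} \\
e_t &= \psi_1 e_{t-1} + \dots + \psi_q e_{t-q} + \varepsilon_t \tag{3}
\end{align}
```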
y, as usual, is an observation process, and (1) is the observation equation. f is a factor, changing according to (2), the factor transition equation, which is a VAR model with a Markov switching intercept. The observation error follows a VAR model too, as (3) states.
Statsmodels already has a non-switching DFM implementation in the statespace.dynamic_factor.py file with almost the same specification, but without the Markov switching intercept term in the factor transition equation. So the first challenge was to extend the DynamicFactor class and add an analogous, but non-switching, intercept. This parameter is required because it is used to initialize the switching intercept during maximum likelihood estimation. The regime_switching/switching_dynamic_factor.py and regime_switching/tests/test_switching_dynamic_factor.py files contain my experiments with MS-DFM, which were unsuccessful for the reasons discussed in the next section.

Insurmountable obstacles

The non-switching DynamicFactor class is a big piece of code by itself, and since a lot of its functionality is shared with the switching model, the natural solution is to extend it with a SwitchingDynamicFactor class. The problem is that DynamicFactor wasn't designed to be extended, and it took some tricky work before I realized that this is a bad idea. For example, I had to substitute DynamicFactor's KalmanSmoother instance with an ugly descendant of KimSmoother, with some interface changes to achieve compatibility with the non-switching model. After a series of similarly convoluted manipulations, I concluded that it's impossible to construct a SwitchingDynamicFactor class without changes in the parent class - although, in my estimation, not many changes would be needed.
Another problem concerns testing data. I use this Gauss code sample from the Kim and Nelson book; it is the only code I know of to test MS-DFM against. The disappointment is that this testing model is incompatible with the one presented above: its observation equation contains lagged factor terms, while ours uses only the current factor. I tried some tricks, the main one being to group the lagged factors into one vector factor. After several errors and considerations, I figured out that this is a bad idea, because the transition noise covariance matrix becomes singular. The only solution I see now is to extend the DFM and MS-DFM models so that they can handle lagged factors in the observation equation, but this is a time-consuming challenge.
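To make the covariance problem concrete, here is my reconstruction of the issue with a scalar AR(2) factor as an example: stacking the factor and its lag into one vector state gives the companion form

```latex
\begin{pmatrix} f_t \\ f_{t-1} \end{pmatrix}
= \begin{pmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{pmatrix}
  \begin{pmatrix} f_{t-1} \\ f_{t-2} \end{pmatrix}
+ \begin{pmatrix} v_t \\ 0 \end{pmatrix},
\qquad
Q = \begin{pmatrix} \sigma_v^2 & 0 \\ 0 & 0 \end{pmatrix},
```

where the transition noise covariance Q is singular, because the second state component carries no innovation of its own.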

What's next?

What I'm working on right now is adding generic forecasting to the Kim filter, which is the last important feature to be added. I spent a couple of days just thinking about how to implement it, but now I'm finally writing the code. Forecasting should be a very visual thing, so I will add it to the existing notebooks, which also serves as a kind of testing.

Literature

[1] "State-space Models With Regime Switching" by Chang-Jin Kim and Charles R. Nelson.

Monday, July 25, 2016

GSoC 2016 #4

Time Varying Parameters

Let's consider the following process:
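The equations appeared as an image in the original post; a plausible reconstruction, assuming the standard random-walk TVP specification of [1]:

```latex
\begin{align}
y_t &= x_t' \beta_t + e_t \tag{1} \\
\beta_t &= \beta_{t-1} + v_t \tag{2} \\
e_t &\sim N(0, \sigma^2), \qquad v_t \sim N(0, Q)
\end{align}
```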
Here y_t is an observed process, x_t is an exogenous vector, and the betas are so-called time-varying parameters, changing over time as equation (2) states. e and v are white noise terms.
The presented model is known as the Time-Varying-Parameter (TVP) model, and it was part of my proposal. As you can see, it is non-switching, but it is used to find good starting parameters for the switching model's likelihood optimization.
The TVP and MS-TVP models turned out to be the easiest and most pleasant items of my proposal. Thanks to their simplicity, I didn't have any difficulties implementing and debugging them, and during MLE their parameters converged nicely to the expected values.

TVP: Implementation and testing

The TVP model was implemented in the upper-level statespace module (the tvp.py file) rather than in regime_switching. The implementation is a concise extension of the MLEModel class. I used Kim and Nelson's (1989) model of changing conditional variance, or uncertainty, in U.S. monetary growth ([1], chapter 3.4) as a functional test and for an IPython notebook demonstration.
A special thing about TVP is that its MLE results class (TVPResults) has a plot_coefficient method, which draws a nice plot of the time-varying parameters:
 
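For the record, a hypothetical usage sketch; the model class name, its constructor arguments, and the exact plot_coefficient signature are my assumptions, not the documented API:

```python
# Hypothetical sketch: names here are stand-ins for the tvp.py API.
model = TVPModel(endog, exog)    # observed series and regressors
results = model.fit()            # maximum likelihood estimation
results.plot_coefficient(0)      # draw the path of the first beta over time
```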

Heteroskedastic disturbances

Adding heteroskedastic disturbances to observation equation (1) makes the model regime-switching:
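A plausible reconstruction of the missing equation:

```latex
e_t \sim N\bigl(0, \sigma_{S_t}^2\bigr)
```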
where St is a Markov regime process.

MS-TVP: Implementation and testing

The TVP model with heteroskedastic disturbances is implemented in the switching_tvp.py file of the regime_switching module. It is as concise and elegant as its non-switching analog. I'm going to implement coefficient plotting soon.
I used Kim's (1993) time-varying-parameter model with heteroskedastic disturbances for U.S. monetary growth uncertainty to perform functional testing. One nice thing about MS-TVP is that it finds a near-correct likelihood maximum from a non-switching start: as you can see in the tests.test_switching_tvp.TestKim1993_MLEFitNonswitchingFirst class, I use a 0.05% relative tolerance.

What's next?

The remaining part of the summer will be spent improving and polishing the existing models. Right now I am working on adding heteroskedastic disturbances to transition equation (2). As noted above, I also have to add coefficient plotting for the switching model. Other goals are an MS-TVP notebook demonstration and an overall improvement of the MS-AR model.

Literature

[1] "State-space Models With Regime Switching" by Chang-Jin Kim and Charles R. Nelson.

Sunday, July 10, 2016

GSoC 2016 #3

Improving the code

During May and June I worked hard, producing thousands of lines of code implementing Markov switching state space logic and tests, and making sure everything works correctly. By the midterm evaluation I had already implemented the Kim filter, the switching MLE model, and Markov switching autoregression, all generally working and passing basic tests.
So this was a nice moment to take a break and look closer at the existing code. Since a primary aspect of the project is its usability and maintainability after the summer, detailed documentation, comments covering the hard mathematical calculations, and architectural enhancements are even more important than producing yet another model.
Here are the items completed so far on the way to perfect code.

Refactoring

Several architectural improvements were made to decompose functionality into logical modules and to match Statsmodels state space idioms. The initial architecture of the regime_switching module wasn't anything sophisticated, just something that worked for a start:

As you can see, the KimFilter class aggregated the entire regime switching state space functionality in one bubble of code - an obvious candidate for splitting into parts.
Another inconvenient thing about KimFilter was its complex state management: to perform filtering, you first had to bind data to the filter, optionally select a way of initializing the regime probabilities and the unobserved state, then call the filter method, and only after that were filtered_regime_probs and other attributes filled with useful data. This is inconvenient, because you have to keep track of the current state's relevance yourself.
This is how regime_switching looks after the completed refactoring iteration:



Responsibilities of different kinds are now divided among an increased number of entities:
  • SwitchingRepresentation handles the switching state space model: it aggregates a KalmanFilter instance for every regime and stores the regime transition probability matrix. FrozenSwitchingRepresentation is an immutable snapshot of the representation.
  • The KimFilter class is related to filtering, but it neither performs the actual filtering nor stores any filtered data - it only controls the process. The former is handled by the private _KimFilter class, the latter by KimFilterResults, which is returned from the KimFilter.filter method.
  • Smoothing is organized in a mirrored way, as you can see from the diagram: the KimSmoother, KimSmootherResults and _KimSmoother classes.
The MLE model wasn't touched by any major changes, except that the private ssm attribute is now a KimSmoother instance rather than a KimFilter.
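A rough sketch of the resulting control flow - the class names are from the diagram above, while the exact call signatures are my assumptions:

```python
# The filter object only controls the process; the computations live in the
# private _KimFilter, and all outputs in an immutable results object.
kim_filter = KimFilter(switching_model)     # a SwitchingRepresentation
results = kim_filter.filter()               # returns KimFilterResults
probs = results.filtered_regime_probs       # no mutable filter state to track
```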

Docstrings

An iteration of documenting was also done, touching all the main entities and the testing code.
This process also had some educational value for me personally, because I often find it hard to express my thoughts and ideas to other people (e.g. my classmates) when it comes to very abstract things like coding or math. So this was nice practice. Moreover, documenting helped me make the code clearer and more concise, and sometimes it even helped me find bugs.

Comments

When it comes to an optimal implementation of mathematical algorithms with a lot of matrix manipulations, code becomes quite unreadable. This is where inline comments help a lot. I tried to comment almost every logical block inside every method; the densest comments are in the _KimFilter and _KimSmoother classes, which do all the hard computational work.

What's next?

I will continue to enhance the written code. There is some interface functionality to be added and covered by smoke tests. Only after that will I switch back to model implementation (MS-DFM and MS-TVP).


Thursday, June 23, 2016

GSoC 2016 #2

Markov switching autoregression

If you studied statistics and remember the basics of time series analysis, you should be familiar with the autoregressive model, usually denoted AR(p):
Here y is an AR process, e is a white noise term, and nu is the mean of the process. Phi is a lag polynomial of order p:
L is the lag operator, which, applied to a time series element, gives the previous element. So (1) can actually be rewritten in the following explicit form:
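The three formulas above appeared as images in the original post; a plausible reconstruction:

```latex
\begin{align}
\phi(L)\,(y_t - \nu) &= e_t \tag{1} \\
\phi(L) &= 1 - \phi_1 L - \phi_2 L^2 - \dots - \phi_p L^p \tag{2} \\
y_t &= \nu + \phi_1 (y_{t-1} - \nu) + \dots + \phi_p (y_{t-p} - \nu) + e_t \tag{3}
\end{align}
```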
Since the process definition (1) is essentially a linear equation relating the process's lagged values and the error, it can be put in state space form, as shown in [1], chapter 3.3.
Again, let's extend equation (1) by adding an underlying discrete Markov process S_t of changing regimes:
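Again, a plausible reconstruction of the equation image:

```latex
y_t - \mu_{S_t} = \phi_{1,S_t} (y_{t-1} - \mu_{S_{t-1}}) + \dots
                + \phi_{p,S_t} (y_{t-p} - \mu_{S_{t-p}}) + e_t,
\qquad e_t \sim N\bigl(0, \sigma_{S_t}^2\bigr) \tag{4}
```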
You can see that the mean, the error variance, and the lag polynomial now depend on the switching regime value. This is the so-called Markov switching autoregressive model (MS AR). Where can it be used in practice? Let's look at the example from [2], chapter 4.4, which I also used for testing my code:
This is a series of U.S. real GDP. Looking at the data, two regimes are noticeable: expansion and recession. Using maximum likelihood estimation, we can fit a two-regime switching mean AR model to this data to describe the law of real GDP changes quantitatively. The authors use an AR(4) model, and so do we. The next picture displays the (smoothed, that is, conditional on the whole dataset) probabilities of being in the recession regime:

Peaks of probability accurately correspond to recession periods, which shows that Markov switching AR provides a sophisticated tool for analyzing the underlying structure of a time process.

Implementation

Markov switching autoregression is implemented in the ms_ar.py file of my PR to Statsmodels. This file contains the MarkovAutoregression class, which extends RegimeSwitchingMLEModel and "translates" equation (4) into state space "language".
It was quite entertaining to express the ideas explained in chapter 3.3 of [1] in Python code. One thing I had to be very careful about is that for an AR(p) model with k regimes, the state space representation has to carry k^(p+1) regimes, since means with different regime indices occur together in (4). Thus, every state space regime represents p+1 lagged AR regimes.
Such a large number of regimes leads to long computation times, which caused some problems. For example, Kim filtering of the real GDP model above took 25 seconds, which is unacceptable when we are doing a lot of BFGS iterations to find the likelihood maximum. Luckily I found a way to optimize the Kim filter, and it was quite straightforward, in fact. If you remember the previous blog post, a Kim filter iteration consists of a heavyweight Kalman filter step, where a Kalman filtering iteration is applied many (k^(2(p+1)) for MS AR!) times, followed by summing the results with weights equal to the joint probabilities of being in the current and previous regimes. The thing is that with a sparse regime transition matrix, which is exactly the MS AR case, these joint probabilities are rarely non-zero, and we don't need to run Kalman filtering for the zero ones! This shortcut decreased the Kim filter evaluation time dramatically, to 2-3 seconds on my machine (which is not very powerful, by the way).
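A sketch of the shortcut (illustrative, not the PR code):

```python
import numpy as np

def kim_filter_step_sparse(prev_probs, P, kalman_step):
    """prev_probs: (k,) Pr[S_{t-1} = i]; P: (k, k) regime transition matrix;
    kalman_step(i, j): a callback running one Kalman iteration for pair (i, j)."""
    joint = prev_probs[:, None] * P     # Pr[S_{t-1} = i, S_t = j]
    for i, j in zip(*np.nonzero(joint)):
        kalman_step(i, j)               # run only the reachable pairs
    return joint

# For MS-AR the transition matrix is very sparse: a state space regime encodes
# p+1 lagged AR regimes, and a transition is possible only when the lagged part
# of the new regime equals the old regime shifted by one step.
```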

EM-algorithm

The MarkovAutoregression class also features an EM-algorithm. The Markov switching autoregressive model defined by (4) is, with some approximations, a regression with switching parameters and lagged observations as regressors. Such a model, as shown in chapter 4.3.5 of [2], has a simple closed-form solution for the EM iteration. The EM-algorithm is a great device for achieving very fast convergence. For example, in the comments to my PR I copied a debug output with the following numbers:
#0 Loglike - -1941.85536159
#1 Loglike - -177.181731435
Here #0 indicates the likelihood at random starting parameters, and #1 the likelihood after one iteration of the EM-algorithm. A very significant improvement, isn't it?
MarkovAutoregression has two public methods to run the EM-algorithm: fit_em and fit_em_with_random_starts. The first just performs a number of EM iterations from given starting parameters, while the second generates a set of random starting parameters, applies the EM-algorithm to each of them, and finally chooses the result with the best likelihood.

Testing

Right now there are two test files for the MarkovAutoregression class, each based on one model: test_ms_ar_hamilton1989.py and test_ms_ar_garcia_perron1996.py. Besides formal functional tests, checking that filtering, smoothing and maximum likelihood estimation give correct values against this and this Gauss code samples, these files test the EM-algorithm in its typical usage scenario - when the user knows nothing about the correct parameters but wants to estimate something close to the global likelihood maximum. This task is handled by the already mentioned fit_em_with_random_starts method, which by default runs 50 sessions of the EM-algorithm from random starts, each session consisting of 10 iterations.

What's next?

I hope the hardest part of the project - developing the Kim filter and Markov switching autoregression - is behind me. Two more models remain: the dynamic factor and the time-varying parameters model with regime switching. There will also be a lot of refactoring of the already written code, so some articles are going to be all about coding.

Literature

[1] "Time Series Analysis by State Space Methods", Second Edition, by J. Durbin and S.J. Koopman.
[2] "State-space Models With Regime Switching" by Chang-Jin Kim and Charles R. Nelson.

Sunday, June 5, 2016

GSoC 2016 #1

Kim filter


(This is a picture I get when searching for "Kim filter" on Google)

If you look at the code in my pull request, you can see the KimFilter class (in the kim_filter.py file), which is the main part of my project. So what exactly does it do?
There is a thing called a linear state space representation:
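The representation was an image in the original post; a plausible reconstruction using the symbols described below:

```latex
\begin{align}
y_t &= H_t \beta_t + A_t z_t + e_t \\
\beta_t &= \nu + F_t \beta_{t-1} + G_t v_t
\end{align}
```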

It describes some observed vector process y_t in time. H_t, A_t, F_t, G_t are matrices, z_t is a vector of exogenous data, and beta_t is an unobserved process vector which generates y_t. e_t and v_t are white noise vectors, and nu is the intercept term.
This representation is usually used when we have some real-life process y_t and want to understand its internal structure, to find a law describing it.
The exact values of the matrices in the model definition depend on an unknown vector of parameters in a model-specific way. In the general case it's just a vector of all the matrices' elements written in a row (though I've never witnessed anything like that in practice). To estimate the unknown parameters, we use the principle of maximum likelihood, widely used in statistics and machine learning, which says that the best parameter estimates are those under which the observed data is most probable. To apply maximum likelihood estimation (MLE), we need to calculate the probability density of the observed process values. This is the issue the well-known Kalman filter deals with, calculating filtered values of the unobserved process in addition. We then apply numerical maximization to the Kalman filter output as a function of the parameters and get the solution.
Let's modify the linear state space representation a little:
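Again, a plausible reconstruction of the modified representation:

```latex
\begin{align}
y_t &= H_{S_t} \beta_t + A_{S_t} z_t + e_t \\
\beta_t &= \nu_{S_t} + F_{S_t} \beta_{t-1} + G_{S_t} v_t
\end{align}
```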

You can see that instead of time indexes on the representation matrices we now have S_t, a discrete Markov process with P states and a P x P transition matrix, which represents the changing regime of the process. When the process S_t is known a priori, we have a usual linear model. But when it is unknown and random, the model becomes very complex. To estimate it, we use the Kim filter, described in [1], which I have recently implemented in the aforementioned PR.
It iterates over the process, running through three steps during every iteration (a sketch in code follows the list):
  1. Kalman filter step (the _kalman_filter_step method of the KimFilter class). During this phase we apply a Kalman filter iteration for every pair of current and previous regime values, and thus get a battery of P x P filtered states and their covariances, conditional on every combination of previous and current regime. If you look at the code, you can see that I delegate these calculations to P KalmanFilter instances, which is very handy.
  2. Hamilton filter step (the _hamilton_filter_step method). During this phase we calculate the likelihood of the current observation, conditional on the previous observations, and accumulate it into the overall likelihood. We also calculate some helpful weighting probabilities.
  3. Approximation step (the _approximation_step method). Here we average the filtered states and covariances obtained during the first step, with the weights obtained during the second step, to get filtered states conditional only on the current regime value.
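Here is a didactic sketch of one full iteration with all three steps in plain NumPy; the shapes and the scalar observation equation are simplifying assumptions of mine, not the PR's interface:

```python
import numpy as np

def kim_filter_iteration(y, probs, P, a, V, F, Q, H, R):
    """probs: (k,) Pr[S_{t-1}=i | past]; P: (k, k) transition matrix;
    a, V: (k, m), (k, m, m) regime-conditional state moments;
    F, Q: (k, m, m); H: (k, 1, m), R: (k,) - scalar observation for brevity."""
    k, m = a.shape
    a_ij = np.zeros((k, k, m))
    V_ij = np.zeros((k, k, m, m))
    joint = np.zeros((k, k))
    # 1. Kalman filter step: one prediction + update per regime pair (i, j).
    for i in range(k):
        for j in range(k):
            ap = F[j] @ a[i]
            Vp = F[j] @ V[i] @ F[j].T + Q[j]
            f = float(H[j] @ Vp @ H[j].T) + R[j]   # innovation variance
            e = y - float(H[j] @ ap)               # innovation
            K = (Vp @ H[j].T) / f                  # Kalman gain, (m, 1)
            a_ij[i, j] = ap + (K * e).ravel()
            V_ij[i, j] = Vp - K @ H[j] @ Vp
            dens = np.exp(-0.5 * e * e / f) / np.sqrt(2 * np.pi * f)
            # 2. Hamilton filter step: joint probability of the regime pair.
            joint[i, j] = probs[i] * P[i, j] * dens
    lik = joint.sum()                       # density of y_t given the past
    probs_new = joint.sum(axis=0) / lik     # Pr[S_t = j | data up to t]
    # 3. Approximation step: collapse k*k moments back to k, weighting each
    #    branch by Pr[S_{t-1} = i | S_t = j, data].
    w = joint / np.maximum(joint.sum(axis=0), 1e-300)
    a_new = np.einsum('ij,ijm->jm', w, a_ij)
    d = a_ij - a_new[None]
    V_new = np.einsum('ij,ijab->jab', w,
                      V_ij + np.einsum('ija,ijb->ijab', d, d))
    return probs_new, a_new, V_new, np.log(lik)
```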
For a detailed mathematical description you can refer to [1] or look through my code.

Maximum likelihood estimation

We can now estimate the likelihood of the parameters. The next goal is to optimize the parameters with respect to the likelihood. This responsibility is handled by the RegimeSwitchingMLEModel class, located in the rs_mlemodel.py file and derived from the MLEModel class. It was very handy to extend MLEModel and substitute the Kalman filter with a Kim filter instance in the corresponding property. These filters have a very similar interface, and MLEModel, being mostly about optimization, delegates all model-specific actions to the filter instance.
All the noticeable changes concerned parameter transformations, since we have another group of parameters, representing the transition matrix. RegimeSwitchingMLEModel has also got a feature discussed in the next section.

Starting parameters


Just passing the likelihood function to the optimizer doesn't guarantee convergence to the global maximum. The optimizer can get stuck in local maxima, so the goal is to start from somewhere very close to the best likelihood value. My proposal suggested two approaches to this problem (a usage sketch follows the list):

  1. Just pretend there is no problem and let the user provide the starting parameters, hoping they enter something very close to the global maximum. This scenario is also implemented, because it is used for testing against Kim's code samples, where the author passes a priori known starting values to the optimizer.
  2. Fit a one-regime analog of the switching model on the provided data first (to use this feature, set fit_nonswitching_first=True in the fit method of your RegimeSwitchingMLEModel extension). Then initialize a switching model where the transition matrix is set to identity (as if there were no switching) and all regimes are equal to the one obtained by the non-switching fit. This approach seems intuitive, because we are choosing the model and complicating it gradually, according to the likelihood improvement.
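A hypothetical usage sketch; MyModel stands in for any RegimeSwitchingMLEModel subclass, and only the flag itself comes from the text above:

```python
# Hypothetical: MyModel and its constructor arguments are illustrative.
model = MyModel(endog, k_regimes=2)               # endog: observed series
results = model.fit(fit_nonswitching_first=True)  # non-switching fit first
print(results.params)                             # estimated parameters
```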

Tests

To test the model I used this Gauss code from Kim's site, implementing Lam's generalized Hamilton model. There is a basic test of the Kim filter in test_kim.py, which checks the likelihood and filtered states obtained from my code against the Gauss code output. I am lucky that it passes! There is also the file test_rs_mlemodel.py, which tests MLE against the same code. It passes as well! Almost the same, but commented and more readable, code can be seen in the notebook presenting the usage of the regime_switching module.
Looking at the output of the notebook, you can see that the fit-non-switching-first start works quite well: the likelihood doesn't differ much from the best one, and neither do the obtained parameters.

What's next?

I hope to start implementing models soon, to make my module more ready to use. The first model in the row will be the Markov switching autoregressive model, and it will probably be the hardest one. There is going to be some fun, like implementing the EM-algorithm for starting parameter estimation - I can't wait to start! Feel free to leave comments if you don't understand something or have any thoughts about my project. Thanks!

Literature

[1] "State-space Models With Regime Switching" by Chang-Jin Kim and Charles R. Nelson.

Friday, May 13, 2016

GSoC 2016 #0

About me


Welcome to my blog! Feel free to leave comments or anything. I am a student from Russia, currently in my 3rd year at Ural Federal University, studying Computer Science. I am also studying at the Yandex School of Data Analysis, a free Master's-level program run by Yandex, the Russian search engine and the biggest IT company in the country.
My interests lie in the areas of statistics and machine learning, and a little in computer vision and natural language processing. I really love mathematics, and just as much I love its applications expressed in code.
My hobbies are listening to hip-hop music, working out in the gym, visiting art galleries, and reading novels by Dostoyevsky.
This year I am happy to take part in Google Summer of Code with Statsmodels under the Python Software Foundation. It's a great opportunity, and I hope I will meet all the deadlines successfully :)

About my GSoC project

The goal is to implement a Python module for inference in linear state space models with regime switching, i.e. with underlying parameters changing in time according to a Markovian law. A device for that, called the Kim filter, is described in the book "State-space Models With Regime Switching" by Chang-Jin Kim and Charles R. Nelson.
The Kim filter includes the Kalman filter as a phase, which makes my work much easier and motivates a pure-Python approach, because I can delegate all the heavy lifting to the Statsmodels Cython module performing the Kalman filter routine.
The next step of my project is to implement well-tested econometric models with regime switching, including Markov switching autoregression, a dynamic factor model with regime switching, and a time-varying parameter model with Markov-switching heteroskedasticity. You can find details and exact model specifications in my proposal.
For testing, I am going to use the Gauss code examples published on Professor Chang-Jin Kim's website.

Setting up a development environment

To set up the environment, I followed the advice of my mentor Chad Fulton, who helps me a lot with technical, coding, and field-specific issues. This will probably be helpful to anyone who wants to contribute to Statsmodels or any other Python library.
I am using Mac OS, so I performed the following steps:
  1. Deleted all versions of Statsmodels in my site-packages directory.
  2. Cloned the Statsmodels master repository to ~/projects/statsmodels (or anywhere else in your case).
  3. Added ~/projects/statsmodels to my PYTHONPATH environment variable (included the line export PYTHONPATH=~/projects/statsmodels:$PYTHONPATH at the end of ~/.bash_profile).
Now any changes made to Python files are picked up when I restart the Python instance or use the reload(module) command in the Python shell.
If I pull any changes to Cython files, I recompile them with python setup.py build_ext -i in the statsmodels folder.

Running Gauss code

The code examples provided by the authors of "State-space Models With Regime Switching" require the Gauss language interpreter, which is not free software. However, there is the open-source Ox console, which can run Gauss code.
OxGauss doesn't support some Gauss functions by default, though, and you have to load analogous Ox language wrappers. In my case that was the function optmum, widely used in the Kim-Nelson code samples; the Ox Console developers provide the M@ximize package for it. Another problem I spent some time figuring out is that M@ximize is incompatible with Ox Console 7, so I had to use the 6th version, which works just fine.


What's next?

I will post regular reports about my work during the whole summer. I have already implemented part of my project, so the next report is coming soon. If you are interested, you can already take a look at the code. Next time I will talk about the design of the Kim filter and the details of its implementation and testing. I'm sure you'll enjoy it!