Sunday, July 10, 2016

GSoC 2016 #3

Improving the code

During May and June I worked hard, writing thousands of lines of code, implementing the Markov switching state space logic and its tests, and making sure that everything works correctly. By the midterm evaluation I had already implemented the Kim filter, a switching MLE model and Markov switching autoregression, all generally working and passing basic tests.
So this was a good moment to take a break and take a closer look at the existing code. Since the primary concern of the project is its usability and maintainability after the summer, detailed documentation, comments explaining the hard mathematical calculations, and architectural improvements are even more important than producing another model.
Here are the items I have completed so far on the way to better code.

Refactoring

Several architectural improvements were made to decompose functionality into logical modules and to match Statsmodels state space idioms. The initial architecture of the regime_switching module wasn't anything sophisticated, just something that worked for a start:

As you can see, the KimFilter class aggregated all of the regime switching state space functionality into one big blob of code, an obvious candidate for splitting into parts.
Another inconvenient aspect of KimFilter was its stateful design. To perform filtering, you first had to bind data to the filter, optionally choose how to initialize the regime probabilities and the unobserved state, and then call the filter method; only after that were filtered_regime_probs and the other result attributes populated with useful data. This is inconvenient because you have to keep track yourself of whether the current state is still up to date.
This is how regime_switching looks after the completed refactoring iteration:
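To make the problem concrete, here is a minimal Python sketch of that stateful pattern. StatefulFilter is a hypothetical class, not the old KimFilter, and its "filtering" is just a stand-in running mean; the point is only that rebinding data leaves previously computed results silently attached to the object.

```python
import numpy as np


class StatefulFilter:
    """Hypothetical stateful design: results live on the filter object itself.

    Not the real KimFilter API, only an illustration of the pattern.
    """

    def __init__(self):
        self.data = None
        self.filtered = None  # populated only after filter() is called

    def bind(self, data):
        # Binding new data does NOT invalidate results computed earlier.
        self.data = np.asarray(data, dtype=float)

    def filter(self):
        # Stand-in computation (hypothetical): a running mean of the data.
        self.filtered = np.cumsum(self.data) / np.arange(1, len(self.data) + 1)


f = StatefulFilter()
f.bind([1.0, 2.0, 3.0])
f.filter()
f.bind([10.0, 20.0])  # rebind new data...
stale = f.filtered    # ...but f.filtered still describes the old data
```

Nothing in the object tells the caller that `stale` no longer matches `f.data`; keeping the two in sync is entirely the user's job.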



Responsibilities of different kinds are now divided among a larger number of entities:
  • SwitchingRepresentation handles the switching state space model: it aggregates a KalmanFilter instance for every regime and stores the regime transition probability matrix. FrozenSwitchingRepresentation is an immutable snapshot of the representation.
  • The KimFilter class is responsible for filtering, but it neither performs the actual filtering nor stores any filtered data; it only controls the process. The former is handled by the private _KimFilter class, the latter by KimFilterResults, which is returned by the KimFilter.filter method.
  • Smoothing is organized in a mirrored way, as you can see from the diagram, with the KimSmoother, KimSmootherResults and _KimSmoother classes.
The MLE model didn't undergo any major changes, except that its private ssm attribute is now a KimSmoother instance rather than a KimFilter.
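The controller/worker/results split described above can be sketched roughly as follows. All names here (Representation, FrozenRepresentation, Filter, FilterResults, _run) are hypothetical simplifications rather than the real regime_switching API, the worker does a placeholder computation instead of Kim filtering, and having the controller inherit from the representation is an assumption borrowed from statsmodels' own KalmanFilter/Representation classes. What the sketch shows is that each call to filter takes an immutable snapshot and returns a fresh results object, so results cannot go stale.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class FrozenRepresentation:
    """Immutable snapshot (hypothetical stand-in for FrozenSwitchingRepresentation)."""

    transition: np.ndarray  # assumed convention: transition[i, j] = Pr[S_t = j | S_{t-1} = i]

    def __post_init__(self):
        # Copy and lock the matrix so later edits to the live model cannot leak in.
        locked = np.array(self.transition, dtype=float)
        locked.flags.writeable = False
        # Frozen dataclasses forbid normal assignment, hence object.__setattr__.
        object.__setattr__(self, "transition", locked)


@dataclass(frozen=True)
class FilterResults:
    """Filter output bundled with the exact snapshot it was computed from."""

    representation: FrozenRepresentation
    regime_probs: np.ndarray


class Representation:
    """Mutable model description (stand-in for SwitchingRepresentation)."""

    def __init__(self, transition):
        self.transition = np.array(transition, dtype=float)

    def freeze(self):
        return FrozenRepresentation(self.transition)


def _run(rep, nobs):
    """Private worker (plays the role of _KimFilter).

    Placeholder computation: it ignores data and only propagates a uniform
    initial regime distribution through the transition matrix.
    """
    k = rep.transition.shape[0]
    probs = np.empty((nobs, k))
    current = np.full(k, 1.0 / k)
    for t in range(nobs):
        current = current @ rep.transition
        probs[t] = current
    return probs


class Filter(Representation):
    """Controller (plays the role of KimFilter): snapshot, delegate, return."""

    def filter(self, nobs):
        snapshot = self.freeze()
        return FilterResults(snapshot, _run(snapshot, nobs))


# Usage: results stay tied to the model as it was when filter() was called.
model = Filter([[0.9, 0.1], [0.2, 0.8]])
res1 = model.filter(nobs=3)
model.transition[0] = [0.5, 0.5]  # edit the live model afterwards
res2 = model.filter(nobs=3)
```

Because every results object carries the snapshot it was computed from, editing the live model after filtering cannot silently change what an earlier result refers to.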

Docstrings

I also did a round of documentation, covering all the main entities and the testing code.
This process also had educational value for me personally, because I often find it hard to express my thoughts and ideas to other people (e.g. my classmates) when it comes to very abstract things like coding or math. So this was good practice. Moreover, documenting helped me make the code clearer and more concise, and sometimes it even helped me find bugs.

Comments

When it comes to efficient implementations of mathematical algorithms with lots of matrix manipulation, code becomes quite hard to read. This is where inline comments help a lot. I tried to comment almost every logical block inside every method; the densest comments are in the _KimFilter and _KimSmoother classes, which do all the hard computational work.
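To give a flavour of the kind of code such comments describe, here is a minimal, heavily commented NumPy sketch of the regime-probability half of one Kim filter step. It is not the actual _KimFilter code: the function name, the argument layout and the row-stochastic transition convention (transition[i, j] = Pr[S_t = j | S_{t-1} = i]) are assumptions, other code (possibly including parts of statsmodels) may use the transposed convention, and both the Kalman filter pass that produces the conditional densities and the collapsing of state estimates are left out.

```python
import numpy as np


def kim_probability_step(prev_probs, transition, cond_densities):
    """One regime-probability update of a Kim-filter-style recursion (sketch).

    Parameters
    ----------
    prev_probs : ndarray, shape (k,)
        Pr[S_{t-1} = i | Y_{t-1}].
    transition : ndarray, shape (k, k)
        transition[i, j] = Pr[S_t = j | S_{t-1} = i] (assumed convention).
    cond_densities : ndarray, shape (k, k)
        f(y_t | S_{t-1} = i, S_t = j, Y_{t-1}), e.g. from the k*k
        regime-pair Kalman filter prediction steps.

    Returns
    -------
    joint_filtered : ndarray, shape (k, k)
        Pr[S_{t-1} = i, S_t = j | Y_t]; these are also the weights used when
        collapsing the k*k state estimates back to k.
    filtered_probs : ndarray, shape (k,)
        Pr[S_t = j | Y_t].
    loglike : float
        log f(y_t | Y_{t-1}).
    """
    # Predicted joint probabilities Pr[S_{t-1}=i, S_t=j | Y_{t-1}]:
    # multiply each row i of the transition matrix by Pr[S_{t-1}=i | Y_{t-1}].
    joint_pred = prev_probs[:, np.newaxis] * transition
    # Weight every regime pair (i, j) by how well it explains y_t.
    joint = joint_pred * cond_densities
    # Marginal density of y_t: sum over all regime pairs.
    density = joint.sum()
    # Bayes' rule: normalize to get the filtered joint distribution.
    joint_filtered = joint / density
    # Marginalize out S_{t-1} to get filtered probabilities of the current regime.
    filtered_probs = joint_filtered.sum(axis=0)
    return joint_filtered, filtered_probs, np.log(density)


# With equal densities the data carry no information, so the filtered
# probabilities equal the predicted ones, approximately [0.55, 0.45].
P = np.array([[0.9, 0.1], [0.2, 0.8]])
_, probs, ll = kim_probability_step(np.array([0.5, 0.5]), P, np.ones((2, 2)))
```

A production implementation would usually work with log-densities to avoid numerical underflow; that is omitted here for readability.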

What's next?

I will continue to improve the existing code. Some interface functionality still needs to be added and covered by smoke tests. Only after that will I switch back to implementing models (MS-DFM and MS-TVP).

