Friday, 23 November 2012

Dr. What the thesis was about

On Tuesday, I defended my thesis and thereby became a doctor. It was graded "Excellent", but I still have no idea what that means. Nor do I know whether it is relevant for my future career. I only know this is a European doctorate program, and therefore - a European PhD title, which means that I spent roughly 1/4 of my PhD time abroad, learned a lot, met great people and had a lot of fun. For another 1/4 of my PhD time it was summer in Spain, so you can imagine, right? In a nutshell, these last three years have been pretty awesome.
Here is a presentation with some brief highlights of my thesis. It is a shortened version of my actual PhD thesis presentation, half of it, to be precise. My talk lasted approximately an hour, followed by another hour of questions.



The basic idea of the dissertation is to provide valid predictions of air pollution concentrations even when observed data are severely lacking. It is no news that air pollution adversely affects people's health and well-being. Both adults and children are affected, and both short- and long-term exposures lead to health effects. This is why air pollution assessment methodology is constantly being updated. There are plenty of methods nowadays that serve this purpose, but, roughly, they can be divided into two groups: cheap to implement, and expensive to implement yet extremely precise. Precision is a desideratum for all the methods, of course. Sometimes, however, the prediction may lack validity, especially when the actual concentrations at some points of the map (say, exactly where a person with some health effect lives) are not available. Then the prediction error cannot be properly assessed.

In my case, I had a tiny data set of mean annual concentrations of some contaminants. The major contaminants are collectively referred to as "criteria pollutants". The two most investigated of them are nitrogen dioxide and fine particulate matter, and these are the two that I took up for my study. For each of them, I had the annual values measured at the monitoring stations across the Barcelona Metropolitan Region. There are 49 stations in total, but for any given year and pollutant, measurements were available at only about 24 of them. So, on average, there were 24 points for every year and pollutant.


To provide a valid prediction of the pollution surface for every year and pollutant, I employed conformal predictors. This technique has been developed recently by people at Royal Holloway, University of London, more precisely, in its unit called the Computer Learning Research Centre. It is a machine learning method that comes from statistical learning theory. A conformal predictor is always valid, and it can be built upon almost any statistical algorithm, including, of course, regression - the one most widely used for air pollution modeling.
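For the curious, here is a minimal sketch of the idea in Python. It is not the method from the thesis: it uses the simpler inductive ("split") variant of conformal prediction, wraps a plain one-variable least-squares line instead of an air pollution model, and runs on made-up numbers. The function names and the synthetic data are my own assumptions for illustration. The validity guarantee relies on the data being exchangeable.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (a single predictor)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def split_conformal_interval(train, calib, x_new, epsilon=0.1):
    """Return a (lower, upper) prediction interval at significance level epsilon.

    train and calib are lists of (x, y) pairs; the underlying model is fitted
    on train only, and calib is used to measure how wrong it tends to be.
    """
    a, b = fit_line([x for x, _ in train], [y for _, y in train])
    # Nonconformity score: absolute residual on the held-out calibration set.
    scores = sorted(abs(y - (a + b * x)) for x, y in calib)
    n = len(scores)
    # Rank of the calibration score that gives coverage of at least 1 - epsilon.
    k = math.ceil((n + 1) * (1 - epsilon))
    if k > n:
        # Too few calibration points for this epsilon: only the trivial interval is valid.
        return -math.inf, math.inf
    q = scores[k - 1]
    centre = a + b * x_new
    return centre - q, centre + q


# Synthetic illustration (NOT real pollution data): 24 noisy points on a line.
random.seed(0)
data = [(float(x), 2.0 + 0.5 * x + random.gauss(0, 1)) for x in range(24)]
random.shuffle(data)
train, calib = data[:12], data[12:]
print(split_conformal_interval(train, calib, x_new=10.0, epsilon=0.1))
```

Note what happens with little data: with 12 calibration points, a 90% interval already needs the largest calibration residual, and anything stricter returns the whole real line. The interval stays valid, it just becomes wide.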


The conformal predictor was derived on the basis of classic kriging: this method was chosen because of the given data configuration. The next step is to derive an anisotropic approach to kriging, once more data are available. Also, a conformal predictor based on the most frequently used algorithm (I'd say "pop", but this is a serious blog), land use regression, is on the way.
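If you have not met kriging before, here is a rough sketch of ordinary kriging at a single location, in plain Python. It is an illustration under assumptions, not the thesis code: the exponential variogram and its sill and range are picked by hand rather than fitted, the station coordinates and concentrations are invented, and the model is isotropic.

```python
import math


def variogram(h, sill=1.0, range_=1.0):
    """Exponential variogram without nugget (parameters assumed, not fitted)."""
    return sill * (1.0 - math.exp(-h / range_))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(points, values, target):
    """Return (prediction, kriging variance) at target from stations at points."""
    n = len(points)
    # Variogram between every pair of stations, plus the row and column that
    # force the weights to sum to one (the unbiasedness constraint).
    a = [[variogram(math.dist(p, q)) for q in points] + [1.0] for p in points]
    a.append([1.0] * n + [0.0])
    # Variogram between each station and the target location.
    b = [variogram(math.dist(p, target)) for p in points] + [1.0]
    solution = solve(a, b)
    weights, mu = solution[:n], solution[n]
    prediction = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return prediction, variance


# Hypothetical stations (x, y) and annual mean concentrations, for illustration only.
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.5, 1.5)]
concentrations = [40.0, 35.0, 50.0, 30.0]
print(ordinary_kriging(stations, concentrations, (0.5, 0.5)))
```

Without a nugget, the prediction at a station reproduces the value measured there, and the weights always sum to one.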

If you have any questions, please do not hesitate to ask.

P.S. Now I am back to financial data analysis. My next post will be about personal income savings, the most popular bank in Spain for this purpose (in my perception), and its efficiency as such.

2 comments: