Friday, 18 January 2013

An R tip: string-to-time conversion on non-English machines

Hi, guys,

The next two paragraphs are bla-bla-bla, so if you've come here for the R tip, please scroll past them. Thank you.

I have been absent from my blog for a while now, although I had promised myself to write at least once every two weeks. The reason I haven't stuck to this plan is my belief that purely technical posts are barely interesting. You tech guys have your own research going on, and probably won't dig into mine, although most of the stuff I post here is fun, not work. Nevertheless, it is truly hard to write about statistics and research and still keep the entry entertaining for a general reader. Like myself. I try to do my best, really.

So, now I find myself in France (who would have thought, right?), specifically in Lyon, and I am looking for a job. I have a pending offer in Barcelona which I am very willing to take, but the assessment process is not finished yet, so I am sitting tight waiting for the resolution. I have been told clearly that things will be settled by the end of this month. If it does not work out, I will be moving to France permanently and will look intensively for a job in the Lyon area (the search radius is 200 km). If you are a possible employer of mine and have somehow found your way to this blog entry, please don't think that your possible offer is some sort of second-best choice for me. It is not, not at all. I am very willing to work as a statistician here in Lyon or as a quant in Geneva or the like. The point is that I already have my life established in Barcelona, I have moved around a lot, no, A LOT in the past three years, and my French is still really basic, so I am just very scared to change things again. Although deep inside I know that it will surely work out. France is actually nice, much nicer than you would guess looking at it from the outside. What the hell am I doing in Lyon? It is simple: my husband has a really good job here, and we have a flat and stuff, and a circle of friends, so it is easy for me to get settled here or in the surroundings.

Ok, so now to an R tip.

As a person from Spain, I have my computer in Spanish. So my software is in Spanish, too, including the spreadsheet processing programs (I use Numbers and OpenOffice) and R.

Now imagine you have to do some analysis with a time series, and your computer is set to a language other than English. You read your data from a .csv file, and in the R object the dates of your time series appear like this:


"1-dic-98"

This is because your spreadsheet programs are too smart and are taking care of you: they want you to read "01/12/1998", or even 912501753, in a human-friendly format. In Spain, everything is adapted to an Iberian-human-friendly format, so even Eros Ramazzotti sings his songs in Spanish. Which can be a little tricky when you have to process data with R.

In order to convert your data frame to a time-series format (ts, zoo or zooreg), you'll need your dates in a time format (POSIXct, POSIXlt or yearmon). You'll most probably do it with the function strptime, which reads month abbreviations (%b) according to the LC_TIME locale of the R session. That locale is not necessarily Spanish, even on a Spanish machine, and when it is not, strptime will just not understand abbreviations such as:

"dic", "ene", "abr" and "ago"
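A quick way to see the problem is sketched below. It assumes the session's LC_TIME locale is "C" (English month names); the sketch sets it explicitly only to make the example reproducible, and restores the previous value at the end. The variable names are made up for illustration.

```r
# Remember the current time locale so it can be restored afterwards
old_locale <- Sys.getlocale("LC_TIME")

# Assumption: an English/"C" time locale, set here so the result is reproducible
invisible(Sys.setlocale("LC_TIME", "C"))

# The Spanish abbreviation is not recognised by %b, the English one is
es <- strptime("1-dic-98", format = "%d-%b-%y", tz = "UTC")
en <- strptime("1-Dec-98", format = "%d-%b-%y", tz = "UTC")

is.na(es)               # TRUE
format(en, "%Y-%m-%d")  # "1998-12-01"

# Put the time locale back the way it was
invisible(Sys.setlocale("LC_TIME", old_locale))
```

Running Sys.getlocale("LC_TIME") on its own tells you which month names your session currently expects.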

What could you do here? Three things:
  • change your R language to English (I personally won't do that)
  • use Excel to convert the .xls to .csv and explicitly tell it that the dates should be numerical
  • use the function sub in R to replace a part (pattern) of a string


The last solution is the easiest and a true time-saver. With just four lines of code, you're done:
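The snippet below is a sketch of the idea, not necessarily the original four lines. It assumes the dates sit in a character vector, here called fechas (a hypothetical name with made-up sample values), and relies on the fact that only four Spanish month abbreviations ("ene", "abr", "ago", "dic") differ from the English ones.

```r
# Hypothetical sample data, as a Spanish-language spreadsheet might export it
fechas <- c("1-dic-98", "15-ene-99", "3-abr-99", "20-ago-99", "7-may-99")

# One sub() call per abbreviation that differs from English
fechas <- sub("ene", "Jan", fechas)
fechas <- sub("abr", "Apr", fechas)
fechas <- sub("ago", "Aug", fechas)
fechas <- sub("dic", "Dec", fechas)

fechas
```

The remaining abbreviations (such as "may" or "sep") are left untouched, since they already match the English ones apart from letter case.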



Voilà! Now you can proceed with strptime in the usual way.
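As a rough illustration of that last step, the sketch below parses dates that already carry English month abbreviations. The vector fechas_en and its values are hypothetical, and the "C" time locale is again an assumption, set explicitly only so that the example behaves the same everywhere.

```r
# Assumption: English month names; keep the old locale to restore it later
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical dates, already converted to English abbreviations
fechas_en <- c("1-Dec-98", "15-Jan-99")

# strptime() returns POSIXlt; convert further as the time-series class requires
parsed  <- strptime(fechas_en, format = "%d-%b-%y", tz = "UTC")
as_ct   <- as.POSIXct(parsed)
as_date <- as.Date(parsed)

format(as_date)  # "1998-12-01" "1999-01-15"

invisible(Sys.setlocale("LC_TIME", old_locale))
```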

P.S. If you have a French machine, with its abbreviations: "jan", "fév", "mar", "avr", "mai", "jun", "jul", "aoû", "sep", "oct", "nov", "déc" - it will take you five lines of code. 
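For the French case, a sketch along the same lines could look as follows. The vector name dates_fr and its values are made up for illustration; the five abbreviations replaced are the ones from the list above that differ from English ("fév", "avr", "mai", "aoû", "déc"). The argument fixed = TRUE is a choice made here so that the accented abbreviations are matched as literal strings rather than as regular expressions.

```r
# Hypothetical sample data with French month abbreviations
dates_fr <- c("1-déc-98", "14-fév-99", "2-avr-99", "9-mai-99", "30-aoû-99", "5-jul-99")

# Five abbreviations differ from English, hence five sub() calls
dates_fr <- sub("fév", "Feb", dates_fr, fixed = TRUE)
dates_fr <- sub("avr", "Apr", dates_fr, fixed = TRUE)
dates_fr <- sub("mai", "May", dates_fr, fixed = TRUE)
dates_fr <- sub("aoû", "Aug", dates_fr, fixed = TRUE)
dates_fr <- sub("déc", "Dec", dates_fr, fixed = TRUE)

dates_fr
```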
