Tuesday, 14 May 2013

Claims Inflation - a known unknown

Over the last year I worked with two colleagues on the subject of inflation, and claims inflation in particular. I didn't expect it to be such a challenging topic, but we ended up with more questions than answers. The key question, and the biggest challenge, is to define what inflation, or indeed claims inflation, actually is and how to measure it. We published a summary of our thoughts and findings in this month's issue of The Actuary.

Last year's discussion about the differences between the retail price index (RPI) and the consumer price index (CPI) in the UK exemplified the challenge. The economist Tim Harford illustrated the differences between the RPI and CPI with a simple example of price changes for a shirt and a blouse in his Radio 4 programme More or Less. The podcast is still available from the BBC. Start listening about 18 minutes into the show.
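One commonly cited driver of the gap between the two indices is the averaging formula used at the lowest level of aggregation: for many items the RPI takes an arithmetic mean of price relatives (the Carli formula), whereas the CPI takes a geometric mean (the Jevons formula). The sketch below uses made-up prices, not the figures from the programme, to show how the two averages can diverge for the same price changes.

```r
# Hypothetical prices for two items in two periods (illustrative only).
# The shirt gets dearer while the blouse gets cheaper by the reciprocal factor.
prices <- data.frame(
  item = c("shirt", "blouse"),
  p0   = c(20, 25),  # base period price
  p1   = c(25, 20)   # current period price
)

# Price relatives: current price divided by base period price
relatives <- prices$p1 / prices$p0

# Carli: arithmetic mean of the price relatives
carli <- mean(relatives)

# Jevons: geometric mean of the price relatives
jevons <- exp(mean(log(relatives)))

round(c(Carli = carli, Jevons = jevons), 4)
```

With these made-up numbers the arithmetic mean reports inflation of 2.5%, although the average price of the two garments is unchanged, while the geometric mean reports none.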



Tuesday, 7 May 2013

R in Insurance: Programme and Abstracts published


I am delighted to announce that the programme and abstracts for the first R in Insurance conference at Cass Business School in London, 15 July 2013, have been published.

The conference committee received strong abstracts from academia and the industry, covering:
  • Pricing
  • Reserving
  • Data mining
  • Capital modelling
  • Automated reporting
  • Catastrophe modelling
  • High-performance computing
  • Software development management
Register by the end of May to get the early bird booking fee.

We gratefully acknowledge the sponsorship of Mango Solutions and CYBAEA, without whom the event wouldn't be possible.

Programme and Abstracts



Tuesday, 30 April 2013

How to change the alpha value of colours in R

I often reduce the alpha value (which controls the level of transparency) of colours to reveal patterns of over-plotting when displaying lots of data points with R. So, here is a tiny function that adds an alpha value to a given vector of colours, e.g. an RColorBrewer palette. It uses col2rgb and rgb, which has an argument for alpha, in combination with the wonderful apply and sapply functions.


The example below illustrates how this function can be used with colours provided in different formats, thanks to the col2rgb function.
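A minimal sketch of such a helper follows. The name add_alpha and the example colours are chosen here for illustration and are not necessarily those of the original code; only base R functions are used.

```r
# Add an alpha value (0 = fully transparent, 1 = opaque) to a vector of colours.
# The function name add_alpha is an assumption made for this sketch.
add_alpha <- function(col, alpha = 1) {
  if (missing(col)) stop("Please provide a vector of colours.")
  # col2rgb() accepts colour names, hex strings and palette indices and
  # returns red/green/blue values in 0-255; scale them to 0-1 for rgb().
  # sapply() yields a 3 x n matrix with one column per colour.
  rgb_mat <- sapply(col, col2rgb) / 255
  # Rebuild each colour column-wise, now with the requested alpha value
  apply(rgb_mat, 2, function(x) rgb(x[1], x[2], x[3], alpha = alpha))
}

# Colours given in different formats
add_alpha("red", alpha = 0.5)                  # colour name
add_alpha(c("#1B9E77", "#D95F02"), alpha = 0.3) # hex strings, e.g. from a palette
add_alpha(1:3, alpha = 0.3)                    # indices into the current palette()

# Over-plotting becomes visible once the points are semi-transparent
set.seed(1)
x <- rnorm(5000)
y <- x + rnorm(5000)
plot(x, y, pch = 16, col = add_alpha("steelblue", alpha = 0.1))
```

Because col2rgb does the parsing, the helper does not need to know which format the colours arrive in.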

Tuesday, 23 April 2013

Review: Kölner R Meeting 12 April 2013

Our 5th Cologne R user group meeting was the best attended meeting so far, with 20 members finding their way to the Institute of Sociology for two talks by Diego de Castillo on shiny and Stephan Holtmeier on cluster analysis, followed by beer and schnitzel at the Lux, a gastropub nearby.

Shiny

Diego gave an overview of the design principles behind shiny, which provides a powerful API to build web apps in pure R. His explanation of the reactive programming model was particularly helpful for understanding how shiny works under the hood and why it is so responsive. His live demonstrations even included shiny server, which he had running in a virtual machine. Diego's slides are available via our Meetup site.

Diego de Castillo: Introduction to shiny

You can hear more from Diego and me at the UseR!2013 conference in Albacete, where we will give a googleVis tutorial. We will touch on googleVis on shiny as well. A dedicated shiny tutorial will be given in the afternoon by Josh and Winston from RStudio.

Cluster analysis

Stephan Holtmeier, a psychologist by background, presented an introduction to cluster analysis with R, motivated by his work analysing survey data. As a toy example he used a 360° feedback survey of a group of managers within a big company, with the aim of understanding the profiles of those managers better. Stephan illustrated how a cluster analysis can help to identify groups of managers with similar strengths, e.g. in communication, leadership and/or performance. Depending on how he measured the distance between managers, he could look for people who have similar levels of competency or a similar profile (correlation). Stephan also touched on the differences between hierarchical and centroid-based cluster analysis, such as k-means. Stephan's slides (in German) are also available on our Meetup site.
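To make the two notions of distance concrete, here is a small sketch on simulated ratings. The data, the two extra competency names and the choice of three clusters are assumptions for illustration and are not taken from Stephan's talk.

```r
set.seed(42)

# Simulated ratings (1-5 scale) for 12 hypothetical managers on 5 competencies
competencies <- c("communication", "leadership", "performance",
                  "teamwork", "planning")
ratings <- matrix(round(runif(12 * 5, min = 1, max = 5), 1), nrow = 12,
                  dimnames = list(paste0("M", 1:12), competencies))

# Similar level: Euclidean distance between the managers' rating vectors
d_level <- dist(ratings, method = "euclidean")

# Similar profile: 1 - correlation between the managers' rating vectors
d_profile <- as.dist(1 - cor(t(ratings)))

# Hierarchical clustering on each distance, cut into three groups
hc_level   <- hclust(d_level, method = "complete")
hc_profile <- hclust(d_profile, method = "complete")
groups_level   <- cutree(hc_level, k = 3)
groups_profile <- cutree(hc_profile, k = 3)

# Centroid-based alternative: k-means with three centres
km <- kmeans(ratings, centers = 3, nstart = 20)

# Compare the group memberships produced by the two distance measures
table(level = groups_level, profile = groups_profile)
```

The cross-table shows how far the groupings differ once the distance measure changes from level to profile; plot(hc_level) would draw the corresponding dendrogram.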

Stephan Holtmeier: Cluster Analysis with R

For more information on cluster analysis functions in R see also the cluster task view on CRAN. If you would like to get an overview of how psychologists look at data, then check out William Revelle's vignette of the psych package. Finally, if you are interested in how a k-means cluster analysis can be used for image manipulation, see an earlier post of mine.

Next Kölner R meeting, 19 July 2013

The next meeting has been scheduled for 19 July. Günter Faes will present his experiences using the XLConnect package as an interface between R and Excel. Dietmar Janetzko agreed to present how he used R and Twitter to predict exchange rate movements. Of course, the evening will close with a few Kölsch in a nearby beer-garden.

Please get in touch if you would like to present and share your experience, or indeed if you have a request for a topic you would like to hear more about. For more details see also our Meetup page.

Thanks again to Bernd Weiß for hosting the event and Revolution Analytics for their sponsorship.

Tuesday, 16 April 2013

Test Driven Analysis?

At the last LondonR meeting Francine Bennett from Mastodon C shared some of her experience and findings from an analysis of a large prescriptions data set of the UK's National Health Service (NHS). However, it was her last slide that I found the most thought-provoking. It asked for the definition of the following term:
Test-driven analysis?
Francine explained that test-driven development (TDD) is a concept often used in software development for quality assurance, and she wondered if a similar approach could also be used for data analysis. Unfortunately the audience couldn't provide her with an answer, but many said that they face similar challenges. So do I.


Indeed, how do I go about test-driven analysis? How do I know that I haven't made a mistake when I start an analysis of a new data set? Well, I don't. But I try to mitigate the risks. Similar to TDD, I consider which outputs I should expect from my analysis. Those outputs form the test scenarios of my analysis. Basically, I try to write down everything I know before I start working with the data, e.g.
  • any other data sets or reports I can use for cross referencing,
  • any back-of-the-envelope analysis I can carry out to provide ballpark answers,
  • any relativities and ratios which should hold true,
  • any known boundaries and thresholds,
  • test scenarios for my code with small, well-known data, for which I know the outcome,
  • names of experts who could sense-check and peer review my output.
But most importantly: I try to think long and hard about which questions I want to answer, following the advice of John Tukey: "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise."
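Some of the checks listed above can be written down as code before the real data arrive. The following is only a sketch: the function names, the toy data, the reference total and the thresholds are all hypothetical.

```r
# Hypothetical function under test: ratio of claims to premium
loss_ratio <- function(claims, premium) {
  stopifnot(length(claims) == length(premium), all(premium > 0))
  claims / premium
}

# Test scenario with small, well-known data for which the outcome is known
stopifnot(isTRUE(all.equal(loss_ratio(c(50, 75), c(100, 100)), c(0.5, 0.75))))

# Expectations written down before looking at the data
run_checks <- function(df, reference_total, tol = 0.05) {
  lr <- loss_ratio(df$claims, df$premium)
  c(
    # cross reference: total agrees with an independent report within tol
    cross_reference = abs(sum(df$claims) / reference_total - 1) <= tol,
    # known boundaries: no negative claims, premiums strictly positive
    boundaries = all(df$claims >= 0) && all(df$premium > 0),
    # ratios which should hold true: assumed plausible range for this sketch
    ratios = all(lr > 0.3 & lr < 1.5)
  )
}

# Toy data standing in for the real data set
toy <- data.frame(year = 2010:2012,
                  premium = c(100, 110, 120),
                  claims = c(60, 70, 90))

result <- run_checks(toy, reference_total = 225)
result
stopifnot(all(result))
```

A failing check does not prove the analysis is wrong, but it tells me where to look first.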

Tuesday, 9 April 2013

How to set axis options in googleVis

Setting axis options in googleVis charts can be a bit tricky. Here I present two examples where I set several options to customise the layout of a line and combo chart with two axes.


The parameters have to be set in line with the Google Chart Tools API, which uses JavaScript syntax. In googleVis, chart options are set via a list in the options argument. Some of the list items can be a bit more complex: they are often wrapped in {} brackets, e.g. for various formatting options, or in [] brackets if there are multiple series to consider. Within those brackets sub-options are set via argument : value, using the : character for assignments.

There are many other features of the Google Chart Tools API that are not supported by googleVis yet, such as column roles, controls and dashboards. Please get in touch if you have ideas in this regard and/or would like to collaborate.

In my first example I display two series of dummy data in a line chart with two axes. The left-hand scale is in percentages and the right-hand scale in amounts. Note in the code below how I set the various parameters and where the different kinds of brackets are placed.
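A sketch of what such a line chart with two axes could look like is given below. The dummy data, titles and format strings are assumptions made for illustration; the option names (series, vAxes, hAxis, titleTextStyle, curveType) come from the Google Chart Tools API. The chart is only built if the googleVis package is installed.

```r
# Dummy data: x holds percentages (as fractions), y holds amounts
df <- data.frame(year = 1:11, x = 1:11 / 100, y = 11:1)

# Nested settings are passed as JavaScript-style strings:
# {} wraps a single object, [] holds one entry per series or axis,
# and sub-options are assigned with ':'
chart_options <- list(
  title = "Two series on two axes",
  titleTextStyle = "{color:'red', fontName:'Courier', fontSize:16}",
  curveType = "function",
  # first series on the left-hand axis (index 0), second on the right (index 1)
  series = "[{targetAxisIndex:0}, {targetAxisIndex:1}]",
  vAxes = "[{title:'Percent', format:'#,###%'}, {title:'Amount', format:'#,###'}]",
  hAxis = "{title:'Year'}",
  width = 550, height = 500
)

if (requireNamespace("googleVis", quietly = TRUE)) {
  line_chart <- googleVis::gvisLineChart(df, xvar = "year", yvar = c("x", "y"),
                                         options = chart_options)
  # plot(line_chart) would open the chart in a browser
}
```

Keeping the options in a separate list makes it easier to spot a misplaced bracket or quote, which is the most common reason for a chart not rendering.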

Monday, 8 April 2013

Next Kölner R User Meeting: 12 April 2013

Quick reminder: The next Cologne R user group meeting is scheduled for this Friday, 12 April 2013. We will discuss cluster analysis and shiny. Further details and the agenda are available on our KölnRUG Meetup site. Please sign up if you would like to come along. Notes from the last Cologne R user group meeting are available here.



Thanks also to Revolution Analytics, who sponsor the Cologne R user group as part of their vector programme.