Condividi tramite


Make Vectorize your friend in R

Old habits die hard. For someone coming from linear programming models such as C++ or Java, it takes quite a bit of effort before a fully-compliant R functions can be developed. The major pit-fall is, defining a function that is not vector-compliant.

For example, consider the below function that accepts two numbers and returns some computed value based on them.

     getNc2 = function(vc, ec)
    {
      # compute nC2 and return the ratio
      max_ec = vc * (vc-1) / 2
      return (trunc(ec * 100 / max_ec))
    }    

This is a valid function in R, and perhaps similar to any C++ or Java method, except it works on both single number input and vector inputs. For example, note the below single number input calls:

 > getNc2(5, 4)
[1] 40

> getNc2(6, 5)
[1] 33    

Now, those two calls can be combined into a single vector call as below and our above definition works perfectly fine without complaints:

     > getNc2(c(5,6), c(4,5))
    [1] 40 33    

The method accepted two vectors (instead of just two individual numbers) and returned vectorized results. Something not quite natural in C++ or Java, unless special care is taken to define the argument types as templated parameters and so on.

Now, what about the limits? Our method is computing a ratio which essentially involves a division. If we are not careful, we can endup with dividing-by-zero. For example, consider the below calls:

 > getNc2(0, 5)
[1] -Inf

> getNc2(0, 0)
[1] NaN    

Suppose, we want to handle these cases where the argument is zero and return 0 instead of -Inf or NaN. Then we will be adding an if condition as below:

     getNc2 = function(vc, ec)
    {
      # if no vertices return 0
      if(vc <= 1) return (0)
      # compute nC2 and return the ratio
      max_ec = vc * (vc-1) / 2
      return (trunc(ec * 100 / max_ec))
    }    

Now, the earlier calls will return 0 correctly as shown below:

 > getNc2(0, 0)
[1] 0
> getNc2(0, 5)
[1] 0    

However, the if condition introduces a new problem. if in R is not a vector-compatible opration. It works only for scalar inputs and as such our method hence fails on vector inputs. For example:

 > getNc2(c(5,0), c(4,5))
[1]   40 -Inf
Warning message:
In if (vc <= 1) return(0) :
  the condition has length > 1 and only the first element will be used    

Now, there are couple of ways you can address this problem. One would be to use apply constructs - but they can be overkill oftentimes. Other way is, using ifelse instead of if etc. vector-capable operations explicitly. That would work, provided you have the option to change the source. What about cases where you are using a method from some package and you cannot change the method ?

Take the method get.edge from igraph package, for example. It accepts an edge id and returns the vertices for that edge. However, if you supply a vector of edge ids, it will not work - it just returns the vertices for only the first edge, as can be seen from the below output:

 > cg <- erdos.renyi.game(8, 0.6)
> get.edge(cg, 13)
[1] 4 6
> get.edge(cg, c(13,15,16))
[1] 4 6    

How do you make these kind of methods work for vector inputs as well as scalar inputs? R has a nice base method under its sleeve that exactly addresses this problem. Named Vectorize, it helps in these situations - coverting scalar methods to vector methods.

Vectorize creates a function wrapper that vectorizes the action of its argument function. For example, we can vectorize our getNc2 method as shown below:

 > Vectorize(getNc2)(c(5,0), c(4,5))
[1] 40  0    

Instead of directly calling getNc2, we are rather creating a wrapper around it by calling Vectorize method with getNc2 as its argument, and supplying the original vector inputs to the resultant function. In other words, the above is same as:

 > vecNc2 <- Vectorize(getNc2)
> vecNc2(c(5,0), c(4,5))
[1] 40  0    

Remember, the vectorized vecNc2 method works perfectly fine on scalar inputs too. For example:

 > vecNc2(5, 4)
[1] 40    

By default Vectorize tries to vectorize all the arguments of the supplied method, which may not work sometimes. For example, the earlier example of get.edge in igraph:

 > vecGE <- Vectorize(get.edge)
> vecGE(cg, c(13,15,16))
Error in function (graph, id)  : Not a graph object    

Here the get.edge method is accepting two inputs, one the graph object itself and the second the edge id - and we want to vectorize only the second argument, namely the id argument and not the graph object. For this, we would have to use the vectorize.args facility of Vectorize method as shown below:

 > vecGE <- Vectorize(get.edge, vectorize.args="id")
> vecGE(cg, c(13,15,16))
     [,1] [,2] [,3]
[1,]    4    1    2
[2,]    6    7    7    

This is a simplified output of the Vectorize method. We can use the SIMPLIFY facility of Vectorize method to get the original results unaltered. For example:

 > vecGE <- Vectorize(get.edge, vectorize.args="id", SIMPLIFY=FALSE)
> vecGE(cg, c(13,15,16))
[[1]]
[1] 4 6

[[2]]
[1] 1 7

[[3]]
[1] 2 7    

Note, this is same as the lapply output.

 > lapply(c(13,15,16), function(id) get.edge(cg, id))
[[1]]
[1] 4 6

[[2]]
[1] 1 7

[[3]]
[1] 2 7    

So go ahead and use Vectorize whenever you want to avoid lapply constructs for methods that accept only scalar inputs.