Parallel processing inside an R Shiny app reactive expression

I have an example here of how to use the base R parallel package within a Shiny web application. In general, I do not put content into a Shiny app which requires a large amount of processing. I’d prefer that the user visiting the app URL not have to wait for anything. I see Shiny apps more as presentation tools than as data processing tools. Nevertheless, performing parallel processing inside of reactive expressions really throws the shackles off and redefines what I previously saw as some of the “limitations” of Shiny.

Below are the ui.R and server.R scripts. Further below, I provide a more complex example of a reactive expression which has a parallel processing call within it.

The ui.R script:

shinyUI(pageWithSidebar(
	headerPanel("Parallel processing example"),
	sidebarPanel(
		h4("An attempt to parallelize an R function."),
		selectInput("n","N",c(10,100,1000),"100")
	),
	mainPanel(
		h4("Print means of 10 samples"),
		verbatimTextOutput("means")
	)
))

The server.R script:

library(shiny)
library(parallel)
shinyServer(function(input, output) {
	myFun <- function(x){ # some function to parallelize
		Sys.sleep(1)
		shiny::isolate(mean(rcauchy(input$n)))
	}

	# some dataset (vector, matrix, data.frame, list, etc.) that takes a while to process,
	# stored as a reactive expression, typically isolated and run by clicking an action button
	dat <- reactive({ 
		if(!is.null(input$n)){
			cl <- makeCluster(getOption("cl.cores", 10)) # I've hard-coded this for 10 cores
			v <- parLapply(cl,1:10,fun = myFun)
			stopCluster(cl)
		} else v <- NULL
		v
	})

	output$means <- renderPrint({ # Print the result to the main panel
		if(!is.null(dat())) dat()
	})

})

Notice how I isolate the only reactive value, input$n, inside the function myFun. Then when parLapply is called within the reactive expression dat, myFun is successfully sent to each of 10 nodes and the results are brought back and stored in the list object v.
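
If you want to try the cluster calls on their own before putting them in an app, here is a minimal sketch that runs outside of Shiny. It is not the code from server.R: sample_mean is a hypothetical stand-in for myFun that takes the sample size as an ordinary argument instead of reading input$n, and I use 2 workers rather than 10 only to keep it light.

```r
library(parallel)

# Hypothetical stand-in for myFun: the sample size arrives as a plain
# argument, so nothing reactive has to travel to the workers.
sample_mean <- function(i, n) {
  mean(rcauchy(n))
}

cl <- makeCluster(2)  # assumption: two workers are enough for a demo
res <- tryCatch(
  parLapply(cl, 1:10, fun = sample_mean, n = 100),  # extra args go through ...
  finally = stopCluster(cl)                         # always release the workers
)

length(res)  # one result per index in 1:10
```

Passing the value as an argument is an alternative to calling isolate inside the function. Inside a reactive expression you would still read input$n once and hand it to parLapply.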

Here is a more convoluted reactive expression, something more realistic. I took this from one of my other apps which uses the randomForest package.

numVar <- reactive({
	input$cvRepsButton
	d <- NULL
	isolate(
		if(!is.null(input$cvRepsButton) && !is.null(input$n.reps)){
			if(input$cvRepsButton!=0){
				n <- as.numeric(input$n.reps)
				r <- input$response
				d <- d()
				d2 <- d2()
				cl <- makeCluster(getOption("cl.cores", n))
				clusterExport(cl,c("d","d2","r"),envir=environment())
				x <- parLapply(cl,1:n, fun = function(dummy) randomForest::rfcv(d2, d[,r],step=0.75))
				stopCluster(cl)
				err.cv <- sapply(x, "[[", "error.cv")
				d <- data.frame(x[[1]]$n.var, err.cv,rowMeans(err.cv))
				names(d) <- c("NV",paste("Rep",c(1:n),sep="."),"Mean")
				d <- melt(d,id="NV")
				names(d)[2:3] <- c("Replicate","CV.error")
			}
		}
	)
	d
})

Some things to note. In this case there is no non-reactive function defined elsewhere in which an isolate call is made. I state my function and its arguments directly inside parLapply. Notice how in this more realistic example, I reference an action button (from the shinyIncubator package) and then I use isolate to isolate all the other reactives inside of my numVar reactive expression. For something I would legitimately be considering parallel processing for, it is likely that this reactive expression may take some time to update when user reactive inputs change. For that reason, I do not want it constantly firing off every time the user changes something, forcing them to wait repeatedly. Rather, I isolate everything but the action button. This way, only when the user elects to press the button on their screen (which probably should have some statement next to it informing them of approximately how long they will have to wait), will the reactive expression update to reflect changes in reactive inputs.

Additionally, notice that I store the values of other reactive inputs and expressions in the local objects n, r, d, and d2. This approach makes the objects easier to reference later, particularly for exporting them to the multiple nodes. In the next two lines I open connections to multiple nodes with makeCluster and export the necessary variables to those nodes with clusterExport. With clusterExport I was required to include envir=environment(). This is because clusterExport looks for the named objects in the global environment by default, whereas d, d2, and r exist only inside the reactive expression. It is not specific to Shiny. I would have to do the same any time the objects to be exported live inside a function rather than in the global environment.
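
The envir argument can be seen in isolation with a small sketch outside of Shiny. The function run_on_workers and the object local_value are made up for illustration. The point is only that an object created inside a function is invisible to clusterExport unless the calling environment is passed in.

```r
library(parallel)

# Hypothetical helper: exports a function-local object to two workers
# and asks each worker to use it.
run_on_workers <- function() {
  local_value <- 42                 # exists only inside this function
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))          # release the workers when we return
  # Without envir = environment(), clusterExport would search the global
  # environment and would not find local_value.
  clusterExport(cl, "local_value", envir = environment())
  clusterEvalQ(cl, local_value * 2) # evaluated on every worker
}

out <- run_on_workers()
```

clusterEvalQ returns a list with one element per worker, so out has two elements here.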

Now we are all set! The next line uses parLapply from the parallel package, which is the parallel analog to lapply.

x <- parLapply(cl,1:n, fun = function(dummy) randomForest::rfcv(d2, d[,r],step=0.75))

I pass the cluster object cl produced by makeCluster, then the indices 1:n, where n is the number of replicates (and here also the number of nodes) taken from one of our reactive inputs, and finally the function to be run n times. Here I happen to be using a random forest cross-validation function. Note the randomForest:: prefix. Each node is a fresh R session that has not loaded the package, so the function has to be referenced through its namespace (or the package has to be loaded on each node) in order to be found. Last of all, I call stopCluster.
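
To illustrate the namespace point without needing randomForest installed, here is a small sketch that calls a function from the tools package on each node. I picked tools only because it ships with R and is typically not attached in a fresh session. The file names are made up.

```r
library(parallel)

cl <- makeCluster(2)
exts <- tryCatch(
  parLapply(cl, c("fit.rds", "data.csv"),
            # The pkg::fun form works on a worker even though the worker
            # never called library(tools).
            fun = function(f) tools::file_ext(f)),
  finally = stopCluster(cl)
)

unlist(exts)
```

An alternative, if many functions from one package are needed, is to load the package on every node once with clusterEvalQ(cl, library(randomForest)) before calling parLapply.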

Everything after this is unimportant here. I only leave the remainder of the code in place so that you can see that all there is left to do is extract the desired data from the list object returned by parLapply and eventually make the object d, which is returned by numVar, and which began prior to the isolate call as NULL.
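
For anyone curious about that last step anyway, here is a rough sketch of the reshaping on a made-up stand-in for the list returned by parLapply. The numbers are invented and only the n.var and error.cv elements are mimicked. I also build the long table with base R in place of melt so the sketch has no package dependency.

```r
# Invented stand-in for two replicates of rfcv output
x <- list(
  list(n.var = c(4, 3, 2), error.cv = c("4" = 0.10, "3" = 0.12, "2" = 0.20)),
  list(n.var = c(4, 3, 2), error.cv = c("4" = 0.14, "3" = 0.10, "2" = 0.24))
)
n <- length(x)

# One column of CV error per replicate, one row per number of variables
err.cv <- sapply(x, "[[", "error.cv")

d <- data.frame(x[[1]]$n.var, err.cv, rowMeans(err.cv))
names(d) <- c("NV", paste("Rep", 1:n, sep = "."), "Mean")

# Base-R substitute for melt(d, id = "NV"): stack every non-id column
long <- data.frame(
  NV        = rep(d$NV, times = ncol(d) - 1),
  Replicate = rep(names(d)[-1], each = nrow(d)),
  CV.error  = unlist(d[-1], use.names = FALSE)
)
```

The result has one row per combination of number of variables and replicate (with the mean treated as an extra replicate), which is a convenient shape for plotting.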

Similar to the simpler example above in the server.R script, this works because nothing that eventually gets passed to the different nodes for parallel processing is in a reactive context, due to the use of isolate. The earlier example looks much less involved, but the fact is I simply had much less to isolate there. In fact, in server.R I did not even need to use isolate in the reactive expression within which parLapply was called. There was so little to isolate that I was able to just wrap it around a specific line in an external non-reactive function. In that case I used it to isolate the reactive input$n. No isolation is then needed in the dat reactive expression. But it is important to note that the reference if(!is.null(input$n)) in dat is not isolated, and this is what allows dat to update whenever input$n changes. The nodes, which have no active reactive context, are still able to run the function they are each given, because the reactive read inside that function is isolated.

I hope this helps anyone else who is attempting to perform parallel processing in R from inside of a Shiny app. My understanding is not 100% and I hope my description does the topic some justice. Special thanks to Joe Cheng from RStudio for helping me figure out how to make this work! And of course, if you do something like this, it kind of goes without saying that you are limited to the number of available nodes on your local host or whatever web server you may be running your app on via Shiny Server.
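
One way to respect that limit, sketched here as a suggestion rather than something the apps above do, is to cap the requested number of nodes with detectCores from the parallel package. The variable names are mine, and leaving one core free is just a habit, not a requirement.

```r
library(parallel)

requested <- 10                # e.g. the number of replicates a user asked for
available <- detectCores()     # can be NA when the count cannot be determined

# Fall back to a single worker if the core count is unknown; otherwise use
# at most the requested number and leave one core free for the app itself.
n_workers <- if (is.na(available)) 1L else max(1L, min(requested, available - 1L))

n_workers
```

The resulting n_workers could then be passed to makeCluster in place of a hard-coded value.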

This entry was posted by Matt Leonawicz.
