Mike Badescu, PhD
Dallas R Users Group
March 24, 2018
Authors and Contributors
Typical Data Product Progression
4. Can we use it in production?
HackerNews API
{
"by" : "dhouston",
"descendants" : 71,
"id" : 8863,
"kids" : [ 9224, 8952, 8917, 8884, 8887, 8943, 8869, 8940, 8958, ... ],
"score" : 104,
"time" : 1175714200,
"title" : "My YC app: Dropbox - Throw away your USB drive",
"type" : "story",
"url" : "http://www.getdropbox.com/u/2/screencast.html"
}
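As a small illustration of consuming such a response, the item above can be parsed with the Python standard library. This is a sketch only: the JSON is pasted as a string rather than fetched, and `kids` contains just the IDs visible on the slide, since the original list is truncated there.

```python
import json
from datetime import datetime, timezone

# Item 8863 as shown above; "kids" holds only the IDs visible on the slide
# (the list is truncated in the original).
raw_item = """
{
  "by": "dhouston",
  "descendants": 71,
  "id": 8863,
  "kids": [9224, 8952, 8917, 8884, 8887, 8943, 8869, 8940, 8958],
  "score": 104,
  "time": 1175714200,
  "title": "My YC app: Dropbox - Throw away your USB drive",
  "type": "story",
  "url": "http://www.getdropbox.com/u/2/screencast.html"
}
"""

item = json.loads(raw_item)

# "time" is a Unix timestamp in seconds; convert it to a UTC datetime
posted = datetime.fromtimestamp(item["time"], tz=timezone.utc)

summary = f'{item["title"]} by {item["by"]} ({item["score"]} points)'
print(summary)
print(posted.isoformat())
```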
Facebook Graph API
{
"data": {
"height": 50,
"is_silhouette": false,
"url": "https://scontent.xx.fbcdn.net/v/t1.0-1/p50x50/12006203_ ...",
"width": 50
}
}
Main Tool: curl
Facebook Graph API: using curl from Git Bash
Can we serve API requests from R?
Approaches
An R package that generates a web API from the R code you already have.
roxygen2
plumb() on the .R file
shiny
Decorators
#* @get /mean
normalMean <- function(samples = 10) {
  data <- rnorm(samples)
  mean(data)
}

#* @post /sum
addTwo <- function(a, b) {
  as.numeric(a) + as.numeric(b)
}
install.packages("plumber")
r <- plumber::plumb("docs/plumber_functions.R")
r$run(port = 8000)
GET Method
POST Method
curl --data "a=4&b=3" "http://localhost:8000/sum"
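The same POST request can be prepared from Python. `curl --data` sends a form-encoded body, so the sketch below builds the equivalent request with the standard library. The helper names `build_sum_request` and `call_sum` are made up for illustration, and `call_sum` only works while the plumber server from the previous slide is running on port 8000.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_sum_request(a, b, base_url="http://localhost:8000"):
    """Build the POST request that `curl --data "a=4&b=3"` would send."""
    body = urlencode({"a": a, "b": b}).encode("ascii")
    # Passing `data` makes urllib issue a POST instead of a GET
    return Request(f"{base_url}/sum", data=body)


def call_sum(a, b):
    """Send the request; requires the local plumber server to be running."""
    with urlopen(build_sum_request(a, b), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_sum_request(4, 3)
print(req.get_method(), req.full_url, req.data)
```

Building the request separately from sending it makes it easy to inspect what goes over the wire before the server is up.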
Output customization, including JSON (default), HTML, PNG, JPEG
#' @get /plot
#' @png
function(){
  myData <- iris
  plot(myData$Sepal.Length, myData$Petal.Length,
       main="All Species", xlab="Sepal Length", ylab="Petal Length")
}
Filters
#* @filter checkAuth
function(req, res){
  if (is.null(req$username)){
    res$status <- 401 # Unauthorized
    return(list(error="Authentication required"))
  } else {
    plumber::forward()
  }
}
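The control flow of a filter can be pictured outside of R as well. The sketch below is not plumber code: it is a plain Python analogy in which `FORWARD`, `handle`, and the dictionary-based `req`/`res` are all hypothetical stand-ins. It shows the idea that a filter either answers the request itself or passes it on to the next step.

```python
# Sentinel meaning "this filter lets the request through"
# (plays the role of plumber::forward() in the R example)
FORWARD = object()


def check_auth(req, res):
    """Reject requests that carry no username."""
    if req.get("username") is None:
        res["status"] = 401  # Unauthorized
        return {"error": "Authentication required"}
    return FORWARD


def handle(req, filters, endpoint):
    """Run each filter in order; the first non-forwarding result wins."""
    res = {"status": 200}
    for flt in filters:
        out = flt(req, res)
        if out is not FORWARD:
            return res["status"], out
    return res["status"], endpoint(req)


def whoami(req):
    """Hypothetical endpoint reached only after all filters forward."""
    return {"user": req["username"]}


print(handle({}, [check_auth], whoami))
print(handle({"username": "ann"}, [check_auth], whoami))
```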
Dynamic Routes
/users/13, where 13 is dynamic
#' @get /users/<id>
function(id) {
  subset(users, uid == id)
}
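To make the matching step concrete, here is a rough Python sketch of how a path such as /users/13 can be matched against a /users/<id> template. It is an illustration of the idea, not how plumber implements routing; the `USERS` table and the `get_user` helper are invented for the example.

```python
import re

# Hypothetical stand-in for the `users` data frame in the R example
USERS = [
    {"uid": 13, "name": "user-13"},
    {"uid": 42, "name": "user-42"},
]

# /users/<id>: capture one path segment under the name "id"
ROUTE = re.compile(r"^/users/(?P<id>[^/]+)$")


def get_user(path):
    """Return matching users, or None if the path is not on this route."""
    match = ROUTE.match(path)
    if match is None:
        return None
    try:
        uid = int(match.group("id"))  # the captured segment arrives as text
    except ValueError:
        return []  # non-numeric id: no user can match
    return [user for user in USERS if user["uid"] == uid]


print(get_user("/users/13"))
```

The captured value is text, so it has to be converted before it is compared with a numeric key.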
Local Server
Web Server / Docker
nginx
A reliable and interoperable HTTP API for data analysis based on R.
tmp key for extra info
Local Server Demo
install.packages("opencpu")
library(opencpu)
ocpu_start_app("rwebapps/appdemo", port = 5656)
Local Server UI: http://localhost:5656/ocpu/test/
- test GET and POST methods
Other Apps: https://www.opencpu.org/apps.html
- example: nabel
JavaScript Client on jsfiddle: http://jsfiddle.net/user/opencpu/
- example: Posting Code Snippets
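Besides the test page, the local server can be called from a script. The sketch below prepares a POST to an R function through OpenCPU, assuming the usual OpenCPU URL layout of /ocpu/library/{package}/R/{function} with a /json suffix to get the result as JSON. The port comes from the demo above; `build_ocpu_call` is a made-up helper, and nothing is sent until the request is passed to `urllib.request.urlopen` with the server running.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Port taken from the ocpu_start_app() call in the demo
OCPU_BASE = "http://localhost:5656/ocpu"


def build_ocpu_call(package, function, args):
    """Prepare a POST that calls package::function with form-encoded args."""
    # Assumed URL layout: /library/{package}/R/{function}/json
    url = f"{OCPU_BASE}/library/{package}/R/{function}/json"
    body = urlencode(args).encode("ascii")
    return Request(url, data=body)  # `data` makes this a POST


# Example: stats::rnorm(n = 3)
req = build_ocpu_call("stats", "rnorm", {"n": 3})
print(req.get_method(), req.full_url, req.data)
```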
Features
tmp key exposes: the input, the R command, the code, the output
Deployment
plumber, except no RStudio Connect
Web Server / Docker
Function as a Service
FaaS providers (as of March 2018)
FaaS providers (cont.)
Common concerns
Previous Work
rpy2 to call R from Python
Motivation
Top view
How hard can it be?
.zip deployment package and store it on S3
aws_lambda_r <- function(input_json) {
  output_json <- '{"message": "Cannot create output JSON"}'
  tryCatch({
    input_lst <- from_json(input_json)
    request_id <- input_lst$request_id[1]
    output_lst <- list(
      result_id = request_id,
      result_lst = list(a = 1, b = 2:4),
      result_dbl = 1:10 / 2,
      message = NULL
    )
    output_json <- to_json(output_lst)
  }, error = function(e) {
    output_json <<- paste0('{"message": "', e$message, '"}')
  })
  output_json
}
import os

# point rpy2 at the R installation and libraries bundled in the package;
# these are set before rpy2 is imported
os.environ["R_HOME"] = os.getcwd()
os.environ["R_LIBS"] = os.path.join(os.getcwd(), 'libraries')

import rpy2
import ctypes
import rpy2.robjects as robjects
import json

# load the shared libraries shipped in lib/external
for file in os.listdir('lib/external'):
    file_name = 'lib/external/' + file
    ctypes.cdll.LoadLibrary(os.path.join(os.getcwd(), file_name))

# source R file
# this R file might load libraries and source other files
robjects.r['source']('example.R')

# exposing R entry point to python
aws_lambda_r = robjects.globalenv['aws_lambda_r']

def handler_get(event, context):
    input_json = json.dumps(event)
    output_json = json.loads(str(aws_lambda_r(input_json)))
    return output_json
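The handler's round trip (event dict to JSON string, through the R entry point, back to a dict) can be tried locally without R or rpy2. In the sketch below `aws_lambda_r_stub` is a hypothetical Python stand-in for the R function: it only echoes `request_id` back as `result_id` and does not reproduce the full R output. The extra `r_entry` parameter is likewise an assumption added so the stub can be swapped in.

```python
import json


def aws_lambda_r_stub(input_json):
    """Hypothetical stand-in for the R entry point: JSON string in, JSON string out."""
    try:
        request_id = json.loads(input_json).get("request_id")
        return json.dumps({"result_id": request_id, "message": None})
    except Exception as exc:  # mirror the R tryCatch: always return JSON
        return json.dumps({"message": str(exc)})


def handler_get(event, context, r_entry=aws_lambda_r_stub):
    """Same shape as the Lambda handler above, with the R call injectable."""
    input_json = json.dumps(event)
    return json.loads(str(r_entry(input_json)))


# Simulated invocation: `event` is the parsed request, `context` is unused here
response = handler_get({"request_id": 1111}, None)
print(response)
```

Keeping the R call behind a parameter makes the Python side testable on a laptop before the deployment package is built.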
What we need
~/.aws/ directory with two files
~/.ssh/ directory
lambda/ - will contain your .R and .py files before uploading to AWS
python/ - Python scripts (one is usually sufficient)
scripts/ - configuration and deployment
settings/ - default, auto-config, user settings and secrets
settings/
settings_default.sh
secrets_default.sh
setup_auto.sh
setup_user.sh
scripts/
Auto-configuration: settings will be saved in setup_auto.sh
21_setup_vpc.sh
22_setup_custom_ami.sh
11_install_packages.sh
23_setup_s3.sh
24_setup_lambda.sh
scripts/
Local Scripts
01_main.sh - calls all other local scripts
02_setup.sh - loads all settings
03_check_settings.sh - checks and prints out the settings
04_create_ec2.sh - new EC2 instance
05_update_ec2.sh - update EC2 if no custom AMI found
06_copy_files.sh - copy files from local to EC2
settings/lambda_files.txt
07_deploy_lambda.sh - calls remote scripts
08_terminate_ec2.sh - terminate EC2 instance
09_test_deployment.sh - curl deployment tests
scripts/
Remote Scripts
12_configure_ec2.sh
13_create_deployment_package.sh
.zip deployment package
14_create_lambda_api_method.sh
15_create_alias_api_method.sh
Deployment to AWS region us-east-1 (N. Virginia)
tidyverse, DBI packages installed
/tmp disk space (500MB) for large packages (e.g., BH)
## R version 3.4.4 (2018-03-15)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 16299)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_3.4.4 backports_1.1.2 magrittr_1.5 rprojroot_1.3-2
## [5] tools_3.4.4 htmltools_0.3.6 yaml_2.1.18 Rcpp_0.12.16
## [9] stringi_1.1.7 rmarkdown_1.9 knitr_1.20 stringr_1.3.0
## [13] digest_0.6.15 evaluate_0.10.1