Mike Badescu, PhD
Dallas R Users Group
March 24, 2018
Authors and Contributors
Typical Data Product Progression
4. Can we use it in production?
HackerNews API
```json
{
  "by": "dhouston",
  "descendants": 71,
  "id": 8863,
  "kids": [9224, 8952, 8917, 8884, 8887, 8943, 8869, 8940, 8958, ...],
  "score": 104,
  "time": 1175714200,
  "title": "My YC app: Dropbox - Throw away your USB drive",
  "type": "story",
  "url": "http://www.getdropbox.com/u/2/screencast.html"
}
```
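A minimal Python sketch of consuming a response like this one, using only the standard library and the sample item above (with the `kids` list shortened; no network call is made). The `time` field is a Unix timestamp in seconds, so it is converted to a UTC date. The helper name `summarize_item` is hypothetical.

```python
import json
from datetime import datetime, timezone

# Sample HackerNews item from the slide above; "kids" shortened to a few ids.
ITEM_JSON = """
{"by": "dhouston", "descendants": 71, "id": 8863,
 "kids": [9224, 8952, 8917],
 "score": 104, "time": 1175714200,
 "title": "My YC app: Dropbox - Throw away your USB drive",
 "type": "story", "url": "http://www.getdropbox.com/u/2/screencast.html"}
"""


def summarize_item(raw: str) -> dict:
    """Parse an item JSON string and keep a few fields of interest."""
    item = json.loads(raw)
    # "time" is seconds since the Unix epoch
    posted = datetime.fromtimestamp(item["time"], tz=timezone.utc)
    return {
        "title": item["title"],
        "author": item["by"],
        "score": item["score"],
        "comments": item.get("descendants", 0),
        "posted_utc": posted.isoformat(),
    }


if __name__ == "__main__":
    print(summarize_item(ITEM_JSON))
```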
Facebook Graph API
```json
{
  "data": {
    "height": 50,
    "is_silhouette": false,
    "url": "https://scontent.xx.fbcdn.net/v/t1.0-1/p50x50/12006203_ ...",
    "width": 50
  }
}
```
Main Tool: curl
Facebook Graph API: using curl from Git Bash
Can we serve API requests from R?
Approaches
plumber: An R package that generates a web API from the R code you already have.
roxygen2
plumb() on the .R file
shiny
Decorators
```r
#* @get /mean
normalMean <- function(samples = 10) {
  data <- rnorm(samples)
  mean(data)
}

#* @post /sum
addTwo <- function(a, b) {
  as.numeric(a) + as.numeric(b)
}
```
```r
install.packages("plumber")
r <- plumber::plumb("docs/plumber_functions.R")
r$run(port = 8000)
```
GET Method
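A sketch of calling the `/mean` endpoint from Python, assuming the plumber router above is running locally via `r$run(port = 8000)`. Query-string parameters are matched to the R function's arguments, so `?samples=100` sets `samples`; it is the Python counterpart of `curl "http://localhost:8000/mean?samples=100"`. The helper names are hypothetical, and plumber's default JSON serializer typically returns a scalar as a one-element array.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: r$run(port = 8000) on this machine


def mean_url(samples: int = 10) -> str:
    """Build the GET URL; query parameters map onto the R function's arguments."""
    return f"{BASE_URL}/mean?{urlencode({'samples': samples})}"


def get_mean(samples: int = 10):
    """Call the endpoint and decode the JSON body (needs the server running)."""
    with urlopen(mean_url(samples), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(mean_url(100))
    # print(get_mean(100))  # uncomment once the plumber server is up
```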
POST Method
```sh
curl --data "a=4&b=3" "http://localhost:8000/sum"
```
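The same POST can be built from Python. `curl --data` sends the fields as an `application/x-www-form-urlencoded` body, and plumber hands `a` and `b` to `addTwo()` as strings, which is why the R code calls `as.numeric()`. A minimal sketch, assuming the server from the previous slides on port 8000; `sum_request` and `post_sum` are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def sum_request(a, b, base_url: str = "http://localhost:8000") -> Request:
    """Build the equivalent of: curl --data "a=4&b=3" .../sum"""
    body = urlencode({"a": a, "b": b}).encode("ascii")
    return Request(
        f"{base_url}/sum",
        data=body,  # supplying a body makes urllib send a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def post_sum(a, b):
    """Send the request and decode the JSON result (needs the server running)."""
    with urlopen(sum_request(a, b), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = sum_request(4, 3)
    print(req.get_method(), req.full_url, req.data)
```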
Output customization, including JSON (default), HTML, PNG, JPEG
```r
#' @get /plot
#' @png
function() {
  myData <- iris
  plot(myData$Sepal.Length, myData$Petal.Length,
       main = "All Species", xlab = "Sepal Length", ylab = "Petal Length")
}
```
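With `@png` the endpoint returns image bytes instead of JSON. A small client-side sketch, assuming the `/plot` endpoint is served on localhost:8000, that saves the body only if it starts with the standard 8-byte PNG signature. `is_png` and `save_plot` are hypothetical helpers.

```python
from urllib.request import urlopen

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def is_png(data: bytes) -> bool:
    """True if the bytes carry the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)


def save_plot(path: str = "plot.png", url: str = "http://localhost:8000/plot") -> str:
    """Download the plot and write it to disk, refusing anything that is not a PNG."""
    with urlopen(url, timeout=10) as resp:
        content_type = resp.headers.get("Content-Type", "")
        data = resp.read()
    if not is_png(data):
        raise ValueError(f"expected a PNG, got Content-Type {content_type!r}")
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```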
Filters
```r
#* @filter checkAuth
function(req, res) {
  if (is.null(req$username)) {
    res$status <- 401 # Unauthorized
    return(list(error = "Authentication required"))
  } else {
    plumber::forward()
  }
}
```
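The control flow of a filter is: run before the endpoint, then either return a response (which stops the request there) or call `forward()` to pass it on. The Python model below only illustrates that flow under these assumptions; it is not plumber's implementation, and `handle` and `check_auth` are hypothetical names. Returning `None` plays the role of `plumber::forward()`.

```python
from typing import Callable, Optional

# A filter returns a response dict to stop processing, or None to "forward".
Filter = Callable[[dict, dict], Optional[dict]]


def check_auth(req: dict, res: dict) -> Optional[dict]:
    """Mirror of the checkAuth filter: reject requests without a username."""
    if req.get("username") is None:
        res["status"] = 401  # Unauthorized
        return {"error": "Authentication required"}
    return None  # equivalent of plumber::forward()


def handle(req: dict, filters: list, endpoint: Callable[[dict], dict]):
    """Run filters in order; the endpoint is reached only if every filter forwards."""
    res = {"status": 200}
    for f in filters:
        early = f(req, res)
        if early is not None:
            return res["status"], early
    return res["status"], endpoint(req)


def hello(req: dict) -> dict:
    return {"msg": f"hello {req['username']}"}


if __name__ == "__main__":
    print(handle({}, [check_auth], hello))                   # rejected with 401
    print(handle({"username": "ann"}, [check_auth], hello))  # reaches the endpoint
```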
Dynamic Routes
/users/13, where 13 is dynamic
```r
#' @get /users/<id>
function(id) {
  subset(users, uid == id)
}
```
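One way to picture how a route template such as `/users/<id>` matches `/users/13`: compile the template into a regular expression with one named group per `<param>`, each matching a single path segment. This is an illustrative Python sketch, not plumber's internal code; `compile_route` and `match_route` are hypothetical. Captured values come back as strings, as in the R handler, where `uid == id` relies on R's coercion.

```python
import re


def compile_route(template: str):
    """Turn '/users/<id>' into a regex with a named group for each <param>."""
    pattern = ""
    for part in re.split(r"(<\w+>)", template):
        if part.startswith("<") and part.endswith(">"):
            pattern += f"(?P<{part[1:-1]}>[^/]+)"  # exactly one path segment
        else:
            pattern += re.escape(part)  # literal text of the template
    return re.compile(f"^{pattern}$")


def match_route(template: str, path: str):
    """Return the captured parameters as a dict of strings, or None if no match."""
    m = compile_route(template).match(path)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(match_route("/users/<id>", "/users/13"))  # {'id': '13'}
```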
Local Server
Web Server / Docker
nginx
OpenCPU: A reliable and interoperable HTTP API for data analysis based on R.
(tmp key for extra info)
Local Server Demo
```r
install.packages("opencpu")
library(opencpu)
ocpu_start_app("rwebapps/appdemo", port = 5656)
```
Local Server UI: http://localhost:5656/ocpu/test/
- test GET and POST methods
Other Apps: https://www.opencpu.org/apps.html
- example: nabel
JavaScript Client on jsfiddle: http://jsfiddle.net/user/opencpu/
- example: Posting Code Snippets
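The same GET and POST calls can be made outside the browser. In OpenCPU's HTTP API, an R function is called with a POST to `/ocpu/library/{package}/R/{function}` (appending `/json` returns the result directly), and the response identifies a temporary session whose resources live under `/ocpu/tmp/{key}/`; per the OpenCPU documentation, the key is sent in the `X-ocpu-session` response header. A sketch assuming the local server on port 5656 from the demo; the helper names and the key `x0123abc` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

OCPU = "http://localhost:5656/ocpu"  # assumption: local server from the demo


def call_url(package: str, function: str, as_json: bool = False) -> str:
    """URL for calling an R function via POST; '/json' returns the value directly."""
    url = f"{OCPU}/library/{package}/R/{function}"
    return url + "/json" if as_json else url


def session_urls(key: str) -> dict:
    """Resources stored under the temporary session key returned by a call."""
    base = f"{OCPU}/tmp/{key}"
    return {
        "value": f"{base}/R/.val",
        "console": f"{base}/console",
        "source": f"{base}/source",
        "stdout": f"{base}/stdout",
        "info": f"{base}/info",
    }


def call(package: str, function: str, **args) -> str:
    """POST the arguments as form fields and return the session key (needs a server)."""
    body = urlencode(args).encode("ascii")
    with urlopen(Request(call_url(package, function), data=body), timeout=10) as resp:
        return resp.headers["X-ocpu-session"]


if __name__ == "__main__":
    print(call_url("stats", "rnorm"))
    print(session_urls("x0123abc")["value"])  # "x0123abc" is a made-up key
```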
Features
tmp key exposes: the input, the R command, the code, the output
Deployment
Same deployment options as plumber, except no RStudio Connect
Web Server / Docker
Function as a Service
FaaS providers (as of March 2018)
FaaS providers (cont.)
Common concerns
Previous Work
rpy2 to call R from Python
Motivation
Top view
How hard can it be?
.zip deployment package, stored on S3
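```r
# Entry point called from Python via rpy2: JSON string in, JSON string out.
# from_json() and to_json() are not base R; they are assumed to be helpers
# defined elsewhere in the project (e.g., thin wrappers around jsonlite).
aws_lambda_r <- function(input_json) {
  output_json <- '{"message": "Cannot create output JSON"}'
  tryCatch({
    input_lst <- from_json(input_json)
    request_id <- input_lst$request_id[1]
    output_lst <- list(
      result_id = request_id,
      result_lst = list(a = 1, b = 2:4),
      result_dbl = 1:10 / 2,
      message = NULL
    )
    output_json <- to_json(output_lst)
  }, error = function(e) {
    # <<- updates output_json in the enclosing function's environment
    output_json <<- paste0('{"message": "', e$message, '"}')
  })
  output_json
}
```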
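The contract of `aws_lambda_r()` is: take a JSON string, return a JSON string, and fall back to a `{"message": ...}` object on any error. A Python mirror of that contract is sketched below, with one deliberate difference: the error message is serialized with `json.dumps`, which escapes quotes, whereas pasting `e$message` into a string can yield invalid JSON if the message contains a double quote. `lambda_r_contract` is a hypothetical name.

```python
import json

FALLBACK = '{"message": "Cannot create output JSON"}'


def lambda_r_contract(input_json: str) -> str:
    """Python mirror of aws_lambda_r(): JSON string in, JSON string out."""
    output_json = FALLBACK
    try:
        input_lst = json.loads(input_json)
        # a missing key gives None, much like R's NULL
        request_id = input_lst.get("request_id")
        output_lst = {
            "result_id": request_id,
            "result_lst": {"a": 1, "b": [2, 3, 4]},
            "result_dbl": [i / 2 for i in range(1, 11)],  # same values as 1:10 / 2
            "message": None,
        }
        output_json = json.dumps(output_lst)
    except Exception as e:  # any failure becomes an error message
        # json.dumps escapes quotes in the message, unlike string pasting
        output_json = json.dumps({"message": str(e)})
    return output_json


if __name__ == "__main__":
    print(lambda_r_contract('{"request_id": 7}'))
    print(lambda_r_contract("not json"))
```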
```python
import os

# Point R at the bundled installation before rpy2 is imported
os.environ["R_HOME"] = os.getcwd()
os.environ["R_LIBS"] = os.path.join(os.getcwd(), 'libraries')

import rpy2
import ctypes
import rpy2.robjects as robjects
import json

# Load the shared libraries shipped in lib/external
for file in os.listdir('lib/external'):
    file_name = 'lib/external/' + file
    ctypes.cdll.LoadLibrary(os.path.join(os.getcwd(), file_name))

# source R file
# this R file might load libraries and source other files
robjects.r['source']('example.R')

# exposing R entry point to python
aws_lambda_r = robjects.globalenv['aws_lambda_r']

def handler_get(event, context):
    input_json = json.dumps(event)
    # the R function returns a character vector; its first element is the JSON text
    # (str() on an rpy2 vector would give R's printed form, e.g. '[1] "..."')
    output_json = json.loads(aws_lambda_r(input_json)[0])
    return output_json
```
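To exercise the handler logic without R or rpy2 installed, one option (an assumption, not part of the original project) is to pass the R entry point in as a parameter and substitute a stub in local tests. The stub below returns a one-element list, standing in for the R character vector that rpy2 returns; `make_handler` and `fake_aws_lambda_r` are hypothetical names.

```python
import json


def make_handler(r_entry_point):
    """Build a Lambda-style handler around any callable with the R function's contract."""
    def handler_get(event, context):
        input_json = json.dumps(event)
        result = r_entry_point(input_json)
        # first element of the returned character vector holds the JSON text
        return json.loads(result[0])
    return handler_get


def fake_aws_lambda_r(input_json):
    """Stand-in for the R function: echoes request_id back as result_id."""
    request = json.loads(input_json)
    return [json.dumps({"result_id": request.get("request_id"), "message": None})]


if __name__ == "__main__":
    handler = make_handler(fake_aws_lambda_r)
    print(handler({"request_id": 42}, None))  # {'result_id': 42, 'message': None}
```

In the deployed function the real `aws_lambda_r` from rpy2 would be passed to `make_handler` instead of the stub.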
What we need
~/.aws/ directory with two files (config and credentials)
~/.ssh/ directory
lambda/
- will contain your .R and .py files before uploading to AWS
python/
- Python scripts (one is usually sufficient)
scripts/
- configuration and deployment
settings/
- default, auto-config, user settings and secrets
settings/
settings_default.sh
secrets_default.sh
setup_auto.sh
setup_user.sh
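If these files are sourced in the order listed, later assignments override earlier ones, so user settings win over defaults. That precedence is an assumption here, as is the `KEY=value` format. The Python sketch below reads simple `KEY=value` or `export KEY=value` lines and merges the files in that order; it does not perform shell variable expansion. `parse_assignments` and `load_settings` are hypothetical names.

```python
import shlex
from pathlib import Path

# Assumed load order: defaults first, user overrides last.
SETTINGS_FILES = ["settings_default.sh", "secrets_default.sh",
                  "setup_auto.sh", "setup_user.sh"]


def parse_assignments(text: str) -> dict:
    """Read simple KEY=value / export KEY=value lines, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, raw = line.partition("=")
        if sep and key.isidentifier():
            tokens = shlex.split(raw)  # strips shell quoting; no $VAR expansion
            values[key] = tokens[0] if tokens else ""
    return values


def load_settings(folder: str) -> dict:
    """Merge the files in order, so later files override earlier ones (like `source`)."""
    merged = {}
    for name in SETTINGS_FILES:
        path = Path(folder) / name
        if path.exists():
            merged.update(parse_assignments(path.read_text()))
    return merged
```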
scripts/
Auto-configuration: settings will be saved in setup_auto.sh
21_setup_vpc.sh
22_setup_custom_ami.sh
11_install_packages.sh
23_setup_s3.sh
24_setup_lambda.sh
scripts/
Local Scripts
01_main.sh
- calls all other local scripts
02_setup.sh
- loads all settings
03_check_settings.sh
- checks and printouts
04_create_ec2.sh
- new EC2 instance
05_update_ec2.sh
- update EC2 if no custom AMI found
06_copy_files.sh
- copy files from local to EC2 (see settings/lambda_files.txt)
07_deploy_lambda.sh
- calls remote scripts
08_terminate_ec2.sh
- terminate EC2 instance
09_test_deployment.sh
- curl deployment tests
scripts/
Remote Scripts
12_configure_ec2.sh
13_create_deployment_package.sh
- .zip deployment package
14_create_lambda_api_method.sh
15_create_alias_api_method.sh
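The numeric prefixes suggest the run order. The hypothetical Python sketch below models what a driver such as 07_deploy_lambda.sh does conceptually: pick the scripts numbered 12 to 15, run them in numeric order, and stop at the first non-zero exit code. The original driver is a shell script, not this code, and the assumption that each script can be run with `bash` is labeled in the comments.

```python
import re
import subprocess


def ordered_scripts(names, first: int, last: int):
    """Pick scripts like '12_configure_ec2.sh' numbered within [first, last], sorted."""
    picked = []
    for name in names:
        m = re.match(r"^(\d+)_.*\.sh$", name)
        if m and first <= int(m.group(1)) <= last:
            picked.append((int(m.group(1)), name))
    return [name for _, name in sorted(picked)]


def run_with_bash(script: str) -> int:
    """Assumption: each script is a plain bash script; returns its exit code."""
    return subprocess.run(["bash", script]).returncode


def run_in_order(scripts, run=run_with_bash):
    """Run each script; stop at the first failure and report (script, exit_code)."""
    for script in scripts:
        code = run(script)
        if code != 0:
            return script, code
    return None, 0
```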
Deployment to AWS region us-east-1 (N. Virginia)
tidyverse, DBI packages installed
/tmp disk space (500MB) for large packages (e.g., BH)
```
## R version 3.4.4 (2018-03-15)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 16299)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## loaded via a namespace (and not attached):
##  [1] compiler_3.4.4  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
##  [5] tools_3.4.4     htmltools_0.3.6 yaml_2.1.18     Rcpp_0.12.16
##  [9] stringi_1.1.7   rmarkdown_1.9   knitr_1.20      stringr_1.3.0
## [13] digest_0.6.15   evaluate_0.10.1
```