SKIL Documentation

The Skymind Platform

SKIL Community Edition (SKIL CE) gives developers an easy way to train powerful deep learning models and deploy them to production environments quickly.

SKIL CE is a free, on-premise, AWS-like platform for machine learning, where data scientists and data engineers can use an open-source stack of machine learning and big data tools. It provides a managed Spark/GPU cluster as well as a managed AI model server for experiment tracking and model deployment, accessible through notebooks and a GUI. The platform is extensible and can act as a job runner for machine learning applications.

Get Started

Welcome to SKIL CE v1.0.1

SKIL CE: A Free Machine Learning Platform

The Skymind Intelligence Layer (SKIL) gives developers an easy way to train powerful deep learning models and deploy them to production environments quickly.

In this quick start, you’ll learn how to:

  • Download and install SKIL CE 1.0.1
  • Create a sample workspace notebook and train a model
  • Deploy the model to the SKIL model server
  • Get a prediction from the REST endpoint
    • Via a Web browser
    • Via Java code

Currently, the SKIL Community Edition directly supports the CentOS 7 and RHEL 7 operating systems. To try SKIL on other systems, please see this page for instructions on setting up our Docker image.

Installing SKIL CE on AWS

A simple way to set up SKIL CE on AWS is to use the official CentOS 7 images.

Specific details: CentOS Linux 7 1701_01 (2018-Mar-31), us-east-1 - AMI: ami-ae7bfdb8 (x86_64)

You will also need to install Git and Apache Maven on the Linux image to complete specific parts of this quickstart.

Setup and Start SKIL CE

First, disable SELinux. Run setenforce 0 to disable it temporarily, or follow this guide to disable it permanently:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Security-Enhanced_Linux/sect-Security-Enhanced_Linux-Enabling_and_Disabling_SELinux-Disabling_SELinux.html

Make sure that ports 9008 and 8080 are open.
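
On a CentOS 7 instance with firewalld enabled, these ports can be opened as follows (a sketch only; adapt it to whatever firewall your environment actually uses):

```shell
# Open the SKIL UI/API port (9008) and the secondary port (8080), assuming firewalld is active
sudo firewall-cmd --permanent --add-port=9008/tcp
sudo firewall-cmd --permanent --add-port=8080/tcp
# Apply the new rules
sudo firewall-cmd --reload
```

On AWS, also make sure the instance's security group allows inbound traffic on these ports.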

To start the SKIL CE system locally, use the following command:

[centos@skil ~]$ sudo systemctl start skil

Wait about a minute for SKIL to finish starting. Everything is up and running when the output of jps looks like this:

$ jps
38802 ZeppelinInterpreterMain
43044 Jps
38341 ModelHistoryServer
38295 ProcessLauncherDaemon
38363 ZeppelinMain

Once SKIL CE is running locally, log in to build and deploy your first model.

Log In to SKIL

To log in to a locally installed version of SKIL CE, visit http://[host]:9008 in a Web browser. This will lead to the login screen below.

For the Community Edition of SKIL, there's a single account and login:

Login: “admin”
Password: “admin”

SKIL CE is meant to be used and evaluated by a single data scientist, so in this quickstart article, we will ignore role and account management.

Next, this guide will show how to build a notebook and deploy it to the model server.

Taking a Model from Notebook to Deployment

The purpose of SKIL and SKIL CE is to prototype deep learning applications quickly, and deploy the best models to a production-grade AI model server.

Deep learning models can be complex to build and unwieldy to deploy. SKIL addresses these pain points for data scientists and infrastructure engineering teams. Deep learning has a wide domain of applications in predictive analytics and machine perception, and it impacts nearly every industry.

Here are a few deep-learning models that can be integrated into applications:

  • ICU Patient Mortality
  • Computer Vision
  • Anomaly Detection

Beyond building the above models, SKIL lets developers plug their predictive engines into real-world applications such as:

  • A Tomcat Web application
  • A Wildfly application
  • A mobile application
  • A streaming system (e.g. Streamsets)
  • Robotics systems and edge devices (Somatic, Lego Mindstorms)

SKIL helps data scientists build deep learning models and integrate them into applications with a workbench and AI model server.

With it, you can track multiple versions of a model, compare how each model performs after training, and deploy the best model to production on SKIL’s AI model server with a single click.

With SKIL CE, developers can start building real, state-of-the-art deep learning applications without worrying about infrastructure and plumbing.

The diagram below provides a general overview of how the SKIL workspace system and SKIL’s AI model server work together to operationalize deep learning applications for enterprise.

Here’s how to build your first SKIL model with SKIL CE.

Create Workspace

To get started, create a new workspace in the SKIL user interface by following these steps:

1) Click on the workspaces tab on the left side of the main SKIL interface to bring up the workspaces screen, as seen in the image below.

Every new workspace is a place for data science teams to conduct a set of “experiments” centered around a given project or problem to solve.

“Experiments” are just different algorithm architectures and configurations, linked with data pipelines and applied to a given problem. A workspace is a shared laboratory that data scientists can use to test which neural networks perform best on that problem. An experiment in SKIL can be contained in a Zeppelin notebook, which allows you to save, clone, and run a specific modeling job as needed.

Each notebook is tracked and indexed by SKIL, and a trained model that has been configured in the notebook can be sent directly to SKIL’s AI model server. A SKIL workspace can have many different notebooks containing different experiments that are carried out as data scientists seek the best model.

2) Create a Workspace

After clicking on the workspaces tab on the left, click on the “Create Workspace” button on the right side of the page (see below).

Clicking on the “Create Workspace” button brings up this dialog window:

In this window, name the workspace. Keep in mind that the name should distinguish it from all other workspaces created for your team's projects.

For this tutorial, you can call this workspace “First Sensor Project.” We can also add a few labels to this workspace to help identify it later (e.g. “side_project” or “prod_job”). When you’re ready to finalize the new workspace, click “Create Workspace” on the lower right corner of the window.

A new workspace should appear in the list of workspaces:

Clicking on the name of the workspace just created (“First Sensor Project”) will show the workspace details:

Now you are ready to create the first experiment in the new workspace.

Create Experiment

Inside a workspace, you can create multiple experiments, all of which will be contained in separate Zeppelin notebooks. You and your team can create, run, and clone these notebook-experiments to improve collaboration and speed up time to insight.

By clicking on the “Create New Experiment” button, you’ll bring up the “Create Experiment” dialog window (below).

Using the input box under “Experiment Name”, give this experiment a unique and descriptive name that will make it easy to find later.

Select the only listed option for “Model History Server ID” and “Zeppelin Server ID” (in the present version of SKIL CE, there will be only one option for each of them). You can also provide a distinct notebook name that will apply within the Zeppelin notebook storage system.

Once the experiment and its notebook are set up, click the “Create Experiment Notebook” button. The new experiment should appear in the list of experiments in the current workspace (below).

With the experiment created, check out the associated experiment notebook by clicking the “Open Notebook” button for the new experiment. That will bring up the embedded notebook system (below).

Note: the first time you use a notebook in SKIL, you need to initialize the SKIL system. Click “Login” within the Zeppelin window and log in with the username “admin” and password “admin”.

Once logged in, click the “Notebook” drop-down menu and select your experiment’s notebook.

Each notebook is initiated with a generic template containing DL4J code that could serve as the basis of a typical deep learning project.

For this example, you’re going to build an LSTM model based on sensor data. The code for this notebook is here:

uci_quickstart_notebook.scala

In this specific example, delete the blocks of template code and copy and paste blocks of code from the GitHub link above into the notebook. The notebook should save automatically, but to save it at any point, you can make a specific commit to the Zeppelin version control system with the “version control” button, as shown below.

You can click on the version control button inside Zeppelin to add a commit message when saving the current state. Once the notebook code is entered and saved, you’re ready to execute the notebook and produce the model.

Run Experiment

To run the experiment in the notebook, click on the “play” icon on the top toolbar inside the embedded Zeppelin notebook UI. This will run all of the code paragraphs inside that notebook.

The notebook will take some time to run. Once it’s complete, its output will be visible within the notebook itself.

This guide will walk through a few of the notebook's code snippets. The first paragraph runs all the needed imports (in Scala) and sets the other paragraphs up for execution.

There are four major functional areas within this notebook:

  1. UCI data download and data prep / ETL
  2. Neural network configuration
  3. Network Training loop
  4. Registering modeling results with SKIL model server

The first area of the highlighted code downloads the training data and performs some basic ETL. In the linked code, the raw CSV data is:

  1. Loaded from disk
  2. Converted into sequences
  3. Analyzed to collect statistics
  4. Normalized/standardized using those statistics
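
The standardization in step 4 subtracts the mean and divides by the standard deviation collected in step 3, per feature. Here is a minimal stdlib sketch of that transform, for illustration only; the notebook itself relies on DL4J's normalization utilities:

```java
import java.util.Arrays;

public class Standardize {
    // Standardize a sequence: (x - mean) / stddev.
    // Assumes the sequence has nonzero variance.
    static double[] standardize(double[] xs) {
        double mean = Arrays.stream(xs).average().orElse(0.0);
        double var = Arrays.stream(xs).map(x -> (x - mean) * (x - mean)).average().orElse(0.0);
        double std = Math.sqrt(var);
        double[] out = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            out[i] = (xs[i] - mean) / std;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] seq = {1.0, 2.0, 3.0, 4.0, 5.0};
        // The result has mean 0 and standard deviation 1
        System.out.println(Arrays.toString(standardize(seq)));
    }
}
```

In the real pipeline, the mean and standard deviation are computed on the training set only and then applied unchanged to the test set, so the two sets are scaled identically.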

In the next snippet of code, we configure the neural network with the DL4J API. In this example, we’re using a variant of a recurrent neural network called a long short-term memory network (LSTM).

// Configure the network
val conf: ComputationGraphConfiguration = new NeuralNetConfiguration.Builder()
  .seed(123)
  .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
  .iterations(1)
  .weightInit(WeightInit.XAVIER)
  .updater(new Nesterovs(0.005, 0.9))
  .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
  .gradientNormalizationThreshold(0.5)
  .graphBuilder()
  .addInputs("input")
  .setInputTypes(InputType.recurrent(1))
  .addLayer("lstm", new GravesLSTM.Builder().activation(Activation.TANH).nIn(1).nOut(10).build(), "input")
  .addVertex("pool", new LastTimeStepVertex("input"), "lstm")
  .addLayer("output", new OutputLayer.Builder(LossFunction.MCXENT)
            .activation(Activation.SOFTMAX).nIn(10).nOut(numLabelClasses).build(), "pool")
  .setOutputs("output")
  .pretrain(false)
  .backprop(true)
  .build()

The network has a single LSTM hidden layer (Graves variant) and a softmax output layer that gives probabilities across the six classes of time series data, making it a six-way classifier. The network is set to train for 40 epochs (each epoch is a complete pass over all records in the training set).

val nEpochs: Int = 40
for (i <- 0 until nEpochs) {
  network_model.fit(trainData)
  // Evaluate on the test set:
  val evaluation = network_model.evaluate(testData)
  val accuracy = evaluation.accuracy()
  val f1 = evaluation.f1()
  println(s"Test set evaluation at epoch $i: Accuracy = $accuracy, F1 = $f1")
  testData.reset()
  trainData.reset()
}

Right below the training loop in the notebook, a few debug lines show how to query the trained LSTM network:

// Test one record (label should be 1)
val record = Array(Array(Array(
  -1.65, 1.38, 1.37, 2.56, 2.72, 0.64, 0.76, 0.45, -0.28, -2.72, -2.85, -2.27, -1.23, -1.42, 0.90,
  1.81, 2.77, 1.12, 2.25, 1.26, -0.23, -0.27, -1.74, -1.90, -1.56, -1.35, -0.54, 0.41, 1.20, 1.59,
  1.66, 0.75, 0.96, 0.07, -0.70, -0.32, -1.13, -0.77, -0.96, -0.55, 0.39, 0.56, 0.52, 0.98, 0.91,
  0.23, -0.13, -0.31, -0.98, -0.73, -0.85, -0.77, -0.80, -0.04, 0.64, 0.77, 0.50, 0.98, 0.40, 0.24
)))
val flattened = ArrayUtil.flattenDoubleArray(record)
val input = Nd4j.create(flattened, Array(1, 1, 60), 'c')
val output = network_model.output(input)
val label = Nd4j.argMax(output(0), -1)
println(s"Label: $label")
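
The Nd4j.argMax call at the end simply selects the index of the largest softmax probability, which is the predicted class label. In plain Java terms (a stdlib sketch for illustration, not the DL4J implementation):

```java
public class ArgMax {
    // Return the index of the largest value: the predicted class label
    static int argMax(double[] probs) {
        int best = 0;
        for (int i = 1; i < probs.length; i++) {
            if (probs[i] > probs[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical softmax output over six classes, peaked at index 1
        double[] probs = {0.01, 0.97, 0.005, 0.005, 0.005, 0.005};
        System.out.println(argMax(probs)); // prints 1
    }
}
```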

The label returned from the network prediction should be “1”, which we’ll check by hand from the client side in a moment. The notebook ends with a block of code that collects the model just trained and catalogs it in the model-history tracking system.

Each experiment needs to send a model and its evaluation metrics to the model server to be registered and archived. Each SKIL notebook must include a small amount of code, explained below, to make sure the model gets stored properly.

val modelId = skilContext.addModelToExperiment(z, network_model)
val evalId = skilContext.addEvaluationToModel(z, modelId, evaluation)

These lines depend on the correct imports and on the skilContext object created near the top of the notebook; they connect the notebook with the rest of the SKIL system.

In the first line, the specific model is attached to the experiment in the SKIL system. In the next line, the evaluation metric results are cataloged with the model ID tag in SKIL, so the model can be evaluated in the UI against other models later.

Catalog Model

The AI model server allows SKIL to store and integrate deep learning models with AI applications. It stores all of the model revisions for a given experiment, and lets you choose which model you’d like to “deploy” or mark as “active”. Deploying a particular model means that it will serve predictions to any production applications that query a particular REST endpoint.

After the above notebook example is done running in SKIL, click on the “Models” sub-tab in the experiment page to see the new model listed in the table (as below). This may require a page refresh.

Now that you’ve built a model with the notebook and made sure the model was cataloged in the model server, the next guide will show how to expose the model to the rest of the world through the REST interfaces by deploying the model.

Deploy Model

Once the model is indexed in the model history server, it will show up in the list of models that can be deployed to production to handle new data inference requests.

In the “Models” sub-tab in the experiment page, there’s a list of all models produced by notebook runs for this experiment, as seen above. Clicking on one of the models in the list will bring up specific model details (below).

For each model in this list, two operations can be performed:

  1. Mark Best
  2. Deploy

Marking a model as “Best” will pin it to the top of the model list, as seen above. Clicking the “Deploy Wizard” button brings up a “deploy model” dialog window.

Within SKIL, a deployment is a logical group of models, transforms, and KNN endpoints. Grouping deployed components this way makes it easier to track which components belong together and to manage the system.

As explained in the dialog, this wizard will make your model available via a REST API. You can expose the ETL process as a transform endpoint, and configure the model. You also have the option to update an existing model in place. Clicking “next” will let you either create a new deployment or replace an existing one, as seen in the dialog below.

Let’s create a new deployment and name it “OurFirstDeployment”. In the dialog window that comes up after pressing the “Next” button, you see the option to “deploy the current ETL JSON as a transform”.

This refers to the vectorization of data for ETL (explaining the details of that process is beyond the scope of this quickstart article). For now, you can leave that checkbox unchecked. Clicking “Next” again takes us to the final deployment wizard screen (below).

The “name” option differs from “deployment name”, as it distinguishes the model within the deployment group. It is a required field. For this example, use the name “lstm_model”. You can also see the static file path for the physical model file in the local filesystem.

The “Scale” option tells the system how many model servers to start for model replication. SKIL CE is limited to 1 model server, so you don’t have to change that parameter.

The next line lets you provide additional JVM arguments. The last option, “Endpoint URLs”, allows housing multiple models under the same URI; we won’t set it in this quickstart. Accept the “Deployment Mode” as “New Model”, and then click “Deploy” to finalize the deployment.

Once the deployment is finalized by clicking the “Deploy” button, the model will be listed in the “Deployments” screen (below).

Clicking on the entry for this newly deployed model brings up the deployment details (below).

NOTE: If the model is not deployed, click the “Start” button on the model:

This deployment includes its respective ETL vectorization transforms. It also includes the endpoint URLs tied to the model server in the “endpoint” column.

Troubleshooting

You may see the following error at some point (Deploying Model, No JWT present or has expired):

If so, leave the current browser tab open. Create a new tab and log in again. Close the new tab once you’re logged back in. Try once more to perform the original action that caused the error.

Inference

Get Predictions via Model Server

Once a deployment has been created and launched, you need to go the “last mile” by actually serving live predictions from this newly created AI model to a real application.

In this section, you’ll learn how to set up a sample Java client (which could easily be integrated into a Tomcat or Java application) to query the model you just built with the notebook.

The model server is running locally and it has exposed a REST endpoint at:

http://[host]:9008/endpoints/ourfirstdeployment/model/lstmmodel/default/

Now configure the client to “speak REST” properly and send the specific query using input data that you select. Let’s take a look at how to get these predictions, with the REST Java client sample code below.
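
Judging from the URL above, the endpoint path follows a fixed pattern in which the deployment and model names appear lowercased with non-alphanumeric characters stripped. A small hypothetical helper for composing such a URL (the slug rule here is an assumption inferred from the example, not a documented API; the sample client below handles the actual request serialization):

```java
public class EndpointUrl {
    // Compose a SKIL model-server endpoint URL from host, deployment name, and model name.
    static String endpoint(String host, String deployment, String model) {
        return String.format("http://%s:9008/endpoints/%s/model/%s/default/",
                host, slug(deployment), slug(model));
    }

    // Assumed naming rule: lowercase, non-alphanumerics removed
    static String slug(String name) {
        return name.toLowerCase().replaceAll("[^a-z0-9]", "");
    }

    public static void main(String[] args) {
        // Reproduces the endpoint shown above for this quickstart's deployment
        System.out.println(endpoint("localhost", "OurFirstDeployment", "lstm_model"));
    }
}
```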

Get Predictions via REST

To get a working SKIL model server client, “git clone” the project stored here...

https://github.com/SkymindIO/SKIL_Examples

...with the following command:

$ git clone git@github.com:SkymindIO/SKIL_Examples.git

Build the client application JAR with the following commands:

$ cd SKIL_Examples
$ cd sample-api
$ mvn clean package

Once you build the client application, use the JAR to make a REST call to SKIL’s AI model server with the following command:

$ java -jar target/skil-ce-examples-1.0.0.jar quickstart http://host:9008/[skil_endpoint_name]

Note

Replace [skil_endpoint_name] with the endpoint to which your model was deployed.

The output of the client example code should look like this:

Inference response: Inference.Response.Classify{results{[1]}, probabilities{[0.9729845]}}
  Label expected: 1
Inference response: Inference.Response.Classify{results{[4]}, probabilities{[0.8539419]}}
  Label expected: 4
Inference response: Inference.Response.Classify{results{[5]}, probabilities{[0.9414516]}}
  Label expected: 5
Inference response: Inference.Response.Classify{results{[5]}, probabilities{[0.94135857]}}
  Label expected: 5

The output above shows the inference predictions returned by the SKIL AI model server, along with the expected labels from the classifier.

With that, you’ve built your first deep learning application with SKIL, from notebook to deployed production model. Not so hard, was it?

Watch this space for further tutorials on more applications of deep learning in production.
