Tensorflow JS: Linear Regression with Webpack and ES6 Classes

AI in the browser [1 of 2] Creating a Tensorflow JS model in a Webpack environment

This article acts as an introduction to using Tensorflow JS in the browser, adopting ES6 class syntax within a Webpack environment. The accompanying demo for the project discussed can be found on Github.

Before structuring a linear regression prediction model with @tensorflow/tfjs and the accompanying graphing solution, @tensorflow/tfjs-vis, we will set up a bare-bones Webpack environment specifically for building Tensorflow JS solutions, one that supports the latest features of Javascript while providing polyfills for the final build of the app.

We will refer to, and expand upon, the two-dimensional linear regression tutorial hosted in the Tensorflow JS tutorials section, refactoring the code to coincide with modern development standards and expanding on some of the concepts mentioned for a clearer understanding of how Tensorflow JS works and its capabilities.

Prelude: AI is now in the browser, with Tensorflow JS

Tensorflow models in the browser have never been more capable, with Tensorflow JS and the accompanying Javascript ecosystem of tools to aid in the development of predictive solutions. The online demos are already quite impressive, with open source models readily available to test-drive what Tensorflow JS is capable of, from predicting human posture to the sentiment of textual content.

With Tensorflow’s primary implementation being in Python, it may be a tough ask for Javascript developers to onboard another programming language and environment into their workflow, on top of learning the concepts required for developing machine learning solutions. This paradigm changes with Tensorflow JS, with the package allowing Javascript developers to get their hands dirty with the set of tools and syntax they’re familiar with, leaving the concepts and terminology of ML as the main barrier to entry.

How capable is Tensorflow JS now?

Tensorflow JS utilises WebGL to process models, the code of which is wrapped with APIs you will be readily familiar with, mostly following the tf.<method>() syntax. Some methods are promise based, whereas others are synchronous. Some utilise callback functions as arguments, and objects are heavily used to pass configurations into a model. In any case, the API is an up-to-date implementation that is in line with the latest features of Javascript.

Because WebGL has no garbage collection, a common design pattern with Tensorflow JS is to wrap Tensorflow APIs with tf.tidy(), a utility that cleans up any tensors defined in that block after its execution. We also have the option to return needed data from tidy() blocks. On top of this, tf.dispose() can be used to clean up an object containing tensors too.
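To make the pattern concrete, here is a minimal plain-JavaScript sketch of the idea behind tf.tidy(); the tidy and makeTensor helpers below are hypothetical stand-ins for illustration, not the Tensorflow JS internals:

```javascript
// A simplified, plain-JavaScript sketch of the tf.tidy() pattern.
// This `tidy` is a hypothetical stand-in: it tracks every resource
// registered during the callback and disposes all of them afterwards,
// except for the value the callback returns.
function tidy(fn) {
  const tracked = [];
  const track = (resource) => {
    tracked.push(resource);
    return resource;
  };
  const returned = fn(track);
  // dispose everything except the returned resource
  tracked.filter((r) => r !== returned).forEach((r) => r.dispose());
  return returned;
}

// a stand-in for a tensor: just an object with a dispose() method
const makeTensor = (name) => ({
  name,
  disposed: false,
  dispose() { this.disposed = true; },
});

const temp = makeTensor('intermediate');
const result = makeTensor('result');

// `temp` is cleaned up when the block ends; `result` survives
const kept = tidy((track) => {
  track(temp);
  return track(result);
});
```

The real tf.tidy() behaves analogously for tensors allocated on the GPU: intermediate tensors created inside the callback are released, while the returned tensor is kept alive for later use.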

Tensor: Multi-dimensional array of numbers

A Tensor is just a matrix of numbers represented as a multi-dimensional array.

Tensors, holding these matrices of numbers, are passed through a prediction model, for which Tensorflow JS provides high-level APIs for defining and training. Nevertheless, the term Tensor is used heavily throughout the API and documentation, and should not be confused with some fancy type of data: tensors are just sets of data as multi-dimensional arrays, transformed as they are passed (or flow) through a model, hence the name Tensorflow.
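As a concrete illustration, the following plain-JavaScript snippet derives the shape of a nested array; shapeOf is a hypothetical helper, mirroring the shape property a Tensorflow JS tensor reports for the same data:

```javascript
// Derive the shape of a nested array: the length of each nesting level.
// A rank-1 array has a one-entry shape, a rank-2 array (a matrix) has two.
function shapeOf(data) {
  const shape = [];
  let level = data;
  while (Array.isArray(level)) {
    shape.push(level.length);
    level = level[0];
  }
  return shape;
}

const scalarList = [1, 2, 3];            // rank 1
const matrix = [[1, 2], [3, 4], [5, 6]]; // rank 2: 3 rows, 2 columns
```

Running shapeOf(matrix) yields [3, 2], matching the shape Tensorflow JS would report for a tensor built from the same nested array.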

Now, before setting up our Webpack environment and implementing our model, let’s briefly go over the solution and what is required.

Solution Briefing

Implementing a model with Tensorflow JS follows the same design pattern as its Python counterpart. We will implement the model as a class, called VehicleEfficiency, predicting the mpg (miles per gallon) of vehicles based on their horsepower.

This type of model has a one-to-one mapping. In other words, we wish to feed one piece of data into the model (horsepower), which will then output one other piece of data (mpg).

In order to do this, we first need to define and train a model with a reliable (and optimally large and diverse) data set. Once the model is trained, we can then pass new horsepower data into it to generate a predictive output: mpg.

The execution process resembles the following stages:

// execution of a tensorflow JS prediction solution
get data -> define model -> format data for model -> train model -> make predictions

With this high level conceptual understanding, it already becomes clear how our VehicleEfficiency class can be structured to house these functionalities:

// models/VehicleEfficiency.js
class VehicleEfficiency {
  // class properties to handle configuration of the model
  // configure the instantiated object within the constructor
  constructor(config) {
  }

  // a method that retrieves the raw data we need to train the model
  async getData() {}

  // a method for defining the layers of the model and the model type itself
  createModel() {}

  // a method to format data into tensors
  dataToTensors() {}

  // a method to train the model
  async train() {}

  // a method to make predictions
  predict(inputData) {}

  // an initialiser method that creates and trains the model, ready for predictions
  async init() {
    await this.getData();
    this.createModel();
    this.dataToTensors();
    await this.train();
  }
}

Note that a couple of the methods here are asynchronous, marked with the async keyword. This is because they will hold promise-based functions, both from the Tensorflow JS API and vanilla Javascript. Using await in our main execution function allows us to halt execution until these methods complete. We will break down each of these methods throughout this article.
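The role of await in init() can be sketched with a stripped-down stand-in class; the method names match our skeleton, while the bodies are placeholders that simply record execution order:

```javascript
// Stripped-down sketch of the init() flow: each async stage completes
// before the next begins, because init() awaits it. The method bodies
// are placeholders that record when each stage ran.
class PipelineSketch {
  constructor() {
    this.stages = [];
  }
  async getData() { this.stages.push('getData'); }
  createModel() { this.stages.push('createModel'); }
  dataToTensors() { this.stages.push('dataToTensors'); }
  async train() { this.stages.push('train'); }
  async init() {
    await this.getData();
    this.createModel();
    this.dataToTensors();
    await this.train();
  }
}
```

After awaiting init(), the stages array reads in the exact order the pipeline describes: data first, then model creation, tensor formatting, and training.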

VehicleEfficiency will be instantiated within a run() function in our Webpack entry point file, index.js. The next section will document how to set up a dedicated Webpack environment for bundling our app.

This is by no means a comprehensive Webpack tutorial, but it will demonstrate the process of installing and configuring Webpack, along with the plugin concepts needed to support Javascript features such as classes and class properties, and to provide polyfills.

Alternatives to Webpack for Tensorflow JS

Alternatives to a bare-bones Webpack setup include adopting a Javascript framework such as React, via Create React App; doing so will only require installing the @tensorflow packages the app will use, which can then be imported into your classes and/or components.

Tensorflow JS documentation has opted for the <script> tag to include the required dependencies for each tutorial, but this is not realistic for many developers who opt for using a Javascript framework and package manager to maintain dependencies.

On top of this, the official Tensorflow JS tutorials simply define all functions and executions in one file, which is also an unrealistic proposition with today’s modern development tools, which rely on modules, classes, publicly and privately scoped packages, and more. These issues were some of the main motivations for writing this article, presenting a more viable approach for implementing Tensorflow JS models.

Note: If you are interested in creating a more complete Webpack environment, a dev server, HMR (hot module reloading, for real-time page refreshing) and css-loader support are just some Webpack features to explore, but they are out of the scope of this article.

Setting up a Webpack project for Tensorflow JS

As mentioned previously, Webpack will give us exposure to the cutting edge of Javascript, while providing the building tools to polyfill where necessary for the browser. To get started with our Webpack environment, set up a project folder and install the required dependencies:

# create project directory
mkdir tfjs-linear-reg
cd tfjs-linear-reg

# initiate a new package.json
yarn init

# install tensorflow js and webpack dependencies
yarn add webpack webpack-cli \
  @tensorflow/tfjs @tensorflow/tfjs-vis

# install babel plugins
yarn add --dev babel-loader babel-polyfill \
  @babel/core \
  @babel/preset-env \
  @babel/plugin-proposal-class-properties
  • The webpack-cli package gives us a means of configuring Webpack from the command line, which we will use to build the project further down.
  • @tensorflow/tfjs provides the Tensorflow framework itself, whereas @tensorflow/tfjs-vis provides a graphing solution that is overlaid on top of your app. We will see further down what this panel consists of; it can be expanded and collapsed with ` and maximised with ~.
  • We will insert a Babel polyfill and class properties extension to transpile the final build to ES5 and support class properties respectively.

Note: Babel is the most commonly used Javascript compiler, allowing us to use “next generation Javascript, today” per their slogan. Babel can be used as a standalone package, but also provide plugin support for app bundlers like Webpack — which is what we will be utilising. Find out more about Babel on their website.

package.json build script

Add the following script to package.json, giving us a shortcut for compiling our project:

// package.json
{
  "scripts": {
    "build": "webpack --config webpack.config.js"
  },

  ...
}

Within our build script we are running the webpack command, along with the --config flag pointing to our webpack configuration file, webpack.config.js. Running yarn build in the project directory will now compile our project.

Also notice that in webpack.config.js, covered next, the mode option is assigned a value of development. Doing so creates a much more verbose build that is not optimised as a production build would be, which can be useful for troubleshooting post-build.

Let’s now explore the requirements of the webpack.config.js, where our Babel plugins and other project configuration are defined.

Note: You can also build a Webpack project with npx, by running npx webpack.

Webpack configuration in webpack.config.js

Briefly visiting webpack.config.js, the following module makes up all that we need for Tensorflow JS:

// webpack.config.js
const path = require('path');

module.exports = {
  mode: 'development',
  entry: ['babel-polyfill', './src/index.js'],
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: 'main.js'
  },
  module: {
    rules: [
      {
        test: /\.js$/,
        exclude: /(node_modules)/,
        use: {
          loader: 'babel-loader',
          options: {
            presets: ['@babel/preset-env'],
            plugins: ['@babel/plugin-proposal-class-properties']
          }
        }
      }
    ]
  }
};

We’ve included two entry points — one for our application index, and another for Babel’s polyfill package. Let’s run down the key points here:

  • Entries in Webpack are files that act as starting points for the bundling process. Modules and assets used are recursively scanned (based on rules) and included in the final build from these entry points.
  • Our output directory is a dist/ folder sitting in our project folder. This is a common convention. The output path requires an absolute path, which is why we are using path.resolve().
  • Module rules dictate what is bundled, and under what conditions. Our only rule here is to include all .js files, which we have defined under the test field as a regular expression. Each rule allows presets and plugins to be set; this is where our Babel plugins are included.

A complete reference for Webpack configurations can be found on their website.

Note: It is common to have many more rules, including ones for CSS files, image files, and more. To keep things bare bones, we only bundle the Javascript files needed for our Tensorflow JS tests here.

Project structure

With our configuration now in place, we can set up the project folder structure. Refer to the final project on Github for the completed structure and implementation. The key takeaways are:

  • To have an index.html file with our final build script, dist/main.js, embedded within
  • To have a src/ directory with our index.js file — the entry point of the app
  • To include a models/ directory that will host our Tensorflow JS models:
dist/
src/
  index.js
  models/
    VehicleEfficiency.js
index.html
package.json
webpack.config.js
...

With our project now set up ready to run Tensorflow JS, let’s jump into VehicleEfficiency.js and examine how the class is constructed.

Tensorflow JS VehicleEfficiency Class

VehicleEfficiency (here on Github) handles every aspect of our model: gathering and formatting data, creating and training the model, and making predictions. Let’s examine some of the Tensorflow JS APIs, and key Javascript utilities used in this class.

Class properties have been used to store basic values, such as the remote raw data URL, as well as some model configurations and placeholders of objects our model will overwrite as it is running.

Note: Javascript does not yet have a finalised implementation of private class properties. If you are concerned about accidentally accessing properties that should not be accessed, a Typescript implementation may be a preferable option.

The tfvis property is a boolean that toggles the usage of the tfjs-vis package, that displays graphing tools as our model is being trained.

class VehicleEfficiency {
  tfvis = true;
  ...
}

Within our class methods, this just equates to an if statement, where we will execute our tfvis functions if set to true:

// toggling tfjs-vis graphs with this.tfvis
if (this.tfvis) {
  // generate graphs here
  tfvis.render.scatterplot(
    ...
  );
}

tfvis.render.scatterplot() is just one analysis tool in an impressive collection that can be browsed in the official tfjs-vis API docs. With tfvis set to true, graphs will be generated as the model is trained and predictions are made, displayed on the right side of your app, overlaying existing content:

how tfjs-vis looks in the browser

tfjs-vis embeds a collapsible sidebar that hosts our model analysis. It consists of a rich API for generating a range of tables, graphs and analysis tools.

Note: It is most likely you will want tfvis to run all the time in development mode. You may wish to refer to the NODE_ENV environment variable to determine whether to execute it, or remove tfvis completely at build time to cut down on the bundle size of your app.

Gathering data

Within getData(), a list of car details is fetched from a remote URL as a JSON object. This data contains a few properties for each car, but we are only interested in horsepower and mpg. Therefore, after fetching the complete list, we use Array’s map() method to reconstruct a cleaned object with just the two properties. In addition, filter() is also called to remove entries where either mpg or horsepower is missing:

// fetch data and format the result as a JSON object
const carsDataReq = await fetch(this.dataUrl);
const carsData = await carsDataReq.json();

// map data to create a `cleaned` object storing mpg and horsepower,
// only including entries where both values are defined
const cleaned = carsData.map(car => ({
  mpg: car.Miles_per_Gallon,
  horsepower: car.Horsepower,
}))
.filter(car => (car.mpg != null && car.horsepower != null));

Cleaning data is a common task in ML; preparing data in general is a big job that becomes exponentially larger as your data set grows. Ensuring values exist is critical for training models, where one missing or anomalous value could skew predictions.
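As a sketch of that idea, the null check shown earlier could be extended to reject non-numeric or clearly anomalous entries as well; the isPlausible helper and its bounds below are hypothetical, purely for illustration:

```javascript
// Hypothetical stricter cleaning pass: reject missing, non-numeric
// and out-of-range values before training. The bounds are illustrative.
const isPlausible = (car) =>
  Number.isFinite(car.mpg) &&
  Number.isFinite(car.horsepower) &&
  car.mpg > 0 && car.mpg < 200 &&
  car.horsepower > 0 && car.horsepower < 2000;

const raw = [
  { mpg: 18, horsepower: 130 },   // kept
  { mpg: null, horsepower: 165 }, // missing mpg, dropped
  { mpg: 26, horsepower: NaN },   // non-numeric horsepower, dropped
];
const cleaned = raw.filter(isPlausible);
```

Number.isFinite rejects null, undefined and NaN in one check, which keeps the predicate compact compared with chaining separate null checks.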

Luckily, the data set provided for this example has also been labelled for us. Labelling datasets is another task in and of itself: each feature of a model, and every value, needs to be labelled and formatted in a consistent manner. This may equate to labelling a range of facial features across a library of images for facial recognition, or, for a more advanced VehicleEfficiency model, thousands of car statistics fed into the model.

As we can see from above, Javascript has useful functions built in for reformatting, filtering and mapping objects, as well as native support for JSON.

The Tensorflow JS data APIs give us an idea of the wealth of methods available for collecting and storing data for ML, although diving into these methods is a topic for another article. The key takeaway here is that these tools make Javascript quite an ideal solution for preparing a diverse range of data for ML training.

Creating the model

The model creation for linear regression tasks is quite simple — we are adopting a sequential model and feeding our data through two layers.

A layer is simply a computational stage in our model, whose input is our raw data formatted as tensors. Each layer transforms this data and outputs the result as a tensor. In a sequential model, tensors flow through each layer, with the output of the previous layer determining the input of the next.

Concretely, a sequential model is a setup whereby the output of one layer will be the input of the following layer — no layers are skipped, and data is not dynamically moved to other branches of layers.
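The rule that the output of one layer becomes the input of the next can be sketched in plain JavaScript as a fold over layer functions; this is a conceptual stand-in, not how Tensorflow JS is implemented:

```javascript
// Conceptual sketch of a sequential model: each layer is a function,
// and the output of one layer becomes the input of the next.
const runSequential = (layers, input) =>
  layers.reduce((data, layer) => layer(data), input);

// two toy "layers": a scale followed by a shift
const layers = [
  (x) => x * 2, // layer 1: doubles its input
  (x) => x + 1, // layer 2: shifts its input
];
```

With an input of 3, the data flows through both layers in order (3 becomes 6, then 7); no layer is skipped and none is revisited, which is exactly the constraint tf.sequential() imposes.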

Note: The alternative to tf.sequential() is tf.model(), a more generic model that makes fewer assumptions about how data flows between layers.

Although this may sound complex, the creation of the model is rather straightforward, and is defined in createModel():

// creating a sequential model with two layers
createModel() {
  // instantiate a model
  const model = tf.sequential();

  // hidden layer
  model.add(tf.layers.dense({ inputShape: [1], units: 1, useBias: true }));

  // output layer
  model.add(tf.layers.dense({ units: 1, useBias: true }));

  // assign model as class property
  this.model = model;
  ...
}

Hidden layers are intermediary layers before the final output layer. The terminology simply signifies that the output of such a layer will be passed to the next layer as an input. The output layer, on the other hand, produces the final predictive value.

Let’s break down this syntax:

  • A dense layer in Tensorflow hosts “weights”: a matrix of numbers that are trained to find an ideal transformation for the incoming tensor data. The larger and more diverse a data set is, the better trained the weights become, theoretically enabling more accurate predictions.
  • units sets how big this weight matrix will be in the layer. By providing a value of 1, we are configuring the layer to have one weight for each of the input features of the data. In our case, this translates to one transformation applied to the horsepower input.
  • Along with weights, a bias is also generated and added to this weighting transformation result, providing another mechanism for generating more accurate predictions.
  • useBias is true by default, so that argument is not strictly needed above. Our inputShape of [1] defines the dimensionality of the data we are passing into the layer. As we are only passing a single number, horsepower, the inputShape is one-dimensional, hence our value of [1].
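Conceptually, a dense layer with one unit and a one-dimensional input reduces to a single weight and bias; the following plain-JavaScript sketch uses arbitrary, untrained numbers purely for illustration:

```javascript
// Sketch of what a one-unit dense layer computes for a single input
// feature: output = weight * input + bias. Training adjusts the weight
// and bias; the values below are arbitrary and untrained.
const makeDenseLayer = (weight, bias) => (input) => weight * input + bias;

// hypothetical layer parameters, purely for illustration
const hidden = makeDenseLayer(0.5, 1);
const output = makeDenseLayer(0.25, 2);

// horsepower in, (meaningless, since untrained) mpg estimate out
const predictMpg = (horsepower) => output(hidden(horsepower));
```

This also illustrates why the hidden and output layers chain: the hidden layer's scalar output is fed straight into the output layer, mirroring the two model.add() calls above.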

We are already seeing that Tensorflow JS is very flexible in the range of arguments its model framework supports, highlighting just how generic the framework is.

Machine Learning is very diverse, from its use cases to the various neural network structures and mathematical operations applied to them. This is reflected in Tensorflow JS’s approach of providing flexible APIs and not assuming too much about the kind of problem you are tackling. This may be confusing for the newcomer, leaving ambiguity as to which configurations to actually use for a problem, but there is no real fix for this beyond studying the documentation and familiarising yourself with solutions and the reasoning behind them.

Note: For a higher level API, ml5.js attempts to provide a simplified way to work with Tensorflow JS without having to work with tensors directly. The package currently lacks documentation and limits Tensorflow JS’s flexibility, arguably the main strength of the framework, so I do not see this package being a game changer in the future. As mentioned earlier in the article, simply providing a Javascript implementation of Tensorflow will be enough for many developers to jump into the framework.

Up Next

We have now discussed the project setup, how data is fetched and cleaned, and how our Tensorflow model is defined. Part 2 will discuss formatting our data into tensors, training the model, and finally making predictions with the trained model.
