AI Essentials: Working with Matrices

Understanding common matrix operations used in deep learning

Ross Bulat
11 min read · Dec 11, 2020


Introduction to Matrices

Matrices are a fundamental concept in AI, especially when working with neural networks. Most sub-fields of machine learning depend on them, including image processing and synthesis, natural language processing, and prediction. Just about every type of deep learning model relies on matrices to contain and manipulate numerical collections of data.

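Strictly speaking, a matrix is a two-dimensional array of numbers. In deep learning the term is often used more loosely for arrays with any number of dimensions (more precisely called tensors), and this piece follows that looser usage. Matrices, and neural networks in general, are often referred to as “black boxes”, as they are notoriously hard to analyse for patterns or effectiveness.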

Matrices: Black boxes of multi-dimensional data

Matrices can be very large and can span many dimensions, typically holding thousands if not millions of elements in medium to large-scale projects. Although their values are readily adjusted during training as the network works its way towards a useful output, it is rather challenging for us humans to understand what is being manipulated within these vast collections of numbers at each layer of the network. Indeed, model analysis and evaluation is a sub-field in and of itself within the larger scope of AI.
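To give a feel for these sizes, here is a minimal sketch using NumPy. The shapes are illustrative assumptions rather than figures from any particular model: a square weight matrix and a hypothetical batch of RGB images.

```python
import numpy as np

# A 1000 x 1000 matrix already holds a million elements.
weights = np.zeros((1000, 1000), dtype=np.float32)

# A hypothetical batch of 32 RGB images at 224 x 224 pixels, stored as a
# 4-dimensional array laid out as (batch, height, width, channels).
images = np.zeros((32, 224, 224, 3), dtype=np.float32)

print(weights.ndim, weights.size)  # 2 1000000
print(images.ndim, images.size)    # 4 4816896
```

Even this modest image batch holds close to five million numbers, which is part of why inspecting them by eye is impractical.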

Think of the input to a neural network, perhaps a set of 1000 normalised features. These features would be passed into the network as a matrix of numbers, and manipulated at each layer of the network (by other matrices!) until the output layer is reached.
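The flow described above can be sketched in a few lines of NumPy. This is only an illustration under assumed values: the layer sizes (64 hidden units, 10 outputs), the random weights, and the ReLU activation are hypothetical choices, not taken from a specific network.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable

# One input sample with 1000 normalised features, stored as a 1 x 1000 matrix.
x = rng.random((1, 1000))

# Hypothetical layer sizes: 1000 inputs -> 64 hidden units -> 10 outputs.
# In a real network these weights would be learned during training.
W1 = rng.normal(scale=0.01, size=(1000, 64))
b1 = np.zeros((1, 64))
W2 = rng.normal(scale=0.01, size=(64, 10))
b2 = np.zeros((1, 10))

hidden = np.maximum(0, x @ W1 + b1)  # matrix multiply, add bias, apply ReLU
output = hidden @ W2 + b2            # second layer: another matrix multiply

print(x.shape, hidden.shape, output.shape)  # (1, 1000) (1, 64) (1, 10)
```

Each layer is a matrix multiplication followed by a simple element-wise step, so the shape of the data changes from layer to layer while the whole computation stays in matrix form.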

Neural networks, being functions that learn how to be efficient and output reasonable data, are black boxes in and of themselves: the programmer or engineer can give the network data to work with, but ultimately it is the complex sequence of matrix manipulations within the network that defines its effectiveness.

What this piece will cover

Matrices are blocks of data (often floats) that can exist in 2 or more dimensions (think multi-dimensional arrays), the values of which are often in normalised or trained form. But how exactly are they manipulated? For example, how would we add two matrices together, or multiply them?
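As a first taste of those two operations, here is a small sketch with NumPy. The two 2 x 2 matrices are made-up values chosen so the results are easy to check by hand.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# Addition is element-wise, so both matrices must have the same shape.
added = A + B        # values: [[6, 8], [10, 12]]

# Multiplication takes each row of A and dots it with each column of B.
# Top-left entry: 1*5 + 2*7 = 19.
multiplied = A @ B   # values: [[19, 22], [43, 50]]

print(added)
print(multiplied)
```

Note that `A @ B` is the matrix product, whereas `A * B` in NumPy would multiply the matrices element by element, which is a different operation.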


