Matrix Multiplication

  • Imagine 5 different schooling systems across the world.
  • How did they learn addition and multiplication?
  • Apples, Bananas and Oranges

We will learn matrix multiplication the same way.

Now, I am quite sure you were told by your teachers that you cannot compare apples and bananas.

Why is this the case? In 'scientific' terms, the 'units' do not match. But if we cannot compare apples and bananas, how about comparing 'units' of apples and bananas?

Why do we need vectors?

To answer that question, we must first answer another one: why do we need numbers?

If we assume that we need numbers to describe things around us like:

  • the number of fingers we have
  • the chance of rain
  • the height of the tallest mountain

Then, what are the things we cannot describe with numbers?

  • the color red
  • the scenery of the sunset
  • the feeling of using a new shoe
  • the amount of democracy in the country

Relation to ML

Vectors in other contexts are also known as:

  • features
  • descriptors
  • embeddings
  • encodings

For things that a single number cannot describe, engineers approximate them with a bunch of numbers:

  • the color red: how much of red, green, blue is measured by a sensor
  • the scenery of the sunset: a bunch of numbers indicating the amount of red, green, blue measured by an array of sensors
  • the feeling of using a new shoe: the ratings from various online reviews?
  • the amount of democracy in the country: left to reader's discretion

This bunch of numbers, a.k.a. a vector, is now a description of that real-world thing.

Implications

We can now compare these various things, as long as we measure them the same way!

For example, once we pick a way of measuring distance, we can compare the distance from red to blue with the distance from red to green, and so on.
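As a minimal sketch of what "measuring a distance between colors" could look like, the snippet below treats each color as an (R, G, B) vector and uses the straight-line (Euclidean) distance. Both the sensor readings and the choice of Euclidean distance are assumptions made for illustration; other distances are equally valid as long as every color is measured the same way.

```python
import math


def distance(u, v):
    """Euclidean distance between two vectors of the same length."""
    if len(u) != len(v):
        raise ValueError("vectors must be measured the same way (same length)")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical sensor readings as (red, green, blue), each between 0 and 255.
red = (255, 0, 0)
green = (0, 255, 0)
blue = (0, 0, 255)
orange = (255, 165, 0)

red_to_blue = distance(red, blue)
red_to_green = distance(red, green)
red_to_orange = distance(red, orange)

print("red -> blue  :", red_to_blue)
print("red -> green :", red_to_green)
print("red -> orange:", red_to_orange)
```

With these made-up readings, orange comes out closer to red than blue does, which at least agrees with intuition.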

Similarly, we can compare the aggregate feelings of completely random strangers (or robots) with completely different preferences when they use various new shoes.

Question for the reader: how do we compare sunsets now, though? Maybe transform that bunch of RGB numbers into another bunch of numbers, such as various cinematic metrics or critic scores?

Scalers and Stores

Let us start off by renaming a few common terms:

Name   | Etymology | Our lingo
Scalar | Ladder    | Scaler
Vector | Carrier   | Store
Value  | Strength  | Apple/Banana/...

Scalers

Scalers simply scale. It is exactly the same as multiplication.

To 'understand' multiplication at a deeper level, note that the possible scalers can be divided into the following cases:

  1. Interval: \((-\infty, -1)\)
  2. Value: \(-1\)
  3. Interval: \((-1, 0)\)
  4. Value: \(0\)
  5. Interval: \((0, 1)\)
  6. Value: \(1\)
  7. Interval: \((1, \infty)\)

Each of these intervals 'behaves' differently, and this involves getting familiar with things like

  • if \(b = a \times c\) where \(c = 0.5\), then is \(b > a\)?
  • if \(a = 0.5\), \(b = 0.5\), is \(a \times b < 0.5\)?
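One way to get familiar with these cases is to try one representative scaler from each and look at what happens. The sketch below is only an experiment, not a proof: the representatives and the starting values of `a` are arbitrary choices, and it deliberately tries both a positive and a negative `a`, because the answer to "is \(b > a\)?" depends on the sign of `a`.

```python
def scale(a, c):
    """Scale the value a by the scaler c (plain multiplication)."""
    return a * c


# One arbitrary representative from each of the seven cases listed above.
representatives = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]

for a in (4.0, -4.0):
    for c in representatives:
        b = scale(a, c)
        print(f"a = {a:5.1f}, c = {c:5.1f} -> b = {b:5.1f}, b > a: {b > a}")

# The second question: two values from (0, 1) multiplied together.
small_product = scale(0.5, 0.5)
print("0.5 x 0.5 =", small_product)
```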

Stores

A Store is one 'solid' group of things. Each row has a 'unit'.

We can 'scale' a store by multiplying it by a scaler. When we do that, each item in the store scales in proportion.

Scaling Stores

Note that the ratios of fruits are constant. That is what defines the 'unit' and makes it a store.

Also note how the scalers 1 and 0 are special. (maybe -1 as well)
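The idea of scaling a store can be sketched in a few lines. Here a store is represented as a dictionary from fruit name to amount; that representation, the function name `scale_store`, and the example amounts are assumptions made for this illustration only.

```python
from fractions import Fraction


def scale_store(store, scaler):
    """Multiply every item in the store by the same scaler."""
    return {fruit: amount * scaler for fruit, amount in store.items()}


# A hypothetical store that sells 2 apples and 1 banana as one set.
store = {"apples": 2, "bananas": 1}

tripled = scale_store(store, 3)

# The ratio of apples to bananas is the same before and after scaling.
ratio_before = Fraction(store["apples"], store["bananas"])
ratio_after = Fraction(tripled["apples"], tripled["bananas"])

# The special scalers: 1 changes nothing, 0 empties the store, -1 flips every sign.
unchanged = scale_store(store, 1)
emptied = scale_store(store, 0)
flipped = scale_store(store, -1)

print(tripled, ratio_before, ratio_after)
print(unchanged, emptied, flipped)
```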

Dividing Stores

A question to test your understanding:

How do I divide a store by 5 (I only know how to multiply stores by scalers)?

Answer

Multiply by \(1/5 = 0.2\)

Since we know how to 'divide' scalers, we don't need to know how to 'divide' stores.
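To make that concrete, here is a small sketch in which "dividing a store" is nothing more than scaling by a reciprocal. The names `scale_store` and `divide_store`, and the example store, are hypothetical; `Fraction` is used only so that the reciprocal stays exact.

```python
from fractions import Fraction


def scale_store(store, scaler):
    """Multiply every item in the store by the same scaler."""
    return {fruit: amount * scaler for fruit, amount in store.items()}


def divide_store(store, divisor):
    """'Divide' a store by scaling it with the reciprocal of the divisor."""
    reciprocal = Fraction(1) / Fraction(divisor)  # dividing scalers is allowed
    return scale_store(store, reciprocal)


# A made-up store holding 10 apples and 5 bananas.
big_store = {"apples": 10, "bananas": 5}
fifth = divide_store(big_store, 5)
print(fifth)
```

No new operation on stores was needed: the only division happens between two scalers.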

\(\newcommand\vec[1]{\begin{bmatrix}#1\end{bmatrix}}\)

Matrices

Now let us visualize matrix multiplication as equations with scalers and stores.

For this, we introduce a new term, mix, as explained in the following examples.

As always, we start with the minimal case and build up from there.

One dimension

  • Store A \(S_A\) sells apples by sets of 1 apple each
  • I want 3 apples =>
\[[1 \, \mbox{apples}] \times 3 = [3 \, \mbox{apples}]\]
  • Pay attention to the units:
      • the units of the scaler 3
      • the units of the lhs store \([1 \mbox{apples}]\)
      • the units of the rhs result \([3 \mbox{apples}]\) -> question: is it a store?

Two dimensions

  • Store A \(S_A\) sells apples by sets of 1 apple each
  • Store B \(S_B\) sells bananas by sets of 1 banana each
  • I want 3 apples and 2 bananas =>
\[[1 \, \mbox{apples}] \times 3 + [1 \, \mbox{bananas}] \times 2 = [??]\]
  • But what about the units? We cannot combine apples and bananas, so why not:
\[\vec{1 \, \mbox{apples} \\ 0 \, \mbox{bananas}} \times 3 + \vec{0 \, \mbox{apples} \\ 1 \, \mbox{bananas}} \times 2 = [??]\]
  • Now all the units work out, and we have:
\[\vec{1 \, \mbox{apples} \\ 0 \, \mbox{bananas}} \times 3 + \vec{0 \, \mbox{apples} \\ 1 \, \mbox{bananas}} \times 2 = \vec{3 \, \mbox{apples} \\ 2 \, \mbox{bananas}}\]
  • Leaving out the units:
\[3 \times \vec{1 \\ 0 } + 2 \times \vec{0 \\ 1 } = \vec{3 \\ 2 }\]
  • In words:
      • 3 times a set of (1 apple and 0 bananas)
      • plus 2 times a set of (0 apples and 1 banana)
      • equals a set of (3 apples and 2 bananas)

  • Again:
      • \(3 \times \vec{1 \\ 0}\) from store A
      • \(+ 2 \times \vec{0 \\ 1}\) from store B
      • \(= \vec{3 \\ 2}\) what we finally get

  • Giving us a mix of stores:

\[3 \times \vec{1 \\ 0 } \mbox{from store A} + 2 \times \vec{0 \\ 1 } \mbox{from store B} = \vec{3 \\ 2 }\]
  • Putting the mix each in its own row:
\[\begin{bmatrix}1 & 0 & \mbox{apples}\\0 & 1 & \mbox{bananas}\end{bmatrix} \times \vec{3_{\tiny\vec{1\\0}} \mbox{ from store A} \\ 2_{\tiny\vec{0\\1}} \mbox{ from store B} } = \begin{bmatrix}3 & \mbox{apples} \\ 2 & \mbox{bananas}\end{bmatrix}\]
  • \(\begin{bmatrix}1 & 0 \\0 & 1 \end{bmatrix}\) -> Stores; \(\vec{3 \\ 2}\) -> Mixes:
\[\begin{bmatrix}1 & 0 \\0 & 1 \end{bmatrix} \times \vec{3 \\ 2} = \begin{bmatrix}3 & \mbox{apples} \\ 2 & \mbox{bananas}\end{bmatrix}\]
  • Matrix multiplication discarding our connection to the real world:
\[\vec{1 \\ 0} \times 3 + \vec{0 \\ 1} \times 2 = \begin{bmatrix}1 & 0 \\0 & 1 \end{bmatrix} \times \vec{3 \\ 2} = \begin{bmatrix}3 \\ 2 \end{bmatrix}\]
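The equality above says that multiplying a matrix by a vector is the same as mixing its columns. A minimal sketch, assuming stores are plain Python lists and using the hypothetical names `mix_of_stores` and `matvec`, computes the result both ways: once as a mix of stores (columns), and once with the usual row-by-row rule.

```python
def mix_of_stores(stores, mix):
    """Column view: add up each store scaled by its entry in the mix."""
    n_rows = len(stores[0])
    result = [0] * n_rows
    for store, amount in zip(stores, mix):
        for i in range(n_rows):
            result[i] += store[i] * amount
    return result


def matvec(matrix, vector):
    """Row view: each output entry is one matrix row dotted with the vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


store_a = [1, 0]  # 1 apple, 0 bananas per set
store_b = [0, 1]  # 0 apples, 1 banana per set
stores = [store_a, store_b]
mix = [3, 2]      # 3 sets from store A, 2 sets from store B

# The matrix has the stores as its columns, so its rows are apples and bananas.
matrix = [list(row) for row in zip(*stores)]

by_columns = mix_of_stores(stores, mix)
by_rows = matvec(matrix, mix)
print(by_columns, by_rows)
```

Both views give the same basket; the column view is the one that matches the store picture used here.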

Linear independence

Now, imagine a scenario where I keep tabs of how much I bought from each store. Also imagine I am a careless individual who lost the tabs.

Basis

All I know is:

  • There are only two stores in town
  • Store A sells 1 apple as a set.
  • Store B sells 1 banana as a set.
  • I finally have 3 apples and 2 bananas.

How many sets did I buy from Store A, Store B?

Possible Mixes

The times are changing, the deal is now thus:

  • Store A sells 2 apples and 1 banana as a set.
  • Store B still sells 1 banana as a set.
  • I finally have 10 apples and 8 bananas.

How many sets did I buy from Store A, Store B?

Impossible Mixes

War is upon us, apples are no longer in stock, the deal now is:

  • Store A sells 1 banana as a set.
  • Store B sells 1 banana as a set.
  • I finally have 10 apples and 8 bananas.

How many sets did I buy from Store A, Store B?

Infinite Mixes

  • I was kidding, of course I don't have any apples, but I do have 8 bananas.

How many sets did I buy from Store A, Store B?
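If you want to check your own answers to questions like these, the three situations (one mix, no mix, infinitely many mixes) can be told apart mechanically for two stores and two fruits. The sketch below is one possible way to do it, not the only one: it uses Cramer's rule when the stores are not parallel, and otherwise checks whether the target lies along the stores' common direction. It assumes fractional and negative numbers of sets are allowed, and the name `classify_mix` and the example numbers are made up (they are deliberately not the ones from the questions above, apart from the basis case already worked out earlier).

```python
from fractions import Fraction


def classify_mix(store_a, store_b, target):
    """Return ("unique", (x, y)), ("impossible", None) or ("infinite", None)."""
    a1, a2 = store_a
    b1, b2 = store_b
    t1, t2 = target

    # Zero exactly when the two stores point in the same direction.
    det = a1 * b2 - b1 * a2
    if det != 0:
        # Cramer's rule for x * store_a + y * store_b = target.
        x = Fraction(t1 * b2 - b1 * t2) / Fraction(det)
        y = Fraction(a1 * t2 - t1 * a2) / Fraction(det)
        return ("unique", (x, y))

    # The stores are parallel (or empty): pick a non-empty one as the direction.
    direction = store_a if any(store_a) else store_b
    if not any(direction):
        # Both stores sell nothing, so only an empty basket is reachable.
        return ("infinite", None) if not any(target) else ("impossible", None)

    d1, d2 = direction
    if d1 * t2 - d2 * t1 == 0:
        return ("infinite", None)   # target lies along the shared direction
    return ("impossible", None)     # target needs a direction no store offers


print(classify_mix((1, 0), (0, 1), (3, 2)))  # the basis case from earlier
print(classify_mix((1, 2), (2, 4), (3, 6)))  # made-up parallel stores
print(classify_mix((1, 2), (2, 4), (3, 5)))  # same stores, different target
```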

Ratios and rations

For each of the following, decide whether it is a possible, impossible, or infinite case. Assume you can buy or sell fractional sets, but always as a set.

Case 1:

  • Store A sells 6 apples and 2 bananas as a set.
  • Store B sells 3 apples and 1 banana as a set.
  • I finally have 12 apples and 6 bananas.

Case 2:

  • Store A sells 6 apples and 2 bananas as a set.
  • Store B sells 3 apples and 1 banana as a set.
  • I finally have 12 apples and 10 bananas.

Case 3:

  • Store A sells 2 apples and 1 banana as a set.
  • Store B sells 3 apples and 1 banana as a set.
  • I finally have 10 apples and 4 bananas.

Is that the only solution?

Case 4 (some subversions call for inversions):

  • Store A sells 2 apples and 1 banana as a set.
  • Store B sells 3 apples and 1 banana as a set.
  • I finally have 10 apples and 8 bananas.

Finally, write down the matrix equations for the above cases.

The dawn of the vector space

Draw the points.

Power of matrices

  • Collapsing of operations - displacement vs distance
  • Structure of matrix - fast Fourier transform
  • Checking invertibility
  • Eigenvalues - signal and noise
  • Absorbing the constant - homogeneous coordinates