Binary Classification Comparison using Neural Networks and Random Contrast Learning

1. INTRODUCTION AND BACKGROUND

Although Deep Learning has enjoyed tremendous success and growth in recent years, it continues to be costly and time consuming. Computations in deep learning are so intensive that a suite of hardware processors has been adapted or developed to accommodate its computation needs. Most notably, Graphics Processing Unit (GPU) cards developed by NVIDIA and Tensor Processing Unit (TPU) cards developed by Google continue to handle most of the deep learning loads in data centers around the world, due to their ability to perform large numbers of calculations faster and more efficiently than Central Processing Units (CPUs).

As demand for more compute power and capacity grew, so did the demand for the computer chips needed to manufacture GPU and TPU hardware. This caused GPU and TPU hardware prices to skyrocket in recent years, which in turn raised deep learning operating costs. Although GPU and TPU prices have come down slightly in the past year, they are still relatively high.

Described as the “alternative to Deep Learning”, Random Contrast Learning, or LuminaRCL, is a novel approach to supercomputing and machine learning developed by Lumina AI (formerly Lumina Analytics), headquartered in Tampa, Florida. Training LuminaRCL models does not require costly GPU or TPU hardware. The models are trained on CPUs, which are abundant and less expensive, and they promise to outpace neural networks in training and inference performance, as well as accuracy. In this work, we test RCL on binary medical image classification and compare its results to those of neural networks.

2. DESCRIPTION OF DATASETS

The datasets used in this study were acquired from public sources. The following three datasets were selected for our experiments:

- Breast Cancer Biopsy Dataset (7,500 samples, 2-class, balanced)
- Lymph Node Cancer Biopsy Dataset (600,000 samples, 2-class, balanced)
- Brain Tumor MRI Scan Dataset (9,561 samples, 2-class, balanced)

2.1 Breast Cancer Biopsy Dataset Overview

The Breast Cancer Biopsy Dataset contains 7,500 images in two classes: benign and malignant. The images are in the PNG format and are relatively large at 700×460 pixels. The dataset is balanced.

2.2 Lymph Node Cancer Biopsy Dataset Overview

The Lymph Node Cancer Biopsy Dataset contains 600,000 images in two classes: benign and malignant. The images are in the PNG format and are relatively small at only 96×96 pixels. The dataset is balanced.

2.3 Brain Tumor MRI Scan Dataset Overview

The Brain Tumor MRI Scan Dataset contains 9,561 images in two classes: normal and tumor. The images are in the PNG format and are medium-sized at 256×256 pixels. The dataset is balanced.

3. IMAGE CLASSIFICATION METHODS

We will attempt to train two different methods to classify the samples in the selected datasets:

- Random Contrast Learning Classifier (RCLC) – an untrained alternative to deep learning models, developed by Lumina Research
- Four-Layer Convolutional Neural Network (CNN-4) – an untrained deep learning convolutional neural network

Several experiments were conducted using each method, but only the best results are reported in this work. Trained models are evaluated based on their test accuracies.
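
Test accuracy, as used throughout this work, is the share of test samples whose predicted class matches the true class. The following is a minimal Python sketch of that calculation. The function name and the sample labels are illustrative and are not taken from our experiments.

```python
def compute_accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Illustrative labels only: 0 = benign, 1 = malignant.
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]
accuracy = compute_accuracy(y_true, y_pred)
print(f"Test accuracy: {accuracy:.1%}")
```

Because all three datasets are balanced, plain accuracy is a reasonable single metric here. On an imbalanced dataset it would need to be read alongside per-class measures.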

3.1 Random Contrast Learning Classifier (RCLC)

Lumina AI describes Random Contrast Learning (LuminaRCL) as “a new approach to supercomputing and machine learning. LuminaRCL employs a novel use of randomness that may enable it to outperform neural networks in training speed, inference speed, and accuracy.”

LuminaRCL was developed by the Lumina team in 2022. We use the RCL Classifier (RCLC) module in our experiments, which were conducted on Windows computers with Intel i9 or Xeon Silver processors. The RAM on the computers ranged from 64 GB to 512 GB. RCLC is branded as PrismRCL on Windows.

3.2 Four-Layer Convolutional Neural Network (CNN-4)

The four-layer neural network used in the experiments is a simple network composed of four convolutional layers, with max pooling and dropout, as well as a final dense layer at the output. The final layer uses ReLU activation followed by Sigmoid activation.
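
One consequence of ending the network with a dense layer is that the size of the dense layer's input is fixed by the final feature map, and therefore by the input image size. The sketch below traces how the spatial dimensions shrink across four convolution and pooling blocks. It assumes size-preserving ("same"-padded) convolutions and 2×2 max pooling after every block. Those settings are assumptions made for illustration, since the exact kernel, padding and pooling configuration of CNN-4 is not specified here.

```python
def trace_feature_map(height, width, num_blocks=4, pool=2):
    """Return the (height, width) of the feature map after each conv + pool block.

    Assumes 'same'-padded convolutions, which leave the spatial size unchanged,
    followed by non-overlapping pool x pool max pooling (floor division).
    These settings are illustrative assumptions, not the CNN-4 configuration.
    """
    sizes = []
    for _ in range(num_blocks):
        height, width = height // pool, width // pool
        sizes.append((height, width))
    return sizes


# Image sizes (height, width) of the three datasets described in Section 2.
for name, (h, w) in {
    "Lymph Node Cancer Biopsy": (96, 96),
    "Brain Tumor MRI Scan": (256, 256),
    "Breast Cancer Biopsy": (460, 700),
}.items():
    print(name, trace_feature_map(h, w))
```

Under these assumptions, each dataset ends with a different final feature-map size, so a network built for one image size cannot be reused unchanged on another.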

The neural network experiments were conducted on Google Colab, a cloud-based Python notebook environment with extensive built-in support for machine learning. The Colab account used was configured with 54 GB of GPU RAM and a similar amount of CPU RAM.

4. IMAGE DATASETS PRE-PROCESSING AND PREPARATION

LuminaRCL does not require much pre-processing or data preparation. The algorithm has two main requirements:

1. The images must be in the PNG format. The images for one of the acquired datasets were in JPG format, but using a simple Python script, we were able to convert the images to PNG easily.

2. The raw images can be fed to the LuminaRCL algorithm in their class folders, even if they are of different dimensions. LuminaRCL accepts training and testing image data as follows:

Parent_Folder/

Class1_Folder/img1000.png, img1001.png, …

Class2_Folder/img2000.png, img2001.png, …

Note: it is important to make sure that image names are unique across all class folders.
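
The uniqueness requirement in the note above can be checked before training with a few lines of Python. This is a minimal sketch using only the standard library. The function name is our own, and the temporary folder simply mimics the layout shown above with one deliberate name clash.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_names(parent_folder):
    """Map each PNG file name found in more than one class folder to those folders."""
    seen = defaultdict(list)
    for class_dir in sorted(Path(parent_folder).iterdir()):
        if not class_dir.is_dir():
            continue
        for image_path in sorted(class_dir.glob("*.png")):
            seen[image_path.name].append(class_dir.name)
    return {name: folders for name, folders in seen.items() if len(folders) > 1}


# Build a throwaway copy of the layout, with one clashing file name.
with tempfile.TemporaryDirectory() as tmp:
    parent = Path(tmp) / "Parent_Folder"
    layout = {
        "Class1_Folder": ["img1000.png", "img1001.png"],
        "Class2_Folder": ["img2000.png", "img1001.png"],  # deliberate clash
    }
    for class_name, file_names in layout.items():
        (parent / class_name).mkdir(parents=True)
        for file_name in file_names:
            (parent / class_name / file_name).touch()
    duplicates = find_duplicate_names(parent)

print(duplicates)
```

An empty result means every image name is unique across the class folders.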

For the neural network, the image datasets must be converted to the NumPy array format. The training images and their class information are typically saved in separate arrays. Also, all images must be of the same shape and size. NumPy arrays can be saved to disk and used in future experiments, unless the dataset structure is modified, or if data is added or removed. In that case, the arrays must be generated again.
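
The conversion step described above can be sketched as follows. The example uses synthetic arrays in place of decoded image files, and the helper name, the scaling of pixel values to the range 0 to 1, and the archive file name are all assumptions made for illustration rather than details of our pipeline.

```python
import os
import tempfile

import numpy as np


def build_arrays(images, labels):
    """Stack same-shaped images into one array and their labels into another."""
    shapes = {image.shape for image in images}
    if len(shapes) != 1:
        raise ValueError(f"all images must share one shape, found {sorted(shapes)}")
    if len(images) != len(labels):
        raise ValueError("need exactly one label per image")
    x = np.stack(images).astype(np.float32) / 255.0  # assumed scaling to [0, 1]
    y = np.asarray(labels, dtype=np.int64)
    return x, y


# Synthetic stand-ins for four decoded 96x96 RGB images (not real data).
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(96, 96, 3), dtype=np.uint8) for _ in range(4)]
labels = [0, 1, 0, 1]
x_train, y_train = build_arrays(images, labels)

# Save both arrays so that later experiments can skip the conversion step.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train_arrays.npz")
    np.savez_compressed(path, x=x_train, y=y_train)
    with np.load(path) as saved:
        x_loaded, y_loaded = saved["x"], saved["y"]

print(x_train.shape, y_train.shape)
```

The shape check makes the same-size requirement explicit: a single image with different dimensions stops the conversion before any array is written.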

5. IMAGE CLASSIFICATION EXPERIMENTS

In this section, we briefly describe the experiments that were carried out, starting with the four-layer neural network then RCLC. Although many experiments were conducted on each dataset, using both RCLC and the neural network, we present only the best set of results from all three datasets and both sets of experiments.

5.1 Four-Layer Convolutional Neural Network Experiments and Results

Setting up a neural network training and testing pipeline in Python requires a decent amount of programming knowledge and dozens of lines of code. Assuming all your images have the same dimensions, first you need to load the training data, convert it to NumPy arrays and save those arrays to disk (for future use). Then you need to set up your neural network of choice, and define your training parameters like number of epochs, batch size, learning rate, early stopping rules, etc. Once training is complete, you then have to evaluate your model using the test data, before finally determining if you ended up with a good model. Table 5.1.1 below shows the test accuracies obtained on our three datasets.
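
To give a sense of the bookkeeping involved, the sketch below shows one common form of an early stopping rule: training halts once the validation loss has failed to improve for a set number of epochs. The class, the parameter values and the loss values are hypothetical. They illustrate the idea only and are not the settings used in our experiments.

```python
class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Hypothetical training parameters and validation losses, for illustration only.
config = {"epochs": 10, "batch_size": 32, "learning_rate": 1e-3}
val_losses = [0.69, 0.52, 0.41, 0.43, 0.42, 0.44, 0.40, 0.39, 0.38, 0.37]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_losses[: config["epochs"]], start=1):
    # A real loop would train for one epoch here and then measure val_loss.
    if stopper.should_stop(val_loss):
        stopped_at = epoch
        break

print(f"Stopped at epoch {stopped_at} with best loss {stopper.best_loss}")
```

Deep learning frameworks ship their own versions of this rule, but the parameters still have to be chosen and tuned by the user.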

Table 5.1.1. Test accuracy results with CNN-4 on the three medical imaging datasets.

5.2 Random Contrast Learning Classifier Experiments and Results

Compared to a neural network, there is no doubt that RCL is much simpler and easier to use. Although it is a native Windows application, PrismRCL does not have an interactive user interface. It is run via the command line, usually requiring only one line of code to start a training or an inference session. No programming experience is needed. Before initiating your training session, you will probably want to optimize your training parameters, which can be done via a single line of code as well. It is also a multi-threaded application that takes advantage of all the CPU power a computer has. Table 5.2.1 below shows the results of the best training sessions conducted on all three datasets included in this study.

Table 5.2.1. Near-perfect test accuracy results achieved using RCLC on the three medical imaging datasets.

It is evident that in these experiments, RCLC exceeded the performance of the neural network and produced accuracies that are near perfect. Add to this the simplicity, ease of use and modest hardware requirements, and you’ve got promising technology in LuminaRCL.

6. RANDOM CONTRAST LEARNING BENEFITS AND FEATURES

When compared to neural networks, the current state of the art, LuminaRCL has shown advantages in the following areas:

- CPU-based Training – AI-specific chips such as GPU, NPU, and TPU are limited in quantity and cost-prohibitive for most organizations. LuminaRCL’s use of CPU hardware minimizes capital expense and allows for democratization of machine learning technology.
- Generalization – LuminaRCL can generalize on smaller datasets than required by neural networks. We can see from Table 6.1 below that RCLC’s test accuracy exceeds 90% on small portions of the training data.
- Speed of Data Preparation – LuminaRCL can ingest PNG images, tabular data and text without the pre-processing work required by neural networks. While deep learning algorithms require all images to be of the same shape, LuminaRCL accepts PNG images of different sizes.
- Sensitivity to Weak Signals – LuminaRCL’s use of randomness as a filter allows for patterns to be illuminated at their earliest point of significance.
- Combining Models for Inference – LuminaRCL appears to have the ability to combine models that have been trained separately and thus make the larger, disparately trained models accessible through a single improved model.

Table 6.1. Results for all three datasets, with the percentage of training data needed to achieve high test accuracies.
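
A reduced training set of the kind referred to under Generalization can be drawn by sampling the same fraction of files from each class, which keeps the subset balanced. The sketch below is one way to do this in Python. The function name, the seed and the file names are hypothetical, and this is not necessarily how the subsets in our experiments were selected.

```python
import random


def sample_fraction(files_by_class, fraction, seed=0):
    """Pick the same fraction of files from every class, reproducibly."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    subset = {}
    for class_name, files in sorted(files_by_class.items()):
        # Keep at least one file per non-empty class.
        count = min(len(files), max(1, round(len(files) * fraction)))
        subset[class_name] = sorted(rng.sample(files, count))
    return subset


# Hypothetical file names; a real run would list the class folders instead.
files_by_class = {
    "benign": [f"img1{i:03d}.png" for i in range(200)],
    "malignant": [f"img2{i:03d}.png" for i in range(200)],
}
subset = sample_fraction(files_by_class, fraction=0.10)
print({class_name: len(files) for class_name, files in subset.items()})
```

Fixing the seed makes each subset reproducible, so accuracy at different training fractions can be compared across runs.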

Supported Data Types:

LuminaRCL supports three types of data:

1. Image data in PNG format stored in class folders. Images in other formats can be easily converted to PNG.
2. Text data stored in text files which are in turn stored in class folders. Each text file can contain one or more lines and will be treated as one sample.
3. Tabular data, space separated and stored in text files which are in turn stored in class folders. One feature vector per text file.
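
As an illustration of the tabular layout in item 3, the sketch below writes one space-separated feature vector per text file, grouped into class folders. The helper name, the .txt extension and the file-naming scheme are assumptions. Prefixing file names with the class name mirrors the uniqueness note given for images, which may or may not be required for text files.

```python
import tempfile
from pathlib import Path


def write_tabular_samples(parent_folder, samples_by_class):
    """Write one space-separated feature vector per text file, grouped by class folder."""
    written = []
    for class_name, vectors in samples_by_class.items():
        class_dir = Path(parent_folder) / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        for index, vector in enumerate(vectors):
            # Prefix with the class name so file names stay unique across folders.
            path = class_dir / f"{class_name}_{index:05d}.txt"
            path.write_text(" ".join(str(value) for value in vector), encoding="utf-8")
            written.append(path)
    return written


# Made-up feature vectors, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    paths = write_tabular_samples(
        tmp,
        {"benign": [[5.1, 3.5, 1.4], [4.9, 3.0, 1.4]], "malignant": [[6.3, 3.3, 6.0]]},
    )
    first_file_content = paths[0].read_text(encoding="utf-8")
    relative_names = [p.relative_to(tmp).as_posix() for p in paths]

print(relative_names)
print(first_file_content)
```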

Automatic Parameters:

Before you start training LuminaRCL on your data, you will need to set some parameters that will optimize the training and produce the best possible model. These parameters are: evaluation, rclticks, boxdown, and imaginaryslice. LuminaRCL offers a special feature that can find the optimal training parameters for your dataset automatically. We call it auto-optimize. No more trying random parameters endlessly. Let LuminaRCL do the work for you!

Features:

Access to the LuminaRCL algorithm is available through a public API as well as a Windows Desktop application that is very simple to use. Here is what you can expect from the API and the Desktop application:

- Simplified Dataset Management – The API comes with a frontend web application that allows you to manage your datasets. Upload your dataset once, use it as many times as you need. Delete your datasets if you no longer need them.
- Intuitive Job Monitoring – The web application provides an easy way for you to monitor your training and inference jobs. Re-use your API requests through an easy-to-use copy and paste feature.
- Model Management – The API application will store your models for future use. Train once. Run inference as many times as needed. Delete the models you no longer need.
- In-Memory Inference – Load your favorite model into memory once and run inference on it through the API as many times as you need.
- Full Control – With the desktop application, branded as PrismRCL, you can do all the above and much more. Run it on your hardware using single line commands. Schedule training jobs or run them in batch mode. The application is built to run from the Windows terminal so you can automate your jobs and run them unattended. More importantly, keep your datasets safe and secure. No uploading or sharing.

7. SUMMARY AND CONCLUSIONS

In this work, we ran binary medical image classification experiments on two systems: RCLC, an untrained alternative to deep learning, and a four-layer convolutional neural network. The imaging datasets that we used ranged from 7,500 images for the Breast Cancer Biopsy dataset to 600,000 images for the Lymph Node Cancer Biopsy dataset. RCLC, which does not require a GPU, and runs on any CPU (via a public API or natively on Windows), offers very simple parameter optimization and training, and yields results that exceed the best performance of the neural network used in our experiments, all without the benefit of any transfer learning. We also demonstrated the rate at which RCLC’s test accuracy improves using only a fraction of the whole dataset, as shown in Figure 7.1 below.

Figure 7.1. How quickly RCL reaches near-perfect test accuracies on all three datasets.

RCLC’s best test accuracies were 99.1% for the Brain Tumor MRI scans, 99.5% for the Breast Cancer Biopsy and 99.8% for the Lymph Node Cancer Biopsy. Using the neural network, the accuracies were as follows: 97.3% for the Brain Tumor MRI scans, 94.2% for the Breast Cancer Biopsy and 94.5% for the Lymph Node Cancer Biopsy. RCLC was run on Windows computers with Intel i9 or Xeon Silver processors and 64 GB to 512 GB of DDR4 memory. Our four-layer neural network was run in the Google Colab environment, which was equipped with an NVIDIA A100 or V100 GPU card and 54 GB of GPU RAM.
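
The gap between the two methods can be read in two ways: as a difference in accuracy, measured in percentage points, or as a ratio of error rates. The short calculation below derives both from the accuracies quoted above. The choice of these two summary measures is ours, and they are simple arithmetic on the reported figures rather than additional experimental results.

```python
# Best test accuracies (%) quoted above, as (RCLC, CNN-4) pairs.
results = {
    "Brain Tumor MRI": (99.1, 97.3),
    "Breast Cancer Biopsy": (99.5, 94.2),
    "Lymph Node Cancer Biopsy": (99.8, 94.5),
}

gains = {}
for dataset, (rclc, cnn4) in results.items():
    # Accuracy gap in percentage points.
    point_gain = round(rclc - cnn4, 1)
    # How many times larger the CNN-4 error rate is than the RCLC error rate.
    error_ratio = round((100 - cnn4) / (100 - rclc), 1)
    gains[dataset] = (point_gain, error_ratio)
    print(f"{dataset}: +{point_gain} points, CNN-4 error is {error_ratio}x RCLC error")
```

When accuracies are this close to 100%, the error-rate ratio is often the more informative view, because a gain of a few percentage points can correspond to a much larger relative reduction in misclassified samples.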
