What remains is to draw the decision boundary using a general background-removal algorithm called alpha matting [15], although any comparable algorithm could be used. This is the auto-labeling step in the process. The resulting matte is thresholded to suit a binary segmentation architecture and cleaned up using morphological operations. The resulting binary image is treated as ground truth, as shown in Figure 6.
As can be seen, this process results in a high-fidelity mask. However, the auto-labeling engine currently used is a numerical one that requires a color image as well as the generation of a trimap. These factors make the system infeasible for online, real-time rock segmentation in the mine. The pipeline can instead be used offline to generate a large, diverse dataset with training labels for a binary semantic segmentation network.
B. Developing the Color Segmentation Network
The semantic segmentation network is built on the U-Net architecture [16]. The network uses an encoder-decoder structure with four successive convolutional layers to extract the salient features from the image. The predictions are compared against the ground-truth mask to minimize a binary cross-entropy loss [17] using the Adam optimizer [18].
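A minimal sketch of such an encoder-decoder follows, written in PyTorch as an assumption; channel widths, depth at each level, and the learning rate are illustrative and not taken from the paper:

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    # Two 3x3 convolutions with ReLU: the basic U-Net building block
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Four-level encoder-decoder with skip connections (illustrative sizes)."""
    def __init__(self, c=16):
        super().__init__()
        self.enc = nn.ModuleList(
            [block(3, c), block(c, 2 * c), block(2 * c, 4 * c), block(4 * c, 8 * c)])
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = block(8 * c, 16 * c)
        self.up = nn.ModuleList(
            [nn.ConvTranspose2d(16 * c, 8 * c, 2, stride=2),
             nn.ConvTranspose2d(8 * c, 4 * c, 2, stride=2),
             nn.ConvTranspose2d(4 * c, 2 * c, 2, stride=2),
             nn.ConvTranspose2d(2 * c, c, 2, stride=2)])
        self.dec = nn.ModuleList(
            [block(16 * c, 8 * c), block(8 * c, 4 * c),
             block(4 * c, 2 * c), block(2 * c, c)])
        self.head = nn.Conv2d(c, 1, 1)  # single-channel logit map for a binary mask

    def forward(self, x):
        skips = []
        for enc in self.enc:           # contracting path
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = up(x)                  # expanding path with skip connections
            x = dec(torch.cat([x, skip], dim=1))
        return self.head(x)            # raw logits; pair with BCEWithLogitsLoss

net = TinyUNet()
loss_fn = nn.BCEWithLogitsLoss()               # binary cross-entropy on logits
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
```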
The dataset is divided into batches of four and trained for 15 epochs on an NVIDIA GeForce RTX 3070 graphics processing unit. A validation set is held out to test the model at the end of each epoch. The final validation accuracy of the trained network is 98%. The accuracy and loss curves are plotted in Figure 7. Several strategies were used to prevent over-fitting [19]. First, an early-stopping strategy halted training before the validation loss started rising, as can be seen in the validation curve in Figure 7. Second, the training and validation sets were shuffled at each epoch.
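The training loop described above can be sketched as follows, again assuming PyTorch; the early-stopping patience and learning rate are assumed values, while the batch size of four, per-epoch shuffling, and per-epoch validation follow the text:

```python
import torch
from torch.utils.data import DataLoader

def train(net, train_set, val_set, epochs=15, patience=3, lr=1e-3):
    """Training loop sketch: batches of four, per-epoch shuffling and
    validation, and early stopping on validation loss (`patience` is assumed)."""
    loss_fn = torch.nn.BCEWithLogitsLoss()
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    # shuffle=True reshuffles the training data at the start of every epoch
    train_loader = DataLoader(train_set, batch_size=4, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=4)
    best_val, stale = float("inf"), 0
    for epoch in range(epochs):
        net.train()
        for img, mask in train_loader:
            opt.zero_grad()
            loss = loss_fn(net(img), mask)
            loss.backward()
            opt.step()
        # Validate at the end of each epoch
        net.eval()
        with torch.no_grad():
            val = sum(loss_fn(net(i), m).item()
                      for i, m in val_loader) / len(val_loader)
        if val < best_val:
            best_val, stale = val, 0
        else:
            stale += 1
            if stale >= patience:  # validation loss rising: stop early
                break
    return net
```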
C. Domain Transfer
The next step is to leverage this trained network to predict masks for event-based images, allowing real-time, low-latency semantic segmentation, as shown in Figure 8.
It can be seen that the mask predicted from the color image is of higher fidelity, with the holes in the strap being recognized by the network. It can therefore be used to validate the predictions from unseen event images.
The same architecture can simultaneously be used to validate these predictions. The extrinsic calibration parameters, discussed in the following section, are used to map the color images into the event camera frame. Both the event image and the warped color image are input to the network. The prediction from the color image is used as the ground-truth label and compared to the semantic prediction from the event image. The domain-transfer loop can then be closed, as event-based images can be selected and added to the dataset along with their predicted masks. Additional training regimens can be run with this multi-domain dataset so that the network becomes more generalized over time.
Figure 6. Result of the alpha matting algorithm, in which the boundary between the rock and the strap has been drawn. This mask can be processed to serve as ground truth.
Figure 7. Loss curves generated during training of the network. An early-stopping strategy was used so as not to overtrain the network.