A. Training Part
The training part of the proposed framework consists of the following main components.
1. Data Augmentation: Data augmentation is the process of creating additional training images to improve the performance of the deep learning model. In computer vision tasks, data augmentation is implemented using spatial and intensity transformations [29] (a minimal preprocessing sketch is given after this list).
2. Color to gray-scale: This preprocessing step is used to adapt the deep learning model so that the trained model can predict masks both for images captured by a regular camera and for images captured by an event camera.
3. UNet Model: The semantic segmentation model is implemented using the UNet architecture of [28]. UNet is one of the most popular convolutional neural networks for image segmentation. The architecture consists of two main parts: the encoder and the decoder. The encoder is a conventional convolutional neural network consisting of 3×3 convolution layers, rectified linear units, and 2×2 maximum pooling. The encoder downsamples the input images at a constant spatial rate, with a stride of two at each pooling layer, which reduces the computational cost of the model during training. The decoder upsamples the features extracted by the encoder by 2×2. The output layer is a 1×1 convolution layer that classifies each pixel (an architectural sketch is also given after this list).
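To make the first two components concrete, the following is a minimal sketch of the augmentation and color-to-gray-scale step. The choice of Albumentations and OpenCV, the specific transforms, and their probabilities are illustrative assumptions and are not taken from the paper.

# Illustrative sketch only: the paper does not specify the augmentation
# library or the exact transforms; Albumentations and OpenCV are assumptions.
import cv2
import albumentations as A

# Spatial and intensity transformations used for data augmentation.
augment = A.Compose([
    A.HorizontalFlip(p=0.5),            # spatial transformation
    A.Rotate(limit=15, p=0.5),          # spatial transformation
    A.RandomBrightnessContrast(p=0.5),  # intensity transformation
])

def preprocess_training_pair(image_bgr, mask):
    """Augment a color image and its mask, then convert the image to gray-scale."""
    out = augment(image=image_bgr, mask=mask)
    gray = cv2.cvtColor(out["image"], cv2.COLOR_BGR2GRAY)
    return gray, out["mask"]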
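The sketch below illustrates the encoder/decoder structure described in the UNet item, written in PyTorch with only two encoder/decoder stages and illustrative channel widths. It is not the exact architecture of [28] or of the trained model; it only shows the 3×3 convolutions with ReLU, 2×2 max pooling with stride two, 2×2 up-sampling, and the 1×1 output layer.

# Minimal UNet sketch in PyTorch, assuming a single-channel (gray-scale) input.
# Layer widths and depth are illustrative, not the paper's exact model.
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    # Two 3x3 convolutions, each followed by a rectified linear unit (ReLU).
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    def __init__(self, n_classes=1):
        super().__init__()
        self.enc1 = double_conv(1, 64)
        self.enc2 = double_conv(64, 128)
        self.pool = nn.MaxPool2d(2)                            # 2x2 max pooling, stride 2
        self.bottleneck = double_conv(128, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)   # 2x2 up-sampling
        self.dec2 = double_conv(256, 128)
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = double_conv(128, 64)
        self.out = nn.Conv2d(64, n_classes, kernel_size=1)     # 1x1 output layer

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))    # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))   # skip connection
        return self.out(d1)                                    # per-pixel class logits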
B. Testing Part
The testing part includes the trained semantic segmentation model, the color to gray-scale component, and the median filtering. The color to gray-scale preprocessing is used to predict masks for test images captured by regular cameras, while the median filtering preprocessing is used to predict masks for accumulated images captured by event cameras. The median filtering of the accumulated event images is the key component that adapts the color-trained deep learning model to segment both images from traditional cameras and images from event cameras. The median filtering removes noise from the accumulated images while preserving edges.
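A minimal sketch of this test-time preprocessing, assuming OpenCV and an illustrative kernel size of 5 (the paper does not report a kernel size), might look as follows.

# Sketch of the test-time preprocessing; OpenCV and the kernel size are assumptions.
import cv2

def prepare_for_inference(image, from_event_camera):
    """Return a gray-scale image ready for the trained segmentation model."""
    if from_event_camera:
        # Accumulated event image: median filtering suppresses isolated
        # noise events while preserving edges.
        return cv2.medianBlur(image, 5)
    # Regular camera: convert the color image to gray-scale.
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)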
EXPERIMENT
Data were collected from the Edgar Mine in Idaho Springs, CO, using two types of cameras: an L515 LiDAR camera by Intel® RealSense™ and two DVXplorer Mini event cameras by iniVation. The sensor setup is shown in Figure 5. The L515 LiDAR camera produces color images of size 480×680, which are used for training and testing the proposed semantic segmentation model. The event cameras produce accumulated images of the same size as the color images. The hyper-parameters for the deep learning model are given in Table 1.
PERFORMANCE EVALUATION METRICS
Since the segmentation task is a pixel-level classification, the performance of the proposed model is evaluated using data represented by four variables, which are:
Figure 4. Proposed event-based machine learning framework for semantic segmentation. Images from a traditional camera are used to train the model, while the trained model can be used to segment images from both traditional and event cameras.