source was the same as the first set. Ego motion covered a wide
variety of speeds in each of the six degrees of freedom.
RESULTS
A. Training Set One
When projected to depth, this data set had ||e||_F = 5 mm
± 1 mm. This means that on average the root-mean-squared
error (RMSE) between the prediction and the ground truth
is 5 mm.
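The reported figure is the standard RMSE between a predicted depth map and its ground truth, restricted to pixels where ground truth exists. A minimal sketch of that computation (the function name and the toy 5 mm offset are illustrative, not from the paper):

```python
import numpy as np

def depth_rmse(pred, gt, valid_mask=None):
    """RMSE between predicted and ground-truth depth maps, in metres.

    Only pixels with valid (finite) ground truth contribute, since LiDAR
    depth maps typically contain holes.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if valid_mask is None:
        valid_mask = np.isfinite(gt)
    err = pred[valid_mask] - gt[valid_mask]
    return np.sqrt(np.mean(err ** 2))

# Toy example: predictions off by a constant 5 mm against a flat 2 m roof.
gt = np.full((4, 4), 2.000)
pred = gt + 0.005
print(depth_rmse(pred, gt))  # ≈ 0.005 m, i.e. 5 mm
```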
B. Testing Set One
The figures below show predictions from the network.
The input images are taken in a different area of the mine,
i.e., unseen to the network prior to testing. The images are
taken under different lighting conditions.
DISCUSSION
This work was able to generate virtually unlimited amounts
of training data. Using the PSMNet-Stereo network, event-
based images were used to generate a 3D surface representa-
tion of the roof of the mine. Even though each mine can
have unique lighting conditions, this pipeline can be used
to collect data from the field quickly and cheaply. This data
can be added immediately to the training set without any
pre-processing, and a quick training regimen can be run.
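Turning a predicted depth image into the 3D surface representation mentioned above is a standard pinhole back-projection. A sketch under assumed intrinsics (the focal lengths and principal point below are placeholders; the real values come from the calibration described earlier):

```python
import numpy as np

def depth_to_points(depth, fx=600.0, fy=600.0, cx=160.0, cy=120.0):
    """Back-project a depth image (metres) into a 3D point cloud.

    fx, fy, cx, cy are hypothetical pinhole intrinsics in pixels.
    Returns an (H, W, 3) array of XYZ points in the camera frame.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grids
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

# A flat roof 2 m overhead maps to a planar point cloud at z = 2.
pts = depth_to_points(np.full((240, 320), 2.0))
print(pts.shape)  # (240, 320, 3)
```

The point at the principal point back-projects straight along the optical axis, which is a quick sanity check on the intrinsics.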
As can be seen in Figure 9, the features detected by the
tightly scoped network, trained on Training Set One, are
predicted with high resolution and the right distance scale.
The widely generalized model was able to get the right
distance scale but was not able to learn the features to the
same fidelity. This implies that the fidelity of the model is
correlated with the resources devoted to the training of the
model.
As Figures 9, 10, and 11 show, the event images can
be used to produce a depth image of acceptable fidelity.
The straps have holes to allow for bolting. The network can
predict the depths of these through-holes. Additionally,
the outline of the strap is detected with the appropriate
20 mm depth difference between the strap and the roof,
where ground truth was available. This work shows that the
LiDAR sensor performance can be replicated when the
rock is smooth. However, the neural network exceeds the
performance of the LiDAR in certain areas of the mine.
Specifically, the support straps appear to absorb the
wavelength generated [21] by the RealSense L515. The effect of
the attenuated return signal is that straps in depth images
appear to have the same depth. This introduces a significant
measure of noise into the measurement. This is not
Figure 9. Top Row (Left to Right): input event-based
image, LiDAR-based depth image. Bottom Row: predicted
depth image. Shows the depth map predicted by the trained
network. We can use the computed focal lengths and the
baseline distance between the two event cameras to generate
the real-time depth map from the perspective of each of the
event cameras.
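The conversion from a stereo disparity map to depth using the focal length and baseline is the standard pinhole stereo relation, depth = f · B / d. A sketch with placeholder calibration values (the actual focal length and baseline of the event-camera rig are not restated here):

```python
import numpy as np

# Hypothetical calibration values for illustration only.
FOCAL_PX = 600.0    # focal length in pixels (assumed)
BASELINE_M = 0.10   # baseline between the two event cameras in metres (assumed)

def disparity_to_depth(disparity, focal_px=FOCAL_PX, baseline_m=BASELINE_M):
    """Pinhole stereo relation: depth = f * B / d.

    Zero or negative disparities (no match) are mapped to infinity.
    """
    d = np.asarray(disparity, dtype=float)
    depth = np.full_like(d, np.inf)
    valid = d > 0
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth

disp = np.array([30.0, 60.0])
print(disparity_to_depth(disp))  # → [2. 1.] metres
```

Note that depth resolution degrades quadratically with range, which is why a short-range roof-mapping task is a favourable setting for stereo.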
Figure 10. Top Row (Left to Right): input event-based
image, predicted depth image. Shows the prediction of the
network from an input event image with the strap partially
in shadow. The 20 mm difference in depth between the
strap and the roof is captured.
Figure 11. Top Row (Left to Right): input event-based
image, predicted depth image. Shows the prediction of the
network from a well-lit input event image. The difference in
depth between the holes in the strap and the strap itself is
captured.