E. Sensor Calibration Setup
The relative position of each sensor is computed with reference to the geometric center of the mobile rig, via pose computation with a fiducial marker as in Figure 5.
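The pose computation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the marker's corner pixels have already been detected, an illustrative 0.10 m marker edge length, and the intrinsic matrix K (with distortion coefficients) from the checkerboard calibration noted in Figure 5. It recovers a sensor's pose relative to the marker with OpenCV's solvePnP.

```python
import numpy as np
import cv2

MARKER_SIZE = 0.10  # marker edge length in meters (assumed value)

# Marker corners in the marker's own frame (z = 0 plane),
# ordered top-left, top-right, bottom-right, bottom-left.
OBJECT_PTS = np.array([
    [-MARKER_SIZE / 2,  MARKER_SIZE / 2, 0.0],
    [ MARKER_SIZE / 2,  MARKER_SIZE / 2, 0.0],
    [ MARKER_SIZE / 2, -MARKER_SIZE / 2, 0.0],
    [-MARKER_SIZE / 2, -MARKER_SIZE / 2, 0.0],
], dtype=np.float64)

def sensor_pose_from_marker(image_pts, K, dist):
    """Return the 4x4 homogeneous transform from the marker frame
    to the sensor frame.

    image_pts: 4x2 array of detected corner pixels, same order as OBJECT_PTS
    K, dist:   intrinsics and distortion from the checkerboard calibration
    """
    ok, rvec, tvec = cv2.solvePnP(OBJECT_PTS, image_pts, K, dist)
    assert ok, "PnP failed; check the detected corner ordering"
    T = np.eye(4)
    T[:3, :3], _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 matrix
    T[:3, 3] = tvec.ravel()
    return T
```

Running this once per sensor against the same marker gives each sensor's pose in a common frame, from which the relative transforms can be composed.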
Homogeneous transforms can be chained together to map a depth image from the LiDAR so that it is centered on each event camera's axis. This is used as the ground-truth depth image from the perspective of the event cameras, ${}^{E}I$.
$$ {}^{E}I = {}^{E}T_{L}\,{}^{L}I \tag{2} $$
where ${}^{L}I$ is the depth image taken by the LiDAR, ${}^{E}I$ is the ground-truth depth image from the perspective of each of the event cameras, and ${}^{E}T_{L}$ is the transform between the LiDAR and each of the event cameras, found from extrinsic calibration.
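As a concrete reading of Equation (2), the depth image is not transformed as a flat array: each LiDAR pixel is back-projected to a 3D point, moved into the event-camera frame by ${}^{E}T_{L}$, and reprojected. The sketch below assumes pinhole intrinsics $K_L$ and $K_E$ for the LiDAR and event cameras; the function name and the nearest-depth z-buffering are illustrative choices, not from the paper.

```python
import numpy as np

def lidar_depth_to_event_frame(depth_L, K_L, K_E, T_EL, out_shape):
    """Re-center a LiDAR depth image on an event camera axis (Eq. (2)).

    depth_L:   HxW depth image in the LiDAR frame (meters, 0 = no return)
    K_L, K_E:  3x3 intrinsics of the LiDAR camera and the event camera
    T_EL:      4x4 extrinsic transform ^E T_L from the calibration step
    out_shape: (h, w) resolution of the event camera
    """
    H, W = depth_L.shape
    v, u = np.mgrid[0:H, 0:W]
    z = depth_L.ravel()
    valid = z > 0

    # Back-project LiDAR pixels to 3D points in the LiDAR frame.
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])[:, valid]
    pts_L = np.linalg.inv(K_L) @ pix * z[valid]

    # Chain the homogeneous transform: points in the event-camera frame.
    pts_E = T_EL[:3, :3] @ pts_L + T_EL[:3, 3:4]

    # Reproject into the event camera, keeping the nearest depth per pixel.
    proj = K_E @ pts_E
    u_e = np.round(proj[0] / proj[2]).astype(int)
    v_e = np.round(proj[1] / proj[2]).astype(int)
    h, w = out_shape
    in_view = (0 <= u_e) & (u_e < w) & (0 <= v_e) & (v_e < h) & (pts_E[2] > 0)
    depth_E = np.full(out_shape, np.inf)
    np.minimum.at(depth_E, (v_e[in_view], u_e[in_view]), pts_E[2][in_view])
    depth_E[np.isinf(depth_E)] = 0.0  # pixels with no LiDAR return
    return depth_E
```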
Through this approach, the generation of a diverse, large training data set is automatic. Thus, any supervised learning approach can be used, as the constraint imposed by the complication of labeling a data set is removed.
DATA SET COLLECTION
The sensors used are two iniVation DAVIS (Dynamic and Active-pixel Vision Sensor) event cameras and one Intel RealSense L515 (LiDAR camera). These are mounted rigidly as shown in Figure 7.
We evaluated the entire pipeline by collecting both training and testing data in the Edgar Mine in Idaho Springs, Colorado. The hypothesis is that, during an active drilling operation, disparity maps can be predicted at a higher speed and with higher fidelity than when computed with traditional stereo vision techniques. To map the real-time disparities to depth, Equation (1) is used again. Using just the stereo setup, two time-synced images are input to the trained network, as shown in Figure 8.
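Equation (1) is not reproduced in this excerpt; assuming it is the standard pinhole-stereo relation $Z = fB/d$, the disparity-to-depth mapping reduces to the following sketch, where the focal length and baseline values are placeholders.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px=640.0, baseline_m=0.10):
    """Convert a predicted disparity map (pixels) to metric depth (meters),
    assuming Z = f * B / d with f in pixels and B in meters."""
    depth = np.zeros_like(disparity, dtype=np.float64)
    valid = disparity > 0  # zero disparity means no match / infinite depth
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```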
As can be seen, the input images contain a significant amount of noise that would disrupt traditional stereo vision [20] based on feature matching. When the
Figure 4. Loss curves generated during the training of the network. An early stopping strategy was used so as not to overtrain the network.
Figure 5. The calibration rig used to extract the relative sensor poses. The fiducial marker can be swapped with a checkerboard to obtain the intrinsic matrix, K, for each of the sensors.
Figure 6. The process used to locate the three sensors relative to each other.