definitions, a data lake, an integrated virtual database, and
a model layer. Working together, they provide a robust,
AI-ready solution.
Business Context
Traditional databases typically do not store business context about the data they hold. That makes it hard to reason over the data, whether in the classical symbolic sense or the LLM sense. For generative AI, context helps generate higher-quality answers, and having business goals and business processes well documented and integrated can provide that context. [11]
Business goals are the company's high-level objectives. Being profitable is a key example, but profitability has many facets and related goals. For example, a goal might be to improve safety, reduce energy use, better utilize resources, meet production targets, or reduce instances of unplanned maintenance.
Business processes are descriptions of how business is currently being done in one aspect of the operation, such as drilling, hauling, or loading. There are established standards for documenting them, such as Business Process Model and Notation (BPMN).
Mining companies have often already invested time and energy in documenting these things, from process documentation to corporate objectives, but have not integrated them with their operational, AI, or analytical systems. Including them gives AI systems a much clearer understanding of the broader implications of their decisions and lets them generate more relevant answers.
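As a minimal sketch of what that integration can look like, the snippet below packages documented goals and processes as structured records and injects them into an LLM prompt. The records, field names, and the build_context_prompt helper are illustrative assumptions, not a fixed schema.

```python
# Sketch: feeding documented business context into a generative AI prompt.
# All records and field names here are hypothetical examples.

business_goals = [
    {"id": "G1", "goal": "Reduce unplanned maintenance", "metric": "downtime hours/month"},
    {"id": "G2", "goal": "Reduce energy use", "metric": "kWh per tonne moved"},
]

business_processes = [
    {"id": "P1", "name": "Hauling", "summary": "Trucks cycle between shovel and crusher."},
]

def build_context_prompt(question: str) -> str:
    """Prepend documented goals and processes to a user question."""
    lines = ["Business goals:"]
    lines += [f"- {g['goal']} (measured by {g['metric']})" for g in business_goals]
    lines.append("Business processes:")
    lines += [f"- {p['name']}: {p['summary']}" for p in business_processes]
    lines.append(f"Question: {question}")
    return "\n".join(lines)

print(build_context_prompt("Which haul trucks should be serviced first?"))
```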
Data Lake
Mines constantly generate data from many different sources, in many different formats, at many different cadences. This presents a challenge, because databases and tools often require transforming data into their preferred format on the way in, which can cause data loss when the two formats don't share common definitions. A data lake, on the other hand, keeps data in its original format, much as a file system stores files of all types. No data is lost as it's brought into the system, and multiple rich views of the data can be created and updated over time, with solutions built on this strong base.
Data lakes are often hosted on cloud object storage, such as Amazon S3 or Azure Blob Storage, which is optimized for large amounts of rarely changing data. These systems can hold enormous volumes, and access speeds have become fast enough that they are a go-to foundation for analytical systems. [12] Object storage works like a file system, so it can handle everything from CSV files to videos.
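As a concrete illustration, the sketch below lands two very different file types in S3 unchanged, using boto3. The bucket name and key layout are assumptions for the example.

```python
# Sketch: landing raw files of any type in object storage via boto3.
# Bucket name and keys are hypothetical.
import boto3

s3 = boto3.client("s3")

# Object storage accepts any file type unchanged, from CSV to video.
s3.upload_file("assays_2024.csv", "mine-data-lake", "raw/assays/assays_2024.csv")
s3.upload_file("pit_flyover.mp4", "mine-data-lake", "raw/video/pit_flyover.mp4")
```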
There are several proven techniques for loading data into a data lake using data ingestion pipelines. Some data makes sense as a one-off import, like exploration results, while other data requires continuous updates, such as sensor readings. Data pipelines handle both cases, ensuring data from all types of sources makes it into the data lake.
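The two patterns can be sketched as follows, with a local directory standing in for the data lake; the read_sensor callback, paths, and polling interval are illustrative assumptions.

```python
# Sketch: one-off vs. continuous ingestion into a data lake.
import json, shutil, time
from datetime import datetime, timezone
from pathlib import Path

LAKE = Path("data-lake")

def one_off_import(source_file: str) -> None:
    """One-time load, e.g. historical exploration results."""
    dest = LAKE / "raw" / "exploration" / Path(source_file).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(source_file, dest)  # original format is preserved

def continuous_sensor_ingest(read_sensor, interval_s: float = 60.0) -> None:
    """Continuous load: append each new reading as its own object."""
    dest_dir = LAKE / "raw" / "sensors"
    dest_dir.mkdir(parents=True, exist_ok=True)
    while True:
        reading = read_sensor()  # caller supplies the sensor source
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
        (dest_dir / f"reading-{stamp}.json").write_text(json.dumps(reading))
        time.sleep(interval_s)
```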
Having all the data accessible in a data lake is a good start toward machine learning, but the data isn't quite ready in this form: it still lacks organization, which the remaining parts of the system address.
Data Integration, Cleaning, and Augmentation
Data in a data lake doesn't have to follow any rhyme or reason. Different sources might refer to the same thing by different names, units of measure can differ, and so on. Grade measurements are a good example: one system might report percentages while another uses parts per million, and the same mining locations may go by different naming conventions.
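The grade mismatch is easy to make concrete, since 1% equals 10,000 ppm; the helper names below are hypothetical.

```python
# Worked example: the same copper grade in two unit systems.
def percent_to_ppm(grade_pct: float) -> float:
    return grade_pct * 10_000

def ppm_to_percent(grade_ppm: float) -> float:
    return grade_ppm / 10_000

assert percent_to_ppm(0.8) == 8_000   # 0.8% Cu is reported as 8,000 ppm
assert ppm_to_percent(8_000) == 0.8
```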
Fixing these problems is a data integration task. Traditionally it is handled by a series of data pipelines that move data from the data lake into a database, transforming and standardizing it into one shared format. This approach has downsides, however: the database must now be maintained as a second copy of the data. It is much more useful to have a system that acts as an enriched view of the data lake.
A “view” in the database sense has a lot of benefits. It can be thought of as a window into the data rather than a copy: queries run against the original data, with strategic caching applied on the fly to maintain performance. Views update automatically as the underlying data changes, which matters a great deal for sensor data and other frequently changing operational data. A view can also enrich the data with new calculations, such as aggregations like shift key performance indicators (KPIs) or equivalent values in different units of measure.
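One way to sketch such a view directly over data-lake files is with DuckDB, which can query a CSV in place; the file path and the shift_id and tonnes columns are assumptions for illustration.

```python
# Sketch: a shift-KPI view over a raw data-lake file, using DuckDB.
import duckdb

con = duckdb.connect()

# The view is a window into the raw CSV, not a copy: it re-reads the
# file on each query, so results stay current as the data changes.
con.execute("""
    CREATE VIEW shift_kpis AS
    SELECT shift_id,
           SUM(tonnes)            AS total_tonnes,
           SUM(tonnes) / COUNT(*) AS tonnes_per_load
    FROM read_csv_auto('data-lake/raw/haul_cycles.csv')
    GROUP BY shift_id
""")

print(con.execute("SELECT * FROM shift_kpis").fetchall())
```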
These views can be created using rules in the classic symbolic AI sense: logical statements that define relationships, where Data A in the data lake implies that Data B is available in a view. Such rules can standardize things, fixing issues like the naming problem, where the same concept is referred to differently in multiple places due to differing languages, standards, limitations of a source program, or jargon. Rules can also define relationships between views and other views, enhancing them, such as calculating KPIs for a shift.
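As a small sketch of rules in this sense, each function below states that if record A exists, a standardized record B is available in the view; the synonym table and field names are illustrative assumptions.

```python
# Sketch: symbolic rules that standardize names and derive new fields.
LOCATION_SYNONYMS = {"Pit 1": "PIT-01", "pit_one": "PIT-01", "P1": "PIT-01"}

def standardize_location(record: dict) -> dict:
    """Rule: a record with any known alias implies one with the canonical name."""
    out = dict(record)
    out["location"] = LOCATION_SYNONYMS.get(record["location"], record["location"])
    return out

def derive_ppm(record: dict) -> dict:
    """Rule: a grade in percent implies an equivalent grade in ppm."""
    out = dict(record)
    if "grade_pct" in out:
        out["grade_ppm"] = out["grade_pct"] * 10_000
    return out

raw = {"location": "pit_one", "grade_pct": 0.8}
view_row = derive_ppm(standardize_location(raw))
print(view_row)  # {'location': 'PIT-01', 'grade_pct': 0.8, 'grade_ppm': 8000.0}
```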