Practical approach

Knowledge within an organization is dynamic and is updated continuously, often in real time. Big Data technologies now make it possible to collect customer data alongside market data, social media data, system logs, and sensor data.

Big Data sets contain millions of records or more, so the entire data analysis process must be able to run automatically. Automation spares employees from manually organizing all the collected knowledge. Modern Knowledge Management Systems (KMS) can leverage machine learning, predictive analytics, and various recommendation algorithms. This approach helps uncover hidden relationships in company data, as well as market trends and anomalies. For example, a KMS may detect that products with a specific composition currently sell better than others.
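The product-composition example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the composition labels, and the 1.2 threshold ratio are all hypothetical, and a real KMS would run this kind of aggregation on far larger data with dedicated tooling.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records: (product, composition label, units sold).
SALES = [
    ("bar_a", "high_protein", 120),
    ("bar_b", "high_protein", 150),
    ("bar_c", "low_sugar", 80),
    ("bar_d", "low_sugar", 70),
    ("bar_e", "classic", 60),
    ("bar_f", "classic", 75),
]


def average_sales_by_composition(records):
    """Group units sold by composition and return the average per group."""
    grouped = defaultdict(list)
    for _product, composition, units in records:
        grouped[composition].append(units)
    return {composition: mean(units) for composition, units in grouped.items()}


def compositions_above_overall(records, ratio=1.2):
    """Return compositions whose average sales exceed ratio * overall average.

    The ratio is an assumed, tunable threshold, not a recommended value.
    """
    overall = mean(units for _product, _composition, units in records)
    by_composition = average_sales_by_composition(records)
    return sorted(c for c, avg in by_composition.items() if avg > ratio * overall)


if __name__ == "__main__":
    print(compositions_above_overall(SALES))
```

With the sample data, only the `high_protein` group clears the threshold, which is the kind of signal a KMS could surface automatically instead of leaving it to manual review.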

Table 1. Overview of popular KMS platforms for Big Data and their features.

| KMS | Type | Features |
|---|---|---|
| Hadoop with ecosystem (HDFS, Hive, Spark) | Open source | Stores and processes vast amounts of data, enabling analysis and knowledge extraction from logs, IoT data, and social media. |
| Apache Solr | Open source | Indexes vast collections of data and documents; supports NLP, classification, and recommendations. Acts as a Google-like search engine within the organization. It is a tool for content search, not for content management. |
| Apache Atlas | Open source | Data management within the organization: audits, documentation of data flows, and data cataloging. Integrates with Hadoop (storage), Hive (querying), Kafka (streaming), and Spark (processing). It is a tool for knowledge about data. |
| Elastic Stack | Open source | Advanced search engine with analysis of text, logs, and documents, plus knowledge dashboards. Enables fast searching and visualization of knowledge. |
| Databricks | Commercial | Data warehousing platform; supports ML, NLP, and predictive analytics. Enables generating knowledge from data. |
| Microsoft Fabric/Azure Synapse | Commercial, cloud | Integrates data, analytics, ML, and reporting. Allows building knowledge models and automatically generates knowledge from operational data. |
| Google BigQuery | Commercial | Very fast querying across petabytes of data. Characterized by extremely rapid knowledge generation from Big Data. |
| IBM Watson Knowledge Catalog | Commercial | Catalogs knowledge and data, and automatically tags and classifies them. Integrated with AI and ML. |
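
The search-oriented tools in Table 1 (Apache Solr and Elastic Stack) rely on indexing documents so that queries do not have to scan every record. The underlying idea, an inverted index that maps each term to the documents containing it, can be sketched as follows. This is a toy illustration only: the documents, the tokenizer, and the function names are hypothetical and do not correspond to the API of either product.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain all terms of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical knowledge items, standing in for logs and documents.
DOCS = {
    "doc1": "Sensor logs from the production line",
    "doc2": "Customer feedback collected from social media",
    "doc3": "Production report with sensor anomalies",
}
INDEX = build_index(DOCS)

if __name__ == "__main__":
    print(sorted(search(INDEX, "sensor production")))
```

Because lookups touch only the index entries for the query terms, the cost of a search depends on how many documents match rather than on the total size of the collection, which is what makes this structure suitable for large knowledge bases.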