Practical approach
Knowledge within an organization is dynamic and is updated in real time. Big Data technologies now make it possible to collect customer data alongside market data, social media data, system logs, and sensor data.
Big Data sets contain millions of records or more, so the entire data analysis process must be able to run automatically. With automation, employees no longer have to spend time manually organizing the collected knowledge. Modern Knowledge Management Systems (KMS) can leverage machine learning, predictive analytics, and recommendation algorithms. This approach helps uncover hidden relationships in company data, as well as market trends and anomalies. For example, a KMS may detect that products with a specific composition currently sell better than others.
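
The product-composition example can be illustrated with a small sketch. It is not how any particular KMS works internally. It only shows the underlying idea of grouping sales by an attribute and flagging groups that clearly outperform the overall average. The records in `SALES`, the function names, and the `factor` threshold of 1.2 are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records: (product, composition, units_sold).
# These values are invented for illustration only.
SALES = [
    ("A", "oat-based", 120),
    ("B", "oat-based", 140),
    ("C", "soy-based", 60),
    ("D", "soy-based", 80),
    ("E", "almond-based", 70),
    ("F", "almond-based", 90),
]


def average_sales_by_composition(records):
    """Return the mean units sold for each composition."""
    groups = defaultdict(list)
    for _product, composition, units in records:
        groups[composition].append(units)
    return {composition: mean(units) for composition, units in groups.items()}


def compositions_above_overall(records, factor=1.2):
    """Return compositions whose mean sales exceed factor * overall mean.

    The factor is an arbitrary illustrative threshold, not a recommended value.
    """
    overall = mean(units for _product, _composition, units in records)
    averages = average_sales_by_composition(records)
    return sorted(c for c, avg in averages.items() if avg > factor * overall)


if __name__ == "__main__":
    print(compositions_above_overall(SALES))
```

In a real deployment the same grouping would typically run as a distributed query or a scheduled job over the full data set, and a statistical test or a trained model would replace the fixed threshold.
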
Table 1. Overview of the most popular KMS for Big Data and their features.
| KMS | Type | Features |
|---|---|---|
| Hadoop ecosystem (HDFS, Hive, Spark) | Open Source | Stores and processes vast amounts of data, enabling analysis and knowledge extraction from logs, IoT data, and social media. |
| Apache Solr | Open Source | Indexes large collections of data and documents and provides full-text search; can support NLP, classification, and recommendation use cases. In effect, it gives the organization an internal search engine comparable to Google. It is a tool for searching content, not for managing it. |
| Apache Atlas | Open Source | Data governance within the organization: audits, documenting data flows (lineage), and data cataloging. Integrates with Hadoop (storage), Hive (SQL querying), Kafka (streaming), and Spark (processing). It is a tool for knowledge about data (metadata). |
| Elastic Stack | Open Source | An advanced search and analytics stack for text, logs, and documents, with knowledge dashboards. Enables fast searching and visualization of knowledge. |
| Databricks | Commercial | Data and analytics platform with data warehousing capabilities; supports ML, NLP, and predictive analytics. Enables generating knowledge from data. |
| Microsoft Fabric/Azure Synapse | Commercial, Cloud | Integrates data, analytics, ML, and reporting. It allows building knowledge models and automatically generates knowledge from operational data. |
| Google BigQuery | Commercial, Cloud | Cloud data warehouse that runs very fast SQL queries across petabytes of data. Enables rapid knowledge generation from Big Data. |
| IBM Watson Knowledge Catalog | Commercial | Catalogs knowledge and data, automatically tags and classifies them. It's integrated with AI and ML. |
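
The search-oriented tools in Table 1 (Apache Solr and the Elastic Stack) are built around an inverted index, a structure that maps each term to the documents containing it. The toy sketch below shows that idea only. It does not use either tool's API, and the documents in `DOCS` and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical documents, invented for illustration only.
DOCS = {
    "doc1": "Sensor logs from the production line",
    "doc2": "Customer feedback collected from social media",
    "doc3": "Production report with sensor anomalies",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: token -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


if __name__ == "__main__":
    index = build_index(DOCS)
    print(search(index, "sensor production"))
```

Production search engines add relevance ranking, text analysis such as stemming, and distribution across many nodes on top of this basic structure.
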