Knowledge Mining with Microsoft Azure - Effective use in practice

SoftwareOne blog editorial team

It is one of the hottest topics in AI. Microsoft is actively developing it, and we offer a solution based on this service. It's time to see Knowledge Mining in practice!

Today, we want to tell you about our solution and how we used it as part of the implementation for one of our clients.

We have been working with this technology for some time now. It's important to note that, because we rely heavily on Microsoft Azure services, this is not a fixed, out-of-the-box product. We can easily change and adapt it each time to suit individual business needs.

The main components of the SoftwareOne Knowledge Mining solution

For this case, drawing on our previous projects, we put together a solution consisting of two modules: Intelligent Search and Content Extractor.

You can modify and enhance both parts. They work seamlessly together because they are based on the same architecture, but they can also be used separately if needed.

Intelligent Search

Intelligent Search is the main pillar of our solution. It is suitable for companies which hold huge amounts of data, most of which is neither stored in a structured way nor easily searchable.

For example, if an employee wants to find relevant information about an urgent or important topic, they have to go through documents, images and videos manually, even if these are available in a digital format!

Intelligent Search uses an AI-based set of tools to efficiently store, index and inspect files in every form.

Content Extractor

The second component is the Content Extractor. It is intended for business units that process large volumes of documents from various internal and external sources, documents which contain an excess of irrelevant content.

This is useful when employees want to quickly extract information of interest and use it in a different format or as input for a separate analysis. Most importantly, this process should not be manual but automatic, with the appropriate parameters defined.

Content Extractor uses text mining and machine learning algorithms to locate the specified, relevant elements within the entire content.

Knowledge Mining in practice

Enough of the theory – let us tell you how we used these tools to solve real business problems. We worked with the engineering department of one of the largest airlines in the world. They had two very specific challenges to tackle, for which our solution seemed to be a perfect fit.

Helping engineers to react quickly

Let's start with the first problem. The engineering department receives requests or queries, each of which includes a description of a defect to be fixed by the airline's engineers. The inquiries contain both printed and handwritten text, which makes understanding the content challenging.

Engineers then respond to these requests. They give specific recommendations and answers, often in handwriting again, but also adding images, e.g. to indicate the part of the aircraft they are referring to. All these conversations are kept in the same file and contain vast amounts of knowledge and information, as they result in the problem being fixed.

The engineering department has more than 10 years of history, which amounts to thousands of requests in the form of PDF files. So, several engineers thought: when a new request or defect report comes in, why don't we look for similar cases, and their solutions, among them? After all, not every engineer has enough experience to solve the issue on the spot, and time is money, especially in the airline industry.

There were, however, some challenges in solving this problem. The airline's data included loads of documents kept in multiple locations, content stored in various formats (printed text, images and handwritten notes), and many documents not relevant to end users at all. This is where Intelligent Search came to the rescue!

How did we solve the problem?

We have used the solution to perform the following steps:

  1. Gather all the documents in a centralised storage space for easy access and further analysis
  2. Process the documents with Form Recognizer to extract the pieces we want to search for (for instance: subject of the defect, aircraft type, manufacturer etc.)
  3. Create an index out of the extracted pieces to make them searchable for a business user and retrieve the relevant requests.
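
Step 3 can be pictured with a toy example. The sketch below is a minimal, in-memory stand-in and not the Azure index used in the project: it assumes each document has already been reduced to a dictionary of extracted fields (the field names and sample records are invented for illustration) and builds an inverted index from words to document IDs.

```python
import re
from collections import defaultdict

# Hypothetical output of the extraction step: one record per request PDF.
# Field names and values are illustrative assumptions, not the client's schema.
EXTRACTED = {
    "req-001": {"subject": "Broken landing gear actuator", "aircraft_type": "A320"},
    "req-002": {"subject": "Cracked cabin window seal", "aircraft_type": "A320"},
    "req-003": {"subject": "Landing gear door hinge corrosion", "aircraft_type": "B737"},
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map every word found in any field to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, fields in records.items():
        for value in fields.values():
            for word in tokenize(value):
                index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


index = build_index(EXTRACTED)
print(search(index, "landing gear"))       # documents mentioning both words
print(search(index, "landing gear a320"))  # narrowed further by aircraft type
```

A production search service adds relevance scoring, typo tolerance and field-level filters on top of this basic idea; the sketch only shows why extracting fields first makes the documents searchable at all.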

This was, however, only part of the successful solution. We also proposed an intuitive, tailor-made web app interface which employees can operate with ease.

What are the benefits of Intelligent Search?

This service offers multiple features. It allows users to:

  1. Use simple or advanced search to retrieve documents (e.g. simply type "broken landing gear" or select additional filters such as Engine, Code or Model to narrow down the results)
  2. Use tags attached to each document for filtering (e.g. show only documents with "remove" stated as a fix)
  3. Preview the document in its raw and processed form
  4. Download the entire document or only the chapter containing the relevant solution
  5. Rank the documents (very important!) according to search phrase and user feedback. Over time, the system has learnt to promote the documents which were more helpful for the users.
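
The combination of tag filtering (point 2) and feedback-aware ranking (point 5) can be sketched as follows. This is a hypothetical scoring rule, not the one used in the delivered system: the `Document` fields, the vote counts and the `feedback_weight` parameter are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    tags: frozenset = frozenset()
    helpful_votes: int = 0    # users who marked the document as helpful
    unhelpful_votes: int = 0  # users who marked it as not helpful


def text_score(doc, query):
    """Fraction of query words that appear in the document text (0.0 to 1.0)."""
    words = query.lower().split()
    doc_words = set(doc.text.lower().split())
    if not words:
        return 0.0
    return sum(w in doc_words for w in words) / len(words)


def feedback_score(doc):
    """Smoothed share of helpful votes; 0.5 when there is no feedback yet."""
    return (doc.helpful_votes + 1) / (doc.helpful_votes + doc.unhelpful_votes + 2)


def rank(docs, query, required_tag=None, feedback_weight=0.3):
    """Filter by tag, drop documents with no matching words, sort by blended score."""
    scored = []
    for doc in docs:
        if required_tag is not None and required_tag not in doc.tags:
            continue
        match = text_score(doc, query)
        if match == 0.0:
            continue
        score = (1 - feedback_weight) * match + feedback_weight * feedback_score(doc)
        scored.append((score, doc.doc_id))
    # Highest score first; ties are broken by document ID for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


docs = [
    Document("req-001", "broken landing gear actuator", frozenset({"replace"}), 1, 4),
    Document("req-002", "broken landing gear strut", frozenset({"remove"}), 9, 1),
    Document("req-003", "cracked window seal", frozenset({"remove"}), 5, 0),
]
print(rank(docs, "broken landing gear"))
print(rank(docs, "broken landing gear", required_tag="remove"))
```

Both of the first two documents match the query equally well, so the one that users found more helpful is promoted. The smoothing in `feedback_score` keeps a document with a single vote from jumping straight to the top or bottom.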

By using Knowledge Mining in practice, our client can take advantage of all their historical knowledge, without having to look for the same answer every time. This, in turn, reduces the time needed to find accurate and efficient fixes, meaning that problems can be solved faster.

Documents are retrieved in order of relevance to the user, further speeding up the process. They can also be categorised and filtered by specific fix types, making it simpler to find the right solution.

Automating manual work

Let's move on to the second business challenge, where we used Content Extractor.

Our client periodically receives new aircraft manuals and documentation. These can sometimes amount to more than a thousand pages per document, as they include recommendations for every single aircraft configuration and, as you can imagine, there are loads of them.

However, our client needs only the data regarding the fleet they own. They would take this information, place it in their internal document format and redistribute it to engineers, who in turn would apply the manufacturer's specific recommendations to the aircraft they own.

This information is crucial, sometimes concerning vital parts of an airplane, so it was essential to extract and transform the relevant content quickly but without any loss of information. Previously, engineers had to extract and transfer the relevant information entirely by hand. Since they first had to find only the configurations and groups which concerned their aircraft and then extract text, tables and images, the whole operation could take hours, if not days. We wanted to minimise this tedious effort, so they could spend their time on more important tasks.

Luckily, there are tools for that!

Figure: Microsoft Form Recognizer demo

How did we solve the problem?

We used different techniques to extract parts of the documents:

  • Text – text mining methods to locate the relevant part of the texts, using regular expressions, substrings etc.
  • Images – image detection and location
  • Tables – a Custom Vision model trained to recognise when part of a page is in fact a table, so that it can be extracted accordingly.

The techniques above show that extracting parts of a document is not such a simple task after all. Nevertheless, thanks to Azure Cognitive Services and Azure Functions, we were able to implement an efficient solution.
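
The text-mining step can be illustrated with a short sketch. It assumes, purely for illustration, that a manual is plain text whose sections start with a line such as `SECTION 2 - CONFIG: B737-800`; the heading format, configuration codes and sample text are invented, and real manuals would need patterns tailored to their layout.

```python
import re

# Invented sample manual; the heading format is an assumption for this sketch.
MANUAL = """\
SECTION 1 - CONFIG: A320-214
Inspect the nose gear steering actuator every 600 flight hours.
SECTION 2 - CONFIG: B737-800
Replace the cabin pressure controller filter.
SECTION 3 - CONFIG: A320-214, A321-231
Check the flap track fairing for corrosion.
"""

# A heading line captures the section number and the comma-separated configurations.
HEADING = re.compile(r"^SECTION (\d+) - CONFIG: (.+)$", re.MULTILINE)


def split_sections(text):
    """Yield (number, configurations, body) for each section in the text."""
    headings = list(HEADING.finditer(text))
    for i, match in enumerate(headings):
        body_start = match.end()
        body_end = headings[i + 1].start() if i + 1 < len(headings) else len(text)
        configs = {c.strip() for c in match.group(2).split(",")}
        yield int(match.group(1)), configs, text[body_start:body_end].strip()


def extract_for_fleet(text, fleet):
    """Keep only the sections that apply to at least one configuration in the fleet."""
    return [
        (number, body)
        for number, configs, body in split_sections(text)
        if configs & set(fleet)
    ]


# Print only the sections relevant to a fleet made up of A320-214 aircraft.
for number, body in extract_for_fleet(MANUAL, {"A320-214"}):
    print(number, body)
```

Because the body of each section is copied verbatim between two headings, nothing inside a relevant section is dropped or reworded, which mirrors the requirement to transfer content without loss.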

What are the benefits of Content Extractor?

In this case, we targeted and achieved a number of results. We managed to automate the mundane tasks of extracting and copying relevant content, saving precious time. We also converted the information coming from different sources into a unified format, making it easier to search through.

The solution also allows you to extract a section of text identified by specific keywords, for more accurate results. Finally, we were able to transfer content 1:1 without losing any valuable information.

Consequently, the service boosted engineers' productivity, reducing the time needed to extract the information from hours to minutes.

Why choose our solution?

Using Knowledge Mining in practice gives you a range of benefits, but most importantly, you can save valuable time. No more trawling through endless files looking (sometimes repeatedly!) for a single piece of information. Whatever you need is ready to access within minutes.

And if that wasn't enough, here are additional features of the solution which make it a great choice for business file search:

  1. Cloud-based: The solution is based on the Microsoft Azure cloud. Therefore, it doesn't require investment in hardware and is easily manageable.
  2. Secure: It is fully secured by Azure AD and the data is stored in highly protected data centres. You can integrate additional solutions, such as Okta, for another layer of security.
  3. Scalable: Using Azure Data Lake for storage and tiered Cognitive Search, the mechanism is very efficient and scalable. Desired storage and computing power can be increased with a single click.
  4. Fueled by AI: The SoftwareOne Knowledge Mining solution is based on the newest, state-of-the-art services such as Cognitive Services, Form Recognizer and OCR mechanisms. Microsoft maintains and constantly enhances this suite of tools.
  5. Customisable: Our reference architecture addresses the general problems in knowledge management. However, we can adjust the web app to your needs. We can also include additional mechanisms such as role-based access and managerial Power BI dashboards.

Don't miss out on your free demo

We are already seeing huge interest in our Knowledge Mining solution. Note that even if your case differs from those we've mentioned, we can easily adapt the solution to your needs.

Here are some other cases which we are aiming to tackle:

  • Creating an enhanced knowledge platform, which will offer not only advanced search capabilities but also an advanced uploading and rating process
  • Including snapshots of videos and performing advanced analytics (object detection, sentiment analysis, language recognition) to provide another type of searchable content
  • Extracting the information from unstructured data (.pdf, .xlsx) and transforming it to a searchable and structured form, so it can be further used in reporting and with AI.

Would you like to make more of your information? Just leave us your contact details and we'll be in touch to schedule a demonstration for you!


We analyse the latest IT trends and industry-relevant innovations to keep you up-to-date with the latest technology.