
Modern data warehouse: Everything you need to know

SoftwareOne blog editorial team

Most companies (and most likely yours too) want to be data-driven and/or monetise their vast amounts of data. But before you start using advanced technologies like AI or Machine Learning, this data has to be prepared first. It's a complex process, but from this series, you can learn what it involves.

Nowadays, a Modern Data Warehouse is a need for virtually any company. The internet is full of suggestions, reference architectures, and recommendations on how to do the job well. The theory is a good starting point, but it doesn't always correspond with reality.

In this brand new article series, we will share our experience from real Data Warehouse modernisation projects. We will cover many aspects, not just those related to development.

Which topics will we cover?

When it comes to large projects, development is not the biggest challenge, especially when you have a mature and experienced team. So, over the course of 6 dedicated posts, we will shed some light on the following topics:

  • Modern Data Warehouse architecture and challenges
  • Our Data Domain Framework (DDF)
  • Project organisation, tools, process structure, infrastructure and environments, automation
  • Master Data Management.

Today, we will focus on the Modern Data Warehouse architecture. We will also touch upon other areas that we will elaborate on later in the series. Let's begin.

What is a Modern Data Warehouse?

First, we need to clarify the concept of data warehouse. According to Wikipedia:

"a data warehouse...is a system used for reporting and data analysis...Data Warehouses are central repositories of integrated data from one or more disparate data sources. They store current and historical data in one single place that are used for creating analytical reports for workers throughout the enterprise." (Source)

Data Warehouses are well-known and widely used. But how is a modernised version different?

Why do you need a Modern Data Warehouse and what does it allow you to achieve?

Here are the two main reasons why you need to consider an upgrade:

  • You have Big Data and unstructured / semi-structured data that must be integrated with structured data
  • You want to use modern public cloud data services (keep reading to find out why this is a good idea).

Microsoft describes this solution in the following way:

"A modern data warehouse lets you bring together all your data at any scale easily, and to get insights through analytical dashboards, operational reports, or advanced analytics for all your users... It combines all your structured, unstructured and semi-structured data."

In other words, a Modern Data Warehouse can handle much larger volumes of data and perform complex operations on multiple types of data, giving you in-depth insights.

The Microsoft concept of a Modern Data Warehouse is based on multiple Azure cloud services:

  • Azure Data Factory
  • Azure Data Lake Storage
  • Azure Databricks
  • Azure Synapse Analytics
  • Azure Analysis Services
  • … and others.
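
To make the idea of combining structured and semi-structured data more concrete, here is a minimal sketch in plain Python. It is only an illustration: the sample records, the field names (`customer_id`, `meta`, `amount`) and the `integrate` helper are hypothetical, and a real project would rely on services such as those listed above rather than on in-memory strings.

```python
import csv
import io
import json

# Hypothetical structured source: a CSV export from a line-of-business system.
CUSTOMERS_CSV = """customer_id,name,country
1,Alice,PL
2,Bob,DE
"""

# Hypothetical semi-structured source: JSON events with optional, nested fields.
EVENTS_JSON = """[
  {"customer_id": 1, "type": "page_view", "meta": {"page": "/pricing"}},
  {"customer_id": 1, "type": "purchase", "meta": {"amount": 120.0}},
  {"customer_id": 2, "type": "page_view"}
]"""


def integrate(customers_csv: str, events_json: str) -> list[dict]:
    """Join both sources on customer_id and return one flat row per customer."""
    customers = {
        int(row["customer_id"]): row
        for row in csv.DictReader(io.StringIO(customers_csv))
    }
    # Start every customer with zeroed metrics so a customer without events still appears.
    summary = {
        cid: {
            "customer_id": cid,
            "name": row["name"],
            "country": row["country"],
            "events": 0,
            "revenue": 0.0,
        }
        for cid, row in customers.items()
    }
    for event in json.loads(events_json):
        row = summary.get(event["customer_id"])
        if row is None:
            continue  # event for an unknown customer: simply skipped in this sketch
        row["events"] += 1
        # "meta" and "amount" are optional, which is typical of semi-structured data.
        row["revenue"] += event.get("meta", {}).get("amount", 0.0)
    return [summary[cid] for cid in sorted(summary)]


if __name__ == "__main__":
    for row in integrate(CUSTOMERS_CSV, EVENTS_JSON):
        print(row)
```

The point of the sketch is the shape of the work, not the tooling: the structured source has a fixed schema, while the semi-structured one needs defensive handling of missing fields before both can land in a single reporting table.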

Why do you need a Data Warehouse modernisation?

It's hardly possible to find an enterprise that doesn't possess some kind of Enterprise Data Warehouse (or EDW for short) solution, or at least some elements of it.

Every business needs some sort of reporting and/or dashboarding. Typically, it also uses many different systems to conduct its day-to-day operations, so the data must be acquired, cleansed, transformed, and integrated beforehand.

However, more often than not, an existing EDW is a result of years of constant development due to the evolution of a company itself, and adaptation to an ever-changing external business environment.

A Business Intelligence project is never finished (we know that BI is not a very popular term right now! These days, we have either a Big Data or an Advanced Analytics solution aided by Machine Learning and/or Artificial Intelligence).

Unfortunately, the pre-cloud era in IT did not support an agile and flexible approach to any system development or deployment. But let's be clear, it does not mean all past deployments are a complete mess. However, the combination of:

  • the ever-changing business and technical requirements
  • serious scalability challenges of on-prem hardware and software platforms, and
  • a waterfall approach that is poorly suited to such change

makes most EDWs look a bit like monsters: unfriendly, not useful, not eager to cooperate, yet fighting to maintain the status quo.

Why is the change important?

You have probably heard this already: "Data is the new oil of the digital economy." We even wrote it ourselves once before. And yet, data professionals, engineers, and scientists point out that only a small amount of the available data is being utilised. And that's despite the fact that companies are exposed to an unbelievable explosion of information.

Therefore, given the availability and maturity of public cloud services and Platform as a Service (PaaS) computing, and considering the data processing needs of virtually every business, now is the best time to modernise existing data warehouses.

Why is PaaS the right choice?

We've already mentioned the scalability and deep insights available with this solution. What's more:

  • PaaS computing requires less administration and management from the infrastructure personnel as there are fewer maintenance responsibilities (version updates, patching, etc.)
  • PaaS offers better scalability, easy and fast provisioning, and configuration
  • High Availability (HA) and Disaster Recovery (DR) features are available out-of-the-box
  • From a financial point of view, the public cloud offers a low entry cost and a pay-as-you-go model.

There are of course many concerns that need to be taken into account, such as:

  • privacy and compliance
  • no previous public cloud experience, or
  • a lack of knowledge about on-prem/public cloud integration (many of the existing data sources, mainly Line of Business systems, are on-prem solutions).

How does the Azure cloud help?

A Modern Data Warehouse can't exist without modern infrastructure. As a Microsoft Partner, we work within the Microsoft ecosystem, especially the Microsoft Azure cloud. A lot of our development is done using the Microsoft stack.

We focus on Azure PaaS data services whenever possible, our intention being to reduce the overhead related to infrastructure maintenance. Of course, during a project like this, we have to touch upon many other systems beyond the Microsoft ecosystem.

Azure PaaS is crucial here. It helps us save hundreds, if not thousands, of working hours.

Selecting and buying the right hardware, then setup, tuning, high availability and disaster recovery setup, troubleshooting, and so on: it's usually a nightmare. And this is just the tip of the iceberg compared to what you have to take care of in on-premises environments.

Moreover, you can't be 100% sure that your hardware estimation is accurate. Therefore, for safety reasons, you buy more than necessary and your infrastructure might lie idle.

On the other hand, within a few months, it might not be sufficient. Then you can try to scale up or out. No matter what you do in this case, it is not something that you can achieve within a few minutes or on demand.
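
The estimation problem described above can be illustrated with a small calculation. The sketch below assumes a fixed amount of purchased capacity and a growing monthly demand; all numbers, the "compute units" they are measured in, and the `capacity_gap` helper are hypothetical and serve only to show how the same purchase can be both wasteful early on and insufficient later.

```python
def capacity_gap(
    purchased: float, monthly_demand: list[float]
) -> list[tuple[float, float]]:
    """For each month, return (idle, shortfall) for a fixed purchased capacity."""
    result = []
    for demand in monthly_demand:
        idle = max(purchased - demand, 0.0)       # capacity paid for but unused
        shortfall = max(demand - purchased, 0.0)  # demand the hardware cannot serve
        result.append((idle, shortfall))
    return result


# Hypothetical figures in arbitrary "compute units"; not taken from any real project.
PURCHASED = 100.0
DEMAND = [40.0, 55.0, 70.0, 90.0, 120.0, 150.0]

if __name__ == "__main__":
    for month, (idle, shortfall) in enumerate(capacity_gap(PURCHASED, DEMAND), start=1):
        print(f"month {month}: idle={idle}, shortfall={shortfall}")
```

With on-demand scaling, the purchased capacity would follow demand month by month, so both columns would stay close to zero.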

What are the advantages of using the cloud?

It is extremely difficult to estimate the project scope and the business users' needs and requirements for such a comprehensive project, especially at the beginning of a digital transformation in a large organisation.

These kinds of engagements don't just take a couple of weeks. Many assumptions will change from phase to phase. Internal and/or external factors (like COVID-19) influence the business. The cloud gives you a number of advantages. In today's world, we operate in a continuously changing environment. You need to be flexible to adjust to a new reality fast. The infrastructure needs to be flexible to meet demand, business needs, increasing data volume, increasing queries, users, etc. Fast adaptation is a must.

Thanks to the cloud environment, we can now react instantly. Need more computing resources? No problem, let's scale up the service or provision more database instances to meet the demand. Without this ability, any advanced data project is likely to fail.

Another thing worth noting is environment provisioning and configuration. With the Azure public cloud, we can build a set of environments (development, test, and production) in an automated manner. With a single button click, we can provision the whole environment from scratch using automation pipelines.

Additionally, all environments are consistent, which means we avoid issues related to software versions when deploying the solution to test and production.

The conclusion? If you are not restricted (by law or regulations), move to the cloud. Do not experiment with on-premises, and save your time and money.
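
The idea of provisioning consistent environments from one definition can be sketched in a few lines of Python. This is not an Azure API and not the tooling used in the projects described here: `EnvironmentSpec`, `build_environments`, `scale_up` and every value below are hypothetical, chosen only to show why deriving development, test, and production from a single base keeps software versions aligned while still allowing each environment to be sized differently.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EnvironmentSpec:
    """Hypothetical description of one environment."""
    name: str
    runtime_version: str
    warehouse_tier: str
    node_count: int


# Single source of truth; every environment is derived from it.
BASE = EnvironmentSpec(
    name="base", runtime_version="1.0", warehouse_tier="small", node_count=1
)

# Only sizing may differ between environments in this sketch.
OVERRIDES = {
    "development": {},
    "test": {"node_count": 2},
    "production": {"warehouse_tier": "large", "node_count": 4},
}


def build_environments(
    base: EnvironmentSpec, overrides: dict[str, dict]
) -> dict[str, EnvironmentSpec]:
    """Derive all environments from one base so software versions cannot drift."""
    environments = {}
    for name, changes in overrides.items():
        if "runtime_version" in changes:
            raise ValueError(f"{name}: runtime_version must come from the base spec")
        environments[name] = replace(base, name=name, **changes)
    return environments


def scale_up(spec: EnvironmentSpec, extra_nodes: int) -> EnvironmentSpec:
    """Return a copy of the spec with more nodes, leaving everything else unchanged."""
    return replace(spec, node_count=spec.node_count + extra_nodes)


if __name__ == "__main__":
    envs = build_environments(BASE, OVERRIDES)
    envs["production"] = scale_up(envs["production"], extra_nodes=2)
    for spec in envs.values():
        print(spec)
```

In a real pipeline, a definition like this would be handed to an infrastructure-as-code tool that creates the actual resources; the sketch only models the configuration step.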

What does a typical project involve?

In the next article, we'll start explaining how to implement a Modern Enterprise Data Warehouse, based on the projects we delivered. But before we do this, let's add some background information from those engagements.

What are the frequent reasons for modernisation?

From our experience, clients decide to undertake Data Warehouse Modernisation projects due to many business and technical challenges they faced with the previous versions of the solution. Here are the most important reasons:

  • The data comes from different sources, and extracting it takes hours, days, or even weeks. This results in missed opportunities
  • There is no single view on spending habits, which leads to costly procurement
  • It is hardly possible to join social data with the company's own. Consequently, marketing campaigns are less effective and less efficient
  • It isn't possible to identify the most valuable customers and find out what would help to retain them
  • Data quality isn't good enough and leads to unstable reporting from the business perspective
  • The import of actual data packages from Line-of-Business (LoB) systems often fails due to scalability and availability issues
  • The window for data acquisition, reconciliation, recalculation, and reporting is often too short to deliver.

Last but not least, most clients want to transform their business with digital services aided by Machine Learning and Artificial Intelligence. The first step towards this ultimate goal is an Enterprise Data Warehouse modernisation project.

We'll end here for now. In the next article, we will start looking at what conducting such a project involves.

See other articles in our Modern Data Warehouse series:

  1. How to design a Modern Data Warehouse architecture
  2. How to use the cloud to deliver actionable business insights
  3. How to run a Modern Data Warehouse project effectively
  4. How to keep development under control
  5. How to incorporate Master Data Management
