
Data Warehouse Project - Reliable Project Delivery Framework


Modern Data Warehouse development and deployment should be treated not as a single project but as a programme consisting of multiple parts. Throughout the entire lifecycle, it is important to work according to a defined standard. This is why we developed the Data Domain Framework (DDF), which is the focus of this article. A Modern Data Warehouse project does not finish after the first release; like every Business Intelligence engagement, it is a continuous effort. The reason lies in constantly changing business requirements and an ever-evolving business environment. In addition, a Business Intelligence project touches the entire organisation and every department. Subsequent stages are developed iteratively and extend the solution's capabilities.

Why is it important to have a common standard for a data project?

Imagine a team of 10 developers whose task is to integrate 10 relational database systems. Each one has achieved their goal and the systems are integrated, so from a business perspective the requirement is met. Nevertheless, if you look under the hood, there would probably be a lot of discrepancies in how the solution was developed and delivered. You could say: "So what? It works and the business departments are satisfied. You are just looking for problems!" I agree that user satisfaction is crucial, but the focus here is on the development team who deliver the solution. A situation like this would impede their work and further development, and consequently lengthen delivery time, so business users would receive the requested solution later. In the future, things like adding extensions, implementing updates, and performing maintenance will become more complex and difficult. At some stage of the project, there would be no other way than to rewrite the solution. This might be a catastrophic scenario for the project, its budget, and its timeline, and it should be avoided at all costs.

HOW TO MANAGE A DATA PROJECT PROPERLY?

To keep consistency and avoid discrepancies, you need to define a unified approach to development. If 10 relational database systems are being integrated, the work is repeatable: some things can be automated, while others can be shared and reused. This is not just about writing common, reusable code. It is also important to define naming conventions, folder structures, layouts, and so on. Standards should be defined from two perspectives.

Project setup
This concerns the key elements of the project. An example is a naming convention for the source systems, which will be referred to multiple times at different stages throughout the project. For instance, how should a data warehouse or other source system be named and referred to: dw, dwh, or wh?

Tools
This perspective concerns the specific services used. For example, if Azure Data Factory is used to orchestrate the data pipelines, how are the pipelines named and organised into folders? How are Azure Data Factory datasets named? How should linked services be defined?
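To make conventions like these tangible, here is a minimal sketch of how they could be captured in a single, versioned definition that both developers and automation can consume. The abbreviations, name patterns, and folder names below are illustrative assumptions, not the DDF's actual choices.

```python
# Hypothetical naming convention definition - the abbreviations and patterns
# are examples chosen to illustrate the idea, not the real DDF rules.
NAMING_CONVENTION = {
    # Source system abbreviations, used consistently across all project layers.
    "source_systems": {
        "data_warehouse": "dwh",
        "crm_system": "crm",
        "erp_system": "erp",
    },
    # Azure Data Factory object name patterns.
    "adf": {
        "pipeline": "PL_<SourceSystem>_<Action>",         # e.g. PL_crm_LoadStaging
        "dataset": "DS_<SourceSystem>_<Entity>",          # e.g. DS_erp_Invoice
        "linked_service": "LS_<ServiceType>_<Instance>",  # e.g. LS_SqlServer_Erp
    },
    # Folder layout for orchestration artefacts.
    "adf_folders": ["ingest", "transform", "serve"],
}
```

Keeping such a definition in the repository means it can be reviewed like any other change and reused by the automated checks described later in this article.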

WHICH TOOL CAN YOU USE TO HELP YOU MANAGE A DATA PROJECT?

The answer is a proper framework, and, luckily, we have one ready-made. "In computer programming, an application framework consists of a software framework used by software developers to implement the standard structure of application software... a software framework is an abstraction in which software providing generic functionality can be selectively changed by additional user-written code, thus providing application-specific software. It provides a standard way to build and deploy applications and is a universal, reusable software environment that provides particular functionality as part of a larger software platform to facilitate development of software applications, products and solutions." (Source: Wikipedia). Following these principles, we designed and developed the Data Domain Framework. It provides standards and guidelines for building and deploying Business Intelligence solutions in general, and Data Warehouse Modernisation projects in particular. To some extent, it also provides generic functionalities such as the Testing Framework and Adoption Measurement.

What is the Data Domain Framework and why is it useful?

The Data Domain Framework (DDF for short) is a set of rules, standards, and patterns that describe how Data Domain projects should be organised, structured, and conducted. It allows you to:

  • Organise the project in a logical, clean and consistent way
  • Take a coherent and consistent approach and actions
  • Reuse repeatable patterns (instead of reinventing the wheel)
  • Increase and assure high quality
  • Speed up the delivery process
  • Stand out from the crowd.

Note: The Data Domain Framework is part of a wider concept at the SoftwareOne level. DDF is a subset of our Centre of Excellence (CoE), where a unified approach and standards are developed for each of our activity areas, i.e. Apps, SecureInfra, DevOps, and Data. During each project, we always apply the best practices and standards in every area. For example, infrastructure is not the main specialisation of the Data Domain, but we always design and build Azure architecture according to the standards defined by our CoE. Additionally, each new SoftwareOne employee is trained in, and committed to working according to, these standards.

The lifecycle of a data warehouse project

As mentioned at the beginning, a Modern Data Warehouse project is a continuous and long-term engagement. The DDF works according to a similar principle: it has its own lifecycle and evolves in line with MDW engagements. During the MDW lifecycle, new challenges arise, e.g. new systems are integrated and new solutions are used. For these, we need to define new standards, rules, and naming conventions, and unify them with the existing framework. It is not possible to define everything in advance, but strong fundamentals provide huge benefits during project development. Below are the roles and processes which govern the framework and, consequently, data projects.

Roles

The Data Domain Framework consists of three main areas, which are:

  • DataOps – Modern Data Warehouse and Big Data
  • MLOps – Machine Learning and Artificial Intelligence
  • VAOps – Visualisation and analytics layer.

The person responsible for each of the listed areas is the Area Administrator. They lead the area, create its vision, and plan the standards to be developed and introduced in the short and long term. There are also additional roles, such as:

  • DDF Global Administrator – responsible for general maintenance and development of DDF, not related to any specific DDF area
  • Area SME (Subject Matter Expert) – a dedicated person(s) responsible for a specific topic (e.g. Database Project standards), cooperating closely with Area Administrators.

Process

A process developed six months ago might be obsolete or need updates. To cope with this dynamically changing situation, the DDF includes a definition of the lifecycle process. This way, we can introduce changes or new features to the framework in an organised and controlled manner. Depending on the situation, a change might be small and introduced within 30 minutes; in more complex cases, or when building something entirely from scratch, it might take several iterations. During dedicated meetings (elaboration workshops), new standards are designed or existing ones updated. The crucial element is to involve the right people. Besides the Area Administrator, people with experience in the specific area should be engaged in the brainstorming sessions; it is always good to look at a topic from multiple perspectives. Once the first version is ready or an update is finished, the new standard is added to our CoE repository and made available within the organisation. All subsequent projects are run according to the new standard; for ongoing projects, the decision stays with the Project Owners. This is not the end of the story. The next step is the propagation of the new knowledge within the organisation.

HOW TO ADOPT THE DATA DOMAIN FRAMEWORK WITHIN THE ORGANISATION?

To guarantee successful DDF adoption within an organisation, several steps and actions were taken. The whole process is strictly related to our organisational structure. We put a lot of effort into building awareness of the framework; in our opinion, this is the first step to successful adoption.

The DDF is not a fixed set of rules; it adapts on a daily basis. It can be extended, changed, or aligned to project needs. However, any changes or extensions should be applied according to the process described above. Frameworks like the SoftwareOne Centre of Excellence or the DDF allow us to build a unique, agile, and fast-acting team. If you want to build strong competence within a Data Domain, adopting this approach is a necessary step. The business world is continuously changing, and fast adoption is essential to stay ahead of the competition. The DDF lets our team communicate and work in a clear, unified way, which makes this much easier to accomplish.

HOW TO ADOPT THE DATA DOMAIN FRAMEWORK WITHIN THE PROJECT?

Organisational measures are a good start, but it is the project where the framework has the greatest impact. It is not possible to monitor everything manually during a project, especially on large engagements where a large amount of code is produced every day and someone would have to verify a long list of rules and requirements; that would be a tedious, time-consuming, and error-prone task. Therefore, to improve efficiency, we introduced dedicated automation processes that verify whether the standards are applied correctly. The automation runs when code is submitted in a pull request to the repository, and appropriate policies monitor whether any rules are broken. In case of a policy breach, the code is automatically rejected.
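As an illustration, the sketch below shows what such an automated check could look like for Azure Data Factory pipeline names, assuming it runs as a build-validation step on pull requests. The folder layout, the `PL_<Area>_<Action>` pattern, and the file paths are hypothetical; the actual DDF policy set is much broader than this single rule.

```python
# A minimal, hypothetical standards check: scans exported ADF pipeline JSON
# files and fails the build if any pipeline name breaks the naming convention.
import json
import re
import sys
from pathlib import Path

# Assumed convention: pipeline names look like "PL_<Area>_<Action>", e.g. "PL_Sales_LoadStaging".
PIPELINE_NAME_PATTERN = re.compile(r"^PL_[A-Za-z0-9]+_[A-Za-z0-9]+$")


def check_pipeline_names(adf_root: Path) -> list[str]:
    """Return one violation message for every pipeline that breaks the convention."""
    violations = []
    for pipeline_file in sorted(adf_root.glob("pipeline/*.json")):
        name = json.loads(pipeline_file.read_text())["name"]
        if not PIPELINE_NAME_PATTERN.match(name):
            violations.append(f"{pipeline_file.name}: '{name}' does not match PL_<Area>_<Action>")
    return violations


if __name__ == "__main__":
    problems = check_pipeline_names(Path("adf"))
    for problem in problems:
        print(problem)
    # A non-zero exit code fails the validation build, which in turn blocks the pull request.
    sys.exit(1 if problems else 0)
```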

TESTING FRAMEWORK AS PART OF DDF

A Testing Framework allows you to define your test code, which will be triggered by an Azure Data Factory pipeline, and log its results in a persistent SQL table. All core functionalities are covered by the framework, but it is perfectly possible to build custom logic on top of it to automate data quality assessments even further. The original implementation was included in the data reloading ADF pipelines during one of our data warehouse projects. As a result, after each deployment, all relevant data quality tests were run fully automatically and provided easy-to-interpret results that pointed directly to possible issues and errors in the updated ETL logic.
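For illustration, here is a minimal sketch of what a notebook-style test could look like, assuming a Databricks/PySpark environment and a hypothetical results table named `dq.test_results`; the framework's actual logging schema and helper objects are not shown.

```python
# A minimal, hypothetical notebook test: table and column names are examples,
# not the Testing Framework's real schema.
from datetime import datetime, timezone

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()


def run_row_count_test(table_name: str, min_rows: int) -> bool:
    """The test passes only if the table holds at least `min_rows` rows."""
    return spark.table(table_name).count() >= min_rows


test_name = "fact_sales_not_empty"
passed = run_row_count_test("dw.fact_sales", min_rows=1)

# Persist the outcome so the ADF pipeline and the reporting layer can pick it up.
result = Row(
    test_name=test_name,
    passed=passed,
    executed_at=datetime.now(timezone.utc).isoformat(),
)
spark.createDataFrame([result]).write.mode("append").saveAsTable("dq.test_results")

# Surface the result to the calling Azure Data Factory pipeline.
# `dbutils` is provided by the Databricks notebook environment.
dbutils.notebook.exit("SUCCESS" if passed else "FAILURE")
```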

What else can a Testing Framework be useful for?

Assuming that each developer is responsible for providing tests that verify the quality of their work, there is no need to manually maintain a library of up-to-date testing scripts; the only requirement is to ensure the proper configuration of the framework's workflow. Another core assumption is that a Test consists of a finite number of steps, and the Test itself passes only when they all finish without errors. Consider this when writing your logic, to make sure that the correct result is passed outside of your Stored Procedure or Notebook. Tests also allow the use of Parameters, which in essence are JSON files containing key/value pairs that allow generic code to be used in different contexts. For example, you can write a query that accepts a dynamic list of tables to calculate row counts. That list can be provided for each execution within a JSON, thus allowing you to maintain a single Stored Procedure that can run your data quality checks against an endless number of table combinations. Of course, this example is simplified on purpose; each business case you try to support may require a JSON parameter value in a different shape and form.
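A heavily simplified sketch of that idea follows; the parameter shape, table names, and the way the queries are executed are all illustrative assumptions.

```python
# Hypothetical JSON parameter: a generic row-count check receives the list of
# tables at execution time instead of hard-coding them in the test logic.
import json

parameter_json = """
{
    "tables": ["dw.dim_customer", "dw.dim_product", "dw.fact_sales"],
    "min_rows": 1
}
"""

params = json.loads(parameter_json)


def build_row_count_queries(tables: list[str]) -> list[str]:
    """Generate one row-count query per table from the generic parameter."""
    return [
        f"SELECT '{table}' AS table_name, COUNT(*) AS row_count FROM {table}"
        for table in tables
    ]


for query in build_row_count_queries(params["tables"]):
    # In practice the query would be executed against the warehouse and the
    # result compared with params["min_rows"] before being logged.
    print(query)
```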

How to check the results of the Tests?

The current version of the framework comes with a built-in Power BI report that displays an easy-to-digest summary of all tests run across all your environments. Example metrics include:

  • A summary of your framework's implementation: the total number of defined tests, the average execution time, the count of logged executions, and the percentage of tests that passed their success criteria
  • The split of test outcomes across environments and dates, and a summarised view per test object (stored procedure, Databricks notebook)
  • A simplified version of the error log with descriptive information on what failed and why that happened. This should point you to the issues at hand.

ADOPTION MEASUREMENT AS PART OF DDF

Adoption Measurement helps you understand how your organisation utilises the available data models and dashboards. It gives you better insight into which reports are most popular, what the peak usage time is, which analytical tools are used most, and more. On the other hand, you can also see which reports are not used and perhaps should no longer be maintained. Adoption Measurement is built on several Azure services. The solution collects user activity logs from all indicated Azure Analysis Services instances and the Power BI service, and the entire data history is stored and organised within a dedicated database. On top of this data, a dedicated Azure Analysis Services model exposes the most important measures and attributes, so power users can perform more sophisticated analysis. The Adoption Measurement solution also includes a dedicated dashboard that provides comprehensive insight into user behaviour from three perspectives: Azure Analysis Services, Power BI Service Activities, and Power BI Service Users.
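As an example of the kind of collection involved, here is a minimal sketch that pulls one day of Power BI activity events, assuming the Power BI admin REST API activity events endpoint and a pre-acquired Azure AD access token with the required admin permissions; authentication and loading into the dedicated database are left out.

```python
# A minimal, hypothetical collector for Power BI activity events.
import requests


def fetch_activity_events(token: str, day: str) -> list[dict]:
    """Download all activity events for a single UTC day, following continuation links."""
    url = (
        "https://api.powerbi.com/v1.0/myorg/admin/activityevents"
        f"?startDateTime='{day}T00:00:00'&endDateTime='{day}T23:59:59'"
    )
    headers = {"Authorization": f"Bearer {token}"}
    events = []
    while url:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        payload = response.json()
        events.extend(payload.get("activityEventEntities", []))
        url = payload.get("continuationUri")  # None once the last page is reached
    return events

# The collected events would then be landed in the dedicated database that
# feeds the Adoption Measurement model and dashboard described above.
```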

Summary

This, in essence, is the Data Domain Framework. Adopting a standard like this at the outset of your Data Warehouse modernisation project will help you maintain a unified standard throughout the engagement and avoid incompatibility or inconsistency issues further down the line. Check out the next article, where we describe step by step how to organise your project, from setting up your team to preparing your tools and designing your infrastructure.

See other articles in our Modern Data Warehouse series:

  1. Introduction to Modern Data Warehouse
  2. How to design a Modern Data Warehouse architecture
  3. How to use the cloud to deliver actionable business insights
  4. How to keep development under control
  5. How to incorporate Master Data Management

Author

SoftwareOne Blog Editorial Team

We analyse the latest IT trends and industry-relevant innovations to keep you up-to-date with the latest technology.