
Modern Data Warehouse - Design for Analytics & Data Security


Welcome to the third part of our data journey. Now that you know what a Modern Data Warehouse is and how to successfully migrate data to the cloud, we can talk about getting insights! We will continue going through a typical Modern Data Warehouse project. If you missed the last part, get up to speed here. Today we will find out how to analyse data, create user-friendly visualisations, and keep data safe for your users. Let's begin!

How to fit large amounts of data into Azure Analysis Services? Challenge 4

Once the dimensions and facts are calculated, data is exposed for analysis using a dedicated service, such as Azure Analysis Services. AAS provides additional analytical features that let users slice and dice data from different contexts, define aggregation measures, play with time intelligence, and look at the data from a hierarchical perspective. In general, analytical databases form another abstraction layer that enhances your data analytics and reporting capabilities.
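To make the idea of this analytical layer a little more concrete, here is a purely illustrative Python sketch. It uses pandas and made-up table and column names; a tabular model such as AAS expresses the same measures, slices and hierarchies declaratively, and at a vastly larger scale.

```python
# Conceptual sketch only: pandas stands in for what an analytical model
# does declaratively (measures, slicing, hierarchies).
import pandas as pd

# Hypothetical fact and dimension tables.
sales = pd.DataFrame({
    "store_id": [1, 1, 2, 2, 3],
    "date": pd.to_datetime(
        ["2023-01-05", "2023-02-10", "2023-01-20", "2023-03-01", "2023-02-15"]
    ),
    "amount": [120.0, 80.0, 200.0, 150.0, 95.0],
})
stores = pd.DataFrame({
    "store_id": [1, 2, 3],
    "district": ["North", "North", "South"],
    "brand": ["A", "B", "A"],
})

# "Slice and dice": join facts to a dimension and aggregate a measure
# (total sales) along the district -> brand hierarchy.
model = sales.merge(stores, on="store_id")
total_by_district_brand = (
    model.groupby(["district", "brand"])["amount"].sum().rename("total_sales")
)

# Simple time intelligence: the same measure aggregated by month.
total_by_month = model.groupby(model["date"].dt.to_period("M"))["amount"].sum()

print(total_by_district_brand)
print(total_by_month)
```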

What are the storage capacities for Azure cloud services?

When it comes to data storage services on the Azure cloud, most offer plenty of space. For example, Azure SQL Database Hyperscale can store up to 100 TB of data, and Azure Synapse Analytics up to 1 PB. In the case of Azure Analysis Services, the largest available service tier holds up to 400 GB. The important caveat is that you should leave some memory free (about 20-25%), otherwise data processing will fail. In practice, the usable memory is therefore around 300 GB.

As a result, we faced yet another challenge: how to fit several terabytes of data into roughly 0.3 TB of space. The answer turned out to be quite simple. By removing a single unnecessary column, we reduced the model size by 70%. The implemented model stores 5 billion records and occupies 200 GB of storage. That might not sound like a tremendous number of records, but without a few adjustments it would not even have fit within 1 TB. Much of the heavy lifting here is done by the VertiPaq engine, which compresses data at a high ratio.

However, that is not the whole story. With very large analytical models, you need to look at the problem from a wider perspective: successfully fitting the data inside the model only gets you halfway there. Good model performance is the next milestone that has to be reached.
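The sizing logic above can be expressed as a quick back-of-the-envelope check. The sketch below uses the tier size and headroom mentioned in this post, while the source size and compression ratio are illustrative assumptions, not AAS guarantees.

```python
# Rough sizing check for an in-memory analytical model.
# Source size and compression ratio are illustrative assumptions.

tier_memory_gb = 400          # largest AAS tier mentioned above
processing_headroom = 0.25    # keep ~20-25% free so processing does not fail
usable_memory_gb = tier_memory_gb * (1 - processing_headroom)   # ~300 GB

source_size_tb = 3.0          # hypothetical size of the source data
compression_ratio = 10        # hypothetical VertiPaq compression (10:1)
estimated_model_gb = source_size_tb * 1000 / compression_ratio

print(f"Usable memory: ~{usable_memory_gb:.0f} GB")
print(f"Estimated model size: ~{estimated_model_gb:.0f} GB")
print("Fits" if estimated_model_gb <= usable_memory_gb
      else "Does not fit: trim columns or lower the granularity")
```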

How to build a fast and responsive dashboard? Challenge 5

Fast and responsive reports are probably the most crucial element of the whole Modern Data Warehouse engagement. They are the main point where users gain business insights from all the data collected. Everything behind the reports and dashboards is hardly visible, yet it probably consumes 90-95% of the time and effort needed to produce and display results. However, it is the remaining 5-10% that determines the success or failure of a project. If your reports contain errors or do not refresh within a few seconds, the project will fail and 100% of your effort will land in the trash. Users will lose interest and go back to the old way of reporting, however old-fashioned it might be.

Model optimisation is key

A large amount of data always leaves a mark on query performance. If the model is not optimised and measure definitions are poor, query performance can degrade exponentially as the data volume increases. A well-designed model, on the other hand, will not suffer significantly even if the data grows from 2 to 4 to 8 billion records.

Once the model is done, it may still need additional improvements to produce a fast and responsive report. Each selected visualisation and its related query might require further analysis and optimisation, or changes at the model level. How many visualisations you have, and how they are organised, also matters. Sometimes it is better to combine multiple visualisations into one, which reduces the number of queries sent to the model with every interaction, as the sketch below illustrates.

In one of our cases, the users needed Power BI dashboards with a live connection to Azure Analysis Services. The reports were used by people at every organisational level, from employees at a particular location all the way up to top management. In the end, they were utilised by several thousand users daily. The expectation was that reports would refresh within 3 seconds of an interaction, such as changing a filter value. Following the tips above, we were able to achieve this result even though the queried model contained 5 billion rows.

Note: A well-optimised and responsive analytical model is one of the required steps to achieve success. The next stage is an eye-catching, well-developed report, which requires two different sets of skills (development and graphic design). The final step is adoption within the organisation, which is a completely different story and a topic for another blog series.
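As a simplified illustration of why combining visualisations helps, assume each visual issues at least one query per user interaction (a simplification; real query counts depend on the report and engine). The figures for visuals, users and interactions below are hypothetical.

```python
# Illustrative arithmetic only: fewer visuals means fewer queries per interaction.
# All numbers below are hypothetical.

visuals_per_page = 12
combined_visuals_per_page = 7        # after merging related visuals into one
interactions_per_user_per_minute = 4
concurrent_users = 500               # hypothetical peak load

def queries_per_minute(visuals: int) -> int:
    # Assumes at least one query per visual per interaction.
    return visuals * interactions_per_user_per_minute * concurrent_users

before = queries_per_minute(visuals_per_page)
after = queries_per_minute(combined_visuals_per_page)
print(f"Before: {before} queries/min, after: {after} queries/min "
      f"({100 * (before - after) / before:.0f}% fewer)")
```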

How to secure data for a large number of users? Challenge 6

It seems that everything is done. The data flow pipelines are stable and process data at set intervals. Insights are available on time and can be consumed from dedicated reports or through the analytical models. But there is one more important thing to do: we need to grant a large number of users access to this information, and access levels must differ depending on each person's role within the organisation.

In one of our cases, the number of daily active users stood at several thousand. The expectation was to grant access to data based on the organisation's hierarchical structure, and each user could have a unique set of requirements. Altogether, there were around 500,000 different security combinations that needed to be applied. More importantly, with over 25,000 employees, the company saw constant personnel changes: some people would leave, others would join, someone would be promoted, and so on. Security measures needed to take these changes into account. They had to be reflected in the security model, and all adjustments had to be made automatically.

How to manage user lifecycles automatically?

Creating dedicated AD groups for this many cases is not a good idea. There are external solutions that could handle and automate them, but we went for a more convenient option. Our solution was to implement row-level security in the analytical model, combined with the organisational structure. Security is driven by the place and position to which a particular employee is assigned, using the central employee database as the data source.

As a result, security requirements are evaluated automatically, based on changes within the organisation. Additional effort went into handling exceptions, where different security rules had to be added manually, yet everything is still managed and driven from one central place. Using this approach, every employee has access to the information they need: an employee at a local branch sees data related to that branch, District Managers see all data from their districts, and Brand Managers see all data related to their brand.
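In the real solution this logic lives as row filters inside the analytical model, but the underlying idea can be sketched in a few lines of Python: resolve a user's scope from the central employee table and filter the facts accordingly. The table layout, column names and the three example roles below are hypothetical simplifications.

```python
# Minimal sketch of hierarchy-driven row-level security over a
# simplified employee table. Names and roles are hypothetical.
import pandas as pd

employees = pd.DataFrame({
    "user": ["branch.worker@corp", "district.mgr@corp", "brand.mgr@corp"],
    "role": ["Branch", "District", "Brand"],
    "branch": ["B-001", None, None],
    "district": ["North", "North", None],
    "brand": [None, None, "A"],
})

sales = pd.DataFrame({
    "branch": ["B-001", "B-002", "B-003"],
    "district": ["North", "North", "South"],
    "brand": ["A", "B", "A"],
    "amount": [100, 250, 175],
})

def visible_rows(user: str) -> pd.DataFrame:
    """Return only the sales rows the given user is allowed to see."""
    emp = employees.loc[employees["user"] == user].iloc[0]
    if emp["role"] == "Branch":
        return sales[sales["branch"] == emp["branch"]]
    if emp["role"] == "District":
        return sales[sales["district"] == emp["district"]]
    if emp["role"] == "Brand":
        return sales[sales["brand"] == emp["brand"]]
    return sales.iloc[0:0]  # unknown role: see nothing

print(visible_rows("district.mgr@corp"))
```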

Don't miss part 4!

And that’s a wrap on the first stage of our journey! Now you know what a Data Warehouse is, why it’s important to modernise it, and how the Azure cloud can help. We’ve also gone through the MVP phase of a Data Warehouse modernisation project and how to tackle the most common challenges. Click here for part 4, where we go over the Data Domain Framework and show you how to make managing data projects much easier.


Author

SoftwareOne blog editorial team

Blog Editorial Team

We analyse the latest IT trends and industry-relevant innovations to keep you up to date with new technology.