Introduction
Time is money, especially today, when seconds can sometimes make the difference between failure and success. Improving data refresh time in Power BI can save those seconds and, over time, make a huge difference.
Our client has finance reports built in Power BI, with data coming from various sources:
• D365 connector for Power BI
• Data Lake export from D365
• Excel files
With every new report, more data was loaded and the entity relationships became more complex. This caused report refresh performance to degrade.
Proof of concept
Suggested approach
To consolidate all data in one source and to improve performance, the Pontis team suggested moving the data to an SQL database and connecting it to Power BI. Here are some advantages and disadvantages of this approach:
Pros
• Faster refresh time
• Better Power BI performance
• Possibility of enabling hourly refresh during client working hours
• Faster reports development process
Cons
• Increased complexity of the architecture
• Increased costs for Azure resources
Agreed approach
After consulting with our client, we decided to try a proof of concept using MS SQL but without Azure Synapse, as Synapse would increase Azure costs. Moving data from Data Lake to MS SQL will be done using scripts and SSIS (SQL Server Integration Services) packages.
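For illustration, here is a minimal Python sketch of what the copy step of such a script could look like, assuming the D365 export lands as CSV files in an Azure Data Lake Storage Gen2 container; the account, container, and file names below are placeholders, not the client's actual configuration.

# Minimal sketch: copy one exported table CSV from Data Lake to the local file system.
# Account, container, and path names are placeholders.
from azure.storage.filedatalake import DataLakeServiceClient

ACCOUNT_URL = "https://<storage-account>.dfs.core.windows.net"  # placeholder storage account
FILE_SYSTEM = "d365-export"                                     # placeholder container name
SOURCE_PATH = "Tables/GeneralJournal.csv"                       # placeholder table export

def download_table_csv(credential, local_path):
    """Download one exported table CSV from Data Lake to the given local path."""
    service = DataLakeServiceClient(account_url=ACCOUNT_URL, credential=credential)
    file_client = service.get_file_system_client(FILE_SYSTEM).get_file_client(SOURCE_PATH)
    with open(local_path, "wb") as target:
        target.write(file_client.download_file().readall())

A scheduled job (for example, a SQL Server Agent job or a Windows Task Scheduler task) could run a script like this periodically, after which the SSIS packages pick the files up from the server's file system.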
POC implementation
For the POC, the Profit and Loss report was used. Two of the largest tables, with 92,000 and 31,000 records, were imported from Data Lake to MS SQL.
Loading the General Journal tables from Data Lake takes 7 minutes. When these tables are related to the other tables required for the reports, this time increases significantly, and a refresh sometimes takes 30-45 minutes.
Loading the same General Journal tables from MS SQL, on the other hand, takes less than one second. When these tables are related to the other tables required for the reports, the loading time also increases, but it stays under 10 seconds.
Solution 1: This architecture proposal involves transferring data from Data Lake to the MS SQL server. The source data in Data Lake consists of CSV files, one per table. These CSV files are periodically copied to the MS SQL server file system, and the SSIS packages already developed for the POC import them into MS SQL.
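As a rough illustration of what the import step does, the following Python sketch loads one table's CSV into MS SQL. It is a simplified stand-in for the SSIS package, and the connection string, table, and column names are illustrative only.

# Simplified stand-in for the SSIS import: load one exported CSV into its staging table.
import csv
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=FinanceReporting;Trusted_Connection=yes"
)  # placeholder connection details

def load_csv(csv_path):
    """Insert the rows of one exported CSV file into a staging table."""
    conn = pyodbc.connect(CONN_STR)
    cursor = conn.cursor()
    cursor.fast_executemany = True  # batch the parameterised inserts for speed
    with open(csv_path, newline="", encoding="utf-8") as source:
        reader = csv.reader(source)
        next(reader)  # skip the header row
        cursor.executemany(
            "INSERT INTO dbo.GeneralJournal (JournalNumber, AccountId, Amount) "
            "VALUES (?, ?, ?)",
            [tuple(row) for row in reader],
        )
    conn.commit()
    conn.close()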
Solution 2: This is the complete architecture proposal for transferring all data from D365 to Data Lake. Currently, 7 tables are still used directly from Dynamics 365 through a legacy connector, which is the slowest part of the reporting system. For each of these D365 tables, a corresponding Data Lake table should be identified; if one does not exist, the table should be included in the export to Data Lake. The matching fields in these tables should then be identified and all of the data consolidated in MS SQL. This would complete the system and speed it up.
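One simple way to support the field-matching step is to compare column names once both versions of a table have been staged in MS SQL. The sketch below does this through INFORMATION_SCHEMA; the table names are hypothetical.

# Sketch of the field-matching step: list the columns shared by two staging tables.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=FinanceReporting;Trusted_Connection=yes"
)  # placeholder connection details

def common_columns(table_a, table_b):
    """Return the column names that two staging tables have in common."""
    conn = pyodbc.connect(CONN_STR)
    cursor = conn.cursor()
    query = "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = ?"
    cursor.execute(query, table_a)
    columns_a = {row.COLUMN_NAME for row in cursor.fetchall()}
    cursor.execute(query, table_b)
    columns_b = {row.COLUMN_NAME for row in cursor.fetchall()}
    conn.close()
    return columns_a & columns_b

# Example: compare a legacy-connector table with its Data Lake counterpart (names hypothetical).
# print(common_columns("D365_LedgerJournalTrans", "DataLake_LedgerJournalTrans"))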
Conclusion
In day-to-day report development, data refresh is performed at least 10 times a day, because each time a report is opened and changes are saved, the data needs to be refreshed. Based on a rough estimate, in the current system data refresh alone takes up half of the working time. While the data is being refreshed, no other tasks can be done, which makes such a development environment very expensive and inefficient.
By transferring the data to MS SQL, the Power BI data refresh time becomes negligible! This makes the environment cheaper, more efficient, and better suited to developers’ work.