Artificial intelligence is a trending topic that more and more companies want to put to use. Intelligent, automated data evaluation is of particular interest. However, successful use of machine learning requires comprehensive data sets, because the AI model is trained on them over many iterations until it ultimately delivers reliable results.
But what must the IT architecture behind it look like? After all, it has to process sometimes vast amounts of data and scale quickly. This is anything but trivial, and a conventional architecture is no longer sufficient. Instead, innovative data platforms are needed for this new type of digital application. Below we present an overview of the structure of such an architecture, which we developed in a customer project using the Google Cloud stack.
Challenges In The Introduction And Application Of AI-Supported Data Analysis
The first challenge is scaling the IT infrastructure to match the amount of data: volumes are expected to grow roughly fivefold over the next three to four years. An IT infrastructure that is to house an AI solution for data analysis must therefore be designed for growth from the outset. The growing share of continuous data streams (up to 25 percent overall) makes stream processing preferable to batch processing, which often entails a change of technology.
To keep up, companies have to set a new course, both in their IT architecture and across the entire organization. To benefit sustainably from data analyses of business processes, it is not enough to examine the data pools of isolated silos. Instead, the organization must adopt a “data culture,” connect previous silos, and feed data from all areas of the company to the AI.
A large part of the data that will flow into analysis processes in the future is unstructured – for example, images, video and audio files, or continuous text. It makes sense to store and process this data in non-relational (NoSQL) databases such as MongoDB or CouchDB. However, structured data in SQL databases will by no means lose its relevance in the medium term. The unstructured data must therefore be combined and merged with structured data, which represents an additional challenge.
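To make the merging challenge concrete, here is a minimal sketch in Python: sqlite3 stands in for the relational side, and plain dictionaries stand in for a document store such as MongoDB. All table names, fields, and sample records are invented for illustration.

```python
import sqlite3

# Structured data: a relational table, as it might live in a SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, segment TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Acme GmbH", "enterprise"), (2, "Beta AG", "smb")])

# Unstructured data: free-text documents, as a document store would hold them.
# Plain dicts stand in here for the NoSQL collection.
tickets = [
    {"customer_id": 1, "text": "Login fails after the last update."},
    {"customer_id": 1, "text": "Export to CSV is very slow."},
    {"customer_id": 2, "text": "Invoice PDF is missing a line item."},
]

def merge_records(conn, docs):
    """Attach each document to its structured customer record."""
    merged = []
    for doc in docs:
        row = conn.execute("SELECT name, segment FROM customers WHERE id = ?",
                           (doc["customer_id"],)).fetchone()
        merged.append({"name": row[0], "segment": row[1], "text": doc["text"]})
    return merged

combined = merge_records(conn, tickets)
```

In a real platform this join would happen in the EDW rather than in application code, but the principle is the same: the unstructured documents only become analyzable once they carry the structured context.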
In addition to all these challenges, know-how and human resources in AI/ML are a bottleneck. The organization and infrastructure must generate as much output as possible from as few input hours as possible. This works best with a central Enterprise Data Warehouse (EDW), whose structure is shown in the next section. For the customer project mentioned, an EDW was introduced using this methodology.
A Central Enterprise Data Warehouse Accelerates Technological Change
To move successfully from a silo infrastructure to an EDW, the following five-step approach has proven effective:
- Migration of the existing data lake or data warehouse to the cloud: A cost estimate for various architecture models for the EDW was prepared before the project. It concluded that migrating to the cloud could reduce the total cost of ownership (TCO) of a data warehouse by more than half compared to the on-premises option. From an economic point of view, it is also attractive that no capital investments are necessary; the cloud incurs only operating and minor administration costs. Predefined migration scripts help make the transition easy – in our example project, from an on-premises Teradata solution to Google BigQuery.
- Breaking down the silo structure, exposing the analytics capabilities, and building a data culture across the organization: Companies generate data in various silos and channels. The fragmentation of the silo landscape keeps increasing in the course of digitization because each department uses its own software. This software is often obtained via a software-as-a-service model, so the data has to be transferred from the provider’s databases to the company’s systems via interfaces. The data from the silos must first be centralized in the EDW and then made available to all of the company’s stakeholders in a decentralized manner. To enable AI- and data-supported business decisions at all levels, employees throughout the company also need the appropriate access. In the central platform, all processes are bundled and examined holistically.
- Introduction of context-related decision-making in real time: Two factors are decisive for a profitable business decision: execution time (latency) on the one hand and data context on the other. Above all, spatial data – for example, where a request comes from – is essential for understanding the analyzed events. Combining geographic information systems (GIS) with AI was a central goal in our implementation example with BigQuery. The advantage of this approach is that data can be streamed into BigQuery, and onward into a SQL database, in real time. AI analyses are possible during the streaming process itself.
- Deciding between in-house development and a ready-made solution: As with almost all software, an AI solution can either be developed in-house – for example, based on open-source frameworks – or purchased ready-made on the market. However, buying pre-trained AI models makes little sense because they usually do not cover the desired use case. All offers should be scrutinized to ensure they meet the required performance criteria. In principle, integrated solutions can save a lot of the time and effort that would otherwise go into developing interfaces between different services.
- Unleashing data-driven innovation by deploying an appropriate AI solution: Finally, the AI platform extracts valuable insights from the data. It makes sense to distinguish three types of AI solution. “Out of the box” AI is well suited to optimizing data-related business processes, for instance in a Customer Interaction Center (CIC); however, these are standard solutions that do not offer significant competitive advantages. The second type is an AI model assembled from ready-made building blocks. Although not yet fully individual, it usually fits the task of generating insights from the company’s data. The third and most demanding type is the individual AI model, trained from scratch on the company’s own data sets. This takes a lot of time and effort, but the resulting procedure is unique and can open up a noticeable competitive advantage. Dividing AI into these three types makes it possible to distribute scarce human resources sensibly.
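For the real-time context step, BigQuery offers GIS functions such as ST_GEOGPOINT and ST_DWITHIN that add spatial context server-side while data is streamed in. The local Python sketch below illustrates only the idea: incoming events are enriched with a distance-based flag as they arrive. The region of interest, radius, and event fields are assumptions for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical region of interest (central Berlin) and radius.
CENTER = (52.52, 13.405)
RADIUS_KM = 50.0

def enrich(event):
    """Add spatial context to an incoming event as it streams in."""
    dist = haversine_km(event["lat"], event["lon"], *CENTER)
    event["near_center"] = dist <= RADIUS_KM
    return event

# Simulated stream of requests with their origin coordinates.
stream = [
    {"id": 1, "lat": 52.50, "lon": 13.40},  # Berlin -> inside the radius
    {"id": 2, "lat": 48.14, "lon": 11.58},  # Munich -> outside the radius
]
enriched = [enrich(e) for e in stream]
```

In the BigQuery setup itself, the equivalent check would be expressed in SQL over the streamed rows rather than in application code.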
Once all five steps have been completed, the user company has a powerful solution for gaining decision-relevant knowledge from all its data streams.
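Part of the migration in step 1 is translating Teradata column types into BigQuery types. The sketch below shows a deliberately reduced mapping in Python; the type table and column names are illustrative only, and the predefined migration scripts mentioned above cover far more types and edge cases.

```python
# Simplified Teradata -> BigQuery type mapping, assumed for illustration.
TYPE_MAP = {
    "INTEGER": "INT64",
    "BYTEINT": "INT64",
    "DECIMAL": "NUMERIC",
    "FLOAT": "FLOAT64",
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}

def to_bigquery_schema(teradata_columns):
    """Translate (name, teradata_type) pairs into a BigQuery schema list.

    Unknown types fall back to STRING, a conservative default for a sketch.
    """
    return [{"name": name, "type": TYPE_MAP.get(ttype.upper(), "STRING")}
            for name, ttype in teradata_columns]

schema = to_bigquery_schema([("order_id", "INTEGER"),
                             ("amount", "DECIMAL"),
                             ("ordered_at", "TIMESTAMP")])
```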
Legacy Systems, Data Quality, And Access Are Common Obstacles
A few obstacles usually need to be cleared on the way to the EDW. First, there are legacy systems, which are relatively expensive to modernize and maintain. This limits scalability so that the infrastructure cannot withstand the rapid growth of data. Therefore, the question must be asked: Are the existing systems even able to support AI and ML solutions? Is the effort involved in running and “tuning” them reasonable given the insights they end up generating?
Obstacles must be overcome not only in the infrastructure but also in the data collection process. Excessively restrictive data protection and security regulations can significantly limit the necessary consolidation of data streams. In addition, the data sources are often not designed to store data permanently or to feed in current data continuously. Yet AI insights are only as good and as extensive as the underlying data. Data quality is therefore the fundamental success factor for any AI strategy.
Building A Scalable Data Platform With AI
Our practical example of a data platform that enables AI analysis functions is based on Google Cloud. However, it could also be built on a comparable provider’s cloud stack, such as Amazon Web Services (AWS) or Microsoft Azure.
The platform is orchestrated according to Continuous Integration / Continuous Delivery (CI/CD) principles. This overcomes previous integration problems, so the developers involved can seamlessly integrate their code into the existing codebase. Automation comes into play in almost all phases of application development. The following diagram shows what this can look like in practice:
Such a CI/CD pipeline creates a continuous stream of data that leads to insights for the relevant decisions. The solution can react to changes in near real time and take feedback loops into account. For example, this makes it possible to implement “early warning systems” that enable decisive action in the event of rapid changes.
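One simple way such an early-warning loop can be sketched, assuming a numeric metric arriving as a stream: compare each new value against a moving-average baseline and raise an alert when it exceeds the baseline by a configurable factor. Window size, factor, and the sample values below are illustrative, not taken from the project.

```python
from collections import deque

class EarlyWarning:
    """Flag values that deviate sharply from the recent moving average."""

    def __init__(self, window=5, factor=2.0):
        self.values = deque(maxlen=window)  # rolling window of recent values
        self.factor = factor                # allowed deviation from baseline

    def observe(self, value):
        """Return True if the new value triggers an alert, then record it."""
        baseline = sum(self.values) / len(self.values) if self.values else value
        alert = len(self.values) > 0 and value > self.factor * baseline
        self.values.append(value)
        return alert

detector = EarlyWarning(window=3, factor=2.0)
signals = [detector.observe(v) for v in [10, 11, 9, 30, 10]]  # spike at 30
```

In production, such a check would sit inside the streaming pipeline and feed an alerting channel rather than a Python list.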
Finally, it should be mentioned that business analytics is not a purely technical task, and AI/ML models do not produce results “by themselves.” Contextualizing analysis results and understanding them as a basis for decision-making still rests with people – more precisely, with management.
Nevertheless, companies that invest in the appropriate infrastructure today will be able to put the insights from AI analysis to use sooner. Over time, their competitive advantage over competitors who cannot or do not want to unlock the treasure trove of data in their company will continue to grow.