Data Engineering
The Challenge
With the digitization of mobility, the car has become an information and communication platform. In this paradigm shift from driver to autopilot, with the goal of fully automated and networked driving, new challenges arise in data processing and data transmission. The modern car collects information on location, speed, other road users and traffic signs, among other things. Processing this high volume of data and feeding analysis results back to the car requires a near-real-time system. The project therefore set out to evaluate the technical solution space for such an analysis system and to implement it. The goal was to identify the most powerful stream processor and to design a virtualizable environment that simplifies the deployment of stream processors and optimizes their resource utilization in the Hadoop cluster.
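Kafka appears in the technology stack below as the messaging layer; as a minimal sketch of such a near-real-time data path, the following Python example (using kafka-python) publishes a vehicle telemetry event to a hypothetical topic. Broker address, topic and field names are illustrative assumptions, not the customer's actual schema.

```python
# Minimal sketch: publishing one vehicle telemetry event to Kafka.
# Topic name, field names and broker address are illustrative assumptions.
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {
    "vehicle_id": "WVW-12345",                      # hypothetical identifier
    "timestamp": time.time(),                       # event time in seconds
    "location": {"lat": 51.34, "lon": 12.37},       # GPS position
    "speed_kmh": 87.5,
    "traffic_sign": "speed_limit_100",
}

# Each vehicle signal becomes one message on a telemetry topic,
# from which the stream processors consume in near real time.
producer.send("vehicle-telemetry", value=event)
producer.flush()
```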
- Use case
- German car manufacturer
- Data processing of vehicle and traffic data
- Aim
- System for comparing and operating streaming routes
- Tasks
- Developing a big data platform
- Virtualizing the platform environment
- Comparing and evaluating the stream processors
- Technology
- Spark
- Samza
- Storm
- Docker
- Kafka
- Hadoop
- HDFS
- YARN
- ZooKeeper
The Solution
To identify the most suitable stream processor, a local environment was set up in a first step, in which the different stream processors can be configured and scaled. The individual process steps of the data processing were implemented as independent, reusable components. The advantage: the stream processors themselves can be used unchanged and thus remain comparable. As a prerequisite for the evaluation, suitable key performance indicators (KPIs) and user stories were defined. The KPIs reflect customer-specific requirements such as latency, throughput, monitoring functionality and the deployment of streaming tasks. The user stories additionally cover the qualitative aspects ("soft skills") of the stream processors. The subsequent evaluation revealed weak points and optimization opportunities in the data processing with the various stream processors and their individual components. In the next step, the resulting proof of concept was established on the customer's own Hadoop cluster. TIQ Solutions implemented this on the basis of a virtualized, container-based environment and an easily scalable big data architecture.
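To illustrate how the process steps can stay independent of the stream processor engine, the sketch below (Python) expresses the validation logic as a plain per-record function and shows one possible runner as a simple Kafka consumer/producer loop that also logs per-record latency as a KPI. Topic names, the record schema and the latency metric are assumptions for illustration, not the project's actual components.

```python
# Sketch of one processing step ("validation") kept independent of the
# stream processor: the logic is a plain function over single records,
# so the same code could be wrapped by Spark, Storm or Samza jobs.
import json
import time

from kafka import KafkaConsumer, KafkaProducer

REQUIRED_FIELDS = {"vehicle_id", "timestamp", "location", "speed_kmh"}


def validate(record: dict) -> bool:
    """Engine-agnostic validation rule applied to every telemetry record."""
    return REQUIRED_FIELDS.issubset(record) and record["speed_kmh"] >= 0


def run_step(in_topic: str = "vehicle-telemetry",
             out_topic: str = "telemetry-validated") -> None:
    """One possible runner: a plain Kafka consumer/producer loop."""
    consumer = KafkaConsumer(
        in_topic,
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for message in consumer:
        start = time.perf_counter()
        record = message.value
        if validate(record):
            producer.send(out_topic, value=record)
        # Per-record processing latency, one example of a KPI that can be
        # compared across the stream processors.
        print(f"latency_ms={(time.perf_counter() - start) * 1000:.3f}")


if __name__ == "__main__":
    run_step()
```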
The Result
A big data platform was provided for testing and analyzing stream processors. After completion of the evaluation, the customer received a concrete recommendation for the most suitable stream processor and the optimal configuration of the streaming data processing path. Our customer can now independently create and test data processing routes between its vehicles and the analysis system, as required by the application. In particular, the process steps for validation, transformation and process flow (dispatching) can be configured and scaled. Because the individual components are virtualized with Docker, the user can easily create new test routes or copy and modify existing ones.
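As an illustration of how containerization simplifies creating or copying a route, the sketch below uses the Docker SDK for Python to start one container per process step, wired together by Kafka topics. Image names, topics and environment variables are hypothetical; on the customer's Hadoop cluster the containers run within the existing big data architecture rather than being launched locally like this.

```python
# Sketch: composing a "streaming route" from containerized process steps.
# Image names, topic names and environment variables are assumptions.
import docker

# A route is an ordered list of steps wired together by Kafka topics;
# copying and modifying a route means changing only this configuration.
ROUTE = [
    {"image": "example/validation:latest",     "in": "vehicle-telemetry",     "out": "telemetry-validated"},
    {"image": "example/transformation:latest", "in": "telemetry-validated",   "out": "telemetry-transformed"},
    {"image": "example/dispatching:latest",    "in": "telemetry-transformed", "out": "telemetry-out"},
]

client = docker.from_env()

for step in ROUTE:
    # Each step receives its input/output topics via environment variables,
    # so the same container image can be reused in different routes.
    client.containers.run(
        step["image"],
        detach=True,
        environment={
            "IN_TOPIC": step["in"],
            "OUT_TOPIC": step["out"],
            "KAFKA_BROKERS": "localhost:9092",
        },
    )
```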
- Customer benefits
- Independent creation and optimization of streaming routes
- Easy provisioning of streaming components via containers
- Scalability through big data architecture
- Optimal use of cluster resources
- Transparent evaluation of streaming efficiency