Data Engineering
The Challenge
Digitization is well advanced in communications services. Today, landline telephony, television and internet access are offered from a single source over the same network infrastructure (triple play). Sound, images and data are transmitted as IP data packets. Telekom Deutschland operates a complex, nationwide communications network. As digitization progresses, new technologies and hardware from additional manufacturers are continually introduced into the existing network. To keep pace with this change, Deutsche Telekom uses a software system to analyze the quality and availability of its services and to monitor the utilization of the network infrastructure. Deutsche Telekom also planned to expand the range and sound quality of its HD programs and the functionality of its services. The existing analysis system would no longer have been able to handle the resulting increase in data volume. TIQ Solutions, a long-term consultant for the existing solution, therefore recommended switching to a new big data system.
- Use case
Telekom Deutschland GmbH / T-Systems International GmbH
- Objectives
- Processing of large amounts of data
- More up-to-date, flexible and broader view of the data
- Reduction of operating costs
- Improved analysis possibilities on historical data
- Generic processing workflows
- More efficient data analysis
- Tasks
- Conception of the new system architecture
- Implementation of PoC
- Migration of business logic
- Migration of historical data
- Implementation of the security concept
- Technologies
- Hadoop
- Cloudera
- Hive
- ZooKeeper
- Oozie
- Hue
- HDFS
- YARN
- Parquet
- Custom extensions in Java
- Bash Scripting
- Enterprise Architect
- QlikView®
The Solution
The architecture of the new big data system and its technological implementation were developed in multi-day workshops together with the IT department and the specialist department. For the first time, the powerful processing mechanisms of big data technology made it possible to decouple the data model from technical constraints during loading. TIQ Solutions also specified the necessary applications, their access rights and the routing between the nodes in the cluster and the external QlikView server that provides the visualization. Hive replaced the Oracle database. For this purpose, TIQ Solutions developed a generator that creates the various Hive table objects, including their attributes, for the more than 30 heterogeneous data sources, so that the database is generated semi-automatically. Data processing was implemented with the Oozie workflow scheduler, which replaced Informatica. A rights and roles concept based on Kerberos was then designed and implemented for the Hadoop cluster. The historical data was first migrated to the Hadoop cluster. For business users, the familiar access to the raw data via SQL statements was set up through the Hue web frontend. During the migration of the business intelligence application QlikView®, TIQ Solutions primarily advised on optimizing the loading logic.
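The generator itself is not described in detail here. As a rough illustration of the idea, the following Python sketch renders Hive table definitions from declarative source descriptions. Everything in it is an assumption made for illustration: the `SourceSpec` structure, the `hive_ddl` function, the database name, the HDFS base path, and the example tables and columns are hypothetical and do not come from the project. Storing the tables as Parquet is likewise assumed, based only on Parquet appearing in the technology list.

```python
from dataclasses import dataclass, field


@dataclass
class SourceSpec:
    """Hypothetical description of one data source (illustrative only)."""
    table: str
    columns: dict[str, str]  # column name -> Hive type
    partition_by: dict[str, str] = field(default_factory=dict)


def hive_ddl(spec: SourceSpec,
             database: str = "network_quality",  # assumed database name
             base_path: str = "/data/raw") -> str:  # assumed HDFS base path
    """Render a CREATE EXTERNAL TABLE statement for one source."""
    cols = ",\n".join(
        f"  `{name}` {hive_type}" for name, hive_type in spec.columns.items()
    )
    lines = [
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{spec.table}` (",
        cols,
        ")",
    ]
    # Partition columns are declared separately from regular columns in Hive.
    if spec.partition_by:
        parts = ", ".join(f"`{n}` {t}" for n, t in spec.partition_by.items())
        lines.append(f"PARTITIONED BY ({parts})")
    lines.append("STORED AS PARQUET")
    lines.append(f"LOCATION '{base_path}/{spec.table}';")
    return "\n".join(lines)


# Two made-up sources; a real setup would describe each of the 30+ sources.
sources = [
    SourceSpec(
        "iptv_sessions",
        {"session_id": "STRING", "packet_loss": "DOUBLE"},
        {"load_date": "STRING"},
    ),
    SourceSpec("voip_calls", {"call_id": "STRING", "jitter_ms": "DOUBLE"}),
]

ddl_statements = [hive_ddl(s) for s in sources]
print("\n\n".join(ddl_statements))
```

Keeping the source descriptions as data and the DDL rendering in one function means a new data source only needs a new description, which is one plausible reading of "semi-automatic" generation.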
The Result
The existing classic, purely relational database system was successfully migrated to a big data system. With the new capabilities, the aggregations and filters previously used to limit data volumes are no longer needed. This greatly increased the resolution of the data (granularity), the length of the available history and the consistency of the observable periods. Deutsche Telekom now has a more precise picture of the state of its network and of the current quality of the IPTV and VoIP services it offers (Quality of Service). The resulting long-term observations are more comprehensive and help to detect gradual changes in state. Distributed processing in the cluster has made it possible to load more raw data during the day and gives the department a quicker overview of the current transaction data. The Cloudera Hadoop distribution and its integrated applications for data integration and data retrieval provided a consistent system. As a result, the operating costs of the software solution and the hardware were reduced many times over.
- Benefits
- Reduced subscription costs
- Reduced data storage costs
- More informative analyses
- Minimal expansion effort
- Efficiency for growing data volumes