Get out of the ETL data-filtering trap with the use of Big Data/Fast data solutions and AnalytiX DS

Why is Big Data/Fast data interesting for regular companies? The simple answer is that the Big Data ecosystem provides new opportunities at lower cost, meaning that companies can now offer solutions and services that simply were not feasible before because the costs were too high.

Many would argue that it is also because the technology did not exist. However, most large corporations could have achieved the same results with conventional technology; it would just have been more complex and, except for the likes of Google or Facebook, far too expensive.

Many of the open source software packages in the Big Data space exist simply because traditional software solutions did not scale and were too expensive to be a viable option, in terms of both software licences and hardware. So how can a large corporation such as a bank, retailer, telco operator or utility benefit from these Big Data software concepts?

One example is getting out of the “ETL data-filtering trap”. ETL tools extract data from source systems, transform it and load it into a target system, in many cases a data warehouse. The problem is that only what is extracted reaches the data warehouse, so it must be known beforehand which data is relevant and should be extracted. The business analyst must know the source systems in depth, and must know what data he or she needs, before asking the ETL developer to provide it. Many opportunities are lost simply because you cannot make discoveries outside the boundaries of the data that is available to you.
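To make the trap concrete, here is a minimal sketch of an extract-and-load step in Python. The record fields, the `REQUESTED_COLUMNS` list and the helper functions are all hypothetical, chosen only for illustration: the point is that a column which was not requested up front never reaches the warehouse, so any later question about it cannot be answered there.

```python
# Hypothetical source-system records: each transaction carries more
# fields than the warehouse request asked for.
SOURCE_ROWS = [
    {"txn_id": 1, "customer": "A", "amount": 120.0, "channel": "web", "device": "mobile"},
    {"txn_id": 2, "customer": "B", "amount": 75.5, "channel": "branch", "device": None},
    {"txn_id": 3, "customer": "A", "amount": 42.0, "channel": "web", "device": "desktop"},
]

# Columns the business analyst requested before the ETL job was built
# (an assumption for this example).
REQUESTED_COLUMNS = ["txn_id", "customer", "amount"]


def extract(rows, columns):
    """Extract step: keep only the predefined columns from each source row."""
    return [{col: row[col] for col in columns} for row in rows]


def load(warehouse, rows):
    """Load step: append the extracted rows to an in-memory warehouse table."""
    warehouse.extend(rows)
    return warehouse


def can_answer(warehouse, field):
    """A question about `field` is answerable only if the field reached the warehouse."""
    return bool(warehouse) and all(field in row for row in warehouse)


warehouse = load([], extract(SOURCE_ROWS, REQUESTED_COLUMNS))
print(can_answer(warehouse, "amount"))   # True
print(can_answer(warehouse, "channel"))  # False: filtered out at extraction
```

The sales channel was in the source all along, but because nobody asked for it, an analyst working from the warehouse cannot discover, for example, that customer A only buys online.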

ETL processes typically run at fixed intervals, e.g. once a day or once a week. The data can therefore be analysed, but the company can only act upon it in retrospect.
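The cost of acting "in retrospect" can be put in numbers. The sketch below assumes a simple schedule with runs at hours 0, T, 2T and so on, and ignores the run time of the ETL job itself; `batch_delay_hours` is a hypothetical helper, not part of any ETL tool. For events spread evenly over time, the average wait works out to half the interval, i.e. 12 hours for a daily job.

```python
import math


def batch_delay_hours(event_hour: float, interval_hours: float) -> float:
    """Hours an event waits before the next scheduled ETL run picks it up.

    Assumes runs happen at t = 0, interval, 2*interval, ... and that an
    event occurring exactly at a run time is picked up by that run.
    """
    next_run = math.ceil(event_hour / interval_hours) * interval_hours
    return next_run - event_hour


# Daily batch (every 24 h): an event at 01:00 waits for the next midnight run.
print(batch_delay_hours(1.0, 24.0))    # 23.0
# Weekly batch (every 168 h): an event early on day 1 waits almost a week.
print(batch_delay_hours(1.0, 168.0))   # 167.0
```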

AnalytiX DS improves and accelerates data integration processes by providing unique capabilities for:

Data Integration: a centralised metadata repository that can be leveraged to generate design mappings and automated code, maximising the value of your ETL platform
Centralised Data Mapping: source-to-target data mapping across the enterprise and its operational systems, 50% faster and 100% more collaborative than spreadsheets!
Report Generation: end-to-end forward and reverse graphical lineage, impact analysis, truncation alerts and more!
Pre-ETL Automation: automatically generate ETL jobs for the leading ETL tools (Informatica, DataStage, ODI, Talend, SSIS, etc.)
Integrated Development Environment (CATfX): the Code Automation Template FrameworX (CATfX) lets you script in Java, Groovy, JRuby or XSLT and output to any format, for instant ROI through automation
Robust Application Collaboration: web-based applications that bring transparency and collaboration to team members around the globe
Application Flexibility: an open repository framework that connects easily with add-on modules and third-party tools (modelling, analysis, BI, etc.)
Globalisation: a built-in internationalisation and multi-language instructions framework
Security: a built-in role-based security architecture within the application framework
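The idea behind metadata-driven code generation, as in the data integration and pre-ETL automation capabilities listed above, is that a load job can be rendered from the mapping metadata instead of being hand-written. The sketch below is a minimal illustration under stated assumptions: the mapping format, the table and column names and `generate_load_sql` are hypothetical, and the output is plain SQL. It is not CATfX, nor AnalytiX DS's actual template language or repository schema.

```python
# Hypothetical source-to-target mapping rows, as a mapping spreadsheet or
# metadata repository might hold them: (source column, target column, rule).
# A rule is a SQL expression template with {} standing for the source column.
MAPPING_SPEC = [
    ("c.cust_id", "customer_key", None),
    ("c.full_name", "customer_name", "UPPER({})"),
    ("c.signup_dt", "signup_date", "CAST({} AS DATE)"),
]


def generate_load_sql(target_table, source_from, spec):
    """Render an INSERT ... SELECT statement from mapping metadata."""
    targets = ", ".join(tgt for _, tgt, _ in spec)
    selects = ",\n       ".join(
        (rule.format(src) if rule else src) + f" AS {tgt}"
        for src, tgt, rule in spec
    )
    return f"INSERT INTO {target_table} ({targets})\nSELECT {selects}\nFROM {source_from};"


print(generate_load_sql("dwh.dim_customer", "crm.customer c", MAPPING_SPEC))
```

Running it prints:

```
INSERT INTO dwh.dim_customer (customer_key, customer_name, signup_date)
SELECT c.cust_id AS customer_key,
       UPPER(c.full_name) AS customer_name,
       CAST(c.signup_dt AS DATE) AS signup_date
FROM crm.customer c;
```

Because the mapping lives in one place, changing a rule and regenerating the job keeps the documentation and the code in step, which is harder to guarantee when mappings are kept in spreadsheets.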

Another example is to use Kafka to turn the ETL process upside down. Instead of a batch job fetching data at intervals, as an ETL system does, source systems publish (“send”) their data to Kafka as events occur, and it is streamed continuously to downstream consumers. This creates a data transport layer that delivers more data, faster than before, to a data warehouse or to a “data lake” in Hadoop or Cassandra. It gives the data mining analyst the opportunity to make discoveries from larger data sets. Once patterns have been discovered, the company can act upon events in real time using real-time event processing and analysis tools.
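The core of Kafka's design is an append-only log: producers append events as they happen, and each consumer reads from its own position (offset) in that log, at its own pace. The sketch below is a deliberately simplified in-memory model of that idea. `ToyLog` and `ToyConsumer` are hypothetical names, not the Kafka client API, and the model omits partitions, brokers, retention and persistence.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """In-memory stand-in for a Kafka topic: an append-only list of events.

    A simplified model for illustration, not the Kafka client API.
    """
    events: list = field(default_factory=list)

    def append(self, event) -> int:
        """Producer side: append an event as it happens and return its offset."""
        self.events.append(event)
        return len(self.events) - 1


@dataclass
class ToyConsumer:
    """Each consumer tracks its own read position (offset) in the log."""
    log: ToyLog
    offset: int = 0

    def poll(self) -> list:
        """Return every event appended since the last poll and advance the offset."""
        new_events = self.log.events[self.offset:]
        self.offset = len(self.log.events)
        return new_events


log = ToyLog()
warehouse_loader = ToyConsumer(log)   # e.g. feeds a data lake
realtime_monitor = ToyConsumer(log)   # e.g. real-time event processing

log.append({"meter": "M1", "kwh": 0.4})
print(realtime_monitor.poll())   # [{'meter': 'M1', 'kwh': 0.4}]
log.append({"meter": "M2", "kwh": 1.1})
print(realtime_monitor.poll())   # [{'meter': 'M2', 'kwh': 1.1}]
print(warehouse_loader.poll())   # both events, read at its own pace
```

Because consumers keep independent offsets, the same stream can feed a slower warehouse load and a real-time monitor without either one holding the other back.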

Operators, banks, retailers and utilities know they have an abundance of data that is useful but has not been used so far. It is like an oil well that has been too expensive to exploit because production costs were too high; with new technologies, that well can now be made highly profitable.

Telecom operators have an enormous amount of data created in their networks that is currently not used to drive their business. This data can now be transported, analysed on a per-event basis and acted upon in real time.

The same is true for utilities, except that there the data is created by the power grid, monitoring equipment, sensors and smart meters. Until now it has simply been too costly to collect and use this data.

Banks and retailers have the same opportunities with regard to data collected from their transaction platforms and online self-service platforms. These platforms provide data about purchasing patterns, customer behaviour patterns, risk exposure, etc. Banks can build richer data warehouses that provide faster insights for risk analysis and help them achieve compliance. Retailers can run smarter promotions and build brand loyalty.

However, a number of questions arise. As an analyst, decision maker or IT manager, I would like to:

Know where the data originates
Understand how reliable the data is
Ensure that changes in the source systems are catered for in target systems
Version control and track changes in data structures over time
Establish traceability
Follow data lineage between systems in my architecture
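Several of these needs (knowing where data originates, catering for source changes in target systems, following lineage) come down to traversing a graph of source-to-target mappings: walking it backwards answers "where does this come from?", and walking it forwards is an impact analysis. The sketch below assumes lineage is recorded as simple (source field, target field) pairs; the field names and helpers are hypothetical, and this is a generic breadth-first search, not how AnalytiX Mapping Manager implements lineage internally.

```python
from collections import defaultdict, deque

# Hypothetical mapping metadata: (source field, target field) pairs, as a
# source-to-target mapping document might record them.
MAPPINGS = [
    ("crm.customer.id", "staging.customer.customer_id"),
    ("billing.invoice.amount", "staging.sales.amount"),
    ("staging.customer.customer_id", "dwh.fact_sales.customer_key"),
    ("staging.sales.amount", "dwh.fact_sales.revenue"),
]


def build_graphs(mappings):
    """Build forward (source -> targets) and reverse (target -> sources) edge maps."""
    forward, reverse = defaultdict(set), defaultdict(set)
    for src, tgt in mappings:
        forward[src].add(tgt)
        reverse[tgt].add(src)
    return forward, reverse


def reachable(graph, start):
    """Breadth-first search: every field reachable from `start` along the edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):   # .get avoids creating empty entries
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


forward, reverse = build_graphs(MAPPINGS)

# Reverse lineage: where does the revenue figure originate?
# Origins are upstream fields that have no sources of their own.
origins = {f for f in reachable(reverse, "dwh.fact_sales.revenue") if f not in reverse}
print(origins)  # {'billing.invoice.amount'}

# Impact analysis (forward lineage): what is affected if crm.customer.id changes?
print(sorted(reachable(forward, "crm.customer.id")))
# ['dwh.fact_sales.customer_key', 'staging.customer.customer_id']
```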

United Vanning Consulting is partnering with AnalytiX DS to provide a solution to these needs.

AnalytiX Mapping Manager® Big Data Edition® enables organizations to reduce costs and accelerate project delivery by automating the big data mapping process, making it faster, more manageable and more collaborative, and by establishing version control, traceability and automation.

For more information about United Vanning Consulting, or to explore how our expertise can help you, please contact us at www.unitedvanning.com/en/contact or visit our partner's website, analytixds.com/

© UNITED VANNING 2024