Data pipeline architecture: Building a path from ingestion to analytics
Data pipelines transport raw data from software-as-a-service (SaaS) platforms and database sources to data warehouses for use by analytics and business intelligence (BI) tools. Developers can build pipelines themselves by writing code and manually interfacing with source databases, or they can avoid reinventing the wheel and use a SaaS data pipeline instead.

To understand how much of a revolution data pipeline-as-a-service is, and how much work goes into assembling an old-school data pipeline, let's review the fundamental components and stages of data pipelines, as well as the technologies available for replicating data.

Data pipeline architecture

Data pipeline architecture is the design and structure of the code and systems that copy source data, cleanse or transform it as needed, and route it to destination systems such as data warehouses and data lakes.

Three factors contribute to the speed with which data moves through a data pipeline:

Rate, or throughput, is how much data a pipeline can process within a set amount of time.

Reliability requires the individual systems within a data pipeline to be fault-tolerant. A reliable data pipeline with built-in auditing, logging, and validation mechanisms helps ensure data quality.

Latency is the time needed for a single unit of data to travel through the pipeline. Latency relates more to response time than to volume or throughput. Low latency can be expensive to maintain in terms of both price and processing resources, so an enterprise should strike a balance between latency and cost to maximize the value it gets from analytics.

"Data architecture is not limited to standard on-premises tools alone; it also has strong potential for successful integration with cloud services."
Satya Ravinuthala, Senior Technology Engineer
State Farm Insurance
As an IT professional, my work has run the spectrum from oversight of full software lifecycle activities to systems architecture to data migration. I have 18+ years of IT experience in the financial, insurance, manufacturing, and retail sectors, and I know my diverse skills and broad background will be an asset for the team. Over the past few years, my role has focused on leading cloud application development support, data enablement, and problem resolution. I have worked on a variety of initiatives, such as those below, which have been aligned to Scrum and Agile methodology along with integrating DevSecOps principles.
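To make the throughput and latency definitions from the architecture section concrete, here is a minimal Python sketch. It assumes each record carries two timestamps, one for when it entered the pipeline and one for when it reached the destination. The names RecordTiming, throughput, and latencies are hypothetical and chosen only for illustration; they do not come from any particular pipeline product.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RecordTiming:
    """Hypothetical per-record timestamps, in seconds."""
    ingested_at: float   # when the record entered the pipeline
    delivered_at: float  # when the record landed in the destination


def throughput(timings, window_seconds):
    """Rate: records delivered per second over an observation window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return len(timings) / window_seconds


def latencies(timings):
    """Latency: time each single record spent travelling through the pipeline."""
    return [t.delivered_at - t.ingested_at for t in timings]


# Illustrative (made-up) timings for four records observed over 10 seconds.
timings = [
    RecordTiming(0.0, 2.0),
    RecordTiming(1.0, 2.5),
    RecordTiming(2.0, 5.0),
    RecordTiming(3.0, 4.0),
]

per_record = latencies(timings)
print("throughput (records/s):", throughput(timings, 10.0))
print("mean latency (s):", mean(per_record))
print("worst latency (s):", max(per_record))
```

The two numbers are computed independently on purpose: a pipeline that batches heavily can post a high throughput while individual records still wait a long time, which is why latency relates more to response time than to volume.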