PROJECT DESCRIPTION:
advana.io is a pioneering data and digital out-of-home (DOOH) advertising marketplace for major brands seeking syndicated data and point-of-sale advertising solutions. Using AWS cloud services, I architected an ETL engine to streamline data ingestion and normalization. A frontend portal complements the engine and visualizes the data through Power BI and Azure integrations.
Challenges
Thousands of Data Sources, Unclean Data from Legacy Systems
Reconciling disparate data sources into a unified framework was the central challenge. Data flowed in from over 30,000 self-service retail kiosks nationwide, and the streams were inconsistent and unnormalized. Because most of these kiosks ran legacy systems, the data required extensive manual wrangling and remained unclean and unreliable. Connectivity and reliability issues further underscored the need for a comprehensive overhaul.
Solutions
Streamlined Data Ingestion and Normalization:
Implemented an ETL engine on the Databricks Data Intelligence Platform to ingest and normalize data from diverse sources, with AWS EC2 instances providing the compute for data processing and management.
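The normalization step can be illustrated with a minimal sketch in plain Python. This is not the production Databricks code; the field aliases, the unified schema, and the choice to treat timestamps without a time zone as UTC are all assumptions made for illustration.

```python
from datetime import datetime, timezone

# Hypothetical mapping from legacy field names to one unified schema.
FIELD_ALIASES = {
    "kiosk_id": ("kiosk_id", "kioskId", "KIOSK"),
    "sku": ("sku", "item_code", "SKU"),
    "amount": ("amount", "total", "price"),
    "sold_at": ("sold_at", "timestamp", "ts"),
}


def _first_present(record, names):
    """Return the first non-empty value found under any of the given names."""
    for name in names:
        if name in record and record[name] not in (None, ""):
            return record[name]
    return None


def _parse_timestamp(value):
    """Accept epoch seconds or ISO 8601 strings; return a UTC ISO 8601 string."""
    if isinstance(value, (int, float)):
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        dt = datetime.fromisoformat(str(value))
        if dt.tzinfo is None:
            # Assumption: timestamps without a zone are already in UTC.
            dt = dt.replace(tzinfo=timezone.utc)
        dt = dt.astimezone(timezone.utc)
    return dt.isoformat()


def normalize_record(record):
    """Map one raw kiosk record to the unified schema.

    Returns None when a required field is missing or malformed, so the
    caller can route the record to a quarantine area instead of loading it.
    """
    values = {
        field: _first_present(record, aliases)
        for field, aliases in FIELD_ALIASES.items()
    }
    if any(v is None for v in values.values()):
        return None
    try:
        amount = float(str(values["amount"]).replace("$", "").replace(",", ""))
        return {
            "kiosk_id": str(values["kiosk_id"]).strip().upper(),
            "sku": str(values["sku"]).strip(),
            "amount_cents": round(amount * 100),
            "sold_at": _parse_timestamp(values["sold_at"]),
        }
    except (ValueError, OverflowError, OSError):
        return None
```

Returning None for bad records, rather than raising, keeps one malformed kiosk feed from halting a whole batch.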
Enhanced Data Processing and Analysis:
Developed custom startup scripts for Linux-based POS machines, enabling near real-time streaming of raw data to internal AWS S3 buckets. Data engineers leveraged the Databricks platform and internal EC2 instances to conduct comprehensive data transformation and cleaning processes, improving data integrity and reliability.
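As a rough sketch of what such a kiosk-side shipper might look like, the following Python uploads spooled files to S3 under a date-partitioned key. The spool directory, the `.jsonl` extension, the key layout, and the function names are hypothetical. The only AWS call used is boto3's `upload_file(Filename, Bucket, Key)`, imported lazily so the key logic runs without AWS credentials.

```python
from datetime import datetime, timezone
from pathlib import Path


def build_object_key(kiosk_id, local_path, now=None):
    """Build a partitioned S3 key such as raw/kiosk_id=K-17/dt=2024-03-01/..."""
    now = now or datetime.now(timezone.utc)
    name = Path(local_path).name
    return f"raw/kiosk_id={kiosk_id}/dt={now:%Y-%m-%d}/{now:%H%M%S}_{name}"


def ship_pending_files(spool_dir, bucket, kiosk_id, upload=None):
    """Upload every spooled file, deleting each one only after its upload returns.

    If an upload raises (for example, on a dropped connection), the file stays
    in the spool directory and is retried on the next run.
    """
    if upload is None:
        import boto3  # lazy import: only needed when talking to real S3
        upload = boto3.client("s3").upload_file
    shipped = []
    for path in sorted(Path(spool_dir).glob("*.jsonl")):
        key = build_object_key(kiosk_id, path)
        upload(str(path), bucket, key)
        path.unlink()
        shipped.append(key)
    return shipped
```

Partitioning keys by kiosk and date lets downstream jobs read only the slices they need, and deleting a file only after a successful upload tolerates the intermittent connectivity described under Challenges.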
Optimized Data Storage and Accessibility:
Used designated AWS S3 buckets and an integrated Google BigQuery data warehouse for centralized data storage and distribution. Enabled marketing managers to access visualized dashboards on Azure-powered Power BI and internal data portals, facilitating data-driven decision-making and targeted advertising campaigns.
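The project description does not specify how cleaned data moved from S3 into the warehouse. One common hand-off format is newline-delimited JSON, which BigQuery load jobs accept. The helper below is a minimal sketch of that serialization step only; the function name and record shape are assumptions.

```python
import json


def to_ndjson(records):
    """Serialize an iterable of dicts as newline-delimited JSON.

    Each record becomes one line; keys are sorted so that output is
    deterministic and easy to diff between runs.
    """
    return "".join(json.dumps(record, sort_keys=True) + "\n" for record in records)
```

For example, `to_ndjson([{"sku": "A1", "amount_cents": 250}])` yields a single line with the keys in alphabetical order.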
Improved Security Measures:
Implemented a secure VPN for internal access, ensuring data privacy and security for remote data engineers and marketing managers. Enhanced data protection measures to safeguard sensitive information and maintain regulatory compliance.
Outcome
Transformative Solutions Drive Business Growth and Efficiency
The solutions implemented led to significant improvements in data processing and analysis efficiency. Data that previously took weeks to extract, transform, and load (ETL) can now be processed within the same day, enabling faster decision-making and responsiveness to market trends. Data engineers now spend more time on value-added tasks, such as writing custom functions to transform clean data, rather than spending days cleaning unreliable data. Marketing managers also gained access to a new suite of tools and data, allowing them to offer clients more valuable insights. This enhanced capability contributed to a significant increase in the company's bottom line.