AWS Native Serverless Data & ML Pipelines Implementation

About the client

The client has been building a data platform for data science use cases. The goal was to ingest large volumes of data from disparate sources, transform and enrich the ingested data into a unified representation, and provide the result to machine learning models for training and prediction.

The data flow comprises the following components:
– Ingestion Layer: The set of tools and pipelines that enable data acquisition from different sources, such as relational databases, REST APIs, semi-structured file formats, studies, and other documents.
– Data Lake: The centralized storage for all data assets with complete governance including inventory, provenance, access control, and audit.
– Batch Layer: The set of pipelines that cleanse, format, enrich, and label data for further machine learning (ML) model training.
– Feature Store: The single place to keep, curate, and serve features to ML models.

Challenge

The client faced high costs for developing, deploying, and, most importantly, operating the data platform, including the Data Lake, Ingestion, and ML pipelines. The pipelines mostly ran on EC2 instances, which increased operating costs and made deploying and testing the pipelines in lower environments time-consuming.

Solution

The proposed solution split the monolithic ingestion service and the ML data preparation pipelines into several small, autonomous serverless functions.

The first stage is the Ingestion Layer, where custom-developed AWS Lambda functions ingest data from different sources and drop it into the Data Lake as unstructured data. Then, a set of custom AWS Step Functions workflows retrieves the unstructured data from the Data Lake and transforms it into a format that can be consumed by the ML model.
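The ingestion step described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the client's actual code: the Data Lake is assumed to be an S3 bucket, and the event fields (`records`, `source`, `bucket`), the `raw/<source>/<date>/` key layout, and the helper names `build_raw_key` and `ingest` are all hypothetical.

```python
import json
from datetime import datetime, timezone


def build_raw_key(source: str, ingested_at: datetime) -> str:
    """Build an object key partitioned by source and ingestion date (hypothetical layout)."""
    return f"raw/{source}/{ingested_at:%Y/%m/%d}/{ingested_at:%H%M%S%f}.json"


def ingest(records, source, bucket, s3_client, now=None):
    """Write the records unchanged to the raw zone and return the object key."""
    now = now or datetime.now(timezone.utc)
    key = build_raw_key(source, now)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(records).encode("utf-8"),
        ContentType="application/json",
    )
    return key


def handler(event, context):
    """Lambda entry point. The event shape is an assumption made for this sketch."""
    import boto3  # imported lazily so the helpers above can be tested offline

    key = ingest(event["records"], event["source"], event["bucket"], boto3.client("s3"))
    return {"key": key}
```

Passing the S3 client in as a parameter keeps the write logic testable with a stub, which fits the goal of cheaper testing in lower environments.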

High-level solution diagram

Technologies

AWS Lambda, AWS Step Functions, Terraform, GitLab CI.

Result

The solution significantly reduced the cost and time spent on development and deployment. This was achieved by introducing infrastructure as code, which quickly and seamlessly deploys infrastructure to multiple environments (Dev, Stage, Production). In addition, during the implementation the team developed a set of Lambda functions and Step Functions workflows that trigger data processing on events, which eliminates cases where EC2 instances sit idle, and created cost-efficient Data Lakes.
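As a rough illustration of event-triggered processing, the Python sketch below starts one Step Functions execution for each object reported in an S3 event notification. The case study does not say which events were used, so the S3 trigger, the `STATE_MACHINE_ARN` environment variable, and the helper names `extract_objects` and `start_processing` are assumptions made for this example.

```python
import json
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return bucket/key pairs from an S3 event notification (keys arrive URL-encoded)."""
    return [
        {
            "bucket": record["s3"]["bucket"]["name"],
            "key": unquote_plus(record["s3"]["object"]["key"]),
        }
        for record in event.get("Records", [])
    ]


def start_processing(event, sfn_client, state_machine_arn):
    """Start one state machine execution per new object and return the execution ARNs."""
    execution_arns = []
    for obj in extract_objects(event):
        response = sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps(obj),
        )
        execution_arns.append(response["executionArn"])
    return execution_arns


def handler(event, context):
    """Lambda entry point. STATE_MACHINE_ARN is a hypothetical environment variable."""
    import os

    import boto3  # imported lazily so the helpers above can be tested offline

    client = boto3.client("stepfunctions")
    return start_processing(event, client, os.environ["STATE_MACHINE_ARN"])
```

Because the function runs only when an event arrives, no compute is billed between events, in contrast to an idle EC2 instance.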
