Case Study: Serverless Step Functions for Continual Data Retrieval and Processing

ArtsVision

About ArtsVision: ArtsVision is the most comprehensive and powerful planning and resource management application available to the classical performing arts community today.

The Challenge

ArtsVision's platform provides its clients with a robust RESTful interface for securely accessing client data. However, the RESTful interface is not conducive to certain client use cases, e.g. heterogeneous reporting, internal SLA access requirements for ArtsVision data, or bulk data analysis and cleansing. ArtsVision clients needed a secure and cost-efficient mechanism to receive continual data updates that can be stored within their own RDBMS.

  • Implement a solution which maintains low operating costs.
  • Allow the solution to be run continually and only process "new" data changes.
  • The solution must be able to run securely without allowing for the possibility of data compromise.
  • The solution should be auditable and include operational metrics and alerting mechanisms in the event of a runtime failure.

The Solution

Kickstep met ArtsVision's needs by using AWS services, the AWS Serverless Application Model (AWS SAM) framework, and AWS Well-Architected Framework principles.

  • Build discrete functional logic components within AWS Lambda.
  • Use AWS Step Functions to orchestrate AWS Lambda functions that perform services such as One-Time Full Data Retrieval and Ongoing Replication of Data Changes.
  • Schedule recurring AWS Step Functions executions through Amazon CloudWatch Events.
  • Leverage Amazon S3 for temporary data storage and AWS Database Migration Service (DMS) for data translation and loading from S3 to the client's RDBMS.
  • Manage all infrastructure and application components through AWS SAM, allowing for one-click deployment of the entire solution stack.
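The orchestration described above can be sketched as an Amazon States Language definition built in Node.js. This is a minimal illustration, not the deployed state machine: the state names, the `$.hasMore` input field, and the `arns` keys are all assumptions made for the example. The `Choice` state loops back to the fetch step until no more rows remain.

```javascript
// Illustrative only: state names, the "$.hasMore" field, and the ARN keys
// are hypothetical and are not taken from the deployed solution.
function buildFullLoadDefinition(arns) {
  return {
    Comment: "One-time full data retrieval (illustrative sketch)",
    StartAt: "ListEntities",
    States: {
      ListEntities: { Type: "Task", Resource: arns.listEntities, Next: "FetchPage" },
      FetchPage: { Type: "Task", Resource: arns.fetchPage, Next: "MorePages" },
      MorePages: {
        Type: "Choice",
        // Loop back while the previous Lambda reported more rows.
        Choices: [{ Variable: "$.hasMore", BooleanEquals: true, Next: "FetchPage" }],
        Default: "WriteLastRan",
      },
      WriteLastRan: { Type: "Task", Resource: arns.writeLastRan, End: true },
    },
  };
}

// Returns every transition target (StartAt, Next, Default, Choices[].Next)
// that does not name a defined state. An empty array means the graph is closed.
function danglingTransitions(definition) {
  const names = new Set(Object.keys(definition.States));
  const targets = [definition.StartAt];
  for (const state of Object.values(definition.States)) {
    if (state.Next) targets.push(state.Next);
    if (state.Default) targets.push(state.Default);
    for (const choice of state.Choices || []) targets.push(choice.Next);
  }
  return targets.filter((target) => !names.has(target));
}
```

Keeping each step in its own Lambda function and leaving the looping to the state machine means no single function invocation has to run for the whole retrieval.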

The Benefits

Benefit 1 - Operating as a serverless solution, the recurring operating costs are under $50 per month for even the largest client of ArtsVision.

Benefit 2 - By following AWS SAM best practices, the benefits of a strong DevOps environment are realized, including maintaining similarities between development and production environments as well as eliminating requirements for humans to manually configure AWS infrastructure and services.

Benefit 3 - Building the solution around AWS Database Migration Service allowed the solution to work with almost any client RDBMS environment while also avoiding any per-install customization of the application logic.

Solution Architecture Overview - One-Time Data Load

1 - The step function is initiated either through the AWS CLI or through the AWS console.

2-4 - The step function is supported by a series of Node.js 10 Lambda functions which query the third-party REST server to:

  1. Determine what domain entities are available
  2. Determine what the schema is for each of those entities
  3. Retrieve a fixed number of data rows from the REST server
  4. Determine if more rows are available to be retrieved
  5. Write an appropriately formatted JSON file to S3 for eventual DMS import
  6. Create a "last ran" file for each entity type to be used by the CDC (continual data collection) step functions
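The paging logic in steps 3 to 5 can be sketched as follows. This is a hedged illustration under stated assumptions: `fetchRows` is a hypothetical stand-in for the REST call, the object key layout is invented for the example, and a short page is assumed to mean no more rows remain. In the solution itself the loop is driven by the step function rather than a `while` loop; the loop here only makes the sketch runnable on its own.

```javascript
// Hypothetical helper: computes the next paging state after one REST call.
// Assumes a page shorter than `pageSize` means the server has no more rows.
function advancePage(state, rows) {
  return {
    entity: state.entity,
    pageSize: state.pageSize,
    offset: state.offset + rows.length,
    page: state.page + 1,
    hasMore: rows.length === state.pageSize,
  };
}

// Hypothetical S3 layout: one folder per entity, one object per page.
function pageKey(state) {
  const page = String(state.page).padStart(6, "0");
  return `${state.entity}/full/page-${page}.json`;
}

// Runs the whole loop locally. `fetchRows(entity, offset, limit)` is an
// injected stand-in for the REST request; the returned list records which
// objects would have been written to S3 and how many rows each holds.
function collectAll(entity, pageSize, fetchRows) {
  let state = { entity, pageSize, offset: 0, page: 0, hasMore: true };
  const written = [];
  while (state.hasMore) {
    const rows = fetchRows(entity, state.offset, pageSize);
    if (rows.length > 0) {
      written.push({ key: pageKey(state), count: rows.length });
    }
    state = advancePage(state, rows);
  }
  return written;
}
```

Retrieving a fixed number of rows per call keeps each Lambda invocation short and bounded, whatever the total size of the entity.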

5 - Perform a one-time full import from an S3 source to an RDS SQL target. Each entity has its own folder in S3 and its own source endpoint in DMS.
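The one-folder-and-one-endpoint-per-entity layout can be sketched as a function that builds the request parameters for the DMS CreateEndpoint API. The `config` shape (`prefix`, `bucket`, `roleArn`, `tableDefinitions`) and the naming scheme are assumptions for illustration; the table definitions are treated as opaque values supplied by the caller, and no AWS call is made here.

```javascript
// Builds CreateEndpoint parameters for one entity's S3 source endpoint.
// The `config` fields and the identifier naming scheme are hypothetical.
function s3SourceEndpointParams(entity, config) {
  return {
    EndpointIdentifier: `${config.prefix}-${entity}`.toLowerCase(),
    EndpointType: "source",
    EngineName: "s3",
    S3Settings: {
      BucketName: config.bucket,
      // Each entity reads from its own folder in the bucket.
      BucketFolder: entity,
      ServiceAccessRoleArn: config.roleArn,
      // DMS expects the external table definition as a JSON string.
      ExternalTableDefinition: JSON.stringify(config.tableDefinitions[entity]),
    },
  };
}

// One parameter object per entity, in the order the entities were listed.
function allEndpointParams(entities, config) {
  return entities.map((entity) => s3SourceEndpointParams(entity, config));
}
```

Because the per-entity differences live entirely in these parameters, the same application logic can serve every install.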

Solution Architecture Overview - Recurring Data Load

1 - The CDC step function is initiated through the AWS CLI, through the AWS console or through a time-based event in CloudWatch.

2-4 - The step function is supported by a series of Node.js 10 Lambda functions which query the third-party REST server to:

  1. Determine what domain entities are available
  2. Check the CDC folder for each entity and get the associated meta file
  3. Retrieve a fixed number of data rows from the REST server, supplying the 'last modified' query input from the value stored in the meta file.
  4. Determine if more rows are available to be retrieved
  5. Write an appropriately formatted CDC JSON file to S3 for DMS CDC import
  6. Update the CDC (continual data collection) meta file for each entity
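The meta file handling in steps 2, 3 and 6 can be sketched as below. The meta file shape (`entity`, `lastModified`), the `modifiedAt` row field, and the query parameter names are all assumptions for illustration; the actual REST interface and file format are not described here.

```javascript
// Hypothetical meta file shape: { entity, lastModified }, where
// lastModified is an ISO-8601 timestamp string.

// Builds the query for one page of changes since the stored watermark.
// The parameter names (modifiedSince, offset, limit) are illustrative.
function changeQuery(meta, offset, pageSize) {
  return {
    entity: meta.entity,
    modifiedSince: meta.lastModified,
    offset,
    limit: pageSize,
  };
}

// Advances the watermark to the newest `modifiedAt` among the rows.
// If no row is newer (or no rows came back), the old value is kept.
function updateMeta(meta, rows) {
  let newest = meta.lastModified;
  for (const row of rows) {
    if (Date.parse(row.modifiedAt) > Date.parse(newest)) {
      newest = row.modifiedAt;
    }
  }
  return { entity: meta.entity, lastModified: newest };
}
```

Deriving the new watermark from the retrieved rows, rather than from the clock at run time, is one way to avoid skipping rows that change while a run is in progress; whether the original solution does this is not stated.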

5 - The DMS task is configured to perform CDC functions using the entity's CDC folder in S3 as a source and the RDS SQL environment as the target.

About Kickstep Technologies, Inc and Amazon Web Services

Kickstep Technologies is a leading technology firm providing cloud consulting, software development, data analytics and machine learning services. We use our experience to ensure clients have the best technical solutions to solve their business challenges and deliver value for their organization. We are fully committed to the success of our clients and commit to always putting their needs and interests before our own.