AWS migration from local data warehouse


I work at a small business where our data warehouse is built locally and synced using Dropbox for shared access.

Current architecture:

  1. Data Collection: Python and R scripts collect data from various marketing and sales channels. For context, we use social media APIs (TikTok Business API, Facebook Graph API) to collect ads data, the Amazon Ads and Selling Partner APIs to collect Amazon data, and various other API integrations to pull data onto our local system, where it is stored in .csv format (flat files).
  2. Data Aggregation & Transformation: Once the source data has been collected, we transform it as needed.
  3. Combine relevant data from all sources into master CSV files. These files combine data from individual data sources (TikTok, Meta, Amazon, e-commerce data, etc.). For example, one file will contain ads data aggregated from all social platforms. We have hundreds of these master files that we later use for reporting in Excel.
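Steps 2 and 3 above can be sketched with the standard library alone. This is a minimal sketch, not our actual scripts: the column names (`date`, `spend`) and the added `platform` column are assumptions for illustration.

```python
import csv
import io

def merge_ads_csvs(csv_texts, platform_names):
    """Combine per-platform ads CSVs (assumed to share columns) into one
    master table, tagging each row with its source platform."""
    rows = []
    for text, platform in zip(csv_texts, platform_names):
        for row in csv.DictReader(io.StringIO(text)):
            row["platform"] = platform  # remember where the row came from
            rows.append(row)
    return rows

# Example: two per-platform exports merged into one master table.
tiktok = "date,spend\n2024-05-01,10\n"
meta = "date,spend\n2024-05-01,20\n"
master = merge_ads_csvs([tiktok, meta], ["tiktok", "meta"])
```

The same merge logic carries over almost unchanged to a Lambda or Glue job later, since it only deals with text in and rows out.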

Our data warehouse consists only of CSV flat files.

Now I want to migrate our entire architecture to AWS, perhaps by creating Lambda functions that collect data on a daily basis, and store it in a more robust format, maybe a database. I'm just wondering how our solution would be built in AWS.

Thanks in advance for your advice.

1 Answer
Accepted Answer

Many possibilities, for example this one:

The data collection may indeed be done using AWS Lambda functions (perhaps with Step Functions for orchestration), and AWS Glue for transformation. Keep in mind Lambda's limitations, though, such as the 15-minute maximum run time.
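One common way to work around the 15-minute cap is to have Step Functions loop a Lambda over one channel at a time, passing the remaining work back in on each iteration. A sketch under assumptions: the `CHANNELS` list and the commented-out `collect_channel` call are hypothetical stand-ins for your existing collection scripts.

```python
# Hypothetical channel list -- replace with your real sources.
CHANNELS = ["tiktok_ads", "meta_ads", "amazon_ads"]

def lambda_handler(event, context):
    """Process one channel per invocation so no single run nears the
    15-minute Lambda limit; Step Functions re-invokes until done."""
    # Step Functions passes the remaining channels back in on each iteration.
    remaining = event.get("remaining", CHANNELS)
    batch, rest = remaining[:1], remaining[1:]
    # collect_channel(batch[0])  # your existing Python/R collection logic
    return {"processed": batch, "remaining": rest, "done": not rest}
```

A Step Functions Choice state can then check the `done` flag and either re-enter this Lambda with `remaining`, or move on to the transformation step.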

The resulting data doesn't absolutely have to be in a database. What's frequently done instead is storing it in S3 as structured data, for example as Parquet files (Glue can be used to do that conversion as well, by the way). This will be cheaper than a database and very durable. Once in S3, the data can be queried using Athena or other mechanisms, both AWS (QuickSight) and external.
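For Athena to query that S3 data efficiently, a Hive-style partitioned key layout helps, since Athena can then skip whole prefixes instead of scanning everything. A sketch; the `ads/` prefix and partition names are assumptions, not a required convention:

```python
from datetime import date

def s3_key(source: str, day: date, fmt: str = "parquet") -> str:
    """Build a Hive-style partitioned S3 key so Athena can prune partitions
    (e.g. WHERE year = '2024' AND month = '05' reads only that prefix)."""
    return (f"ads/source={source}/year={day.year}/month={day.month:02d}/"
            f"day={day.day:02d}/data.{fmt}")

# e.g. ads/source=tiktok/year=2024/month=05/day=03/data.parquet
key = s3_key("tiktok", date(2024, 5, 3))
```

Upload the Parquet files under keys like these, then point a Glue crawler (or an Athena `CREATE EXTERNAL TABLE ... PARTITIONED BY`) at the `ads/` prefix.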

Regards

AWS
answered 2 months ago
  • Thanks! I will test this out on a smaller scale. Some of the scripts do need more than 15 minutes of execution time, but that won't be a roadblock, at least in the initial stages.

  • @AWS-ADolganov I was able to test syncing between AWS S3 and Dropbox, and it worked fine. However, I'm wondering if AWS is the best approach going forward.

    Currently, all of our local files are synced with Dropbox, and developers manually run scripts to keep things updated. Moving to AWS would automate this process, but we would still need to sync S3 with Dropbox and then Dropbox with the local system for shared access.

    Are there other alternatives to AWS that would allow us to automate the synchronization of local files without requiring physical presence? Specifically, I'm looking for a solution where the scripts can run automatically and keep the data up-to-date, even if all developers are on vacation.
