Queue processing best practice

  1. Let's say I have a db of 100k records (users).
  2. Every hour I need to pull the records from that db that match specific criteria. In some hours this will be 5000 records at a time.
  3. I then need to process those 5000 records in under an hour so that the run doesn't overlap with the next batch of data.
  4. Processing each entry takes about a second (a couple of external API calls). Done one at a time, that is 5000 seconds, roughly 83 minutes, which is already more than an hour.
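
To make the arithmetic above concrete, here is a minimal sketch that works out how many entries would have to be processed in parallel to fit a given time window. The function name `min_workers` and the 10-minute window are illustrative assumptions, not part of my setup; the 5000 records and 1 second per record are the figures from the list above.

```python
import math


def min_workers(records: int, seconds_per_record: float, window_seconds: float) -> int:
    """Smallest number of parallel workers that finishes the batch inside the window.

    Assumes every record takes the same time and workers do not slow each other down.
    """
    return math.ceil(records * seconds_per_record / window_seconds)


# Sequential cost of one batch, in minutes (5000 s / 60).
sequential_minutes = 5000 * 1.0 / 60
print(round(sequential_minutes, 1))

# Workers needed to fit the batch into one hour, and into a hypothetical 10 minutes.
print(min_workers(5000, 1.0, 3600))
print(min_workers(5000, 1.0, 600))
```

Under those assumptions, two parallel workers are already enough to finish inside the hour, so the target is modest concurrency rather than a large fan-out.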

What would be the most efficient way to speed up the process?

What I am doing right now is faking multi-threading:
1. I run a task every 30 seconds that fetches 3-4 pages of the records, 60 entries per page.
2. I've created an endpoint that calls the processing function.
3. I hit that endpoint via a lambda function that has a timeout of a few seconds, so the task doesn't wait for the processing to finish and can go on to initiate other 'threads'.

This works, but it degrades the performance of all endpoints and functions, even when the db/API load is below 50%.
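
For comparison, one common alternative to firing requests at an endpoint is a worker pool with an explicit cap on how many entries are in flight at once. The following is only a sketch under assumptions: `process_record` is a hypothetical stand-in for the real external API calls (with the delay shortened for the demo), `process_batch` and `max_concurrency` are names made up for illustration, and it assumes the API calls can be made asynchronously. Whether this removes the slowdown described above is not something the sketch can show.

```python
import asyncio


async def process_record(record_id: int) -> int:
    """Hypothetical stand-in for the external API calls made per record."""
    await asyncio.sleep(0.01)  # the real calls take about a second; shortened here
    return record_id


async def process_batch(record_ids, max_concurrency: int = 10) -> list:
    """Process every record while keeping at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(record_id: int) -> int:
        # The semaphore blocks here once max_concurrency records are in progress.
        async with semaphore:
            return await process_record(record_id)

    # gather returns results in the same order as the input ids.
    return await asyncio.gather(*(guarded(r) for r in record_ids))


if __name__ == "__main__":
    results = asyncio.run(process_batch(range(100), max_concurrency=10))
    print(len(results))
```

The cap makes the load on the db and the external API a tunable number instead of a side effect of how many requests happen to be outstanding.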

Any suggestions? Thanks.
