Does anyone have good strategies for monitoring and managing background tasks? I'm working on an application that makes extensive use of background tasks for data processing and ingestion, as well as for larger user-driven data manipulation (e.g. CSV exports). We've been experiencing issues including:
A task that has worked well for ages suddenly never "ends" and keeps consuming memory and CPU for days (though I can tell from examining the DB that the work it was supposed to do completed successfully).
Tasks that were thoroughly tested hit a data case we hadn't expected and fail with an unhandled exception. We have limited ability to reproduce and diagnose these failures, especially with larger input payloads.
If many users submit requests at around the same time, tasks take much longer to complete, and we have no way to monitor this or set expectations about it.
Because the background tasks have limited debugging and even logging capabilities, we're struggling to diagnose and solve these kinds of issues effectively. I've started adding things like try/catch blocks that email the developers the details (or save them to a DB table), but I'm interested in whether others have used alternative strategies.
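For concreteness, the try/catch approach looks roughly like the sketch below. It is in Python purely for illustration (not necessarily the language our tasks run in), and the `task_runs` table, the `init_store` and `run_logged` helpers, and the SQLite store are all hypothetical stand-ins rather than our actual code.

```python
import json
import sqlite3
import time
import traceback


def init_store(conn):
    """Create the (hypothetical) table that records every task run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_runs ("
        " id INTEGER PRIMARY KEY,"
        " task_name TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " duration_s REAL NOT NULL,"
        " payload TEXT,"
        " error TEXT)"
    )


def run_logged(conn, task_name, func, payload):
    """Run func(payload); record status, duration, and any traceback."""
    started = time.monotonic()
    status, error, result = "ok", None, None
    try:
        result = func(payload)
    except Exception:
        # Capture the full traceback instead of letting the task die silently.
        status, error = "failed", traceback.format_exc()
    duration = time.monotonic() - started

    # Keep the input only for failures, so the failing case can be replayed later.
    saved_payload = json.dumps(payload, default=str) if status == "failed" else None
    conn.execute(
        "INSERT INTO task_runs (task_name, status, duration_s, payload, error)"
        " VALUES (?, ?, ?, ?, ?)",
        (task_name, status, duration, saved_payload, error),
    )
    conn.commit()
    return status, result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_store(conn)
    # A task that hits an unexpected data case (an empty row list).
    print(run_logged(conn, "csv_export", lambda p: p["rows"][0], {"rows": []}))
```

Recording a row for every run, not only for failures, would also give a rough duration baseline for noticing when tasks slow down under load. It would not catch a task that never returns, since nothing is written until the task finishes.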