# Stop Making Users Wait: Why Your API Needs Background Workers

*How to move long-running API work to asynchronous workers with Celery so user-facing endpoints stay fast, resilient, and scalable.*
If your API does OCR, report generation, AI inference, or heavy data transforms inside a request-response path, users are waiting too long.
The fix is architectural: accept quickly, queue work, process asynchronously.
## The Synchronous Trap
Long operations inside request handlers create recurring failures:
- Timeouts at client/load-balancer boundaries
- Thread/process blocking under concurrent load
- No reliable retries on transient failures
- Duplicate requests when users refresh
A 2-minute task does not belong in a 30-second HTTP window.
## Task Queue Pattern
The async pattern is simple:
1. The API validates the request.
2. The API stores task metadata.
3. The API queues the job and returns immediately.
4. A worker executes the job out-of-band.
5. The client polls for status or receives a callback.
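The five steps above can be sketched end-to-end before any infrastructure is involved. The in-memory `TASKS` dict and `QUEUE` list below are illustrative stand-ins for the metadata store and broker (names and helpers are ours, not a real library's):

```python
import uuid

# Illustrative in-memory stand-ins for the broker and metadata store;
# a real deployment uses Redis/RabbitMQ and a database.
TASKS: dict[str, dict] = {}
QUEUE: list[str] = []

def submit(payload: str) -> str:
    """Steps 1-3: validate, store metadata, queue, return immediately."""
    if not payload:
        raise ValueError("empty payload")
    task_id = str(uuid.uuid4())
    TASKS[task_id] = {"status": "queued", "payload": payload, "result": None}
    QUEUE.append(task_id)
    return task_id

def run_worker_once() -> None:
    """Step 4: a worker pulls one job off the queue and executes it."""
    task_id = QUEUE.pop(0)
    task = TASKS[task_id]
    task["result"] = task["payload"].upper()  # stand-in for the heavy work
    task["status"] = "done"

def poll(task_id: str) -> dict:
    """Step 5: the client polls for status and result."""
    return TASKS[task_id]
```

The point of the sketch is the separation: `submit` returns a task ID in microseconds, and the expensive work happens on a completely different call path.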
## Minimal Celery Setup
```python
from celery import Celery

celery_app = Celery(
    "worker",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)

@celery_app.task(bind=True, max_retries=3)
def process_document_task(self, document_id: str):
    try:
        document = get_document(document_id)
        result = analyze_document(document)
        save_result(document_id, result)
    except Exception as exc:
        # Retry up to max_retries, waiting 60 seconds between attempts
        raise self.retry(exc=exc, countdown=60)
```

Queue from the API:
```python
@router.post("/upload")
async def upload(file: UploadFile, db: AsyncSession = Depends(get_db)):
    document = await save_file(file, db)
    # Enqueue by ID only; the worker loads the document itself
    process_document_task.delay(str(document.id))
    return {"id": str(document.id), "status": "queued"}
```

Run the worker:
```bash
celery -A app.tasks worker --loglevel=info --concurrency=4
```

## Production Benefits
- Responsive UX: immediate acknowledgment
- Independent scaling: workers separate from API pods
- Fault tolerance: retries and re-delivery
- Resource isolation: CPU-heavy jobs away from request path
- Prioritization: queue classes for critical work
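On the polling side, a frontend usually should not see Celery's raw task states directly. A small translation helper keeps the client contract stable; the state names below are Celery's built-ins, but the mapping and function name are a hypothetical sketch:

```python
# Collapse Celery's built-in task states into the simplified statuses a
# polling endpoint would return. This mapping is illustrative, not part
# of Celery's API.
def client_view(state: str, result=None) -> dict:
    if state == "SUCCESS":
        return {"status": "done", "result": result}
    if state in {"FAILURE", "REVOKED"}:
        return {"status": "failed", "error": str(result)}
    # PENDING / RECEIVED / STARTED / RETRY all read as "in progress"
    return {"status": "processing"}
```

A status endpoint can then feed this helper from the result backend and expose only `done`, `failed`, or `processing` to the UI.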
## Trade-offs to Plan For
Background workers add distributed-system complexity:
- More services to run (broker/result store)
- Visibility needed across queue and worker states
- Idempotency required to avoid duplicate side effects
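The idempotency point deserves a concrete shape. Brokers generally guarantee at-least-once delivery, so a task can run twice; guarding the side effect with a processed-key check makes the duplicate a no-op. This is a minimal sketch with an in-memory set (a real system needs an atomic check-and-set, e.g. a database unique constraint or Redis `SETNX`):

```python
# Stand-in store of already-processed keys. NOT safe across processes:
# production needs an atomic, shared check-and-set.
PROCESSED: set[str] = set()

def process_once(idempotency_key: str, side_effect) -> bool:
    """Run side_effect at most once per key; duplicate deliveries become no-ops."""
    if idempotency_key in PROCESSED:
        return False  # already handled, skip the side effect
    side_effect()
    PROCESSED.add(idempotency_key)
    return True
```

A natural idempotency key for the upload example is the document ID itself: re-queuing the same document then cannot double-write results.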
For very small apps, lightweight alternatives such as FastAPI's `BackgroundTasks` or a simple thread pool can be enough. For reliability at scale, Celery remains a practical default in Python ecosystems.
## Implementation Checklist
- Add idempotency keys
- Add retry/backoff policies per task type
- Store task status in DB for frontend polling
- Add dead-letter strategy for poisoned messages
- Track queue depth and worker lag in monitoring
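For the retry/backoff item, Celery exposes `retry_backoff`, `retry_backoff_max`, and `retry_jitter` as task options, but the underlying math is worth seeing once: exponential growth, a cap, and jitter so failed tasks don't all retry in lockstep. A sketch (the function name and defaults are ours):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 600.0,
                  jitter: bool = True) -> float:
    """Delay before retry `attempt`: exponential growth, capped, optionally jittered."""
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        # Full jitter spreads retries out so a burst of failures
        # doesn't stampede the downstream dependency all at once
        delay = random.uniform(0, delay)
    return delay
```

Without jitter the schedule is 1s, 2s, 4s, 8s, ... capped at 10 minutes; with jitter each worker picks a random point inside that window.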
Written by
Niteen Badgujar
AI Engineer specializing in Agentic AI, LLMs, and production-grade machine learning systems on Azure. Writing to make complex AI concepts accessible and actionable.