5 Cron Job Failures That Will Wake You Up at 3 AM
Every one of these stories is based on real incidents. The names have been changed, but the cold sweats were real.
1. The Phantom Backup
The Story
Marcus set up automated PostgreSQL backups six months ago. The crontab was clean, the script was tested, and every night at 1 AM a fresh backup landed in S3.
Then the company upgraded from Ubuntu 22.04 to 24.04. The upgrade reset the crontab for the postgres user. No backup ran for 17 days, and nobody noticed until a disaster recovery drill revealed empty S3 directories.
The Fix
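Heartbeat monitoring. The backup script pings a monitoring endpoint each time it completes, and if an expected ping does not arrive on schedule, the monitor raises an alert. A deleted crontab produces no ping at all, so Marcus would have known within hours, not weeks.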
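A minimal sketch of the sending side of a heartbeat, in Bash. The wrapper runs the real job and pings the monitor only when the job exits with status 0; if the job fails, or never starts because its crontab entry vanished, no ping arrives and the monitor does the alerting. The function name run_with_heartbeat is hypothetical, and the ping URL is whatever your monitoring service assigns.

```bash
#!/usr/bin/env bash
# run_with_heartbeat URL COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND and pings URL only if COMMAND exits with status 0.
run_with_heartbeat() {
  local url="$1"
  shift
  # Run the real job; on failure, return its exit status without pinging.
  "$@" || return
  # -f: fail on HTTP errors, -sS: silent except for errors,
  # -m 10: give up after 10 seconds, --retry 3: retry transient errors.
  curl -fsS -m 10 --retry 3 "$url" > /dev/null
}
```

From a job script that sources this file, the call would look like run_with_heartbeat https://cron-pulse.com/ping/ID /usr/local/bin/pg_backup.sh, where the backup script path is a placeholder.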
2. The Overlapping Stampede
The Story
A data pipeline job was scheduled every 15 minutes. Normally it took 3 minutes. One day the upstream API slowed down, and the job started taking 20 minutes. Because each run now outlasted the 15-minute interval, cron launched a new instance before the previous one had finished, and the overlapping instances piled up, each holding its own memory and database connections.
The Fix
Use a lock file or flock so that a new run exits immediately while the previous one is still running. Monitor job duration alongside completion: CronPulse's ping history shows when runs take longer than expected.
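One way to apply the lock, sketched in Bash with the flock utility from util-linux. It assumes flock is installed; the function name run_exclusive, the lock path, and the exit code 75 (the conventional EX_TEMPFAIL value) are illustrative choices, not requirements.

```bash
#!/usr/bin/env bash
# Hypothetical lock path; any file the job's user can write to works.
LOCK_FILE="${LOCK_FILE:-/tmp/pipeline.lock}"

# run_exclusive COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND only if no other run currently holds the lock.
# Returns 75 without running anything if the lock is already taken.
run_exclusive() {
  (
    # Open the lock file on file descriptor 9 for this subshell.
    exec 9>"$LOCK_FILE"
    # -n: do not wait; fail at once if another process holds the lock.
    if ! flock -n 9; then
      echo "previous run still in progress; skipping" >&2
      exit 75
    fi
    # The lock is released automatically when fd 9 closes at subshell exit.
    "$@"
  )
}
```

If you would rather keep the crontab entry self-contained, flock can also wrap the command directly, as in */15 * * * * flock -n /tmp/pipeline.lock /usr/local/bin/pipeline.sh (paths are placeholders). Skipped runs still send no heartbeat, so a job that is persistently stuck surfaces in monitoring rather than silently stacking up.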
3. The Timezone Trap
The Story
A billing job ran at 0 9 * * *, that is, 9 AM server time. When the team migrated to a cloud provider in a different region, the server timezone changed. The billing job now ran at 9 AM UTC, which was 2 AM Pacific Daylight Time. Customers received invoices in the middle of the night.
The Fix
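Always use UTC in crontabs. By default, cron reads schedules in the server's local timezone, so keep cron hosts on UTC and convert to local time in your application. Then monitor that jobs actually run at the expected times.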
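A minimal sketch of a guard a job could run before doing any work, assuming GNU or BusyBox date (both support %z). The name require_utc is hypothetical. Checking the UTC offset is a coarse test: a zone such as Europe/London also reports +0000 in winter, so treat this as a tripwire for the migration scenario above rather than a full timezone audit.

```bash
#!/usr/bin/env bash
# require_utc  (hypothetical guard for the top of a scheduled job)
# Fails loudly if the server's current UTC offset is not +0000, because
# cron fires "0 9 * * *" at 9 AM in the server's local timezone.
require_utc() {
  local offset
  offset="$(date +%z)"   # e.g. "+0000" for UTC, "-0700" for PDT
  if [ "$offset" != "+0000" ]; then
    echo "refusing to run: server UTC offset is $offset, expected +0000" >&2
    return 1
  fi
}
```

Calling require_utc || exit 1 at the start of the billing script turns a silent schedule shift into a failed run, which a heartbeat monitor then reports as a missing ping.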
4. The Silent Permission Error
The Story
A cleanup script ran as root for two years. A security audit recommended running it as a service account. After the change, the script silently failed: it couldn't read the directories it needed to clean. The cron daemon logged "Permission denied" to a file nobody was watching.
The Fix
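Only ping the monitoring endpoint after successful completion: ./cleanup.sh && curl https://cron-pulse.com/ping/ID. If the script fails, the ping never fires. This relies on cleanup.sh exiting with a non-zero status when something goes wrong; a script that prints "Permission denied" and carries on can still exit 0 and still ping, so make errors fatal (for example with set -euo pipefail at the top of the script).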
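The && pattern only works if failures reach the exit status. A minimal sketch of a cleanup step that does that, in Bash; the function name cleanup_old_files, the directory, and the 7-day retention are hypothetical. It checks access up front and then relies on find, which (in GNU find) exits non-zero when it hits errors such as unreadable directories or files it cannot delete.

```bash
#!/usr/bin/env bash
# cleanup_old_files DIR DAYS  (hypothetical helper)
# Deletes regular files under DIR older than DAYS days. Returns non-zero,
# instead of quietly doing nothing, if DIR is missing or inaccessible.
cleanup_old_files() {
  local dir="$1" days="$2"
  # Fail fast if the service account cannot list, read, or modify DIR.
  if [ ! -d "$dir" ] || [ ! -r "$dir" ] || [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
    echo "cleanup: cannot access $dir as $(id -un)" >&2
    return 1
  fi
  # find exits non-zero if any entry could not be read or deleted.
  find "$dir" -type f -mtime +"$days" -delete
}
```

Making a call such as cleanup_old_files /var/app/tmp 7 the last line of cleanup.sh lets its status become the script's exit status, so the crontab entry above pings only when the cleanup really succeeded.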
5. The Dependency Chain Collapse
The Story
Job A exports data at midnight. Job B processes it at 1 AM. Job C generates reports from it at 2 AM. One night, Job A failed silently. Job B processed stale data. Job C generated wrong reports. The CEO presented incorrect numbers at a board meeting.
The Fix
Monitor each job independently. When Job A fails, you catch it immediately instead of discovering the problem downstream. Chain your monitoring: if A doesn't ping, you know B and C are also affected.
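A complementary guard, not something the fix above prescribes: Job B can refuse to process input that Job A did not refresh recently. Sketched in Bash with find -mmin; the name require_fresh_input, the export path, and the 90-minute window (chosen because A exports at midnight and B starts at 1 AM) are all assumptions.

```bash
#!/usr/bin/env bash
# require_fresh_input FILE MAX_MINUTES  (hypothetical guard for Job B)
# Succeeds only if FILE exists and was modified less than MAX_MINUTES ago.
require_fresh_input() {
  local file="$1" max_minutes="$2"
  # find prints the path only if it is recent enough; -prune stops it from
  # descending if FILE is a directory. A missing file prints nothing.
  if [ -z "$(find "$file" -prune -mmin -"$max_minutes" 2>/dev/null)" ]; then
    echo "input $file is missing or older than $max_minutes minutes; aborting" >&2
    return 1
  fi
}
```

With require_fresh_input /data/export.csv 90 || exit 1 at the top of Job B, a silent failure in Job A makes Job B fail too, so Job B's own heartbeat goes missing instead of stale numbers flowing on to Job C.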
Prevention Checklist
- Every cron job gets a heartbeat monitor
- Only ping on successful completion
- Set grace periods appropriate to job duration
- Use lock files to prevent overlap
- Always use UTC in crontabs
- Test monitoring after server changes
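To make the grace-period item concrete, here is a sketch of the check a heartbeat monitor performs on its side, written in Bash for consistency. The name is_overdue and its argument order are hypothetical; real services implement this internally.

```bash
#!/usr/bin/env bash
# is_overdue LAST_PING PERIOD GRACE NOW  (hypothetical sketch)
# LAST_PING and NOW are seconds since the Unix epoch; PERIOD and GRACE are
# durations in seconds. Succeeds (status 0) when an alert should fire.
is_overdue() {
  local last_ping="$1" period="$2" grace="$3" now="$4"
  # The next ping is due one period after the last, plus the grace period.
  local deadline=$(( last_ping + period + grace ))
  (( now > deadline ))
}
```

For a nightly job with a 24-hour period and a 2-hour grace, the alert fires once more than 26 hours have passed since the last successful ping. Set the grace longer than the job's normal run time so slow but healthy runs do not page anyone, and short enough that a missing run is noticed the same night.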