Hello,
I'm building a product that will scale out Azure PostgreSQL Flexible Server instances. Backup availability is a stringent requirement for it, so I need automated reporting and alerting on backups.
At present, all I can see on the "Backup and Restore" page of my Flexible Server instance are the options for backup retention (1-35 days), Backup Redundancy and Earliest Restore Point, along with a table (name, timestamps, type and actions) listing the automated backups currently available to restore from.
My searching suggests that Azure publishes no details about backup failures, so I have no visibility of any issues with this backup service, since it's an Azure "managed service" that you guarantee to work. However, I really need visibility and alerting whenever a backup fails or is not available to me, because I have strict reporting and compliance requirements to adhere to, i.e. I must report if and when a backup is not available. This of course won't be my only backup, but, as with my other backup options, I need to validate and report on it, and I'm surprised this detail seems to be unavailable for this service.
The FAQ for Flexible Server under "Why do I not see a full backup listed for a given day?" says: "Azure Database for PostgreSQL flexible server takes full backups once daily. If a backup fails, our backup service tries every 20 mins to take a backup until a successful backup is taken. You may fail to see your daily backup for a given day if the transactional load on your server instance is high throughout the day."
So although this is a managed service, backups clearly can fail. Surely some detail is published that can be used for automation? The Azure CLI can retrieve the backup table visible in the portal, but nothing deeper than that. With only that detail, I'd need to write some pretty hacky scripting to continuously poll it and check for backups not occurring. The oldest backup is also cycled out at a time that cannot be controlled, so it quickly becomes clear this is not an ideal approach. Similarly, there is a 'Backup Storage Used' metric, but it would be just as unreliable as a basis for any alerting or reporting on backup availability.
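For illustration, the kind of polling check described above might look like the minimal Python sketch below. It assumes that `az postgres flexible-server backup list` returns a JSON array whose entries carry a `completedTime` timestamp (worth verifying against your CLI version), and it flags the server as stale when the newest backup is older than a hypothetical 26-hour threshold (one daily full backup plus some slack). The helper names `fetch_backups`, `parse_time`, `latest_backup_age` and `backup_is_stale` are hypothetical. Note that it can only detect a missing backup; it cannot tell you why a backup failed, which is exactly the gap described here.

```python
import json
import re
import subprocess
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical threshold: one daily full backup plus a couple of hours of slack.
DEFAULT_MAX_AGE = timedelta(hours=26)

# Matches the fractional-seconds part of an ISO 8601 timestamp.
_FRACTION = re.compile(r"\.(\d+)")


def fetch_backups(resource_group: str, server: str) -> list:
    """Hypothetical helper: list the server's backups via the Azure CLI as parsed JSON."""
    result = subprocess.run(
        ["az", "postgres", "flexible-server", "backup", "list",
         "--resource-group", resource_group, "--name", server, "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def parse_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp into a timezone-aware datetime."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    # Python before 3.11 only accepts 3 or 6 fractional digits, so normalise to exactly 6.
    value = _FRACTION.sub(lambda m: "." + (m.group(1) + "000000")[:6], value, count=1)
    ts = datetime.fromisoformat(value)
    # Assumption: a timestamp without an offset is UTC.
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def latest_backup_age(backups: list, now: datetime) -> Optional[timedelta]:
    """Age of the newest completed backup, or None if no entry reports a completion time."""
    # Assumption: each entry exposes a "completedTime" field.
    times = [parse_time(b["completedTime"]) for b in backups if b.get("completedTime")]
    if not times:
        return None
    return now - max(times)


def backup_is_stale(backups: list, max_age: timedelta = DEFAULT_MAX_AGE,
                    now: Optional[datetime] = None) -> bool:
    """True if there is no completed backup, or the newest one is older than max_age."""
    now = now or datetime.now(timezone.utc)
    age = latest_backup_age(backups, now)
    return age is None or age > max_age
```

Usage might be `backup_is_stale(fetch_backups("my-rg", "my-server"))`, where the resource group and server names are placeholders, run on a schedule and wired into existing alerting. Because the FAQ notes that a daily backup can legitimately be skipped under heavy load, a stale result is a signal to investigate rather than proof of a failure.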
So my question is: am I missing something? Surely there is some published status for a Flexible Server backup job that could be used to alert when it is unable to complete, fails, or completes with issues?
This is typical detail for other workloads managed by Azure Backup. Does it not exist for Flexible Server? Although this backup is managed at the Azure layer, it is an extra cost, and knowing whether a backup succeeded is paramount to its purpose, certainly when one has compliance requirements to deal with.
I seem to have exhausted every approach with Log Analytics workspaces, including enabling all the logging options available under 'Diagnostic settings' for the Flexible Server, without luck. I can only assume I'm missing something, or that this detail is not publicly available at the moment?