How to create an on-call schedule that doesn’t suck

Here are some tips on how to build an effective and sustainable on-call strategy that your team loves. Just as doctors go on call so that patients get medical attention in an emergency, engineers go on call to keep IT services and infrastructure running and to make sure customers get help when they need it.


Antivirus Evasion for Penetration Testing Engagements

During a penetration testing engagement, it’s quite common to find antivirus software installed on a client’s computers. This makes it challenging for the penetration tester to run common tools, and it gives clients the perception that their systems are safe, but that’s not always the case. Antivirus software does help protect systems, but there are still cases where these defenses can be bypassed.


5 Server Monitoring Tools you should check out

You work on your software’s performance. But let’s face it: production is where the rubber meets the road. If your application is slow or it fails, then nothing else matters. Are you monitoring your applications in production? Do you see errors and performance problems as they happen? Or do you only see them after users complain? Worse yet, do you never hear about them? What tools do you have in place for tracking performance issues? Can you follow them back to their source?


Smart Cloud Security: Recover from Cloud-based Ransomware Infections

This is yet another post on ransomware, so we’ll keep it short. But for cloud-first organizations, it is an important one. Ransomware is not only hard to detect but also hard to recover from. The cost of recovery is high whether or not the organization pays a ransom. Often, organizations hit by ransomware have little recourse but to negotiate, pay the ransom, and hope they can recover their mission-critical systems and data.


Simple/hard metrics that help reduce MTTR when looking for a root cause

Recently there was a mini-incident in a data center where we host our servers. In the end, it did not affect our service. And thanks to the right operational metrics, we were able to figure out instantly what was happening. But then a thought occurred to me: how we would have been racking our brains trying to understand what was happening without 2 simple metrics.