The complete guide to error monitoring and crash reporting
Software bugs are frustrating for everyone. End users lose patience and leave, developers struggle to reproduce errors, and businesses lose customers without even knowing why.
This is where error monitoring software (or crash reporting software, depending on your vocabulary) enters the picture. This kind of tool is an increasingly essential item in any developer’s toolbox, improving results for their customers, their business and themselves.
In fact, we originally built Raygun as an internal tool to monitor for errors in an earlier software product. It quickly became indispensable for our entire team, and we realized that we might not be the only ones who’d want to use this game-changing tool. One of the most striking revelations from deploying Raygun Crash Reporting into our own products was that only about 1% of our customers reported an issue.
Let that sink in: Only one in every hundred of our users who had a bad experience would report it! It’s no wonder so many teams underestimate the impact of bugs on their customers and their own growth - we’re all flying blind.
What is error monitoring?
I’ve already gotten ahead of myself, though. What is error monitoring? Is it different from crash reporting?
Let’s start with the terminology. In a classical sense, an error means something went wrong, while a crash means that it went so wrong that the whole application or system crashed. Crashes, obviously, are bad. However, they’re also less frequent than they were in the 90s and early 2000s - it’s more common now for something to just not work quite as intended.
Many developers still use the terms error reporting/monitoring and crash reporting interchangeably, but in reality, it remains important to have a solution that will track both types of fault. Getting full visibility of errors across your whole stack is fundamental to understanding the quality of your entire software solution.
Why do you need error monitoring and crash reporting?
Error monitoring and crash reporting solutions aren’t just helpful or illuminating, they’re necessary. Here’s why:
1. You don’t know what you don’t know
I was shocked to find that only 1% of our customers reported errors and crashes, but it turns out that’s not unusual. Without monitoring, plenty of teams assume that they don’t have a quality problem at all. Helping thousands of businesses improve their software quality has led us to an uncomfortable truth: most of us significantly overestimate the quality of our software.
Fortunately, once you have visibility you can take action. It may be a little humbling to begin with, but it’s far better to know where you stand and improve on it.
2. Getting users to explain how to reproduce an error is hard
Trying to describe a bug is hard enough for any of us, let alone a non-technical end user who was just trying to get something done. If we’ve already delivered a poor experience, asking the customer to then tell us how to recreate it is rubbing salt in the wound.
Most modern error and crash reporting software will capture enough detail (including the actions users took leading up to the error) to help you recreate the issue with ease — and with more precision than a customer could ever provide.
3. Poor software quality damages, or destroys, businesses
Obviously, a developer’s job is to create excellent software solutions for customers. However, when the software isn’t great, the impact extends far beyond the dev department and throughout the whole business.
For example, take an error that prevents a customer from paying. That error is costing the business every time it occurs, and ultimately reducing business performance.
Or consider a bug in the onboarding flow of a SaaS application. That one bug will slow down the process of turning a lead into a paying customer, requiring more assistance from onboarding and support teams, or even causing customers to go with a competitor. This drives up the Cost to Acquire a Customer for the business, and again, reduces business performance.
Software quality is paramount to efficient growth. Addressing bugs that block key customer activities is also an excellent way to demonstrate the power of development to the rest of the business, showing how eliminating errors can boost metrics like customer engagement and conversion rates or reduce support tickets. Proving the importance of error resolution can even help get additional resources allocated to maintaining and improving quality.
4. Great software is becoming table stakes
When was the last time you got an error trying to search Google? Ordering an Uber? In iMessage?
Bugs will always be a fact of life in the software world, but the highest-performing businesses know that smooth software experiences are increasingly becoming a minimum expectation. The competition is only a few clicks away and word of mouth has a silent but significant impact on business performance. Users are more likely to talk about a bad experience than a good one, meaning that poor software quality can hamper any business regardless of brand or size.
What metrics do error monitoring tools report on?
There’s plenty of data to go around, but what are the metrics that are vital to creating great customer experiences? Here are several of the key measures of software quality that you need to track and understand.
Number of users affected by bugs
The actual volume of errors is less meaningful than the number of unique users who have experienced errors. Why is this? Well, one user could generate 100 relatively minor errors that have little effect. However, if 100 users had one error, but that error was preventing a payment, the impact is far greater.
Even if you’re not triaging bugs by their nature, optimizing for customer impact is a smart default. When working on quality, making improvements for the highest number of users is a sensible rule of thumb.
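To make that comparison concrete, here is a minimal Python sketch that ranks error groups by unique users affected rather than by raw event count. The event shape (`group`, `user_id`) and the function name `rank_by_affected_users` are assumptions made for illustration, not part of any particular tool's API.

```python
from collections import defaultdict


def rank_by_affected_users(events):
    """Return (group, unique_users, event_count) tuples, most users first.

    `events` is an iterable of dicts with the hypothetical keys
    "group" (an error identifier) and "user_id".
    """
    users = defaultdict(set)
    counts = defaultdict(int)
    for event in events:
        users[event["group"]].add(event["user_id"])
        counts[event["group"]] += 1
    ranked = [(group, len(users[group]), counts[group]) for group in users]
    # Sort by unique users first, then by raw volume as a tie-breaker.
    ranked.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return ranked


# One user hitting a minor error 100 times vs. 100 users each blocked once.
events = [{"group": "tooltip-glitch", "user_id": "u1"} for _ in range(100)]
events += [{"group": "payment-failed", "user_id": f"u{i}"} for i in range(100)]

for group, affected, total in rank_by_affected_users(events):
    print(f"{group}: {affected} users affected, {total} events")
```

Both groups have the same event volume here, so only the unique-user count separates the annoyance from the bug that blocks payments.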
Comprehensiveness of error reporting
While not technically a metric, this one will throw off your metrics if you’re not careful. In an era of cheap computing and storage, it’s worthwhile checking that the crash and error reporter you’re using is capturing all errors. Many will simply sample errors instead of presenting the full picture. While this can help to reduce costs, it also masks the true extent of issues and can lead to poor prioritization decisions. A relatively minor saving in expenses is far outweighed by the potential cost of letting errors remain undetected.
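As a rough illustration of how sampling can hide an issue, the sketch below keeps one event in every ten and then checks which error groups survive. The "every Nth event" rule and the event names are assumptions chosen to keep the example deterministic. Real samplers are usually random or rate-based, but the effect on rare errors is similar.

```python
def sample_every_nth(events, n):
    """Keep one event out of every `n`, a deliberately simple form of sampling."""
    return [event for index, event in enumerate(events) if index % n == 0]


# 1,000 occurrences of a noisy warning, with 3 checkout failures mixed in.
events = ["noisy-warning"] * 1000
for position in (101, 502, 903):
    events.insert(position, "checkout-failed")

sampled = sample_every_nth(events, 10)

print("groups in full data:", sorted(set(events)))
print("groups after 1-in-10 sampling:", sorted(set(sampled)))
```

In this contrived data set the rare checkout failure never lands on a sampled position, so it disappears from the sampled view entirely while the noisy warning remains well represented.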
Trending in the right direction
Having instance-level crash data is fine when resolving a bug, but you also need to have a 30,000-ft view of the app. A high-level dashboard that highlights whether error volumes are trending up or down is important for checking the health of your application at a glance.
Error volume by version
Not all deployments are major, especially in mobile. You may deploy a new version that resolves all issues, but with old versions still in use, it can be difficult to understand what the crash rate really is without insight into the particular versions being impacted.
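Here is a small sketch of why the per-version breakdown matters, assuming each session is recorded as a hypothetical `(version, crashed)` pair. The version numbers and counts are invented for the example.

```python
from collections import Counter


def crash_rate_by_version(sessions):
    """Map each app version to the share of its sessions that crashed.

    `sessions` is an iterable of (version, crashed) pairs, where
    `crashed` is a bool. The record shape is an assumption.
    """
    totals = Counter()
    crashes = Counter()
    for version, crashed in sessions:
        totals[version] += 1
        if crashed:
            crashes[version] += 1
    return {version: crashes[version] / totals[version] for version in totals}


sessions = (
    [("2.3.0", True)] * 8 + [("2.3.0", False)] * 92     # old version still in use
    + [("2.4.0", True)] * 1 + [("2.4.0", False)] * 199  # new version with the fix
)

rates = crash_rate_by_version(sessions)
overall = sum(1 for _, crashed in sessions if crashed) / len(sessions)

print(f"overall: {overall:.1%}")
for version, rate in sorted(rates.items()):
    print(f"{version}: {rate:.1%}")
```

The blended figure sits between the two versions and describes neither of them, which is why a single overall number can make a successful fix look like an ongoing problem.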
Benefits of error monitoring and crash reporting for web and mobile applications
For web applications
Here’s why error monitoring is increasingly obligatory for websites, applications, and single page applications.
1. JavaScript errors occur on the user’s machine
JavaScript runs inside the user’s browser. If an error occurs, it gets logged to the console of the user’s browser, and… that’s it. This is why many software teams are in the dark, with no way of knowing that those bugs exist.
This is where JavaScript error reporting is so valuable - with a simple script reporting back those errors to the development team, they can quickly resolve issues and deploy fixes without the customer having to take any action at all.
2. Single Page Apps are increasing the logic residing on the user’s machine
Adding to the lack of visibility into JavaScript errors is the growing trend to shift more and more application logic to the front-end. This means more code in JavaScript and significant complexity moved from the server to the customer’s browser.
This is only accelerating the need to understand where faults lie in the code that goes to the customer.
3. Don’t forget the backend of your web app
While it’s clearly vital to track JavaScript errors, it’s equally important to consider the whole application. This means monitoring the server side as well as the frontend. It also means considering all the other support services, workers and moving parts of a modern web offering. For example, let’s say you run an ecommerce site. Monitoring the JavaScript and the primary web application is sensible, but you can’t lose sight of the supporting cast of software: the tool that emails customer receipts, the search indexing task that runs nightly, the payment processing job, or the WordPress blog that uses PHP error monitoring to report issues. There are always significantly more bits of software supporting a web application than is immediately obvious.
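For a background job like the nightly indexing task mentioned above, the core idea can be sketched in a few lines of Python using the standard `sys.excepthook`. This is an illustration of the general mechanism, not how any specific SDK is implemented. The `captured_reports` list stands in for a network call to a monitoring service, and the names `build_report` and `install_hook` are hypothetical.

```python
import sys
import traceback
from datetime import datetime, timezone

captured_reports = []  # stand-in for sending the report to a monitoring service


def build_report(exc_type, exc_value, exc_tb, job_name):
    """Turn an exception into a plain dict that could be serialized as JSON."""
    return {
        "job": job_name,
        "type": exc_type.__name__,
        "message": str(exc_value),
        "stack": traceback.format_exception(exc_type, exc_value, exc_tb),
        "occurred_at": datetime.now(timezone.utc).isoformat(),
    }


def install_hook(job_name):
    """Record any unhandled exception, then defer to Python's default handler."""
    def hook(exc_type, exc_value, exc_tb):
        captured_reports.append(build_report(exc_type, exc_value, exc_tb, job_name))
        sys.__excepthook__(exc_type, exc_value, exc_tb)
    sys.excepthook = hook


# Building a report by hand from a caught exception:
try:
    raise RuntimeError("search index unavailable")
except RuntimeError:
    demo = build_report(*sys.exc_info(), job_name="nightly-search-index")

print(demo["job"], demo["type"], demo["message"])
```

Calling `install_hook("nightly-search-index")` at the top of a script would route every unhandled exception through the same report builder before the process exits.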
For iOS and Android apps
1. You can’t easily look on the device
While there’s some basic reporting from the app stores, when something fails on a user’s phone, it’s impossible to see it without some form of error monitoring and crash reporting solution.
Modern error monitoring solutions are smart about how they report on mobile devices. For example, mobile devices do not necessarily have an internet connection 100% of the time, so a modern error reporting solution needs to store reports for transmission when connectivity is restored, so you don’t miss a single fault.
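The store-and-forward behavior described above can be sketched as a small queue. This is a simplified model rather than any SDK's actual implementation: the class name `OfflineReportQueue` and the `send` callable are assumptions, and the queue lives in memory, whereas a real mobile SDK would typically persist it to disk and cap its size.

```python
from collections import deque


class OfflineReportQueue:
    """Buffer crash reports while offline and flush them once back online.

    `send` is any callable that delivers one report and raises
    ConnectionError when the network is unavailable.
    """

    def __init__(self, send):
        self._send = send
        self._pending = deque()

    def report(self, payload):
        """Queue a report and try to deliver everything that is pending."""
        self._pending.append(payload)
        return self.flush()

    def flush(self):
        """Send queued reports in order, stopping at the first failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False  # still offline, keep the rest queued
            self._pending.popleft()
        return True

    def __len__(self):
        return len(self._pending)


network = {"online": False}
delivered = []


def fake_send(payload):
    """Test double for the network layer."""
    if not network["online"]:
        raise ConnectionError("no network")
    delivered.append(payload)


queue = OfflineReportQueue(fake_send)
queue.report({"error": "NullPointerException"})
queue.report({"error": "Timeout"})
print("queued while offline:", len(queue))

network["online"] = True
queue.flush()
print("delivered after reconnect:", len(delivered))
```

A report is only removed from the queue after `send` returns without raising, so a connection that drops mid-flush leaves the undelivered reports in place for the next attempt.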
2. Hyper-competitive app store
It’s well known that mobile apps live and die based on how they rank in the app store. Even when customers search you out, they’re offered directly competitive offerings alongside your own. And don’t forget, 53% of users will abandon an app that produces crashes, freezes or errors.
If your users are having a bad time, your ratings will suffer and your users will start looking for alternatives, and your competitors in the app store are on hand to bring them onboard.
3. Complex system errors are the hardest to reproduce
Software is constantly becoming more complex, and a rich mobile application may now have extensive server-hosted services that it depends on. To understand a system entirely, you need to capture error diagnostic information across a complex set of steps.
For example, tracking a network call from a mobile application that returned an error and being able to track that error down to the detail of why it failed on the server is critical to diagnosing issues quickly.
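One common way to connect the two sides, assumed here rather than taken from any specific product, is to tag each network call with a request ID that both the app and the server record. The sketch below joins client-side and server-side errors on that hypothetical `request_id` field.

```python
def link_client_to_server(client_errors, server_errors):
    """Attach the matching server-side error to each client-side error.

    Both inputs are lists of dicts sharing a hypothetical "request_id"
    field, set by the app when it makes the call and logged by the
    server when it handles it.
    """
    server_by_request = {entry["request_id"]: entry for entry in server_errors}
    linked = []
    for client_error in client_errors:
        cause = server_by_request.get(client_error["request_id"])
        linked.append({**client_error, "server_cause": cause})
    return linked


client_errors = [
    {"request_id": "req-42", "message": "HTTP 500 from /api/orders"},
    {"request_id": "req-77", "message": "HTTP 500 from /api/profile"},
]
server_errors = [
    {"request_id": "req-42", "message": "Deadlock detected in orders table"},
]

linked = link_client_to_server(client_errors, server_errors)
for item in linked:
    cause = item["server_cause"]
    detail = cause["message"] if cause else "no server error recorded"
    print(f'{item["message"]} <- {detail}')
```

When no server-side record exists for a request, the client error is kept with an empty cause instead of being dropped, so gaps in server-side coverage stay visible.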
Modern error monitoring and crash reporting solutions will be able to monitor all parts of your stack, across networks, and provide your team with the detail to move quickly, despite the complexity.
4. Symbolication makes things more complex
Symbolication is the process of taking memory addresses and turning them back into human-readable information, such as function names, file names and line numbers. Machines might love zeros and ones, but humans do not.
If you’re using an Android crash reporting tool, you’ll want to ensure it supports the modern toolchains used by Android developers. That means built-in support for ProGuard, and the ability to automatically map obfuscated stack traces back to the original, human-readable class and method names. This dramatically reduces the time a developer needs to understand a bug, and helps get a resolution out faster.
Similarly, if you’re looking for an iOS app crash reporting solution, the key phrase you’re looking for is ‘dSYM processing’. A dSYM file (pronounced ‘dee-sim’) includes the debug information that makes it possible to map machine addresses back to a human-readable version. Make sure that the iOS app crash reporting solution you choose supports dSYM processing to help your teams move much faster.
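The address-to-name lookup at the heart of this process can be sketched with a sorted symbol table and a binary search. This is a heavily simplified model: real debug files carry far more detail (file names, line numbers, inlined frames) and describe where each function ends, which this sketch ignores. The symbol names and addresses below are invented for illustration.

```python
import bisect


def symbolicate(addresses, symbols):
    """Resolve raw addresses to "name+offset" strings.

    `symbols` is a list of (start_address, name) pairs. Each address is
    attributed to the last symbol that starts at or before it, which
    assumes functions are laid out back to back.
    """
    table = sorted(symbols)
    starts = [start for start, _ in table]
    resolved = []
    for address in addresses:
        index = bisect.bisect_right(starts, address) - 1
        if index < 0:
            resolved.append(f"<unknown 0x{address:x}>")
        else:
            start, name = table[index]
            resolved.append(f"{name}+{address - start}")
    return resolved


symbols = [
    (0x1000, "main"),
    (0x1040, "CheckoutViewController.submitPayment"),
    (0x10C0, "PaymentClient.charge"),
]
raw_stack = [0x10D4, 0x1058, 0x1008]

for frame in symbolicate(raw_stack, symbols):
    print(frame)
```

The difference for the developer is between reading a list of hexadecimal addresses and reading a call chain that leads from `main` to the function that actually failed.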
Best practices for using error monitoring and crash reporting
The benefits of error monitoring and crash reporting can extend well beyond the technical, to boost productivity and create unity of purpose between product and dev teams. To get maximum impact from your chosen monitoring solution, you’ll need to integrate your tools into your standard development practices and processes. Here are the most successful ways we’ve seen our customers implement error monitoring.
Notifications to where you work
Business messengers like Slack have overtaken email as the collaborative place for teams needing to work in real time. Integrating crash reporting into your Slack, Teams or other chat tool is a great way to ensure the team sees issues and can discuss them in real time. Full visibility and collaboration help teams be aware of and responsive to issues.
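Hosted monitoring tools generally provide chat integrations out of the box, so you would rarely write this yourself. Purely to illustrate what such an integration does, here is a minimal sketch that formats an error notification and posts it to a Slack incoming webhook, which accepts a JSON body with a `text` field. The function names, message wording and example link are assumptions, and the webhook URL would come from your own Slack workspace.

```python
import json
import urllib.request


def build_slack_message(error_group, affected_users, link):
    """Build the JSON body for a Slack incoming webhook."""
    text = (
        f"New error: {error_group} is affecting "
        f"{affected_users} user(s). Details: {link}"
    )
    return {"text": text}


def post_to_slack(webhook_url, message):
    """POST the message to a Slack incoming webhook URL and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Build the payload only; no network call is made in this example.
message = build_slack_message(
    error_group="NullReferenceException in CheckoutController",
    affected_users=37,
    link="https://example.com/errors/123",  # placeholder link
)
print(json.dumps(message, indent=2))
```

Including the affected-user count in the message gives the channel enough context to decide whether an alert needs attention now or can wait for the next triage.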
“We use Slack for our internal team chat and have Raygun error notifications appearing in a dedicated Slack channel. This means we can be alerted of errors and discuss potential causes and fixes right within Slack. It really does make error investigation and diagnosis a breeze.” - Timely
A bug or two per sprint
However you work, find the opportunity to resolve at least one bug (or Error Group in Raygun terms) per sprint. Some of the largest brands in the world use Raygun, and it can be daunting to discover thousands and thousands of bugs - where do you begin? Fortunately, this is where triaging and understanding customer impact play a part. Simply sorting by the number of affected users and assigning the top one or two bugs into each sprint will have a dramatic impact on customer experience. One Raygun customer reported a 90% reduction in faults being experienced by their customers within a couple of months.
Review errors as part of rituals
Any tool, regardless of purpose, is only effective if it’s baked into how you work. We recommend a short triage of errors as part of your standard rituals, as a mechanism for ensuring the team has an eye to quality. A short review to see anything new and discuss is a great way to avoid being surprised by issues.
“The best thing we’ve found about Raygun is the tight integration with GitHub, making it very easy to integrate it into our workflows.” - Cloud9
Why choose Raygun Error Monitoring and Crash Reporting?
Good news: the challenges presented above can be easily tackled with one simple solution - namely, Raygun.
Raygun’s error tracking software works by integrating a small SDK into your application: add a few lines of code and you’re all set. Like a black box flight recorder, it sits quietly in the background, but when something goes wrong it collects the relevant information, sends it off to Raygun for analysis, and alerts you about the issue.
While you’re probably pretty informed on the core mechanics of error reporting, it’s often the other capabilities that make a big difference. Things like:
1. Easy set-up
Getting Raygun Error Monitoring and Crash Reporting installed is incredibly easy - we just provide a code snippet to embed in your application. You can find step-by-step instructions for installing Raygun through our language guides. Once you’re sending data, you can learn how to use Raygun features with our product guides.
2. First-class support for multiple languages and platforms
Software solutions are increasingly written using multiple programming languages – some parts on mobile, some on serverless. While there are solutions that focus on just a single programming language, Raygun is a truly modern error reporting solution that will work across your broader team, regardless of programming language or platform choice, to support the complexities of modern coding.
3. Affordable, usage-based pricing
Raygun’s straightforward on-demand pricing means you only pay for what you use. All plans come with unlimited dashboards, deployment tracking, and user experience monitoring.
4. Dashboards
While software teams benefit immensely from the visibility and reproducibility of errors, software quality isn’t something to be kept within the engineering team. Modern tools like Raygun provide custom visualization to share quality information across teams, from technical to non-technical users. Dashboards are a great way to give executives and managers visibility over software quality, without overloading them with detail.
Wrapping up
User expectations continue to rise relentlessly, making error monitoring and crash reporting more and more of a prerequisite to meet (and exceed) industry standards. It’s not so much a matter of if you need this kind of solution, but which one. It goes without saying that we recommend Raygun’s industry-leading, full-visibility Crash Reporting - but we’d encourage any development team to take this guide into account when selecting the solution that makes sense for your team, your practices and your goals.
See how Raygun makes development easier for our customers, or dive deeper into the product.