OpenBMC design process – Event Logs

Overview

The Event Logs page is one of the most visited pages in OpenBMC. Customers use the logs to troubleshoot problems in the system, so it was critical for us to get the page right.

Any time I look at Event Logs, it is reactionary due to a HW failure or a Call Home alert. I am correlating, so the more details, the better.

–  Enterprise & OpenPower customer

Roadmap and sprint planning

I attended the roadmap and backend design meetings, where we talked about the importance of the event logs feature. I found out that our existing build already has this feature, but it is not intuitive enough to meet users' needs. So a story was created on our GitHub board. During the sprint planning meeting, I was excited to pick up the story, knowing how impactful this feature is for our application.

OpenBMC GitHub board

Feature discovery

In my feature discovery process, I started by looking at the existing experience. I also looked at what our competitors provide. Then I arranged meetings with the subject matter experts (SMEs). We are moving to a new backend API, Redfish, which has its own limitations and advantages. From the SMEs I learned what output we have now and how we could enhance or add features if needed.

Existing OpenBMC event logs page

I noted the following pain points after talking to users who use our app frequently.

  • Difficult to scan
  • Unable to sort events
  • Unable to mark event as resolved
  • Limited batch operations
  • Missing key troubleshooting information

Sample Redfish Error Log

Talking with the SMEs, I identified the new features I could design for:

  • OEM extensions are used to pull out key fields needed for service from the PEL. 
  • A link to the PEL is provided in the links section. The HMC will use this to download the PEL associated with the Redfish log.
  • A new descriptive message for the Log entry appears in the Message field.
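
To make the three points above concrete, here is a minimal sketch of how a Redfish-style log entry could be flattened into the fields a table row would show. Id, Severity, Created, Message and Resolved are standard Redfish LogEntry properties; the "ExampleVendor" names under Oem and Links, the PelUri key, the reference code and the sample values are all assumptions made up for illustration, not the actual OpenBMC output.

```python
import json

# Hypothetical Redfish-style log entry. Id, Severity, Created, Message and
# Resolved are standard LogEntry properties; the "ExampleVendor" names under
# Oem and Links, and the URI, are invented for illustration.
SAMPLE_ENTRY = json.loads("""
{
  "Id": "1",
  "Severity": "Critical",
  "Created": "2020-03-02T10:15:00+00:00",
  "Message": "Example descriptive message for this entry",
  "Resolved": false,
  "Oem": {"ExampleVendor": {"ReferenceCode": "EXAMPLE01"}},
  "Links": {"Oem": {"ExampleVendor": {"PelUri": "/example/pel/1"}}}
}
""")


def to_table_row(entry, vendor="ExampleVendor"):
    """Flatten one log entry into the fields a table row might display."""
    oem = entry.get("Oem", {}).get(vendor, {})
    links = entry.get("Links", {}).get("Oem", {}).get(vendor, {})
    return {
        "id": entry["Id"],
        "severity": entry.get("Severity"),
        "created": entry.get("Created"),
        "message": entry.get("Message", ""),
        "resolved": entry.get("Resolved", False),
        # Key service field pulled out of the (hypothetical) OEM extension.
        "reference_code": oem.get("ReferenceCode"),
        # Link a client could follow to download the associated PEL.
        "pel_uri": links.get("PelUri"),
    }


row = to_table_row(SAMPLE_ENTRY)
print(row["severity"], row["reference_code"], row["pel_uri"])
```

Missing OEM or link data falls back to None rather than raising, so a row can still be rendered for entries without a PEL.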

Low fidelity mockup

The next step was to develop a low fidelity mockup based on the information I gathered in the discovery phase. I kept the mockups fairly vague, just to spark conversation and understand how users might use the new concept.

Low fidelity mockup

I wanted to improve the following with the mockup.

  • Better scannability for the overall event log page. More logs are now visible per page.
  • New sorting and filtering features were added.

I also wanted to learn the following from the users:

  • What content they expect to see under headings like message ID, status and description.
  • What kind of filtering they would perform, and the use cases behind it.
  • How they would want to find details on a particular log.
  • What kind of batch operations would be useful to them, and how they would perform such operations.

High fidelity mockup

Based on user feedback on the low fidelity mockups, I created a more detailed version of the mockup, which I iterated on repeatedly. I solved issues by introducing sorting, custom pagination, clear search and range filtering.
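
As a rough illustration of the sorting and range filtering behavior in the mockup, the sketch below filters a list of events by a date range and then sorts the result by a chosen column. The event fields, sample data and function names are hypothetical and only stand in for whatever the real page uses.

```python
from datetime import datetime

# Made-up events for illustration; the field names are hypothetical.
EVENTS = [
    {"id": 1, "severity": "Warning", "created": datetime(2020, 3, 1, 9, 0)},
    {"id": 2, "severity": "Critical", "created": datetime(2020, 3, 5, 14, 30)},
    {"id": 3, "severity": "OK", "created": datetime(2020, 3, 9, 8, 15)},
]


def filter_by_date_range(events, start, end):
    """Keep events created between start and end, inclusive."""
    return [e for e in events if start <= e["created"] <= end]


def sort_events(events, key="created", descending=True):
    """Return a new list sorted by the given column."""
    return sorted(events, key=lambda e: e[key], reverse=descending)


recent = filter_by_date_range(EVENTS, datetime(2020, 3, 2), datetime(2020, 3, 31))
newest_first = sort_events(recent)
print([e["id"] for e in newest_first])
```

Filtering before sorting keeps the sort working on the smaller list, and both helpers return new lists so the original event data is left untouched.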

High fidelity mockup for user testing

But we still had the following problems to resolve:

  • The hidden date filter did not work for our users, as they use it pretty often.
  • Batch operations were not clear.
  • Severity colors were not prominent enough to quickly distinguish between events.

Working version 1.0

I addressed the date filter issue by making the filter more visible. I also made the “Severity” section easier to spot and more accessible. I created custom pagination, which users can configure based on the needs of their environment.
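
The configurable pagination can be sketched roughly as follows, assuming the user picks a page size and the page shows the matching slice of logs. The function name, its parameters and the sample page size are hypothetical, not taken from the shipped implementation.

```python
import math


def paginate(items, page, page_size):
    """Return the items for a 1-based page number and the total page count."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    total_pages = max(1, math.ceil(len(items) / page_size))
    # Clamp out-of-range page numbers instead of returning an empty page.
    page = min(max(page, 1), total_pages)
    start = (page - 1) * page_size
    return items[start:start + page_size], total_pages


logs = list(range(1, 46))  # 45 placeholder log ids
page_items, total_pages = paginate(logs, page=3, page_size=20)
print(page_items, total_pages)
```

Clamping the page number is one possible design choice: if a user lowers the number of logs by filtering, the view falls back to the last valid page rather than an empty one.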

A version that addresses the issues we found