Shannon Brady is a Site Reliability Engineer at Google working on gLinux, Google's internal Linux distribution based on Debian Testing. She has a long background in Linux tech support at organisations both small and large, including Google's internal tech support team.
She has a passion for making Linux approachable and accessible to people of all technical skill levels.
Shannon is an avowed cat person. When she's not working on all things Linux, she can be found hiking in the mountains.
Offering Linux support to a large user base can be a daunting task for SysAdmins who maintain a diverse fleet of user-facing Linux laptops and desktops. A team in charge of maintaining a Linux distribution (“the distro team”) has only so much time to devote to the issues brought to its attention (also called “escalations”). The work of making Linux support scalable falls into two categories: solving issues before they are ever raised to the distro team, and making sure that the distro team can quickly start working on the issues that are escalated to it.
The gLinux Team maintains Google’s internal Linux distribution, which is based on Debian Testing, and supports a large fleet of machines with diverse hardware. This talk will be given by a current member of the gLinux Team and will cover some of the tools, strategies and policies that gLinux has implemented over the years to provide scalable Linux support internally. While some of the lessons learned might apply only to large organisations, the goal of this talk is to give SysAdmins with fleets of any size actionable takeaways to improve Linux support in their specific situation.
In this talk we’re going to cover the following:
Preventing Escalations Before They Happen:
- Make Your Machines “Self-Healing”: Automate fixes for the most common breakages.
- Flag Common Issues: Reduce the amount of troubleshooting necessary by detecting and flagging common issues.
- Train Support Techs on Linux: Ensure all techs have a good baseline understanding of Linux, including hands-on training.
- Document Common User Journeys: Think like an end user. What do users need to do on their machines?
- To Root or Not to Root: The more permissions your users have, the more likely it is that they can break their machines.
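
As one illustration of the self-healing and flagging ideas in the list above, a fleet could run a small set of health checks on each machine, apply an automated fix where one is known, and flag whatever remains. The following Python sketch is not taken from gLinux; the `Check` class, `run_checks`, the disk-space check and its 5% threshold are all hypothetical names and values chosen for the example.

```python
import shutil
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Check:
    """One health check (hypothetical structure, for illustration only)."""
    name: str
    is_healthy: Callable[[], bool]
    fix: Optional[Callable[[], None]] = None  # None means the check is flag-only


def run_checks(checks: Iterable[Check]) -> Dict[str, str]:
    """Return a dict mapping each check name to 'ok', 'fixed', or 'flagged'."""
    results = {}
    for check in checks:
        if check.is_healthy():
            results[check.name] = "ok"
            continue
        if check.fix is not None:
            check.fix()
            # Re-run the check so a fix that did not work is still flagged.
            if check.is_healthy():
                results[check.name] = "fixed"
                continue
        results[check.name] = "flagged"
    return results


def root_has_free_space(min_free_fraction: float = 0.05, path: str = "/") -> bool:
    """Example check: is at least min_free_fraction of the filesystem free?

    The 5% default is an assumption made for this sketch.
    """
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction
```

A check registered without a `fix` is flag-only, so the same loop covers both ideas: `Check("root_free_space", root_has_free_space)` would be reported as `flagged` when the root filesystem is nearly full, which gives the support tech a starting point before any troubleshooting begins.
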
Ensure The Escalations You Get Are Actionable:
- Automate Log Gathering: Make it simple for users and techs to collect the correct logs and system information automatically.
- Have A Clear Escalation Policy: Define the scope of issues dealt with by the support team and by the systems administrators. Document the escalation path for users and support.
- Communicate Breakages with Users and Support: Ensure users and support staff are aware of outages and issues that affect them, with workarounds if available.
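
To make the log-gathering point above concrete, a single script could collect command output and log files into one archive that a user or tech attaches to an escalation. This is a minimal Python sketch under stated assumptions, not the tool gLinux uses; `run_command`, `gather_report`, and the archive layout (`commands/` and `files/`) are hypothetical.

```python
import io
import subprocess
import tarfile
import time
from pathlib import Path


def run_command(argv, timeout=30):
    """Return a command's combined stdout/stderr, or a note on why it failed."""
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            errors="replace",  # never fail on undecodable bytes in the output
            timeout=timeout,
        )
        return proc.stdout
    except (OSError, subprocess.SubprocessError) as err:
        # A missing or hung command is recorded instead of aborting the report.
        return f"could not run {argv!r}: {err}\n"


def gather_report(out_path, commands, files=()):
    """Write a .tar.gz holding each command's output and each readable file.

    commands maps a short label to an argv list; files is a list of paths.
    """
    out_path = Path(out_path)
    with tarfile.open(out_path, "w:gz") as tar:
        for label, argv in commands.items():
            data = run_command(argv).encode("utf-8")
            info = tarfile.TarInfo(name=f"commands/{label}.txt")
            info.size = len(data)
            info.mtime = int(time.time())
            tar.addfile(info, io.BytesIO(data))
        for file_path in map(Path, files):
            if not file_path.is_file():
                continue  # skip files that do not exist on this machine
            try:
                tar.add(file_path, arcname=f"files/{file_path.name}")
            except OSError:
                continue  # skip files this user is not allowed to read
    return out_path
```

A call might look like `gather_report("report.tar.gz", {"kernel": ["uname", "-a"], "packages": ["dpkg", "-l"]}, files=["/var/log/syslog"])`; the specific commands and log path here are examples, and each site would substitute whatever its distro team needs to see first.
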