SMX Senior Site Reliability Engineer (SRE) (2157) in Helena, Montana
Senior Site Reliability Engineer (SRE) (2157) at SMX (View all jobs: https://www.smxtech.com/careers/)
SMX is seeking a driven and talented Senior Site Reliability Engineer (SRE) to join our thriving Cloud Services business unit and work with some of the best technologists in the market. Senior Site Reliability Engineers provide senior-level implementation support services and subject-matter expertise to SMX clients on IT consulting engagements. This is a remote position supporting a team based in Herndon, Virginia.
Using knowledge and experience in technical architecture and systems integration, our Senior Site Reliability Engineers assist with Technology team deliverables, including building dashboards to monitor metrics on top-tier apps, the continuous build and deployment of automation scripts, and maintaining system configurations across multiple environments hosted on the AWS cloud tech stack. In addition, they work closely with the delivery teams and SMX clients to drive adoption of modern reliability practices such as SLOs, error budget policies, actionable alerts, incident retrospectives, chaos testing, and end-to-end ownership, and to prioritize the timely completion and delivery of these tasks.
This individual will bring a passion for technology, a strong technical skill set, and an ability to deploy, employ, operate, and sustain production-ready solutions, software, and tools for our customers. Our Site Reliability Engineers have working knowledge of continuous integration models, work directly with leads and program managers, and exhibit an overall willingness to contribute to the SMX team. This individual will also bring experience in infrastructure and operations automation, including hands-on experience implementing cloud-native, automation-centric solutions that drive operational efficiencies, with a strong focus on quality, communication, customer success, and results.
Essential Duties and Responsibilities:
Implement application/infrastructure observability solutions and perform maintenance to ensure desired application availability
Provide real-time service management, including building monitoring for the golden-signal SLIs, establishing and negotiating SLOs with the business, building alerting, and creating playbooks and runbooks for services in conjunction with development teams, product owners, and support
Apply automation and software to any tasks or parts of the system that would benefit from it or are performed manually.
Handle Cloud Operations (Events, Incidents, and Requests) based on a defined, ticket-driven service catalog.
Provide guidance and leadership to the SRE team
Perform internal team technical reviews
Work with customer and SRE team to identify, develop, deploy, and maintain solutions
Be a primary “face to the customer” during the Manage phase of the customer lifecycle – communicating clearly and concisely to identify, triage, remediate, and resolve infrastructure and solution issues when customer needs are greatest.
Take direction from, and provide clear and timely updates to, Project Lead or Project Manager
Proactively identify potential operations and reliability issues and work to resolve them
Identify system or performance issues, and develop resolutions using automation
Identify opportunities for automation and implement them to drive operational efficiency and cost reduction
Implement and maintain backup and disaster recovery solutions for customers' cloud computing resources
Optimize existing – and identify new opportunities for – monitoring, logging, and management metrics to improve operational effectiveness and customer knowledge
Participate in troubleshooting of infrastructure and/or application related issues
Produce well-written technical project documentation and operational runbooks
Participate in change management processes
Maintain core working hours but remain flexible to support after-hours maintenance and escalations (as necessary)
Participate as a team player capable of high performance and flexibility in a dynamic working environment
Take ownership of issues and act with a high sense of urgency when required
Improve CI/CD tools integration/operations and full automation of CI/testing
Identify and support Continuous Improvement opportunities to increase system reliability
Troubleshoot issues with CI/CD pipeline
Deploy and configure cloud services according to best practices (e.g., Virtual Machines, Virtual Network, AWS AD, CDN, serverless functions, DNS, Monitor, Key Vault, Blob storage)
Achieve and maintain AWS certifications
Required Skills and Experience:
7+ years of experience in DevOps or SRE
Proven ability to dissect a technical architecture into engineering plans and discrete tasks
Excellent customer facing skills and the calm professional demeanor necessary to bolster customer confidence when stress is highest
Ability to work collaboratively with customers
Scripting experience: Kusto Query Language, ARM templates, PowerShell
Strong skillset with AWS Automation, DevOps Pipeline and related AWS tooling
Collaborate with the internal dev team to support end-to-end testing
Solid command of standard CI/CD tools (Terraform, Ansible, Git, Jenkins, etc.)
Solid experience with container-based deployments using Docker, including working with Docker images, Docker Hub, and Docker registries, and with installing and configuring Kubernetes clusters.
Proficiency and proven hands-on experience with AWS IaaS and PaaS Services, AWS Active Directory, and SQL Server Infrastructure.
Experience with AWS Monitoring, Migrate, Log Analytics, AWS SSM, Load Balancer techniques
Experience in monitoring, metrics collection, and reporting using open-source tools
Depth of knowledge in security best practices, tools, and compliance frameworks (NIST, FedRAMP, HIPAA, etc.)
Strong written and verbal communication skills
Degree in a technical discipline, or an additional 6 years of experience in lieu of a degree
Desired Skills / Certs:
BS/BA in Computer Science, Computer Engineering or related field or equivalent technical experience
Current operations experience within a Cloud Managed Services Provider (MSP) delivery environment.
One or more of the following certifications is required:
• AWS Certified Developer – Associate (DVA-C01)
• AWS Certified SysOps Administrator – Associate (SOA-C02)
• AWS Certified Solutions Architect – Associate (SAA-C03)
• AWS Certified DevOps Engineer – Professional (DOP-C01)
• AWS Certified Solutions Architect – Professional (SAP-C01)
• DevOps Institute: Site Reliability Engineering Foundation (SREF)
Our tradition of delivering innovative technical solutions dates back to 1995; however, you may know us better by one of our legacy company names: Trident Technologies, Smartronix, Datastrong, or C2S Consulting Group. With the support of OceanSound Partners, our private equity investment sponsor, we began operating as one business in 2019 and became SMX in 2021. We operate in close proximity to our clients around the globe and have core locations in Alabama, California, DC Metro, Florida, Hawaii, Maryland, and Massachusetts.
Today, as SMX, we are one team, and together we empower government and commercial enterprises to become more effective, innovative, and resilient, no matter what challenges they face.
SMX is committed to hiring and retaining a diverse workforce. All qualified candidates will receive consideration for employment without regard to disability status, protected veteran status, race, color, age, religion, national origin, citizenship, marital status, sex, sexual orientation, gender identity or expression, pregnancy or genetic information. SMX is an Equal Opportunity/Affirmative Action employer including disability and veterans.
Vaccination within 60 days of hire, or an approved accommodation, is a requirement of the position per Executive Order 14042 (unless precluded by State law). A candidate who is not vaccinated may request an accommodation once offered the position, and the accommodation must be granted before the employee starts in the position.
Selected applicant will be subject to a background investigation.