Senior Site Reliability Engineer

Eliassen Group

Concord, CA, USA

Published: 6/14/2022

Technology

Full Time

Job Description


**Hybrid | Concord, CA**

We are seeking a Senior Site Reliability Engineer (SRE) to join our Digital Platform Engineering team and play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and applications. This role supports our 24x7 production environments and is instrumental in driving zero-downtime operations across both containerized and VM-based workloads.

In addition to supporting day-to-day operations, this engineer will be a key contributor to two major transformation initiatives:

- A large-scale migration from Tanzu Application Service (TAS) to Red Hat OpenShift, requiring deep expertise in container orchestration, traffic management, and workload optimization.

- A major migration from legacy datacenters to next-generation datacenter environments, involving modernization of infrastructure, deployment strategies, and operational readiness.

The ideal candidate is a seasoned engineer with deep expertise in Java application performance, Kubernetes, distributed systems, and observability, and is comfortable operating in complex, hybrid environments.

Due to client requirements, applicants must be willing and able to work on a W2 basis. For our W2 consultants, we offer a benefits package that includes medical, dental, and vision coverage, a 401(k) with company match, and life insurance.

Rate: $70 to $80/hr (W2)

Responsibilities:

· Production Support & Escalation: Serve as a senior escalation point for Platform Engineers, providing expert troubleshooting for complex production issues.

· Java Application Performance: Diagnose and resolve JVM-related issues including heap sizing, garbage collection tuning, thread management, and performance optimization.

· Container Orchestration: Manage and optimize high-volume, enterprise-grade Red Hat OpenShift or Kubernetes clusters for high availability, scalability, and fault tolerance. Ensure production readiness and operational excellence across complex, multi-tenant environments.

· VM-Based Environments: Support applications running on Red Hat Enterprise Linux (RHEL) virtual machines, ensuring seamless integration with containerized workloads.

· Cluster Resource Management: Configure and monitor Kubernetes namespace quotas, Horizontal Pod Autoscalers (HPA), health probes, and overall cluster capacity. Apply a strong understanding of FinOps principles to optimize resource usage and manage infrastructure costs effectively.

· Traffic Management: Configure and support load-balancing technologies such as F5 and Avi Networks, including global traffic management (GTM/GSLB) and local traffic management (LTM).

· Service Mesh: Implement and manage Istio or similar service mesh technologies, including gateway configuration, traffic routing, and observability.

· Monitoring & Observability: Design and implement robust monitoring and alerting solutions using tools like AppDynamics, Elastic, Kiali, Splunk, Prometheus, and Grafana.

· Distributed Tracing: Use distributed tracing tools such as Splunk Observability or Elastic APM to troubleshoot performance bottlenecks and latency issues across microservices.

· Dashboard Creation: Build and maintain dashboards that provide actionable insights into system health, performance, and reliability using tools like Grafana and Splunk.

· Migration Support: Play a key role in the migration from Tanzu Application Service (TAS) to Red Hat OpenShift, ensuring continuity, performance, and reliability throughout the transition.

· Datacenter Modernization: Support the migration of workloads from legacy datacenters to next-generation datacenter environments, contributing to architecture design, deployment strategies, and operational readiness.

· OpenShift Onboarding Acceleration: Identify and eliminate friction points in the Red Hat OpenShift onboarding process, automate repetitive tasks, and implement efficiency improvements to accelerate team adoption and reduce time-to-production.

· Performance Testing & Tuning: Design and execute performance tests to validate application behavior under load. Analyze results to ensure applications are properly sized and tuned before deployment to production environments.

· Incident Response: Lead incident response efforts, conduct root cause analysis, and implement long-term fixes to prevent recurrence.

· Training & Knowledge Sharing: Conduct training sessions and facilitate knowledge transfer on troubleshooting techniques, operational processes, and best practices to upskill team members and improve overall system reliability.

· Standards & Risk Mitigation: Help define, report on, and enforce operational standards that promote system reliability, reduce risk, and ensure consistency across environments. Collaborate with teams to drive adoption of best practices and improve overall platform resilience.

· Collaboration & Documentation: Partner with Platform Engineers, Developers, and other stakeholders to ensure smooth deployment and operation of Java-based applications. Maintain clear documentation for systems and troubleshooting procedures.

Experience Requirements:

· Experience: 5+ years in Site Reliability Engineering, DevOps, or Infrastructure roles.

· Container Platforms: Hands-on experience with Red Hat OpenShift or Kubernetes in large-scale, highly available enterprise environments. Experience must go beyond lab or small-scale setups and include real-world production deployments supporting mission-critical workloads.

· VM Environments: Experience supporting workloads on RHEL running on virtual machines.

· Java Expertise: Strong understanding of JVM internals, garbage collection strategies, and performance tuning.

· Traffic Management: Experience with F5 and Avi Networks, including GTM/GSLB and LTM configurations.

· Service Mesh: Hands-on experience with Istio or similar technologies.

· Monitoring Tools: Proficiency with AppDynamics, Splunk Cloud, Splunk Observability, Prometheus, Grafana, or similar.

· Distributed Tracing: Experience using tools like Splunk Observability or Elastic APM for troubleshooting distributed systems.

· High Availability: Proven experience supporting highly available, distributed systems with zero-downtime requirements.

· Cloud Platforms: Experience with AWS, Azure, or GCP is a plus.

· Communication: Strong analytical and communication skills with the ability to work effectively across teams.

Skills, experience, and other compensable factors will be considered when determining pay rate. The pay range provided in this posting reflects a W2 hourly rate; other employment options may be available that may result in pay outside of the provided range.

W2 employees of Eliassen Group who are regularly scheduled to work 30 or more hours per week are eligible for the following benefits: medical (choice of 3 plans), dental, vision, pre-tax accounts, other voluntary benefits including life and disability insurance, 401(k) with match, and sick time if required by law in the worked-in state/locality.

Please be advised: if anyone reaches out to you about an open position connected with Eliassen Group, please confirm that they have an Eliassen.com email address, and never provide personal or financial information to anyone who is not clearly associated with Eliassen Group. If you notice any indication of fraudulent activity, please contact InfoSec@eliassen.com.

About Eliassen Group:

Eliassen Group is a leading strategic consulting company for human-powered solutions. For over 30 years, Eliassen has helped thousands of companies reach further and achieve more with its technology solutions; financial, risk & compliance, and advisory solutions; and clinical solutions. With offices from coast to coast and throughout Europe, Eliassen provides a local community presence balanced with international reach. Eliassen Group strives to positively impact the lives of its employees, clients, and consultants, and the communities in which it operates.

Eliassen Group is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, pregnancy, sexual orientation, gender identity, national origin, age, protected veteran status, or disability status.

Don’t miss out on our referral program! If we hire a candidate you refer to us, you may be eligible for a $1,000 referral check!