Site Reliability Engineering (SRE) Cloud Architect
- Location: United States
- Company: Oracle
- Job type: Hybrid, full time
- Experience level: Mid-senior level
Preferred Qualifications:
- Minimum 10 years of hands-on operational, development, DevOps or SRE experience
- Experience in a technical leadership role with a history of embracing automated processes, cloud-native application design principles and a CI/CD DevOps model.
- Experience with production operations and best practices for deploying quality code in production and troubleshooting issues when they arise.
- Experience with operational support of containerized, microservice-based application(s) in a production-level Kubernetes environment for a highly available product or service offering.
- Experience deploying, configuring, managing and debugging cloud infrastructure and platform software such as OpenStack, Kubernetes, etc.
- Experience with commercial Kubernetes on-prem products (such as OpenShift, Tanzu, Rancher) or public cloud-managed Kubernetes (such as OCI/OKE, AWS/EKS, GCP/GKE, Azure/AKS).
- Experience with cloud-native administration and monitoring technologies such as Docker, Helm, Prometheus, Grafana, EFK/ELK, Jaeger, or similar technologies.
- Knowledge of Infrastructure as Code (IaC), Configuration as Code (CaC), GitOps and tools such as Terraform, Argo CD, Flux, etc.
- Experience designing and implementing CI/CD pipelines, platforms and components such as Jenkins.
- Working knowledge of scripting languages such as Python, Perl, and/or shell.
- Knowledge of configuration management and orchestration tools such as Ansible and Chef.
- Knowledge of version control using Git.
- Understanding of REST architecture and JSON is a plus.
- Experience with application frameworks such as Spring, Helidon, Micronaut, etc. is a plus.
- Experience developing or designing telecommunications software is a plus.
- Experience working in an Agile/Scrum development process is a plus.
- Experience in a Linux/Unix environment.
- Strong troubleshooting skills for complicated problems in remote systems.
Responsibilities:
- Develop and support the SRE framework and automation
- Develop metrics collection of failure events and analytics
- Analyze failure events; identify and dissect failures by infrastructure layer, by service stack, and by application component, including their interrelationships
- Provide recommendations to improve product development
- Provide support for components being deployed onto cloud infrastructure
- Provide support for other Dev Test and System Test infrastructure
- Provide best practices for frameworks, automation, and methodologies
- Be a team player and encourage cross-learning and cross-functional support
Job Description:
Design, develop, troubleshoot and debug software programs for databases, applications, tools, networks, etc.
This is an engineering position that involves working on Oracle Communications' market-leading Session Border Controller (SBC) product. The SBC is deployed across multiple top-tier operators and numerous large and medium enterprises. This position specifically supports an initiative to develop a modernized, cloud-native SBC product that embraces DevOps automation and follows an Agile development methodology.