
NOC Engineer - Level 3
- Brazil
- Permanent
- Full-time
- Monitor and troubleshoot distributed edge compute and Versa SD-WAN platforms to ensure high availability and performance.
- Respond to escalated incidents from Level 1/2 teams, performing deep-dive diagnostics across Linux, virtualized, containerized, and cloud environments.
- Perform advanced troubleshooting of complex production systems (e.g., out-of-memory issues, application crashes, packet loss, or routing anomalies).
- Use observability and monitoring tools (e.g., Grafana, SolarWinds, DataMiner) to detect and act on system anomalies; contribute to alert optimization and dashboard tuning.
- Support changes and updates to compute nodes, appliances, and edge services under formal change control processes.
- Participate in service restoration, incident post-mortems, and root cause analysis.
- Collaborate on recurring issue identification and automation opportunities to reduce manual effort.
- Create and maintain operational documentation, troubleshooting guides, and SOPs/runbooks.
- Participate in global 24/7 shift-based coverage as needed.
- Extensive hands-on experience with Linux system administration and troubleshooting in production environments (e.g., out-of-memory issues, application crashes, connectivity issues) is required.
- Familiarity with Type 1 and Type 2 hypervisors (e.g., KVM, VMware ESXi), including passthrough configuration, resource tuning, and troubleshooting.
- Experience with public cloud platforms (e.g., Azure, AWS), including VM provisioning, virtual networking, SSO setup, and datastore configuration.
- Solid understanding of IP networking protocols and services: TCP/IP, DNS, DHCP, NAT, VLANs, routing (BGP, OSPF, RIP), VPNs, proxies, and firewalls.
- Strong troubleshooting skills using network tools (e.g., ping, traceroute, tcpdump, Wireshark).
- Proficiency with virtual NICs, Linux network stacks, and interface configuration.
- Hands-on experience testing and troubleshooting REST APIs using Postman, Swagger, curl, and Python scripts, including reading Swagger definitions to automate or validate API workflows.
- Exposure to containerized workloads and orchestration, including Docker, Kubernetes, or K3s.
- Familiarity with infrastructure automation tools such as Ansible, Terraform, or Bash scripting.
- Foundational understanding of TPM, encryption, and measured/secure boot is a plus (can be taught internally).
- Experience with tools like ServiceNow, Jira, or other ITSM platforms for case and incident management.
- Strong written and verbal English communication skills with the ability to clearly document issues, escalate appropriately, and collaborate across teams.
- Experience supporting or deploying edge orchestration platforms (e.g., Zededa, Azure IoT Edge, AWS Greengrass).
- Exposure to hybrid or satellite-based connectivity environments, including high-latency or intermittent networks.
- Experience working in a managed services provider (MSP) environment, ISP/NOC, or systems integrator is preferred.
- Familiarity with zero-touch provisioning, telemetry pipelines, or site/network architecture at the edge is a plus.
- Work at the cutting edge of network operations, edge computing, and satellite systems, enabling global connectivity in the most remote and demanding scenarios.
- Be part of a collaborative, fast-moving team that values technical depth, mentorship, and continuous learning.