● Monitor production systems, investigate incidents, and resolve known issues across integrations,
workflows, and data pipelines.
● Troubleshoot complex problems surfaced by Customer Success, Implementation, or Product teams
and drive issues to resolution.
● Coordinate incident response and escalate effectively when deeper engineering fixes are required.
● Support production releases, validate successful rollouts, and assist with post-release troubleshooting.
● Improve system reliability and observability by partnering across teams to strengthen tooling,
monitoring, and operational processes.
Required
● 2+ years of experience in technical support, site reliability, production engineering, or a similar role.
● Strong debugging and troubleshooting skills in distributed systems.
● Experience monitoring and supporting production applications.
● Familiarity with JavaScript-based backend systems, preferably Node.js.
● Comfort working across logs, queues, APIs, and cloud infrastructure.
● Strong communication skills and the ability to coordinate across technical and non-technical teams.
Preferred / Stand-Out
● Experience with AWS infrastructure.
● Experience with healthcare integrations or HL7/FHIR workflows.
● Familiarity with observability tools such as Datadog, CloudWatch, or similar platforms.
● Experience supporting integration engines or message processing systems.
● Experience working in startup environments.