About the Role
Abnormal AI is investing heavily in building and supporting the world-class data pipelines that power our AI-native security platform at massive scale. As the founding member of our Data Engineering function, you will establish the technical and operational foundation for data excellence across the company. Your work will enable Abnormal to continue its steep growth trajectory while delivering enterprise-grade reliability and performance.
You’ll own the end-to-end reliability of the business-critical data pipelines that fuel our AI models: everything from detection analytics to behavioral baselining systems to the data infrastructure backing our threat intelligence capabilities. As we expand globally and onboard increasingly large enterprises, you’ll architect systems that scale gracefully while maintaining 99.9% availability.
This is a high-visibility, cross-functional role. Your work will directly accelerate Data Science innovation, improve product quality for customers, and inform GTM and financial decisions across the organization. You will serve as the connective tissue between our Data Platform team and our Data Guild’s analytical expertise.
What you will do
- Own mission-critical pipeline reliability: Take end-to-end ownership of our production data pipelines, which process billions of messages weekly, ensuring 99.9% uptime for the revenue-critical pipelines that directly enable sales and customer-facing AI products
- Build self-healing pipelines: Design and implement automated monitoring, testing, and recovery systems for data pipelines that eliminate manual intervention and reduce MTTR from hours to minutes
- Accelerate development velocity: Deploy CI/CD pipelines and self-service platforms that reduce deployment time from 3-5 days to under 2 hours, enabling Data Scientists to safely deploy models without engineering bottlenecks
- Architect for scale: Optimize data pipelines handling exponential annual growth, implementing cost-effective solutions that support regional expansion and compliance requirements (GDPR, FedRAMP, SOC2)
- Bridge technical and business domains: Partner with Sales, Finance, and Product teams to ensure data infrastructure aligns with business needs, making critical trade-off decisions when pipelines impact revenue
- Establish data engineering excellence: Define best practices for dbt, Airflow, and Spark usage, PII anonymization, and cross-divisional data sharing, and mentor embedded Data Guild team members on those practices
- Enable AI-ready, accessible data consumption: Design and maintain a semantic layer that provides consistent, trustworthy definitions and abstractions, making it easy for stakeholders to consume data and incorporate AI-driven insights into their workflows
Must Haves
- 6+ years of software engineering experience in backend, distributed systems, or data-focused roles.
- Proven experience designing and running large-scale, production-grade data pipelines.
- Proficiency in our stack: Python, Spark/PySpark, Airflow, SQL, dbt, Databricks, Snowflake, AWS.
- Proven track record of driving pipeline reliability to 99%+ uptime, including SLAs, observability tooling, and automated recovery patterns.
- Strong systems-thinking skills with the ability to debug complex distributed systems, optimize for performance and cost, and make architectural decisions balancing short-term needs with long-term scalability.
- Demonstrated ownership mindset and ability to drive projects from conception to production independently, including on-call responsibilities for critical systems.
- Experience collaborating with Data Science, Analytics, Product, Finance, Marketing, and Sales, along with the ability to communicate technical decisions clearly to non-technical stakeholders and executives.
- Bachelor’s degree in Computer Science, Applied Sciences, Information Systems, or another related quantitative field.
Nice to Have
- Experience building or operating AI/ML data pipelines, including data readiness for training and evaluation.
- Background in high-growth environments where data volume doubles annually, requiring frequent re-architecture and optimization.
- Experience with compliance frameworks such as GDPR, SOC2, FedRAMP, plus familiarity with PII handling and anonymization.
- Knowledge of multi-region data architectures, cellular/multi-tenant systems, or related large-scale distributed design patterns.
- Background in cybersecurity, threat detection, or email security.
- Experience building internal developer tools for data scientists and analysts.
- Track record of mentorship, tech leadership, and driving cross-functional initiatives.
- Advanced degree in Computer Science or related fields.
Why This Role Matters
This is a founding role where you'll establish data engineering excellence at Abnormal and build the foundation that powers our next phase of growth. You'll have the autonomy to make critical technical decisions, the visibility to see your direct impact on business outcomes, and the opportunity to build a team and platform from the ground up. As we scale to protect more of the world's largest enterprises, your work will ensure our data pipeline infrastructure matches the world-class standard of our security products.
At Abnormal AI, certain roles are eligible for a bonus, restricted stock units (RSUs), and benefits. Individual compensation packages are based on factors unique to each candidate, including skills, experience, qualifications, and other job-related factors.
Base salary range:
$176,000–$207,000 USD
Abnormal AI is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability, protected veteran status, or other characteristics protected by law.