We are developing a next-generation cybersecurity platform powered by advanced data infrastructure, machine learning, and LLM agents. As the Head of Data, you will lead the design and implementation of a scalable, high-performance data platform that processes billions of events and connections daily. This role offers a unique opportunity to define and own the data architecture at the core of a deep-tech product, in collaboration with a highly experienced team.
Responsibilities
- Architect and build a modern data platform from the ground up, including data lakes, warehouses, and real-time streaming infrastructure
- Design and maintain robust data pipelines for ingesting, transforming, and managing massive, complex datasets from multiple sources
- Manage and optimize SQL, NoSQL, and graph databases to support high-throughput, low-latency workloads
- Ensure scalability and performance of the platform under heavy data loads
- Collaborate with data scientists and engineers to implement ML pipelines, real-time analytics, and LLM integration (including RAG architectures)
- Own data quality, consistency, and governance, including best practices around security, encryption, and compliance
- Lead innovation by staying current with developments in big data, cloud technologies, and AI infrastructure
- Mentor and guide a growing team, while remaining hands-on in architectural decisions and implementation
Requirements
- Proven experience (5+ years) building large-scale data platforms using modern cloud infrastructure (e.g., AWS, GCP, Azure)
- Deep knowledge of data modeling and experience designing systems for both real-time and batch processing
- Strong background in backend engineering for data-intensive systems
- Hands-on expertise in stream processing technologies (e.g., Kafka, Flink, Spark Streaming)
- Proficiency in managing diverse databases and optimizing query performance at scale
- Familiarity with data security, privacy, and governance best practices
- Experience working closely with data science teams on production ML pipelines and LLM-based features
- Advantage: experience with orchestration tools such as Dagster or Airflow, and infrastructure tools such as Kubernetes and Terraform