Join one of the most promising early-stage startups in cyber + AI, founded by serial entrepreneurs with multiple successful exits and backed by top-tier investors including Cyberstarts and Boldstart. We're building cutting-edge tech that merges LLMs, real-time data analytics, and deep cybersecurity research – and we're just getting started.
As part of our Data Platform team, you’ll help build the scalable foundation that powers our research models, real-time detection engines, and AI-driven features. We're talking billions of events, graph-based insights, and high-velocity pipelines – all in production.
Responsibilities
- Design and build scalable data architectures (lakes, warehouses, pipelines) from scratch
- Develop ingestion and processing pipelines for large-scale, multi-source datasets
- Own and optimize databases for performance at scale (indexing, partitioning, etc.)
- Implement data orchestration workflows and CI/CD for reliability
- Integrate with ML and LLM pipelines (including RAG systems and model serving)
- Work hand-in-hand with engineers, data scientists, and cybersecurity experts
- Stay ahead of trends in data engineering, AI, and security – and apply them fast
Requirements
- Proven experience (4+ years) building large-scale, cloud-based data systems (AWS/GCP/Azure)
- Strong backend development skills for data-driven systems
- Deep knowledge of data modeling, schema design, and query optimization
- Hands-on experience with streaming technologies like Kafka, Flink, or Spark Streaming
- Familiarity with a mix of SQL, NoSQL, NewSQL, and graph databases
- Solid understanding of data security and governance best practices
- Commitment to code quality, CI/CD, and system reliability
- Bonus: Experience with Kubernetes, Terraform, or orchestration tools like Airflow/Dagster