About the Role
We’re looking for a Lead AI Engineer to own end-to-end delivery of GenAI and agentic systems, from research prototypes to secure, observable, and scalable production systems. You will set engineering standards across AI projects, lead technical design, guide LLM orchestration, and drive platform reliability, performance, and cost efficiency.
Key Responsibilities
· End-to-end delivery: Own the full lifecycle (discovery → prototyping → hardening → production → monitoring → continuous improvement) of GenAI and agentic systems.
· LLM orchestration & tooling: Design and implement workflows using LangChain, LangGraph, LlamaIndex, Semantic Kernel, or similar frameworks. Optimize prompt strategies, memory, tool use, and policies.
· RAG & vector search: Architect robust RAG pipelines with vector DBs (Pinecone, Chroma, Weaviate, pgvector), including chunking, hybrid search, embeddings selection, caching, and evaluation.
· Guardrails & observability: Implement policy/guardrails, safety filters, prompt/content validation, and LLMOps observability (tracing, token/cost monitoring, drift detection, eval harnesses).
· Architecture & microservices: Build scalable services and APIs in Python/JS/Java; define contracts, SLAs, and resiliency patterns (circuit breakers, retries, idempotency).
· Cloud & platform engineering: Design for AWS/GCP/Azure using managed services; containerize with Docker, orchestrate with Kubernetes, and automate via CI/CD.
· Security-first delivery: Enforce encryption, secrets management, IAM/least-privilege, privacy-by-design, data minimization, and model compliance requirements.
· MLOps & model serving: Operationalize models via MLflow/SageMaker/Vertex, with feature/data/version management, model registry, canary/blue-green rollouts, and rollback plans.
· Data engineering: Build reliable data pipelines (batch/stream) using Spark/Airflow/Beam; ensure data quality, lineage, and governance.
· Technical leadership: Lead design reviews, mentor engineers, and uphold coding standards, documentation practices, and SRE best practices. Partner with Product, Security, and Compliance.
· Performance & cost: Optimize latency, throughput, token usage, context windows, and hosting strategies; manage budgets and efficiency.
Required Qualifications
· 12+ years of overall software engineering experience, with 3+ years of hands-on AI/ML/GenAI work, including production deployments.
· Strong system design and scalable architecture skills for AI-first applications and platforms.
· Hands-on expertise with LLM orchestration frameworks (e.g., LangChain/LangGraph/LlamaIndex/Semantic Kernel).
· Proven experience with RAG and vector databases (e.g., Pinecone, Chroma, Weaviate, pgvector).
· Proficiency in Python (primary) and at least one of JavaScript/TypeScript or Java.
· Solid foundation in cloud (AWS/GCP/Azure), Docker/Kubernetes, and CI/CD.
· Practical knowledge of guardrails, prompt/context engineering, multimodal workflows, and observability.
· Experience with MLOps/model serving (e.g., MLflow, SageMaker, Vertex AI) and data pipelines (e.g., Spark, Airflow, Beam).
· Security-first mindset and familiarity with compliance (PII handling, RBAC/IAM, key management).
Nice-to-Have
· Experience with function/tool calling, agent frameworks, and structured output (JSON/JSON Schema).
· Knowledge of embedding models, rerankers, hybrid search (BM25 + vector), and evaluation frameworks.
· Exposure to cost/latency trade-offs across hosted vs. self-hosted models; GPU inference (Triton, vLLM, TGI).
· Familiarity with feature stores, streaming (Kafka/PubSub), and data contracts.
· Domain experience in [your industry/domain—e.g., BFSI, healthcare, manufacturing].
· Contributions to OSS, publications, patents, or speaking at AI/ML conferences.