NLP/AI Engineer
The NLP/AI Engineer owns the intelligence logic layer of Aetosky's platform: the models and algorithms that determine what matters in a high-volume stream of multilingual open-source data. This is a dedicated AI/ML role: you design the statistical filters, build the semantic analysis pipeline, architect LLM-powered deep processing workflows, and lay the groundwork for transitioning to sovereign, air-gapped language models. A separate engineering role handles data ingestion infrastructure, allowing you to focus entirely on model performance, prompt engineering, evaluation, and cost-efficient AI at scale. AI-assisted development (GitHub Copilot, Cursor, Claude Code, or equivalent) is the standard workflow, not an optional extra, and will be directly assessed during the hiring process.
Responsibilities
Core NLP / AI Responsibilities
• Design, implement, and refine text scoring and anomaly detection algorithms for identifying emerging trends and threats across multilingual data sources.
• Build and optimize semantic similarity pipelines: embedding model selection, vector-based content deduplication, and clustering for efficient human review.
• Develop detection logic for coordinated inauthentic behavior, including timing-based anomalies and content duplication patterns.
• Architect multi-step LLM inference workflows for deep analysis: intent extraction, entity identification, relationship mapping, and structured output generation.
• Iterate rapidly on prompt design and context management using AI-assisted tooling.
Model Performance & Cost Optimization Responsibilities
• Design evaluation frameworks and metrics for NLP output quality: precision, recall, false positive rates, and processing latency.
• Implement budget-aware processing controls that gracefully degrade under cost pressure without losing critical signals.
• Optimize LLM inference costs through prompt engineering, batching, caching, and token management strategies.
• Benchmark and evaluate models (commercial APIs and open-source alternatives) for cost-performance tradeoffs across target languages.
Sovereign AI & Research Responsibilities
• Establish the technical roadmap for transitioning from commercial LLM APIs to sovereign, air-gapped Small Language Models (SLMs) for sensitive deployments.
• Design data collection and annotation strategies to turn accumulated regional language data into fine-tuning datasets.
• Evaluate and prototype candidate SLM architectures for Southeast Asian and Middle Eastern languages and dialects.
• Monitor for adversarial data quality issues such as semantic drift and corpus contamination.
Collaboration Responsibilities
• Lead the platform's post-launch calibration process, translating analyst feedback on output quality into measurable system improvements.
• Collaborate with infrastructure and frontend engineering on data schemas, API contracts, and integration points.
• Document model decisions, prompt templates, and tuning parameters to support team scaling and knowledge transfer.
Classifications / Qualifications
Required
• 3+ years in NLP, machine learning engineering, or applied AI with a focus on production systems.
• Demonstrated daily proficiency with AI-assisted development tools (GitHub Copilot, Cursor, Claude Code, or equivalent); this will be assessed in the technical evaluation.
• Deep hands-on experience with text embedding models, vector similarity search, and clustering algorithms.
• Strong LLM prompt engineering: multi-step prompt design, context window management, structured output control, and inference cost optimization.
• Strong Python skills with production experience in NLP/ML libraries (spaCy, Hugging Face Transformers, scikit-learn, or equivalent).
• Experience designing evaluation frameworks and quality metrics for NLP systems.
• Comfortable working autonomously across research and production in a small, high-ownership team.
Preferred
• Experience with multilingual NLP.
• Experience fine-tuning or training Small Language Models for domain-specific applications.
• Background in influence operation detection, disinformation analysis, or social media intelligence.
• Experience with semantic drift detection or adversarial data quality monitoring.
• Familiarity with government cloud environments and data residency requirements (FedRAMP, ISO 27001, or equivalent).
• Published research or demonstrated contributions in applied NLP, information extraction, or computational social science.