We are looking for a Senior Data Engineer with at least 5 years of professional experience to design, build, and maintain scalable, high-performance data platforms that power our analytics and machine learning initiatives. You will work in a distributed, data-intensive environment and collaborate with cross-functional teams to ensure data accuracy, reliability, and security.
Roles & Responsibilities:
Design, develop, and maintain scalable data processing pipelines and workflows using frameworks such as Apache Spark, PySpark, and Apache Beam.
Build and maintain microservices in Python that serve data-driven features in production.
Develop internal tools to support CI/CD pipelines, experiment tracking, and data versioning.
Collect, process, and integrate large datasets from multiple sources, including databases, file systems, and APIs.
Ensure data integrity, consistency, and quality through robust validation and monitoring processes.
Optimize data systems for performance, scalability, and high availability.
Implement best practices for data security, access control, and privacy.
Collaborate with data scientists, analysts, and engineers to support analytics and ML workflows.
Lead complex migration initiatives involving the transition from on-premises Cloudera (HDFS, Hive, Impala, HBase) environments to cloud-native platforms (Snowflake or Databricks), ensuring zero data loss and minimal downtime.
Architect and build a large-scale, greenfield data platform from the ground up, including the setup of end-to-end infrastructure, data ingestion layers, transformation frameworks, and high-throughput integrations with Snowflake or Databricks.
Must have:
5+ years of professional experience in software engineering or data engineering.
Strong software engineering skills with Python in large-scale, high-performance production environments.
Hands-on experience with Spark/PySpark and other big data frameworks.
Expertise in data modeling and working with both structured and unstructured data.
Hands-on experience with streaming data platforms, particularly Apache Kafka.
Strong understanding of distributed systems and modern data architectures.
Experience working with cloud platforms, preferably GCP (BigQuery, Dataflow, Pub/Sub, Dataproc).
Excellent problem-solving and communication skills.
Proven, hands-on experience with large-scale data platform migrations, specifically transitioning from Cloudera (CDH/HDP) to either Snowflake or Databricks. This includes re-architecting Hive/Impala workloads, migrating HDFS data, and refactoring legacy ETL processes to leverage cloud-native features (e.g., Snowpipe, Auto-ingestion, or Delta Live Tables).
Deep technical expertise in building and orchestrating a high-performance data platform from scratch, including the implementation of complex integrations (Reverse ETL, CDC, API ingestions) specifically targeting Snowflake (using Snowpipe Streaming, Tasks, Streams) or Databricks (using Auto Loader, Delta Sharing, Unity Catalog).
Nice to have:
Experience with Databricks and real-time data processing frameworks.
Experience with NoSQL databases (e.g., Redis, Neo4j) and data lakes.
Knowledge of ML workflows and algorithms.
Exposure to other cloud platforms (AWS, Azure) and relevant certifications.
Familiarity with ETL/integration tools (Talend, Airflow, dbt, etc.).
Hands-on experience with version control (Git) and CI/CD pipelines.
Job Qualifications:
You have a solid academic background in Computer Science, Engineering, or related fields.
You are passionate about data, enjoy working in fast-paced, collaborative environments, and thrive on solving complex problems.
You are an innovator who thinks about how data technology can unlock new product opportunities.
You understand the ecosystem of data technologies, for example data governance and data integration software.
You can manage interactions with potential customers.
Opplane specializes in providing advanced data-focused solutions for financial services, telecommunications, and reg-tech companies to accelerate their digital transformation journeys. Opplane's leadership team is composed of Silicon Valley serial entrepreneurs and experienced executives. Its expertise comes from years of specific industry experience at some of the world's top companies, such as PayPal, Xerox PARC, Amazon, Wells Fargo, and SoFi, in the areas of product management, data technology, data governance, data privacy, security, machine learning, and risk management.
Global & Multicultural: Diverse perspectives, global collaboration (US, Portugal, India, and Singapore offices)
Startup Energy: Fast-moving, impact-driven environment
Ownership Mindset: Engineers own what they build
Collaborative & Friendly: Open, curious, and supportive culture