We are a market-leading web intelligence collection platform, providing premium proxies and data scraping solutions for large-scale public web data gathering. Today, we unite over 450 data industry professionals for one purpose: to create a future where all businesses have access to big data and business intelligence, and a work environment where everyone can grow and thrive.
Here’s how our Data Team describes this position:
If you join this team, you will be solving our clients’ most ambitious challenges – as we work towards becoming one of the leading data extraction services!
We value people hungry for knowledge and eager to apply it, and in return, we provide a constantly evolving environment rich with challenges that we solve with the latest technologies and greatest ideas. We have a seat ready for you!
Your day-to-day:
- Take on challenging tasks to maintain data that helps our teams gain insights into various aspects of our system.
- Collaborate with both the analytics and back-end teams to ensure smooth data pipelines and processing.
- Work with data extraction services.
- Build data models and implement necessary modifications.
- Apply a solid understanding of data processing principles.
- Design, implement, test, monitor, and optimize our data platforms, including error handling, to meet our data pipeline needs.
- Identify incomplete data, improve data quality, and integrate data from multiple sources.
Your skills & experience:
- Experience building data pipelines using cloud service providers such as Amazon Web Services (AWS), Microsoft Azure or Google Cloud Platform (GCP).
- Knowledge of ETL/ELT pipeline design, development, and maintenance for semi-structured data (both batch and real-time) using SQL and/or Python.
- Experience with SQL and NoSQL approaches to database design for evolving schemas, preferably in the context of a large data lake or data warehouse (e.g. Google Cloud Storage/Google BigQuery), optimized for either storage or retrieval.
- Experience setting up and maintaining scheduled data processing orchestration (e.g. Airflow, Dagster).
- Experience developing and maintaining the infrastructure and tools around the data pipeline is highly desirable (e.g. managing AWS resources via Terraform, setting up CI/CD, maintaining a Kubernetes cluster).
- Experience working with dbt and Great Expectations is a plus.
- Experience working with or implementing Data Catalogs is a plus.
- Experience with REST API development is a plus.
- Experience with messaging systems (such as Pub/Sub, RabbitMQ, AWS Kinesis or Apache Kafka) is a plus.
- Experience with Apache Spark and/or its Python libraries is a plus.
Don’t hesitate to apply even if you don’t meet some of the criteria!
Salary:
- Gross salary: 3300 – 6940 EUR/month. Keep in mind that we are open to discussing a different salary based on your skills and competencies.
Up for the challenge? Let’s talk!