
Job Location : Gauteng

Deadline : October 30, 2024

About the Role

As a data scientist on our team, you will work on new product development in a small team environment, writing production code in both run-time and build-time environments. You will help propose and build data-driven solutions for high-value customer problems by discovering, extracting, and modeling knowledge from large-scale natural language datasets. You will prototype new ideas in collaboration with other data scientists, product designers, data engineers, front-end developers, and a team of expert legal data annotators. You will experience a start-up culture backed by the large datasets and many other resources of an established company.

Responsibilities

Evaluate and help maintain our data assets and training/evaluation data sets.
Develop and implement NLP-based information extraction solutions.
Propose and identify trade-offs of various algorithmic solutions.
Interface with other technical personnel or team members to finalize requirements.
Work closely with other development team members to understand moderately complex product requirements and translate them into software designs.
Successfully implement development processes, coding best practices, and code reviews for production environments.

Preferred Qualifications

Master's degree in Data Science, Computer Science, Statistics, Machine Learning, or a related field
2+ years of relevant work experience
Data Science and NLP Skills
Formal training in machine learning: dimensionality reduction, clustering, embeddings, and sequence classification algorithms
Practical experience in Natural Language Processing methods and libraries such as spaCy, word2vec, TensorFlow, Keras, PyTorch, Flair, BERT, large language models, and prompt engineering
Technical Skills
Strong Python, Scala or Java background
Knowledge of AWS, GCP, Azure, or another cloud platform
Understanding of data modelling principles and complex data models
Knowledge of relational and NoSQL databases (e.g. Postgres, Elasticsearch/OpenSearch, AWS Neptune)
Knowledge of Spark, Ray, or other distributed computing systems highly preferred
Knowledge of API development, containerization, and machine learning deployment highly preferred
Interest in ML Ops/AI Ops highly preferred