Near ingests and houses several terabytes of data every day from various data partners in domains such as payments, telecom, real estate, retail, and content publishing. Near specializes in blending, managing, and analyzing large quantities of data and capturing insights within a popular SaaS platform known as AllSpark. Data scientists focus on generating quantitative and qualitative insights from the raw data in its big data warehouse. These insights are used primarily by Near clients for branding and marketing their products, and for making rational, strategic decisions that maximize profitability and improve customer experience.
- Primary responsibility is to build data science capabilities, automate data science pipelines, and enable data science across the organization.
- Understand data quality and leverage open-source tools to build out data science capabilities.
- Work closely with the Engineering team to “operationalize” and deploy the models.
- Develop the “core” data science models and capabilities that power the Near Location Intelligence Platform and associated products.
- Perform advanced analytics on structured data (payments, telecom, page clicks, etc.) and unstructured data in multiple formats (text, audio, video), spanning domains including user profile data, geospatial data, network data, and retail data.
- Partner with the technology and business teams to build a superior data-quality pipeline that feeds the models.
- Mentor and share data science knowledge with other global members of the Near team; document work and partner with others to deliver maximum value for the company.
- Understand and prioritize data science work based on cost-effectiveness, applying strong time-management skills.
- Attend conferences and organize workshops/meet-ups to stay in touch with the data science community.
- Must have a minimum of 3-5 years of industry experience developing data science models.
- Must have completed academic projects in data science, experimenting with raw data and generating insights; publications are a plus.
- Must have thorough mathematical knowledge of correlation/causation, decision trees, classification and regression models, recommender systems, probability and stochastic processes, distributions, and priors and posteriors.
- Skilled in scientific programming languages such as Python, Java, R, MATLAB, and Clojure, and in writing deployable production code.
- Understand the model lifecycle: cleansing/standardizing raw data, feature creation/selection, writing complex transformation logic to generate independent and dependent variables, model selection and tuning, A/B testing, and producing production-ready code.
- Knowledge of numerical optimization, linear/non-linear/integer programming, statistics, and combinatorial optimization is a plus.
- Familiarity with R, Apache Spark (Java, Scala, Python), PyMC3/Theano/TensorFlow, and other scientific Python/R modules is a plus.
- Must be comfortable writing code for model building and bootstrapping, and testing and owning models through their lifecycle, including DevOps and deployment to the cloud.
- We are looking for a data scientist with a Master's degree.
- An ideal candidate must have worked closely with engineering teams to design and develop machine learning capabilities for a minimum of 3 years.
- Overall 6-9 years of experience, with at least 3 years at a data-driven company/platform.
- Expected to have exceptional problem-solving, analytical, and organizational skills with a detail-oriented attitude.
- Passion for learning new technologies and staying up-to-date with the scientific research community.