Near ingests and houses several terabytes of data every day from various data partners in domains such as payments, telecom, real estate, retail, and content publishing. Near specializes in blending, managing, and analysing large quantities of data and capturing insights within a popular SaaS platform known as AllSpark. Data scientists focus on generating quantitative and qualitative insights from the raw data in Near's big data warehouse. These insights are primarily used by Near clients for branding and marketing their products, making rational and strategic decisions, maximizing profitability, and improving customer experience.
- Develop “core” data science models and capabilities that power the Near Ambient Intelligence Platform and associated products.
- Perform advanced data analytics, including processing structured data (payments, telecom, page clicks, etc.) and unstructured data in multiple formats (text, audio, video) spanning multiple domains, including user-profile data, geo-spatial data, network data, and retail data.
- Partner with the technology and business teams to build a superior data-quality pipeline that will feed the models.
- Research and create intellectual property for the company that will benefit Near and its partners.
- Use nonparametric and probabilistic models to generate insights, keeping the bias-variance trade-off in mind.
- Work closely with the Engineering team to “operationalize” and deploy the models.
- Mentor and share data science knowledge with other global members of the Near team, document work, and partner with others to deliver maximum value for the company.
- Understand and prioritize data science work based on cost-effectiveness, leveraging time-management skills.
- Attend conferences and organize workshops/meet-ups to stay in touch with the data science community.
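
As a small illustration of the bias-variance trade-off mentioned in the modelling responsibilities above, here is a minimal sketch (entirely synthetic data, not Near's actual models or platform): polynomial regression of increasing degree on noisy samples, where a low degree underfits (high bias) and a very high degree overfits (high variance).

```python
# Hypothetical illustration of the bias-variance trade-off: fit polynomials
# of increasing degree to noisy samples of sin(2*pi*x) and compare errors.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 20)
x_test = np.linspace(0, 1, 200)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
y_test = np.sin(2 * np.pi * x_test)  # noise-free targets for testing

def fit_error(degree):
    # Fit a degree-d polynomial on the training set; report train/test MSE.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for d in (1, 4, 15):
    train_mse, test_mse = fit_error(d)
    print(f"degree={d:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

Training error always falls as the model grows more flexible; it is the gap between training and test error that signals overfitting.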
- Must have a minimum of 3-5 years of industry experience developing data science models.
- Must have completed academic projects in data science, experimenting with raw data and generating insights; publications are a plus.
- Must have thorough mathematical knowledge of correlation/causation, decision trees, classification and regression models, recommenders, probability and stochastic processes, distributions, and priors and posteriors.
- Skilled in scientific programming languages such as Python, Java, R, MATLAB, and Clojure, and in writing deployable production code.
- Understand the model lifecycle: cleansing/standardizing raw data, feature creation/selection, writing complex transformation logic to generate independent and dependent variables, model selection and tuning, A/B testing, and generating production-ready code.
- Knowledge of numerical optimization, linear/non-linear/integer programming, statistics, and combinatorial optimization is a plus.
- Familiarity with R, Apache Spark (Java, Scala, Python), PyMC3/Theano/TensorFlow, and other scientific Python/R modules is a plus.
- Must be comfortable writing code for model building and bootstrapping, and able to test and own models through their lifecycle, including DevOps and deployment to the cloud.
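
To make the model lifecycle described above concrete, here is a minimal, hypothetical sketch on synthetic data (none of the field names or numbers come from Near): cleanse raw records, engineer candidate feature sets, and select the model with the best holdout error.

```python
# Hypothetical model-lifecycle sketch: cleanse -> feature creation ->
# train/holdout split -> model selection by holdout MSE. Synthetic data only.
import numpy as np

rng = np.random.default_rng(42)
n = 200

# 1. "Raw" records: spend drives the target, visits does too.
spend = rng.uniform(10, 100, n)
visits_true = rng.integers(1, 20, n).astype(float)
target = 2.0 * spend + 5.0 * visits_true + rng.normal(0, 5, n)

# 2. Cleansing: simulate ~10% missing visit counts, impute with the median.
visits = visits_true.copy()
visits[rng.random(n) < 0.1] = np.nan
visits = np.where(np.isnan(visits), np.nanmedian(visits), visits)

# 3. Feature creation: two candidate designs (intercept included).
candidates = {
    "spend_only": np.column_stack([np.ones(n), spend]),
    "spend_and_visits": np.column_stack([np.ones(n), spend, visits]),
}

def holdout_mse(X, y, split=150):
    # Fit least squares on the first `split` rows, score on the rest.
    w, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
    return np.mean((X[split:] @ w - y[split:]) ** 2)

# 4. Model selection: keep the candidate with the lowest holdout error.
scores = {name: holdout_mse(X, target) for name, X in candidates.items()}
best = min(scores, key=scores.get)
for name, mse in scores.items():
    print(f"{name}: holdout MSE = {mse:.2f}")
print(f"selected model: {best}")
```

A production version would add tuning, A/B testing, and deployment steps, but the cleanse/feature/select skeleton is the same.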
- We are looking for a data scientist with a Master’s degree; a PhD is preferred.
- The ideal candidate has academic research experience and has published research papers.
- Overall 6-9 years of experience, with at least 3 years at a data-driven company/platform.
- The candidate is expected to have exceptional problem-solving, analytical, and organisational skills, with a detail-oriented attitude.
- Passion for learning new technologies and staying up to date with the scientific research community.