My Journey in the Data Science World: Learnings from my 1st year as a data scientist at GDNA
Note: I wrote this article in 2020 but never posted it; I am sharing my thoughts now.
When DJ Patil and Jeff Hammerbacher coined the term ‘Data Scientist’ in 2008, many people were already working in the role it described. But the explosion of data and the growth in computing power, combined with the increasing ease of programming, made data scientist the ‘sexiest job of the 21st century’, as Harvard Business Review put it. Companies have been scrambling to hire data scientists and data engineers to monetize their data and grow on the insights it generates. Even in the Covid-hit world, companies have been urged to hire more data scientists. There can be many reasons for this; two of them are that companies do not want to lose the competitive edge that data-driven insights give them, and that the automation drive brought on by Covid-19 generates more and more data. The need for data science, and its importance, is greater now than ever. So naturally, interest in data science and data engineering jobs is also huge.
Over the last year, I have received many requests from college students, recent graduates and a few parents (sigh!) writing on behalf of their kids, asking how I landed this job with my physics + manufacturing background. The most frequent questions are about what work I do and whether I can mentor new graduates to land a job in data science. I have decided to write about my experiences to help more people who are interested in this field. This article is about my 1st year as a data scientist at Group Data n Analytics (GDNA), Aditya Birla Group.

Most of the time, people equate data science with modelling, machine learning or deep learning. But data science is not just about modelling. The final goal of any data science project is to create business value. Defining the problem, identifying the key performance indicators, benchmarking the models and the solution, generating insights, and finally understanding when to stop: all these activities have one final goal, which is to solve a business problem and create value. In manufacturing analytics, that can mean finding a root cause, predicting when a system is going to fail, or developing a recommender system. It could be just a dashboard that generates insights for a logistics business, or it could be detecting whether shop floor workers are wearing helmets and vests inside the factory for safety purposes. A project is useful only if it creates significant business impact.
This is my 1st learning: Business importance precedes everything
I have been working primarily on manufacturing analytics. The projects I have worked on in 1.5 years, including my internship, covered network optimization, building a data pipeline, extensive exploratory data analysis for root cause analysis, modelling, visualization, and a recommender system for optimizing a manufacturing process. I list them to show the sheer breadth of data science. A project can include any or all of the steps, from building the data pipeline to generating insights and visualizations. The role of the data scientist is continuously evolving. Knowing software engineering concepts helps in writing modular code that can be deployed to production without being re-programmed by a DevOps team. A few of my projects made me learn visualization and deployment tools like Streamlit. One project involved converting machine IoT data from an IBA format to the H5 format for storage and processing. I took help from a data engineer and learned how to build a pipeline from a remote server and set up the entire chain of data up to pre-processing. The processed data was then used in a dashboard for shop floor operators. Understanding the data pipeline and the concepts of visualization helped me in that project.
This is my 2nd learning: Data science is vast, and staying in touch with its full breadth is beneficial
The team I work with is highly skilled, and keeping up with the pace of experienced professionals while still mastering the basics is hard, but my managers patiently advised me and helped me create a skill matrix for updating my skills. A skill matrix is traditionally maintained by managers to track their team’s competencies and skill levels for the various jobs undertaken. But having an individual skill matrix is crucial too. It helps in understanding your current level of competency and in aligning what you learn with job requirements and long-term goals. As a fresher, it was important for me to have one, because the data science world is vast and it is easy to get lost in the convoluted realms of deep learning without mastering the basics. Outlining what to learn and when to learn it is vital.
There are currently many online platforms (Coursera, Udemy, upGrad, etc.) on which to start your data science journey. I have done a few courses on Coursera, but competitions are a better place to learn than courses. Kaggle competitions, hackathons and other challenges, such as those in prognostics and health management, are good both for learning and for displaying knowledge. One such challenge I took part in was the PHM (Prognostics and Health Management) data challenge. The task is to predict the remaining useful life of a filter. It is a predictive maintenance problem with a twist: the final score is calculated with the model using 100%, 75%, 50% and 25% of the data, so the model must be stable across all the datasets. This challenge helped me understand various concepts that are usually not taught in courses. The complexity of real data is quite different from the curated datasets used in online courses. The challenge also aligned with my skill matrix. Taking part in competitions that also help you improve in the long term is an effective way of updating skills.
This is my 3rd learning: Courses are good, competitions are better, but creating a skill matrix is the most crucial step of learning
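To make the "stable across all the datasets" idea from the PHM challenge concrete, here is a minimal sketch in Python. It is not the challenge's actual scoring method or data: the synthetic readings, the straight-line model, the mean absolute error metric, and the assumption that the 100%, 75%, 50% and 25% fractions apply to the training data are all illustrative choices of mine, and the function names are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Mean absolute error of a fitted line on the given points."""
    slope, intercept = model
    return statistics.fmean(
        abs(slope * x + intercept - y) for x, y in zip(xs, ys)
    )


def stability_report(train, test, fractions=(1.0, 0.75, 0.5, 0.25)):
    """Refit on shrinking portions of the training data and score
    every fit on the same held-out test set."""
    test_xs, test_ys = zip(*test)
    report = {}
    for frac in fractions:
        n = max(2, int(len(train) * frac))  # a line needs at least 2 points
        xs, ys = zip(*train[:n])
        report[frac] = mean_abs_error(fit_line(xs, ys), test_xs, test_ys)
    return report


# Synthetic stand-in data (assumption): "remaining useful life" falls
# linearly with a made-up sensor reading, plus Gaussian noise.
rng = random.Random(0)


def make_points(n):
    points = []
    for _ in range(n):
        x = rng.uniform(0, 100)
        y = 200 - 1.5 * x + rng.gauss(0, 5)
        points.append((x, y))
    return points


train = make_points(200)
test = make_points(50)

report = stability_report(train, test)
for frac, mae in report.items():
    print(f"{frac:.0%} of training data -> MAE {mae:.2f}")

# A small spread suggests the model degrades gracefully with less data.
spread = max(report.values()) - min(report.values())
print(f"spread across fractions: {spread:.2f}")
```

The point of the sketch is the habit rather than the model: scoring the same approach on several data budgets shows quickly whether it only works when data is plentiful, which curated course datasets rarely reveal.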

The exciting part about working in a global conglomerate is the team. The team I work with is academically diverse: there are PhDs, statisticians, computer science graduates and MBAs. This diversity helped me see multiple views on every project and widened my thinking. There are industry leaders from sectors as varied as banking and financial services, information technology, consulting, consumer retail, core engineering services, bio-medical engineering, predictive maintenance and even the armed forces. Each member brings a unique idea to the table. The team works on projects in sales & operations planning, logistics, procurement, energy and pricing, and on developing new products using computer vision and NLP. In a team as dynamic as ours, working on such a diverse range of projects, continuous interaction with colleagues helped me immensely. Short coffee-break discussions, asking an experienced team member to review a hypothesis or some code, improving how to communicate results: these things can be learned quickly by observing experienced colleagues.
This is the 4th lesson I learned: Interacting with and learning from a diverse set of people widens thinking
Summary
Data science is vast. Wearing multiple hats is always necessary, sometimes even the hat of a consultant. Questioning everything helps you understand the data and the problem better; even when the problem is simple, it gives you a different perspective on the client. The answers to the hardest questions are often found in the simplest ideas. The basics of math (statistics and algebra) are always helpful and make new concepts easier to understand. Data science is 70% thinking through ideas and working on data, and 30% final modelling. A data science solution is only as good as the data the data scientist gets and the business impact it creates. Always think and ideate with your client. Communication and visualization are important. With this, I end the article on my 1st-year journey. I hope my future holds many more challenges and many more learnings.




