To the Aspiring Data Scientists
Interested folks often contact me with a question like this: ‘I want to be a data scientist; can you help?’
Well, I wish I could. The reality is, I won’t be able to unless I really understand what you want and why you want it.
First, guiding someone is next to impossible in this day and age, when everything is so convoluted. People come to learn with preconceived ideas and notions. I am not saying you have to come with a blank slate; what I am saying is that you need to show real motivation. I repeat, you must have a motivation, and you must be able to articulate why you are motivated.
There is no shortage of tutorials, courses, documents, websites, and articles today on how to become a data scientist. Sadly, most of the time there is a business mind lurking behind such documents and campaigns. They will tell you only part of the truth. They will tell you that you can become a data scientist in 4 months, or in 6 months, or by completing a boot camp, and so on and so forth.
In my opinion, if you already have the background of a typical data scientist (you have worked with data, know the scientific process, and can program in R, Python, SAS, or another relevant language), then yes, you can acquire the knowledge and become familiar with the tools of data science within a short period of time.
On the other hand, if you are a complete novice, you will not only be deceived by these campaigns but also become frustrated quickly. This is because you were drawn to something you were not prepared for, and you did not anticipate the challenges you would face on the way to becoming a data scientist.
Now let’s come to the main point.
Why do you want to become one
Ask yourself the “why” question and give an honest answer. What is your response?
It is perfectly okay to say that you want to be a data scientist because the market is hot.
But the best approach would be to write a few points on a piece of paper about why you want to become a data scientist.
- Point 1
- Point 2
- Point 3
- Point 4
You could list that you are passionate about working with data, that you want to make an impact by helping management or team leaders make the right decisions based on data, or that you simply enjoy solving other people’s problems with data.
Whatever your reasons may be, it is important to write them down. Listing them makes you look at them yourself, which helps you assess whether this is really something you want to do.
Motivation is the key.
Know what data science is and who data scientists are
Before you jump into this field, do some research about it. Google it, find some blogs to follow, and read whitepapers from reputable analytics organizations and software companies such as SAS, IBM, Google, Microsoft, and many others. Spend some time reading: not just skimming, but actual reading. To me, reading is the best way to broaden your knowledge. In this era of YouTube and video lectures, I still find reading somewhat superior, especially for beginners. Once you have acquired a general understanding of the field, listening to lectures or watching short video clips can boost your knowledge quickly.
What do data scientists do
I will keep this brief. I am working on a separate article about it.
Data scientists solve problems with the help of data. That’s what they do.
On paper, data scientists do so many things that someone with the title of data scientist in one industry could not possibly list everything another data scientist does in a different industry. That gives you an idea of how varied data scientists’ work is.
Simply put, data scientists work with data to produce actionable insights that help solve a problem. If the data scientist works in a financial organization, the problems would be related to, say, the banking industry or the stock market. If the data scientist works in healthcare, the problems would be related to healthcare service providers such as hospitals and health units.
Data scientists are not necessarily involved in building data architecture, such as data warehouses, or in setting up and maintaining an organization’s cloud infrastructure. People who do this type of work are called ‘Data Engineers’. You may want to read a little more about this in my earlier post (in Bengali).
Let me clarify one thing: data scientists do not necessarily design the databases. It is the data engineers who do that. Data scientists analyze data to drive the business, whereas data engineers develop and maintain the architecture that stores the data. In most cases, data scientists or statisticians do not build or maintain databases. However, there is often a need to create smaller databases for a specific purpose or project. A data scientist may need to design, develop, and maintain project-specific databases, which are quite different from an enterprise-level data warehouse.
Most large organizations have a separate team of engineers who develop and maintain the data science platforms (databases, Hadoop, etc.).
How to become a data scientist
Everyone wants to be a data scientist, but only some will succeed. Success depends partly on your preparation and partly on your effort. If you do not come from a quantitative background with some math and computer programming experience, I would say learning what it takes to be a data scientist will be a daunting task for you. I am being completely honest.
At some point, you are going to have to read and write computer programs.
If you’ve never written a computer program, I am not sure how long it will take you to pick up a language. Personally, I feel it takes at least 6 months to learn the basics of any programming language, and another year or so to know it a bit better. If you devote yourself full time to learning a language, you might be able to write simple programs and carry out simple data science tasks within 4-6 months.
Two approaches to becoming a data scientist
- Learn the tools to become a data scientist, or
- Find the problems you want to solve, and then learn what it takes to solve them.
The first approach is the typical one, and I call it the bottom-up approach. I will not discuss it here because it has been the mainstream approach thus far. It has generated countless discussions, training offerings, online courses, media articles, blogs, vlogs, and much more.
I will focus on the second approach to becoming a data scientist, which I call the top-down approach.
Find problems that you want to solve
This is perhaps the easiest way to find out if you really want to be a data scientist. If so, keep reading.
In this approach, you first find a problem and then solve it.
As I wrote earlier in this article, data scientists solve problems with the help of data. In other words, they have a problem in front of them, and they provide a solution using data.
Now this may sound simple: the problem is there, the data is there, so what’s the big deal?
It’s not a big deal; it’s a huge deal.
Having the data is only one step toward solving the problem. What if the data is not in the right form, or you do not have it at all? Even when you have the data, preparing it for analysis often takes 80% of your time.
If you do not have the data, you first need to collect it.
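To give a feel for where that 80% goes, here is a minimal Python sketch of typical preparation chores. It assumes a hypothetical set of raw records with mixed date formats, inconsistent spelling, and missing values; the field names, formats, and values are made up for illustration.

```python
from datetime import datetime

# Hypothetical raw records with the usual problems: mixed date formats,
# inconsistent capitalization and spacing, and missing values.
RAW_RECORDS = [
    {"date": "2019-03-01", "district": " district a ", "deaths": "3"},
    {"date": "01/03/2019", "district": "DISTRICT A", "deaths": ""},
    {"date": "March 2, 2019", "district": "District B", "deaths": "2"},
    {"date": "not a date", "district": "District C", "deaths": "1"},
]

# Accepted date formats; "%d/%m/%Y" assumes day-first dates.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def parse_date(text):
    """Return a date for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_record(record):
    """Normalize one record; return None if its date cannot be interpreted."""
    date = parse_date(record["date"])
    if date is None:
        return None
    deaths_text = record["deaths"].strip()
    return {
        "date": date,
        # " district a " and "DISTRICT A" both become "District A"
        "district": record["district"].strip().title(),
        # Keep a missing count as None rather than guessing zero
        "deaths": int(deaths_text) if deaths_text else None,
    }


def clean_all(records):
    """Clean every record and drop the ones that could not be used."""
    cleaned = (clean_record(r) for r in records)
    return [r for r in cleaned if r is not None]


if __name__ == "__main__":
    for row in clean_all(RAW_RECORDS):
        print(row)
```

Each decision here (which date formats to accept, whether to drop a row or keep a missing value as None) is a judgment call you would have to make, and document, for your own data.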
Where do I find problems?
Well, problems are everywhere; you just have to look around.
I am going to give you some ideas that should give you enough clues to find a problem on your own and discover what it takes to solve it. Here is the list, with brief descriptions.
Problem 1: Building a list of the most important news of the day
This may sound like an unattractive problem, but think about how you might use the information. First, you need to pick a focus area. If you are interested in politics, the possibilities are endless. If you build a list exclusively on “North Korea”, by the end of the year you would see how things unfolded, and you could present the information on a timeline. The value of such a list is that it lets you find temporal relationships between events that may affect some other event or outcome of interest.
You might be aware that news, or sometimes a single tweet from an influential person, can affect the stock price of a particular company in the US stock market. But this could be useful beyond the US market as well. You will find many potential use cases if you think about it a bit more deeply.
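As a starting point, the timeline idea could look something like this minimal Python sketch. It assumes you have already collected (date, headline) pairs; the headlines below are placeholders, and plain keyword matching is a simplifying assumption rather than a recommended way to decide what a story is about.

```python
from collections import defaultdict
from datetime import date

# Hypothetical collected headlines; in practice these would come from
# news sites or RSS feeds you monitor every day.
NEWS_ITEMS = [
    (date(2019, 3, 5), "North Korea story C"),
    (date(2019, 1, 10), "North Korea story A"),
    (date(2019, 1, 22), "Stock market story"),
    (date(2019, 1, 28), "North Korea story B"),
]


def build_timeline(items, keyword):
    """Group headlines that mention `keyword` by month, in date order."""
    timeline = defaultdict(list)
    for day, headline in sorted(items):  # tuples sort by date first
        if keyword.lower() in headline.lower():
            timeline[day.strftime("%Y-%m")].append(headline)
    return dict(timeline)


if __name__ == "__main__":
    for month, headlines in build_timeline(NEWS_ITEMS, "North Korea").items():
        print(month, headlines)
```

Grouping by month is one arbitrary choice; grouping by day or week works the same way with a different `strftime` format.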
Problem 2: Tracking accident statistics
This problem is particularly suitable for resource-limited countries where government record keeping is inadequate. Every day, newspapers report deaths from road crashes. You could select one or two major newspapers and scan their pages to retrieve reports of road accidents. This could be done as part of a research project at the institution where you are pursuing a Bachelor’s or Master’s degree.
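Once you have the article text, the extraction step might start out like this minimal Python sketch. The keyword list, the regular expression, and the article snippets are all hypothetical; the pattern only catches counts written as digits in English, so reports like "two killed" or Bengali-language articles would need more work.

```python
import re

# Phrases that mark an article as a road accident report (an assumption;
# a real project would refine this list, and Bengali-language papers
# would need their own).
ACCIDENT_KEYWORDS = ("road accident", "road crash", "collision")

# Matches counts written as digits, e.g. "4 killed" or "2 people died".
DEATH_PATTERN = re.compile(
    r"(\d+)\s+(?:people\s+)?(?:killed|dead|died)", re.IGNORECASE
)

# Hypothetical article snippets standing in for scanned newspaper pages.
ARTICLES = [
    {"date": "2019-05-01", "text": "4 killed in road crash near District A."},
    {"date": "2019-05-01", "text": "Parliament session adjourned."},
    {"date": "2019-05-02", "text": "2 people died after a collision on the highway."},
    {"date": "2019-05-03", "text": "Road accident injures several passengers."},
]


def extract_accidents(articles):
    """Keep accident reports and pull out the death count when one is stated."""
    records = []
    for article in articles:
        text = article["text"]
        if not any(keyword in text.lower() for keyword in ACCIDENT_KEYWORDS):
            continue  # not an accident report
        match = DEATH_PATTERN.search(text)
        deaths = int(match.group(1)) if match else None  # None = count not stated
        records.append({"date": article["date"], "deaths": deaths})
    return records


if __name__ == "__main__":
    records = extract_accidents(ARTICLES)
    known = [r["deaths"] for r in records if r["deaths"] is not None]
    print(f"{len(records)} accident reports, {sum(known)} reported deaths")
```

Keeping unstated counts as None, instead of zero, avoids silently understating the total when a report does not give a number.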
Problem 3: Calculating impact of faculty research
This is something you can do for your department, your institution, or your country. Start with your own department, where only a handful of faculty members are presumably doing research and publishing their findings in journals. Think of developing an automated system that finds their articles (in Google Scholar, for example) and calculates their impact (from the journal impact factors). You can present the results by department, by type of research, or by institution.
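The calculation step could be sketched in Python as below, assuming the articles have already been found and matched to journals. The impact factor values and publication records are hypothetical, and simply summing impact factors is only one crude choice of metric: a journal impact factor describes the journal, not the individual article.

```python
from collections import defaultdict
from typing import NamedTuple


class Publication(NamedTuple):
    author: str
    department: str
    journal: str


# Hypothetical impact factors; real values would come from a source
# you have access to, such as the journal's publisher.
IMPACT_FACTORS = {"Journal X": 2.5, "Journal Y": 1.0, "Journal Z": 4.0}

# Hypothetical publication records, e.g. gathered from Google Scholar profiles.
PUBLICATIONS = [
    Publication("Faculty 1", "Statistics", "Journal X"),
    Publication("Faculty 1", "Statistics", "Journal Z"),
    Publication("Faculty 2", "Statistics", "Journal Y"),
    Publication("Faculty 3", "Economics", "Journal X"),
    Publication("Faculty 3", "Economics", "Journal W"),  # no known impact factor
]


def total_impact(publications, impact_factors, key):
    """Sum journal impact factors per group; `key` picks the grouping field."""
    totals = defaultdict(float)
    for pub in publications:
        factor = impact_factors.get(pub.journal)
        if factor is None:
            continue  # skip journals we have no impact factor for
        totals[key(pub)] += factor
    return dict(totals)


if __name__ == "__main__":
    print(total_impact(PUBLICATIONS, IMPACT_FACTORS, key=lambda p: p.department))
    print(total_impact(PUBLICATIONS, IMPACT_FACTORS, key=lambda p: p.author))
```

Passing the grouping as a `key` function means the same code can report by department, by author, or by any other field you add later, such as type of research.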
Problem 4: Building an economic performance dashboard
You can use Google’s Data Studio to build a dashboard of your country’s economic performance on some key metrics. The data can be found on the central bank’s website or in government data repositories. For Bangladesh, many economic data sets are available online free of charge. Just visit https://www.bb.org.bd/econdata/ and find your data there.
In conclusion, you can take the easier route to becoming a data scientist: first find a problem, then direct your learning toward solving it. The path is not smooth, but you can always ask questions of someone who knows more or has more experience. It is good practice to find a mentor who can guide you along your learning path. Since you are learning on your own, you have to take responsibility for yourself. If you do not want to learn, nobody is going to push you. If you are really interested and motivated, I hope this article has given you some direction.