Every few years, some topic gets hyped and everyone wants in. Having lived through the dot-com boom (hosting was a hype, really), the web app boom (wow, you can make an ASP page, not just CGI-BIN), the mobile app boom (my mobile icon conforms to Jony Ive’s specs so I’m cooler than you), the Grid boom (what is it now? Still HPC? Cloud?), the drone boom (and we started a company too!), and now the ML / CV boom (everyone claims that Andrew Ng is their teacher…), I can safely say that a significant number of university students around the world want a job or internship in whatever the current topic is.
However, many of these hyped fields aren’t that democratic at this stage. Only the top 1% get anything reasonable, through a confluence of intelligence, hard work, and being at the right place at the right time. Probably 80% of the students who ask for a data science job or internship are going to end up realising that:
- They are told to clean data instead of deriving any kind of meaningful insight
- They are recruited as data scientists but end up doing regular engineering work
- They are convinced that they don’t have the qualifications to be hired as a data science engineer (DSE)
- They are ignored, despite having great skills on related matters
The truth is, most organisations still don’t know how to hire data scientists, despite knowing that they desperately need them to stay in the game. If your interviewer can’t figure out how you will fit into the organisation, or the ROI of having you on the team, your chances of getting a job drop significantly.
Writing as a hiring manager, here are three tips on how to set yourself up for a data science job (they apply primarily to Singapore-based fresh grads or undergraduates looking for an internship). YMMV.
TIP 1. Don’t look for a data science job: Look for a job with data
If you’re not sure whether you can get a data science job, you probably can’t. Local universities have recently geared up to teach data science, but anyone who has done any real-life data science work will realise how much “gut feeling”, built up through experience, it requires compared to most regular engineering work.
This is not an ideal situation, of course, but data science is really just a new name for statistics, now done with a lot of computers. Here’s the definition of statistics:
the practice or science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.
Big data / data science is different because the sample can now be EVERYTHING (as we have a lot of compute/storage/network etc), which changes the way we analyse the data.
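To make the contrast concrete, here is a minimal Python sketch. The population (10,000 rides, 3,000 of them cancelled), the sample size of 500, and the fixed seed are all made-up numbers for illustration, not real data. The first half estimates a proportion from a sample with a normal-approximation 95% interval; the second half simply counts over everything.

```python
import math
import random

# Hypothetical population: 10,000 rides, 3,000 of them cancelled (1 = cancelled).
population = [1] * 3000 + [0] * 7000

# Classical statistics: infer the proportion from a random sample.
rng = random.Random(42)  # fixed seed so the run is repeatable
sample = rng.sample(population, 500)
p_hat = sum(sample) / len(sample)

# Normal-approximation 95% interval around the sampled estimate.
std_err = math.sqrt(p_hat * (1 - p_hat) / len(sample))
ci_low, ci_high = p_hat - 1.96 * std_err, p_hat + 1.96 * std_err

# "The sample is everything": count over the full data, no inference step.
p_full = sum(population) / len(population)

print(f"sample estimate: {p_hat:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
print(f"full-data value: {p_full:.3f}")
```

With the full data there is no interval to report, so the hard part shifts from inference to storing, moving, and cleaning the data.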
So why can’t universities “teach” data science? Because a student juggling multiple subjects can’t spend an extraordinary amount of time with one set of data within a semester, trying and failing multiple times. Giving the student a sample dataset and a sample method only teaches rules of thumb that might not work when confronted with a live torrent of data from production.
So if you’re keen on getting your hands dirty, look for jobs that deal with lots of data: data engineer (business data like your Grab rides), DevOps (backing up and storing logs), robotic systems engineering (sitting on trillions of lines of telemetry), GIS (point clouds, models), and so on. Get stronger at solving an existing business problem first, and find that job.
Once you’re in, without other school subjects to context switch between, you can spend every waking hour on the data you’re managing and start applying all the data science techniques you learned in school. I guarantee you, when you demonstrate your first insights into the data to the company, they will straight away promote you into the special task force that was set up just to figure out how to get data science into the organisation.
TIP 2. Pick one area in data science, and be the best in that one area
Data science is a humungous field. There are many ways to slice it: class of problem (is it a typical regression, clustering, or anomaly-thresholding problem?), data type (computer vision, which deals with images and video; machine data such as telemetry and computer logs; or human behavioural data like clicks, movement, and footfall), domain (micro-economics, agriculture, astronomy, security, or even internal business processes), and perhaps more.
By picking just one area and spending an extraordinary amount of time and effort on it, you at least stand a chance of becoming very good at it. Once you’re there, engage your interviewer on this area alone to demonstrate your capability for diving in. If you can dive this deep into your area of choice, you can probably do the same for your employer’s dataset.
Unfortunately, many students I interviewed, who said that they had spent 6 months in a lab with a team of PhD candidates, helping them clean data to feed into their proposed model, could not explain how that model works, even at a superficial level. Not only were they deprived of the chance to try different inputs (as those were dictated by the graduate students), they simply followed instructions without questioning the validity of the method.
Here’s a corollary: the Deep Learning class of problems today is actually pretty ‘shallow’. Unless you’re publishing papers at ICCV, what you’re probably doing is downloading a bunch of code from GitHub, fitting your source data to what you found there, and voilà, you get your object detection going, to the delight of your marketing colleagues.
So if you like the Deep Learning class of problems, why not get good at it before you interview for a job? Understand all the choices you have and their pros and cons, transform your data differently, and massage the parameters until you understand how each of these algorithms behaves on the dataset you like. You don’t need to invent a better CNN to get a job. You just need to know how to help your employer use one to solve a problem.
TIP 3. Demonstrate, don’t just talk
Another skill missing among data scientists in general is the ability to demonstrate their work. There are three levels of demonstration:
- Demonstrating that you know how to do it
- Demonstrating that it works
- Demonstrating that you know when it will not work
Let’s compare this with an applicant applying to work as a web developer. If he claims that he has built a website before, he could:
- Show his code (potentially sharing it online)
- Show the website working (also potentially making it part of his web portfolio page)
- Show how it isn’t complete (e.g. explaining that it could have TLS, better protection against XSS, etc.)
I’ve interviewed 150 applicants in the past 3 months, and close to half of them asked for a data science job. NONE has shown me any piece of their work in data science. A handful claimed a high Kaggle ranking, yet demonstrated little understanding of how they got there.
It would be much easier if every data science applicant came prepared with a clearly defined problem statement, a sample of the input data, a quick explanation of the pipeline, and some sample results. With an actual pipeline in front of us, we can discuss what-if scenarios to test your understanding of the model, and put into perspective how the pipeline might relate to our business. Use the results to explain why you went with supervised or unsupervised learning, what cleansing you did to the dataset that you were embarrassed about, how you reframed the problem statement given to you after seeing the results, and so on.
Similarly, those stronger in ML infrastructure can discuss the performance of various cloud AI services, the trials and tribulations of rolling their own TensorFlow, or shopping for GPUs on a limited budget. Those stronger in math can discuss the choice of K in K-means, the impact of certain precision and recall values, or defend their formula for recoding continuous variables as categorical ones.
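As one example of the kind of what-if you could bring along, here is a small Python sketch of how precision and recall trade off as you move a decision threshold. The labels, scores, and thresholds are invented toy values, and `precision_recall` and `predict_at` are hypothetical helper names for this illustration, not from any library.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def predict_at(scores, threshold):
    """Flag every score at or above the threshold as positive."""
    return [1 if s >= threshold else 0 for s in scores]


# Toy data (made up): true labels and model scores for eight items.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.90, 0.70, 0.65, 0.40, 0.35, 0.20, 0.10]

for threshold in (0.8, 0.5, 0.3):
    p, r = precision_recall(labels, predict_at(scores, threshold))
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, lowering the threshold raises recall and lowers precision. Being able to say which side of that trade-off the business cares about is exactly the conversation an interviewer wants to have.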
The bottom line is, you can’t attend a data science interview by regurgitating your textbook. Surprised?
Epilogue: It’s not hard
If you’re not there yet, don’t give up. If you really want a job in data science, you just need to do a bit more homework, that’s all.
In God we trust, all others bring data. Good Luck!
A Day in the Life of a Data Scientist (Coursera Blog)
What Data Scientists Really Do (HBR Blog)
Data Science Engineer at Garuda Robotics (GR Careers)