Is Data Science Right for You?
Unless you’ve been living under a rock the last few years, you’ve probably heard a lot about data science, machine learning, AI, etc. There are many roles under the data science umbrella — data analyst, machine learning engineer, research scientist — and, while the definitions differ greatly between companies, they seem to be in high demand everywhere. If you haven’t already, you’re probably considering whether now is the time to learn more about data science. Can you realistically shift your career in that direction? How would you do it? Where would it lead you?
As an engineer who pivoted into the data sciences before machine learning was a household name, I’ve had these kinds of conversations with many engineers over the past few years, especially at AlphaSights where we prioritize learning and applying new skills. I decided to answer some of the most frequently asked questions (and some that should be asked).
This isn’t a typical “how to get started in data science” post that tells you how fun and easy data science is. There’s no shortage of those. Data science isn’t right for everyone; after exploring it for a while, many engineers find that they enjoy traditional engineering more. The purpose of this post is to discuss the side of data science that few talk about but that is important to understand if you’re considering shifting your career in that direction.
Should I become a data scientist?
This is the question everyone should ponder before digging too deep into any topic in data science. Many publications tend to suggest there’s an element of magic in data science, particularly in machine learning. For engineers, there’s a clear appeal to this idea. Most of us still remember the feeling of awe when we first used a mobile app, Facebook, or, for us older folks, the internet or a computer. Building our first applications provided a sense of wonder, accomplishment, and hunger for more. Now, after doing it for years, it’s lost some of its luster. Who wouldn’t want that excitement again, especially in a new domain that can seemingly predict the future and disrupt virtually any industry?
It’s very exciting when you build your first working model. Seeing your work enable capabilities that weren’t previously possible is exhilarating. However, there are a couple of elements to the data scientist role that the hype ignores.
First, while data scientist roles differ greatly between organizations, the majority of a data scientist’s time is usually spent cleaning and shoveling data. Managing and manipulating data for 80–90% of your day requires a considerable amount of patience and certainly doesn’t provide the same excitement as building or iterating on a model.
Second, the workflow of a data scientist is very different from that of a software engineer. In the typical lifecycle of an engineering project, the roadmap is clear. The engineer feels ownership over the product being developed and continuously iterates on it to make it more scalable, resilient, maintainable, etc. In data science, particularly in more research-oriented positions, results are far less predictable. There are a lot of failed experiments, and it’s hard to know how long something will take to build or even whether it’ll work. This unpredictability leads to a different mindset:
While engineers optimize for depth (quality of build and speed of completion), data scientists optimize for breadth (speed of failure).
The mental shift is challenging for a lot of engineers, and those who can’t adapt often fail.
But won’t my skills become obsolete otherwise?
No. Engineers who are good at architecture, infrastructure, front end, data, etc. are in high demand and will continue to be in the foreseeable future. You can (and often should) build a software product without a data scientist. You can’t do so without an engineer.
Traditional engineering domains are also better established and more advanced, so the engineering skills you build today are more likely to be useful in 5 years than anything you will learn in data science.
That being said, machine learning is becoming an important part of the tech landscape. You’ll likely be working alongside data scientists in your career so even if you decide that data science isn’t for you, it’s worth understanding the high-level concepts.
You haven’t scared me off yet! How do I learn?
The good news is that there are a lot of resources on this topic. The bad news is that there are A LOT of resources on this topic. Too many. A lot of them are too narrowly or broadly focused and it’s hard to tell what’s worth actually dedicating time to. It’s best to find someone to help you wade through these waters. I suggest asking people in the field which resources they found most helpful.
At AlphaSights, it’s the head of data science’s responsibility to provide guidance to anyone in the organization who’s interested in learning more about the subject. Here’s a list I’ve curated with an engineer’s background in mind:
Linear Algebra:
- Gilbert Strang’s Linear Algebra course: Videos and notes from one of the best traditional linear algebra courses
- Coding the Matrix: A practical engineering view of linear algebra. Textbook from a Brown course (no videos); a shortened version of the course can be found on Coursera
Calculus:
- Essence of calculus: An engaging visual explanation of calculus fundamentals
- MIT’s multivariable calculus: A great traditional course to get up to speed on calculus
Probability and Statistics:
- Foundations of Data Analysis: Statistics (and some probability) with a focus on coding
- Introduction to Probability and Statistics: Traditional lecture format course
Information Theory:
- Information Theory, Pattern Recognition, and Neural Networks: The first half of this course is a good overview of information theory
- Visual Information Theory: Blog post that provides a quick introduction to many information theory concepts
Machine Learning:
- Machine learning Coursera course: Good introductory overview of a broad swath of ML concepts
- Deep learning for NLP: More in-depth understanding of NLP and neural networks, focused on supervised prediction
- Hacker’s guide to Neural Networks: A long blog post that presents a less mathy approach to neural networks
- Kaggle: A large set of data science competitions with real-world data and other resources to work with
A full (traditional) Linear Algebra class? 50 hours of ML class just for the basics? Is this really necessary?
Think of this as learning a new domain in mathematics, followed by learning a new domain in computer science. Coursework is instrumental for understanding and retaining the fundamentals for a broad swath of this field. These won’t teach you about modern frameworks, the typical flow of a data science project, or the things you really need to be successful, but they’re a good starting point.
If you want to go beyond the fundamentals (which you will), expect to take on a lot more: papers, books, courses, conferences, etc. Being a data scientist requires constantly learning new things, much as being an engineer does, only more so.
Ok, I took the courses. Now what? How do I get some experience?
There are a number of approaches to gain experience and different people will advise different methods. Many people will tell you to try Kaggle competitions. I suggest avoiding this route. Kaggle competitions are a great way to get some practice building and tuning models. However, as mentioned above, this is a small part of a data scientist’s day-to-day work. As a hiring manager, I see a ton of resumes that list Kaggle competitions as part of a candidate’s experience. I generally ignore these. They tell me nothing about a candidate’s ability to execute — there’s no critical thinking about feasibility, no data munging, no integration with other systems, no peer reviews, etc.
A better alternative is to find a project at your current job and a mentor invested in your success. This is by far your best bet. It allows you to gain some hands-on experience dealing with the issues of real-world data science projects, with an experienced person to make sure you don’t go too far off the rails.
If there aren’t any opportunities at your current job, come up with your own project. There’s a ton of public data available and, unlike Kaggle competitions, you’ll likely still have to go through most of the steps and struggles of a typical data science project lifecycle. Be sure to build it all the way through and have measurable, demonstrable results. A personal project that includes a demo is far more impressive than one without.
I have an idea for a project. How do I convince my manager to let me work on this at work?
You don’t. At least not at first. Some companies provide hackathons (at AlphaSights we have sandbox days every sprint) that you can take advantage of. Otherwise, you should expect to spend your own time on this until you have a working proof of concept (POC) that shows the project is worth the company’s investment. If all goes well, you can start transitioning some of your time at work to data science.
This seems like it will take forever! Is this really the only way to do it?
Of course not. Every journey into data science is unique. These steps don’t have to happen serially either (usually they don’t). A common approach is to start by ML “black boxing”: training and integrating a model as a black box, then iteratively improving the model as you get a deeper understanding of how it works and what approaches may work better.
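To make “black boxing” a bit more concrete, here is a minimal sketch in plain Python. Everything in it is a hypothetical illustration rather than anything we run at AlphaSights: the `Model` protocol, the `MajorityClassModel` and `NearestCentroidModel` classes, the `accuracy` helper, and the toy data are all assumptions made up for this example. The idea is that the rest of your application only depends on `fit` and `predict`, so you can ship a trivial model first and swap in a better one as your understanding grows.

```python
from collections import Counter
from typing import Protocol, Sequence


class Model(Protocol):
    """The only contract the rest of the application relies on."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> None: ...
    def predict(self, x: Sequence[float]) -> str: ...


class MajorityClassModel:
    """Step 1: a trivial baseline that always predicts the most common label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]

    def predict(self, x):
        return self.label


class NearestCentroidModel:
    """Step 2: a drop-in replacement that predicts the label with the nearest class mean."""

    def fit(self, X, y):
        grouped = {}
        for row, label in zip(X, y):
            grouped.setdefault(label, []).append(row)
        # One centroid (per-feature mean) for each label.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()
        }

    def predict(self, x):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def accuracy(model: Model, X, y) -> float:
    """Fraction of examples the model labels correctly."""
    hits = sum(model.predict(row) == label for row, label in zip(X, y))
    return hits / len(y)


# Toy data (made up): [message_length, exclamation_marks] -> label
X_train = [[10, 0], [12, 1], [11, 0], [90, 5], [95, 6]]
y_train = ["normal", "normal", "normal", "urgent", "urgent"]
X_test = [[9, 0], [100, 7], [13, 1], [88, 4]]
y_test = ["normal", "urgent", "normal", "urgent"]

baseline = MajorityClassModel()
baseline.fit(X_train, y_train)

improved = NearestCentroidModel()
improved.fit(X_train, y_train)

baseline_acc = accuracy(baseline, X_test, y_test)
improved_acc = accuracy(improved, X_test, y_test)
print(f"baseline: {baseline_acc:.2f}, improved: {improved_acc:.2f}")
```

Note that `accuracy`, which stands in here for the rest of your system, never changes when the model does. In a real project the classes would more likely wrap an off-the-shelf library model, but the shape of the workflow is the same: integrate first, measure, then improve what sits behind the interface.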
Moving from engineer to data scientist isn’t the same as moving from full-stack to backend engineer. It’s a career change. You’ll be competing against people who studied data science in school, often for advanced degrees. It’ll require significant investment on your end. But if it’s right for you and you’re willing to put in the time, it’s worth the effort. Data scientists with engineering backgrounds tend to have a more well-rounded skill set, making them more adept at completing projects from end to end. More importantly, the work is challenging and rewarding. It’s no wonder Glassdoor’s annual “Best Jobs in America” survey has ranked data scientists as the #1 job three years in a row.
Manor Lev-Tov joined AlphaSights in May of 2018 and serves as Vice President of Data Science on our Software Engineering team.