The most important thing for getting your first data science job

Right now (in 2018) the average salary for a data scientist is over $120,000 per year, according to Glassdoor.

 



 

Considering that the median household income in the US is about $55,000 per year, the average salary for a data scientist represents a pretty large premium. Granted, you won’t start out at $120,000, but the earning potential is very high. Learning data science is extremely valuable.

The question is, how do you go from being a beginner to getting your first data science job?

The answer is a little nuanced, but at a high level, I can give you some advice that I think will serve you well. There’s one thing that you absolutely must have ….

A necessary condition for breaking into the industry.

It’s pretty straightforward actually:

Be good at the f*cking job.

Be good at the f*cking job

Like I said, the full answer is a little more nuanced than this, but at a high level you actually need to be good at the job. You need to be skilled. Skill is the necessary condition for getting a coveted data science job.

This shouldn’t surprise people, but there are plenty of people who act in violation of this idea.

Many students take courses and collect certificates, thinking that listing courses on a resume will get them a job.

This is not correct.

Although I absolutely believe that good data science courses are very valuable, they are only valuable insofar as they give you the right skills. If a course doesn’t help you achieve a high level of skill, then it’s not worth much at all to you.

So it doesn’t matter how many years you’ve studied. It doesn’t matter if you have a stack of certificates. It doesn’t matter that you’re really excited about the field. And it doesn’t matter if you’ll work for free. (Lots of data science hopefuls offer to work for free.) None of those things will reliably help you get a data science job.

The thing that everything hinges on is skill. Skill is the necessary condition for getting a data science job.

What skills you need

The question then is, what skills do you need?

This is where some people get stuck. There’s a wide range of skills in the data world; some are useful and some are not.

As a beginner, you need to be selective. It’s easy to get “lost in the weeds,” where you start learning things that don’t matter or that you won’t use.

For example, there are over 10,000 R packages. You don’t need to know anywhere close to all of them. As a beginner, you need to focus your time.

More broadly, there are a lot of skills that have a tangential relationship to data science, but not all of these things will actually be useful in a practical data science role.

For example, if you look on Quora for answers about “what to study” to become a data scientist, you’ll find lists of dozens of books and courses that people claim you need to work through. One answer on Quora suggested 63 books, courses, and resources that you “need” to study.

63.

Bullshit. You don’t have time for that.

As a data science hopeful, you need to be selective. You need to select the highest value skills that will yield the greatest rewards. You need to work relentlessly to master the skills with the highest return on investment.

Forget learning all 10,000 R packages. Forget studying 63 textbooks and resources before you get your first job.

To get a job as a junior data scientist, you can narrow things down to a small set of skills. There is a minimum skill set that you need. A “minimum viable skillset.”

This minimum skill set includes the fundamental skills that I write about often here at Sharp Sight:

  • Data visualization
  • Data manipulation
  • Data analysis

These skills are the fundamental skills that you absolutely must know cold.

These skills form the core because they are the essential skills for creating deliverables as a junior data scientist. As a junior member of a data team, you’ll typically be responsible for things like:

  • reports
  • analyses
  • data cleaning tasks for more senior team members (i.e., you’ll be a data monkey)

I’ve seen a couple dozen data teams over my career, and 90% of the time, junior members of a data team were restricted to these tasks.

Even in cases where a junior data scientist has higher skill levels and could work on more advanced projects, it’s been my experience that in many cases they will still be working on reports, analyses, and data management. In many organizations, there’s a greater need for reports and analyses than there is a need for advanced machine learning systems.

That being the case, you need to focus relentlessly on learning data visualization, data manipulation, and data analysis. These are the skills that underpin deliverables like reports and analyses. They will form the foundation of your work as a junior data scientist. They will also form the foundation for learning more advanced skills later.

Don’t focus too much on advanced skills

Many people ask if they need to know intermediate to advanced skills to get a job as a junior data scientist.

By “intermediate to advanced skills,” I’m mostly referring to machine learning, but this could also include skills like geospatial visualization and time series analysis.

My response is that you shouldn’t worry too much about those skills. I’m fairly convinced that they are not required for truly junior positions.

In the cases where a job ad for a junior role asks for machine learning skills, one of several things is usually going on:

  1. It’s not actually a junior role
  2. The person who wrote the ad doesn’t know what they’re talking about
  3. The person who wrote the ad included more advanced skills as a “wish list” (even though it’s a junior role)
  4. The company or hiring manager has unrealistic expectations about what a junior team member should do

In any of those cases though, you shouldn’t really need advanced skills. If someone insists that you do need advanced skills for a junior job, then you should be very cautious. Like I said, some people who are hiring for data teams don’t know what they’re talking about or they have unrealistic expectations. Those are not people that you should want to work for.

Now, there might be exceptions to this. In some departments at truly elite companies, they might reasonably expect a junior data science team member to work on machine learning systems or more advanced projects.

Those cases should be the exception, not the rule.

How much skill you need

Now that you know which skills, the question is how much skill.

This is a little more difficult to talk about precisely, but I can explain this in a way that you should understand.

One of the best ways to explain how much skill you need is by way of analogy.

You should be “fluent” in writing data science code

To me, one of the best analogs for data science skill is language skill.

This is because the core of a data science job is actually about writing code.

… Writing code in a computer language.

As it turns out, learning computer languages is very similar to learning spoken languages. Using computer languages is similar to using spoken languages. Therefore, it’s helpful to make a comparison between computer languages and spoken languages.

Let’s say you wanted to get a job as a journalist (or a writer, of sorts). For the sake of argument, let’s just assume that the position is at an American publication, and the writing would be in English.

To perform this role, you would be expected to have at least basic fluency in English. You would need to be able to write sentences and phrases with a fairly high level of speed and accuracy.

Of course, you might make mistakes or forget a few things from time to time. Even the best writers forget some vocabulary words. The best writers might struggle to remember specific grammatical rules. Or they might struggle to find the best way to express something. That’s all normal in the writing process. But a writer would still be expected to have a strong command of English. A strong enough command to be able to produce deliverables (i.e., articles) reliably and on deadline.

Again, a junior journalist would still make mistakes from time to time. They would still need mentorship from senior colleagues. They would need to learn more and develop their skills over time. But they would absolutely need a base level of fluency to perform well in this role.

I really like using the word fluency because it describes how you should be able to perform in qualitative terms. Fluency implies relatively fast performance. It implies relatively accurate performance. It implies a sort of fluidity, mastery, and ease.

Fluency is the word I like to use when talking about the skill level you need in basic data skills. You need to be “fluent” in writing code for data visualization, data manipulation, and data analysis. You need to be able to write code for these things with relative fluidity, mastery, and ease.

So if you want to get a job as a junior data scientist, I recommend that you become “fluent” in the core skills like data visualization and data manipulation.

Like I mentioned above, these are the skills that will enable you to produce deliverables. The deliverables for a junior data scientist are things like reports and analyses. To create these deliverables reliably and on deadline, you need to have a strong command of the tools to produce them.

Just like a junior journalist, a junior data scientist might make mistakes from time to time. Still, it should be expected that a junior data scientist can write the code to produce deliverables with some level of speed and accuracy.

That is to say, if you want to get a job as a junior data scientist, you need to be relatively “fluent” in writing code. That’s the skill level you should be aiming for.

The specific skills you need in R

Let’s get a little more specific.

Like I mentioned above, you need to be fluent in core skills like data visualization, data manipulation, and data analysis.

If you’re working in R, there is a set of packages that you should learn to be able to do these things.

You should learn:

  • base R
  • ggplot2
  • dplyr
  • tidyr
  • readr
  • lubridate
  • stringr

You should also know how to use the tools from these packages together to analyze data and get things done. That means that you should be able to use the pipe operator to combine these toolkits.
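To make that concrete, here is a minimal sketch of what combining these toolkits with the pipe can look like. The data is made up for illustration (the sales_raw data frame and its column names are hypothetical), and in real work you would typically load a file with readr instead of typing the data in. Treat it as one possible pattern, not the only way to do it.

```r
library(dplyr)      # data manipulation (also provides tibble() and the pipe)
library(tidyr)      # reshaping and missing-value helpers
library(stringr)    # string cleaning
library(lubridate)  # date handling
library(ggplot2)    # visualization

# Hypothetical example data: messy region labels and one missing value.
sales_raw <- tibble(
  order_date = c("2018-01-15", "2018-01-28", "2018-02-03", "2018-02-20", "2018-03-11"),
  region     = c(" east", "West ", "east", "WEST", "East"),
  revenue    = c(1200, 950, NA, 1430, 1010)
)

# One pipeline: clean the strings, parse the dates, drop incomplete rows,
# then aggregate revenue by month and region.
monthly_sales <- sales_raw %>%
  mutate(
    order_date = ymd(order_date),                   # lubridate: text to Date
    region     = str_to_title(str_trim(region)),    # stringr: tidy the labels
    month      = floor_date(order_date, unit = "month")
  ) %>%
  drop_na(revenue) %>%                              # tidyr: remove rows missing revenue
  group_by(month, region) %>%
  summarise(total_revenue = sum(revenue), .groups = "drop")

# ggplot2: turn the summary into a simple bar chart.
sales_plot <- ggplot(monthly_sales, aes(x = month, y = total_revenue, fill = region)) +
  geom_col(position = "dodge") +
  labs(x = "Month", y = "Total revenue", title = "Monthly revenue by region")
```

Notice that each step does one small thing, and the pipe chains the steps into a single readable block. Being able to write something like this quickly, without looking up every function, is roughly what I mean by fluency.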

To rapidly achieve high levels of skill, you need practice

So at this stage, you know that skill is critical. You know what skills you need (data visualization, data manipulation). And you know the level of skill that I recommend (fluency).

How do you get there?

You’ve heard me say it before: you need to practice.

Repeat your practice activities

The best performers repeat their practice activities over and over. They repeat their practice activities until they can do them without thinking about them.

You’ll sometimes hear people talk about different levels of skill:

  • unconscious incompetence
  • conscious incompetence
  • conscious competence
  • unconscious competence

At the last stage, you can perform a skill competently without thinking about it. That’s unconscious competence. That’s what you should strive for.

To get to that stage, you need to practice your skill until it is burned so deeply in your mind that you don’t think about it anymore.

Again: repetition is the key here.

Don’t try to do it alone

I’ll repeat what I said at the beginning of the post: if you want to get a data science job, you need to be highly skilled.

To become highly skilled, you need to practice.

I won’t lie. Figuring out how to practice data science is hard. Before I cracked the code on how to practice data science, I tried a lot of things that just didn’t work. I tried practice methods that didn’t yield results. I wasted a lot of time.

If you try to figure everything out on your own, you’re dramatically increasing your cost (in terms of time). You’re also dramatically increasing your probability of failure.

There’s a solution to this: work with people who know how to train you.

Use courses that tell you exactly what to learn and how to practice.

Good education and training systems can help you achieve rapid progress.

Don’t try to do it alone.

A couple thousand dollars spent on the right training program can be worth 10 times that amount.

Good training systems are worth it, because they can accelerate your progress, save you massive amounts of time, and help you achieve the skill level that you need to get a data science job.

Our data science course will open again soon

Just a heads up…

Our premium data science training course, Starting Data Science, will reopen next week on July 30.

If you are interested in enrolling, make sure you sign up for our email list. Only people who are on our email list will be notified when the course opens.

Joshua Ebner

Joshua Ebner is the founder, CEO, and Chief Data Scientist of Sharp Sight. Prior to founding the company, Josh worked as a Data Scientist at Apple. He has a degree in Physics from Cornell University. For more daily data science advice, follow Josh on LinkedIn.
