How much data science do you actually remember?

How many data science books have you read? 5? 10? A few dozen?

How many free online courses have you taken? A few?

How many blog posts have you read? (I’d be willing to bet: you’ve read dozens.)

If you’re like most budding data scientists, you’ve probably consumed a lot of material. You probably even learned some of it.

The problem, though, is that the vast majority of people learn but then quickly forget.

There’s a difference between learning and remembering

Here’s an example:

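library(ggplot2)

# Scatterplot of diamond weight (carat) vs. price, using ggplot2's built-in diamonds dataset
ggplot(data = diamonds, aes(x = carat, y = price)) +
    geom_point(color = "darkred") +
    labs(title = "Diamond weight vs price") +
    # Custom title font; the "Verdana" family must be available to your graphics device
    theme(plot.title = element_text(family = "Verdana", size = 20))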

If you’re like most aspiring data scientists, you’ll try to learn this code by using the copy-and-paste method. You’ll take this code from a blog post like this, copy it into RStudio and run it.

Most aspiring data scientists do the exact same thing with online courses. They’ll watch a few videos, open the course’s sample code, and then copy-and-paste the code.

Watching videos, reading books, and copy-and-pasting code do help you learn, at least a little. If you watch a video about ggplot2, you’ll probably learn how it works pretty quickly. And if you copy-and-paste some ggplot2 code, you’ll probably learn a little bit about how the code works.

Here’s the problem: if you learn code like this, you’ll probably forget it within a day or two.

This is critical: there’s a big difference between learning and remembering. Learning is actually the easy part.

The hard part is remembering.

To master data science, you need to remember

True mastery requires remembering.

It’s not enough to just “learn.” If you learn some R syntax (or a data science concept) today, it’s effectively worthless if you forget it by next week.

Then what? If you forget what you learn, what will happen in an interview? What will happen in a fast-paced job?

You need to remember data science to get a job

Let’s say that you’re trying to get a data science job. You take a few online courses, buy a few books, and copy-and-paste some code.

Maybe you get a few certificates. You proudly put them on your resume.

And you diligently send out resumes for data science jobs.

Eventually, you get an interview.

When you walk into that interview, you had better know your stuff. You need to really, really know it. Remember it.

Any good employer is going to test you. Great companies will grill you in an interview. They’ll ask you questions. They’ll ask you to write sample code.

You had better remember …

What happens if you forget? What if, when they ask you to make a simple scatterplot in R, you forget? What if you can’t do it?

Will you tell them, “I don’t remember how to write the code, but I have a data science certificate from an online course”?

If you can’t write the code fluently, on command, no good company will hire you.

I can’t emphasize this enough: if you want to get a good data science job, you need to really know your stuff. You need to remember how to write the code, from memory, on command. That is how you ace an interview. You show them, in person, that you can write R code with your eyes closed. You show them that you can do the work. That you can get things done.

As long as you’re a decent person and don’t have any major character flaws, your mastery of R and data science concepts will help you nail the interview and get the job.

You need to remember data science to get things done

Ok. Let’s say that you do get a data job.

The company that hired you is “fast-paced.” I’ll point out here that “fast-paced” is a code word for an environment where there is a lot of work, and a lot of pressure to perform. Keep in mind that these days, almost all companies call themselves “fast-paced.” Almost all companies expect people to “do more with less.” That’s just the way that things are right now … most good companies operate lean, and they expect you to perform.

What happens when you get thrown into a demanding data science job? Will you be able to handle it? Will you be able to perform?

I know a few people who, either through luck or guile, have obtained good data jobs at good companies, even though they aren’t good data scientists. These are the people who haven’t even mastered the basics. You give them a task, and they don’t really know how to do it. You ask them to write some code, and they have to search for how to do it on Google.

Now, I want to be clear: every data scientist needs to look things up once in a while. If you’re working on a hard project and using advanced procedures, you’ll need to consult reference materials sometimes. That’s pretty normal.

I’m talking about guys who have to look up the basics. They are “copy and paste” coders.

These people are always the underperformers of the team. They struggle to get their work done on time and at a high level of quality. They struggle to remember how to get things done: how to write the code; how to apply the data science concept.

The problem is that they simply don’t remember the syntax. They don’t remember the concepts and principles. They don’t remember the critical information.

Because they can’t remember the things they need to know, these people are always struggling to perform, especially under pressure. They produce code that’s low quality. They work slowly. They ultimately struggle to pull their own weight, and other people need to do more work because of them.

If you get a job as a data scientist, will you be an underperformer? The guy who doesn’t pull his own weight? One of the data scientists who struggles to get the work done? Who’s always stressed?

Or do you want to be an elite performer? The go-to person who always gets the job done, quickly and effectively (and is rewarded accordingly)?

When learning data science, focus on long-term memory

It’s not enough to just learn. You need to learn and remember in the long run. You need to learn and remember in order to master the material, get the job, and become a top performer.

So, it doesn’t matter how many data science courses you take. It doesn’t matter how many data science books you buy and read. What matters, at the end of the day, is how much you remember. How much you can do “on command.”

You need to “learn how to learn”

To get to a level where you learn and remember data science, you need to “learn how to learn.”

Let me say that again: you need to learn how to learn.

Learning is a skill. If you want to master data science, you need to know how to learn, so that you can learn data science efficiently and effectively. You need to be able to learn and not forget.

Ironically, while people in the tech industry are ecstatic about machine learning, no one is paying any attention to human learning. I won’t explain my complete thoughts on human learning in this blog post (I’ll eventually write a separate blog post), but I think that human learning is one of the most important subjects of this century.

In any case, if you want to master data science quickly, you need to learn how to learn.

You need to learn “how to practice”

Part of learning “how to learn” is understanding “how to practice.”

Data science has multiple facets. It has a knowledge base that you need to know (a set of concepts that you need to understand). But data science is also a set of skills.

As I’ve already emphasized, to be a great data scientist, you need to be able to write code.

Learning to write code is quite a bit like learning to play piano or guitar. It’s a skill. And to master it, you need to practice.

This is one of the biggest gaps in data science education today. No one talks about how to practice.

The best guitar players practice relentlessly. When they first start out, they have drills that they run through in order to learn and master basic techniques. As they progress, they move on to other drills in order to learn and master new techniques. As they continue to develop, many of them supplement their practical, hands-on skill (i.e., playing the notes) with music theory. This further enhances their understanding and their skill.

Learning data science should work much the same way. There is certainly a theoretical component (similar to music theory), but in practical terms, writing data science code is a hands-on skill. It’s something that you do, not just something that you know.

To really master data science, you need to learn and remember. But not just learn and remember in your head. Data science is something that you do. You need to “remember with your hands.”

To do that, you need to practice.

And to practice efficiently and effectively, you need to know how to practice.

Discover how to learn, practice, and remember data science

If you want to discover how to learn, practice, and ultimately remember data science, sign up for our email list.

In the coming weeks and months, Sharp Sight will be writing extensively on data science learning.

If you sign up for the email newsletter, you’ll learn the tips, hacks, and strategies for rapidly learning data science (so that you never forget).

Joshua Ebner

Joshua Ebner is the founder, CEO, and Chief Data Scientist of Sharp Sight. Prior to founding the company, Josh worked as a Data Scientist at Apple. He has a degree in Physics from Cornell University. For more daily data science advice, follow Josh on LinkedIn.

9 thoughts on “How much data science do you actually remember?”

  1. Thanks for posting this article. I enjoyed reading it, and I found it thought-provoking. I actually disagree in some ways. Yes, you need to be fluent in data analysis to the point where you know what your strategy is going to be and what tools you will need to execute on it. But I don’t think it’s a good use of time to memorize, e.g., the exact syntax of anything but your most bread-and-butter tools.

    As someone who slogged through a quantitative PhD after years in data engineering, this might sound like blasphemy. But my experience has been that every tool has a shelf life. Every single one. We all used Perl in the early 2000s. A couple of years ago, dplyr was nowhere to be found. Even just a few months ago, you used lapply where today you might use purrr.

    Any package you are using, no matter how essential or basic it seems now, is going to get replaced with something easier, more elegant, higher-level. Even the way you think and talk about data analysis is going to evolve. What is not going to change is the need for a cold, clear-eyed way of looking at a problem and building a game plan on the fly.

    If I were interviewing a new data scientist, I’d give them 72 hours to do a challenge problem. I would choose a problem that someone skilled could do in 3-4 hours. Then I would ask them about their thought process on site and throw a couple of new curveballs at them to think through in pseudocode. I’d want to hear how they think.

    • Very well articulated, David. I completely subscribe to what you said. I also believe that data science is all about a thinking process: how you want to solve the problem at hand. R and Python are only a means to help you execute that thinking process. If you remember all the syntax of R and Python but lack the thinking process that helps you solve the problem, would you be able to create any value? Absolutely not.

  2. Honestly, I think it is sad that this point has to be stressed. There’s a reason statistics is a multi-year syllabus in its own right, and econometrics is its own subject for economists. There may be some subjects one can bullshit his way through, but being a proficient data wrangler is not one of them.

    As a reply to the other comments: Just like a craftsman, one has to be proficient in the tools one uses. The business side clients need results, not concepts.

  3. Ouch. This hit home for me. I’ve been in the data industry for a few years, and recently I’ve been making a real effort to learn practical machine learning so that I can move into a data science role. I’m basically the person you describe in the lead-in. I’m not sure if I’m an underperformer, but gosh, this article is really making me think about how I’m learning.

    For example, I’ve got Programming Collective Intelligence right now and have just started working through it. All I’ve done so far is copy/paste code and run it, and when it came to the exercises, I sort of half-assed them. I’m resolving to go back and really apply what I learned so that I remember it. Thanks for the shot in the arm.

    • Good to hear that you liked it.

      Remembering syntax is the first step to becoming “fluent” in a programming language.

      Once you have the syntax memorized, writing code to do real work becomes a lot easier. Productivity increases dramatically.
