If you’re looking to become a professional data scientist, you’ll need to learn at least one programming language. So a smart question to ask is: should I learn R or Python? How do you decide between the two most popular programming languages for data analysis? If you’re interested in their respective strengths and weaknesses, read on!
As a data scientist, you will almost certainly need to learn Structured Query Language, or SQL. SQL is the de facto language of relational databases, where most corporate information still resides. But SQL only lets you retrieve the data, not clean it up or run models against it. That’s where Python and R come in.
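To make that division of labor concrete, here is a minimal Python sketch. It assumes a throwaway in-memory SQLite database with a hypothetical `sales` table (the table, columns, and values are invented for illustration): SQL retrieves the raw rows, and Python then cleans them and computes a simple per-region average.

```python
import sqlite3
from statistics import mean

# Build a throwaway in-memory database with a hypothetical "sales" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(" north", 120.0), ("North ", 80.0), ("south", None), ("South", 50.0)],
)

# Step 1: SQL retrieves the raw rows.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
conn.close()

# Step 2: Python cleans them by normalizing region names
# and dropping rows with a missing amount.
cleaned = [
    (region.strip().lower(), amount)
    for region, amount in rows
    if amount is not None
]

# Step 3: a simple summary, the average amount per region.
averages = {}
for region in sorted({r for r, _ in cleaned}):
    averages[region] = mean(a for r, a in cleaned if r == region)

print(averages)  # {'north': 100.0, 'south': 50.0}
```

In a real project the query would run against a production database and the cleaning and modeling steps would be far richer, but the split of responsibilities is the same.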
A little background on R
R was created by Ross Ihaka and Robert Gentleman, two statisticians from the University of Auckland in New Zealand. It was initially released in 1995, and the first stable version (1.0) followed in 2000. It’s an interpreted language (you don’t need to run it through a compiler before running the code) and has an extremely powerful suite of tools for statistical modeling and graphing.
For programming nerds: R is an implementation of S, a statistical programming language developed in the 1970s at Bell Labs, and it was inspired by Scheme, a dialect of Lisp. It’s also extensible: advanced users can write code in languages such as C and C++ that works directly with R objects.
R is free and has become increasingly popular at the expense of traditional commercial statistical packages like SAS and SPSS. Most users write and edit their R code using RStudio, an Integrated Development Environment (IDE) for coding in R.
As a side note: the charts above and below show each language’s relative popularity based on how many GitHub pull requests are made per year in that language. They are based on data from GitHut 2.0, created by littleark.
A little background on Python
Python has also been around for a while. Guido van Rossum first released it in 1991 as a general-purpose programming language. Like R, it’s an interpreted language, and it has a comprehensive standard library that makes many common tasks easy to program without installing additional libraries. Beyond the standard library, Python has some of the most robust third-party libraries available, and they’re free as well.
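As a small illustration of that standard library, the sketch below parses CSV text, computes summary statistics, and serializes the result as JSON using only modules that ship with Python (`csv`, `io`, `statistics`, `json`). The names and scores are made-up sample data.

```python
import csv
import io
import json
import statistics

# Made-up CSV data; in practice this might come from open("scores.csv").
raw = """name,score
alice,82
bob,91
carol,77
dave,91
"""

# Parse the CSV into dictionaries keyed by the header row.
reader = csv.DictReader(io.StringIO(raw))
scores = [int(row["score"]) for row in reader]

# Compute common summary statistics with the statistics module.
summary = {
    "count": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
}

# Serialize the result as JSON, e.g. to save it or send it to another system.
print(json.dumps(summary))
# {"count": 4, "mean": 85.25, "median": 86.5, "mode": 91}
```

Nothing here needs `pip install`; third-party libraries become worthwhile once the data grows or the analysis gets more sophisticated.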
For data science, there are a number of extremely powerful Python libraries: NumPy (efficient numerical computations), pandas (a wide range of tools for data cleaning and analysis), and statsmodels (common statistical methods). There are also TensorFlow, Keras, and PyTorch, all libraries for building artificial neural networks (deep learning systems).
These days, many data scientists using Python write and edit their code using Jupyter Notebooks. Jupyter Notebooks make it easy to create documents that mix prose, code, data, and visualizations, so you can document your process and other data scientists can review and replicate your work.
Picking languages for data science
Historically, the data science and data analysis community has been fairly evenly split, and the choice between R and Python for data science often came down to a scientist’s background. Data scientists with a stronger academic or mathematical background typically preferred R, whereas those with more of a programming background tended to prefer Python.
The strengths of Python
Compared to R, Python is a general-purpose language
Python is a general-purpose programming language. It’s great for statistical analysis, but it’s also the more flexible, capable choice if you want to build a website for sharing your results or a web service that integrates easily with your production systems.
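As a rough sketch of what "a web service for sharing your results" can look like, here is a minimal WSGI application built only on the standard library. It serves a dictionary of placeholder results (the `RESULTS` values and the `/results` path are hypothetical) as JSON; the `serve` helper is also an illustrative name and is not called automatically.

```python
import json
from wsgiref.simple_server import make_server

# Placeholder analysis results to share with other systems (hypothetical values).
RESULTS = {"model": "linear_regression", "rows_analyzed": 4}


def app(environ, start_response):
    """Minimal WSGI app: serves RESULTS as JSON at /results, 404 elsewhere."""
    if environ.get("PATH_INFO") == "/results":
        body = json.dumps(RESULTS).encode("utf-8")
        status = "200 OK"
    else:
        body = json.dumps({"error": "not found"}).encode("utf-8")
        status = "404 Not Found"
    start_response(
        status,
        [("Content-Type", "application/json"),
         ("Content-Length", str(len(body)))],
    )
    return [body]


def serve(port=8000):
    """Run the app on the standard-library reference server (blocks forever)."""
    with make_server("", port, app) as httpd:
        httpd.serve_forever()
```

To try it locally, you could call `serve()` and open http://localhost:8000/results in a browser. A production service would more likely use a web framework such as Flask or Django, but the standard-library version keeps the sketch dependency-free.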
Python is much more popular than R
In the September 2019 TIOBE index of the most popular programming languages, Python ranked third across all of software development, with its rating up by more than 2 percentage points over the previous year, whereas R dropped from 18th to 19th place over the same period.
R vs Python for deep learning — Python is again more popular
Most serious deep learning projects use either TensorFlow or PyTorch. Both work well with Python, and while there is now an R interface for TensorFlow, far more deep learning work is done in Python than in R. That matters more and more as deep learning becomes applicable to a widening range of domains: it started with computer vision, and it is now becoming the default approach for most natural language processing tasks as well.
Python is also extremely popular in big data, artificial intelligence, and machine learning. Lastly, it is widely used to build web applications, particularly on the server side.
Python is more similar to other languages than R is
Conclusion — it’s better to learn Python before you learn R
There are still plenty of jobs that require R, so if you have the time, it doesn’t hurt to learn both. But these days, I’d suggest that Python is becoming the dominant programming language for data scientists and is the better first choice to focus on.
Flatiron School covers Python extensively in our Data Science program, our 15-week course that teaches you all the skills you need to start a career in data.
Head of Data Science