SkillRary

Please login to post comment

Introduction to Jupyter Notebook

  • Amruta Bhaskar
  • May 13, 2021
  • 0 comment(s)
  • 1604 Views

The Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects. This article will walk you through how to use Jupyter Notebooks for data science projects and how to set it up on your local machine.

First, though: what is a “notebook”?

A notebook integrates code and its output into a single document that combines visualizations, narrative text, mathematical equations, and other rich media. In other words: it's a single document where you can run code, display the output, and also add explanations, formulas, charts, and make your work more transparent, understandable, repeatable, and shareable.

Using Notebooks is now a major part of the data science workflow at companies across the globe. If your goal is to work with data, using a Notebook will speed up your workflow and make it easier to communicate and share your results.

Best of all, as part of the open source Project Jupyter, Jupyter Notebooks are completely free. You can download the software on its own, or as part of the Anaconda data science toolkit.

Although it is possible to use many different programming languages in Jupyter Notebooks, this article will focus on Python, as it is the most common use case.

Installation

The easiest way for a beginner to get started with Jupyter Notebooks is by installing Anaconda.

Anaconda is the most widely used Python distribution for data science and comes pre-loaded with all the most popular libraries and tools.

Some of the biggest Python libraries included in Anaconda include NumPy, pandas, and Matplotlib, though the full 1000+ list is exhaustive.

Anaconda thus lets us hit the ground running with a fully stocked data science workshop without the hassle of managing countless installations or worrying about dependencies and OS-specific (read: Windows-specific) installation issues.

To get Anaconda, simply:

  • Download the latest version of Anaconda for Python 3.8.
  • Install Anaconda by following the instructions on the download page and/or in the executable.

Originally developed for data science applications written in Python, R, and Julia, Jupyter Notebook is useful in all kinds of ways for all kinds of projects:

  • Data visualizations. Most people have their first exposure to Jupyter Notebook by way of a data visualization, a shared notebook that includes a rendering of some data set as a graphic. Jupyter Notebook lets you author visualizations, but also share them and allow interactive changes to the shared code and data set.
  • Code sharing. Cloud services like GitHub and Pastebin provide ways to share code, but they’re largely non-interactive. With a Jupyter Notebook, you can view code, execute it, and display the results directly in your web browser.
  • Live interactions with code. Jupyter Notebook code isn’t static; it can be edited and re-run incrementally in real time, with feedback provided directly in the browser. Notebooks can also embed user controls (e.g., sliders or text input fields) that can be used as input sources for code.
  • Documenting code samples. If you have a piece of code and you want to explain line-by-line how it works, with live feedback all along the way, you could embed it in a Jupyter Notebook. Best of all, the code will remain fully functional—you can add interactivity along with the explanation, showing and telling at the same time.

Jupyter Notebooks can include several kinds of ingredients, each organized into discrete blocks:

  • Text and HTML. Plain text, or text annotated in the Markdown syntax to generate HTML, can be inserted into the document at any point. CSS styling can also be included inline or added to the template used to generate the notebook.
  • Code and output. The code in Jupyter Notebook notebooks is typically Python code, although you may add support in your Jupyter environment for other languages such as R or Julia. The results of executed code appear immediately after the code blocks, and the code blocks can be executed and re-executed in any order you like, as often as you like.
  • Visualizations. Graphics and charts can be generated from code, by way of modules like Matplotlib, Plotly, or Bokeh. Like output, these visualizations appear inline next to the code that generates them. However, code can also be configured to write them out to external files if needed.
  • Multimedia. Because Jupyter Notebook is built on web technology, it can display all the types of multimedia supported in a web page. You can include them in a notebook as HTML elements, or you can generate them programmatically by way of the IPython.display module.
  • Data. Data can be provided in a separate file alongside the .ipynb file that constitutes a Jupyter Notebook notebook, or it can be imported programmatically—for instance, by including code in the notebook to download the data from a public Internet repository or to access it via a database connection.
Please login to post comment

( 0 ) comment(s)