Decoding the Popularity of Jupyter Among Data Scientists

May 15, 2020


Every data scientist knows how resourceful Jupyter is. Born out of IPython in 2014, it is a free, open-source, interactive web tool known as a computational notebook, which researchers can use to combine software code, computational output, explanatory text, and multimedia resources in a single document. Thanks to its multi-language, flexible, and highly interactive platform, Jupyter has risen to become one of the most influential and popular computational notebooks in recent years. It draws its name from the programming languages Julia (Ju), Python (Py), and R.

A computational notebook is an environment where users can execute code, observe results, then modify and re-run the code, in a kind of iterative conversation between researcher and data. In simpler words, notebooks are like laboratory notebooks for scientific computing: researchers embed code, data, and explanatory text to document their analyses, computational methods, hypotheses, and conjectures. Researchers also use notebooks to create tutorials or interactive manuals for their software. This makes Jupyter a versatile tool for streamlining end-to-end data science workflows.

The Jupyter Notebook has two parts: a front-end web page, where users enter program code, and a back-end kernel, which runs the code delivered by the browser and returns the results. Generally, each notebook runs one kernel and one language, though workarounds exist. Jupyter supports dozens of programming languages; one demo notebook, for instance, speaks Python, Julia, R, and Fortran.
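The division between front end and kernel is visible in the notebook file itself: a `.ipynb` document is plain JSON that records the cells the user typed and the outputs the kernel returned. As a rough sketch (the field names follow the nbformat v4 document format; the cell contents here are just illustrative), a minimal notebook can be built with nothing but the standard library:

```python
import json

# A minimal sketch of the on-disk notebook format (nbformat v4):
# a notebook is a JSON document holding a list of cells, each with a
# type ("markdown" or "code"), a source, and any recorded outputs.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {
        # The kernelspec ties the document to one back-end kernel.
        "kernelspec": {"name": "python3", "display_name": "Python 3"}
    },
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Analysis notes\n", "Explanatory text lives here."],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(2 + 2)"],
            # The kernel's reply is stored alongside the input that produced it.
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["4\n"]}
            ],
        },
    ],
}

# Serializing the dict yields a file the Jupyter front end can open.
with open("demo.ipynb", "w") as f:
    json.dump(notebook, f, indent=1)
```

Because inputs and outputs live side by side in one document, a notebook can be shared and read without re-running the kernel that produced it.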

Kernels do not necessarily run on the user's computer; notebooks can also work in the cloud or on remote machines. Google's Colaboratory project offers a Google-themed front end to the Jupyter notebook. It lets users collaborate, run code that exploits Google's cloud resources such as GPUs, and save work to Google Drive. Jupyter documents can be shared with their live code, equations, visualizations, and narrative text intact. Jupyter also provides a control panel, the Notebook Dashboard, which displays local files and lets you open notebook documents or shut down their kernels. Jupyter Notebook can be installed either with the Anaconda distribution or via the pip package manager. Key advantages of Jupyter include automatic caching of cell outputs, in-line printing of output, and language and platform flexibility.
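Both installation routes mentioned above amount to a couple of commands. A minimal sketch (assuming Python and either pip or conda are already on the machine):

```shell
# Option 1: install the classic notebook with pip
pip install notebook

# Option 2: Anaconda ships Jupyter by default; on a bare conda setup:
conda install -c conda-forge notebook

# Launch the server and open the Notebook Dashboard in the browser
jupyter notebook
```

Running `jupyter notebook` starts a local server and opens the Dashboard, from which notebooks can be created, opened, or have their kernels shut down.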

A few tools augment the usability of Jupyter for data scientists. One is JupyterHub, a service that lets institutions provide Jupyter notebooks to large pools of users. Another is Binder, an open-source service that lets users run Jupyter notebooks hosted on GitHub in a web browser without installing the software or any programming libraries.

In March, Project Jupyter announced the first public release of the Jupyter visual debugger. With it, users can set breakpoints in notebook cells and source files, inspect variables, navigate the call stack, and more. Recently, IBM announced Elyra, a new open-source AI toolkit that extends the JupyterLab user interface to simplify the development of data science and AI models. Elyra offers hybrid runtime support, a visual editor for building notebook-based AI pipelines, and Python script execution capabilities that let users edit code locally and integrate it seamlessly with the cloud.