PyCharm’s Interactive Tables for Data Science : Stanislav Garkusha

by: Stanislav Garkusha
The post below is copied from PyCharm: The Python IDE for data science and web development | The JetBrains Blog.


Data cleaning, exploration, and visualization are some of the most time-consuming tasks for data scientists. Nearly 50% of data specialists dedicate 30% or more of their time to data preparation. The pandas and Polars libraries are widely used for these purposes, each offering unique advantages. PyCharm supports both libraries, enabling users to efficiently explore, clean, and visualize data, even with large datasets.

In this blog post, you’ll discover how PyCharm’s interactive tables can enhance your productivity when working with either Polars or pandas. You will also learn how to perform many different data exploration tasks without writing any code and how to use JetBrains AI Assistant for data analysis.

Getting started 

To start using pandas for data analysis, import the library and load data from a file with pd.read_csv("FileName"), or drag and drop a CSV file into a Jupyter notebook. If you’re using Polars, import the library and use pl.read_csv("path/to/file") to load the data into a DataFrame. Then display the dataset by typing the variable name on its own as the last line of a cell.
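As a minimal sketch of this step in pandas, the example below reads CSV text from memory so it runs without a file. The columns are modeled on the Bengaluru House Dataset used later in this post (total_sqft, bath, and price appear there; location is an assumed extra column), and the rows are made up. With Polars, pl.read_csv takes the place of pd.read_csv.

```python
import io

import pandas as pd

# Made-up rows standing in for a real CSV file. With a file on disk you
# would call pd.read_csv("path/to/file.csv") instead (hypothetical path).
csv_text = """location,total_sqft,bath,price
Whitefield,1056,2,39.07
Uttarahalli,1440,2,62.0
Yelahanka,2600,5,120.0
"""

df = pd.read_csv(io.StringIO(csv_text))

# In a notebook cell, a bare variable name on the last line displays the
# DataFrame; in PyCharm, that output is rendered as an interactive table.
df
```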

PyCharm’s interactive tables – key features and uses

Browse, sort, and view datasets

Interactive tables offer a wide range of features that allow you to easily explore your data. For example, you can navigate through your data with infinite horizontal and vertical scrolling, sort by a single column or by several columns at once, and more.

You can also display columns in alphabetical order or keep their existing order, and find a specific column by typing its name in the Column List menu. Through the context menu or the Column List, you can selectively hide or display columns. For deeper analysis, you can hide all but the essential columns, or use the Hide Other Columns option to focus on a single column.
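For comparison, the sketch below shows roughly how the same sorting and column operations look in pandas code, using a small made-up DataFrame. It illustrates the idea only; it is not what PyCharm runs behind the table.

```python
import pandas as pd

# Small made-up DataFrame for illustration.
df = pd.DataFrame(
    {
        "price": [62.0, 39.07, 120.0, 51.0],
        "bath": [2, 2, 5, 2],
        "location": ["Uttarahalli", "Whitefield", "Yelahanka", "Hebbal"],
    }
)

# Multi-column sort: by bath (descending), then by price (ascending).
by_bath_then_price = df.sort_values(["bath", "price"], ascending=[False, True])

# Show columns in alphabetical order (df itself keeps its original order).
alphabetical = df[sorted(df.columns)]

# Hide one column, or keep only a single column ("Hide Other Columns").
without_location = df.drop(columns=["location"])
price_only = df[["price"]]
```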

Finally, you can open your dataframe in a separate window for even more in-depth analysis.

Explore your data 

You can see each column’s data type directly in its header. For example, one icon marks columns of the object data type, while another indicates numeric data.
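In code, the same information comes from DataFrame.dtypes. The sketch below uses made-up rows. In pandas 2.x, string columns appear as object, while newer pandas releases may report a dedicated string dtype instead, so the example groups columns as numeric versus non-numeric rather than relying on the object dtype name.

```python
import pandas as pd

# Made-up rows; "total_sqft" holds text in one row, so the whole
# column is stored as strings rather than numbers.
df = pd.DataFrame(
    {
        "location": ["Whitefield", "Uttarahalli"],
        "total_sqft": ["1056", "1133 - 1384"],
        "bath": [2, 3],
        "price": [39.07, 62.0],
    }
)

# Per-column dtypes: the information the header icons summarize.
print(df.dtypes)

# Group columns the way the icons do: numeric vs. everything else.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
non_numeric_cols = df.select_dtypes(exclude="number").columns.tolist()
```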


Additionally, you can access descriptive statistics by hovering over column headers in Compact mode or view them directly in Detailed mode, where distribution histograms are also available.
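The header statistics have a rough code counterpart in DataFrame.describe(). The sketch below uses made-up values, and the exact set of statistics PyCharm shows may differ from what describe() returns.

```python
import pandas as pd

# Made-up values for illustration.
df = pd.DataFrame(
    {
        "bath": [1, 2, 2, 3, 2, 40],
        "price": [30.0, 45.5, 51.0, 62.0, 48.0, 400.0],
    }
)

# count, mean, std, min, 25%, 50% (median), 75%, and max per numeric column
summary = df.describe()
print(summary)
```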

Create code-free data visualizations

The Chart view section of interactive tables offers several features:

  • No-code chart creation, allowing you to visualize data effortlessly.
  • Ability to save your charts with one click.
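For readers who want the code route, a similar chart can be built and saved with pandas plotting and matplotlib. This is a sketch under assumptions: matplotlib is installed, the prices are made up, and the output file name is arbitrary.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render straight to files
import matplotlib.pyplot as plt
import pandas as pd

# Made-up prices for illustration.
df = pd.DataFrame({"price": [25.0, 39.07, 45.5, 51.0, 62.0, 70.0, 120.0]})

fig, ax = plt.subplots()
df["price"].plot.hist(bins=5, ax=ax)
ax.set_xlabel("price")
ax.set_title("Distribution of price")

# Save the chart; the path is just a file in the system temp directory.
out_path = Path(tempfile.gettempdir()) / "price_hist.png"
fig.savefig(out_path)
plt.close(fig)
```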

Use AI Assistant for data analysis and visualization

You can access AI Assistant in the upper-left corner of the tables for the following purposes:

  • To get quick insights about your data.
  • To visualize your data.

Using interactive tables for reliable Exploratory Data Analysis (EDA)

Why is EDA important? 

Exploratory Data Analysis (EDA) is a crucial step in data science, as it allows data scientists to understand the underlying structure and patterns within a dataset before applying any modeling techniques. EDA helps you identify anomalies, detect outliers, and uncover relationships among variables – all of which are essential for making informed decisions.

Interactive tables offer many features that allow you to explore your data faster and get reliable results.

Spotting statistics, patterns, and outliers 

Viewing the dataset information

Let’s look at a real-life example of how the tables can boost the productivity of your EDA. For this example, we will use the Bengaluru House Dataset. We normally start with an overview of the data, viewing it to understand the size of the dataset, the data types of the columns, and so on. You can certainly do this with code, but interactive tables give you this information without any. In our example, the table header shows that the dataset has 13,320 rows and 9 columns.
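In code, the same overview comes from df.shape and df.info(). Applied to the full Bengaluru House Dataset, df.shape should match the table header, that is (13320, 9); the three-row frame below is made up so the sketch runs on its own.

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame(
    {
        "location": ["Whitefield", "Uttarahalli", "Yelahanka"],
        "bath": [2, 2, 5],
        "price": [39.07, 62.0, 120.0],
    }
)

n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")

# Column names, non-null counts, and dtypes in one summary.
df.info()
```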


Our dataset also contains different data types, including numeric and string data. This means we can apply a range of techniques to it, such as correlation analysis of the numeric columns.


And of course, you can browse the data itself with infinite scrolling and the other features mentioned above.

Performing statistical analysis

After getting acquainted with the data, the next step might be more in-depth analysis of the statistics. PyCharm provides a lot of important information about the columns in the table headers, including missing data, mode, mean, median, and so on.
For example, here we see that many columns have missing data. The “bath” column very likely contains an outlier, as its max value far exceeds the 95th percentile.
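A rough pandas equivalent of these header statistics is sketched below on made-up data. The outlier check at the end (max more than twice the 95th percentile) is an assumed rule of thumb for illustration, not a criterion PyCharm uses.

```python
import numpy as np
import pandas as pd

# Made-up data: each column has one missing value, and "bath" has one
# extreme value (40) among otherwise small counts.
df = pd.DataFrame(
    {
        "bath": [1, 2, 2, 2, 3] * 4 + [40, np.nan],
        "price": [40.0, 55.0, 48.0, 62.0, 70.0] * 4 + [np.nan, 400.0],
    }
)

# Missing values per column.
missing = df.isna().sum()

# Statistics of the kind shown in the column headers (NaN is skipped).
bath = df["bath"]
stats = {
    "mean": bath.mean(),
    "median": bath.median(),
    "mode": bath.mode().iloc[0],
    "p95": bath.quantile(0.95),
    "max": bath.max(),
}
print(missing.to_dict())
print(stats)

# Assumed heuristic for this sketch, not a rule PyCharm applies:
# treat a max more than twice the 95th percentile as a likely outlier.
likely_outlier = stats["max"] > 2 * stats["p95"]
```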

Additionally, data type mismatches, such as “total_sqft” not being a float or integer, indicate inconsistencies that could impact data processing and analysis.

Sorting the column reveals one possible cause: some entries contain text or ranges instead of plain numeric values.
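To find such entries in code, one option is pd.to_numeric with errors="coerce", which turns unparseable values into NaN. The sketch below uses made-up values in the formats described above. Converting a range to its midpoint is a hypothetical cleaning choice, and other values (such as ones with units) are left as NaN for separate handling.

```python
import pandas as pd

# Made-up total_sqft values: plain numbers, a range, and text with units.
total_sqft = pd.Series(["1056", "2600", "1133 - 1384", "34.46Sq. Meter", "1200"])

# Values that cannot be parsed as numbers become NaN.
as_number = pd.to_numeric(total_sqft, errors="coerce")
problem_rows = total_sqft[as_number.isna()]
print(problem_rows.tolist())


def range_midpoint(value: str) -> float:
    """Hypothetical rule: map an "a - b" range to (a + b) / 2, else NaN."""
    parts = value.split("-")
    if len(parts) != 2:
        return float("nan")
    try:
        low, high = float(parts[0]), float(parts[1])
    except ValueError:
        return float("nan")
    return (low + high) / 2


# Keep parsed numbers; fill the gaps with range midpoints where possible.
cleaned = as_number.fillna(total_sqft.map(range_midpoint))
```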

Analyzing the data using AI

Additionally, if the dataset doesn’t have hundreds of columns, we can ask AI Assistant to explain the DataFrame. From there, we can prompt it with follow-up questions, such as “What data problems in the dataset should be addressed and how?”


Visualizing data with built-in charting

In some cases, data visualization can help you understand your data. PyCharm’s interactive tables provide two options for this: Chart View and Generate Visualizations in Chat.

Let’s say my hypothesis is that the price of a house should be correlated with its total floor area. In other words, the bigger a house is, the more expensive it should be. In this case, I can use a scatter plot in Chart View and discover that my hypothesis is likely correct.
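A scatter plot is a visual check; a numeric companion is the Pearson correlation coefficient. The sketch below computes it with Series.corr on made-up values. On the real dataset, total_sqft would first need to be converted to numbers, as discussed earlier.

```python
import pandas as pd

# Made-up floor areas and prices for illustration.
df = pd.DataFrame(
    {
        "total_sqft": [600, 1056, 1200, 1440, 2600],
        "price": [25.0, 39.07, 51.0, 62.0, 120.0],
    }
)

# Pearson correlation coefficient in [-1, 1]; values near 1 mean that
# larger floor areas tend to go with higher prices.
r = df["total_sqft"].corr(df["price"])
print(round(r, 3))
```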

Wrapping up

PyCharm Professional’s interactive tables offer numerous benefits that significantly boost your productivity in data exploration and data cleaning. They let you work with pandas, the most popular data science library, and the fast-growing Polars library without writing any code, thanks to features like browsing, sorting, and viewing datasets; code-free visualizations; and AI-assisted insights.

Interactive tables in PyCharm not only save you time but also reduce the complexity of data manipulation tasks, allowing you to focus on deriving meaningful insights instead of writing boilerplate code for basic tasks.

Download PyCharm Professional and get an extended 60-day trial by using the promo code “PyCharmNotebooks”. The free subscription is available for individual users only.

For more information on interactive tables in PyCharm, check out our related blogs, guides, and documentation:


October 01, 2024 at 08:02PM

