python r pandas dplyr

Equivalent of R/dplyr's glimpse() function in Python for pandas DataFrames?


I find the glimpse() function in R/dplyr very useful. But as someone who is used to R and is now working in Python, I haven't found anything as useful for pandas DataFrames.

In Python, I've tried .describe(), .info(), and .head(), but none of them gives me the useful snapshot that R's glimpse() does.

Nice features which I'm quite accustomed to having in glimpse() include:

Here is some simple code you could work with:

R

library(dplyr)

test <- data.frame(column_one = c("A", "B", "C", "D"),
           column_two = c(1:4))

glimpse(test)

# The output is as follows

Rows: 4
Columns: 2
$ column_one <chr> "A", "B", "C", "D"
$ column_two <int> 1, 2, 3, 4

Python

import pandas as pd

test = pd.DataFrame({'column_one':['A', 'B', 'C', 'D'],
                     'column_two':[1, 2, 3, 4]})

Is there a single Python function that mirrors these capabilities closely (not several functions combined, and not just part of the output)? If not, how would you write a function that does the job precisely?


Solution

  • Here is one way to do it:

    def glimpse(df):
        print(f"Rows: {df.shape[0]}")
        print(f"Columns: {df.shape[1]}")
        for col in df.columns:
            print(f"$ {col} <{df[col].dtype}> {df[col].head().values}")
    

    Then:

    import pandas as pd
    
    df = pd.DataFrame(
        {"column_one": ["A", "B", "C", "D"], "column_two": [1, 2, 3, 4]}
    )
    
    glimpse(df)
    
    # Output
    Rows: 4
    Columns: 2
    $ column_one <object> ['A' 'B' 'C' 'D']
    $ column_two <int64> [1 2 3 4]
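If you want the output to look a bit closer to R's, with comma-separated values, aligned column names, and long lines cut off, something along these lines might work. This is only a sketch: the names format_glimpse, width, and max_values are made up for this example and are not part of pandas or dplyr, and the truncation rule is a guess at a reasonable behaviour rather than a copy of what dplyr does.

```python
import pandas as pd


def format_glimpse(df: pd.DataFrame, width: int = 80, max_values: int = 20) -> str:
    """Build a glimpse-like summary string (hypothetical helper, not a pandas API)."""
    lines = [f"Rows: {df.shape[0]}", f"Columns: {df.shape[1]}"]
    # Pad column names to a common width so the <dtype> tags line up.
    name_width = max((len(str(c)) for c in df.columns), default=0)
    for i, col in enumerate(df.columns):
        # Select by position so duplicate column names do not break anything.
        series = df.iloc[:, i]
        # tolist() yields plain Python values, so repr() prints 1 and 'A'.
        values = ", ".join(repr(v) for v in series.head(max_values).tolist())
        line = f"$ {str(col):<{name_width}} <{series.dtype}> {values}"
        # Assumed truncation rule: cut lines that exceed `width` characters.
        if len(line) > width:
            line = line[: width - 3] + "..."
        lines.append(line)
    return "\n".join(lines)


def glimpse(df: pd.DataFrame, width: int = 80) -> None:
    """Print the summary, mirroring how glimpse() is used in R."""
    print(format_glimpse(df, width=width))


df = pd.DataFrame(
    {"column_one": ["A", "B", "C", "D"], "column_two": [1, 2, 3, 4]}
)
glimpse(df)
```

The type tag is whatever pandas reports as the column's dtype, so it will not match R's <chr> and <int>, and the label shown for the string column depends on your pandas version. Returning the string from format_glimpse keeps it easy to log or test, while glimpse just prints it.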