To do this, pandas has two DataFrame/Series methods, head and tail, that let you see the first/last few rows (5 if not specified).
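As a quick illustration, here is a minimal sketch on a small made-up DataFrame (the column names id and score are assumptions chosen only for this example):

```python
import pandas as pd

# Hypothetical toy data, purely for illustration
df = pd.DataFrame({
    "id": range(1, 11),
    "score": [x * 1.5 for x in range(1, 11)],
})

first_rows = df.head()   # first 5 rows, since no number is given
last_rows = df.tail(3)   # last 3 rows

print(first_rows)
print(last_rows)
```

Both methods return a new DataFrame (or Series), so you can chain further calls onto the result.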
If you want to look at a few rows at random, you can use the sample method.
It is also handy when working with a large dataset: you can first use
sample to generate a random subset, develop your code on that, then apply your code to the full dataset.
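The workflow above can be sketched roughly as follows. The data, the 10% fraction, and the summarize helper are all hypothetical stand-ins for whatever you are actually working on:

```python
import pandas as pd

# Hypothetical stand-in for a large dataset
df = pd.DataFrame({"id": range(1000), "value": [i % 7 for i in range(1000)]})

# Peek at 5 random rows; random_state makes the draw reproducible
peek = df.sample(n=5, random_state=0)

# Draw a 10% random subset to develop against
subset = df.sample(frac=0.1, random_state=0)

def summarize(frame):
    # Hypothetical analysis step: count rows per distinct value
    return frame.groupby("value")["id"].count()

dev_result = summarize(subset)   # fast iteration on the subset
full_result = summarize(df)      # same code, applied to the full data
```

Passing n gives a fixed number of rows, while frac gives a fraction of the dataset; use one or the other, not both.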
There is also a method called info that I absolutely adore.
df.info() gives you information on the DataFrame’s shape, indices, column names, number of non-null values in each column, and the data types.
Here is an example:
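A minimal sketch on a tiny made-up DataFrame (the columns name and age and their values are assumptions for illustration only). By default df.info() prints straight to the console; here the optional buf argument is used to capture the same text in a string first:

```python
import io
import pandas as pd

# Hypothetical toy data with a few missing values
df = pd.DataFrame({"name": ["a", "b", None], "age": [30, None, 25]})

# df.info() prints directly; passing buf captures the text instead
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()

print(summary)
```

The printed summary lists the index range, each column with its non-null count and dtype, and the memory usage.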
Note: Do not confuse this with the
info attribute (
To wrap up the EDA series, here is the customized eda function I use whenever working on a new dataset:
Note that I’ve made each part (
duplicated) modular so you can toggle any one of them on or off.
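To illustrate the toggle idea, here is a minimal sketch of how such a function could be laid out. This is a hypothetical simplified version, not the exact function: apart from duplicated, the parameter names (head, info, describe) are assumptions.

```python
import pandas as pd

def eda(df, head=True, info=True, describe=True, duplicated=True):
    """Print a quick overview of df; each section can be switched off."""
    if head:
        print(df.head())
    if info:
        df.info()
    if describe:
        print(df.describe())
    if duplicated:
        # Count rows that are exact copies of an earlier row
        n_dup = int(df.duplicated().sum())
        print(f"Duplicated rows: {n_dup}")

# Hypothetical toy data; the last row repeats the second one
df = pd.DataFrame({"a": [1, 2, 2], "b": ["x", "y", "y"]})

# Skip the describe section, keep everything else
eda(df, describe=False)
```

Using one boolean keyword argument per section keeps the call site readable: you only mention the parts you want to turn off.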
And here is an example of the output:
That’s it! Hope you’ve enjoyed the EDA series!