Styling complex values in a dataframe

Cees Roele
Oct 26, 2022

Displaying one value styled using a second value in a pandas dataframe

Term Score Matrix: terms coloured with a qualitative colormap; no gradient represented (image by author)

Introduction

Pandas dataframes can be used for a styled presentation of data. In this article I will show how to use one value in a cell to provide the style for a second value. This is relevant for applications like the Term Score Matrix shown at the top of this article, where the terms are displayed while their scores define the styling.

Basic styling with pandas dataframes

Based on the minimum and maximum values in each column (or, with axis=None, in the whole dataframe), pandas can create a background gradient for the cells.

import numpy as np
import pandas as pd

# Create a 5x5 matrix of random numbers in [-0.3, 0.7)
m = np.random.rand(5, 5) - 0.3
df = pd.DataFrame(m)
df.style.background_gradient()
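Under the hood, background_gradient scales each value to a fraction between a minimum and a maximum before picking a colour from the colormap: per column by default (axis=0), or over the whole frame with axis=None. The sketch below reproduces only that scaling step with plain pandas so the two variants can be compared; it does not reproduce the colours, and the fixed random seed is only there for repeatability.

```python
import numpy as np
import pandas as pd

# Same kind of data as above: a 5x5 matrix of random numbers in [-0.3, 0.7)
rng = np.random.default_rng(seed=0)
df = pd.DataFrame(rng.random((5, 5)) - 0.3)

# axis=0 (the default): each column is scaled by its own minimum and maximum
per_column = (df - df.min()) / (df.max() - df.min())

# axis=None: one minimum and one maximum for the entire frame
lo, hi = df.to_numpy().min(), df.to_numpy().max()
whole_frame = (df - lo) / (hi - lo)

print(per_column.round(2))
print(whole_frame.round(2))
```

With per-column scaling every column contains both a 0 and a 1, so the darkest and lightest colours appear in every column; with axis=None they appear only once in the whole table.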

The result looks like this:

Background gradient on a 5x5 matrix with random numbers (image by author)

Applying CSS to all cells

We can use DataFrame.style.applymap to apply a CSS-styling function to every cell individually; the returned styles are added to the HTML representation. The method takes a function that receives the value of a cell and returns CSS styles as a string in the format ‘attribute: value; attribute2: value2; …’, or an empty string or None if nothing is to be applied to that cell. Any additional keyword arguments given to applymap are passed on to that function. (Since pandas 2.1, Styler.applymap is deprecated in favour of Styler.map, which behaves the same; likewise, DataFrame.applymap, used later in this article, has become DataFrame.map.)

Let’s highlight the negative values in the above matrix.

def style_negative(v, props=''):
    return props if v < 0 else None

df.style.applymap(style_negative, props='color:red;')

The result is:

Styling: negative values coloured red with applymap (image by author)

Complex values

Pandas documentation on applying styles to tables is limited to discussing how the actual values in a dataframe are to be styled.

But what if we have complex values in our dataframe, e.g. a word/value pair, where the word is to be displayed and the value is used to define the styling?

Let’s create a dataframe of (word, length) pairs out of the words of a poem:

import re

import pandas as pd

# "A Visit from St. Nicholas" by Clement Clarke Moore
# https://www.poetryfoundation.org/poems/43171/a-visit-from-st-nicholas
poem = """
'Twas the night before Christmas, when all through the house
Not a creature was stirring, not even a mouse;
The stockings were hung by the chimney with care,
In hopes that St. Nicholas soon would be there;
"""
words = re.split(r'\W+', poem)
pairs = [(x, len(x)) for x in words]
# Create a 5x5 matrix of tuples: (word, length)
matrix = [pairs[5 * i:5 * (i + 1)] for i in range(5)]
df2 = pd.DataFrame(matrix)
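One detail worth knowing: re.split returns an empty string for a separator at the very start or end of the text, and the poem starts with a newline and an apostrophe, so pairs begins with ('', 0) and the top-left cell of df2 is empty. If only real words are wanted, re.findall(r'\w+', ...) avoids the empty strings. The sketch below, on the first two lines of the poem, shows the difference and builds a 2x5 frame of tuples.

```python
import re

import pandas as pd

poem = """
'Twas the night before Christmas, when all through the house
Not a creature was stirring, not even a mouse;
"""

split_words = re.split(r'\W+', poem)    # separators at both ends yield ''
found_words = re.findall(r'\w+', poem)  # only the runs of word characters

print(split_words[:3])   # ['', 'Twas', 'the']
print(found_words[:3])   # ['Twas', 'the', 'night']

# Build a 2x5 frame of (word, length) tuples from the filtered words
pairs = [(w, len(w)) for w in found_words]
df_pairs = pd.DataFrame([pairs[5 * i:5 * (i + 1)] for i in range(2)])
```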

The dataframe looks like:

A dataframe of (word, length) pairs (image by author)

Displaying words only

To control how the value of a cell is displayed, we use the Styler’s format method. Of the different possibilities this method offers, we are only interested in passing it a function that converts the value of a cell to whatever is to be displayed: here, the word, which is the first element of the tuple in each cell.

df2.style.format(lambda x: x[0])

Here is the result:

Display only the first element of the tuples in the dataframe (image by author)

Styling values

As we did earlier when colouring negative values, we use applymap to style the cells. Here we use the second element of each pair to apply a font weight and font colour if the word is shorter than four characters. We must now apply both format and applymap to DataFrame.style, a Styler object.

def font_short(v, weight, color):
    """
    Mark short words

    Parameters
    ----------
    v: tuple of (word, length)
    weight: CSS term accepted as a font-weight value
    color: CSS term accepted as a color
    """
    return f"font-weight: {weight}; color: {color}" if v[1] < 4 else None

# We must apply both `format` and `applymap` to the DataFrame.style
styler = df2.style.format(lambda x: x[0])
styler.applymap(font_short, weight='bold', color='orange')

Now we see a matrix of words, as above, but with the words shorter than four characters being marked bold and orange:

Mark words shorter than four characters (image by author)

Using a gradient to represent word lengths

Next, let’s represent the length of words as a gradient: long words become dark, short words light. To do this, we convert the length of each word to a fraction between 0 and 1, relative to the lengths of the shortest and the longest word: (length - min_length) / (max_length - min_length).

We use matplotlib to convert that fraction to a hexadecimal colour code, which we then use to define the background colour in CSS. Any matplotlib colormap will work, but to represent a gradient we must use a sequential colormap.

By contrast, the image at the top of this article is based on values that decline from left to right in every row, but this decline is not visible because a qualitative colormap is used instead of a sequential one.

Below I use the colormap YlGn, which is short for “yellow-to-green”, where yellow stands for a low value and green for a high value.

import matplotlib as mpl

def make_gradient(v, min_length, max_length, cmap='YlGn'):
    """
    Parameters
    ----------
    v: tuple of (word, length)
    min_length: int
        minimum length of all words in the matrix
    max_length: int
        maximum length of all words in the matrix
    cmap: name of a matplotlib colormap, default value here is 'YlGn'

    Returns
    -------
    string:
        CSS setting the background colour

    For Matplotlib colormaps:
    See: https://matplotlib.org/stable/tutorials/colors/colormaps.html
    """
    # normalize the word length as a fraction of the range
    # between min_length and max_length
    rel_v = (v[1] - min_length) / (max_length - min_length)
    # look up the colormap by name
    # (mpl.cm.get_cmap was deprecated in Matplotlib 3.7 and removed in 3.9)
    colormap = mpl.colormaps[cmap]
    # Get a colour out of the given colormap based on a value [0,1]
    rgba = colormap(rel_v)
    # convert the colour to a hexadecimal string representation
    return f'background-color: {mpl.colors.rgb2hex(rgba)};'

# We must apply both `format` and `applymap` to the DataFrame.style
styler = df2.style.format(lambda x: x[0])
min_length = min(x[1] for x in pairs)
max_length = max(x[1] for x in pairs)
styler.applymap(lambda x: make_gradient(x, min_length, max_length))
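Instead of computing the fraction by hand, the normalisation can be delegated to matplotlib’s Normalize, which maps values in [vmin, vmax] linearly onto [0, 1]. The sketch below uses a hypothetical helper, make_gradient_css, that creates the colormap and the normaliser once and returns a function suitable for applymap (or Styler.map). It looks the colormap up in matplotlib.colormaps, which is available since Matplotlib 3.5.

```python
from matplotlib import colormaps
from matplotlib.colors import Normalize, rgb2hex


def make_gradient_css(min_length, max_length, cmap="YlGn"):
    """Hypothetical helper: return a function mapping (word, length) to CSS."""
    norm = Normalize(vmin=min_length, vmax=max_length)
    colormap = colormaps[cmap]

    def to_css(v):
        # norm() scales the length to [0, 1]; the colormap turns it into RGBA
        return f"background-color: {rgb2hex(colormap(norm(v[1])))};"

    return to_css


to_css = make_gradient_css(1, 9)
print(to_css(("a", 1)))          # lightest colour of YlGn
print(to_css(("Christmas", 9)))  # darkest colour of YlGn
```

With the variables from the code above, this would be applied as styler.applymap(make_gradient_css(min_length, max_length)).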

Here is the result:

Gradient based on word lengths (image by author)

Using the background_gradient again

At the beginning of this article we saw how the background_gradient method can be used on the values in a dataframe. It doesn’t work on tuples, but since pandas 1.3 it has a parameter gmap, which takes a dataframe with the same index and column labels and uses its values for the gradient instead of the displayed ones. Because we want the minimum and maximum to be taken over the entire dataframe rather than per column, we set axis=None as an additional argument.

As the values on which the styling is based are now provided in a separate dataframe, we no longer need tuples: we use one dataframe with the words and a second one with their lengths.

# Create one dataframe with only words
df_words = df2.applymap(lambda x: x[0])
# .. and a second one with only lengths
df_lengths = df2.applymap(lambda x: x[1])
# Now use the lengths to apply the background gradient to the words
df_words.style.background_gradient(gmap=df_lengths, axis=None)

That was easy to do! The outcome is quite similar to the previous image, so let’s skip showing it for now.

Adding tooltips

Gradient colours give us a good idea of relative lengths, but we might still want to see the exact length of a word that interests us.

Let’s add a tooltip to our dataframe that displays the length. For this we use the Styler’s set_tooltips method, which takes a dataframe of tooltip strings and, optionally, CSS properties. That dataframe doesn’t need to be the one we display; it only needs to correspond to it, that is, have the same index and column labels, just as our df_lengths dataframe corresponds to df_words.

Let’s create a dataframe of tooltips based on df_lengths in which each cell contains “length: n”, where n stands for the word length.

df_lengths.applymap(lambda x: f'length: {x}')

This gives:

Matrix corresponding to df_words with tooltips only (image by author)

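Now we pass this tooltip dataframe to set_tooltips(ttips, props). Never mind the exact CSS properties I use here: ('visibility', 'hidden') keeps each tooltip out of sight until the mouse hovers over its cell, and the rest control its position and appearance. I also use a different colormap for background_gradient.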

styler = df_words.style.background_gradient(gmap=df_lengths, axis=None, cmap='YlOrRd')
styler.set_tooltips(
    df_lengths.applymap(lambda x: f'length: {x}'),
    props=[
        ('visibility', 'hidden'),
        ('position', 'absolute'),
        ('background-color', 'white'),
        ('color', 'black'),
        ('z-index', 1),
        ('padding', '3px 3px'),
        ('margin', '2px'),
    ],
)

The result is:

Gradients and tooltips applied to a matrix with words (image by author)

You can see the tooltip at the word “when”, indicating that its length is 4. You can also see that background_gradient improves on the earlier “manual” creation of gradient colours: the text turns light when the background gets dark, which improves readability.

Conclusion

We saw two ways of using values that are not visible in a dataframe to style the visible cells:

  1. Create a dataframe with tuples and then use format and applymap to display and style the cells.
  2. Create two corresponding dataframes where the second is used to apply style to the first.

Example

For an application of this technique, see my following article:

Cees Roele

Language Engineer, Python programmer, Scrum Master, Writer