83 – Extracting text data into a dict#
Regular expression matches from the module re have a method groupdict that allow you to create a dictionary with the named groups that your regular expression defines.
This is useful, for example, if you’re extracting data from structured text and want to convert it to a more convenient format (a dictionary).
Suppose you have a number of files for copyright-free books with a frontmatter header and the markdown contents:
---
title: Moby Dick
author: Herman Melville
---
[...]
You can define a regex pattern to extract the title and author information:
pattern = r"""(?x)
---\n
title:\ (?P<title>.*?)\n
author:\ (?P<author>.*?)\n
---"""
Then, you can use the module re and any of its functions to search/find text.
If you get a match, you can use the method groupdict to create a dictionary with key/value pairs for every named group you defined.
In the example above, that would be a dictionary with keys "title" and "author":
import re
print(
re.match(pattern, text).groupdict()
)
# {'title': 'Moby Dick',
# 'author': 'Herman Melville'}
Further reading: