We automate the creation of reports in Word with data from Excel on Python
Ridiculously simple code that can save a ton of time in the future. And as an automation fan who tries to avoid any monotonous and routine work, I am simply delighted by such solutions. By the way, I talk about projects in this topic here.
I first encountered such a task many years ago, when my landlady, from whom I rented an apartment in St. Petersburg, came to me with a request from the “well, you’re a programmer” series (although at that time I was still working as an engineer and I did not have any super knowledge in developers, so I can definitely assure that almost any beginner developer can handle this code). She was involved in the organization of children’s dance competitions at the regional (or higher) level and for each competition she hired a separate person who printed diplomas for two days. There were hundreds of children, diplomas too, and purely due to the human factor, there were naturally many marriages. And she asked to somehow automate it (also a programmer).
Technically, the task looked like this: based on the results of each stage of the competition, an excel table was formed, each line of which contained information about the child/dance group, manager, nomination, program and place (diploma of 1st, 2nd or 3rd degree). Accordingly, according to the data from each line, it was necessary to create a separate diploma according to the template. The template was a simple Horde document, where data from the table must be inserted in certain places.
The task is typical for many and when I was creating my course on automating routine tasks in python, I remembered this story and devoted one practical work to it, which you will actually see below. Of course, I will not shine with the data of talented children, but for my listeners, I spent the whole evening inventing a database with imaginary employees who need to generate a vacation application. The task is completely similar: an Excel table with information about employees and vacation dates, an application template in Word. Demo below.
All data was invented from the head, coincidences with real persons are completely random. I also made up the e-mail addresses, many of them were entered incorrectly specifically for regular expression practice in my course.
Template preparation
The python-docx-template library is used to set the data to the .docx template.
Library docxtpl
is used to work with Microsoft Word (.docx) documents in Python, which allows you to create and edit documents based on templates. It allows you to automate the process of creating documents by inserting variable data into pre-created templates.
The principle of working with the library docxtpl
:
-
Document templates: You create a document template in .docx format using Microsoft Word or another editor. In the template, you can define the places where variable data will be inserted.
-
Using variables: In a template, you define variables that will be filled with data from your application or data source. For example, in a template you can specify “{{ username }}” and then insert the actual username in that place.
-
Generation of documents: Once the template is created and populated with variables, you can use the library
docxtpl
to create a document based on this template. It will automatically replace the template variables with the appropriate data.
Now let’s look at an example. First, the template must be prepared, i.e specify the places in the document where we will place the data.
How to do it: in those places in our template, where some data will be inserted, we insert double curly brackets and the name of the variable, which will be substituted later. An example is below.
We set the formatting not in the code, but in the template. That is, everything related to the font, size, indents, etc. – we specify all this manually in the document template. Of course, this can be done in the code, but in this article we will not do this, but only substitute the data.
The code for substituting some data in the template is ridiculously simple.
Installing the library:
pip install docxtpl
Code:
# Импортируем нужный объект из библиотеки
from docxtpl import DocxTemplate
# Загрузка шаблона
doc = DocxTemplate("шаблон.docx")
# Данные для заполнения шаблона
context = {
'company': 'Моя любимая работа',
'name': 'Иван',
'last_name': 'Иванов',
'start_data': '10.03.2024',
'end_data': '24.03.2024'
}
# Заполнение шаблона данными
doc.render(context)
# Сохранение документа
doc.save("новый_документ.docx")
Let’s take a step-by-step breakdown of what’s going on:
-
from docxtpl import DocxTemplate
: We import the classDocxTemplate
from the librarydocxtpl
which allows you to work with .docx files and fill them with data from the dictionary. -
doc = DocxTemplate("шаблон.docx")
: Download the document template from the “template.docx” file and create the objectdoc
which this document represents. -
context = { ... }
: We create a dictionarycontext
contains data to fill the template. The dictionary keys correspond to the template variables, which will be replaced with the corresponding values. Pay attention to this. -
doc.render(context)
: We fill the template with data from the dictionarycontext
. This will replace the template variables with the appropriate values. -
doc.save("новый_документ.docx")
: Save the completed document to a new file with the name “new_document.docx”.
So this code takes a document template, populates it with data from a dictionary context
and saves the result in a new .docx file.
Run the code and see the result
We connect the table
OpenPyXL is a library for working with Microsoft Excel format files (.xlsx)
Installation
pip install openpyxl
Basic capabilities
-
Reading data: OpenPyXL allows you to read data from existing Excel files.
-
Data recording: The library allows you to create new Excel files and write data to these files.
-
Modification of data: You can modify existing data in Excel tables, add new rows, change formatting, etc.
-
Working with formulas: OpenPyXL supports the use of Excel formulas, allowing you to calculate values based on other cells.
What functions of this library will we need now:
wb = openpyxl.load_workbook(filename="people_data_vacation.xlsx")
-
load_workbook
: This is a function from the library.openpyxl
which loads a workbook from an Excel file. -
filename="people_data_vacation.xlsx"
: This function argumentload_workbook
specifies the path to the file to download. In this case, it is the file “people_data_vacation.xlsx”.
After executing this line of code, the variable wb
will contain a workbook object that represents the contents of the “people_data_vacation.xlsx” file. We can then use this object to access the contents of the Excel file, such as worksheets and their contents. What will we do next?
sheet = wb['vacation']
This line of code extracts a worksheet from a workbook wb
and assigns it to a variable sheet
. In this case, we receive an email with the name ‘vacation’. If there is no such letter, an exception will be thrown.
We can then extract data (from specific cells) from this sheet.
Let’s look carefully at one row from the table. Let’s come up with an algorithm for one employee, and then it’s not difficult to apply it to others in a cycle.
For example, we need to fill out an application for the first employee – the second row of the table. Accordingly, you need to get data from cells A2 (surname), B2 (first name), D2 (company), F2 (start of vacation) and G2 (end of vacation).
You can get data from a specific cell with the following command
sheet['A2'].value
After this line of code is executed, the value that is in cell A2 of the worksheet will be returned sheet
.
Then for other cells it will look like this:
name = sheet['B2'].value
last_name = sheet['A2'].value
company = sheet['D2'].value
start_data = sheet['F2'].value.date()
end_data = sheet['G2'].value.date()
I also add the .date() function to the dates so that it returns not in the format ‘2022-05-01 00:00:00’, but at least 2022-05-01.
We complete the code written in the previous part to create a statement with data from the table for the first employee
import openpyxl
from docxtpl import DocxTemplate
doc = DocxTemplate("шаблон.docx")
wb = openpyxl.load_workbook(filename="people_data_vacation.xlsx")
sheet = wb['vacation']
name = sheet['B2'].value
last_name = sheet['A2'].value
company = sheet['D2'].value
start_data = sheet['F2'].value.date()
end_data = sheet['G2'].value.date()
# Данные для заполнения шаблона
context = {
'company': company,
'name': name,
'last_name': last_name,
'start_data': start_data,
'end_data': end_data
}
# Заполнение шаблона данными
doc.render(context)
# Сохранение документа
doc.save("новый_документ.docx")
Notice that I replaced the data in the context dictionary with variables. It would be possible to do without intermediate variables and immediately create a dictionary, but in the future you may need some checks that cannot be done in the dictionary.
It turns out that the code for the first employee is ready, all that remains is to add a loop and generate a statement for all others. Essentially, the only thing that will change in the code for each employee is the cell index, so let’s bind it to the counter in the loop.
for num in range(2,6):
name = sheet['B'+str(num)].value
last_name = sheet['A'+str(num)].value
company = sheet['D'+str(num)].value
start_data = sheet['F'+str(num)].value.date()
end_data = sheet['G'+str(num)].value.date()
context = {
'company': company,
'name': name,
'last_name': last_name,
'start_data': start_data,
'end_data': end_data
}
doc.render(context)
doc.save(last_name+' заявление на отпуск.docx')
Construction [‘B’+str(num)] will alternately return B2, B3, B4, etc.
Please also note that a new name must be created for each new document, for convenience I immediately add the last name doc.save(last_name+’vacation application.docx’) so that the resulting documents look like this
range(2,6) -
we start at index 2, since the first row of the table is the headers, and up to 6 is purely for testing the code, because if you make a mistake, it will be easier to delete a few test statements than a couple of hundreds. If several test documents were formed correctly, then you can replace the final number with the required number (either manually or use len(list(sheet.rows)+1
)
Final code
import openpyxl
from docxtpl import DocxTemplate
wb = openpyxl.load_workbook(filename="people_data_vacation.xlsx")
sheet = wb['vacation']
doc = DocxTemplate('шаблон.docx')
for num in range(2,len(list(sheet.rows))+1):
name = sheet['B'+str(num)].value
last_name = sheet['A'+str(num)].value
company = sheet['D'+str(num)].value
start_data = sheet['F'+str(num)].value.date()
end_data = sheet['G'+str(num)].value.date()
context = {
'company': company,
'name': name,
'last_name': last_name,
'start_data': start_data,
'end_data': end_data
}
doc.render(context)
doc.save(last_name+' заявление на отпуск.docx')
Run the code and enjoy the result.
In my channel IT as a digital creativity, I write about automation, various interesting trainings, creating courses, talk about my experience and how I got from an engineer at a factory to creating an IT startup, give recommendations and announce the launch of interesting development projects.
I hope the material was useful to you! And good luck in your endeavors!