Once in a while as a data scientist, you may need to create PDF reports of your analyses. This seems somewhat “old school” nowadays, but here are a couple of situations where you might want to consider it:
- You need to make reports that are easily printable. People often want “hard copies” of particular reports they are running and don’t want to reproduce everything they did in an interactive dashboard.
- You need to match existing reporting formats: If you’re replacing a legacy reporting system, it’s often a good idea to try to match existing reporting methods as your first step. This means that if the legacy system used PDF reporting, then you should strongly consider creating this functionality in the replacement system. This is often important for getting buy-in from people comfortable with the old system.
I recently needed to do PDF reporting in a work assignment. The particular solution I came up with uses two main tools:
- Jinja2 templates to generate HTML files of the reports that I need.
- Pdfkit to convert these reports to PDF.
- You also need to install a tool called wkhtmltopdf for pdfkit to work.
We’ll install our required packages with the following commands:
pip install pdfkit
pip install Jinja2
Then follow the installation instructions on the wkhtmltopdf website to install wkhtmltopdf.
Primer on Jinja2 Templates
Jinja2 is a great tool to become familiar with, especially if you do web development in Python. In short, it lets you automatically generate text documents by programmatically filling in placeholder values that you assign to text file templates. It’s a very flexible tool, used widely in Python web applications to generate HTML for users. You can think of it like super high-powered string substitution.
We’ll be using Jinja2 to generate HTML files of our reports that we will convert into PDFs with other tools. Keep in mind that Jinja2 can come in handy for other reporting applications, like sending automated emails or creating reports in other text file formats.
There are two main components of working with Jinja2:
- Creating the text file Jinja2 templates that contain placeholder values. In these templates, you can use a variety of Jinja2 syntax features that allow you to adjust the look of the file and how it loads the placeholder data.
- Writing the Python code that assigns values to the placeholders in your Jinja2 templates and renders a new text string from them.
Let’s create a simple template as an illustration. This template will be a text file that prints out the value of a name. All you have to do is create a text file (let’s call it name.txt) and add one line to it:
Your name is: {{ name }}
Here, ‘name’ is the name of the Python variable that we’ll pass into the template; its value is the string that replaces the placeholder.
Now that we have created our template, we need to write the Python code that fills in its placeholder values. You do this with the template’s render method. Say we want to create a version of the template where the name is “Mark”. Then write the following code:
https://gist.github.com/marknagelberg/91766668fcb668c702c9080387d96538
Now, outputText holds the rendered template, with {{ name }} replaced by “Mark”. You can confirm this by printing outputText in the Python interpreter.
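The gist itself is not reproduced here. As a rough sketch of both steps (rendering and printing), assuming name.txt sits in a folder that Jinja2’s FileSystemLoader can read (a temporary folder here, so the snippet runs on its own), the code could look like this:

```python
import tempfile
from pathlib import Path

from jinja2 import Environment, FileSystemLoader

# Hypothetical setup: write name.txt into a temporary folder so the example is self-contained.
template_dir = Path(tempfile.mkdtemp())
(template_dir / "name.txt").write_text("Your name is: {{ name }}")

# Point Jinja2 at the folder holding the templates and load name.txt from it.
env = Environment(loader=FileSystemLoader(str(template_dir)))
template = env.get_template("name.txt")

# Fill the {{ name }} placeholder with "Mark" and show the result.
outputText = template.render(name="Mark")
print(outputText)  # Your name is: Mark
```

In a real project you would keep name.txt in a templates folder and pass that folder’s path to FileSystemLoader instead of a temporary directory.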
The arguments to template.render() are the placeholder variables contained in the template along with what you want to assign them to:
template.render(placeholder_variable_in_template1=value_you_want_it_assigned1, placeholder_variable_in_template2=value_you_want_it_assigned2, ..., placeholder_variable_in_templateN=value_you_want_it_assignedN)
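To make the general form above concrete, here is a small sketch with a hypothetical template containing three placeholders. It builds the template from a string with jinja2.Template rather than loading a file, and shows that unpacking a dictionary with ** gives the same result as writing out each keyword argument:

```python
from jinja2 import Template

# Hypothetical template with three placeholders, built from a string instead of a file.
template = Template("{{ greeting }}, {{ name }}! Your balance is ${{ balance }}.")

# Each keyword argument fills the placeholder with the same name.
text = template.render(greeting="Hello", name="Mark", balance=100)

# Equivalent: collect the values in a dictionary and unpack it into render().
values = {"greeting": "Hello", "name": "Mark", "balance": 100}
same_text = template.render(**values)

print(text)  # Hello, Mark! Your balance is $100.
```

The dictionary form is handy when the placeholder values are assembled programmatically, as they will be for the reports below.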
There is much more you can do with Jinja2 templates. For example, we have only shown how to render a simple variable here, but Jinja2 supports more complex constructs such as for loops, if-else statements, and template inheritance. Another useful feature is that you can pass arbitrary Python objects like lists, dictionaries, or pandas data frames into a template and use them directly there. Check out the Jinja2 Template Designer Documentation for a full list of features. I also highly recommend the book Flask Web Development: Developing Web Applications with Python, which includes an excellent guide on Jinja2 templates (the built-in template engine for the Flask web framework).
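As a small illustration of loops, conditionals, and passing Python objects, the sketch below hands a hypothetical list of dictionaries to a template. Jinja2’s dot syntax (account.owner) falls back to dictionary lookup, so the template can read the keys directly:

```python
from jinja2 import Template

# Hypothetical template that loops over accounts and flags negative balances.
template = Template(
    "{% for account in accounts %}"
    "{{ account.owner }}: {{ account.balance }}"
    "{% if account.balance < 0 %} (overdrawn){% else %} (ok){% endif %}\n"
    "{% endfor %}"
)

# Plain Python objects (here a list of dictionaries) are passed in as-is.
accounts = [
    {"owner": "Ann", "balance": 250},
    {"owner": "Bob", "balance": -40},
]
report = template.render(accounts=accounts)
print(report)
# Ann: 250 (ok)
# Bob: -40 (overdrawn)
```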
Creating PDF Reports
Let’s say you want to print PDFs of tables that show the growth of a bank account. Each table shows, year by year, how balances of $100, $500, $20,000, and $50,000 grow. Each separate PDF report uses a different interest rate to calculate that growth. We’ll need 10 different reports, whose tables use interest rates of 1%, 2%, 3%, …, 10%, respectively.
Let’s first define the pandas DataFrames that we need.
https://gist.github.com/marknagelberg/a302de8904f0a5f3e3f739be84723dd3
data_frames contains 10 dictionaries, each of which contains a data frame and the interest rate used to produce it.
Next, we create the template file. We will generate one report for each of the 10 data frames above by passing each data frame to the template along with the interest rate used to produce it.
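The gist is not reproduced here, so the following is only a sketch of how such a list could be built. It assumes annual compounding over years 0 to 10 and a hypothetical dictionary layout with the keys "interest_rate" (in percent) and "df":

```python
import pandas as pd

principals = [100, 500, 20_000, 50_000]
years = range(11)  # hypothetical horizon: year 0 through year 10

data_frames = []
for rate_pct in range(1, 11):  # 1% ... 10%
    rate = rate_pct / 100
    # One column per starting balance, e.g. "$20,000"; each cell is the compounded balance.
    df = pd.DataFrame(
        {f"${p:,}": [round(p * (1 + rate) ** y, 2) for y in years] for p in principals},
        index=pd.Index(years, name="Year"),
    )
    data_frames.append({"interest_rate": rate_pct, "df": df})
```

Keeping the rate next to its table means the rendering step can label each report without recomputing anything.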
https://gist.github.com/marknagelberg/9018e3a1bb9ad9d6da1b8b31468e5364
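The template in the gist isn’t shown here. A minimal sketch, assuming a hypothetical file name report_template.html and a template that receives interest_rate and df, could print a heading and then the table produced by pandas’ DataFrame.to_html(). The HTML is kept as a Python string and written to disk so the example stays in one language:

```python
from pathlib import Path

import pandas as pd
from jinja2 import Environment, FileSystemLoader

# Hypothetical template: a heading with the interest rate, then the table.
# df.to_html() turns the DataFrame into an HTML <table>; "| safe" keeps Jinja2
# from escaping that markup if autoescaping is ever switched on.
TEMPLATE_HTML = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Growth at {{ interest_rate }}%</title></head>
<body>
  <h1>Account growth at {{ interest_rate }}% interest</h1>
  {{ df.to_html() | safe }}
</body>
</html>
"""

# Save the template under a hypothetical name in the current folder.
Path("report_template.html").write_text(TEMPLATE_HTML, encoding="utf-8")

env = Environment(loader=FileSystemLoader("."))
template = env.get_template("report_template.html")

# Quick check with a tiny hypothetical table.
sample = pd.DataFrame({"$100": [100.0, 101.0]}, index=pd.Index([0, 1], name="Year"))
html = template.render(interest_rate=1, df=sample)
```

Converting the DataFrame inside the template keeps the Python side simple: the same render call works for any of the 10 tables.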
After creating this template, we then write the following code to produce 10 HTML files for our reports.
https://gist.github.com/marknagelberg/cb31ff6c2902e33bea22898c828bab80
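A sketch of this step, not the gist itself: it loops over the data frames, renders each one, and writes 1.html through 10.html into a hypothetical reports folder. To keep the block self-contained, it uses a short inline template and rebuilds the data in compact form with the same hypothetical "interest_rate" and "df" keys:

```python
from pathlib import Path

import pandas as pd
from jinja2 import Template

# Hypothetical inline template; a file-based template loaded with
# Environment(loader=FileSystemLoader(...)).get_template(...) works the same way.
template = Template(
    "<html><body><h1>Account growth at {{ interest_rate }}% interest</h1>"
    "{{ df.to_html() }}</body></html>"
)

# Hypothetical data: 1% to 10%, years 0 to 10, annual compounding.
principals = [100, 500, 20_000, 50_000]
data_frames = [
    {
        "interest_rate": rate,
        "df": pd.DataFrame(
            {f"${p:,}": [round(p * (1 + rate / 100) ** y, 2) for y in range(11)] for p in principals}
        ),
    }
    for rate in range(1, 11)
]


def write_html_reports(data_frames, template, out_dir):
    """Render one HTML file per entry (1.html, 2.html, ...) and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, entry in enumerate(data_frames, start=1):
        html = template.render(interest_rate=entry["interest_rate"], df=entry["df"])
        path = out_dir / f"{i}.html"
        path.write_text(html, encoding="utf-8")
        paths.append(path)
    return paths


html_files = write_html_reports(data_frames, template, "reports")
```

Returning the paths makes it easy to hand the files straight to the PDF conversion step.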
Opening one of the HTML files in a browser is a quick way to check how the reports look before converting them.
As a final step, we need to convert these HTML files to PDFs. To do this, we use pdfkit. All you have to do is iterate through your HTML files and apply a single pdfkit call to each one to convert it into a PDF.
https://gist.github.com/marknagelberg/a72f5d5ada749c64980139466619b312
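As a hedged sketch of this step (the gist may differ), the function below converts every .html file in a hypothetical reports folder using pdfkit.from_file(input_path, output_path). It checks for the wkhtmltopdf binary first, since pdfkit only wraps that tool, and imports pdfkit inside the function so the small path helper can be used without it:

```python
import shutil
from pathlib import Path
from typing import List


def pdf_path_for(html_file: Path) -> Path:
    """Map reports/1.html to reports/1.pdf."""
    return html_file.with_suffix(".pdf")


def convert_reports_to_pdf(report_dir: str = "reports") -> List[Path]:
    """Convert every .html file in report_dir into a PDF next to it."""
    # pdfkit calls the wkhtmltopdf program, so fail early with a clear message if it's missing.
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf was not found on PATH; install it first.")
    import pdfkit  # imported here so pdf_path_for works even without pdfkit installed

    pdf_files = []
    for html_file in sorted(Path(report_dir).glob("*.html")):
        pdf_file = pdf_path_for(html_file)
        # The single pdfkit call per file: HTML path in, PDF path out.
        pdfkit.from_file(str(html_file), str(pdf_file))
        pdf_files.append(pdf_file)
    return pdf_files
```

Call convert_reports_to_pdf("reports") once the HTML files exist. If wkhtmltopdf is installed but not on your PATH, pdfkit can be pointed at the binary with pdfkit.configuration(wkhtmltopdf=...) passed as configuration= to from_file; in that case drop the shutil.which check.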
Running all of this code together produces the HTML files along with a PDF version of each one.
You can then open 1.pdf to check that we’re getting the results we’re looking for.
We’ve given a very stripped-down example of how you can create reports with Python in an automated way. With more work, you can develop much more sophisticated reports, limited only by what’s possible with HTML.
For access to my shared Anki deck and Roam Research notes knowledge base as well as regular updates on tips and ideas about spaced repetition and improving your learning productivity, join "Download Mark's Brain".
Excellent article Mark. Very well explained and logically presented. I can see this being very useful across a number of use cases. I looked at Flask a while ago but didn’t dig deeper, other than getting it to run on my Android phone within Termux (which I still think is pretty nifty).
Your article has inspired me to dig deeper, first by replicating your results, then by seeking ways of applying these techniques elsewhere.
Again, well done Sir!
Regards,
Jim Murray
Thanks Jim, glad you found it useful! If possible, let me know about useful applications of this kind of reporting.
Also thanks for the tip about Termux. Looks very cool!
Will do. I note with interest you are with Winnipeg Transit. I have a transit software background of sorts: in my ancient past I worked for both Teleride-Sage and Giro. Would I be correct in assuming WT uses Trapeze for their scheduling/runcutting needs?
They took over a sizeable chunk of the market in years past. The last project I worked on at Giro used a combination of their scheduling system, AWK and Ventura Publisher(!) to produce sign-post timetables. I would do it differently now, but the concept would be the same. Hang around long enough and, as Twain says, “History may not repeat itself, but it sure does rhyme…”
I’m actually not sure about what Winnipeg Transit uses for scheduling / runcutting – I work for the Traffic Signals Branch. Nice to hear that you have a transportation background though! I just started several months ago and I’m getting up to speed on the transportation-related domain knowledge.
Awesome, thank you! I think the PDF report could use a little CSS to spruce it up. I’m often asked by directors to create PDF reports and they always want fancy PDF formatting. The best I’ve done is to link my MSSQL DB to Google Sheets, have it read in a data source, create my ‘PDF’ sheets using the =query function, and then use some JS to output PDFs on a daily basis (thank you, Apps Script)!
This looks more fun; guess I need to learn HTML.
Glad you found it useful!
Right now it looks like WordPress is the preferred blogging platform out there (from what I’ve read). Is that what you are using on your blog?
I loved this post! I actually read your blog very often, and you’re always coming out with some great stuff. I embedded this on my Facebook, and my followers loved it. I really admire the good work 🙂
Thanks Mark – very helpful and clear and useful.
One suggestion – after you give instructions to install jinja and pdfkit, you have a line that says “Note that you also need to install a tool called wkhtmltopdf for pdfkit to work. ”
It’s very easy to miss this (some of us don’t read every line in an article :-)) – maybe it could be bulleted and added to the other requirements above it?
Thanks for the suggestion – I’ve added a separate bullet
The wkhtmltopdf license is LGPLv3, which is problematic for us. Is there any alternative?
Super helpful post! Thanks a lot!
Thanks, glad you found it useful!
Hi,
I am working on a Django project. I enter input in my base template and get the output as a dataframe:
{{df|safe}}
I want to copy this output into another template. Can anyone help me, please?