Automating file handling with Python

Posted by Sam on Tuesday, May 31, 2022

Python - Automate file

Requirement

I use Notion for all my writing and note-taking, and it has been awesome so far. I have also stored many notes in Notion that I want to publish to my blog.

Notion allows exporting my content to Markdown with images. Awesome! The one small problem is that the exported file names are not in my preferred format for my blog.

Unzipping the exported content from Notion by hand and then renaming everything is quite cumbersome.

So, how can I automate this? I have been learning Python, so why not automate the process with Python?

Let’s get started!

This is purely a personal task. I just want to document my learning process, so the code is not fault-proof or production-level in any way. It simply speeds up my current blogging process. However, I’ll also use this requirement to test out different things with Python in future sprints.

Brainstorm initial flow

  1. Export an article (a page in Notion) to Markdown → It is downloaded manually and saved to a folder on my local computer.
  2. Detect a new zip file in the source folder, then unzip it.
  3. Scan through the newly unzipped folder (containing the Markdown file and associated images) and rename the files.
    1. This part should take an input of the article name e.g. 2022-05-20-extend-edt
    2. The unzipped folder is named “2022-05-20-extend-edt”
    3. The markdown file is named “index.md” (per Hugo’s requirement of page bundle)
    4. The associated images should be named “2022-05-20-extend-edt-1”, “2022-05-20-extend-edt-2”, etc.
  4. After renaming successfully, move the whole folder to a destination folder (containing formatted folders)
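
The naming rules in step 3 can be sketched as a small function that only works on names and touches no files. This is an illustration rather than part of the script: plan_names is a hypothetical helper, the sample file names are made up, and it assumes every non-Markdown file in the export is an image whose extension should be kept.

```python
import os


def plan_names(article_name, original_names):
    """Map each exported file name to its name inside the page bundle.

    Hypothetical helper: assumes every non-Markdown file is an image.
    """
    planned = {}
    image_number = 0
    for name in original_names:
        if name.lower().endswith(".md"):
            # Hugo page bundles expect the content file to be index.md
            planned[name] = "index.md"
        else:
            # Images are numbered from 1 and keep their original extension
            image_number += 1
            extension = os.path.splitext(name)[1]
            planned[name] = f"{article_name}-{image_number}{extension}"
    return planned


# Made-up export contents, only for illustration
exported = ["My Article.md", "My Article/Untitled.png", "My Article/Untitled 1.png"]
for old, new in plan_names("2022-05-20-extend-edt", exported).items():
    print(f"{old} -> {new}")
```

Keeping the naming logic separate from the file handling makes it easy to check the names before anything is renamed on disk.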

1st Version

What I have learned:

  • Import module
  • Use zipfile to extract zip files
  • Use a for loop with the built-in enumerate() to get the index
  • Use different functions of os module to scan through directory and rename files
  • Convert types
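
As a quick illustration of the enumerate() and type conversion points, here is a minimal sketch. The image names are made up, and the start=1 argument is my own choice so that the numbering begins at 1.

```python
# Made-up image names, only for illustration
image_names = ["Untitled.png", "Untitled 1.png", "Untitled 2.png"]
article_name = "2022-05-20-extend-edt"

new_names = []
# enumerate() yields (index, item) pairs; start=1 makes the numbering begin at 1
for index, name in enumerate(image_names, start=1):
    # index is an int, so it is converted with str() before string concatenation
    new_names.append(article_name + "-" + str(index) + ".png")

print(new_names)
```

Without the str() call, adding an int to a string raises a TypeError, which is why the conversion is needed.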

The 1st version achieves all the requirements successfully. Happy case done!

# importing required modules
import os
import re
from zipfile import ZipFile

# specifying the source folder (containing the zip file) and the destination folder
src_path = r"D:\Notion Export\Raw Source"
des_path = r"D:\Notion Export\Formatted"
output_filename = input("What is the output name: ")
output_dir = os.path.join(des_path, output_filename)

# creating a folder to store the extracted files
os.makedirs(output_dir, exist_ok=True)

# finding the zip file in the source folder (happy case: it contains exactly one)
print('Scan through source folder:')
with os.scandir(src_path) as src_entries:
    for entry in src_entries:
        if entry.is_file() and entry.name.lower().endswith('.zip'):
            zip_name = entry.path
            print(entry.path)

# opening the zip file in READ mode
with ZipFile(zip_name, 'r') as zip_file:
    # listing the files in the zip, skipping directory entries
    zip_objects = [obj for obj in zip_file.infolist() if not obj.is_dir()]
    print('Extracting all the files now...')
    image_objects = []
    for obj in zip_objects:
        print('The original file name is: ', obj.filename)
        if re.search(r'\.md$', obj.filename):
            # changing obj.filename changes the name the file is extracted under
            obj.filename = "index.md"
            print('Markdown file should be renamed to index.md. The new file name is: ', obj.filename)
            zip_file.extract(obj, output_dir)
        else:
            image_objects.append(obj)
    # numbering the images from 1 and keeping each image's original extension
    print('Renaming images:')
    for index, obj in enumerate(image_objects, start=1):
        extension = os.path.splitext(obj.filename)[1]
        obj.filename = output_filename + "-" + str(index) + extension
        print('Image file new file name is: ', obj.filename)
        zip_file.extract(obj, output_dir)
    print('Formatting Process Done!')

# removing the zip file from source folder
print("Removing the zip file: ", zip_name)
os.remove(zip_name)
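
The script relies on one detail of zipfile: changing the filename attribute of an entry before calling extract() changes the name the file is written under. A minimal way to try this without a real Notion export is sketched below. The archive contents and names are made up, and everything happens in a temporary folder that is deleted afterwards.

```python
import os
import tempfile
from zipfile import ZipFile

with tempfile.TemporaryDirectory() as work_dir:
    zip_path = os.path.join(work_dir, "export.zip")
    output_dir = os.path.join(work_dir, "out")

    # Build a small stand-in for a Notion export (made-up names and contents)
    with ZipFile(zip_path, "w") as archive:
        archive.writestr("My Post.md", "# My Post\n")
        archive.writestr("My Post/Untitled.png", b"not a real image")

    with ZipFile(zip_path, "r") as archive:
        for info in archive.infolist():
            # Changing info.filename only changes where extract() writes the file
            if info.filename.endswith(".md"):
                info.filename = "index.md"
            else:
                info.filename = "my-post-1.png"
            archive.extract(info, output_dir)

    # List what ended up in the output folder
    extracted = sorted(os.listdir(output_dir))
    print(extracted)
```

Because the new image name has no folder part, the image lands directly in the output folder instead of in a subfolder.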

Result:

The flow is now as below:

  1. I export the content to markdown in Notion.
     [Screenshot: exporting the content to markdown in Notion]

  2. The zip file is saved to the source folder.
     [Screenshot: the source folder contains the zip file]

  3. I run my above Python program which takes my preferred file name as input and automates the process of unzipping and renaming.
     [Screenshot: run the program and input the preferred file name]
     [Screenshot: the program run successfully]
     [Screenshot: the output folder contains renamed files]

Todo for further improvement

  1. Try/except and error handling, since the first version covers the straightforward happy scenario only.
  2. Refactor the code.
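
As a starting point for both items, here is one possible sketch and not a finished design. find_zip and extract_renamed are hypothetical function names. The sketch assumes that a missing source folder and an invalid zip file are the two failures worth catching first, and that every non-Markdown file is an image.

```python
import os
from zipfile import BadZipFile, ZipFile


def find_zip(src_path):
    """Return the path of a zip file in src_path, or None if there is none.

    os.scandir() gives no ordering guarantee, so with several zip files
    the one returned is arbitrary.
    """
    try:
        with os.scandir(src_path) as entries:
            for entry in entries:
                if entry.is_file() and entry.name.lower().endswith(".zip"):
                    return entry.path
    except FileNotFoundError:
        print("Source folder does not exist:", src_path)
    return None


def extract_renamed(zip_path, output_dir, output_filename):
    """Extract zip_path into output_dir using the blog naming; return True on success."""
    try:
        with ZipFile(zip_path, "r") as archive:
            files = [info for info in archive.infolist() if not info.is_dir()]
            image_number = 0
            for info in files:
                if info.filename.lower().endswith(".md"):
                    info.filename = "index.md"
                else:
                    # Assumption: every non-Markdown file is an image
                    image_number += 1
                    extension = os.path.splitext(info.filename)[1]
                    info.filename = f"{output_filename}-{image_number}{extension}"
                archive.extract(info, output_dir)
    except BadZipFile:
        print("Not a valid zip file:", zip_path)
        return False
    return True
```

With this split, the zip file would only be removed when extract_renamed returns True, so a failed run does not delete the export.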