Streamlining Your Web Export: Automating Image Extraction from Base64 to Files with Python

After exporting your Adobe XD project using the Web Export plugin, you’re left with an HTML file cluttered with lengthy base64 strings within image sources. These unwieldy strings bloat your file and complicate image handling. To enhance manageability and performance, you can convert these strings into individual image files.

Step-by-Step Process to Clean Up Your HTML:

1. Identify the Embedded Images:

Begin by locating the base64 encoded strings within the src attributes of your <img> tags. These strings represent your images in a text format directly embedded within the HTML.
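To get a quick inventory before changing anything, a short sketch like the one below can list each embedded image with its type and approximate decoded size. The function name `list_embedded_images` and the sample markup are illustrative assumptions, and the regex assumes double-quoted `src` attributes.

```python
import re

# Assumption: src attributes are double-quoted and contain a base64 data URI.
DATA_URI_PATTERN = re.compile(r'src="data:image/([a-zA-Z0-9.+-]+);base64,([^"]*)"')


def list_embedded_images(html):
    """Return (image_type, approx_bytes) for each base64 image found in html."""
    results = []
    for image_type, payload in DATA_URI_PATTERN.findall(html):
        # Every 4 base64 characters encode 3 bytes; padding makes this an upper bound.
        approx_bytes = len(payload) * 3 // 4
        results.append((image_type, approx_bytes))
    return results


# Hypothetical sample markup, not taken from a real export
sample_html = '<img src="data:image/png;base64,aGVsbG8gd29ybGQ="><img src="logo.png">'
for image_type, approx_bytes in list_embedded_images(sample_html):
    print(f"{image_type}: about {approx_bytes} bytes")
```

Images that already point to a file, like `logo.png` in the sample, are left out of the list because they carry no data URI.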

2. Prepare the Python Script:

Write a Python script to automate the extraction and conversion. Python's standard library covers everything needed: the re module locates the base64 strings, the base64 module decodes them, and built-in file handling writes the results to disk.

3. Decode and Save Images:

Use Python to decode each base64 string and save it as an image file. The format (PNG or JPEG, for example) is given by the MIME type at the start of the data URI, such as data:image/png.
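If there is any chance that a string is malformed, decoding can be wrapped in a small helper that reports failure instead of raising. This is a minimal sketch; `decode_image` is a hypothetical name, and the sample strings stand in for real image data.

```python
import base64
import binascii


def decode_image(base64_str):
    """Return the decoded bytes, or None if the string is not valid base64."""
    try:
        # validate=True rejects any character outside the base64 alphabet
        return base64.b64decode(base64_str, validate=True)
    except binascii.Error:
        return None


# "aGVsbG8=" is the base64 encoding of b"hello" (a stand-in for real image data)
print(decode_image("aGVsbG8="))
print(decode_image("not base64!"))
```

Note that `validate=True` also rejects whitespace, so it suits payloads that sit on a single line. Dropping the flag makes `b64decode` silently skip characters outside the alphabet instead.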

4. Update Your HTML File:

Modify the original HTML file to replace the base64 strings with references to the newly created image files. This will significantly reduce the size of your HTML file and make it more readable.
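One way to sketch the replacement step in isolation is a single pass with `re.sub` and a callback, which swaps each data URI for a file name as it is matched. The function name `externalize_images`, the `prefix` parameter, and the sample markup are assumptions made for illustration. The function returns the decoded bytes rather than writing files, so it can be tried without touching the disk.

```python
import base64
import re

# Assumption: double-quoted src attributes holding PNG or JPEG data URIs
PATTERN = re.compile(r'src="data:image/(png|jpeg|jpg);base64,([^"]*)"')


def externalize_images(html, prefix="image"):
    """Return (new_html, files), where files maps each file name to decoded bytes."""
    files = {}

    def replace(match):
        image_format, payload = match.groups()
        filename = f"{prefix}{len(files)}.{image_format}"
        files[filename] = base64.b64decode(payload)
        return f'src="{filename}"'

    return PATTERN.sub(replace, html), files


# Hypothetical markup; "aGVsbG8=" decodes to b"hello", standing in for image data
html = '<img src="data:image/png;base64,aGVsbG8="><img src="data:image/jpeg;base64,aGVsbG8=">'
new_html, files = externalize_images(html)
print(new_html)
print(sorted(files))
```

Each entry in `files` can then be written to disk with `open(name, 'wb')`.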

5. Finalize the Conversion:

Save the updated HTML content and the new image files. Your web project is now cleaner and more optimized for performance.

Now, let’s delve into the Python code that makes this magic happen and break down each part of the script.

Python Script for Conversion:

import base64
import re

# Open and read the HTML file (UTF-8 is assumed; adjust if your export uses another encoding)
html_file_path = 'your_exported_file.html'
with open(html_file_path, 'r', encoding='utf-8') as file:
    html_content = file.read()

# Find all base64 images using regex; each match is a (format, base64 data) tuple
pattern = r'src="data:image/(png|jpeg|jpg);base64,([^"]*)"'
matches = re.findall(pattern, html_content)

# Loop through the found base64 strings, decode, and save them as image files
for index, (img_format, base64_str) in enumerate(matches):
    image_data = base64.b64decode(base64_str)
    image_filename = f'image{index}.{img_format}'
    with open(image_filename, 'wb') as image_file:
        image_file.write(image_data)
    # Point the src attribute at the new file instead of the embedded data
    html_content = html_content.replace(
        f'src="data:image/{img_format};base64,{base64_str}"',
        f'src="{image_filename}"',
    )

# Write the new HTML content to a new file
with open('updated_exported_file.html', 'w', encoding='utf-8') as file:
    file.write(html_content)

Detailed Explanation:

The script reads the original HTML file and searches for base64 strings. For each found image, it decodes the string and writes the binary image data to a new file. It then replaces the base64 string in the HTML with the path to the new image file. Finally, the script outputs an updated HTML file, which references these new, externally stored images.

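By employing this Python script, you replace the cumbersome base64 strings with neat image file paths. The HTML file shrinks considerably, and because base64 encoding makes data roughly a third larger than the original binary, the separate image files are smaller than their embedded versions. Browsers can also cache the images independently of the page, which can improve loading times on repeat visits.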
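To see how much the HTML file actually shrank, the two files can be compared directly. The helper below is a small sketch; `size_reduction` is a hypothetical name, and the file names are the placeholders used in the script above.

```python
import os


def size_reduction(original_path, updated_path):
    """Return (original_bytes, updated_bytes, percent_saved) for two files."""
    original = os.path.getsize(original_path)
    updated = os.path.getsize(updated_path)
    percent_saved = 100 * (original - updated) / original if original else 0.0
    return original, updated, percent_saved


# Placeholder file names from the script; skipped if the files are not present
if os.path.exists('your_exported_file.html') and os.path.exists('updated_exported_file.html'):
    before, after, saved = size_reduction('your_exported_file.html', 'updated_exported_file.html')
    print(f"HTML size: {before} -> {after} bytes ({saved:.1f}% smaller)")
```

This measures only the HTML file. The extracted images still have to be downloaded, so the figure reflects the size of the markup, not the total page weight.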