Merging all Python files in a Git repo into a single Markdown for LLM analysis (reddthat.com)

submitted 2 months ago* (last edited 2 months ago) by PumpkinDrama@reddthat.com to c/sharegpt@reddthat.com


I want to feed a Git repository to a large language model so it can give better feedback when I ask questions about the code. To do that, I need to merge all Python files in the repo (including those in subfolders) into a single Markdown file.

The format I’m thinking of is similar to the Linux tree command for the structure, followed by the content of each file. For example:

merged_python_files.md

├── a.py
a.py content between triple quotes
└── subfolder
    └── b.py
b.py content between triple quotes

I’d like a Python script that can recursively traverse the repo, collect all .py files, and produce this Markdown file automatically.

[–] PumpkinDrama@reddthat.com 2 points 2 months ago

Got it! You want a Python script that recursively walks through a Git repository, finds all .py files, and writes them into a single Markdown file in a tree-like structure, followed by the file contents. Here's a clean script to do that:

import os

def generate_tree_md(root_dir, md_file):
    python_files = []
    
    # Walk the directory recursively and collect Python files
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if filename.endswith('.py'):
                full_path = os.path.join(dirpath, filename)
                relative_path = os.path.relpath(full_path, root_dir)
                python_files.append(relative_path)
    
    # Sort files to ensure tree structure is consistent
    python_files.sort()
    
    with open(md_file, 'w', encoding='utf-8') as f:
        last_parts = []
        
        for file_path in python_files:
            parts = file_path.split(os.sep)
            # Print tree structure
            for i, part in enumerate(parts):
                if i < len(parts) - 1:  # Directory part
                    if len(last_parts) <= i or last_parts[i] != part:
                        # Directory changed: print it and drop the stale deeper
                        # levels, so a same-named subfolder under a different
                        # parent is not silently skipped
                        f.write('    ' * i + f'└── {part}\n')
                        last_parts[i:] = [part]
                else:  # File part
                    f.write('    ' * i + f'├── {part}\n')
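            # Append file content in a fenced block
            f.write('```python\n')
            with open(os.path.join(root_dir, file_path), 'r', encoding='utf-8') as code_file:
                content = code_file.read()
            f.write(content)
            # Make sure the closing fence starts on its own line
            if not content.endswith('\n'):
                f.write('\n')
            f.write('```\n\n')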

if __name__ == "__main__":
    repo_path = "."  # Change this to your repo path
    output_md = "merged_python_files.md"
    generate_tree_md(repo_path, output_md)
    print(f"Merged Python files written to {output_md}")
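One edge case in the script above: a Python file that itself contains triple backticks (for example, Markdown inside a docstring) would close the fence early and garble the rest of the output. A minimal sketch of one way around that, assuming a CommonMark-style renderer, is to open the fence with more backticks than the longest run found in the content. The helper name `fenced_block` is hypothetical, not part of the original script.

```python
import re


def fenced_block(content: str, lang: str = "python") -> str:
    """Wrap content in a Markdown fence that the content cannot close early."""
    # Longest run of consecutive backticks anywhere in the content (0 if none)
    longest = max((len(run) for run in re.findall(r"`+", content)), default=0)
    # A fence needs at least three backticks; use one more than the longest run
    fence = "`" * max(3, longest + 1)
    # Ensure the closing fence starts on its own line
    if not content.endswith("\n"):
        content += "\n"
    return f"{fence}{lang}\n{content}{fence}\n"
```

In the script, the three writes around the file content could then be replaced by a single `f.write(fenced_block(content))`.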

✅ How it works:

Recursively collects all .py files in the repo.
Sorts them to maintain tree order.
Writes a tree-like Markdown representation of directories and files.
Appends the content of each Python file in code blocks.
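The collection step can also be sketched on its own. This version additionally skips folders that rarely hold project code, such as `.git` or a virtual environment, which the script would otherwise walk through. The function name `collect_python_files` and the `SKIP_DIRS` list are assumptions for illustration; adjust the list to the repo at hand.

```python
import os

# Hypothetical skip list, not part of the original script
SKIP_DIRS = {".git", "__pycache__", ".venv", "venv", "node_modules"}


def collect_python_files(root_dir):
    """Return .py paths relative to root_dir, sorted by path components."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        # Editing dirnames in place stops os.walk from descending into them
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for filename in filenames:
            if filename.endswith(".py"):
                full_path = os.path.join(dirpath, filename)
                found.append(os.path.relpath(full_path, root_dir))
    # Sorting on the split parts keeps each folder's files together
    found.sort(key=lambda p: p.split(os.sep))
    return found
```

Pruning `dirnames` in place only works with the default top-down walk, which is what `os.walk` uses unless `topdown=False` is passed.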

If you want, I can also make the output match the Linux tree command exactly, with the correct ├── and └── connectors for both files and folders at every nesting level. That would be easier to read and more accurate for big projects.

Do you want me to do that?
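For reference, a minimal sketch of that tree-style rendering, under a few assumptions: paths are relative and use `/` as the separator, and the tree is printed as one block rather than interleaved with file contents as in the script above. The name `render_tree` is hypothetical.

```python
def render_tree(paths):
    """Render '/'-separated relative paths as tree-style lines."""
    # Build a nested dict: folders map to dicts, files map to None
    root = {}
    for path in paths:
        node = root
        parts = path.split("/")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = None

    lines = []

    def walk(node, prefix):
        names = sorted(node)
        for index, name in enumerate(names):
            is_last = index == len(names) - 1
            # The last entry in a folder gets the corner, the others a tee
            lines.append(prefix + ("└── " if is_last else "├── ") + name)
            if node[name] is not None:  # a folder: recurse with a deeper prefix
                walk(node[name], prefix + ("    " if is_last else "│   "))

    walk(root, "")
    return "\n".join(lines)


print(render_tree(["a.py", "subfolder/b.py"]))
```

Paths collected with `os.walk` on Windows use backslashes, so they would need `path.replace(os.sep, "/")` before being passed in.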