A Batch Translation Tool for PDFs, Built Just for Me

AI made personal software worth building again

The Customer Satisfaction Rate Is 100%

I built a small tool that translates English PDFs into Japanese Markdown files.

That sentence sounds more practical than exciting. But the part I find interesting is not the translation itself. ChatGPT and Claude can already translate. The interesting part is that I had a tiny, personal annoyance, and it was finally worth turning it into software.

The tool is simple.

I put multiple English PDF files into a folder. For example:

English Documents/
├── A.pdf
├── B.pdf
└── C.pdf

Then I run the tool and specify that folder. It extracts the main text from each PDF, translates it into Japanese, and creates a new folder called jp-translations.

The result looks like this:

English Documents/
├── A.pdf
├── B.pdf
├── C.pdf
└── jp-translations/
    ├── A.md
    ├── B.md
    └── C.md
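
The mapping from input PDFs to output files can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function name `plan_outputs` is hypothetical, and only the folder name jp-translations and the .pdf to .md naming come from the description above.

```python
from pathlib import Path

OUTPUT_DIR_NAME = "jp-translations"  # folder name taken from the article


def plan_outputs(folder: Path) -> list[tuple[Path, Path]]:
    """Pair every PDF directly inside `folder` with its Markdown output path."""
    out_dir = folder / OUTPUT_DIR_NAME
    pairs = []
    for entry in sorted(folder.iterdir()):
        # Only regular files with a .pdf extension (any letter case) count.
        if entry.is_file() and entry.suffix.lower() == ".pdf":
            pairs.append((entry, out_dir / f"{entry.stem}.md"))
    return pairs


if __name__ == "__main__":
    for pdf, md in plan_outputs(Path("English Documents")):
        print(f"{pdf.name} -> {md}")
```

Subfolders are ignored here on purpose, so the jp-translations folder itself is never scanned for input.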

This is not a product. It is not a startup. It is not a SaaS idea. It is not something I expect other people to use.

It is software made for one user: me.

It probably lacks many features. It probably handles some PDFs badly. It is probably not polished enough to explain to a normal customer. But it does exactly what I need, in the way I need it.

So the customer satisfaction rate is 100%.

That is what generative AI has changed for me. It has made it reasonable to build tiny pieces of personal software again.

Why I Still Want to Read the Original

I am Japanese, but I often read documents written in English: papers, technical documents, manuals, essays, and PDFs saved from overseas websites.

Of course, I can simply translate them into Japanese. That is often faster. But I still want to read the original English when the document matters.

This is not always rational.

It is a little like watching foreign movies with subtitles. I do not understand spoken English perfectly. A dubbed version in Japanese would be easier. Still, I prefer subtitles. The subtitled version feels closer to the original source. It feels as if I am getting something more direct, even if that feeling is partly an illusion.

Reading English documents in English has a similar appeal.

When I read the original, I feel closer to the author’s reasoning. The structure of the argument is often easier to see. Where does the author make a strong claim? Where do they hedge? Which word did they choose carefully? What is connected by “however,” “therefore,” or “although”?

A translation can carry the meaning, but it cannot preserve everything.

To make English sound natural in Japanese, a translator often has to split sentences, move phrases, add subjects, remove subjects, or reorganize the order of information. That is not a flaw. It is what good translation often requires.

But something changes in the process.

A word that remains slightly ambiguous in English may become clearer in Japanese. Two English terms that the author uses differently may collapse into the same Japanese expression. A cautious phrase such as “may,” “might,” or “is likely to” may be easy to miss when reading the translation quickly.

When I read the Japanese translation first and then go back to the English, I often notice that I had missed something.

I may have understood the topic. I may have understood the conclusion. But the original shows me how the author got there.

So I do not want Japanese translation to replace English.

I want Japanese translation to help me enter the English.

Translation as a Map, Not a Destination

The method I use is simple.

Before reading the English document, I skim a Japanese translation.

I do not read it carefully. I do not treat it as the final version. I quickly check what the document is about, what the main argument seems to be, and how the explanation is structured.

Then I read the English.

This changes the experience.

The translation gives me a rough map. I know the terrain before I enter it. Even if I do not understand every word in the original, I am less likely to lose the overall direction.

When visiting an unfamiliar city, a map does not replace walking through the streets. But it helps me understand where I am when I turn a corner.

For me, the Japanese translation plays that role.

It is a map for reading the original.

AI Translation Can Be Disposable

This method became practical because AI translation is now good enough and cheap enough to be used casually.

That word, casually, matters.

If I bought a translated book, I would feel obligated to read it carefully. If I paid a professional translator, I would never skim the translation for two minutes and then set it aside.

But AI translation can be used in a more disposable way.

I can skim it. I can search inside it. I can read only the headings. I can ignore it after it has helped me understand the structure of the document.

This would be wasteful if translation were expensive. But with AI, a rough translation can be used as scaffolding.

Of course, I do not assume that AI translation is always correct.

For important passages, I return to the English. If the Japanese sounds strange, I check the source. If a phrase matters, I do not trust the translation blindly.

The Japanese output is not the destination.

It is a temporary map.

The Chat Interface Was the Wrong Interface

At first, I used ChatGPT or Claude directly.

I uploaded a PDF and asked for a Japanese translation. For one document, that works well enough.

But once I repeated the task, the friction became obvious.

I had to drag and drop each PDF. I had to explain the same preferences again: translate only the main text, ignore footnotes and navigation, do not summarize, stay close to the original.

For longer documents, the output often stopped halfway. Then I had to type:

Please continue.

If it stopped again, I had to ask again.

When the translation was done, I had to copy the output, combine the pieces, and save the result somewhere. If I had several PDFs, I had to repeat the process from the beginning.

Conversational AI is excellent when the task is exploratory. I can ask questions, clarify a paragraph, or request another explanation.

But this task was not exploratory.

I did not want to have a conversation every time.

I wanted a repeatable workflow.

The problem was not that AI could not translate. The problem was that the chat interface was the wrong interface for a batch job.

Turning Translation Into a Workflow

So I decided to build a small Python tool.

I am a software engineer, so I could have written it from scratch. I could choose a PDF library, extract text, call an API, split long documents, save the output, handle errors, and write the README.

The problem was not difficulty.

The problem was justification.

This was a tool for one person. There was no product roadmap, no customer contract, no budget, and no business case. Spending a full day polishing a personal PDF translation script would have felt excessive.

Generative AI changed that calculation.

This time, I made the design decisions myself and let AI handle much of the implementation.

The first prompt was roughly this:

I want a tool that takes English PDF files and creates Japanese translations.
 
The input PDFs will look like the attached examples.
 
Do not paraphrase. Translate relatively literally.
As when I ask Claude directly, I only want the main text translated.
Unnecessary ads, notes, navigation, and similar material should not be translated.
 
Do not simply translate the extracted text sequentially.
Please make the translation take into account the flow of the already translated parts.
 
Use Python in a venv environment and process all PDFs in a specified folder one by one.
Create a folder called jp-translations inside the specified folder and save the outputs there.
The output should be text files in Markdown format (.md).

I did not specify every implementation detail.

I did not tell it exactly which PDF library to use. I did not define every retry rule. I did not design every function name.

But I was specific about the behavior that mattered.

The input is PDF.

The unit of work is a folder.

The output is Markdown.

The tool should translate the main text, not the page furniture.

The translation should be close enough to the original that I can compare it with the English.

The previous translation should be used as context for the next part.

This is the important part of prompting for code: not magic words, but requirements.

What goes in?

What comes out?

What should be excluded?

What should happen when the task is too large?

What should happen when the process is interrupted?

Those questions matter more than making the prompt sound sophisticated.

“Main Text Only” Is a Real Requirement

PDFs are messy.

A PDF created from web material may include site headers, search boxes, navigation links, share buttons, page numbers, license notices, and footers. Academic papers may repeat journal names and page numbers on every page. Books may include footnotes, references, image credits, and copyright information.

If all of that is translated, the result becomes noisy.

I do not want a Japanese translation of “next chapter,” “share this page,” or “Creative Commons license” inserted into the middle of the text I am trying to read.

So the tool handles this in two layers.

First, normal Python code removes obvious noise: repeated short lines, page numbers, and lines that look like download notices.

Then the AI translation prompt also tells the model to exclude navigation, ads, footnotes, references, image credits, and similar material.

This division matters.

If I tried to solve everything with handcrafted rules, I would end up writing endless special cases for different PDF formats. If I sent all extracted text to AI, I would waste tokens and money on obvious noise.

So I used conventional programming for what conventional programming can do well, and AI for the more ambiguous judgment.

That is not “just ask AI.”

That is system design.
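
The first, rule-based layer might look something like the sketch below. It assumes the extracted text arrives as one string per page. The function name, the 60-character cutoff, and the "appears on at least half the pages" rule are my assumptions for illustration; the article only says that repeated short lines and page numbers are removed.

```python
import re
from collections import Counter

# Matches lines such as "12", "Page 12", "12 / 30", or "page 3 of 30".
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+(\s*(/|of)\s*\d+)?\s*$", re.IGNORECASE)


def strip_obvious_noise(pages: list[str], max_len: int = 60,
                        min_share: float = 0.5) -> list[str]:
    """Drop page numbers and short lines that repeat across many pages."""
    # Count on how many pages each short line appears (at most once per page).
    seen_on = Counter()
    for page in pages:
        short_lines = {
            line.strip() for line in page.splitlines()
            if 0 < len(line.strip()) <= max_len
        }
        seen_on.update(short_lines)

    # A line counts as "repeated" if it shows up on at least two pages
    # and on at least `min_share` of all pages.
    threshold = max(2, int(len(pages) * min_share))
    repeated = {line for line, count in seen_on.items() if count >= threshold}

    cleaned = []
    for page in pages:
        kept = [
            line for line in page.splitlines()
            if line.strip() not in repeated and not PAGE_NUMBER.match(line)
        ]
        cleaned.append("\n".join(kept))
    return cleaned
```

Anything this filter misses is left for the translation prompt to exclude, which is the second layer described above.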

Long Documents Need Structure

A long PDF should not simply be sent to an AI model as one giant request.

Even if it fits within the context window, translation quality may become uneven. The model may handle the beginning well and become less consistent later. It may lose track of terminology. It may translate repeated concepts differently.

So the tool splits the extracted text into chunks.

But it does not cut the text at a fixed character count regardless of structure. It first tries to split at paragraph boundaries. Only when a single paragraph is too long does it split that paragraph by sentence.
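
As a rough sketch of that splitting rule, assuming paragraphs are separated by blank lines and using a deliberately naive sentence splitter (the function name and the 3000-character default are hypothetical, not taken from the tool):

```python
import re


def split_into_chunks(text: str, max_chars: int = 3000) -> list[str]:
    """Pack whole paragraphs into chunks; split oversized paragraphs by sentence."""
    # Each unit is (text, separator to use when appending it to a non-empty chunk).
    units = []
    for para in re.split(r"\n\s*\n", text):
        para = para.strip()
        if not para:
            continue
        if len(para) <= max_chars:
            units.append((para, "\n\n"))
        else:
            # Naive sentence split: break after ., ! or ? followed by whitespace.
            sentences = [s for s in re.split(r"(?<=[.!?])\s+", para) if s]
            for i, sentence in enumerate(sentences):
                units.append((sentence, "\n\n" if i == 0 else " "))

    chunks, current = [], ""
    for unit, sep in units:
        candidate = current + sep + unit if current else unit
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk at a unit boundary
            current = unit
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Note: a single sentence longer than max_chars is kept whole as its own chunk.
    return chunks
```

A real splitter would need more care with abbreviations such as "e.g." and with text that has no blank lines between paragraphs, which is common in PDF extraction.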

Chunking creates another problem: each chunk becomes a separate translation request.

From the model’s point of view, the second chunk does not automatically know how the first chunk was translated. A technical term might change. A reference might become unclear. The tone might drift.

So the tool passes the tail end of the previous Japanese translation into the next request as context. The model is instructed not to repeat that previous translation, but to use it to keep the flow and terminology consistent.

This does not solve every long-document translation problem.

But it is better than translating every chunk as if it were unrelated.
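
The context-passing idea can be sketched without committing to any particular API. In the example below, `translate` is a placeholder for whatever function actually calls the model, and the prompt wording, the tag names, and the 500-character tail are all assumptions made for illustration.

```python
from typing import Callable


def build_prompt(chunk: str, previous_tail: str) -> str:
    """Assemble one translation request, optionally with prior context."""
    parts = [
        "Translate the English text below into Japanese. "
        "Stay close to the original and do not summarize."
    ]
    if previous_tail:
        parts.append(
            "For context only, here is the end of the translation so far. "
            "Do not repeat it; use it to keep terminology and flow consistent.\n"
            f"<previous>\n{previous_tail}\n</previous>"
        )
    parts.append(f"<source>\n{chunk}\n</source>")
    return "\n\n".join(parts)


def translate_document(chunks: list[str], translate: Callable[[str], str],
                       tail_chars: int = 500) -> list[str]:
    """Translate chunks in order, feeding each request the tail of the last result."""
    translated = []
    previous_tail = ""
    for chunk in chunks:
        japanese = translate(build_prompt(chunk, previous_tail))
        translated.append(japanese)
        # Keep only the last few hundred characters as context for the next chunk.
        previous_tail = japanese[-tail_chars:]
    return translated
```

Because the model call is injected as a plain function, the loop can be exercised with a fake translator before spending anything on real requests.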

The First Version Was Only the Beginning

The first AI-generated version handled the basic workflow.

It found PDFs in a folder. It extracted text. It split the text into chunks. It sent the chunks to the Claude API. It saved the translation as Markdown.

But once the code worked, the real design work became clearer.

The first version required me to set the API key in the terminal. I did not want to do that every time. So I asked for .env support.

Does this support python-dotenv?
 
I plan to write ANTHROPIC_API_KEY=... in a .env file.
Please make the tool automatically read the API key from .env.

Then I realized I did not want to test the tool on an entire folder immediately. I wanted a single-file mode.

I also want a mode for translating just one PDF for testing.
 
For example, I want to specify sample.pdf
and output the translation to a specified file such as result.md.
 
This should be separate from the folder batch mode,
so I can use it to check the behavior on a single PDF.
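
One way to keep the two modes separate is a pair of argparse subcommands. The subcommand names `batch` and `single` and the argument names are hypothetical; the article does not say what the real command line looks like.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Translate English PDFs into Japanese Markdown."
    )
    modes = parser.add_subparsers(dest="mode", required=True)

    # Folder batch mode: process every PDF in the given folder.
    batch = modes.add_parser("batch", help="translate every PDF in a folder")
    batch.add_argument("folder", type=Path)

    # Single-file mode: one PDF in, one explicitly named Markdown file out.
    single = modes.add_parser("single", help="translate one PDF for testing")
    single.add_argument("pdf", type=Path)
    single.add_argument("output", type=Path)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this layout, a test run would be invoked with the `single` subcommand followed by `sample.pdf result.md`, and the batch run with `batch` followed by the folder path.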

Then I thought about interruption.

If I stop a long translation with Ctrl+C, I do not want to lose everything translated so far. So I asked the tool to save after each completed chunk.

When the process stops halfway,
for example when I press Ctrl+C during translation,
I want to be able to get the Markdown translated up to that point.
 
How about generating the .md file immediately after the first chunk is translated,
and then overwriting it step by step as each chunk is completed?

AI then pointed out a useful detail.

If the program overwrites the output file directly and the process stops during the write, the file itself may become corrupted. The solution was atomic writing: first write the complete content to a temporary file, then replace the official output file after the write succeeds.

That is a small implementation detail.

But it is exactly the kind of small detail that turns a script into a tool I can trust.
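
A minimal sketch of that pattern, using only the standard library, is shown below. The function names are hypothetical and `translate` again stands in for the real model call. The key step is `os.replace`, which swaps the finished temporary file into place in a single operation when both paths are on the same filesystem.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def write_atomically(path: Path, content: str) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
        os.replace(tmp_name, path)
    except BaseException:
        # Also covers Ctrl+C: remove the half-written temp file, keep the old output.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def translate_and_save(chunks: list[str], translate: Callable[[str], str],
                       path: Path) -> None:
    """Rewrite the whole Markdown file after every completed chunk."""
    done = []
    for chunk in chunks:
        done.append(translate(chunk))
        write_atomically(path, "\n\n".join(done) + "\n")
```

If the run is interrupted during a chunk, the output file still holds every chunk completed before it, and it is never left half-written.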

Reruns, Partial Files, and Cost Limits

Next came reruns.

If the batch process runs again, already translated PDFs should not be processed again. So the tool skips files whose output already exists.

When I rerun the folder batch mode,
if the output file for a PDF does not exist, translate only that file.
 
If the output file already exists, do not translate it again.
 
After an interrupted run, rerunning the folder batch mode will not reprocess the file,
because the partial .md file still exists.
 
If I delete that output file and run the folder batch mode again,
I want only that PDF to be translated again.

This was an intentional simplification.

I could have built a sophisticated resume feature that continues from the exact chunk where the previous run stopped. But that would require tracking chunk boundaries, checking whether the source PDF changed, preserving context, and handling changes in prompts or model behavior.

For my use case, that was not worth it.

If I stop halfway and want a clean result, I can delete that Markdown file and rerun the tool. Completed PDFs are skipped. Only the deleted one is translated again.

Simple is often better than clever.
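
The skip rule is small enough to show in full. This is a sketch under the assumption that an output file's mere existence means "done"; the function name `pending_pdfs` is hypothetical.

```python
from pathlib import Path


def pending_pdfs(folder: Path, out_dir_name: str = "jp-translations") -> list[Path]:
    """Return the PDFs in `folder` that do not have a Markdown output yet."""
    out_dir = folder / out_dir_name
    pending = []
    for pdf in sorted(folder.glob("*.pdf")):
        # A partial file left by an interrupted run also counts as existing,
        # so that PDF is skipped until the file is deleted by hand.
        if not (out_dir / f"{pdf.stem}.md").exists():
            pending.append(pdf)
    return pending
```

There is no state file and no chunk bookkeeping. Deleting one Markdown file is the whole "retry" interface.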

Then came cost control.

If a huge PDF is mixed into the folder, API usage may become larger than expected. So I added a default safety limit: translate only up to 10 chunks per file unless I explicitly ask for more.

When the limit is applied, the output file name makes that clear:

long-document.upto10chunk.md

The output itself also includes a note explaining that only part of the document was translated and how to remove the limit.

The prompt was roughly this:

I want to be able to specify max_chunk. The default should be 10.
 
If a huge file has more than 10 chunks,
then by default only the first 10 chunks should be translated.
 
When max_chunk is applied,
the output file name should end with something like .upto10chunk.md.
 
Also, at the end of the translated text,
please add a note explaining that max_chunk was applied
and how to remove the limit.
 
To remove the limit, the user should delete that output file
and rerun the tool with a larger max_chunk value,
or rerun it with no_max_chunk.
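
The behavior requested in that prompt could be sketched roughly as follows. The function names and the exact wording of the note are my assumptions. Only the default of 10, the `.upto10chunk.md` naming, and the max_chunk and no_max_chunk options come from the prompt itself. Here `max_chunk=None` stands for the no-limit case.

```python
from pathlib import Path
from typing import Optional


def apply_chunk_limit(chunks: list[str],
                      max_chunk: Optional[int] = 10) -> tuple[list[str], bool]:
    """Return the chunks to translate and whether the limit cut anything off."""
    if max_chunk is None or len(chunks) <= max_chunk:
        return chunks, False
    return chunks[:max_chunk], True


def output_path(out_dir: Path, stem: str, limited: bool, max_chunk: Optional[int]) -> Path:
    """Mark limited outputs in the file name, e.g. long-document.upto10chunk.md."""
    suffix = f".upto{max_chunk}chunk.md" if limited else ".md"
    return out_dir / f"{stem}{suffix}"


def limit_note(total_chunks: int, max_chunk: int) -> str:
    """Text appended to a limited output so the truncation is visible."""
    return (
        f"Note: only the first {max_chunk} of {total_chunks} chunks were translated "
        "because max_chunk was applied. To translate the whole document, delete this "
        "file and rerun with a larger max_chunk value or with no_max_chunk."
    )
```

One consequence worth checking in a real implementation is that the rerun skip logic has to recognize the limited file name as an existing output too.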

None of these details were in the first prompt.

They appeared only after I imagined using the tool repeatedly.

That is the point.

The first prompt did not need to be perfect. It needed to create something concrete enough for the next problem to become visible.

Prompting as Requirements Definition

People often talk about “prompt engineering” as if the hard part is finding the right phrase.

In this project, the hard part was different.

The hard part was turning vague irritation into observable behavior.

“ChatGPT translation is annoying” is not a requirement.

“Process every PDF in this folder and save each translation as Markdown” is a requirement.

“I do not want to lose work if it stops” is still vague.

“Save the Markdown file after each completed chunk, and write atomically so the output file is not corrupted” is a requirement.

“API cost worries me” is a feeling.

“Translate only the first 10 chunks by default, add .upto10chunk to the file name, and write a note at the end explaining how to rerun without the limit” is a requirement.

This is not mainly about writing clever prompts.

It is requirements definition with an AI implementation partner.

Was This Vibe Coding?

In February 2025, Andrej Karpathy described “a new kind of coding” that he called “vibe coding.” In his original X post, he described a style of development where he could “forget that the code even exists,” accept diffs without reading them carefully, and paste error messages back into the AI until the project mostly worked (Karpathy, 2025). AP News later summarized the same post as an example of how AI coding tools were changing software work, while also noting that some engineers dislike the term because responsibility still remains with the developer (O’Brien, 2025).

Later academic work has treated vibe coding as an emerging form of programming through conversation with AI. Sarkar and Drosos describe it as an iterative loop of prompting, scanning, testing, editing, and deciding when to intervene manually. Pimenova and colleagues emphasize co-creation, flow, experimentation, and trust.

The distinction between forgetting that the code exists and staying responsible for what it does matters here.

My PDF translation tool was not built by simply “going with the vibe.”

AI wrote much of the code. But I did not accept generated code without understanding the behavior I needed. I did not throw a vague request at AI and hope it would somehow feel right.

I defined the purpose and constraints:

  • The translation is not the final product. It is a map for reading the original.
  • Only the main text should be translated.
  • Loose paraphrasing should be avoided.
  • Previous translation should be used as context.
  • Multiple PDFs should be processed together.
  • Intermediate results should survive interruption.
  • Output files should not be corrupted during writing.
  • Completed files should not be processed again.
  • API cost should be limited by default.
  • A complex resume feature should intentionally not be built.

AI did not sense my “vibe.”

I observed a recurring inconvenience, converted it into requirements, and described those requirements in a form AI could implement.

What I delegated to AI was mainly code production.

What to build, what not to build, which behavior was safe, which complexity was unnecessary, and which result counted as correct remained human decisions.

So I would not call this vibe coding in the casual sense.

I would call it requirements-driven AI-assisted development.

I Did Not Build Translation. I Built a Way to Run Translation.

The finished tool works roughly like this:

Find PDFs in a folder

Extract text from each PDF

Remove repeated headers and footers

Split the text into chunks while respecting paragraphs

Pass the previous translation as context to AI

Translate only the main text into Japanese

Save each completed chunk to Markdown

Move on to the next PDF

I did not build a translation model.

Claude and ChatGPT already have translation capabilities.

What I built was a way to use that capability repeatedly, in a form that fits my reading style.

I turned a manual chat interaction into a repeatable workflow.

The value of the tool is not only in the translation itself.

It keeps partial results.

It skips completed files.

It limits unexpected cost.

It removes some obvious non-content text.

It saves Markdown files where I expect them to be.

These are boring features.

But boring features are often what make software useful.

Personal Software Is Back

The finished tool is not a major service.

It is not something I would pitch to investors. It is not something I would polish for a public launch. It is not even something I expect another person to install.

It is a tool for one person.

That used to be a strange category of software. Personal scripts existed, of course, but building a polished-enough tool for one private workflow often cost more time than the workflow itself.

Generative AI changes that balance.

It lowers the cost of making small tools that are too specific to become products.

That does not mean AI can build anything perfectly from one prompt. It does not remove the need for engineering judgment. It does not make software design disappear.

But it does make a new kind of personal software more practical.

Observe the inconvenience.

Turn it into behavior.

Build a first version.

Use it.

Notice what is wrong.

Add the next requirement.

Repeat.

Through that process, a general AI capability becomes a tool shaped around one person’s workflow.

In my case, the starting point was simple.

I wanted to read English documents in the original.

I wanted Japanese translation not as a replacement, but as a map.

And I did not want to keep dragging PDFs into a chat window and typing “Please continue.”

From those small frustrations, I got a tool I can use every day.

For a product, that would not be enough.

For personal software, it is perfect.

References

Karpathy, A. [@karpathy]. (2025, February 2). There’s a new kind of coding I call “vibe coding”... [Post]. X. https://x.com/karpathy/status/1886192184808149383

O’Brien, M. (2025, September 29). AI is transforming how software engineers do their jobs. Just don’t call it ‘vibe-coding’. AP News. https://apnews.com/article/ai-vibe-coding-anthropic-assistants-09f35ccc7545ac92447a19565322f13d

Sarkar, A., & Drosos, I. (2025). Vibe coding: Programming through conversation with artificial intelligence. arXiv:2506.23253. https://arxiv.org/abs/2506.23253