Apr 10, 2026
28 min read

At first glance, generating agreements dynamically sounds straightforward. In practice, it comes with a very strict constraint.
Legal teams provide approved templates where every detail—spacing, alignment, fonts, and positioning—is already finalized. In some cases, these documents are already reviewed or even signed.
That means:
· The generated PDF must match the original exactly
· Even small layout shifts are unacceptable
· The document must render consistently across viewers
· The output must remain unchanged once generated
This quickly becomes less about generating a document and more about ensuring fidelity, consistency, and immutability.
Before arriving at the final solution, we explored a few different approaches.
We started with HTML templates because they are flexible and easy to generate dynamically.
This worked well for content generation, but not for matching legal templates:
· Layout didn’t align with the original documents
· Small CSS differences caused visible shifts
· Rendering varied across engines
Even minor misalignment made the output unusable for legal purposes.
Next, we tried DOCX templates with placeholders.
The idea was simple: - Legal teams maintain DOCX templates - Replace placeholders with runtime data - Convert DOCX to PDF in Java
This improved authoring, but introduced a different set of issues:
· Conversion caused layout inconsistencies
· Fonts and spacing changed depending on the library
· The same document could render differently across environments
Even when the data was correct, the document was not reliable visually.
The real challenge was not generating data, but preserving the exact structure of the original document.
We moved to using PDF templates with predefined AcroForm fields.
Once we had the final PDF layout from legal, the next step was not just filling values—it was preparing that PDF so it could be filled reliably at runtime.
That meant adding form fields to the PDF
right locations and with the right characteristics.
This part mattered as much as the filling logic itself, because a poorly sized or poorly styled field would break the visual fidelity of the document even if the data was correct.
The approach worked because it let us:
· Preserve the exact layout of the legal-approved PDF
· Add field regions only where dynamic values were needed
· Keep the font family, font size, and overall appearance aligned with the surrounding document
· Avoid conversion-related inconsistencies entirely
In other words, the PDF remained the source of truth, and engineering worked within that structure instead of recreating it elsewhere.
The process starts with the final PDF provided by legal. At this point, the document is treated as a fixed artifact—its layout, spacing, typography, and positioning are already finalized and should not be altered.
The next step is identifying which parts of the document need to be dynamic. These are typically values such as names, dates, or account details
Once the dynamic regions are identified, we prepare the PDF by introducing form fields (AcroForm fields) at those exact locations.
This is not just a mechanical step. Each field must be carefully designed so that it integrates seamlessly with the existing layout.
For every field, we need to decide:
· The exact position within the document
· The width and height of the field
· Whether the field should be single-line or multi-line
· The font family and font size to be used
· The alignment relative to surrounding text
Because the PDF layout is fixed, the available space for each field is constrained. This makes sizing decisions critical.
Not all fields behave the same way. Some values are predictable in length and format, while others vary significantly.
Predictable fields include: - Dates - Numeric values - IDs or reference numbers
These can be sized with confidence because their format is known.
Unpredictable fields include: - Full names - Addresses - Free-form text
For these, we need to anticipate variability.
A practical approach we followed:
· Estimate the longest reasonable value the field should support
· Use that estimate to define the field width
· Ensure the field still looks natural within the document
If a single-line field becomes visually cramped or forces awkward spacing, we consider multi-line handling where the template allows it.
This is a balance between:
· Preserving the original design
· Supporting real-world data
Another important aspect is maintaining the look and feel of the original document.
When form fields are added, they should not appear as overlays or injected elements. They should visually blend into the template.
To achieve this, we align with the existing document style:
· Use the same font family already present in the PDF
· Match the font size expected in that section
· Maintain consistent alignment and spacing
Even small mismatches in font or size can make the document look inconsistent or tampered with.
Once the fields are properly defined, runtime processing becomes more straightforward.
The application performs the following steps:
1. Load the PDF template
2. Access the AcroForm structure
3. Retrieve field definitions by name
4. Map runtime data to those fields
5. Populate the fields with values
6. Normalize appearances to ensure consistent rendering
7. Refresh appearances so values are visible
8. Flatten the document to remove editability
9. Send the document for signing and persist it
Before flattening, the PDF behaves like an interactive form. Users can modify values if they open it in an editor.
After flattening, all fields are converted into static content. At this point, the document becomes a final artifact and can no longer be edited.
This separation between preparation (field definition) and execution (field population and flattening) is key to making the system reliable.
PDDocument document =load(pdf);
PDAcroForm form =getForm(document);
fillFields(form,
data);
validateFields(form);
setupFonts(form);
normalizeAppearance(form);
form.refreshAppearances();
flatten(form);
returnsave(document);
This flow looks simple, but each step plays a specific role:
· fillFields: maps known field names to runtime values
· validateFields: ensures template and data are in sync
· setupFonts / normalizeAppearance: standardizes rendering across viewers
· refreshAppearances: ensures values are actually visible
· flatten: converts the document into a final, non-editable artifact
Together, these steps guarantee that:
· Field mapping is explicit and predictable
· Missing fields are detected early
· Rendering is consistent across environments
· The final document is immutable and safe to use downstream
Field names are predefined in the template and treated as a contract.
We enforce a strict validation approach:
· All expected fields must exist
· Missing fields result in failure
· No silent fallbacks
Additional checks include:
· Handling null or empty values
· Formatting data (dates, amounts, identifiers)
· Distinguishing required vs optional fields
This helps ensure every generated document is complete and correct.
Rendering in PDFs depends on appearance definitions, not just stored values.
We handle this explicitly by:
· Setting form-level font resources
· Normalizing field-level appearance strings
· Regenerating appearances
Without this step:
· Values may not be visible in some viewers
· Fonts may render inconsistently
In other words, rendering is part of correctness—not just presentation.
Field sizing is one of the most practical and nuanced parts of working with PDF forms.
Once the layout is fixed by the template, every field must fit within a predefined area. Unlike dynamic layouts (like HTML), the PDF does not reflow content automatically. This means that the data must adapt to the layout—not the other way around.
The first step is understanding how different types of data behave.
Some fields are highly predictable:
· Dates typically follow a fixed format
· Numeric values have known ranges and precision
· Identifiers follow structured patterns
These can be sized with relatively high confidence.
Other fields are much less predictable:
· Full names can vary widely in length
· Addresses may span multiple lines
· Free-form text may exceed expected limits
These require a more careful approach.
To handle unpredictable fields, we used a practical strategy:
· Assume the longest reasonable value that could appear
· Use that assumption to define the field width
· Validate inputs against this expectation
For example, when designing a “full name” field, we consider edge cases such as long compound names or multiple surnames.
This ensures that most real-world inputs fit without breaking the layout.
Even with careful sizing, there are cases where data does not fit cleanly into a single line.
In such situations, we consider:
· Allowing multi-line fields where supported
· Adjusting formatting to improve readability
· Applying validation rules to prevent extreme cases
The goal is not just to fit the data, but to ensure the document remains readable and visually consistent.
Field sizing cannot be treated in isolation. It must align with the surrounding content.
This means:
· Matching font size with adjacent text
· Aligning text baselines
· Preserving spacing between elements
If a field is too wide, too narrow, or misaligned, it becomes noticeable—even if the data itself is correct.
Ultimately, field constraints require balancing:
· Fixed layout from the template
· Variable nature of user input
· Readability and visual consistency
This is where much of the practical complexity lies. Getting this right ensures that the final document looks natural, not programmatically generated.
PDF forms can have hierarchical structures:
· Terminal fields (store values)
· Non-terminal fields (group other fields)
This affects how flattening works.
We detect the structure and apply the appropriate flattening strategy to ensure consistent results across templates.
Filling fields alone does not make a PDF immutable.
The document remains editable until we flatten it.
Flattening:
· Removes form field interactivity
· Converts values into static content
· Prevents further modification
The order of operations is important:
Fill → Normalize → Refresh → Flatten
We use an external service to sign documents with NOM-151 compliant signatures.
This allows us to:
· Avoid managing cryptographic keys in the application
· Ensure signatures are auditable
· Meet regulatory requirements
The trade-off is additional latency (~500ms), which is acceptable given the guarantees.
Key factors affecting performance:
· PDF parsing and serialization
· Network calls for signing
· Storage operations
Optimizations include:
· Running independent steps in parallel
· Avoiding unnecessary processing
· Reusing templates where possible
We handle failure cases explicitly:
· Missing fields → fail fast
· Appearance issues → log and continue where safe
· Flattening issues → retry strategy
· Invalid templates → reject early
The goal is simple: avoid silent failures and ensure predictable outcomes.
Load → Fill → Validate → Normalize → Refresh → Flatten → Sign → Store
Each step has a clear purpose and responsibility.
· Start with the final format when precision matters
· Avoid conversion pipelines for fixed-layout documents
· Treat templates as the source of truth
· Validate inputs and templates strictly
· Treat rendering as part of correctness
· Understand field structure before operating on it
· Enforce immutability explicitly
· Delegate security-sensitive operations
· Design for auditability from the beginning
A PDF is not just an output—it is the agreement.
Once generated, it must remain stable, accurate, and verifiable.
Getting there requires attention to layout, rendering, validation, and immutability—not just data population.
The complexity lies in the details, and those details are what make the system reliable.

5 min read

5 min read

5 min read