New in SPARROW.Clean: Create Spare Parts Directly from PDF Documents

You open the document.

It’s a recommended spare parts list from your OEM or a spare parts offer from your supplier. Fifty lines, sometimes hundreds. Each row contains the information you need: part numbers, descriptions, maybe a supplier reference.

But it’s a PDF.

So the process begins: copy one section, paste it into Excel, try to rebuild the columns. The formatting breaks. Descriptions jump into the wrong cells. Rows merge together. You start fixing the structure manually.

An hour later you are still cleaning the data before you can even start creating spare parts.

Leonid Vogel, Head of Product at SPARROW, has seen this scenario countless times:

“If you have a PDF with many entries, extracting the data manually is extremely frustrating. Copy-paste often destroys the structure. You can easily spend a full day just preparing the data.”

To eliminate this step entirely, we have introduced PDF Upload in SPARROW.Clean.

Instead of converting documents into spreadsheets first, users can now upload spare parts lists directly as PDF files and process them automatically.

Turning a PDF into Structured Spare Parts Data

PDF Upload analyses the document and extracts the individual spare parts rows it contains.

These rows are then converted into a structured list inside SPARROW.Clean.

From that point on, the workflow behaves exactly like creating spare parts from a structured list: each entry can be reviewed, validated and created as a new material record. Leonid explains:

“Conceptually it’s the same as list upload. The only difference is the starting point. Instead of uploading an Excel file, you upload a PDF and SPARROW extracts the rows so you can work with them immediately.”

Built for Real-World Spare Parts Documents

Once the data has been extracted from the PDF, SPARROW.Clean processes it through the same quality workflow used for structured lists.

1. Attribute Mapping

PDF documents rarely follow a standard structure. Column names may vary, languages differ and suppliers use their own formats.

With mapping, users simply assign the extracted fields to the correct SPARROW attributes.

This makes it possible to work with almost any document layout, whatever column names or language the supplier uses. Leonid says:

“A PDF can contain almost anything. Companies use different naming conventions and languages. Mapping lets you align that structure with the SPARROW data model.”

2. Duplicate Detection in the Uploaded List

SPARROW first checks the extracted list itself for duplicates.

This prevents identical spare parts from being created multiple times during the same upload.

3. Duplicate Detection Against the Existing Catalogue

Next, the system compares the entries with the existing material master.

If a spare part already exists in the catalogue, SPARROW flags it before a duplicate can be created.

4. Automatic Data Enrichment

Finally, SPARROW enriches the data by matching it against manufacturer and supplier catalogues.

Even when a PDF contains only limited information, for example just a part number and a short description, the system can automatically add further attributes such as supplier information or technical identifiers.

The result is a cleaner and more complete spare parts record.
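To make the four steps above more concrete, here is a minimal Python sketch of how such a pipeline could work in principle. It is not SPARROW's implementation: the column headers, the COLUMN_MAPPING table, the normalisation rule (uppercase, keep only letters and digits), the "ExampleCo" manufacturer and the small in-memory catalogues are all hypothetical, and the sketch assumes every extracted row carries a part number.

```python
# Illustrative sketch only: NOT SPARROW's implementation.
# All column names, catalogues and values below are hypothetical.

# Step 1 (mapping): hypothetical supplier headers -> target attribute names.
COLUMN_MAPPING = {
    "Part No.": "part_number",
    "Teil-Nr.": "part_number",
    "Description": "description",
    "Benennung": "description",
}


def normalise_part_number(value: str) -> str:
    """Uppercase and keep only letters/digits, so '6205-2RS' == '6205 2rs'."""
    return "".join(ch for ch in value.upper() if ch.isalnum())


def map_row(raw_row: dict[str, str]) -> dict[str, str]:
    """Rename extracted columns to target attributes; unmapped columns are dropped."""
    return {COLUMN_MAPPING[k]: v.strip() for k, v in raw_row.items() if k in COLUMN_MAPPING}


def process(raw_rows, existing_catalogue, reference_catalogue):
    """Return (new_items, in_list_duplicates, existing_hits)."""
    existing_keys = {normalise_part_number(pn) for pn in existing_catalogue}
    seen = set()
    new_items, in_list_duplicates, existing_hits = [], [], []
    for raw in raw_rows:
        item = map_row(raw)
        # Assumes every row has a part number (KeyError otherwise).
        key = normalise_part_number(item["part_number"])
        # Step 2: duplicate within the uploaded list itself.
        if key in seen:
            in_list_duplicates.append(item)
            continue
        seen.add(key)
        # Step 3: part already exists in the material master.
        if key in existing_keys:
            existing_hits.append(item)
            continue
        # Step 4 (enrichment): add only attributes missing from the extracted row.
        for attr, value in reference_catalogue.get(key, {}).items():
            item.setdefault(attr, value)
        new_items.append(item)
    return new_items, in_list_duplicates, existing_hits


# Usage with made-up data.
rows = [
    {"Part No.": "6205-2RS", "Description": "Deep groove ball bearing"},
    {"Teil-Nr.": "6205 2rs", "Benennung": "Rillenkugellager"},
    {"Part No.": "V-BELT-A42", "Description": "V-belt"},
]
existing = ["VBELTA42"]
reference = {"62052RS": {"manufacturer": "ExampleCo", "description": "ignored"}}

new_items, in_list_duplicates, existing_hits = process(rows, existing, reference)
print(len(new_items), len(in_list_duplicates), len(existing_hits))  # -> 1 1 1
```

Matching on a normalised part number rather than the raw string is what lets "6205-2RS" and "6205 2rs" be recognised as the same part in this sketch; a production system would likely need fuzzier matching across several attributes.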

Working With the Data You Already Receive

In spare parts management, information rarely arrives in perfectly structured spreadsheets.

It comes from machine manuals, supplier quotations, engineering documents and spare parts catalogues — often as PDFs.

PDF Upload allows teams to work directly with these documents instead of spending hours converting them first.

As Leonid summarises:

“The goal is simple: you should be able to use the data you already receive, without spending hours preparing it.”
