If you have image-only PDF files that contain text, or you create PDF files from image files that contain text, you cannot search these documents by their content. To make them searchable, use OCR to extract the text. A searchable PDF document displays the page images but also holds the recognized text in a separate layer, with each text character mapped to its counterpart in the image. This makes the PDF searchable. Searchable PDF is especially useful for accessing content in documents that must be archived with their exact original appearance.
Note
When Searchable PDF is selected, the program runs OCR only when no accessible text layer is detected in an input file. When a text layer is found, it is used to create a normal PDF that is searchable without running OCR. This happens even if Searchable PDF is disabled.
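The rule in this note can be sketched as a small decision function. This is only an illustration, not the program's actual code: the InputFile type, the has_text_layer flag, and the returned labels are hypothetical names, and the last case (no text layer, Searchable PDF turned off) is an assumption inferred from the introduction above.

```python
from dataclasses import dataclass


@dataclass
class InputFile:
    """Hypothetical stand-in for a file loaded into Create Assistant."""
    name: str
    has_text_layer: bool  # True if an accessible text layer was detected


def plan_conversion(source: InputFile, searchable_pdf_selected: bool) -> str:
    """Return a label describing how the file would be handled."""
    if source.has_text_layer:
        # The existing text layer is reused, so no OCR is needed.
        # This applies whether or not Searchable PDF is selected.
        return "normal_pdf_from_text_layer"
    if searchable_pdf_selected:
        # No text layer was found, so OCR builds one.
        return "ocr_searchable_pdf"
    # Assumption: without OCR the result stays image-only.
    return "image_only_pdf"
```

For example, a scanned image with Searchable PDF selected goes through OCR, while a PDF that already has a text layer never does.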
Use Create Assistant to turn image-only PDF files or various types of image files into searchable PDF documents.
You can set the OCR language in the Searchable PDF Conversion Settings dialog box.
Tip
See the list of supported file types in Create Assistant.
Create Assistant provides a separate profile named Searchable PDF, but you can also create Searchable PDF with other profiles by turning on the Searchable PDF checkbox.
To use the 'Searchable PDF' profile in Create Assistant
In the Create Assistant Profile selection box, select Searchable PDF.
Open one or more files that you want to convert to Searchable PDF.
Click the Profiles button to check settings in the PDF Create Profiles dialog box. The Searchable PDF checkbox is automatically turned on. Keep this setting and change other settings (e.g. security, watermark, etc.) if required.
Click the Settings button to display the Searchable PDF Conversion Settings dialog box. Select the language of your source document, then close the dialog box. Click OK to close the PDF Create Profiles dialog box.
Click the Start PDF Creation tool. If you chose multiple files with the option Create a PDF for each input document, and you set Query the File name as the saving option, the Save As dialog box appears separately for each generated PDF file.
To create Searchable PDF using other profiles
In the Create Assistant Profile selection box, select a profile and load files.
Click the Profiles button.
In the PDF Create Profiles dialog box, turn on the Searchable PDF checkbox.
Click the Settings button to display the Searchable PDF Conversion Settings dialog box. Select the language of your source document, then click OK.
In the PDF Create Profiles dialog box, check and change other settings (e.g. security, watermark, etc.) if required. Click OK and start creation as described above.
Tip
To get a Searchable PDF with MRC compression, turn on both the Searchable PDF checkbox and the checkbox for MRC compression. In this case, clicking the Settings button displays the Searchable MRC PDF Conversion Settings dialog box.
When you open an image-only PDF file in Power PDF, or one that has image-only pages, the program can be made to detect this automatically (File > Options > Document > Searchable PDF Documents). To do this, deselect the relevant checkbox (it is selected by default). If prompting is enabled, the program offers to make the document one of the following:
Searchable PDF: this keeps the original page images, so the appearance is conserved, but adds a searchable text layer.
Normal PDF: this generates text and keeps pictures, but discards the original page images.
PDF Form: this runs Form Typer on the document to create active form controls.
PDF file that remains as it is.
For more detail, see About Editing PDF Documents.
Note
In Power PDF you can transform an image-only PDF, or a PDF with image-only parts, into a searchable PDF using the Make PDF Searchable command at Home > Convert. You can influence this transformation under File > Options > Document > Searchable PDF Documents. Choose whether to run OCR only on pages with image-only parts or on all pages; in the latter case, any text layer content previously in the PDF is replaced by the OCR results. Another option allows OCR to run if a text layer is present but unusable due to non-standard encoding. You can proofread the generated text to improve on the accuracy of the OCR results.
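How these options interact can be sketched as a page-selection function. This is a minimal illustration under stated assumptions, not the program's implementation: the Page type and the parameter names ocr_all_pages and ocr_unusable_text are hypothetical and do not correspond to actual option labels in the dialog box.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    """Hypothetical model of one PDF page."""
    number: int
    has_image_only_parts: bool
    text_layer_usable: bool = True  # False models non-standard encoding


def pages_to_ocr(pages: List[Page], ocr_all_pages: bool,
                 ocr_unusable_text: bool) -> List[int]:
    """Return the numbers of the pages on which OCR would run."""
    selected = []
    for page in pages:
        if ocr_all_pages:
            # Every page is processed; existing text layers are replaced.
            selected.append(page.number)
        elif page.has_image_only_parts:
            # Default case: only pages with image-only parts need OCR.
            selected.append(page.number)
        elif ocr_unusable_text and not page.text_layer_usable:
            # A text layer exists but cannot be used as it is.
            selected.append(page.number)
    return selected
```

With three pages, where page 1 has image-only parts, page 2 has usable text, and page 3 has an unusable text layer, the default selection is page 1 only. Turning on the unusable-text option adds page 3, and choosing all pages selects every page.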