Manual Reference Pages - PDFGREP (1)
pdfgrep - search pdf files for a regular expression
pdfgrep [OPTION...] PATTERN [FILE...]
Search for PATTERN in each FILE. PATTERN is an extended regular expression.
pdfgrep works much like grep, with one distinction: it operates on pages and not on lines.
-i, --ignore-case
Ignore case distinctions in both the PATTERN and the input files.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings separated by newlines, any of which is to be matched.
-P, --perl-regexp
Interpret PATTERN as a Perl-compatible regular expression (PCRE). See pcresyntax(3) for a quick overview.
-H, --with-filename
Print the file name for each match. This is the default setting when there is more than one file to search.
-h, --no-filename
Suppress the prefixing of file names on output. This is the default setting when there is only one file to search.
-n, --page-number
Prefix each match with the number of the page where it was found.
-c, --count
Suppress normal output. Instead print the number of matches for each input file. Note that unlike grep, multiple matches on the same page will be counted individually.
-p, --page-count
Like -c, but prints the number of matches per page.
-C, --context NUM
Print at most NUM characters of context around each match. The exact number will vary, because pdfgrep tries to respect word boundaries. If NUM is "line", the whole line will be printed. If this option is not set, pdfgrep tries to print lines that are not longer than the terminal width.
--color WHEN
Surround file names, page numbers and matched text with escape sequences to display them in color on the terminal. The default setting is auto. WHEN can be:
always
Always use colors, even when stdout is not a terminal.
never
Do not use colors.
auto
Use colors only when stdout is a terminal.
-o, --only-matching
Print only the matched part of a line without any surrounding context.
-r, --recursive
Recursively search all files (restricted by --include and --exclude) under each directory, following symlinks only if they are on the command line.
-R, --dereference-recursive
Same as -r, but follows all symlinks.
--exclude=GLOB
Skip files whose base name matches GLOB. See glob(7) for wildcards you can use. You can use this option multiple times to exclude more patterns. It takes precedence over --include. Note that includes and excludes apply only to files found via --recursive and not to the argument list.
--include=GLOB
Only search files whose base name matches GLOB. See --exclude for details. The default is *.pdf.
--password=PASSWORD
Use PASSWORD to decrypt the PDF files. Can be specified multiple times; all passwords will be tried on all PDFs. Note that this password will show up in your command history and the output of ps(1). So please do not use this if the security of PASSWORD is important.
-m, --max-count NUM
Stop reading a file after NUM matches. When the -c or --count option is also used, pdfgrep does not output a count greater than NUM.
-Z, --null
Output a null byte (called NUL in ASCII and \0 in C) instead of the colon that usually separates a filename from the rest of the line. This option makes the output unambiguous in the presence of colons, spaces or newlines in the filename. It can be used in conjunction with commands such as xargs -0.
--match-prefix-separator SEP
Changes the colon used to separate filename, page number and text in the output to SEP, which can be an arbitrary string. This is useful when filenames contain colons, but only for interactive usage. For scripting, --null should be used.
--debug
Enable debug output.
Note: Due to limitations of poppler before version 0.30.0, some debug output is also printed without --debug when using such a poppler version.
--warn-empty
Print a warning to stderr if a PDF contains no searchable text. This is the case for PDFs that consist only of images, for example scanned documents.
--unac
Remove accents and ligatures from both the search pattern and the PDF documents. This is useful if you want to search for a word containing "ae", but the PDF uses the single character "æ" instead. See unac(3) for details.
This option is experimental and only available if pdfgrep is compiled with unac support.
-q, --quiet
Suppress all normal output to stdout. Errors will be printed and the exit codes will be returned (see below).
--help
Print a short summary of the options.
-V, --version
Show version information.
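The null-separated output described for the null byte option can be consumed safely in a shell loop. The following is a minimal sketch, assuming bash (for read -d '') and assuming each output record has the form file name, null byte, matched text, newline. The function name print_matches is hypothetical and not part of pdfgrep.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads records of the assumed form
#   <file name> NUL <matched text> NEWLINE
# from stdin and prints each one with an unambiguous " => " separator.
print_matches() {
    local file text
    # The first read consumes everything up to the NUL byte (the file name),
    # the second read consumes the rest of the line (the matched text).
    while IFS= read -r -d '' file && IFS= read -r text; do
        printf '%s => %s\n' "$file" "$text"
    done
}

# Usage sketch (not run here; requires pdfgrep and some PDF files):
#   pdfgrep -r --null pattern | print_matches
```

Because the file name ends at the null byte rather than at a colon, names such as a:b.pdf are split correctly.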
Normally, the exit status is 0 if at least one match is found, 1 if no match is found and 2 if an error occurred. But if the --quiet or -q option is used and a match was found, pdfgrep will return 0 regardless of errors.
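In a script, the exit status described above can be branched on directly. The following is a minimal sketch; the function name describe_status and its messages are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps a pdfgrep exit status to a short description,
# following the exit codes documented above.
describe_status() {
    case "$1" in
        0) echo "match found" ;;
        1) echo "no match" ;;
        2) echo "error" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Usage sketch (not run here; requires pdfgrep, foo.pdf is a placeholder):
#   pdfgrep -q pattern foo.pdf
#   describe_status "$?"
```

Keep in mind that with -q a status of 0 only says that a match was found; errors may still have occurred.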
The behavior of pdfgrep is affected by the following environment variable.
GREP_COLORS
Specifies the colors and other attributes used to highlight various parts of the output. The syntax and values are like GREP_COLORS of grep. See grep(1) for more details. Currently only the capabilities mt, ms, mc, fn, ln and se are used by pdfgrep, where mt, ms and mc have the same effect.
Print the first ten lines matching pattern and print their page number
pdfgrep -n --max-count 10 pattern foo.pdf
Search all .pdf files whose names begin with foo recursively in the current directory
pdfgrep -r --include "foo*.pdf" pattern
Search all .pdf files that are smaller than 12M recursively in the current directory
find . -name "*.pdf" -size -12M -print0 | xargs -0 pdfgrep pattern
Note that in contrast to the previous examples, this task could not be solved with pdfgrep alone; the Unix tools find and xargs had to be used. That's because pdfgrep itself doesn't include options to exclude files by their size. But as you see, it doesn't have to!
Bugs can be reported either to the mailing list (firstname.lastname@example.org) or to the bug tracker on GitLab (https://gitlab.com/pdfgrep/pdfgrep/issues).
pdfgrep prints a single line multiple times if there is more than one match in that line. That doesn't mirror the behavior of grep.
Also, the current context options don't have the same semantics as the grep ones.
pdfgrep is maintained by Hans-Peter Deifel.
See the AUTHORS file in the source for a full list of contributors.
grep(1), pcre(3), regex(7)
See pdfgrep's website https://pdfgrep.org for more information, downloads and the git repository.
Pdfgrep 1.4.1 | PDFGREP (1) | 09/26/2015