Scenario:
I have an old PDF file which is password protected, and I forgot
the password. I tried the various permutations and combinations that
came to mind, and none of them worked. So what to do?
I started with
PDFcrack and, almost 3 days later, the program was still running! So
that's when I started looking at alternatives. There is nothing wrong with PDFcrack
per se; this is simply one of the limitations of brute-force password
cracking. In fact, there is even a
beautiful blog post from Ruby Pdf Technologies on some of the intricacies of PDFcrack.
I remembered using something earlier when I was experimenting with
Kali Linux. So I went
back to the "gold standard", which is the
John the Ripper password cracker.
Types of Passwords:
PDFs can be encrypted for confidentiality by requiring either a user
password or an owner password (as in the case of DRM). PDFs encrypted with a user password can only be opened by providing
this password. PDFs encrypted with an owner password can be opened without
providing a password, but some restrictions will apply (for example,
printing could be disabled).
Ideally, what we need to do first is determine whether the PDF is protected with a user password or an owner
password. For this we can use
QPDF. But in this case, to keep things short and easy, we will assume
that the PDF is protected with a user password. So what we are trying to do here is crack the user password.
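For readers who do want to run that check, here is a minimal sketch. It assumes the qpdf package is installed (sudo apt install qpdf); the helper name show_pdf_encryption is made up for illustration. qpdf's --show-encryption option prints the encryption parameters and permission flags when the file opens without a password (the owner-password case), and should fail with an invalid-password error when a user password is required.

```shell
# Hypothetical helper: report how a PDF is encrypted, using qpdf if available.
show_pdf_encryption() {
    pdf="$1"
    if ! command -v qpdf >/dev/null 2>&1; then
        echo "qpdf not found; install it with: sudo apt install qpdf" >&2
        return 127
    fi
    if [ ! -f "$pdf" ]; then
        echo "no such file: $pdf" >&2
        return 1
    fi
    # Prints encryption details if the file opens with an empty user password;
    # otherwise qpdf reports an invalid password, i.e. a user password is set.
    qpdf --show-encryption "$pdf"
}

# Usage (the path is a placeholder):
#   show_pdf_encryption /path/to/file.pdf
```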
Prerequisite:
I assume you have some knowledge of the Linux system, the terminal and
the command line. I also assume you have some basic knowledge of cracking,
and of password encryption and decryption. For this blog, the following steps were performed on Ubuntu Desktop 20.04 LTS.
Installation of John the Ripper from existing repos:
There are many ways in which we can install John the Ripper. The easiest
is to just go to the Ubuntu Software Centre, search for the software and
install it. In our case, we will use the command line. So first open the
terminal by pressing CTRL+ALT+T and run the following command:
sudo apt install john
Next, just type john in the terminal and you will see the
entire list of usage options. If you get an output like the one shown
below, it means that the installation was done correctly.
Also try john --test to run the built-in tests and benchmarks.
As you can see, there are various options. When John the Ripper is started with only a hash file and no mode options, it first runs in
single crack mode, then in wordlist mode, and finally in incremental mode until it finds the password
(secret). But you can also provide your own wordlists (with option --wordlist) and use
rules (option --rules) or work in incremental mode (--incremental).
Uninstall John from Ubuntu:
To uninstall John from Ubuntu, it is better to remove the binaries of the
package and also the configuration files. This can be achieved by purging the package:
sudo apt purge john
Build John the Ripper from Source Code:
The main reason to build John the Ripper from source is that the
resulting binary uses all the features of your processor. The compilation takes into
account the specifics of your system. As a result, during compilation, the
instruction sets supported by the processor are detected, which
has a very significant impact on performance. For some algorithms, this
will speed up brute-forcing substantially. Conversely, John's binaries
compiled on newer hardware may not work on some older computers.
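To get a feel for which instruction sets are at stake, here is a small sketch for x86 Linux machines. It simply reads /proc/cpuinfo; the list of flag names is a hand-picked selection for illustration, not the full set John's configure script looks at.

```shell
# List some SIMD instruction sets this CPU advertises (Linux, x86 only).
# /proc/cpuinfo repeats the flags once per logical core, hence sort -u.
simd_flags=$(grep -o -w -E 'sse2|ssse3|sse4_1|avx|avx2|avx512f' /proc/cpuinfo 2>/dev/null | sort -u | tr '\n' ' ')
echo "SIMD flags found: ${simd_flags:-none}"
```

After building the jumbo version from source, ./john --list=build-info should report which SIMD level the binary was actually compiled for.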
Let us first install the dependencies on Ubuntu:
sudo apt install build-essential libssl-dev yasm libgmp-dev libpcap-dev libnss3-dev libkrb5-dev pkg-config
Open the terminal, choose the directory in which you want to keep
John the Ripper, and download the latest version of John the Ripper from GitHub:
wget https://github.com/openwall/john/archive/bleeding-jumbo.zip
unzip bleeding-jumbo.zip
rm bleeding-jumbo.zip
cd john-bleeding-jumbo/src/
./configure && make
Please note that in the last command you can pass the -j option to make, followed by
the number of (logical) cores of your processor. For example, I have
8 logical cores, so I use:
./configure && make -j8
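If you would rather not hard-code the core count, a sketch like the following picks it up with nproc (part of GNU coreutils). It only prints the resulting build command instead of running it, so it is safe to try anywhere; the variable names are arbitrary.

```shell
# Detect the number of logical cores and compose the matching build command.
cores=$(nproc)
build_cmd="./configure && make -j${cores}"
echo "Run this from john-bleeding-jumbo/src/: ${build_cmd}"
```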
Now go to the run folder:
cd ../run
You don't need to install John the Ripper at the system level – move the
run folder to a location convenient to you and run John from there. In
addition to the John the Ripper executable, the run folder contains many
scripts for extracting hashes – we will talk about them later.
Note: Remember that if you type
john
in the terminal, then the version preinstalled on the system will run, not the one you
compiled. To use the one we compiled, go to the run folder and check
its --help output:
john-bleeding-jumbo/run$ ./john --help
How does John the Ripper Crack Passwords?
John the Ripper by itself cannot type in a password and open the
PDF file. John only cracks hashes.
Hashes are the output of a hashing algorithm like Message Digest
5 (MD5) or Secure Hash Algorithm (SHA). These algorithms
essentially aim to produce a unique, fixed-length string – the hash
value, or “message digest” – for any given piece of data or
“message”. As every file on a computer is, ultimately, just data
that can be represented in binary form, a hashing algorithm can take
that data, run a complex calculation on it and output a
fixed-length string as the result of the calculation. The result is
the file’s hash value or message digest. On Linux, the md5sum
command can be used to calculate a file's MD5 hash. And irrespective of
whether you’re using Windows, Mac or Linux, the hash value will be
identical for any given file and hashing algorithm. If two
different files produce the same digest, we have a
"collision", and we can no longer use the hash as a reliable
identifier for that file. Collisions
are very rare, but not unheard of. For this reason, more secure
algorithms like SHA-2 have replaced SHA-1 and MD5.
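The properties described above are easy to see on the command line. This is just an illustrative sketch using the md5sum and sha256sum tools from GNU coreutils on short strings; the same commands accept file names as arguments.

```shell
# The same input always yields the same digest, regardless of platform.
printf 'hello' | md5sum      # 32 hex characters (128 bits)
printf 'hello' | sha256sum   # 64 hex characters (256 bits)

# Changing a single character produces a completely different digest.
printf 'hellp' | md5sum

# Digest length does not depend on the size of the input.
md5_hex=$(printf 'hello' | md5sum | cut -d' ' -f1)
sha_hex=$(printf 'a much longer message than hello' | sha256sum | cut -d' ' -f1)
echo "md5 length: ${#md5_hex}, sha256 length: ${#sha_hex}"
```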
So what we need to do is create a hash which John the Ripper can
crack. John has utilities for doing precisely this. When we build
John from source, we can find these utilities in the run folder:
This run folder has all the *2john* utilities. Some are Python scripts
(.py) and some are Perl scripts (.pl). The scripts in this
folder are authored by eminent researchers in the field of security.
Of interest to us is pdf2john.pl
To extract the hash, run a command like this:
$ ./pdf2john.pl FILE > pdf.hash
In the above command, the PDF file that needs to be cracked is
FILE and the hash output is saved in pdf.hash. And once
we have extracted the hash file, how do we know that we got the hash
correctly? One way is to compare it with the format that John expects:
$ ./john --list=format-all-details --format=PDF
Pay attention to the "Example Ciphertext", as well as the minimum and
maximum password length.
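A quick sanity check along those lines can be scripted. The sketch below only looks for the $pdf$ tag that John's PDF format uses at the start of the ciphertext; it does not validate the rest of the line. The function name looks_like_pdf_hash is hypothetical.

```shell
# Hypothetical helper: succeed if the file contains John's "$pdf$" format tag,
# as seen at the start of the Example Ciphertext for the PDF format.
looks_like_pdf_hash() {
    grep -q -F '$pdf$' "$1"
}

# Only check if the hash file from the previous step is present.
if [ -f pdf.hash ]; then
    if looks_like_pdf_hash pdf.hash; then
        echo "pdf.hash contains a \$pdf\$ entry"
    else
        echo "pdf.hash does not look like a PDF hash"
    fi
fi
```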
Crack the PDF Password:
For this example, I have created a dummy PDF file and saved it with a
password. I have named this file "hello342.pdf" and saved it in the
Documents directory.
First let us generate the hash file named mypdf.hash:
$ ./pdf2john.pl '/home/pavan/Documents/hello342.pdf' > mypdf.hash
Since we are running this from the run directory
of john-bleeding-jumbo, the mypdf.hash
file is stored in the same directory. In case hello342.pdf is
also located in the same folder, you don't need to mention the whole
path. Now run john against the hash file:
$ ./john --min-length=1 --max-length=3 mypdf.hash
Let us see the output:
Yahoo! It took only a few seconds for John to crack the 3-digit
password.
Now if you pay attention to the output, you will see "Proceeding with wordlist:./password.lst, lengths: 1-3". Check the list of files in the run directory closely and you will see
that there is a file titled "password.lst". When we ran the previous command, this file was used as the
wordlist that john goes through to crack the password. We can
create our own wordlist in lieu of the standard one.
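As a minimal sketch of that idea, the following builds a tiny custom wordlist containing every 3-digit string and passes it to john with --wordlist. The file name digits3.lst is made up for this example, and the list covers only exactly three digits, not lengths 1 and 2. The john call is guarded so that it only runs from the run folder when mypdf.hash exists.

```shell
# Build a small custom wordlist: every 3-digit string from 000 to 999.
seq -w 0 999 > digits3.lst
wc -l < digits3.lst

# Use it instead of the bundled password.lst (runs only from John's run
# folder, when the hash file from the previous step is present).
if [ -x ./john ] && [ -f mypdf.hash ]; then
    ./john --wordlist=digits3.lst mypdf.hash
fi
```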
Crack a PDF with a 7-digit password:
Let us create a dummy file with a 7-digit password. We can do this by
creating a Word file, saving it as a PDF and then
encrypting that PDF by using this online site. Let's say we have forgotten the password, except that we remember it
consists of digits and is 7 characters long. Try with john .....
The above crack took just a few seconds. But what if we did not know that
it was a 7-digit password? What if we did not know that it contains ONLY
digits? What if it were a combination of digits, words and special
characters? In such a scenario, what we do is run john with just the PDF hash. By
default john will first try to crack it using "Single Crack Mode". Then it
switches to cracking with a wordlist, and finally it uses
incremental mode to crack the password. Check this
link on the different modes that john uses.
This time around it took quite some time: well over a few minutes.
30 minutes later, it is still running ...
Try to Crack a PDF Password with a bigger length:
Let us test with a longer, numeric-only password, say 11 digits: one
that requires a lot more resources. Can we use all the resources
of our CPU? To significantly speed up cracking, use the --fork=NUMBER
option, where NUMBER is the number of logical CPU cores on your
computer. For example, if there are 12 logical cores, then you need to
use the --fork=12 option. Although that's how the theory goes, for all
practical purposes, I have noticed that it is better to use the number of
physical cores than logical cores. To iterate over all passwords consisting only of digits, with a length of 11
characters, and perform the calculations on the CPU using all of
its resources, run a command like this:
./john --fork=4 --mask='?d' --min-length=11 --max-length=11 mypdf.hash
The --mask='?d' in the john command above stands for digits. Mask mode
is one attack mode among several, alongside dictionary (wordlist) mode and
brute-force (incremental) mode. There are a host of modes which john can use, and the details
are beyond the scope of this blog. There are multiple permutations and combinations we can deploy in cracking
a password using John the Ripper.
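To pick a value for --fork, it helps to know both core counts. The sketch below is one way to get them on Linux, assuming nproc (coreutils) and lscpu (util-linux) are available; it prints the john command rather than running it. Counting unique core/socket pairs is a common heuristic for physical cores, not something John itself prescribes.

```shell
# Logical cores (what --fork=N is usually matched against).
logical=$(nproc)

# Physical cores: count unique (core, socket) pairs reported by lscpu;
# fall back to the logical count if lscpu is unavailable or reports nothing.
if command -v lscpu >/dev/null 2>&1; then
    physical=$(lscpu -p=Core,Socket 2>/dev/null | grep -v '^#' | sort -u | wc -l | tr -d ' ')
else
    physical=$logical
fi
[ "$physical" -ge 1 ] || physical=$logical

echo "logical=${logical} physical=${physical}"
echo "./john --fork=${physical} --mask='?d' --min-length=11 --max-length=11 mypdf.hash"
```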
Now let us check the CPU consumption while executing John:
The CPU heats up very fast, and there are options in John to deal with this
as well. Go ahead. Explore them. Check out
the examples in openwall.
It's been about 30 minutes since I started running john to crack the PDF hash.
Finally, let us take a look at the current output:
Today is 16 Nov. Should I wait till 23 Nov to see the output? Or maybe
switch over to cracking with the Nvidia GPU! While a CPU has
anywhere from 1–72 cores mainly optimized for sequential serial processing,
a GPU has thousands of cores with thousands of threads for parallel
processing.
But here is the catch: John the Ripper can't be run on a GPU for the purpose
of cracking a PDF file.
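A rough back-of-the-envelope estimate shows why an 11-digit search takes days on a CPU. In the sketch below, the rate is a made-up figure chosen purely for illustration; read the real one from the c/s value in John's status line on your machine.

```shell
# Keyspace for an 11-digit numeric password: 10 choices per position.
length=11
keyspace=1
i=0
while [ "$i" -lt "$length" ]; do
    keyspace=$((keyspace * 10))
    i=$((i + 1))
done

# Hypothetical cracking speed in candidates per second (an assumption,
# not a measurement).
rate=165000

seconds=$((keyspace / rate))
days=$((seconds / 86400))
echo "candidates: ${keyspace}"
echo "worst case at ${rate} c/s: about ${days} days"
```

Every digit we already know divides the keyspace, and therefore the worst-case time, by 10.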
Let us say that I know that the last two digits are 47, so let me give this
another try, this time with a dictionary attack. First I create a PDF hash, then I
generate a dictionary, mypasswords.txt, based on my criteria.
When I tried this, the file size of mypasswords.txt was around 12 GB. Let us iterate yet again: let us assume the last 4 digits are 0647 and try a
dictionary attack. This time the file size of mypasswords.txt was
around 120 MB. Now that seems a better option to try.
$ ./pdf2john.pl '/home/pavan/Documents/ram.pdf' > mypdf.hash
$ ./john --mask='?d?d?d?d?d?d?d0647' --min-length=11 --max-length=11 --stdout > mypasswords.txt
$ ./john --fork=4 --wordlist='mypasswords.txt' mypdf.hash
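The two file sizes mentioned above follow from simple arithmetic: each line of the dictionary holds an 11-character candidate plus a newline, i.e. 12 bytes, and the number of lines is 10 raised to the number of unknown digits. A small sketch (the function name wordlist_bytes is arbitrary):

```shell
# Size in bytes of a wordlist where every line is an 11-character candidate
# plus a newline (12 bytes), for a given number of unknown digits.
wordlist_bytes() {
    free_digits=$1
    lines=1
    i=0
    while [ "$i" -lt "$free_digits" ]; do
        lines=$((lines * 10))
        i=$((i + 1))
    done
    echo $((lines * 12))
}

echo "last 2 digits known (9 unknown): $(wordlist_bytes 9) bytes"   # about 12 GB
echo "last 4 digits known (7 unknown): $(wordlist_bytes 7) bytes"   # about 120 MB
```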
Keep trying. In another blog post I will show how to use the GPU for cracking with hashcat.