Python for Trading & Investing: Crack PDF Password using John the Ripper in Ubuntu 20.04

Scenario:

I have an old password-protected PDF file and I have forgotten the password. I tried the various permutations and combinations that came to mind, and none of them worked. So what to do?

I started with PDFcrack and, almost 3 days later, the program was still running! That's when I started looking at alternatives. There is nothing wrong with PDFcrack per se; this is simply one of the limitations of brute-force password cracking. In fact, there is even a beautiful blog post from Ruby Pdf Technologies on some of the intricacies of PDFcrack.

I remembered a tool I had used earlier while experimenting with Kali Linux. So I went back to the "gold standard": the John the Ripper password cracker.

Types of Passwords:

PDFs can be encrypted for confidentiality by requiring either a user password or an owner password (as in the case of DRM). PDFs encrypted with a user password can only be opened by providing this password. PDFs encrypted with an owner password can be opened without providing a password, but some restrictions will apply (for example, printing could be disabled).

Ideally, the first thing to do is determine whether the PDF is protected with a user password or an owner password. For this we can use QPDF. But to keep things short and easy, we will assume here that the file has a user password. So what we are trying to do is crack the user password.
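If you do want to run the QPDF check, here is a rough Python sketch of how it could look. It assumes qpdf is installed and calls it with its --show-encryption option and no password. The exit codes (0 for success, 3 for success with warnings) and the "invalid password" message are my assumptions about qpdf's behaviour, and the function names are made up for this post, so treat it as a starting point and not a definitive check.

```python
import subprocess


def classify_qpdf_result(returncode: int, stderr: str) -> str:
    """Interpret the result of `qpdf --show-encryption FILE` run with no password."""
    # Assumption: qpdf exits with 0 on success and 3 on success-with-warnings.
    if returncode in (0, 3):
        return "opens without a password (unencrypted, or owner password only)"
    # Assumption: qpdf reports "invalid password" when a user password is needed.
    if "invalid password" in stderr.lower():
        return "user password required"
    return "unknown (qpdf reported a different error)"


def check_pdf(path: str) -> str:
    """Run qpdf on the given PDF and classify the outcome (needs qpdf installed)."""
    result = subprocess.run(
        ["qpdf", "--show-encryption", path],
        capture_output=True,
        text=True,
    )
    return classify_qpdf_result(result.returncode, result.stderr)


# Offline demonstration of the classifier with hand-written sample results.
print(classify_qpdf_result(0, ""))
print(classify_qpdf_result(2, "hello342.pdf: invalid password"))
```

On a machine with qpdf installed, calling check_pdf with the path to your PDF runs the real check.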

Prerequisite:

I assume you have some knowledge of Linux, the terminal and the command line. I also assume you have some basic knowledge of password cracking, encryption and decryption. For this blog, the following steps were performed on Ubuntu Desktop 20.04 LTS.

Installation of John the Ripper from existing repos:

There are many ways to install John the Ripper. The easiest is to go to the Ubuntu Software Centre, search for the software and install it. In our case, we will use the command line. First open the terminal by pressing CTRL+ALT+T and run the following command:

sudo apt install john

Next, just type john in the terminal. If it prints its version and the full list of usage options, the installation was done correctly.

You can also run john --test to run the self-tests and benchmarks.

As the usage listing shows, there are various options. Started without any mode option, John the Ripper first runs in single crack mode, then in wordlist mode, and finally in incremental mode until it finds the password (secret). You can also provide your own wordlist (option --wordlist), use rules (option --rules) or go straight to incremental mode (--incremental).

Uninstall John from Ubuntu:

To uninstall John from Ubuntu, it is better to remove both the binaries of the package and its configuration files. This can be done with:

sudo apt-get purge john

Build John the Ripper from Source Code:

The main reason to build John the Ripper from source is performance: the compilation takes into account the specifics of your system. During compilation, the instruction sets supported by your processor are detected and used, which has a very significant impact on performance. For some algorithms, this speeds up brute-forcing substantially. The flip side is that John binaries compiled on newer hardware may not work on some other computers.

Let us first install the dependencies in Ubuntu:

sudo apt install build-essential libssl-dev yasm libgmp-dev libpcap-dev libnss3-dev libkrb5-dev pkg-config

Open the terminal, choose the directory in which you want to keep John the Ripper, and download the latest version of John the Ripper from GitHub:

wget https://github.com/openwall/john/archive/bleeding-jumbo.zip
unzip bleeding-jumbo.zip
rm bleeding-jumbo.zip
cd john-bleeding-jumbo/src/
./configure && make

Note that in the last command you can pass make the -j option followed by the number of (logical) cores of your processor. For example, I have 8 logical cores, so I use:

./configure && make -j8
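If you are not sure how many logical cores you have, Python can tell you. The small sketch below only prints the build command with the core count filled in; it does not run the build. The helper name make_command is made up for this post.

```python
import os


def make_command(jobs=None):
    """Return the build command with -j set to the number of logical cores."""
    # os.cpu_count() reports logical cores; fall back to 1 if it is unknown.
    n = jobs or os.cpu_count() or 1
    return f"./configure && make -j{n}"


# Print the command for this machine.
print(make_command())
```

On my 8-core machine this gives the same command as above.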

Now go to the run folder:

cd ../run

And run the test:

./john --test

You don't need to install John the Ripper at the system level – move the run folder to a location convenient to you and run John from there. In addition to the John the Ripper executable, the run folder contains many scripts for extracting hashes – we will talk about them later.

Note: Remember that if you type in the terminal

john

then the version installed on the system will run, not the one you compiled. To use the one we compiled, go to the run folder and check it with --help:

john-bleeding-jumbo/run$ ./john --help

How does John the Ripper Crack Passwords?

John the Ripper by itself cannot type in a password and open the PDF file. John only cracks hashes.

Hashes are the output of a hashing algorithm such as Message Digest 5 (MD5) or Secure Hash Algorithm (SHA). These algorithms aim to produce a unique, fixed-length string (the hash value, or "message digest") for any given piece of data or "message". As every file on a computer is ultimately just data that can be represented in binary form, a hashing algorithm can take that data, run a complex calculation on it and output a fixed-length string as the result. That result is the file's hash value or message digest. On Linux, the md5sum command can be used to calculate a file's MD5 hash. Irrespective of whether you are using Windows, Mac or Linux, the hash value will be identical for any given file and hashing algorithm. If two different files produced the same digest, we would have a "collision", and we would no longer be able to use the hash as a reliable identifier for that file. Collisions are very rare, but not unheard of. For this reason, more secure algorithms like SHA-2 have replaced SHA-1 and MD5.
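To see the "fixed-length" property for yourself, here is a minimal Python sketch using the standard hashlib module. It hashes a short message and a much longer one with MD5 and SHA-256 (a member of the SHA-2 family) and prints the digest lengths. The helper name digests is just for illustration.

```python
import hashlib


def digests(data: bytes) -> dict:
    """Return the MD5 and SHA-256 hex digests of the given bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


short_msg = digests(b"hello")
long_msg = digests(b"hello" * 100_000)

# The digest length depends only on the algorithm, not on the input size.
for name in ("md5", "sha256"):
    print(name, len(short_msg[name]), len(long_msg[name]))

# Different inputs give different digests.
print(short_msg["md5"] != long_msg["md5"])
```

MD5 always gives 32 hex characters and SHA-256 always gives 64, whatever the size of the input.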

So what we need to do is create a hash that John the Ripper can crack. John has utilities for doing precisely this. When we build John from source, we can find these utilities in the run folder.

The run folder has all the *2john* utilities. Some are Python scripts (.py) and some are Perl scripts (.pl). These scripts are authored by eminent researchers in the field of security. Of interest to us is pdf2john.pl.

To extract the hash, run a command like this:

./pdf2john.pl FILE > pdf.hash

In the above command, FILE is the PDF file that needs to be cracked, and the hash output is saved in pdf.hash. Once we have extracted the hash, how do we know that we got it correctly? We can list what John expects for the PDF format and compare:

$ ./john --list=format-all-details --format=PDF

Pay attention to the "Example Ciphertext", as well as the minimum and maximum password length.
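A quick sanity check on the extracted hash can also be scripted. The sketch below assumes the hash line has the form "label:$pdf$V*R*bits*..." and reads the first three fields as PDF version, revision and key length in bits. That layout is my assumption from looking at such hashes, the sample line is a made-up placeholder (all zeros), and the function name is hypothetical.

```python
def parse_pdf_hash_line(line: str) -> dict:
    """Pull the label and the first three numeric fields out of a $pdf$ hash line."""
    tag = "$pdf$"
    start = line.find(tag)
    if start == -1:
        raise ValueError("no $pdf$ tag found; this does not look like a PDF hash")
    fields = line[start + len(tag):].strip().split("*")
    if len(fields) < 3:
        raise ValueError("too few fields after the $pdf$ tag")
    return {
        "label": line[:start].rstrip(":"),
        "version": int(fields[0]),   # assumed meaning: V
        "revision": int(fields[1]),  # assumed meaning: R
        "key_bits": int(fields[2]),  # assumed meaning: key length in bits
    }


# Placeholder line, NOT a real hash: the hex fields are just zeros.
sample = (
    "hello342.pdf:$pdf$4*4*128*-1028*1*16*" + "0" * 32
    + "*32*" + "0" * 64 + "*32*" + "0" * 64
)
print(parse_pdf_hash_line(sample))
```

If the function raises an error on your pdf.hash contents, the extraction step probably went wrong.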

Sometimes it is also good to compare with hashes of a similar type. Check this page for an example of all Hashes

Crack the PDF Password:

For this example, I have created a dummy PDF file and saved it with a password. I have named this file "hello342.pdf" and saved it in the Documents directory.

First let us generate the hashfile named mypdf.hash:

$ ./pdf2john.pl '/home/pavan/Documents/hello342.pdf' > mypdf.hash

Since we are running this from the run directory of john-bleeding-jumbo, the mypdf.hash file is stored in the same directory. If hello342.pdf is also located in the same folder, you don't need to give the whole path. Now run john on the hash, limiting the candidate length to 1 to 3 characters:

$ ./john --min-length=1 --max-length=3 mypdf.hash

Yahoo! It took just a few seconds for John to crack the 3 digit password.

Now, if you pay attention to the output, you will see "Proceeding with wordlist:./password.lst, lengths: 1-3". Look closely at the list of files in the run directory and you will see one titled "password.lst". When we ran the previous command, john used this file as its wordlist to crack the password. We can create our own wordlist in lieu of the standard one.
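One simple way to build your own wordlist is to combine the fragments you half-remember. Here is a minimal Python sketch of that idea. The fragments and the function name build_wordlist are hypothetical, so replace them with your own guesses.

```python
import itertools


def build_wordlist(fragments, max_parts=2):
    """Combine remembered fragments into candidate passwords, without duplicates."""
    seen = set()
    words = []
    for parts in range(1, max_parts + 1):
        # Order matters for passwords, so use permutations, not combinations.
        for combo in itertools.permutations(fragments, parts):
            word = "".join(combo)
            if word not in seen:
                seen.add(word)
                words.append(word)
    return words


# Hypothetical fragments; replace them with whatever you half-remember.
fragments = ["hello", "342", "2020"]
candidates = build_wordlist(fragments)
print(len(candidates))
print("\n".join(candidates))
```

Save the printed candidates to a file, one per line, and pass that file to john with the --wordlist option.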

Crack a PDF with a 7 digit password:

Let us create a dummy file with a 7 digit password. We can do this by creating a Word file, saving it as a PDF and then encrypting that PDF using this online site. Let's say we have forgotten the password, except that we remember it consists only of digits and is 7 characters long. Try with john .....

The above crack took just a few seconds. But what if we did not know that it was a 7 digit password? What if we did not know that it contains ONLY digits? What if it were a combination of digits, words and special characters? In such a scenario, we run john with just the PDF hash. By default john will first try to crack it using "Single Crack Mode". Then it switches to cracking with a wordlist, and finally it uses incremental mode. Check this link on Different Modes that john uses

This time around it took far longer. 30 minutes later, it was still running ...

Try to Crack a PDF with a Longer Password:

Let us test with a longer, numeric-only password, say 11 digits, which requires a lot more resources. Can we use all the resources of our CPU? To speed up cracking significantly, use the --fork=NUMBER option, where NUMBER is the number of logical CPU cores on your computer. For example, if there are 12 logical cores, use --fork=12. Although that is how the theory goes, in practice I have noticed that it is better to use the number of physical cores than the number of logical cores. To iterate over all 11 character passwords consisting only of digits, using all the resources of the CPU, run a command like this:

./john --fork=4 --mask='?d' --min-length=11 --max-length=11 mypdf.hash

The --mask='?d' in the above command stands for digits. Mask mode is one attack mode, alongside the dictionary (wordlist) mode and the brute-force (incremental) mode. John has a host of other modes whose details are beyond the scope of this blog. There are many permutations and combinations we can deploy in cracking a password using John the Ripper.
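To get a feel for what a mask means, here is a toy Python sketch that expands a mask into its candidates. It understands only the ?d (digit), ?l (lowercase) and ?u (uppercase) placeholders plus literal characters, and it expects every position to be written out explicitly. It is my own simplification for illustration, not how John implements mask mode.

```python
import itertools
import string

# Only a small subset of placeholders is handled in this sketch.
PLACEHOLDERS = {
    "?d": string.digits,
    "?l": string.ascii_lowercase,
    "?u": string.ascii_uppercase,
}


def expand_mask(mask: str):
    """Yield every candidate described by a fully written-out mask."""
    positions = []
    i = 0
    while i < len(mask):
        token = mask[i:i + 2]
        if token in PLACEHOLDERS:
            positions.append(PLACEHOLDERS[token])
            i += 2
        else:
            # Any other character is a literal that appears as-is.
            positions.append(mask[i])
            i += 1
    for combo in itertools.product(*positions):
        yield "".join(combo)


# Two digits followed by a literal 7: 100 candidates from 007 to 997.
small = list(expand_mask("?d?d7"))
print(len(small), small[:3], small[-1])
```

Every extra ?d multiplies the number of candidates by 10, which is why long masks take so long to exhaust.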

Keep an eye on the CPU consumption while John is executing.

The CPU gets heated very fast, and John has options to deal with this too. Go ahead. Explore them. Check out the examples on Openwall.

It has been about 30 minutes since I started running john to crack the PDF hash. Finally, let us take a look at the current output.

Today is 16 Nov. Should I wait till 23 Nov to see the output? Or maybe switch over to cracking with the Nvidia GPU! While a CPU has anywhere from 1 to 72 cores, mainly optimized for sequential serial processing, a GPU has thousands of cores running thousands of threads for parallel processing.
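How long is too long? A back-of-the-envelope estimate helps. An 11 digit numeric password has 10^11 possible values, and the worst-case time is that number divided by the cracking speed. The speeds in the sketch below are hypothetical round numbers, not measurements from my machine; plug in the speed that john reports on yours.

```python
from datetime import timedelta


def worst_case_duration(keyspace: int, candidates_per_second: float) -> timedelta:
    """Time needed to try every candidate at a constant speed."""
    return timedelta(seconds=keyspace / candidates_per_second)


# All 11 digit numeric passwords.
keyspace = 10 ** 11

# Hypothetical speeds in candidates per second.
for rate in (10_000, 100_000, 1_000_000):
    print(f"{rate:>9} c/s -> {worst_case_duration(keyspace, rate)}")
```

On average the password is found after about half of the worst-case time, but the order of magnitude is what matters here.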

But here is the catch: John the Ripper can't be run on a GPU for the purpose of cracking a PDF file.

So let's take a look at hashcat.

Meanwhile, let us say I know that the last two digits are 47, so let me give John another try, this time with a dictionary attack. First I create a PDF hash, then I generate a dictionary mypasswords.txt based on my criteria. When I tried this, the file size of mypasswords.txt was around 12 GB. Let us iterate yet again: assume the last 4 digits are 0647 and try a dictionary attack. This time the file size of mypasswords.txt was around 120 MB. That seems a better option to try.

$ ./pdf2john.pl '/home/pavan/Documents/ram.pdf' > mypdf.hash

$ ./john --mask='?d?d?d?d?d?d?d0647' --min-length=11 --max-length=11 --stdout > mypasswords.txt

$ ./john --fork=4 --wordlist='mypasswords.txt' mypdf.hash
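The two wordlist sizes mentioned above follow from simple arithmetic: each unknown digit multiplies the number of candidates by 10, and each 11 character candidate takes 12 bytes on disk once you count the newline. Here is a small Python sketch of that calculation. The function name is made up, and it assumes one candidate per line in plain ASCII.

```python
def wordlist_size_bytes(unknown_digits: int, password_length: int) -> int:
    """Size of a wordlist holding every candidate, one per line, in ASCII."""
    candidates = 10 ** unknown_digits
    bytes_per_line = password_length + 1  # +1 for the newline character
    return candidates * bytes_per_line


# Last 2 digits known (47): 9 unknown digits.
print(wordlist_size_bytes(9, 11))   # 12000000000 bytes, about 12 GB

# Last 4 digits known (0647): 7 unknown digits.
print(wordlist_size_bytes(7, 11))   # 120000000 bytes, about 120 MB
```

So every extra digit you can remember shrinks the wordlist, and the time needed to run through it, by a factor of 10.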

Keep trying. In another blog I will show the usage of GPU for cracking using hashcat.

Resources for Reference:

John the Ripper password cracker
Ethical hacking and penetration testing
john Package Description
Didier Stevens Blog
hashcat advanced password recovery
hashcat Package Description


16 November 2020
