Forums | Mahara Community

Creating PDF of a Page


Account deleted
Posts: 4

16 April 2014, 19:30

Hi,

I'm currently looking into the possibility of creating a PDF from a Page view in Mahara using wkhtmltopdf and a PHP wrapper.

My initial problem was getting around authentication - simply trying to produce a PDF of a page by providing the URL didn't work (it generated a PDF of the login screen) unless the page was public, a secret URL was used, or the user was logged in.

I've managed to get around this by creating a separate copy of view/view.php which uses Smarty to fetch the output and generate a PDF. My plan is to put a button on the page that lets the owner generate a PDF; it will only appear when they are logged in.

I currently have two problems:

1) Uploaded images are not displaying correctly in the generated PDF. In theory, if the user is logged in and is the owner of the images, they should appear, but instead I get a placeholder with the '?' symbol. I've managed to trace this to the following bit of code in artefact/file/download.php:

 

if (!can_view_view($viewid)) {
    throw new AccessDeniedException('');
}

 

It would seem that this is returning false, so the images are not displaying correctly in the PDF.

 

2) Pages which contain the 'Open Badges/Mozilla Backpack' block type are not generating. Could this be a security issue?

 

I was wondering if anyone could perhaps provide some help as to how I could get around these issues?

Many thanks

Richard

Robert Lyon
Posts: 793

20 April 2014, 16:50

Hi Richard,

Saving a page as a PDF is a great idea! When you were originally trying to get the PDF generator to fetch a URL, were you passing the correct $USER and session value to it?

This could also be why the images are not being rendered: Mahara believes the $USER trying to access them does not have permission.

You shouldn't need to create a new version of view/view.php, but rather work out how to pass the correct $USER and session key to where it is needed.

Cheers

Robert

 

 


Aaron Wells
Posts: 896

30 April 2014, 17:01

Yeah, "export to PDF" has been on our wishlist for a while now. Here's the wishlist bug: https://bugs.launchpad.net/mahara/+bug/547690

So if you get something working, let us know! :)

It sounds like the problems you're having with wkhtmltopdf are because the wkhtmltopdf script is doing its own HTTP request and fetching the page contents that way. The thing is, when this happens, since wkhtmltopdf is not passing any session data along, it just looks like a logged-out user to Mahara. That's why only publicly viewable pages show up.

The problem with images is similar. Even if you bypass the access checks in view/view.php, there's another HTTP request needed for each image in the page, so you need to bypass the access checks for those as well.

Looking at the wkhtmltopdf documentation, I think your best bet is probably to use something like the --cookie or --custom-header option to send a flag to Mahara, and then add code to Mahara that looks for that flag and bypasses the page and artefact access checks if it's present. view/view.php and artefact/file/download.php would probably be good places to start.

Also, in order to avoid making this a backdoor to let anyone view anyone's content, you'd need to add some authentication to the data that wkhtmltopdf is passing via that flag. Perhaps something using hashes and timestamps and a secret key stored in the Mahara database or config.php file.
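
The "hashes and timestamps and a secret key" idea could be sketched roughly as below. This is only an illustration, not existing Mahara code: the function names `pdf_token_create` and `pdf_token_is_valid`, the token layout, and the 60-second lifetime are all assumptions, and the secret is assumed to be a value only the server knows. It relies on PHP's built-in hash_hmac() and hash_equals() (the latter needs PHP 5.6 or later).

```php
<?php
// Hypothetical helpers (not part of Mahara): sign "viewid:timestamp" with a
// server-side secret so the flag passed to wkhtmltopdf cannot be forged.

function pdf_token_create($viewid, $secret, $now) {
    $payload = $viewid . ':' . $now;
    // Append an HMAC-SHA256 signature of the payload.
    return $payload . ':' . hash_hmac('sha256', $payload, $secret);
}

function pdf_token_is_valid($token, $viewid, $secret, $now, $maxage = 60) {
    $parts = explode(':', $token);
    if (count($parts) !== 3) {
        return false;
    }
    list($tokenview, $issued, $mac) = $parts;
    // The token must be for the view being requested.
    if (!ctype_digit($issued) || (string) $tokenview !== (string) $viewid) {
        return false;
    }
    // Reject tokens that are too old or dated in the future.
    $age = $now - (int) $issued;
    if ($age < 0 || $age > $maxage) {
        return false;
    }
    // Recompute the signature and compare in constant time.
    $expected = hash_hmac('sha256', $tokenview . ':' . $issued, $secret);
    return hash_equals($expected, $mac);
}
```

The PDF script would create a token with time() just before calling wkhtmltopdf, and the access check would verify it for the requested view, again using time() as the current timestamp.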

Cheers,

Aaron

Account deleted
Posts: 4

01 May 2014, 1:13

Thanks Robert and Aaron for your replies - some useful pointers!

The method I've currently got working would provide an additional button on the page view. When a user clicks it, it calls a PHP script which generates the page again (it's a copy of view.php with a few amendments) and then outputs the PDF file. So this only works when the user is logged in, which is ideal, because I only really want users to be able to create a PDF of a page when they are logged in.

If they call the PHP script without being logged in, then the PDF will simply be a login screen (as expected).

It's just the images I'm mostly having trouble with.

01 May 2014, 2:43

Hi Richard,

It would already be great to be able to create a PDF of a page, as you described. But wouldn't it be more useful to have a PDF export method alongside the existing Leap2A and website export methods?

-dajan

Aaron Wells
Posts: 896

01 May 2014, 12:45

Hi Richard,

If the PHP script you're talking about generates the page as HTML and then pipes that HTML to wkhtmltopdf, then the problem sounds like what I described above. Something like this:

1. The human user clicks the PDF button in their web browser

2. Their web browser sends an HTTP request to your web server, which contains a session cookie indicating the logged-in user

3. Your web server then runs your view.php PHP script, using the supplied session cookie to identify the logged-in user. This view.php script renders the page to HTML.

4. The HTML is piped to wkhtmltopdf on your server

5. wkhtmltopdf sees the images in the HTML as image tags like this: <img src="https://mahara.org/artefact/file/download.php?file=274488&view=81194" />

6. In order to know what that image looks like, the wkhtmltopdf program makes its own HTTP request to the URL https://mahara.org/artefact/file/download.php?file=274488&view=81194 . This request has no session cookie, because it is sent by wkhtmltopdf on your server, rather than by the user's web browser on their computer, and wkhtmltopdf has its own separate "cookie jar" with no cookies in it.

7. Mahara sees the HTTP request coming from wkhtmltopdf's HTTP client. This request has no session cookie, so it is assumed to come from a logged-out user, and the image is only displayed if a logged-out user would have permission to see it.
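
To make step 6 concrete: any flag has to be handed to wkhtmltopdf itself, because it, not the browser, makes the image requests. A minimal sketch of assembling such a command in PHP might look like the following. It assumes a Unix-like server, that the HTML is piped in on stdin and the PDF is read from stdout (`-` for both), and it uses wkhtmltopdf's `--cookie <name> <value>` option; the function name `build_pdf_command` is made up for illustration.

```php
<?php
// Hypothetical helper: build a wkhtmltopdf command line that attaches a
// cookie to every request wkhtmltopdf makes (including image requests).
function build_pdf_command($cookiename, $cookievalue, $binary = 'wkhtmltopdf') {
    return implode(' ', array(
        escapeshellarg($binary),
        '--cookie',
        escapeshellarg($cookiename),
        // wkhtmltopdf expects the cookie value to be URL-encoded.
        escapeshellarg(rawurlencode($cookievalue)),
        '-', // read the HTML from stdin
        '-', // write the PDF to stdout
    ));
}
```

The resulting string could then be run with something like proc_open(), writing the rendered HTML to the process's stdin and reading the PDF bytes back from its stdout.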

Cheers,

Aaron

Account deleted
Posts: 4

01 May 2014, 21:47

Hi Aaron,

Thanks for the explanation.

I've tried passing the 'mahara' cookie to wkhtmltopdf on the initial request, but the images are still not showing in the generated PDF file. Would I need to pass something else or set up a cookie jar?

Also, we're using HTTPS - would this have any further influence on how it works?

I'm assuming that it's the cookie named 'mahara' which would need to be passed, or can I get something from the Mahara session?

Thanks again for your help.

Richard

Aaron Wells
Posts: 896

02 May 2014, 10:18

Hi Richard,

I don't think any of the existing Mahara cookies would actually be suitable for this. The Mahara session cookie just uses a random number to identify the current user's unique session, and then when the user logs in, Mahara notes that this session represents a logged-in user. So adding a Mahara session cookie to your request won't do you any good, because in a single request there will be no way to represent that it's a valid, logged-in session.

What you're going to need to do is supply your own special cookie, and then have the download.php script (which is what Mahara uses to serve images) check for the presence of that cookie.

So for example, you'd invoke wkhtmltopdf like this:

wkhtmltopdf --cookie pdfcookie 1

... and then in download.php, in the part where it checks whether you have access to view the image, you'd add code like this:

// Cookie values arrive as strings, so compare strictly against '1'.
$isforpdf = get_cookie('pdfcookie');
if ($isforpdf === '1') {
    // It's okay to view the image!
}

Note that this basic implementation is rather insecure, because anyone who knows the name of the cookie and the expected value would be able to add it to an HTTP request and use that to view any image. If you wanted to make something more secure, you'd have to use Apache access rules to make sure that only requests from your local server can hit this particular file, or you could incorporate some crypto techniques into the cookie value like what we do with the webservices for the Mahara mobile app. On the other hand, if all your traffic is going over https, then that mitigates the security risks somewhat because it'd be harder for an attacker to eavesdrop and learn the cookie name & value.
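
As a PHP-side stand-in for the Apache access rules mentioned above (a swapped-in technique, not something Mahara provides), the bypass could additionally be limited to requests arriving from the server itself. The sketch below is an assumption-laden illustration: `is_local_request` is a hypothetical name, and it only recognises the loopback addresses. If wkhtmltopdf reaches Mahara through the public hostname or a reverse proxy, REMOTE_ADDR may be a different address and the list would need adjusting.

```php
<?php
// Hypothetical helper: true only when the request came from the local machine.
// Pass in $_SERVER; it is a parameter here so the check is easy to test.
function is_local_request(array $server) {
    $remote = isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : '';
    // Loopback addresses only (IPv4 and IPv6); an assumption about the setup.
    return in_array($remote, array('127.0.0.1', '::1'), true);
}
```

In download.php the cookie check would then only be honoured when is_local_request($_SERVER) is also true.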

Cheers,

Aaron

Account deleted
Posts: 4

02 May 2014, 20:51

Aaron,

Thanks for the explanation - that would explain why passing the Mahara cookie to wkhtmltopdf wasn't working.

I'll try some of your suggestions of setting a custom cookie and adding some code to the download.php page.

Incidentally, if I add "define('BULKEXPORT', 1)" to the download.php page, then the "can_view_view" function returns true, so the images are displayed.

I tried adding some additional checks, such as checking if the user is logged in and then defining BULKEXPORT, but for some reason it only works if I define BULKEXPORT at the top of the download.php script (where the other constants are defined).

I'm aware that this isn't the best option, but certainly one way of getting the "can_view_view" function to return true instead of false.

Thanks again,

Richard
