Walking the HTML DOM tree in PHP
Sun, 30th Dec, '07
Walking the DOM in JavaScript has been covered well enough. But I couldn't find substantial help when it came to walking the DOM tree of HTML files in PHP; most of what exists about DOM in PHP is limited to XML. It would be wonderful if we could set aside the tags and the regex mumbo-jumbo needed to parse them, and instead inspect the values, the text contained, the IDs of elements, their class names and style rules. The function presented here walks through every node in an HTML file, accessing all the node attributes and node values.
Anyway, DOM being the wonderful thing that it is, I cooked up a small function to walk the tree of an HTML file in PHP and print the output to the screen, and boy does it walk. So, less talk and more …
<?php
function walkDom($node, $level = 0)
{
  // two non-breaking spaces per level; plain spaces would collapse in HTML output
  $indent = str_repeat('&nbsp;&nbsp;', $level); //prettifying the output
  if ($node->nodeType != XML_TEXT_NODE)
  {
    echo $indent.'<b>'.htmlspecialchars($node->nodeName).'</b>';
    if ($node->nodeType == XML_ELEMENT_NODE)
    {
      $attributes = $node->attributes; // get all the attributes (eg: id, class …)
      foreach ($attributes as $attribute)
      {
        echo ', '.htmlspecialchars($attribute->name).'='.htmlspecialchars($attribute->value);
        // $attribute->name is usually one of these:
        // src, type, rel, link, name, value, href, onclick,
        // id, class, style, title
        // You can add your custom handlers depending on the attribute.
      }
      // Uncomment to print the text of a node with a single child, which may be
      // the link text, the contents of a div and so on:
      //if ($node->childNodes->length == 1 && strlen(trim($node->firstChild->nodeValue)) > 0)
      //  echo '<br>'.$indent.'(contains='.htmlspecialchars($node->firstChild->nodeValue).')';
    }
    echo '<br><br>';
  }
  if ($node->hasChildNodes())
  {
    foreach ($node->childNodes as $cNode)
      walkDom($cNode, $level + 1); // go one level deeper (so this is the recursion my professor kept talkin' about)
  }
}
?>
Simple enough? Here is how you use it:
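<?php
$doc = new DOMDocument();
// real-world HTML is rarely valid, so collect libxml parse warnings instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTMLFile('http://www.google.com');
walkDom($doc);
?>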
And this prints the entire DOM of the file at the URL passed to loadHTMLFile. More information about the constants and functions used can be found in the PHP manual. And believe me, this works.
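Since the post opens by pointing at DOM walking in JavaScript, here is the same depth-first recursion in JavaScript as a minimal sketch for comparison. It collects lines into an array instead of echoing HTML, and passes `level + 1` to each recursive call rather than incrementing and decrementing a counter. Node.js has no built-in DOM, so the example feeds it a hypothetical hand-built tree of plain objects that only mimic the `nodeType`, `nodeName`, `attributes` and `childNodes` properties; real DOM nodes in a modern browser expose the same properties, so passing `document` there should work too.

```javascript
// Node type constants, matching the DOM's Node.ELEMENT_NODE and Node.TEXT_NODE.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Depth-first walk: one line per non-text node, indented by depth,
// followed by the node's attributes as name=value pairs.
function walkDom(node, level = 0, out = []) {
  if (node.nodeType !== TEXT_NODE) {
    let line = '  '.repeat(level) + node.nodeName;
    if (node.nodeType === ELEMENT_NODE) {
      for (const attr of node.attributes) {
        line += ', ' + attr.name + '=' + attr.value;
      }
    }
    out.push(line);
  }
  // Passing level + 1 avoids incrementing and decrementing a shared counter.
  for (const child of node.childNodes) {
    walkDom(child, level + 1, out);
  }
  return out;
}

// Hypothetical mock tree standing in for a parsed document.
const tree = {
  nodeType: 9, nodeName: '#document', attributes: [], childNodes: [
    { nodeType: ELEMENT_NODE, nodeName: 'div', attributes: [{ name: 'id', value: 'main' }], childNodes: [
      { nodeType: TEXT_NODE, nodeName: '#text', attributes: [], childNodes: [] },
      { nodeType: ELEMENT_NODE, nodeName: 'a',
        attributes: [{ name: 'href', value: '/x' }, { name: 'class', value: 'nav' }], childNodes: [] },
    ] },
  ],
};

console.log(walkDom(tree).join('\n'));
// #document
//   div, id=main
//     a, href=/x, class=nav
```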
Preparing a secure login form with PHP & JavaScript
Wed, 26th Dec, '07
We have had encryption, we have had SSL, … well, we have also had digitally signed certificates, but where's the hack? How can you cook up a secure login form that does the following:
[1] doesn’t send the login information in clear-text
[2] in case somebody is sniffing the line, he/she shouldn’t be able to login with the sniffed information
So, with the above requirements in mind, here is what we do, and how we do it.
You need:
[1] http://www.webtoolkit.info/javascript-md5.html [javascript implementation of MD5]
[2] Two PHP functions of your own: runquery($query), which runs a query supplied as a string, and getcolumn($query), which returns the single column value selected by a SELECT statement.
Now, create a table to store user information, if you don't already have one. The column we need in addition to the usual login ID and password is the last login timestamp.
create table user(
loginid varchar(200),
password varchar(200),
lastLoginTS bigint
);
Now your login.php file should look like this:
<html>
<head>
<title>Secure Login Form</title>
<script type="text/javascript" src="md5.js"></script>
</head>
<body>
<!-- on submit: phash = MD5(password + hts), then clear the password field so it is not sent -->
<form action="dologin.php" method="post" onsubmit="document.getElementById('phash').value = MD5(document.getElementById('password').value + document.getElementById('hts').value); document.getElementById('password').value = '';">
LoginID: <input type="text" name="loginid"><br>
Password: <input type="password" name="password" id="password"><br>
<?php
$TS = time(); //the current timestamp
echo "<input type='hidden' value='".$TS."' name='hts' id='hts'><br>";
?>
<input type='hidden' name='phash' id='phash' value=''>
<input type="submit" value="send">
</form>
</body>
</html>
The <script> line assumes that a file named md5.js sits in the same directory as login.php and defines a function named MD5(), so make sure that is the case. Next up is dologin.php:
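<?php
// runquery() and getcolumn() are your own DB helpers (see above).
// NOTE: $loginid is interpolated into SQL as-is; escape it (or use a
// prepared statement) in real code to avoid SQL injection.
$loginid = isset($_REQUEST['loginid']) ? $_REQUEST['loginid'] : '';
$phash   = isset($_REQUEST['phash'])   ? $_REQUEST['phash']   : '';
$hts     = isset($_REQUEST['hts'])     ? (int)$_REQUEST['hts'] : 0;
$password    = getcolumn("select password from user where loginid='$loginid';");
$lastLoginTS = (int)getcolumn("select lastLoginTS from user where loginid='$loginid';");
// accept only if the hashes match AND the timestamp is newer than the last login;
// hash_equals() avoids PHP's loose == comparison of hash strings
if (strlen($loginid) > 0 && strlen($phash) > 0 && hash_equals(md5($password.$hts), $phash) && $hts > $lastLoginTS)
{
  runquery("update user set lastLoginTS='".time()."' where loginid='$loginid';");
  echo "done";
}
else
  echo "failed";
?>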
That's it. So what is the deal here? This is your regular login form, except that the password is hashed together with a timestamp sent in a hidden form field named 'hts'. The hashing is done in the JavaScript onsubmit handler, which also clears the password field so that the password is not sent in the clear.
The server receives the loginid, the timestamp and the hash from the client. It retrieves the original password for that loginid from the database and calculates its own hash from the retrieved password and the timestamp sent by the client. If the user typed the password correctly, the two hashes match.
This method, of course, hashes the password rather than encrypting it, but it still keeps the password from being sent in the clear should anybody be sniffing the line. Someone who really is sniffing, though, could simply store the submitted values and send them again and again, posing as the valid user. To prevent that, the server makes one more check before validating: the timestamp sent by the client must be greater than the last login timestamp stored in the database for that user. Because the last login timestamp is updated on each successful login (and only then), as soon as the valid user logs in, the sniffed values become stale. A fresh hash would have to be calculated from a newer timestamp and the password, which only the user and the server know.
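To make the protocol concrete, here is a minimal Node.js sketch of both halves: the hash the onsubmit handler computes and the two checks dologin.php performs. It assumes Node's built-in `crypto` module for MD5; the in-memory `users` map and the names `clientHash` and `doLogin` are hypothetical stand-ins for the `user` table and the PHP script. Replaying the same timestamp and hash after a successful login is rejected because the stored timestamp has moved forward.

```javascript
const crypto = require('crypto');

// MD5 as lowercase hex, the same format PHP's md5() returns.
function md5Hex(text) {
  return crypto.createHash('md5').update(text, 'utf8').digest('hex');
}

// Client side (hypothetical name): what the onsubmit handler computes.
function clientHash(password, hts) {
  return md5Hex(password + String(hts));
}

// Server side: hypothetical in-memory stand-in for the `user` table.
const users = new Map([['alice', { password: 's3cret', lastLoginTS: 0 }]]);

// Mirrors dologin.php: the hash must match AND the timestamp must be newer
// than the last successful login; on success lastLoginTS moves forward.
function doLogin(loginid, phash, hts, now) {
  const user = users.get(loginid);
  if (!user || !phash) return 'failed';
  if (phash !== md5Hex(user.password + String(hts))) return 'failed';
  if (!(hts > user.lastLoginTS)) return 'failed';
  user.lastLoginTS = now; // like time() in the update query
  return 'done';
}

const hts = 1199000000; // timestamp embedded in the form (seconds)
const phash = clientHash('s3cret', hts);
console.log(doLogin('alice', phash, hts, hts + 5)); // done
console.log(doLogin('alice', phash, hts, hts + 9)); // failed: the replayed values are stale
```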
I hope this suffices for most of you out there.
UPDATE: This is a proof of concept. The system described here lacks some very obvious safeguards; don't omit them just because I left them out to keep things simple to grasp. Foremost (thanks, William): don't store passwords in cleartext on the server. Try looking up "hashing passwords with salts" for that.
How long do your users stay on a page??
Tue, 25th Dec, '07
So … here's the real deal on measuring how much time a user spends on each individual page, identified by URL. The time is measured in milliseconds with JavaScript Date objects and logged in seconds.
[1] As soon as the page loads, store the current time in a JavaScript variable called tstart. (The code below does this when the script in the <head> runs, rather than in an onload handler.)
[2] When the user leaves the page (the code below uses the onbeforeunload event), get the current time and subtract the starting time from it: tTotal = tend - tstart.
[3] Send this duration along with location.href to your server, which records it in a log file or a database for later use, maybe to serve relevant content based on a user's viewing and browsing patterns. You get the picture.
So here are the files. Store monitorme.html and logtimefile.php somewhere on your server, and create a writable file named timelog.txt in the same directory. Then load monitorme.html in your browser and refresh the page, navigate away to some other page, or just close the window, and watch your timelog.txt file pile up with the times you spent on the page.
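Steps [2] and [3] boil down to a subtraction and an encoding step. Here is a minimal sketch with a hypothetical helper name, `buildTimeMessage`: `Date` values are in milliseconds, so dividing by 1000 gives seconds, and `encodeURIComponent` keeps characters such as `&` and `=` in the page URL from splitting the `tmsg` field of the POST body. In current browsers, `navigator.sendBeacon('logtimefile.php', new URLSearchParams(body))` is a more reliable way than an asynchronous XMLHttpRequest to send such a body while the page unloads; it is an alternative to, not part of, the setup described here.

```javascript
// Builds the POST body logged by logtimefile.php: "tmsg=" plus a URL-encoded
// message, so characters like & or = in the page URL survive the trip.
function buildTimeMessage(url, startMs, endMs) {
  const seconds = (endMs - startMs) / 1000; // Date values are in milliseconds
  const msg = '[URL:' + url + '] Time Spent: ' + seconds + ' seconds';
  return 'tmsg=' + encodeURIComponent(msg);
}

// In the page, startMs would come from Date.now() when the script runs and
// endMs from Date.now() in the beforeunload handler.
const body = buildTimeMessage('http://example.com/a?b=1&c=2', 1000, 4500);
console.log(decodeURIComponent(body.slice('tmsg='.length)));
// [URL:http://example.com/a?b=1&c=2] Time Spent: 3.5 seconds
```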
monitorme.html
<html>
<head>
<title>Duration Logging Demo</title>
<script type="text/javascript">
var oRequest;
var tstart = new Date(); // recorded as soon as this script runs
// ooooo, ajax. ooooooo …
if (window.XMLHttpRequest)
  oRequest = new XMLHttpRequest();
else if (window.ActiveXObject)
  oRequest = new ActiveXObject("Microsoft.XMLHTTP");
function sendAReq(sendStr)
// a generic function to send away any data to the server
// specifically 'logtimefile.php' in this case
{
  oRequest.open("POST", "logtimefile.php", true); //this is where the stuff is going
  oRequest.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  oRequest.send(sendStr);
}
function calcTime()
{
  var tend = new Date();
  var totTime = (tend.getTime() - tstart.getTime()) / 1000; // milliseconds -> seconds
  var msg = "[URL:" + location.href + "] Time Spent: " + totTime + " seconds";
  // encode the message so '&' or '=' in the URL cannot break the POST body
  // (browsers may cancel an async request started while the page unloads)
  sendAReq('tmsg=' + encodeURIComponent(msg));
}
</script>
</head>
<body onbeforeunload="calcTime();">
Hi, navigate away from this page or refresh this page to find the time you spent seeing
this page in a log file on the server.
</body>
</html>
logtimefile.php
<?php
function logtimemsg($timemsg)
{
  //write your own handling code here, store it in a file or store it in a DB, whatever
  $logfilename = 'timelog.txt';
  if (is_writable($logfilename))
  {
    if (!$handle = fopen($logfilename, 'a'))
    {
      exit;
    }
    if (fwrite($handle, $timemsg."\r\n") === FALSE)
    {
      exit;
    }
    fclose($handle);
  }
}
if (isset($_REQUEST['tmsg']))
  logtimemsg($_REQUEST['tmsg']);
?>
UPDATE: People, I love it when any of you writes back, even if it's to tell me that this doesn't work. Thanks to JayVee, who pointed out that <body onunload="..."> sucks compared to <body onbeforeunload="..."> for this post. Change accommodated.
Automatic Javascript Bug Reporting Using AJAX
Mon, 24th Dec, '07
I spend considerable time debugging errors, and as a web developer that spans CSS, JS and server-side files. Often I find myself scrolling up and down MySQL manual pages.
It annoys me no end to find that a user is seeing JavaScript errors. Since JavaScript support has been left entirely to the browsers, each of them has had a field day implementing (or not implementing) it its own way. So it is very common to find that what you think is OK on browser1 is NOT_OK on browser2. Worse still, the incompatibility is usually reported by someone viewing the page on a live setup: a user, the person who least expects a nasty bug.
Now try this: save the following as something.html on your server:
<html>
<head>
<title>Auto JS Bug Reporting Demo</title>
<script type="text/javascript">
var msg = null;
var opdiv;
var oRequest;
// ooooo, ajax. ooooooo …
if (window.XMLHttpRequest)
  oRequest = new XMLHttpRequest();
else if (window.ActiveXObject)
  oRequest = new ActiveXObject("Microsoft.XMLHTTP");
onerror = handleErrors; // install as the global (window.onerror) handler
function handleErrors(errorMessage, url, line)
{
  msg = "[URL:" + url + ", line no.: " + line + "] ERROR: " + errorMessage;
  // encode the report so '&', '=' or '+' in it cannot corrupt the POST body
  sendAReq("err=" + encodeURIComponent(msg), 'errdiv');
  return true; // suppress the browser's own error reporting
}
function sendAReq(sendStr, odiv)
// a generic function to send away any data to the server
// specifically 'handleerror.php' in this case
// what the server replies is handled by showcontent()
{
  opdiv = odiv;
  oRequest.open("POST", "handleerror.php", true); //this is where the stuff is going
  oRequest.onreadystatechange = showcontent;
  oRequest.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  oRequest.send(sendStr);
}
function showcontent()
{
  if (oRequest.readyState == 4)
  {
    // if the output is ready, print it to the div as set by the caller function
    // in this case, the opdiv, the output div, was set to 'errdiv' in handleErrors(…)
    if (oRequest.status == 200)
      document.getElementById(opdiv).innerHTML = oRequest.responseText;
    else
      document.getElementById(opdiv).innerHTML = "<font color=#FF3300 > Try Again Later </font>";
  }
}
</script>
</head>
<body onload="javascript:wow();">
<div id="testdiv">
I am a div. Press this button to cause an error.<br>
<!-- the unterminated string in this onclick is deliberate: it is the second error -->
<input type="button" onclick="javascript:sendAReq('err=hello, 'errdiv');" value="GET some FUN">
</div>
<div id="errdiv">
</div>
</body>
</html>
So this file has two JavaScript errors. One is in the body onload handler, which calls a nonexistent function, wow(). The other is in the onclick handler of the only input button on the page, whose string argument is never closed.
Here is how the page works:
[1] It assigns handleErrors as the global error handler in the line onerror = handleErrors;
[2] handleErrors puts together an error report and then calls sendAReq with the report string and the name of the output div, in this case 'errdiv'.
[3] sendAReq sends the error report to handleerror.php, dutifully setting the output div to 'errdiv' in its very first line. The output div we keep referring to is the div in which the server's reply is displayed. Oh … so you're looking for handleerror.php?
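<?php
function logerr($errmsg)
{
  //write your own handling code here, store it in a file or store it in a DB, whatever
  $errfilename = 'jsbugs.txt';
  if (is_writable($errfilename))
  {
    if (!$handle = fopen($errfilename, 'a'))
    {
      exit;
    }
    // double quotes, so "\r\n" is a real line break rather than the literal characters \r\n
    if (fwrite($handle, $errmsg."\r\n") === FALSE)
    {
      exit;
    }
    fclose($handle);
  }
}
$err = isset($_REQUEST['err']) ? $_REQUEST['err'] : '';
logerr($err);
// escape the reply: the page inserts it with innerHTML
echo "I got this: ".htmlspecialchars($err);
?>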
[4] So handleerror.php receives the error report that the JavaScript assembled and stores it in a file named jsbugs.txt. Make sure a file named jsbugs.txt already exists with write permissions before you run this script. Should you want to do fancier things, you could store the User-Agent string, a timestamp and much more, maybe in a database. I leave that to you.
So now you have something.html ready to be fetched in your browser, handleerror.php copied into the web root directory, and jsbugs.txt, with write permissions, in the same directory.
Now load something.html, and you should see your jsbugs.txt file pile up with cute strings, which may later turn out to be sleep-stealing, girlfriend-eating monsters.
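The report string from step [2] can be built and encoded in one small helper. This is a minimal sketch with a hypothetical name, `buildErrorReport`; it uses the same format as handleErrors() and shows that decoding the encoded body recovers the original text, which is what PHP places in `$_REQUEST['err']` for an `application/x-www-form-urlencoded` POST.

```javascript
// Formats the report the way handleErrors() does and URL-encodes it, so that
// '&', '=' or '+' inside an error message or URL cannot corrupt the POST body.
function buildErrorReport(errorMessage, url, line) {
  const msg = '[URL:' + url + ', line no.: ' + line + '] ERROR: ' + errorMessage;
  return 'err=' + encodeURIComponent(msg);
}

const body = buildErrorReport('wow is not defined', 'http://example.com/something.html?a=1&b=2', 13);
console.log(body);
// Decoding recovers the original text, as the server side would see it.
console.log(decodeURIComponent(body.slice('err='.length)));
// [URL:http://example.com/something.html?a=1&b=2, line no.: 13] ERROR: wow is not defined
```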




