Walking the DOM in JavaScript has been covered well enough. But I couldn’t find substantial help when it came to walking the DOM tree of HTML files in PHP. Most of what is written about the DOM in PHP seems limited to XML. It would be wonderful if we could set aside all the regex mumbo-jumbo for parsing tags, and instead check the values, the text contained, the IDs of elements, their class names and style rules directly. The function presented here walks through all the elements in an HTML file, accessing each node’s attributes and values.

Anyways, being the wonderful thing that DOM is, I cooked up a small little function to walk the tree of an HTML file in PHP and print the output to screen, and boy does it walk. Anyway, less talk and more …
<?php
function walkDom($node, $level = 0)
{
    $indent = str_repeat('&nbsp;&nbsp;', $level); // prettifying the output
    if ($node->nodeType != XML_TEXT_NODE)
    {
        echo $indent.'<b>'.$node->nodeName.'</b>';
        if ($node->nodeType == XML_ELEMENT_NODE)
        {
            $attributes = $node->attributes; // get all the attributes (eg: id, class …)
            foreach ($attributes as $attribute)
            {
                // escape the value so markup inside an attribute cannot break the output
                echo ', '.$attribute->name.'='.htmlspecialchars($attribute->value);
                // $attribute->name is usually one of these:
                // src, type, rel, link, name, value, href, onclick,
                // id, class, style, title
                // You can add your custom handlers depending on the attribute.
            }
            // Uncomment to print the contents of a node, which may be the link text, contents of a div and so on:
            //if ($node->childNodes->length == 1 && strlen(trim($node->childNodes->item(0)->nodeValue)) > 0)
            //    echo '<br>'.$indent.'(contains='.$node->childNodes->item(0)->nodeValue.')';
        }
        echo '<br><br>';
    }
    if ($node->hasChildNodes())
    {
        $level++; // go one level deeper
        foreach ($node->childNodes as $cNode)
            walkDom($cNode, $level); // so this is the recursion my professor kept talkin' about
        $level = $level - 1; // come a level up, and had to do it this way or else wordpress would take away one dash. 😦
    }
}
?>

Is that good?? Because here is how you use it:
<?php
$doc = new DOMDocument();
@$doc->loadHTMLFile('http://www.google.com'); // @ silences warnings about malformed HTML
walkDom($doc);
?>

And this prints the entire DOM of the file read from the URL passed to loadHTMLFile. More information about the constants and functions used can be found here. And believe me, this works.


So … here’s the real deal on measuring how much time a user spends on each individual page, by URL. The time is measured in milliseconds and logged in seconds.

[1] As soon as the page loads, store the current time in a JavaScript variable. Let this variable be called tstart.

[2] When the user leaves the page (the onbeforeunload event), get the current timestamp and subtract the starting timestamp from it. So totTime = tend - tstart.

[3] Now send this time information along with location.href to your server, which will record it in a log file or database to use later, maybe to serve relevant content based on a user’s viewing and browsing patterns. You get the picture, right?
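
The three steps boil down to a subtraction and a bit of string building. Here is a minimal sketch of that part alone, kept free of any browser objects so you can try it anywhere. The name buildTimeBody is made up for this illustration and is not part of the demo files; the message format and the tmsg field are the ones the demo uses.

```javascript
// Hypothetical helper (not in the demo files): builds the POST body for step [3].
// tstartMs and tendMs are timestamps in milliseconds, as returned by Date.getTime().
function buildTimeBody(url, tstartMs, tendMs) {
    var totTime = (tendMs - tstartMs) / 1000; // milliseconds to seconds
    var msg = "[URL:" + url + "] Time Spent: " + totTime + " seconds";
    // encodeURIComponent keeps characters like & and + in the URL
    // from being misread as form field separators on the server
    return "tmsg=" + encodeURIComponent(msg);
}
```

The encoding step matters because the page URL itself often contains & and =, which would otherwise split the message into several form fields.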

So here are the files. Just store monitorme.html and logtimefile.php somewhere on your server and create a writable file named timelog.txt in the same directory. Now load monitorme.html, then refresh the page, navigate away to some other page or just plain close the window, and you will find your timelog.txt file piling up with the times you spent on the page.

monitorme.html

<html>
<head>
<title>Duration Logging Demo</title>
<script type="text/javascript">
var oRequest;
var tstart = new Date();

// ooooo, ajax. ooooooo …
if (window.XMLHttpRequest)
    oRequest = new XMLHttpRequest();
else if (window.ActiveXObject)
    oRequest = new ActiveXObject("Microsoft.XMLHTTP");

// a generic function to send away any data to the server
// specifically 'logtimefile.php' in this case
function sendAReq(sendStr)
{
    oRequest.open("POST", "logtimefile.php", true); // this is where the stuff is going
    oRequest.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    oRequest.send(sendStr);
}

function calcTime()
{
    var tend = new Date();
    var totTime = (tend.getTime() - tstart.getTime()) / 1000; // milliseconds to seconds
    var msg = "[URL:" + location.href + "] Time Spent: " + totTime + " seconds";
    // encode the message so characters like & and + in the URL survive the POST
    sendAReq('tmsg=' + encodeURIComponent(msg));
}
</script>
</head>

<body onbeforeunload="javascript:calcTime();">
Hi, navigate away from this page or Refresh this page to find the time you spent seeing
this page in a log file in the server.
</body>
</html>
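
One caveat: a browser is free to drop an asynchronous XMLHttpRequest that is still in flight when the page goes away, so some visits may never get logged. Browsers that support navigator.sendBeacon offer a call meant for exactly this kind of last-second report. What follows is only a sketch of that alternative, not what the demo above does. The function name sendTimeReport and the nav parameter are made up for this illustration; nav would normally be window.navigator and is passed in so the function can also be tried outside a browser.

```javascript
// Sketch of an alternative to the XMLHttpRequest call, assuming a browser
// with navigator.sendBeacon. sendTimeReport and nav are hypothetical names.
function sendTimeReport(nav, url, tstartMs, tendMs) {
    var totTime = (tendMs - tstartMs) / 1000; // milliseconds to seconds
    var msg = "[URL:" + url + "] Time Spent: " + totTime + " seconds";

    // URLSearchParams takes care of the form encoding, so the server
    // still receives an ordinary 'tmsg' POST field
    var body = new URLSearchParams();
    body.append("tmsg", msg);

    if (nav && typeof nav.sendBeacon === "function") {
        // returns true if the browser accepted the data for delivery
        return nav.sendBeacon("logtimefile.php", body);
    }
    return false; // no beacon support, caller can fall back to sendAReq
}
```

In the page you would call it from the same unload handler, for example sendTimeReport(navigator, location.href, tstart.getTime(), new Date().getTime()).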

logtimefile.php

<?php
function logtimemsg($timemsg)
{
    // write your own handling code here, store it in a file or store it in a DB, whatever
    $logfilename = 'timelog.txt';
    if (is_writable($logfilename))
    {
        if (!$handle = fopen($logfilename, 'a'))
        {
            exit;
        }
        if (fwrite($handle, $timemsg."\r\n") === FALSE)
        {
            exit;
        }
        fclose($handle);
    }
}

// only log when the page actually sent a message
if (isset($_REQUEST['tmsg']))
    logtimemsg($_REQUEST['tmsg']);
?>

UPDATE: People, I love it when any of you writes back, even if only to tell me that this doesn’t work. Thanks to JayVee, who pointed out that <body onunload=""> sucks compared to <body onbeforeunload=""> for this post. Change accommodated.


I spend a considerable amount of time debugging errors, and being a Web developer means debugging CSS, JS and server-side files as well. Often I find myself rolling scrollbars up and down on MySQL manual pages.

It annoys me no end to find that a user is seeing JavaScript errors. And since JavaScript is something browsers have been trusted with, they have had a field day supporting it inconsistently. So it is very common to find that what you think is OK on browser1 is NOT_OK on browser2. Worse still, the incompatibility gets reported by someone viewing the page on a live setup, a user who least expects a nasty bug.

Now try this, save this as something.html on your server:

<html>
<head>
<title>Auto JS Bug Reporting Demo</title>
<script type="text/javascript">
var msg = null;
var opdiv;
var oRequest;

// ooooo, ajax. ooooooo …
if (window.XMLHttpRequest)
    oRequest = new XMLHttpRequest();
else if (window.ActiveXObject)
    oRequest = new ActiveXObject("Microsoft.XMLHTTP");

onerror = handleErrors;
function handleErrors(errorMessage, url, line)
{
    msg = "[URL:" + url + ", line no.: " + line + "] ERROR: " + errorMessage;
    // encode the report so characters like & and + survive the POST
    sendAReq("err=" + encodeURIComponent(msg), 'errdiv');
    return true; // tell the browser the error has been handled
}

// a generic function to send away any data to the server
// specifically 'handleerror.php' in this case
// what the server replies is handled by showcontent()
function sendAReq(sendStr, odiv)
{
    opdiv = odiv;
    oRequest.open("POST", "handleerror.php", true); // this is where the stuff is going
    oRequest.onreadystatechange = showcontent;
    oRequest.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    oRequest.send(sendStr);
}

function showcontent()
{
    if (oRequest.readyState == 4)
    {
        // if the output is ready, print it to the div as set by the caller function
        // in this case, the opdiv, the output div, was set to 'errdiv' in handleErrors(…)
        if (oRequest.status == 200)
            document.getElementById(opdiv).innerHTML = oRequest.responseText;
        else
            document.getElementById(opdiv).innerHTML = "<font color=#FF3300 > Try Again Later </font>";
    }
}
</script>
</head>

<body onload="javascript:wow();">
<div id=”testdiv”>
I am a div. Press this button to cause an error.<br>
<input type="button" onclick="javascript:sendAReq('err=hello, 'errdiv');" value="GET some FUN">
</div>
<div id=”errdiv”>
</div>
</body>
</html>

So this file has two JavaScript errors. One is in the body onload handler, which calls a nonexistent function, wow(). The other is in the onclick handler of the only input button on this page, where the closing quote after hello is missing.

So this page is simple in that …

[1] It assigns handleErrors as the default error handler in the line onerror=handleErrors ;

[2] It puts together an error report in handleErrors and then calls sendAReq with the report string and the name of the output div, in this case ‘errdiv’.

[3] sendAReq sends the error report to handleerror.php and dutifully sets the output div to ‘errdiv’ in its very first line. The output div we keep referring to here is the div where the server’s reply gets displayed. Oh … so you’re looking for handleerror.php??


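<?php
function logerr($errmsg)
{
    // write your own handling code here, store it in a file or store it in a DB, whatever
    $errfilename = 'jsbugs.txt';
    if (is_writable($errfilename))
    {
        if (!$handle = fopen($errfilename, 'a'))
        {
            exit;
        }
        // double quotes here, so that \r\n is written as a real line break
        if (fwrite($handle, $errmsg."\r\n") === FALSE)
        {
            exit;
        }
        fclose($handle);
    }
}
$err = isset($_REQUEST['err']) ? $_REQUEST['err'] : '';
logerr($err);
// escape the echoed text, since the page drops this reply straight into innerHTML
echo "I got this: ".htmlspecialchars($err);
?>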

[4] So handleerror.php gets the error report that the JavaScript put together and stores it in a file named jsbugs.txt. Make sure you have already created jsbugs.txt with write permissions before you run this script. Should you want to do fancier things, you can store the User-Agent string, a timestamp and a lot more, maybe in a database. I leave that to you.

So now you are ready, with something.html about to be fetched in your browser, handleerror.php copied into the web root directory, and jsbugs.txt in the same directory as well, with write permissions.
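
If you do want the User-Agent and a timestamp in the report, one way is to add them on the JavaScript side before the string is sent. This is only a sketch under that assumption: buildErrorBody is a made-up name, and the extra bracketed fields are my own layout rather than anything handleerror.php expects. The user agent and the date are passed in as arguments (in the page they would be navigator.userAgent and new Date()).

```javascript
// Hypothetical helper: builds a richer 'err' POST body than handleErrors does.
// userAgent would be navigator.userAgent in the page, when would be new Date().
function buildErrorBody(errorMessage, url, line, userAgent, when) {
    var msg = "[" + when.toISOString() + "] " +
              "[URL:" + url + ", line no.: " + line + "] " +
              "ERROR: " + errorMessage +
              " [UA: " + userAgent + "]";
    // encode the whole report so it arrives as a single form field
    return "err=" + encodeURIComponent(msg);
}
```

Inside handleErrors you would then call sendAReq(buildErrorBody(errorMessage, url, line, navigator.userAgent, new Date()), 'errdiv'). The server side could just as well read the User-Agent from the request headers, so treat this as one option among several.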

Now get something.html and you should see your jsbugs.txt file piling up with cute strings, which may later turn up to be sleep-stealing, girlfriend-eating monsters.