This is an archived cached-text copy of the developerWorks article. Please consider viewing the original article at: IBM developerWorks




Beef up the Find command in Firefox

Create a Greasemonkey script to highlight search entries relative to nearby content


Level: Introductory

Nathan Harrington (harrington.nathan@gmail.com), Programmer, IBM 

12 Aug 2008

The Find command in Firefox locates the user-specified text in the body of a Web page. The command is an easy-to-use tool that works well enough for most users most of the time. Sometimes, however, a more powerful Find-like tool would make locating text easier. This article shows how to build a tool that isolates relevant text in Web pages faster by detecting the presence and absence of nearby words.

Native text-search capabilities in Firefox provide useful highlighting of contiguous search terms and phrases. Additional Firefox extensions are available to incorporate regular-expression searches and other text-highlighting capabilities. This article presents tools and code needed to add your own text-searching interface to Firefox. With a Greasemonkey user script and some custom algorithms, you'll be able to add grep -v functionality to text searches — that is, highlighting a first search term where a second one is not located nearby.

Requirements

Hardware

Text searches on typical Web pages with older (pre-2002) hardware are nearly instantaneous. However, the code presented here is not designed for speed and may require faster hardware to perform at a user-friendly speed on large Web pages.

Software

The code was developed for use with Firefox V2.0 and Greasemonkey V0.7. Newer versions of both will require testing and possibly modifications to ensure their functionality. As a Greasemonkey script, the code presented here should work on any operating system that supports Firefox and Greasemonkey. We tested on Microsoft® Windows® and Linux® Ubuntu V7.10 releases.





Greasemonkey and Firefox extensions

User modification to Web pages is the role Greasemonkey fulfills, and the code presented here uses the Greasemonkey framework to search for and highlight the relevant text. See Resources for the Greasemonkey Firefox extension.

Examples of what this Greasemonkey script is designed to do

Those familiar with the UNIX grep command and its common -v option know how indispensable grep is for extracting relevant lines of text from a file. Text files conforming to the UNIX tradition of simplicity generally store their text in a line-by-line format that makes it easy to find words close together. The -v option prints lines where the specified text is not found.
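As a rough illustration of the line-oriented behavior described above, the following sketch filters an array of lines the way grep and grep -v would. The helper names grepLines and grepInvert and the sample data are hypothetical and are not part of greppishFind.user.js.

```javascript
// Hypothetical helpers, for illustration only.
// Keep the lines that contain the given text (like grep).
function grepLines(lines, text) {
  return lines.filter(function (line) {
    return line.indexOf(text) !== -1;
  });
}

// Keep the lines that do NOT contain the given text (like grep -v).
function grepInvert(lines, text) {
  return lines.filter(function (line) {
    return line.indexOf(text) === -1;
  });
}

// Made-up sample data resembling a directory listing.
var sampleLines = [
  "08/12/2008 09:15 AM  notes.txt",
  "08/12/2008 02:40 PM  draft.txt",
  "07/30/2007 11:05 AM  old.txt"
];

// Equivalent in spirit to: grep 2008 file | grep -v PM
var result = grepInvert(grepLines(sampleLines, "2008"), "PM");
```

Because each line is a self-contained unit, "nearby" simply means "on the same line." The rest of the article deals with the fact that Web pages offer no such unit.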

Unlike text files, Web pages generally divide text with tags and other markers rendered into lines by the browser. A wide variety of browser window sizes makes it difficult to isolate nearby text based on expected line positions. Tables, links, and other text markup also make it difficult to isolate text that is in the same "line."

Algorithms in this article are designed to address some of these difficulties by providing a simple grep-like functionality piped to a function that works like grep's -v option. This allows the user to find a certain word of text, then only highlight entries where a different word is not nearby. Figure 1 shows what this can look like.


Figure 1. Example of DOM and DOM hierarchy searches

In the top portion of the image, the search text of "DOM" is highlighted by the script. In the bottom portion, notice how only the first three "DOM" entries are highlighted because the second search text of "hierarchy" is found in close proximity to the third "DOM."

Consider Figure 2.


Figure 2. Example of 2008 and 2008 PM searches

The first portion of the image shows all the 2008 entries, while the second portion only shows the before-noon entries due to the -v keyword of PM. Read on for full details and further examples of how to implement this functionality.





greppishFind.user.js Greasemonkey user script

An introduction to the unique aspects of the Greasemonkey programming environment is beyond the scope of this article. Familiarity with Greasemonkey, including how to install, modify, and debug scripts, is assumed. Consult Resources for more information about Greasemonkey and how to get started programming your own user scripts.

Generally speaking, the greppishFind.user.js user script is started on a page load, provides a text area after a specific key combination is entered, and performs highlighting searches based on user-entered text. Listing 1 shows the beginning of the greppishFind.user.js user script.


Listing 1. greppishFind.user.js program heading

// ==UserScript==
// @name          greppishFind
// @namespace     IBM developerWorks
// @description   grep and grep -v function-ish for one or two word searches
// ==/UserScript==

var boxAdded = false;       // user interface for search active
var dist = 10;              // proximity distance between words

var highStart = '<high>';   // begin and end highlight tags
var highEnd   = '</high>';

var lastSearch = null;      // previous highlight text

// third argument is the useCapture flag, passed as a boolean
window.addEventListener('load', addHighlightStyle, true);
window.addEventListener('keyup', globalKeyPress, true);

After defining the required metadata that describes the user script and its function, global variables, and highlighting tags, the load and keyup event listeners are added to process user-generated events. Listing 2 details the addHighlightStyle function called by the load event listener.


Listing 2. addHighlightStyle function

function addHighlightStyle(css)
{
  var head = document.getElementsByTagName('head')[0];
  if( !head ) { return; }

  var style = document.createElement('style');
  var cssStr = "high {color: black; background-color: yellow; }";
  style.type = 'text/css';
  style.innerHTML = cssStr;
  head.appendChild(style);
}//addHighlightStyle

The function creates a new style node in the current DOM hierarchy with the appropriate highlighting information. In this case, it's simple black text on a yellow background. Listing 3 shows the code of the other event listener, globalKeyPress, as well as the boxKeyPress function.
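If you plan to experiment with the highlighting style, one option is to build the CSS rule from parameters rather than hard-coding it. The sketch below assumes a hypothetical helper named buildHighlightCss, which is not part of the original script. With the arguments shown, it produces the same rule string used in Listing 2.

```javascript
// Hypothetical helper: build the CSS rule for the custom <high> element.
function buildHighlightCss(textColor, backgroundColor) {
  return "high {color: " + textColor +
         "; background-color: " + backgroundColor + "; }";
}

// Same rule as the cssStr value in addHighlightStyle.
var cssStr = buildHighlightCss("black", "yellow");
```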


Listing 3. globalKeyPress, boxKeyPress functions

function globalKeyPress(e)
{
  // add the user interface text area and button, set focus and event listener
  if( boxAdded == false && e.altKey && e.keyCode == 61 )
  {
    boxAdded = true;
    var boxHtml = "<textarea wrap='virtual' id='sBoxArea' " +
              "style='width:300px;height:20px'></textarea>" +
              "<input name='btnHighlight' id='tboxButton' " +
              "value='Highlight' type='submit'>";
    var tArea = document.createElement("div");
    tArea.innerHTML = boxHtml;
    document.body.insertBefore(tArea, document.body.firstChild);

    tArea = document.getElementById("sBoxArea");
    tArea.focus();
    tArea.addEventListener('keyup', boxKeyPress, true );

    var btn = document.getElementById("tboxButton");
    btn.addEventListener('mouseup', processSearch, true );

  }//if alt = pressed

}//globalKeyPress

function boxKeyPress(e)
{
  if( e.keyCode != 13 ){ return; }

  var textarea = document.getElementById("sBoxArea");
  textarea.value = textarea.value.substring(0,textarea.value.length-1);
  processSearch();

}//boxKeyPress

Catching each keystroke and listening for a specific combination is the purpose of globalKeyPress. When the Alt+= keys are pressed (that is, hold Alt and press the = key), the user interface for the search box is added to the current DOM. This interface consists of a text area for entering the keywords and a Submit button. After the new items are added, the text area needs to be selected by the getElementById function to set the focus correctly. Event listeners are then added to process the keystrokes in the text area and to execute the search when the Submit button is clicked.

The second function in Listing 3 processes each keystroke in the text area. If the Enter key is pressed, the trailing newline is removed from the text area's value and the processSearch function is executed. Listing 4 details the processSearch function.
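The two key tests in Listing 3 can be isolated into small functions that are easy to check outside the browser. This is only a sketch: isActivationCombo and stripTrailingNewline are hypothetical names, and the event objects passed in are plain objects standing in for real key events.

```javascript
// Hypothetical predicate mirroring the test in globalKeyPress.
// 61 is the keyCode the article's code checks for the = key; other
// browsers or versions may report a different value for that key.
function isActivationCombo(e) {
  return Boolean(e.altKey) && e.keyCode === 61;
}

// Hypothetical helper mirroring boxKeyPress: remove one trailing newline
// left behind by the Enter key, but only if one is actually present.
function stripTrailingNewline(value) {
  return value.replace(/\r?\n$/, "");
}

var opened  = isActivationCombo({ altKey: true,  keyCode: 61 });
var ignored = isActivationCombo({ altKey: false, keyCode: 61 });
var query   = stripTrailingNewline("DOM hierarchy\n");
```

Unlike boxKeyPress, which always drops the last character, this variant leaves the value untouched when it does not end in a newline.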


Listing 4. processSearch function

function processSearch()
{
  // remove any existing highlights
  if( lastSearch != null )
  {
    var splitResult = lastSearch.split( ' ' );
    removeIndicators( splitResult[0] );
  }//if last search exists

  var textarea = document.getElementById("sBoxArea");

  if( textarea.value.length > 0 )
  {
    var splitResult = textarea.value.split( ' ' );
    if( splitResult.length == 1 )
    { 
      oneWordSearch( splitResult[0] );

    }else if( splitResult.length == 2 )
    { 
      twoWordSearch( splitResult[0], splitResult[1] );

    }else
    { 
      textarea.value = "Only two words supported";

    }//if number of words
  }//if longer than required

  lastSearch = textarea.value;

}//processSearch

Each search is stored in the lastSearch variable so that its highlighting can be removed the next time processSearch is called. After the removal, the search query is highlighted using oneWordSearch if there is only one query word, or twoWordSearch if the grep -v functionality is desired. Listing 5 shows the details on the removeIndicators function.
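Note that processSearch splits the query on a single space, so a doubled space such as "DOM  hierarchy" yields three pieces and is rejected. A slightly more forgiving parser might look like the sketch below. The parseQuery name and its return shape are assumptions made for illustration.

```javascript
// Hypothetical parser: classify a raw query string as a one-word search,
// a two-word search, an empty query, or an unsupported query.
function parseQuery(raw) {
  // Trim leading and trailing whitespace, then split on runs of whitespace.
  var trimmed = raw.replace(/^\s+|\s+$/g, "");
  if (trimmed.length === 0) { return { mode: "empty", words: [] }; }

  var words = trimmed.split(/\s+/);
  if (words.length === 1) { return { mode: "one", words: words }; }
  if (words.length === 2) { return { mode: "two", words: words }; }
  return { mode: "unsupported", words: words };
}

var single = parseQuery("DOM");
var pair   = parseQuery("  DOM   hierarchy ");
var tooBig = parseQuery("DOM hierarchy inspector");
```

The caller would then dispatch to oneWordSearch or twoWordSearch based on the mode value.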


Listing 5. removeIndicators function

function removeIndicators( textIn )
{
  // use XPath to quickly extract all of the rendered text
  var textNodes = document.evaluate( '//text()', document, null,
                                     XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
                                     null );

  for (var i = 0; i < textNodes.snapshotLength; i++)
  {
    var textNode = textNodes.snapshotItem(i);

    if( textNode.data.indexOf( textIn ) != -1 )
    {
      // find the appropriate parent node with the innerHTML to be removed
      var getNode = getHtml( textNode );
      if( getNode != null )
      {
        var temp = getNode.parentNode.innerHTML;
        var reg = new RegExp( highStart, "g");
        temp = temp.replace( reg, "" );

        reg = new RegExp( highEnd, "g");
        temp = temp.replace( reg, "" );
        getNode.parentNode.innerHTML = temp;

      }//if correct parent found

    }//if word found
  }//for each text node

}//removeIndicators

Instead of traversing the DOM tree manually, removeIndicators uses XPath to extract the text nodes in the document quickly. If any of the text nodes contains the lastSearch text (the most recent highlighted word), getHtml finds the appropriate parent node, and the highlighted text is removed. Note that combining the extract of innerHTML and assignment of innerHTML into one step will cause various issues, so temporarily assigning the innerHTML to an external variable is required. Listing 6 is the getHtml function that shows in detail how to find the appropriate parent node.
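The tag removal itself is plain string work and can be tried without a browser. The following sketch uses a hypothetical stripHighlightTags helper and a made-up markup string. It removes the tags with split and join rather than the regular expressions used in Listing 5, which sidesteps any escaping concerns.

```javascript
var highStart = '<high>';   // same tag values as Listing 1
var highEnd   = '</high>';

// Hypothetical helper: remove every highlight start and end tag
// from a markup string, leaving the enclosed text in place.
function stripHighlightTags(html) {
  return html.split(highStart).join("").split(highEnd).join("");
}

// Read the markup into a temporary variable, modify it, then use the
// result, mirroring the read-modify-assign pattern of removeIndicators.
var before = 'The <high>DOM</high> tree and the <high>DOM</high> inspector';
var after  = stripHighlightTags(before);
```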


Listing 6. getHtml function

function getHtml( tempNode )
{
  // walk up the tree to find the appropriate node
  var stop = 0;

  while( stop == 0 )
  {
    if( tempNode.parentNode != null &&
        tempNode.parentNode.innerHTML != null )
    {
      // make sure it contains the tags to be removed
      if( tempNode.parentNode.innerHTML.indexOf( highStart ) != -1 )
      {

        // make sure it's not the title or greppishFind UI node
        if( tempNode.parentNode.innerHTML.indexOf( "<title>" ) == -1 &&
            tempNode.parentNode.innerHTML.indexOf("btnHighlight") == -1)
        {
          return( tempNode );

        }else{ return(null); }

      // the highlight tags were not found, so go up the tree
      }else{ tempNode = tempNode.parentNode; }

    // stop the processing when the top of the tree is reached
    }else{ stop = 1; }

  }//while
  return( null );
}//getHtml

While walking up the DOM tree in search of the innerHTML with the highlighting tags inserted, it is important to disregard two specific nodes. The nodes containing title and btnHighlight should not be updated, as changes in these nodes cause the document to display incorrectly. When the correct node is found, regardless of the number of parents up the DOM tree it is, the node is returned and the highlighting removed. Listing 7 is the first of the functions that adds highlighting to the document.
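The upward walk can be modeled with plain objects to see which node is returned. In this sketch, findHighlightedAncestor is a hypothetical stand-in for getHtml, and each object carries only the innerHTML and parentNode properties that the walk relies on.

```javascript
// Hypothetical stand-in for getHtml. Works on plain objects shaped like
// { innerHTML: string, parentNode: object or null } so it runs anywhere.
function findHighlightedAncestor(node, marker, forbidden) {
  var current = node;
  while (current.parentNode && current.parentNode.innerHTML != null) {
    var html = current.parentNode.innerHTML;
    if (html.indexOf(marker) !== -1) {
      // Closest parent containing the marker: reject it if it also
      // contains any of the strings that must not be rewritten.
      for (var i = 0; i < forbidden.length; i++) {
        if (html.indexOf(forbidden[i]) !== -1) { return null; }
      }
      return current;
    }
    current = current.parentNode;   // marker not found yet, go up one level
  }
  return null;                      // reached the top without a match
}

// A tiny made-up tree: body > p > b > text
var body = { innerHTML: '<p><b>the</b> <high>DOM</high></p>', parentNode: null };
var para = { innerHTML: '<b>the</b> <high>DOM</high>', parentNode: body };
var bold = { innerHTML: 'the', parentNode: para };
var text = { data: 'the', parentNode: bold };

var found = findHighlightedAncestor(text, '<high>', ['<title>', 'btnHighlight']);
```

Here found is the bold object, because its parent (para) is the first level whose markup contains the highlight tag. As in getHtml, the caller then rewrites found.parentNode.innerHTML.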


Listing 7. oneWordSearch function

function oneWordSearch( textIn )
{
  // use XPath to quickly extract all of the rendered text
  var textNodes = document.evaluate( '//text()', document, null,
                                     XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
                                     null );

  for (var i = 0; i < textNodes.snapshotLength; i++)
  {
    var textNode = textNodes.snapshotItem(i);

    if( textNode.data.indexOf( textIn ) != -1 )
    {
      highlightAll( textNode, textIn );

    }//if word found
  }//for each text node

}//oneWordSearch

Again using XPath, oneWordSearch processes each text node to find the query. When found, the highlightAll function is called, as shown in Listing 8.


Listing 8. highlightAll function

// escape regular-expression metacharacters so the query is matched literally
function escapeRegExp( textIn )
{
  return textIn.replace( /[.*+?^${}()|[\]\\]/g, "\\$&" );
}//escapeRegExp

function highlightAll( nodeOne, textIn )
{
  if( nodeOne.parentNode != null )
  {
    var full = nodeOne.parentNode.innerHTML;
    var reg = new RegExp( escapeRegExp( textIn ), "g");
    // "$&" inserts the matched text between the highlight tags
    full = full.replace(  reg,  highStart + "$&" + highEnd );
    nodeOne.parentNode.innerHTML = full;
  }//if the parent node exists
}//highlightAll

function highlightOne( nodeOne, wordOne, wordTwo )
{
  var oneIdx = nodeOne.data.indexOf( wordOne );
  var tempStr = nodeOne.data.substring( oneIdx + wordOne.length );
  var twoIdx = tempStr.indexOf( wordTwo );

  // only create the highlight if it's not too close
  if( twoIdx > dist )
  {
    // no "g" flag: only the first match in the parent markup is highlighted
    var reg = new RegExp( escapeRegExp( wordOne ) );
    var start = nodeOne.parentNode.innerHTML.replace(
      reg,  highStart + "$&" + highEnd
    );
    nodeOne.parentNode.innerHTML = start;
  }//if the distance threshold exceeded
}//highlightOne

Similar to the removeIndicators function, highlightAll uses a regular expression to replace the text to be highlighted with markup, including the highlighting tags and the original text.

Function highlightOne, used later in the twoWordSearch function, checks that the first word is sufficiently far away from the second word, then performs the same replacement. Word distance checks need to take place in the rendered text as returned from the XPath statement; otherwise, various markup, such as <b>, will affect the distance calculations. Listing 9 shows the twoWordSearch function in detail.
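The distance test in highlightOne can be tried on plain strings. The sketch below uses a hypothetical isFarEnough function that follows the same steps: find the first wordOne, look for wordTwo in the text after it, and compare that offset, measured in characters, against the threshold.

```javascript
// Hypothetical, DOM-free version of the distance test in highlightOne.
// Returns true when wordTwo first appears more than `dist` characters
// after the end of the first occurrence of wordOne.
function isFarEnough(text, wordOne, wordTwo, dist) {
  var oneIdx = text.indexOf(wordOne);
  if (oneIdx === -1) { return false; }
  var rest = text.substring(oneIdx + wordOne.length);
  return rest.indexOf(wordTwo) > dist;
}

var adjacent = isFarEnough("DOM hierarchy", "DOM", "hierarchy", 10);
var distant  = isFarEnough("DOM is a standard; its hierarchy", "DOM", "hierarchy", 10);
var reversed = isFarEnough("hierarchy of the DOM", "DOM", "hierarchy", 10);
```

In this sketch, adjacent is false (the second word starts one character after the first) and distant is true. The reversed case is also false: because the search for wordTwo starts after wordOne, a second word that appears only before the first word gives an index of -1, so no highlight is created.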


Listing 9. twoWordSearch function

function twoWordSearch( wordOne, wordTwo )
{
  // use XPath to quickly extract all of the rendered text
  var textNodes = document.evaluate( '//text()', document, null,
                                     XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
                                     null );
  var nodeOne;
  var foundSingleNode = 0;

  for (var i = 0; i < textNodes.snapshotLength; i++)
  {
    var textNode = textNodes.snapshotItem(i);

    // if both words in the same node, highlight if not too close
    if( textNode.data.indexOf( wordOne ) != -1 &&
        textNode.data.indexOf( wordTwo ) != -1 )
    { 
      highlightOne( textNode, wordOne, wordTwo );
      foundSingleNode = 0;
      nodeOne = null;
    }else
    { 
      if( textNode.data.indexOf( wordOne ) != -1 )
      { 
        // if the first word is already found, highlight the entry
        if( foundSingleNode == 1  &&
            nodeOne.parentNode != null &&
            nodeOne.parentNode.innerHTML.indexOf( wordTwo ) == -1 )
        { 
          highlightAll( nodeOne, wordOne );
        }//if second word is not in the same parent node

        // record current node found 
        nodeOne = textNode;
        foundSingleNode = 1;

      }//if text match

      if( textNode.data.indexOf( wordTwo ) != -1 ){ foundSingleNode = 0; }

    }//if both words in single node

  }//for each text node

  // no second word nearby, highlight all entries
  if( foundSingleNode == 1 ){ highlightAll( nodeOne, wordOne ); }

}//twoWordSearch

Walking through each text node as retrieved from the XPath call is done the same way as in the oneWordSearch function. If both words are found within the current text node, the highlightOne function is called to highlight the instances of wordOne where it is sufficiently distant from wordTwo.

If the two words are not in the same node, twoWordSearch records the node in nodeOne and sets the foundSingleNode variable when wordOne is matched. If wordOne is matched again before a node containing wordTwo is encountered, highlightAll is called on the previously recorded node, provided that node's parent markup does not contain wordTwo. A node containing only wordTwo clears foundSingleNode, so the pending wordOne entry is left unhighlighted. This ensures that instances of the first word without the second word nearby are highlighted. After the loop completes, a final check runs highlightAll if the last wordOne match was isolated and still needs to be highlighted.
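The control flow of the loop is easier to follow on a simplified model. In the sketch below, each string stands in for one text node, and the hypothetical planHighlights function returns the indexes of the nodes whose wordOne would be highlighted. The check against the parent node's innerHTML is deliberately left out, so this is an approximation of Listing 9, not a replacement for it.

```javascript
// Simplified, DOM-free model of the twoWordSearch loop (an assumption-laden
// sketch: strings replace text nodes and the parent-markup check is omitted).
function planHighlights(nodes, wordOne, wordTwo, dist) {
  var toHighlight = [];   // indexes of nodes whose wordOne would be highlighted
  var pending = -1;       // last node holding only wordOne, or -1 if none

  for (var i = 0; i < nodes.length; i++) {
    var oneIdx = nodes[i].indexOf(wordOne);
    var hasTwo = nodes[i].indexOf(wordTwo) !== -1;

    if (oneIdx !== -1 && hasTwo) {
      // Both words in one node: apply the character-distance test.
      var rest = nodes[i].substring(oneIdx + wordOne.length);
      if (rest.indexOf(wordTwo) > dist) { toHighlight.push(i); }
      pending = -1;
    } else if (oneIdx !== -1) {
      // wordOne seen again before any wordTwo: the earlier node is isolated.
      if (pending !== -1) { toHighlight.push(pending); }
      pending = i;
    } else if (hasTwo) {
      // wordTwo follows the pending node, so that node is suppressed.
      pending = -1;
    }
  }

  // A trailing wordOne-only node has no wordTwo after it.
  if (pending !== -1) { toHighlight.push(pending); }
  return toHighlight;
}

var nodes = [
  "DOM Inspector",          // 0: isolated
  "Mozilla DOM reference",  // 1: followed by a node with wordTwo
  "hierarchy of nodes",     // 2: wordTwo only
  "DOM hierarchy viewer",   // 3: both words, too close together
  "The DOM"                 // 4: trailing, nothing after it
];
var plan = planHighlights(nodes, "DOM", "hierarchy", 10);
```

With this sample data, plan contains the indexes 0 and 4. Node 1 is skipped because the next relevant node contains the second word, and node 3 is skipped because the two words are within the distance threshold.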

Save the file created with the above code as greppishFind.user.js and read on for installation and usage details.





Installing the greppishFind.user.js script

Open your Firefox browser with the Greasemonkey V0.7 extension installed and enter the URL to the directory where greppishFind.user.js is located. Click the greppishFind.user.js file, and you should see the standard Greasemonkey install pop-up. Select Install, then reload the page to activate the script.

Usage examples

Once the greppishFind.user.js script is installed into Greasemonkey, you can mimic the examples shown in Figure 1 by entering dom inspector as a search query at www.google.com. When the results page appears, press Alt+= to activate the user interface. Type the query DOM (case-sensitive) and press Enter to see all entries of DOM highlighted. Change the query to DOM hierarchy, and you'll see how only the first three entries of DOM are highlighted, as shown in Figure 1.

Choose a directory listing such as file:///home/ or file:///c:/ to show entries like those listed in Figure 2. You may want to experiment with changes to the distance parameter or highlighting style to achieve results tailored to your searches.

Conclusion, further additions

With the code above and your completed greppishFind.user.js program, you now have a baseline for implementing your own text-search capabilities in Firefox. Although this program focuses on specific cases of certain words appearing in close proximity to others, it provides a framework for further text-searching options.

Consider adding color changes for highlighted words based on how close the secondary terms are. Expand the number of grep -v words to eliminate entries gradually. Use the code here and your own ideas to create new Greasemonkey user scripts that further enhance users' abilities to find text.
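As one possible starting point for supporting several grep -v words, the distance test could loop over a list of excluded words. The passesAllExclusions function below is a hypothetical sketch of that idea and is not part of the downloadable script. Unlike highlightOne, it treats an excluded word that is absent from the text as passing.

```javascript
// Hypothetical extension of the distance test to several excluded words.
// Returns true when none of the excluded words starts within `dist`
// characters after the first occurrence of wordOne.
function passesAllExclusions(text, wordOne, excludedWords, dist) {
  var oneIdx = text.indexOf(wordOne);
  if (oneIdx === -1) { return false; }
  var rest = text.substring(oneIdx + wordOne.length);

  for (var i = 0; i < excludedWords.length; i++) {
    var idx = rest.indexOf(excludedWords[i]);
    if (idx !== -1 && idx <= dist) { return false; }   // too close, reject
  }
  return true;
}

var afternoon = passesAllExclusions("2008 report PM", "2008", ["PM", "draft"], 10);
var morning   = passesAllExclusions("2008 report AM", "2008", ["PM", "draft"], 10);
```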






Download

Description   Name                                           Size   Download method
Sample code   os-customserach-firefox-greppishFind_0.1.zip   3KB    HTTP


Resources

Learn
  • Learn more about Greasemonkey at Greasespot.net.

  • Read about JavaScript from the source at Mozilla.org.

  • To listen to interesting interviews and discussions for software developers, check out developerWorks podcasts.

  • Stay current with developerWorks' Technical events and webcasts.

  • Check out upcoming conferences, trade shows, webcasts, and other Events around the world that are of interest to IBM open source developers.

  • Visit the developerWorks Open source zone for extensive how-to information, tools, and project updates to help you develop with open source technologies and use them with IBM's products.

  • Watch and learn about IBM and open source technologies and product functions with the no-cost developerWorks On demand demos.


Get products and technologies
  • Grab the Greasemonkey Firefox add-on (extension) from Mozilla.org.

  • Innovate your next open source development project with IBM trial software, available for download or on DVD.

  • Download IBM product evaluation versions, and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.



About the author

Nathan Harrington

Nathan Harrington is a programmer at IBM currently working with Linux and resource-locating technologies.



