Create personal Web index

Sixearch automatically creates a Web index based on the user's bookmark file and Web search history during installation. Afterward, Sixearch periodically updates the index with new bookmarks and search history entries.

Users can also manually create or add to the personal index by running a best-N-first topical crawler [1], which crawls the Web in a focused way, guided by a user-provided topic. The crawl results are then indexed for keyword search.
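
To illustrate the best-N-first strategy mentioned above, here is a minimal sketch in Python. It is not Sixearch's crawler or the JavaCrawlers code; it only shows the general idea under some assumptions: the frontier is a priority queue, each link is scored by the cosine similarity between the topic keywords and the page the link was found on, and the N best links are expanded per step. The names `best_n_first_crawl`, `cosine_similarity`, and the `fetch` callable (which would download a page and return its text and outlinks) are hypothetical.

```python
import heapq
import math
import re
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between term-frequency vectors of two strings."""
    a = Counter(re.findall(r"[a-z0-9]+", text_a.lower()))
    b = Counter(re.findall(r"[a-z0-9]+", text_b.lower()))
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_n_first_crawl(seeds, topic, fetch, max_pages, n=2):
    """Crawl up to max_pages pages, expanding the n best frontier links per step.

    fetch(url) is a hypothetical callable returning (page_text, outlinks),
    or None when the page cannot be retrieved.
    """
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    crawled = []

    while frontier and len(crawled) < max_pages:
        # Select a batch of the n highest-scoring links from the frontier.
        batch = [heapq.heappop(frontier) for _ in range(min(n, len(frontier)))]
        for _, url in batch:
            if len(crawled) >= max_pages:
                break
            result = fetch(url)
            if result is None:
                continue
            text, outlinks = result
            crawled.append(url)
            # Each outlink inherits the topical similarity of its parent page.
            score = cosine_similarity(topic, text)
            for link in outlinks:
                if link not in seen:
                    seen.add(link)
                    heapq.heappush(frontier, (-score, link))
    return crawled
```

With an empty topic every link scores zero, so the sketch degrades to an unfocused crawl, which loosely mirrors the general crawl performed when no keywords are given.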
To create a personal Web index for searching and sharing, users need to provide the following mandatory or optional information:
  1. Personal Web document collection (optional).
    A folder on your disk containing documents that you want to share through Sixearch (currently the system supports only text and HTML files).

  2. Personal bookmarks file (optional; currently only Firefox bookmarks are supported).
    • Windows XP users: You can find your Firefox bookmarks file in C:\Documents and Settings\Your log in name\Application Data\Mozilla\Firefox\Profiles\xxxxxxx\bookmarks.html
    • Linux/Unix users: You can find your Firefox bookmarks file in /home/Your log in name/.mozilla/firefox/xxxxxxx/bookmarks.html
    • Mac users: You can find your Firefox bookmarks file in /Users/Your log in name/Library/Application Support/Firefox/Profiles/xxxxxxx/bookmarks.html

    • xxxxxxx is a random directory name that Firefox generated when it was installed. It is different for every installation.

  3. Crawling topic (optional). This is a set of keywords describing your interests. You can enter any keywords you like. If no keywords are provided, Sixearch will do a general crawl.

  4. Number of Web pages to crawl. Go ahead and crawl a thousand or more; it will take only a few minutes. It is OK to close the window once the crawl starts; the crawl will continue in the background.
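
For the bookmarks file in item 2, the following Python sketch shows one way the bookmarked URLs could be pulled out of a bookmarks.html file, which is ordinary HTML with one `<A HREF="...">` element per bookmark. This is an illustration only, not how Sixearch reads the file; `BookmarkLinkParser` and `extract_bookmarks` are hypothetical names, and keeping only http/https links is an assumption made here.

```python
from html.parser import HTMLParser


class BookmarkLinkParser(HTMLParser):
    """Collect (url, title) pairs from <A HREF="..."> entries."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._title = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._title = []

    def handle_data(self, data):
        if self._href is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._title).strip()))
            self._href = None


def extract_bookmarks(html_text):
    """Return (url, title) pairs for bookmarks that point at Web pages."""
    parser = BookmarkLinkParser()
    parser.feed(html_text)
    parser.close()
    # Assumption: only http(s) links are worth indexing or crawling.
    return [(url, title) for url, title in parser.links
            if url.startswith(("http://", "https://"))]
```

The resulting URLs could then serve as documents to index or as seed pages for a crawl.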

The Sixearch index management tool displays all the documents indexed by the local engine. For each indexed document, the user can assign new tags or modify existing ones; tags are searchable by the local engine. Users can also delete or undelete individual document entries, or remove the entire index. In addition, Sixearch integrates Luke (http://www.getopt.org/luke/) to provide more advanced index management options.
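
The management operations described above (tagging, delete/undelete, removing the whole index) can be pictured with a small in-memory model. This is a toy sketch in Python, not Sixearch's index format or API; the class `PersonalIndex` and all of its methods are hypothetical, and treating "delete" as a reversible flag is an assumption based on the existence of an undelete operation.

```python
class PersonalIndex:
    """Toy in-memory index: URL -> tags plus a soft-delete flag."""

    def __init__(self):
        self._docs = {}

    def add(self, url):
        # Adding an existing URL keeps its current tags and state.
        self._docs.setdefault(url, {"tags": set(), "deleted": False})

    def tag(self, url, *tags):
        # Tags are stored lowercased so searches are case-insensitive.
        self._docs[url]["tags"].update(t.lower() for t in tags)

    def untag(self, url, tag):
        self._docs[url]["tags"].discard(tag.lower())

    def delete(self, url):
        # Soft delete: the entry is hidden from search but can be restored.
        self._docs[url]["deleted"] = True

    def undelete(self, url):
        self._docs[url]["deleted"] = False

    def search_by_tag(self, tag):
        tag = tag.lower()
        return sorted(url for url, doc in self._docs.items()
                      if not doc["deleted"] and tag in doc["tags"])

    def clear(self):
        # Remove the entire index.
        self._docs.clear()
```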

[1] http://www.informatics.indiana.edu/fil/IS/JavaCrawlers/