TorBot – Intelligence Tool for Dark Web

TorBot – Open Source Intelligence Tool for the Dark Web


TorBot is an open source intelligence tool developed in Python. The main objective of this project is to collect open data from the deep web (aka dark web) and, with the help of data mining algorithms, extract as much information as possible and present it as an interactive tree graph. The interactive tree graph module will display the relations between the collected intelligence data.

Working Procedure/Basic Plan

The basic procedure executed by the web crawling algorithm takes a list of seed URLs as its input and repeatedly executes the following steps:

URLs = input(url)
while (URLs is not empty) do
    dequeue url
    mark url as visited
    request page
    parse for Links
    for (link in Links) do
        if (link is live && link is not visited) then
            add link to URLs
    store page content
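
The steps above can be sketched in Python as follows. This is a minimal illustration, not TorBot's actual implementation: the names LinkParser, crawl, fetch, is_live and max_pages are hypothetical, and the page fetcher is passed in as a function so the example runs without a Tor connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, is_live=lambda url: True, max_pages=100):
    """Breadth-first crawl starting from seed_urls.

    fetch(url) is an assumed callable returning the page HTML as a string,
    or None on failure. Returns a dict mapping visited URLs to page content.
    """
    queue = deque(seed_urls)
    visited = set(seed_urls)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()              # dequeue url
        html = fetch(url)                  # request page
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)                  # parse for links
        for href in parser.links:
            link = urljoin(url, href)      # resolve relative links
            if link not in visited and is_live(link):
                visited.add(link)          # never queue the same link twice
                queue.append(link)         # add link to URLs
        pages[url] = html                  # store page content
    return pages


# Usage with an in-memory "site" instead of real network requests.
fake_site = {
    "http://a.onion/": '<a href="/b">b</a> <a href="http://c.onion/">c</a>',
    "http://a.onion/b": '<a href="/">home</a>',
    "http://c.onion/": "no links here",
}
pages = crawl(["http://a.onion/"], fake_site.get)
print(sorted(pages))
```

Using a queue gives breadth-first order, and the visited set keeps pages that link to each other from being requested in a loop.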

Features

  1. Onion Crawler (.onion). (Completed)
  2. Returns page title and address with a short description of the site. (Partially completed)
  3. Save links to database. (PR to be reviewed)
  4. Get emails from site. (Completed)
  5. Save crawl info to JSON file. (Completed)
  6. Crawl custom domains. (Completed)
  7. Check if the link is live. (Completed)
  8. Built-in Updater. (Completed)
  9. TorBot GUI. (See branch front_end)
  10. Social Media integration. (Not started) …(will be updated)
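
As a rough illustration of two of the features listed above (getting e-mails from a site and saving crawl info to a JSON file), here is a small sketch using only the standard library. The function names, the regular expression and the JSON layout are assumptions made for this example and may differ from what TorBot actually does.

```python
import json
import re

# A deliberately simple pattern; it will not match every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html):
    """Return the unique e-mail addresses found in html, sorted."""
    return sorted(set(EMAIL_RE.findall(html)))


def save_crawl_info(path, url, links, emails):
    """Write one crawl record to path as JSON and return the record."""
    record = {"url": url, "links": list(links), "emails": list(emails)}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
    return record


page = '<a href="mailto:admin@example.onion">contact</a> or bob@mail.org'
print(extract_emails(page))
```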

Contribute

Contributions to this project are always welcome. To add a new feature, fork the dev branch and open a pull request once your feature is tested and complete. If it's a new module, put it inside the modules directory. The branch name should be your new feature name in the format <Feature_featurename_version(optional)>, for example Feature_FasterCrawl_1.0. Contributor names will be added to the list below.
NOTE: PRs should be made only to the dev branch of TorBot.

OS Dependencies

  • Tor
  • Python 3.x
  • Golang 1.x (Not Currently Used)

Python Dependencies

  • beautifulsoup4
  • pyinstaller
  • PySocks
  • termcolor
  • requests
  • requests_mock
  • yattag

Basic setup

Before you run TorBot, make sure the following things are done properly:

  • Run the tor service: sudo service tor start
  • Make sure that your torrc sets the SOCKS port to localhost:9050 (the SocksPort option)
  • Install the TorBot Python requirements: pip3 install -r requirements.txt
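
To check the setup above before crawling, a short script can confirm that something is listening on the SOCKS port and build the proxy settings that requests (with PySocks installed) understands. This is a sketch under assumptions: the helper names are hypothetical, and TorBot itself may wire up the proxy differently.

```python
import socket

TOR_HOST = "localhost"
TOR_SOCKS_PORT = 9050  # the port named in the setup steps above


def tor_proxies(host=TOR_HOST, port=TOR_SOCKS_PORT):
    """Proxy mapping for requests; socks5h lets the proxy resolve hostnames,
    which is needed for .onion addresses."""
    proxy = f"socks5h://{host}:{port}"
    return {"http": proxy, "https": proxy}


def socks_port_open(host=TOR_HOST, port=TOR_SOCKS_PORT, timeout=3.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if socks_port_open():
    print("SOCKS port reachable, proxies:", tor_proxies())
else:
    print("Nothing is listening on the SOCKS port; is the tor service running?")
```

With requests and PySocks installed, the mapping can be passed as requests.get(url, proxies=tor_proxies()). An open port only shows that some service is listening there, not that it is Tor.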

On Linux platforms, you can build an executable for TorBot with the install.sh script. First give the script execute permission with chmod +x install.sh, then run ./install.sh to create the torBot binary. Run ./torBot to execute the program.

An alternative way of running torBot is shown below, along with help instructions.

python3 torBot.py or use the -h/--help argument

usage: torBot.py [-h] [-v] [--update] [-q] [-u URL] [-s] [-m] [-e EXTENSION]
                 [-l] [-i]

optional arguments:
  -h, --help            Show this help message and exit
  -v, --version         Show current version of TorBot.
  --update              Update TorBot to the latest stable version
  -q, --quiet           Prevent header from displaying
  -u URL, --url URL     Specifiy a website link to crawl, currently returns links on that page
  -s, --save            Save results to a file in json format
  -m, --mail            Get e-mail addresses from the crawled sites
  -e EXTENSION, --extension EXTENSION
                        Specifiy additional website extensions to the
                        list(.com or .org etc)
  -l, --live            Check if websites are live or not (slow)
  -i, --info            Info displays basic info of the scanned site (very

  • NOTE: All flags listed under -u URL, --url URL must also be passed a -u flag.
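
The rule in the note above can be expressed with argparse roughly as follows. This is an illustrative sketch, not TorBot's own argument handling: the functions build_parser and parse_args and the error message are hypothetical, and only the flag names are taken from the usage text.

```python
import argparse


def build_parser():
    """Parser with the flag names from the usage text (help strings omitted)."""
    parser = argparse.ArgumentParser(prog="torBot.py")
    parser.add_argument("-v", "--version", action="store_true")
    parser.add_argument("--update", action="store_true")
    parser.add_argument("-q", "--quiet", action="store_true")
    parser.add_argument("-u", "--url")
    parser.add_argument("-s", "--save", action="store_true")
    parser.add_argument("-m", "--mail", action="store_true")
    parser.add_argument("-e", "--extension")
    parser.add_argument("-l", "--live", action="store_true")
    parser.add_argument("-i", "--info", action="store_true")
    return parser


def parse_args(argv=None):
    """Parse argv and reject URL-dependent flags given without -u/--url."""
    parser = build_parser()
    args = parser.parse_args(argv)
    needs_url = args.save or args.mail or args.extension or args.live or args.info
    if needs_url and not args.url:
        # parser.error prints the usage line and exits with status 2
        parser.error("-s, -m, -e, -l and -i must be combined with -u URL")
    return args


args = parse_args(["-u", "http://example.onion", "-m", "-s"])
print(args.url, args.mail, args.save)
```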

Read more about torrc here: Torrc

Using Docker

  • Ensure that you have a tor container running on port 9050.
  • Build the image using the following command: docker build -t dedsecinside/torbot .
  • Run the container (make sure to link the tor container as tor): docker run --link tor:tor --rm -ti dedsecinside/torbot

TO-DO

  • Visualization Module
  • Implement BFS search for the web crawler
  • Multithreading for Get Links
  • Improve stability (handle errors gracefully, expand test coverage, etc.)
  • Create a user-friendly GUI
  • Randomize Tor Connection (Random Header and Identity)
  • Keyword/Phrase search
  • Social Media Integration
  • Increase anonymity and efficiency

Have ideas?

If you have new ideas that you think are worth implementing, mention them by opening a new issue with the title [FEATURE_REQUEST]. If the idea is worth implementing, congrats, you are now a contributor.