IT@Cornell


About Spider

Spider scans your hard drive, web site, or other collection of files to identify confidential data such as Social Security numbers, credit card numbers, and bank account and routing numbers. When the scan is complete, Spider produces a list of files that may contain confidential data.

You can then use Spider to:

  • Securely move or erase files.
  • Encrypt files.
  • Redact (sometimes called "scrub") confidential information in files.

Choosing the correct protection strategy depends on a combination of business needs, local policy and technology, user education, and other factors. Cornell community members using Spider can contact the Security group with questions.

Note: It is against University policy to store sensitive data on an unsecured machine. See the Security Requirements page for information.

Warning: Spider report files are a roadmap to confidential data and should be destroyed or well secured.

Who Should Use Spider?

If your computer or site may contain sensitive data, you should use Spider. Running a basic scan will almost always provide useful information. More technical users may choose to use the advanced configuration features.

What Types of Data Can Spider Identify?

Spider can identify the following types of data:

  • Social Security numbers (SSN)
  • Canadian Social Insurance numbers (SIN)
  • Credit card numbers
  • UK National Insurance numbers (NINO)

Spider can also scan for any data that you can describe with a regular expression, including keyword searches.
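As an illustration of what a custom regular expression and a keyword search might look like, here is a minimal sketch in Python. The source does not say which regular expression dialect Spider accepts, so Python's `re` module stands in for it. The `EMP-` identifier format, the keyword list, and the `find_matches` helper are all hypothetical.

```python
import re

# Hypothetical custom pattern: an internal ID such as "EMP-123456".
# This format is an assumption for illustration, not something Spider defines.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")

# Hypothetical keyword search, written as a case-insensitive alternation.
KEYWORDS = re.compile(r"\b(?:confidential|salary|date of birth)\b", re.IGNORECASE)


def find_matches(text):
    """Return all ID hits followed by all keyword hits found in text."""
    hits = [m.group(0) for m in EMPLOYEE_ID.finditer(text)]
    hits += [m.group(0) for m in KEYWORDS.finditer(text)]
    return hits
```

The word boundaries (`\b`) keep the patterns from matching inside longer strings, which is one common way to cut down on false positives.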

How Does Spider Work?

Spider scans the space you designate (hard drive, web page, unallocated space) for patterns of numbers or letters that resemble specific types of confidential data such as Social Security numbers or bank routing numbers.
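The general idea of pattern-based detection can be sketched as follows. This is not Spider's implementation: the two patterns are simplified assumptions, and the Luhn checksum used to filter card-like matches is a widely used validation step that the source does not confirm Spider applies. The names `SSN_LIKE`, `CARD_LIKE`, `luhn_ok`, and `scan_text` are hypothetical.

```python
import re

# Simplified, assumed patterns: digits grouped 3-2-4 with hyphens, and
# runs of 13 to 19 digits optionally separated by spaces or hyphens.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_LIKE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text):
    """Return (label, matched_text) pairs for data that resembles SSNs or card numbers."""
    findings = [("ssn-like", m.group(0)) for m in SSN_LIKE.finditer(text)]
    for m in CARD_LIKE.finditer(text):
        digits = re.sub(r"\D", "", m.group(0))
        if luhn_ok(digits):  # drop digit runs that cannot be valid card numbers
            findings.append(("card-like", m.group(0)))
    return findings
```

Because the match is based on resemblance only, a hit means "this may be confidential data", which is why the scan results are a list of files to review rather than a definitive finding.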

By default, Spider scans a limited list of file types that are the most likely to contain sensitive data. These include:

  • Mailboxes
  • Office documents, including OpenOffice and MS Office (up to MS Office 2007)
  • PDFs
  • Some database files, including FoxPro, Access, FileMaker, and most dBase III/IV derivatives
  • Compressed archives, including ZIP, Gzip, and BZip
  • HTML
  • Legacy formats such as Quattro and Lotus 1-2-3 files

Spider Can Be Configured for Different Scans

You specify the details of the scan, including:

  • Which files to scan
  • Which types of confidential data to identify
  • What priority your system gives to the scan process
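To make the three settings above concrete, here is a minimal sketch of how such scan options could be modeled. It does not reflect Spider's actual configuration format, which the source does not describe. The `ScanOptions` class, its field names and default values, and the `select_files` helper are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePath


@dataclass
class ScanOptions:
    """Hypothetical container for the three settings listed above."""
    # Which files to scan, expressed here as lowercase file extensions.
    extensions: frozenset = frozenset({".html", ".pdf", ".txt"})
    # Which types of confidential data to identify.
    data_types: frozenset = frozenset({"ssn", "credit_card"})
    # What priority the system gives to the scan process (stored, not applied).
    priority: str = "low"


def select_files(paths, options):
    """Keep only the paths whose extension is in options.extensions, preserving order."""
    return [p for p in paths if PurePath(p).suffix.lower() in options.extensions]
```

For example, `select_files(["a/report.PDF", "a/photo.jpg"], ScanOptions())` keeps only the PDF, because the extension comparison ignores case.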