Stucco project page

Overview

Stucco addresses a fundamental problem in cyber security: quickly putting security events in context.

Security event data, such as intrusion detection system alerts, provide a starting point for analysis, but are information impoverished. To provide context, analysts must manually gather and synthesize relevant data from myriad sources within their enterprise and external to it. Analysts search system logs, network flows, and firewall data; they search IP blacklists and reputation lists, software vulnerability information, malware and threat data, OS and application vendor blogs, and news sites. All of these sources are manually searched for data relevant to the event being investigated. Relevant results must then be brought together and synthesized to put the event in context and make decisions about its importance and impact.

This process is manual and tedious, but its results are required to decide how to react to events. Stucco is a cyber intelligence platform that helps automate this process and provides relevant information to analysts quickly and easily. Stucco collects data not typically integrated into security systems, extracts domain concepts and relationships, and integrates that information into a cyber security knowledge graph to accelerate decision making.

Organizing data into a knowledge graph lets security analysts rapidly search for domain concepts, speeding up access to the information needed for decision making. Only information pertinent to the search is returned. Our approach enables analysts to more quickly identify events that can be discarded as false positives and to perform more thorough analysis with the relevant context to make decisions.
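To make the idea of searching a knowledge graph for context concrete, here is a minimal in-memory sketch. It is not Stucco's implementation or ontology: the `KnowledgeGraph` class, the node types, the relation names, and the sample facts are all invented for illustration. It shows how a search that starts from one concept (an IP address from an alert) can return only the nodes connected to it within a few hops.

```python
from collections import defaultdict, deque


class KnowledgeGraph:
    """Tiny in-memory graph: nodes are (type, name) pairs, edges carry a relation label."""

    def __init__(self):
        self.edges = defaultdict(list)  # node -> [(relation, neighbor), ...]

    def add_edge(self, source, relation, target):
        # Store both directions so a search can start from either end.
        self.edges[source].append((relation, target))
        self.edges[target].append((relation, source))

    def context(self, start, max_hops=2):
        """Return every node within `max_hops` of `start`, mapped to its hop distance."""
        seen = {start: 0}
        pending = deque([start])
        while pending:
            node = pending.popleft()
            if seen[node] == max_hops:
                continue  # Do not expand beyond the hop limit.
            for _relation, neighbor in self.edges[node]:
                if neighbor not in seen:
                    seen[neighbor] = seen[node] + 1
                    pending.append(neighbor)
        del seen[start]
        return seen


graph = KnowledgeGraph()
# Hypothetical facts, invented for illustration only.
graph.add_edge(("ip", "203.0.113.7"), "listed_on", ("blacklist", "example-blocklist"))
graph.add_edge(("ip", "203.0.113.7"), "connected_to", ("host", "web01"))
graph.add_edge(("host", "web01"), "runs", ("software", "examplehttpd 2.4"))
graph.add_edge(("software", "examplehttpd 2.4"), "affected_by", ("vulnerability", "CVE-0000-0000"))

# Print the context around the alerting IP, nearest nodes first.
for node, hops in sorted(graph.context(("ip", "203.0.113.7")).items(), key=lambda item: item[1]):
    print(hops, node)
```

Raising `max_hops` widens the context (here, a third hop would reach the placeholder vulnerability), which is the trade-off between a focused answer and a more complete one.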

Getting Started

Build your own VM

To get started with a demonstration environment, you will need a Linux or Mac OS X host with at least 16GB of memory. We have tested on Mac OS X 10.9, Ubuntu 12.04, and CentOS 6. Install VirtualBox, Vagrant, and Ansible (Ansible requires Python 2.6). Then, open up a terminal and enter the following:

git clone https://github.com/stucco/dev-setup.git stucco
cd stucco
vagrant up

Then you can open a web browser to http://10.10.10.100:8000/help and start looking at the data being collected (it takes a while to collect all of the data). For more detailed instructions, see the dev-setup Readme.
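Because the VM can take a while to provision, it may be convenient to poll the help page until it responds rather than reloading the browser. The following is a small sketch using only the Python standard library; the helper names `http_status` and `wait_until_up`, and the default retry count and delay, are assumptions made for this example and are not part of Stucco.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5):
    """Return the HTTP status code for `url`, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # The server answered, but with an error status.
    except OSError:
        return None  # Connection refused, timed out, or no route to the VM.


def wait_until_up(url, attempts=30, delay=10, fetch=http_status, sleep=time.sleep):
    """Poll `url` until it answers with 200; return True on success, False otherwise."""
    for attempt in range(1, attempts + 1):
        if fetch(url) == 200:
            return True
        if attempt < attempts:
            sleep(delay)  # Wait before the next try, but not after the last one.
    return False


# Example call (not run here because it needs the VM to be up):
# wait_until_up("http://10.10.10.100:8000/help")
```

If the call returns False, `vagrant status` shows whether the VM itself is running.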

If you run into any issues getting the demonstration VM working, please submit an issue.

Run a pre-configured demo VM

To run a virtual machine locally, you just need to install VirtualBox and Vagrant. These instructions will pull the VM image from atlas.hashicorp.com/stucco. Create a `Vagrantfile` with the following contents:

Vagrant.configure(2) do |config|
  config.vm.box = "stucco/demo"
  config.vm.network "forwarded_port", guest: 8000, host: 8000
  config.vm.synced_folder ".", "/vagrant", disabled: true
end

Then, open up a terminal and enter the following:

vagrant up

Then you can open a web browser to http://localhost:8000/help and start looking at the data that has been pre-loaded into the VM.

Demonstration Site

We have a demonstration site now available: http://stucco.stanford.edu/

Source Code

This project is open-source, under an MIT license. All source code can be found at https://github.com/stucco/.

The main components of the system are:

Collectors: Suite of data collectors for scheduling and gathering structured and unstructured documents.
RT: Real-time processing engine that extracts domain concepts from documents and inserts those concepts into a knowledge graph.
Document Service: Storage service for text documents and metadata over an HTTP API. This is the repository for unprocessed documents.
UI: Web application and API for interacting with the Stucco knowledge graph.

There are Ansible roles to facilitate installing each of these components. Stucco relies on RabbitMQ to pass data into the system, and Titan - a distributed graph database - to store the knowledge graph.
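The flow between these components can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Stucco's code: a standard-library `queue.Queue` stands in for RabbitMQ, a Python `set` of edges stands in for the graph database, and the regular expressions and the function names `collect`, `extract`, and `run_pipeline` are hypothetical stand-ins for the real collectors and extraction logic.

```python
import queue
import re

# Simplified patterns for two kinds of domain concepts (assumptions for this sketch).
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b")
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def collect(documents, channel):
    """Collector stand-in: publish each raw document to the message channel."""
    for doc_id, text in documents:
        channel.put({"id": doc_id, "text": text})


def extract(message):
    """Processing stand-in: pull concepts out of one document and relate them to it."""
    source = ("document", message["id"])
    edges = []
    for cve in CVE_PATTERN.findall(message["text"]):
        edges.append((source, "mentions", ("vulnerability", cve)))
    for ip in IPV4_PATTERN.findall(message["text"]):
        edges.append((source, "mentions", ("ip", ip)))
    return edges


def run_pipeline(documents):
    """Move documents from collection, through the channel, into a set of graph edges."""
    channel = queue.Queue()  # Stands in for a RabbitMQ queue.
    graph = set()            # Stands in for the graph database.
    collect(documents, channel)
    while not channel.empty():
        graph.update(extract(channel.get()))
    return graph


docs = [("advisory-1", "Hosts at 198.51.100.4 are being probed for CVE-2014-0160.")]
for edge in sorted(run_pipeline(docs)):
    print(edge)
```

Decoupling collection from extraction through a queue means collectors can be added or rescheduled without changing the processing step, which only sees messages.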

Stucco is still a work in progress and not yet fully documented. We do have some Documentation, although parts of it may be out of date; we are in the process of cleaning things up.

If you run into any issues, please open an issue in the docs repository.

Selected Papers

Harshaw, Christopher R., Robert A. Bridges, Michael D. Iannacone, Joel W. Reed, and John R. Goodall. "GraphPrints: Towards a Graph Analytic Method for Network Anomaly Detection." Proceedings of the 11th Annual Cyber and Information Security Research Conference (CISR). ACM International Conference Proceedings Series, 2016.

Jones, Corinne L., Robert A. Bridges, Kelly M. T. Huffer, and John R. Goodall. "Towards a Relation Extraction Framework for Cyber-Security Concepts." Proceedings of the 10th Annual Cyber and Information Security Research Conference (CISR). ACM International Conference Proceedings Series, 2015.

Iannacone, Michael, Shawn Bohn, Grant Nakamura, John Gerth, Kelly Huffer, Robert Bridges, Erik Ferragut, and John R. Goodall. "Developing an Ontology for Cyber Security Knowledge Graphs." Proceedings of the 10th Annual Cyber and Information Security Research Conference (CISR). ACM International Conference Proceedings Series, 2015.

Bridges, Robert A., Corinne L. Jones, Michael D. Iannacone, Kelly M. Testa, and John R. Goodall. "Automatic Labeling for Entity Extraction in Cyber Security." ASE Third International Conference on Cyber Security, Academy of Science and Engineering (ASE), 2014.

McNeil, Nikki, Robert A. Bridges, Michael D. Iannacone, Bogdan Czejdo, Nicolas E. Perez, and John R. Goodall. "PACE: Pattern Accurate Computationally Efficient Bootstrapping for Timely Discovery of Cyber-Security Concepts." International Conference on Machine Learning and Applications (ICMLA). IEEE Press, 2013, 60-65.