Behavior Profiling of Email

Salvatore J. Stolfo, Shlomo Hershkop, Ke Wang, Olivier Nimeskern, and Chia-Wei Hu

Columbia University, New York, NY 10027, USA


Abstract. This paper describes the forensic and intelligence analysis capabilities of the Email Mining Toolkit (EMT) under development at the Columbia Intrusion Detection (IDS) Lab. EMT provides the means of loading, parsing and analyzing email logs, including content, in a wide range of formats. Many tools and techniques have been available from the fields of Information Retrieval (IR) and Natural Language Processing (NLP) for analyzing documents of various sorts, including emails. EMT, however, extends these kinds of analyses with an entirely new set of analyses that model “user behavior”. EMT thus models the behavior of individual user email accounts, or groups of accounts, including the “social cliques” revealed by a user’s email behavior.

1 Introduction

This paper describes the forensic and intelligence analysis capabilities of the Email Mining Toolkit (EMT) under development at the Columbia IDS Lab.

EMT provides the means of loading, parsing and analyzing email logs, including content, in a wide range of formats. Many tools and techniques have been available from the fields of IR and NLP for analyzing documents of various sorts, including emails. EMT, however, extends these kinds of analyses with an entirely new set of analyses that model “user behavior”. EMT thus models the behavior of individual user email accounts, or groups of accounts, including the “social cliques” revealed by a user’s email behavior. EMT’s design has been driven by the core security application to detect virus propagations, spambot activity and security policy violations. However, the technology also provides critical intelligence gathering and forensic analysis capabilities for agencies to analyze disparate Internet data sources for the detection of malicious users, attackers, and other targets of interest. This dual use is graphically displayed in Figure ??. For example, one target application for intelligence gathering supported by EMT is the identification of likely “proxy email accounts”, email accounts that exhibit similar behavior and thus may be used by a single person. Although EMT has been designed specifically for email analysis, the principles of its operation are equally relevant to other Internet audit sources.

This data mining technology previously reported [?,?,?], and graphically displayed in Figure ??, has been proven to automatically compute or create both signature-based misuse detection and anomaly detection-based misuse discovery.

The application of this technology to diverse Internet objects and events (e.g., email and web transactions) allows for a broad range of behavior-based analyses, including the detection of proxy email accounts and of groups of user accounts that communicate with one another, including covert group activities.

Data mining applies machine learning and statistical techniques to automatically discover and detect misuse patterns, as well as anomalous activities in general. When applied to network-based activities and user account observations for the detection of errant or misuse behavior, these methods are referred to as behavior-based misuse detection.

Behavior-based misuse detection can provide important new assistance for counter-terrorism intelligence. In addition to standard Internet misuse detection, these techniques will automatically detect certain patterns across user accounts that are indicative of covert, malicious or counter-intelligence activities.

Moreover, behavior-based detection provides workbench functionalities to interactively assist an intelligence agent with targeted investigations and off-line forensics analyses.

Intelligence officers have a myriad of tasks and problems confronting them each day. The sheer volume of source materials requires a means of homing in on those sources of maximal value to their mission. A variety of techniques can be applied drawing upon the research and technology developed in the field of Information Retrieval. There is, however, an additional source of information available that can be used to aid even the simplest task of rank ordering and sorting documents for inspection: behavior models associated with the documents can be used to identify and group sources in interesting new ways. This is demonstrated by the Email Mining Toolkit, which applies a variety of data mining techniques for profiling and behavior modeling of email sources.

The deployment of behavior-based techniques for intelligence investigation and tracking tasks represents a significant qualitative step in the counter-intelligence “arms race”. Because there is no way to predict what data mining will discover over any given data set, “counter-escalation” is particularly difficult.

Behavior-based misuse detection is more robust than standard knowledge-based techniques. Behavior-based detection has the capabilities to detect new patterns (i.e., patterns that have not been previously observed), provide early warning alerts to users and analysts, and automatically adapt to both normal and misuse behavior. By applying statistical techniques over actual system and user account behavior measurements, automatically generated models and rules are tuned to the particular source material. This process, in turn, avoids the human bias that is intrinsic when misuse signatures, patterns and other knowledge-based models are designed by hand, as is the norm.

Despite this, no general infrastructure has been developed for the systematic application of behavior-based (misuse) detection across a broad set of detection and intelligence analysis tasks such as fraudulent Internet activities, virus detection, intrusion detection and user account profiling. Today’s Internet security systems are specialized to apply a small range of techniques, usually knowledge-based, to an individual misuse detection problem, such as intrusion, virus or spam detection. Moreover, these systems are designed for one particular network environment, such as medium-sized network enclaves, and only tap into an individual cross-section of network activity such as email activity or TCP/IP activity. Behavior-based detection technology as proposed herein will likely provide a quantum leap in security and in intelligence analysis in both offline and online task environments.

EMT has been described in another publication, focusing on its use for security applications, including virus and spam detection, as well as security policy violations. In this paper, we focus on several of its features specific to intelligence applications, namely the means of clustering email by content-based analyses, the identification of “similar email accounts” based upon measuring similarity between account profiles represented by histograms, and the clique analyses that are supported by EMT.

1.1 Applying Behavior-Based Detection to Email Sources

Table ?? enumerates a range of behavior-based Internet applications. These applications cover a set of detection, security and marketing applications that exist within the government, commercial and private sectors. Each of these applications is within the capabilities of behavior-based techniques by applying data mining algorithms over appropriate audit data sources.

Our current research has applied behavior-based methods directly to the first six applications listed in Table ??: fraud detection, malicious email detection, intrusion detection, user community discovery, behavior pattern discovery, and analyst workbench. Each of these is an Internet security application, applying to both outbound and inbound network- and email-based traffic.

Solving Internet security problems greatly assists surveillance intelligence activities. For example, the discovery of user account communities and the discovery and detection of certain community behavior patterns can be directed to uncover certain classes of covert, clandestine or espionage behavior performed with Internet resources. Furthermore, fraud detection in particular has direct benefit for an intelligence agency by profiling and identifying users and clusters of users that participate in malicious Internet activities such as fraud.
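The discovery of user account communities can be pictured with a small sketch. It assumes email traffic is available as (sender, recipient) pairs, links two accounts once enough messages have passed between them, and enumerates maximal cliques with the basic Bron-Kerbosch recursion. This is a stand-in technique chosen for brevity, not EMT's own clique analysis; the function names, the input format, and the `min_msgs` threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def build_graph(messages, min_msgs=1):
    """Link two accounts when at least min_msgs emails passed between them.

    messages: iterable of (sender, recipient) pairs (hypothetical input format).
    """
    counts = Counter()
    for sender, recipient in messages:
        if sender != recipient:
            counts[frozenset((sender, recipient))] += 1
    graph = defaultdict(set)
    for pair, n in counts.items():
        if n >= min_msgs:
            a, b = tuple(pair)
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already explored
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return found


# Hypothetical traffic: a, b and c all exchange mail; d only talks to c.
messages = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
cliques = maximal_cliques(build_graph(messages))
```

Raising `min_msgs` keeps only pairs with sustained exchanges, which is one simple way to filter out incidental contacts before looking for groups.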

Behavior-based detection has been proven against analogous security applications. The finance, telecom and energy industries have protected their customers from fraudulent misuse of their services (e.g., fraudulent misuse of credit card accounts and telephone calling cards, theft of utility service, etc.) by modeling their individual customer accounts and detecting deviations from this model for each of their customers. The behavior-based protection paradigm applied to the Internet thus has an historical precedent that is now ubiquitous and transparent, as exemplified by the credit card in the reader’s wallet or purse.
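The idea of modeling an individual account and detecting deviations from that model can be sketched in a few lines. The example assumes the only measurement is a daily message count and that the model is just a mean and a standard deviation; both the feature and the three-sigma threshold are illustrative assumptions, not the models used by the systems described here.

```python
from statistics import mean, pstdev


def build_profile(daily_counts):
    """Summarize an account's history as (mean, standard deviation)."""
    return mean(daily_counts), pstdev(daily_counts)


def is_deviation(profile, observed, threshold=3.0):
    """Flag an observation more than `threshold` standard deviations from the mean."""
    mu, sigma = profile
    if sigma == 0:
        # A perfectly constant history: any change at all is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical history of messages sent per day by one account.
profile = build_profile([10, 12, 11, 9, 13])
normal_day = is_deviation(profile, 12)
burst_day = is_deviation(profile, 40)
```

Because each account gets its own profile, a volume that is routine for one account can still be flagged as unusual for another.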

1.2 EMT as an Analyst Workbench for Interactive Intelligence Investigations

The “Malicious Email Tracking” (MET) system [?] is an online system that uses email flow statistics to capture new viruses, which are largely undetectable by the “signature” detection methods of today’s state-of-the-art commercial virus detection systems. Specifically, all email attachments are tracked by tracing a private hash value; temporal statistics such as replication rate are recorded to trace the attachments’ trajectory, e.g., across LANs; and these statistics directly inform the detection of self-replicating, malicious software attachments. MET has been developed and deployed as an extension to mail servers and is fully described elsewhere. MET is an example of an online “behavior-based” security system that defends and protects a system not solely by attempting to identify known attacks against a system, but rather by detecting deviations from a system’s normal behavior. Many approaches to “anomaly detection” have been proposed, including research systems that aim to detect masqueraders by modeling user behaviors in command line sequences, or even keystrokes. However, in this case, MET is architected to protect user accounts by modeling user email flows to detect malicious email attachments, especially polymorphic viruses that are not detectable or traceable via signature-based detection methods.
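A minimal sketch of tracking attachments by hash and deriving a temporal statistic from the sightings follows. The `AttachmentTracker` class, its method names, and in particular the definition of replication rate used here (distinct recipients per second within a recent window) are assumptions made for illustration; the text above does not define the statistic, and MET's actual computation is described elsewhere.

```python
from collections import defaultdict


class AttachmentTracker:
    """Records sightings of attachments keyed by a hash string."""

    def __init__(self):
        self._sightings = defaultdict(list)  # hash -> [(timestamp, recipient)]

    def observe(self, attachment_hash, timestamp, recipient):
        """Record one delivery; timestamp is in seconds (hypothetical unit)."""
        self._sightings[attachment_hash].append((timestamp, recipient))

    def replication_rate(self, attachment_hash, window):
        """Distinct recipients per second over the most recent `window` seconds.

        This definition is an assumption made for illustration only.
        """
        sightings = self._sightings.get(attachment_hash, [])
        if not sightings:
            return 0.0
        latest = max(t for t, _ in sightings)
        recent = {r for t, r in sightings if latest - t <= window}
        return len(recent) / window


tracker = AttachmentTracker()
tracker.observe("h1", 0, "alice")
tracker.observe("h1", 10, "bob")
tracker.observe("h1", 20, "carol")
rate = tracker.replication_rate("h1", window=10)
```

An attachment whose rate climbs sharply across many recipients behaves like self-replicating software, which is the kind of signal a flow-based detector can act on without a content signature.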

The “Email Mining Toolkit” (EMT), on the other hand, is an offline system applied to email files gathered from server logs or client email programs. EMT computes information about email flows from and to email accounts, aggregates statistical information from groups of accounts, and analyzes content fields of emails. The EMT system provides temporal statistical feature computations and behavior-based modeling techniques through an interactive user interface to enable targeted intelligence investigations and semi-manual forensic analysis of email files. Figure ?? illustrates the general architecture of a behavior-based system deploying dual functionality:

1. An online security detection application (in this case, MET for malicious email detection)

2. A general analyst workbench for intelligence investigations (EMT, for email source analysis)

As this figure illustrates, these functionalities share a great deal of overhead.

With regard to the implementation, by deploying these dual functionalities, the audit module, computation of temporal statistics, user modeler and database of user models each serve for both functionalities. Moreover, with regard to the conceptual design, the particular set of temporal statistics and user model processes designed for one can improve the performance of the other. In particular, temporal features, as well as user account models and clusters, are representatively general “fundamental building blocks.”

EMT provides the following functionalities, interactively:

– Querying a database (warehouse) of email data and computed feature values, including:

  • Ordering and sorting emails on the basis of content analysis (n-gram analysis, keyword spotting, and classification of email supported by an integrated supervised learning feature using a Naïve Bayes classifier trained on user-selected features).

  • Historical features that profile user groups by statistically measuring behavior characteristics.

  • User models that group users according to features such as typical emailing patterns (as represented by histograms over different selectable statistics), and email communities (including the “social cliques” revealed in email exchanges between email accounts).

– Applying statistical models to email data to alert on abnormal or unusual email events.
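The user models above represent emailing patterns as histograms, and similar accounts are found by comparing those histograms. The sketch below assumes the selected statistic is the hour of day at which mail is sent and compares normalized histograms with an L1 distance. The choice of statistic, the distance measure, and all function names are assumptions for illustration; this section does not say which measures EMT uses.

```python
def hourly_histogram(send_hours):
    """Normalized 24-bin histogram of the hours (0-23) at which an account sent mail."""
    bins = [0] * 24
    for hour in send_hours:
        bins[hour % 24] += 1
    total = sum(bins)
    return [b / total for b in bins] if total else bins


def l1_distance(h1, h2):
    """Sum of absolute bin differences: 0 for identical profiles, 2 for disjoint ones."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def most_similar(target, profiles):
    """Return the account name whose histogram is closest to `target`."""
    return min(profiles, key=lambda name: l1_distance(target, profiles[name]))


# Hypothetical accounts: two daytime senders and one night-time sender.
suspect = hourly_histogram([9, 9, 10, 17])
profiles = {
    "day_account": hourly_histogram([9, 10, 10, 17]),
    "night_account": hourly_histogram([1, 2, 3, 2]),
}
closest = most_similar(suspect, profiles)
```

Normalizing each histogram before comparison means a low-volume account and a high-volume account can still match if they follow the same daily rhythm.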

Table 1. Behavior-Based Internet Applications for Security and Beyond


EMT is also designed as a plug-in to a data mining platform, originally designed and implemented at Columbia, called the DW/AMG architecture (Data Warehouse/Adaptive Model Generation system). That work has been transferred to System Detection Inc (SysD http://www.sysd.com), a DARPA spinout from Columbia that has commercialized the system as the Hawkeye Security Platform.

2 EMT Features

The full range of EMT features has been described elsewhere. For the present paper, we provide a brief overview of several of its key features of direct relevance to security analysis and intelligence applications, along with descriptive screenshots of EMT in operation.

Fig. 1. User account profiling, dual use: online detection and offline analysis.

2.1 Attachment Models

MET was initially conceived to statistically model the behavior of email attachments in real time flowing through an enclave’s email server, and support the coordinated sharing of information among a wide area of email servers to identify malicious attachments and halt their propagation before saturation. In order to properly share such information, each attachment must be uniquely identified, which is accomplished through the computation of an MD5 hash of the entire attachment.
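Computing an identifier from the MD5 hash of the entire attachment can be shown with Python's standard `hashlib` module. The function name `attachment_id` and the choice of a hex string as the identifier format are assumptions for illustration.

```python
import hashlib


def attachment_id(payload: bytes) -> str:
    """Hex MD5 digest of the complete attachment bytes."""
    return hashlib.md5(payload).hexdigest()


# Hypothetical attachment contents; the filename plays no part in the identifier.
first = attachment_id(b"example attachment body")
second = attachment_id(b"example attachment body")
same_file = first == second
```

Because the digest depends only on the bytes, the same attachment is recognized on every server that sees it, even if it has been renamed in transit.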

EMT runs an analysis on each attachment in the database to calculate a number of metrics. These include birth rate, lifespan, incident rate, prevalence, threat, spread, and death rate. They are explained fully in 1, and are displayed graphically in Figure 3.
