A Behavior-based Approach To Securing Email Systems


Salvatore J. Stolfo, Shlomo Hershkop, Ke Wang, Olivier Nimeskern, and Chia-Wei Hu

450 Computer Science Building

Fu Foundation School of Engineering & Applied Science

Computer Science Dept., Columbia University, USA

{sal, shlomo, kewang, on2005, charlie}@cs.columbia.edu

Abstract. The Malicious Email Tracking (MET) system, reported in a prior publication, is a behavior-based security system for email services. The Email Mining Toolkit (EMT) presented in this paper is an offline data mining and analysis system for email archives, designed to assist in computing models of malicious email behavior for deployment in an online MET system. EMT includes a variety of behavior models for email attachments, user accounts, and groups of accounts. Each computed model is used to detect anomalous and errant email behaviors. We report on the set of features implemented in the current version of EMT, and describe tests of the system and our plans for extensions to the set of models.

1. Introduction

The Email Mining Toolkit (EMT) is an offline data analysis system designed to assist a security analyst in computing, visualizing, and testing models of email behavior for use in a MET system [0]. In this paper, we present the features and architecture of the implemented and operational MET and EMT systems, and illustrate the types of discoveries possible over a set of email data gathered for study.

EMT computes information about email flows from and to email accounts, aggregates statistical information from groups of accounts, and analyzes content fields of emails without revealing those contents. Many previous approaches to "anomaly detection" have been proposed, including research systems that aim to detect masqueraders by modeling command line sequences and keystrokes [0,0].

MET is designed to protect user email accounts by modeling user email flows and behaviors to detect misuses that manifest as abnormal email behavior. These misuses can include malicious email attachments, viral propagations, SPAM email, and email security policy violations. Of special interest is the detection of polymorphic viruses, which are designed to avoid detection by signature-based methods but may well be detected via their behavior.

The finance and telecommunications industries have protected their customers from fraudulent misuse of their services (fraud detection for credit card accounts and telephone calls) by profiling the behavior of individual and aggregate groups of customer accounts and detecting deviations from these models. MET provides behavior-based protection to Internet user email accounts, detecting fraudulent misuse and policy violations of email accounts by, for example, malicious viruses.

A behavior-based security system such as MET can be architected to protect a client computer (by auditing email at the client), an enclave of hosts (such as a LAN with a few mail servers) and an enterprise system (such as a corporate network with many mail servers possibly of different types).

The principle behind MET's operation is to model baseline email flows to and from particular individual email accounts and sub-populations of email accounts (e.g., departments within an enclave or corporate division) and to continuously monitor ongoing email behavior to determine whether that behavior conforms to the baseline.

The statistics MET gathers to compute its baseline models of behavior include groups of accounts that typically exchange emails (e.g., “social cliques” within an organization), the frequency of messages, and the typical times and days those messages are exchanged. Statistical distributions are computed over periods of time, which serve as a training period for a behavior profile. These models are used to determine typical behaviors that may be used to detect abnormal deviations of interest, such as an unusual burst of email activity indicative of the propagation of an email virus within a population, or violations of email security policies, such as the outbound transmission of Word document attachments at unusual hours of the day.

EMT provides a set of models an analyst may use to understand and glean important information about individual emails, user account behaviors, and abnormal attachment behaviors for a wide range of analysis and detection tasks. The classifier and the various profile models are trained by an analyst using EMT’s easy-to-use GUI, which manages the training and learning processes. An “alert” function in EMT provides the means of specifying general conditions indicative of abnormal behavior, in order to detect events that may require further inspection and analysis, including potential account misuses, self-propagating viral worms delivered in email attachments, likely inbound SPAM email, bulk outbound SPAM, and email accounts that are being used to launch SPAM.

EMT is also capable of identifying similar user accounts to detect groups of SPAM accounts that may be used by a “SPAMbot”, or to identify the group of initial victims of a virus in a large enclave of many hundreds or thousands of users. For example, if a virus victim is discovered, the short term profile behavior of that victim can be used to cluster a set of email accounts that are most similar in their short term behavior, so that a security analyst can more effectively detect whether other victims exist, and to propagate this information via MET to limit the spread and damage caused by a new viral incident.

2. EMT Toolkit

MET and its associated subsystem MEF (the Malicious Email Filter) were initially conceived and started as a project in the Columbia IDS Lab in 1999. The initial research focused on the means to statistically model the behavior of email attachments and to support the coordinated sharing of information among a wide area of email servers to identify malicious attachments. In order to properly share such information, each attachment must be uniquely identified, which is accomplished through the computation of an MD5 hash of the entire attachment. A new generation of polymorphic viruses can easily thwart this strategy by morphing each instance of the attachment that is being propagated. Hence, no unique hash would exist to identify the originating virus and each of its variant progeny. (It is possible to identify the progenitor by analysis of entry points and attachment contents as described in the Malicious Email Filter paper [0].) Furthermore, by analyzing only attachment flows, it is possible that benign attachments that share characteristics of self-propagating attachments will be incorrectly identified as malicious (e.g., a really good joke forwarded among many friends).
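The hash-based identification described above, and the way a polymorphic variant defeats it, can be illustrated with a minimal sketch. The function name `attachment_id` and the sample byte strings are hypothetical and chosen only for illustration; the only library call used is `hashlib.md5` from the Python standard library.

```python
import hashlib


def attachment_id(data: bytes) -> str:
    """Return the MD5 hex digest of the entire attachment (hypothetical helper)."""
    return hashlib.md5(data).hexdigest()


# Made-up attachment contents, for illustration only.
original = b"example attachment payload"
# A one-byte change stands in for a polymorphic mutation of the same attachment.
variant = original + b"\x00"

# Identical bytes always map to the same identifier.
print(attachment_id(original) == attachment_id(original))
# Any change to the bytes yields an unrelated identifier.
print(attachment_id(original) == attachment_id(variant))
```

Because the digest depends on every byte, sharing digests among servers tracks exact copies only, which is why the text turns to behavior models as an additional layer.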

Although the core ideas of MET are valid, another layer of protection against malicious misuse of email is warranted. This strategy involves the computation of behavior models of email accounts and groups of accounts, which then serve as a baseline to detect errant email uses, including virus propagations, SPAM mailings, and email security policy violations. EMT is an offline system intended for use by security personnel to analyze email archives and generate a set of models: (1) attachment models to detect self-propagating viruses; (2) user account models, including frequency distributions over a variety of email recipients and the typical times at which emails are sent and received; and (3) aggregate models of typical email groups and their communication behavior, intended to detect violations of group behavior indicative of viral propagations, SPAM, and security policy violations.

Models that are computed by EMT offline serve as the means to identify anomalous, atypical email behavior at run time by way of the MET system. MET thus is extended to test not only attachment models, but user account models as well. It is interesting to note that the account profiles EMT computes are used in two ways. A long term profile serves as a baseline distribution that is compared to recent email behavior of a user account to determine likely abnormality. Furthermore, the account profiles may themselves be compared to determine subpopulations of accounts that behave similarly. Thus, once an account is determined to behave maliciously, similar behaving accounts may be inspected more carefully to determine whether they too are behaving maliciously.

The basic architecture of the EMT system is a graphical user interface (GUI) sitting as a front-end to an underlying database (e.g., MySQL [0]) and a set of applications operating on that database. Each application either displays information to an EMT analyst, or computes a model specified for a particular set of emails or accounts using selectable parameter settings. Each is described below. By way of an example, Fig. 3 displays a collection of email records loaded into the database. This section allows an analyst to inspect each email message and mark or label individual messages with a class label for use in the supervised machine learning applications described in a later section. The results of any analyses update the database of email messages (and may generate alerts) that can be inspected in the messages tab.


2.1 Attachment Statistics and Alerts

EMT runs an analysis on each attachment in the database to calculate a number of metrics. These include birth rate, lifespan, incident rate, prevalence, threat, spread, and death rate. They are explained fully in [0].

Rules specified by a security analyst using the alert logic section of MET are evaluated over the attachment metrics to issue alerts to the analyst. This analysis may be executed against archived email logs using EMT, or at runtime using MET. The initial version of MET provides the means of specifying thresholds in rule form as a collection of Boolean expressions applied to each of the calculated statistics. As an example, a basic rule might check, for each attachment seen:

If its birth rate is greater than a specified threshold T AND it was sent from at least X users.
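The example rule above can be written as a Boolean expression over per-attachment statistics. This is a minimal sketch, not MET's actual rule syntax: the `AttachmentStats` record, its field names, and the function `birth_rate_alert` are hypothetical, and the unit of the birth rate is left to the analyst.

```python
from dataclasses import dataclass


@dataclass
class AttachmentStats:
    """Hypothetical record of metrics computed for one attachment."""
    md5: str               # identifier of the attachment
    birth_rate: float      # how quickly new copies appear (unit is an assumption)
    distinct_senders: int  # number of distinct accounts that sent it


def birth_rate_alert(stats: AttachmentStats, threshold_t: float, min_senders_x: int) -> bool:
    """True when birth rate exceeds T AND at least X users sent the attachment."""
    return stats.birth_rate > threshold_t and stats.distinct_senders >= min_senders_x


# Made-up values for illustration.
suspect = AttachmentStats(md5="placeholder", birth_rate=12.0, distinct_senders=5)
print(birth_rate_alert(suspect, threshold_t=10.0, min_senders_x=3))
```

Keeping each condition as a plain comparison makes it straightforward to combine several metric thresholds into one rule with `and` and `or`.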

2.2 Account Statistics and Alerts

This mechanism has been extended to provide alerts based upon deviation from other baseline user and group models. EMT computes and displays three tables of statistical information for any selected email account. The first is a set of stationary email account models, i.e. statistical data represented as a histogram of the average number of messages sent over all days of the week, divided into three periods: day, evening, and night. EMT also gathers information on the average size of messages for these time periods, and the average number of recipients and attachments for these periods. These statistics can generate alerts when values are above a set threshold as specified by the rule-based alert logic section.
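The first of these tables can be sketched as a per-period average with a threshold check. The text does not give the boundaries of the day, evening, and night periods, so the hours used here (08:00 to 16:00, 16:00 to 24:00, 00:00 to 08:00) are assumptions, as are all function names.

```python
from collections import Counter
from datetime import datetime

PERIODS = ("day", "evening", "night")


def period_of(hour: int) -> str:
    """Map an hour of the day to a period; the boundaries are assumed, not from the paper."""
    if 8 <= hour < 16:
        return "day"
    if 16 <= hour < 24:
        return "evening"
    return "night"


def average_sent_per_period(send_times, days_observed):
    """Average number of messages sent per day in each period."""
    counts = Counter(period_of(t.hour) for t in send_times)
    return {p: counts[p] / days_observed for p in PERIODS}


def periods_over_threshold(averages, thresholds):
    """Return the periods whose average exceeds the analyst-set threshold."""
    return [p for p in PERIODS if averages[p] > thresholds.get(p, float("inf"))]


# Made-up send times covering two days.
times = [datetime(2003, 1, 6, 9), datetime(2003, 1, 6, 10),
         datetime(2003, 1, 6, 20), datetime(2003, 1, 7, 2)]
averages = average_sent_per_period(times, days_observed=2)
print(averages)
print(periods_over_threshold(averages, {"night": 0.25}))
```

The same structure would apply to the average message size, recipient count, and attachment count mentioned above, with a different quantity accumulated per period.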

We next describe the variety of models available in EMT that may be used to generate alerts of errant behavior.

2.3 Stationary User Profiles

Histograms are used to model the behavior of a user’s email accounts. Histograms are compared to find similar behavior or abnormal behavior within the same account (between a long-term profile histogram, and a recent, short-term histogram), and between different accounts.

A histogram depicts the distribution of items in a given sample. EMT employs a histogram of 24 bins, for the 24 hours in a day. (Obviously, one may define a different set of stationary periods as the detection task may demand.) Email statistics are allocated to different bins according to their outbound time. The value of each bin can represent the daily average number of emails sent out in that hour, or the daily average total size of attachments sent out in that hour, or other features defined over an email account and computed for some specified period of time.
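A minimal sketch of the 24-bin profile described above, using the daily average number of emails sent in each hour as the bin value. The function name `hourly_histogram` and the sample timestamps are hypothetical.

```python
from datetime import datetime


def hourly_histogram(send_times, days_observed):
    """Return 24 bins; bin h holds the daily average number of emails sent in hour h."""
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    bins = [0.0] * 24
    for sent_at in send_times:
        bins[sent_at.hour] += 1.0  # allocate each email to the bin of its outbound hour
    return [count / days_observed for count in bins]


# Made-up outbound timestamps covering two days.
sample = [datetime(2003, 3, 3, 9, 15), datetime(2003, 3, 3, 9, 40),
          datetime(2003, 3, 4, 14, 5), datetime(2003, 3, 4, 23, 59)]
profile = hourly_histogram(sample, days_observed=2)
print(profile[9], profile[14], profile[23])
```

Replacing the `+= 1.0` increment with the attachment size of each message would give the daily average total attachment size per hour instead.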

Two histogram comparison functions are implemented in the current version of EMT, each providing a user selectable distance function as described below. The first comparison function is used to identify groups of email accounts that have similar usage behavior. The other function is used to compare an account’s recent behavior to the long-term profile of that account.

2.3.1 Histogram Distance Functions

A distance function is used to measure histogram dissimilarity. For every pair of histograms h1, h2, there is a corresponding distance D(h1, h2), called the distance between h1 and h2. The distance function is non-negative, symmetric, and 0 for identical histograms. Dissimilarity is proportional to distance. We adapted some of the more commonly known distance functions: simplified histogram intersection (L1-form), Euclidean distance (L2-form), quadratic distance [0], and histogram Mahalanobis distance [0]. These standard measures were modified to be more suitable for email usage behavior analysis. For concreteness,

L1-form: $$D_1(h_1, h_2) = \sum_{i=0}^{n-1} \left| h_1[i] - h_2[i] \right| \qquad (1)$$

[The definitions of the L2-form, quadratic, and Mahalanobis distances are missing from this copy.]

where n is the number of bins in the histogram. In the quadratic function, A is a matrix where $a_{ij}$ denotes the similarity between bins i and j. In EMT we set $a_{ij} = |i - j| + 1$, which assumes that the behavior in neighboring hours is more similar. The Mahalanobis distance is a special case of the quadratic distance, where A is given by the inverse of the covariance matrix obtained from a set of training histograms. We will describe this in detail.
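The distance functions named above can be sketched in their textbook forms. These are assumptions, not the authors' modified versions, which the text says were adapted for email analysis: `l2_distance` is the ordinary Euclidean distance, and `quadratic_distance` is the standard quadratic form (h1 - h2)^T A (h1 - h2). All function names are hypothetical.

```python
import math


def l1_distance(h1, h2):
    """L1-form: sum of absolute bin differences, as in equation (1)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def l2_distance(h1, h2):
    """Ordinary Euclidean distance (textbook form; the paper's L2-form may differ)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def neighbor_matrix(n):
    """Matrix A with a_ij = |i - j| + 1, as stated in the text."""
    return [[abs(i - j) + 1 for j in range(n)] for i in range(n)]


def quadratic_distance(h1, h2, a):
    """Textbook quadratic form d^T A d with d = h1 - h2 (assumed form)."""
    d = [x - y for x, y in zip(h1, h2)]
    return sum(d[i] * a[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))


# Three-bin toy histograms, for illustration only.
h1, h2 = [1, 0, 2], [0, 0, 4]
print(l1_distance(h1, h2))
print(l2_distance(h1, h2))
print(quadratic_distance(h1, h2, neighbor_matrix(3)))
```

One caution about this sketch: with $a_{ij} = |i - j| + 1$ the plain quadratic form is not guaranteed to be non-negative (the toy input above yields a negative value), so the authors' modified quadratic measure presumably differs from the textbook form used here.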

2.3.2 Abnormal User Account Behavior

The histogram distance functions are applied to one target email account. (See Fig. 4.) A long term profile period is first selected by an analyst as the “normal” behavior training period. The histogram computed for this period is then compared to another histogram computed for a more recent period of email behavior. If the histograms are very different (i.e., they have a high distance), an alert is generated indicating possible account misuse. We use the weighted Mahalanobis distance function for this detection task.

The long term profile period is used as the training set, for example, a single month. We assume the bins in the histogram are random variables that are statistically independent. Then we get the following formula:
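Under the independence assumption, the covariance matrix of the bins is diagonal, so the standard Mahalanobis distance reduces to a sum of squared per-bin differences, each divided by that bin's variance over the training histograms. The sketch below implements that standard reduction, returning the squared distance. It is not necessarily the authors' weighted variant: the function names, the small `eps` guard against zero variance, and the alert threshold are all assumptions.

```python
from statistics import pvariance


def bin_variances(training_histograms):
    """Per-bin population variance across the training histograms."""
    return [pvariance(column) for column in zip(*training_histograms)]


def diagonal_mahalanobis(h1, h2, variances, eps=1e-9):
    """Squared Mahalanobis distance with a diagonal covariance matrix.

    eps is an assumed guard so that bins with zero training variance
    do not cause a division by zero.
    """
    return sum((a - b) ** 2 / (v + eps) for a, b, v in zip(h1, h2, variances))


def is_abnormal(recent, long_term, variances, threshold):
    """Flag possible misuse when the distance exceeds an analyst-chosen threshold."""
    return diagonal_mahalanobis(recent, long_term, variances) > threshold


# Toy two-bin training histograms, for illustration only.
training = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
variances = bin_variances(training)
long_term = [3.0, 2.0]
print(diagonal_mahalanobis([1.0, 2.0], long_term, variances))
print(is_abnormal([3.0, 3.0], long_term, variances, threshold=10.0))
```

Note that a bin with no variation during training makes any later change in that bin count very heavily, which in this sketch is controlled only by `eps`.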
