@sarnold ok, I think it is about time to start jose's presentation
@sarnold jose earned his phd in biology from case western reserve university
@sarnold and spends his time working as a researcher at arbor networks
@sarnold for years, he has been interested in studying the network effects of various internet phenomena
@sarnold and today, he is going to share his knowledge on internet worm propagation :)
@sarnold jose has prepared some notes here: http://monkey.org/~jose/wiki/wiki.php?page=DetectingsWorms
@jose_n thank you sarnold, and thank you ismak
@jose_n and mjesus, fernand0, jfs, vizard, everyone at uninet for putting this together.
@jose_n it's a pleasure and an honor to present with this crowd.
@jose_n and it's always a joy to hear great talks like jfs'.
@jose_n his talk provides a nice segue into mine.
@jose_n as sarnold said, you can find the figures and the notes here in my wiki: http://monkey.org/~jose/wiki/wiki.php?page=DetectingsWorms
@jose_n i didn't have a chance to prepare slides, and this talk is sort of new, so i'm still fleshing it out. i'll gladly take any questions in /msg or ... what channel sarnold?
@jose_n is it #qc? which ones for which languages ...
@jose_n anyhow, let's get to it!
@jose_n this talk will cover much of the material in a book i recently completed
@jose_n about detecting and defending against internet worms.
@jose_n this talk will focus on
@jose_n the core concepts in one of the sections, how to detect worm activity on
@jose_n your network.
@jose_n i'm going to focus on worms like sapphire, code red and nimda, although many of these techniques can be applied to email based worms as well.
@jose_n  
@jose_n this talk uses some of the figures and material from my book, due this
@jose_n october from artech publishing (based in europe).
@jose_n the figures show some real data, as well as some generated traffic and summarized data.
@jose_n all of these figures are copyright 2003 by me, you'll have to talk to my publisher about using them.
@jose_n and i'm not sure of the title right now. when i was working on it it was "defending against internet worms"
@jose_n  
@jose_n very briefly, my talk will have the following structure. i will begin
@jose_n by discussing our core strategies in detecting worms.
@jose_n from there i will talk about the main high level concept, traffic analysis.
@jose_n here we analyze the data to discover the trends seen when worms begin their work, namely traffic rates and characteristics.
@jose_n  from there, we will talk about a new kind of analysis i haven't seen many others do and which i am still developing entitled "correlation analysis."
@jose_n i think this is a powerful technique and one that holds a lot of promise.
@jose_n next i'll talk about some analysis we can make of the actual scan engine in the worm itself, also a powerful technique for understanding its spread.
@jose_n lastly, we'll talk about honeypots and blackhole monitors, some of my favorite methods for examining worm activity.
@jose_n i don't think i have time to talk about signature based techniques like snort or prelude.
@jose_n jfs did  an excellent job at this, and i suggest you look at his talk and materials.
@jose_n  i'm going to focus more on techniques for holistic worm detection and even zero-day worm detection.
@jose_n snort and other signature based methods typically require some knowledge of the attack before they can work.
@jose_n  
@jose_n ##traffic analysis##
@jose_n  
@jose_n at the core of worm detection on a network is the idea of traffic analysis.
@jose_n here we capture traffic and analyze its various properties to deduce shifts in trends.
@jose_n this is typically indicative of something amiss on a network, and a larger analysis of these anomalies can be tied to a worm.
@jose_n of course, to detect a shift in your trends you will have to have baselined your network and understood its normal behavior.
@jose_n these properties can include the protocol and port distributions as well as payload analysis.
@jose_n this technique is best applied to worms which actively seek out targets by random scanning.
@jose_n these techniques can be applied to worms which have predefined lists of targets or use passive methods, but that requires much more data analysis.
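[editor's note: a minimal sketch of the baselining idea, not something shown in the talk. it assumes you already keep per-period packet counts for a service; the mean plus three standard deviations threshold, the function name and the sample counts are all illustrative assumptions.]

```python
import statistics

def is_anomalous(baseline, observed, k=3.0):
    """Return True if `observed` exceeds the baseline mean by more than k standard deviations."""
    mean = statistics.mean(baseline)
    spread = statistics.pstdev(baseline)
    return observed > mean + k * spread

# hypothetical hourly packet counts for one port during a quiet week
baseline_counts = [12, 9, 15, 11, 10, 14, 13]

print(is_anomalous(baseline_counts, 16))  # a count close to the usual range
print(is_anomalous(baseline_counts, 60))  # a count far above the usual range
```

[a fixed threshold like this is only a starting point; the talk's point is that you need some record of normal behavior before any shift can be recognized.]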
@jose_n  when a new worm is introduced to a network, traffic rates for a service will usually start to increase in a fashion that at first looks like an exponential growth pattern, but it quickly becomes a sigmoidal graph.
@jose_n this is shown below for the introduction of a worm last fall which targeted windows systems.
@jose_n http://monkey.org/~jose/presentations/worms-infosec03/figs/137data.png
@jose_n the data was captured on a network for traffic that was destined to port 137/UDP, which is used by windows filesharing.
@jose_n this worm used active scanning techniques to identify targets to attack, causing the background network activity to increase for this service.
@jose_n in this figure you can see the pre-worm phase, where low traffic rates for this service are seen.
@jose_n then, as the worm begins to spread, this traffic increases before approaching a plateau value.
@jose_n this is a typical sigmoidal growth pattern.
@jose_n you can see a lot of variability in this data, even from day to day (this data is based on firewall hits in a given period of time).
@jose_n it can be analyzed by this simple equation to study its growth rate and maximum number of affected hosts:
@jose_n P(t) = (K*P0) / (P0 + (K - P0)*e^(-r*t)), where P0 is the number of affected hosts at t=0 and K is the plateau value
@jose_n we can fit this equation to various values of r, the variable that sets the rate of growth.
@jose_n shown in the next figure is this equation plotted at several values of "r", showing that as "r" increases the growth rate does as well (in this graph this value is shown as "k").
@jose_n http://monkey.org/~jose/presentations/worms-infosec03/figs/growth1.png
@jose_n obviously, as "r" becomes greater and greater the rate at which we approach our plateau value increases.
@jose_n the value of "r" for code red was significantly lower than that for the sapphire worm.
@jose_n this model will form the central basis for our detection strategies.
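[editor's note: a small sketch of the growth model, using the standard logistic form P(t) = K*P0 / (P0 + (K - P0)*e^(-r*t)). the values of K and P0, the candidate rates and the grid-search fit are illustrative assumptions, not the fitting procedure used for the figures.]

```python
import math

def logistic(t, K, P0, r):
    """Number of affected hosts at time t under logistic growth."""
    return (K * P0) / (P0 + (K - P0) * math.exp(-r * t))

def fit_rate(samples, K, P0, candidates):
    """Pick the candidate r with the smallest squared error against (t, count) samples."""
    def error(r):
        return sum((logistic(t, K, P0, r) - count) ** 2 for t, count in samples)
    return min(candidates, key=error)

# hypothetical values: 1 affected host at t=0, plateau at 1000 hosts
K, P0 = 1000.0, 1.0

# a larger r reaches the plateau sooner
for r in (0.5, 1.0, 2.0):
    curve = [round(logistic(t, K, P0, r)) for t in range(0, 16, 3)]
    print(r, curve)

# synthetic observations generated with r = 1.0, then recovered by the grid search
observed = [(t, logistic(t, K, P0, 1.0)) for t in range(0, 16)]
print(fit_rate(observed, K, P0, [0.25, 0.5, 1.0, 2.0]))
```

[real firewall data is noisy, as the 137/UDP figure shows, so a fit on live data will never be this clean.]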
@jose_n  
@jose_n  //data sources//
@jose_n  
@jose_n there are two main sets of data sources we can use for our traffic analysis.
@jose_n network traffic can be captured by netflow generation from switches and routers, and also raw packet captures.
@jose_n furthermore, we can even perform logfile analysis to analyze the network's behavior.
@jose_n raw packet captures are usually done by tools like tcpdump and ethereal.
@jose_n here a host listens promiscuously to everyone else's traffic and captures packets for analysis.
@jose_n these can be pre-filtered, for example "show me only tcp traffic to and from this network," or it can be everything and it can be sorted out at a later date.
@jose_n raw packet capture is quickly showing its speed limitations as networks reach gigabit ethernet to the desktop.
@jose_n however, if you wish to examine the payload of the traffic, you will have to capture the raw packets.
@jose_n an alternative strategy is to employ expensive capture/analysis hardware on your infrastructure equipment, such as an rmon2 blade in a cisco catalyst switch.
@jose_n also, because so many packets are captured, they are best analyzed in real time as opposed to bulk analysis at a later date.
@jose_n netflow is an abstraction of the packet layer traffic.
@jose_n in netflow, you get the summaries of the IP headers that passed by, usually aggregated.
@jose_n this is typically done by the infrastructure equipment, such as a switch or a router.
@jose_n a typical netflow representation looks like the following:
@jose_n Start             End               Sif   SrcIPaddress    SrcP  DIf   DstIPaddress    DstP    P Fl Pkts       Octets
@jose_n 1030.22:56:34.0   1030.22:56:34.0   0     10.10.32.1      44262 0     203.36.198.97   80    6   0  1          66
@jose_n in some ways you get a lot more information, for example you can see what interfaces were involved in your traffic.
@jose_n however, you lose all information about the payload of your packets aside from the size.
@jose_n netflow scales incredibly well to larger networks, however, and can be used to detect worms as well as a variety of other attacks on a network.
@jose_n netflow also scales very well to faster networks, with the routers performing "sampling" on the data.
@jose_n in this scenario one of every 1000 packets or so can be sampled (this rate is variable).
@jose_n cisco has netflow tools available, as does caida (in their cflowd toolset), and the ohio state university flow-tools kit is also very useful.
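[editor's note: a sketch of how text flow records like the sample above could be summarized. the column names are copied from that sample; the assumption that every line has exactly those twelve whitespace-separated columns, and the function names, are mine and may not match what your flow tools print.]

```python
from collections import Counter

FIELDS = ["Start", "End", "Sif", "SrcIPaddress", "SrcP", "DIf",
          "DstIPaddress", "DstP", "P", "Fl", "Pkts", "Octets"]

def parse_flow(line):
    """Split one whitespace-separated flow line into a dict keyed by column name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError("unexpected number of columns: %d" % len(values))
    record = dict(zip(FIELDS, values))
    for name in ("SrcP", "DstP", "P", "Pkts", "Octets"):
        record[name] = int(record[name])
    return record

def flows_per_service(lines):
    """Count flows per (protocol number, destination port) pair."""
    counts = Counter()
    for line in lines:
        rec = parse_flow(line)
        counts[(rec["P"], rec["DstP"])] += 1
    return counts

# the sample record from the talk
sample = ("1030.22:56:34.0   1030.22:56:34.0   0     10.10.32.1      44262 "
          "0     203.36.198.97   80    6   0  1          66")
print(parse_flow(sample))
print(flows_per_service([sample]))
```

[counting flows per service over successive periods gives the same kind of rate data as the 137/UDP figure, without touching packet payloads.]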
@jose_n lastly, logfile data can be analyzed. for example, worms which attack web servers can leave telltale signs by their requests.
@jose_n for example, Code Red and Nimda left logfile marks that looked like the following:
@jose_n 192.168.37.175 - - [05/Aug/2001:07:53:40 -0400] "GET /default.ida?XXX ... HTTP/1.0" 404 205
@jose_n by analyzing logs on a single system or collectively, worm activity can be measured.
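[editor's note: a sketch of counting worm requests per client in web server logs. it assumes the common log format of the line above; the two marker strings (default.ida and root.exe) both appear in the talk, while the second and third sample lines and all names are made up for illustration.]

```python
import re
from collections import Counter

# client address, two ignored fields, bracketed timestamp, then the request path
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (\S+)')

# request substrings mentioned in the talk as worm traces
WORM_MARKERS = ("/default.ida", "root.exe")

def worm_hits_by_source(lines):
    """Count log lines per client whose request path contains a worm marker."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and any(marker in match.group(2) for marker in WORM_MARKERS):
            counts[match.group(1)] += 1
    return counts

sample_lines = [
    '192.168.37.175 - - [05/Aug/2001:07:53:40 -0400] "GET /default.ida?XXX ... HTTP/1.0" 404 205',
    # hypothetical lines below
    '192.168.37.175 - - [05/Aug/2001:07:54:02 -0400] "GET /scripts/root.exe?/c+dir HTTP/1.0" 404 210',
    '10.0.0.8 - - [05/Aug/2001:07:55:10 -0400] "GET /index.html HTTP/1.0" 200 1024',
]
hits = worm_hits_by_source(sample_lines)
print(hits.most_common(10))
```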
@jose_n  
@jose_n ##correlation analysis##
@jose_n  
@jose_n now that we are capturing our data we have to analyze it.
@jose_n remember that as these worms actively seek out additional hosts to compromise, they'll randomly contact a host and attempt their attacks.
@jose_n furthermore, this list of attacks is usually a well defined list with no variety.
@jose_n knowing this we can put this data together to detect worms at work even without knowing the details of them.
@jose_n correlation analysis presumes that any two events that are linked in some real fashion (ie physically linked or linked to the same source or major event) will occur in the same timeframe.
@jose_n this analysis has been popular in molecular analysis, where two events or detection methods are performed to see if two molecules interact.
@jose_n if the signals show a lot of temporal linkages, the two molecules are probably linked. here we do the same kind of analysis but on network events.
@jose_n the first kind of correlation analysis i have been promoting is called "cross correlation" analysis.
@jose_n here the analyst looks for the presence of two events with the same signature and looks at the time interval between them.
@jose_n in the case of a worm like slapper, which affected mod_ssl servers, the worm's attack was preceded by a scan of a network to identify web servers to attack.
@jose_n because of this, there was a shorter time interval between the scans of a network on port 80 and the attack signature of the worm (in this case faults on the  server).
@jose_n normal networks see random time intervals between these two types of events, scans and attacks, but when worms become active this time interval is shorter.
@jose_n furthermore, many observations of this scan and attack behavior will be seen, lending weight to the correlation of these events.
@jose_n this next figure shows a small data set for the slapper worm on a small test network.
@jose_n http://monkey.org/~jose/presentations/worms-infosec03/figs/cross-correlation.png
@jose_n the y axis shows the number of events and the x axis shows the time interval between the scan and the worm's attack.
@jose_n obviously this data is difficult to obtain in real time for zero day threats, because it is very hard to know without pre-knowledge of the worm's behavior what two events will be linked.
@jose_n however, generic filters can be set up which look for scans of a network or a host on a port followed by attacks occurring on this port.
@jose_n these attacks can be identified by your IDS or by logfile analysis which shows the service beginning a round of abnormal actions.
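[editor's note: a sketch of the cross correlation measurement, assuming you already have two event lists of (unix time, source address) pairs, one for scans and one for attacks. pairing each attack with the latest earlier scan from the same source, the bucket width and the sample events are all my assumptions; the implementation mentioned in the talk is not public.]

```python
from bisect import bisect_right
from collections import Counter, defaultdict

def scan_to_attack_intervals(scans, attacks):
    """For each attack, seconds since the latest earlier scan from the same source."""
    scans_by_source = defaultdict(list)
    for when, source in sorted(scans):
        scans_by_source[source].append(when)
    intervals = []
    for when, source in attacks:
        times = scans_by_source.get(source, [])
        index = bisect_right(times, when)
        if index:  # at least one scan at or before the attack
            intervals.append(when - times[index - 1])
    return intervals

def histogram(intervals, bucket_seconds=10):
    """Number of observations per interval bucket (the y axis of the figure)."""
    return Counter((int(i) // bucket_seconds) * bucket_seconds for i in intervals)

# hypothetical events
scans = [(100, "a"), (105, "b"), (400, "c")]
attacks = [(103, "a"), (109, "b"), (900, "c"), (50, "d")]

intervals = scan_to_attack_intervals(scans, attacks)
print(intervals)
print(sorted(histogram(intervals).items()))
```

[a pile-up of observations in the shortest buckets is the pattern the talk associates with an active worm.]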
@jose_n a more powerful, and easier to calculate, correlation is the "auto correlation".
@jose_n in this analysis, we measure the time interval between two observations of the //same// activity.
@jose_n this is referred to as "more powerful" because it is easier to track in real time.
@jose_n in this case this would be successive rounds of the SSL server faulting.
@jose_n a data set which has been analyzed using this technique is shown in this figure: http://monkey.org/~jose/presentations/worms-infosec03/figs/auto-correlation.png
@jose_n in this figure, on the y axis we have the number of observations at any time interval and on the x axis we have the time interval between two of the same worm events.
@jose_n here we can see a strong tendency for a short interval between two successive worm attacks (in this case requests for the file "root.exe").
@jose_n under normal operating conditions the interval between two of the same events will be random.
@jose_n this analysis has a problem with two types of events.
@jose_n the first is the "slashdot" effect, where a website becomes popular immediately and the traffic to it increases significantly.
@jose_n the second is a distributed denial of service attack, where many sites will attack a network or a host.
@jose_n in each case the basic analysis will look like this as the requests come in at short intervals.
@jose_n however, secondary analysis of the data will reveal that it is not a worm but another anomaly.
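[editor's note: a sketch of the auto correlation measurement described above, assuming a list of unix timestamps for one kind of event (for example one request signature). the five second threshold and the sample timestamps are illustrative assumptions. as the talk says, a high score alone cannot tell a worm from a slashdot effect or a ddos.]

```python
def successive_intervals(timestamps):
    """Seconds between each pair of consecutive observations of the same event."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def short_interval_fraction(timestamps, threshold_seconds=5):
    """Share of consecutive observations that arrive within the threshold."""
    intervals = successive_intervals(timestamps)
    if not intervals:
        return 0.0
    return sum(1 for i in intervals if i <= threshold_seconds) / len(intervals)

# hypothetical timestamps
quiet = [0, 60, 200, 390, 600]
busy = [0, 1, 3, 4, 6, 120]

print(successive_intervals(busy))
print(short_interval_fraction(quiet), short_interval_fraction(busy))
```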
@jose_n  
@jose_n ##analyzing the scan engine##
@jose_n  
@jose_n traffic analysis can be greatly assisted by the isolated analysis of the engine which generates the random network addresses for the worms to attack.
@jose_n in this case it requires either access to the source code of the worm or a de-compilation of the worm's engine.
@jose_n in the figures below we can see the results of this analysis on two different worms' random target generators.
@jose_n the first is the worm "SQLsnake" which used a pre-defined list of networks to scan.
@jose_n the worm had been coded in javascript and biased to scan some network segments more heavily than others, basing this on the number of hosts in any of these networks.
@jose_n however, it would fall back to random network addresses to find additional hosts.
@jose_n http://monkey.org/~jose/presentations/worms-infosec03/figs/SQLsnake.png
@jose_n this bias for some networks over others is immediately visible in this plot of the network address against the frequency of observations.
@jose_n this means that some networks, like the cable modem pool in north america on 64/8, will get hit far more frequently than networks like 50/8
@jose_n another worm engine was analyzed, the one from the Slapper worm.
@jose_n here 1000 data points were analyzed showing that the worm scanned some networks randomly but others not at all.
@jose_n this is a by-product of its technique for generating addresses, where it would focus on the bulk of the network which is occupied.
@jose_n http://monkey.org/~jose/presentations/worms-infosec03/figs/Slapper.png
@jose_n what we can learn from this is how the worm is likely to spread and how we can detect it and defend against it.
@jose_n using this information we can set up detection tools in the right places and monitor its activity more accurately than if we were placed in a blind spot.
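[editor's note: a sketch of how a target generator's output could be tallied per /8 network, as in the two plots. the generator below is a made-up stand-in with an arbitrary bias and arbitrary preferred networks; it is not the SQLsnake or Slapper algorithm.]

```python
import random
from collections import Counter

def first_octet_histogram(addresses):
    """Count generated targets per /8 network (the first octet of the address)."""
    return Counter(int(address.split(".")[0]) for address in addresses)

def toy_target_generator(rng, preferred_octets, bias=0.8):
    """Hypothetical address generator: prefers some /8 networks, otherwise picks at random."""
    if rng.random() < bias:
        first = rng.choice(preferred_octets)
    else:
        first = rng.randint(1, 223)
    return "%d.%d.%d.%d" % (first, rng.randint(0, 255),
                            rng.randint(0, 255), rng.randint(1, 254))

rng = random.Random(2003)  # fixed seed so the run is repeatable
targets = [toy_target_generator(rng, [24, 64, 65]) for _ in range(1000)]

for octet, count in first_octet_histogram(targets).most_common(5):
    print("%d/8 %d" % (octet, count))
```

[running a real worm's generator in isolation and tallying its output the same way shows which networks it favors and which it never visits.]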
@jose_n  
@jose_n ##blackhole detectors and honeypots##
@jose_n  
@jose_n next up is one of my favorite detection techniques, that of the blackhole network.
@jose_n these networks are subnets or networks which are unused but routable on your network (meaning they will receive packets).
@jose_n their value lies in the fact that a worm which is actively spreading will attempt to contact hosts in this space, while no other traffic will go there normally.
@jose_n while it will fail to spread to this space, you can detect this attempt and correlate it to an anomalous event.
@jose_n a simple tcpdump filter for three subnets would look like the following:
@jose_n tcpdump -ni fxp1 -w /var/log/blackhole.pcap net 10.11.12.32/27 or 10.11.12.96/27 or 10.11.12.128/27
@jose_n here we capture the traffic to these subnets in the file "blackhole.pcap", which we can analyze later.
@jose_n you can also catch people who scan your network, but if you start to see an exponential growth in the number of hosts performing these connection attempts to this space, you probably have a worm on your network.
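[editor's note: a sketch of counting how many distinct hosts touch the blackhole space per time period. the three subnets come from the tcpdump filter above; the event tuples (unix time, source, destination), the one hour period and all names are illustrative assumptions.]

```python
import ipaddress
from collections import defaultdict

# the three unused subnets from the tcpdump filter
BLACKHOLES = [ipaddress.ip_network(n) for n in
              ("10.11.12.32/27", "10.11.12.96/27", "10.11.12.128/27")]

def in_blackhole(address):
    """True if the address falls inside one of the monitored unused subnets."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in BLACKHOLES)

def sources_per_period(events, period_seconds=3600):
    """Map each time bucket to the number of distinct sources seen in blackhole space."""
    seen = defaultdict(set)
    for when, source, destination in events:
        if in_blackhole(destination):
            seen[int(when) // period_seconds].add(source)
    return {bucket: len(sources) for bucket, sources in sorted(seen.items())}

# hypothetical events
events = [
    (10, "192.0.2.1", "10.11.12.40"),
    (20, "192.0.2.1", "10.11.12.41"),
    (30, "192.0.2.9", "10.11.12.200"),   # not in blackhole space
    (3700, "192.0.2.1", "10.11.12.100"),
    (3800, "198.51.100.7", "10.11.12.130"),
    (3900, "203.0.113.5", "10.11.12.33"),
]
print(sources_per_period(events))
```

[a single scanner shows up as one source with many hits; a spreading worm shows up as the per-period source count itself climbing.]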
@jose_n a simple shell script which can be used to analyze this data is shown on the page at http://monkey.org/~jose/wiki/wiki.php?page=DetectingsWorms
@jose_n this isn't as efficient as it could be (it could be better coded in C, python, or perl), but it illustrates how you could summarize this data. without it, finding trends can be difficult.
@jose_n [script omitted, it is too big]
@jose_n you may even want to graph this data, as that makes it really easy to spot the trends.
@jose_n for example, you can graph the destination ports over time along with the source addresses, and any increases in plot density in any spots will indicate an issue.
@jose_n the visual display of a lot of information is very useful in this scenario.
@jose_n now when we run this program over our captured packet data, we'll see the summary of our blackhole activity:
@jose_n $ tcpdump -ntttr /var/log/blackhole.pcap | ./process.sh
@jose_n packet logs from Nov 03 18:11:48.527192 to Nov 04 13:01:56.111967
@jose_n top ten source addresses are:
@jose_n   45 65.4.18.253
@jose_n [the rest is omitted]
@jose_n here we have a few hits, and if we compare this to the previous time period's activity we may start to see a trend.
@jose_n this is why it's sometimes good to graph the traffic, such as the number of sources and the number of destinations per time-period.
@jose_n trends will stand out right away. you can get this data by using netflow, also.
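[editor's note: a sketch of the kind of summary described above, written in python. it is not the omitted script. it assumes tcpdump -n text lines of the form "src.port > dst.port: ..."; the regular expression, the names and the sample lines are my assumptions, with only the address 65.4.18.253 taken from the output shown.]

```python
import re
from collections import Counter

# dotted-quad address, a dot, the port, then " > " and the same for the destination
FLOW = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){3})\.(\d+) > (\d{1,3}(?:\.\d{1,3}){3})\.(\d+):")

def summarize(lines, top=10):
    """Return the most common source addresses and destination ports."""
    sources, ports = Counter(), Counter()
    for line in lines:
        match = FLOW.search(line)
        if match:
            sources[match.group(1)] += 1
            ports[int(match.group(4))] += 1
    return sources.most_common(top), ports.most_common(top)

# hypothetical tcpdump text output
sample_lines = [
    "Nov 03 18:11:48.527192 65.4.18.253.3345 > 10.11.12.40.80: S 1:1(0) win 8192",
    "Nov 03 18:11:49.101010 65.4.18.253.3346 > 10.11.12.41.80: S 1:1(0) win 8192",
    "Nov 03 18:12:02.000001 192.0.2.77.1027 > 10.11.12.130.445: S 1:1(0) win 8192",
]
top_sources, top_ports = summarize(sample_lines)
print("top ten source addresses are:")
for address, count in top_sources:
    print("%5d %s" % (count, address))
```

[to use it on a real capture you could pass sys.stdin to summarize() and pipe the tcpdump text output into the script.]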
@jose_n a very powerful technique for fingerprinting this activity is to run a minimal honeypot in this blackhole.
@jose_n here you use a program like "arpd" to accept traffic for an entire subnet and then you send reply packets for connection requests (such as a SYN-ACK in response to a SYN), but nothing more.
@jose_n the first data packet will contain enough of a payload to fingerprint most worms (but not all).
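[editor's note: a rough user-space approximation of the idea, not the arpd setup described above. an ordinary listening socket lets the kernel complete the handshake on one address and port, and the code records only the first chunk of data before closing. function names, buffer size and timeout are illustrative assumptions.]

```python
import socket

def read_first_payload(conn, max_bytes=512, timeout=5.0):
    """Return the first chunk of data the peer sends, or b'' on timeout, then close."""
    conn.settimeout(timeout)
    try:
        return conn.recv(max_bytes)
    except socket.timeout:
        return b""
    finally:
        conn.close()

def listen_once(host, port):
    """Accept one connection (the kernel answers the SYN with a SYN-ACK) and log its first payload."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(5)
        conn, peer = server.accept()
        payload = read_first_payload(conn)
        print(peer[0], repr(payload[:64]))
        return peer[0], payload
```

[calling listen_once("0.0.0.0", 80) would wait for a single connection on port 80. nothing is sent back beyond the handshake, so the peer only reveals what it sends first.]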
@jose_n honeypots, in contrast, can be used as a response mechanism to a new worm, such as the 137/UDP based worm shown earlier.
@jose_n here, you can observe the network's traffic and notice shifts which may indicate a worm.
@jose_n then you can set up a host to act as a honeypot and become attacked and compromised by this worm, and then pull it offline and analyze the data.
@jose_n you get a very deep view of the worm, and usually the worm binaries and executables, but you typically don't get wide coverage.
@jose_n tools like "honeyd" from niels provos are excellent, but because they cannot be compromised like the real host, they won't help you capture some worm executables.
@jose_n they will work for some worms, however, like windows filesharing worms.
@jose_n remember that honeypots can also start to attack out, meaning you have to control their behavior to be a good citizen.
@jose_n a honeypot is labor intensive, meaning you have to identify what is the likely attack vector, build and deploy a system to mimic this and allow it to be compromised, and then you have to analyze the data afterwards.
@jose_n for some network security operators, this may not be worth the time, all they want to know is what hosts to firewall off and on what services.
@jose_n still, for the researcher or someone who builds their own signatures for their IDS, proxy scanner, or reactive system, this is a needed step.
@jose_n  
@jose_n ##conclusions##
@jose_n  
@jose_n this is only a quick pass over the techniques of detecting worms.
@jose_n it's important to quickly and accurately detect new worms, such as the next code red and sapphire, if you are to protect your network.
@jose_n the basic premise here is to watch your network and detect trends, typically when more and more hosts start acting in an anomalous way.
@jose_n without this information it's difficult to deploy network defenses.
@jose_n these techniques have proven themselves useful to a growing number of people, but they are still in their infancy.
@jose_n thank you for your time and attention, i hope this has been interesting.
@MJesus clap clap clap clap clap clap clap clap clap clap
@sarnold clap clap clap
@jose_n thanks to all / thank you very much!
@jose_n i'll happily take any questions
@ismak clap clap clap clap
pipe_ very good! clap clap clap
@jose_n thank you mjesus, sarnold, emperor, ismak, fernand0, vizard, everyone!
SeTH-CL very good ..
frodnix really interesting! thank you!
@ismak congratulations
garoeda clap clap clap clap clap
@jose_n it's interesting. this whole correlation analysis was used a lot in single molecule spectroscopy
@MJesus also emperor, xpato, ed0 are translating to Spanish in #redes, and garoeda, xtingray....
@jose_n back when i was a biochemist (just last year) this was becoming a popular technique. you look at one molecule.
@jose_n so, you want to see if it bound to something else, so you tag them both and then you watch the two tags float around
@jose_n if they float at the same speed and in the same directions, they're probably physically linked
FENlX CLAP CLAP
@jose_n so, i looked at this and thought "hey, i bet we can do the same with worm detection" and lo and behold it's proven to be a powerful technique
@MJesus very nice work
alcaudon_ thank you!
@jose_n it's a little more complicated than i made it out to be, but we have an implementation in one of our products
@jose_n really impressive stuff, i was pleasantly surprised it worked so well.
@jose_n :)
@sarnold jose_n: cool! :)
@jose_n but our blackhole stuff is also really powerful as a detection technique
frodnix So you have an implementation already?
@jose_n yep. not available as open source, though. it's part of a product
@jose_n one of our key developers whipped it up over a weekend.
@jose_n we've been fine tuning it, but it's been pretty stable since then.
@jose_n any questions :) ?
@jfs jose_n have you compared your findings with the statistical analysis done by the Honeynet project?
@jose_n our blackhole detection you mean?
@jfs jose_n: no, the traffic flow statistics
@jose_n ah ...
@jose_n well, we're monitoring a different aspect of the internet than they are
@jose_n we get complementary results, but we also get results that they don't see and they get stuff we don't see
@jose_n it's just how it is ...
@jfs jose_n: however, blackhole detection behaves similar to a honeypot (w/o interaction), does it not?
@jose_n it can.
@jose_n for example niels now has a couple of /24's he's using to catch spam and open proxy scanners
@jose_n we see their hits but we don't fingerprint what they send us
@jose_n but neither of us see many scans for this, compared to other stuff (like 80, 445 ...)
@jose_n but niels gets data telling him who's trying to go where and stuff
@jfs jose_n: but, regarding new attacks (0 days?) have you analysed any trends in scanning before an attack (or worm) is launched
@jose_n heh ... we have some interesting data in that area i hope we can make public soon.
@jfs jose_n: great to hear, will open my ears :-)
@jose_n :)
@MJesus jose, aren't there programs like this available as free software?
@jfs jose_n: still, have there been any other scientific measurements of Internet attacks, trends or exploitations besides yours, Honeynet's or CAIDA's (I'm missing some bibliography in your presentation)
@jose_n MJesus: no, you have to pay for our stuff (this is arbor software)
@jose_n jfs: you seen william arbaugh's stuff?
@jfs jose_n: umm, no, I don't think so
@jose_n he looked at the interval between bugtraq/cert advisories, scans, and attacks
@jose_n cool stuff.
@sarnold william arbaugh == cool
@jose_n i'm hoping to get some papers out soon on long term trends in ddos and worm space ..
@jfs jose_n: there was a paper published in IEEE Computer about CERT advisories and scans, but it is quite outdated
@jose_n based on our data. i think we are cleared to do this ...
@jose_n jfs: when was it?
@jfs jose_n: after all, it talked about PHP-fi
* jose_n nods
@jfs jose_n: quite some time ago, let me check, I might have it around
@jfs jose_n: oh yes, it's William's
@jose_n ok ... yeah, i would like to see an updated version of that ...
@jose_n i think we have some data which could help with it ...
@jfs Windows of Vulnerability: A Case Study Analysis, William A. Arbaugh, William L. Fithen, John McHugh. IEEE Computer, December 2000 (Vol. 33, No. 12, pp. 52-59)
@jose_n yep, that's him.
@jfs jose_n: I haven't seen any recent (decent? :-) analysis on that
@jose_n his analysis is ok. it just has too few data points ...
@jfs jose_n: but obviously it's also based on CERT data so it's not complete (and no one has free access to those to validate)
@jfs jose_n: there was also a paper on IEEE Proceedings
@sarnold jfs: semi-related paper: http://lwn.net/Articles/15497/
sheg where can I check the other talks that will come later???
@jfs I recently wrote a paper in a Spanish magazine on "early warning systems", I hope jose proves my (theoretical) point there :-)
sheg does anyone know the URL
@jose_n jfs: got a copy?
@jose_n translated :) ?
@jose_n sounds very interesting ...
@jfs jose_n: no, not yet, still only in spanish
@jose_n k
@jfs sarnold: thanks, will read (I think I haven't)
@jose_n i'm very curious to see this.