Now that I’ve slacked off for a few weeks and indulged myself in teasing our terroristic friend Samir Khan, it’s time to get back to some serious work. I’d like to direct your attention to a counter terrorism project of truly epic proportions: the “Dark Web” counter terrorism research project underway at the Artificial Intelligence Lab, University of Arizona. After reading about this project on Dancho Danchev’s blog, I’ve been spending quite a bit of research time over at the AI Lab project site studying their methodology.
The stated research goals of this project are as follows:
The AI Lab Dark Web project is a long-term scientific research program that aims to study and understand the international terrorism (Jihadist) phenomena via a computational, data-centric approach. We aim to collect “ALL” web content generated by international terrorist groups, including web sites, forums, chat rooms, blogs, social networking sites, videos, virtual world, etc.
We have developed various multilingual data mining, text mining, and web mining techniques to perform link analysis, content analysis, web metrics (technical sophistication) analysis, sentiment analysis, authorship analysis, and video analysis in our research.
The approaches and methods developed in this project contribute to advancing the field of Intelligence and Security Informatics (ISI). Such advances will help related stakeholders to perform terrorism research and facilitate international security and peace.
It is our belief that we (US and allies) are facing the dire danger of losing the “The War on Terror” in cyberspace (especially when many young people are being recruited, incited, infected, and radicalized on the web) and we would like to help in our small (computational) way.
Now then, at first glance that doesn’t seem all that impressive, but let’s dig a little deeper. The Dark Web project is not your typical “vigilante” (thanks Mr. Moss) homegrown cyber-terrorism research effort; it is a well-funded, long-term counter terrorism project receiving grants from the Department of Homeland Security, the National Science Foundation and others. In short, the project uses web crawlers to gather information from a (large) list of target sites and forums. This data is then indexed and data mined for actionable information. I once considered a similar method of data acquisition but dismissed it in favor of more targeted methods after considering the amount of computational resources it would take. The Dark Web project has been indexing sites for about five years and has the following to show for its efforts.
Claims: Dr. Gabriel Weimann of the University of Haifa has estimated that there are about 5,000 terrorist web sites as of 2006. Based on our actual spidering experience over the past 5 years, we believe there are about 50,000 sites of extremist and terrorist content as of 2007, including: web sites, forums, blogs, social networking sites, video sites, and virtual world sites (e.g., Second Life). The largest increase in 2006-2007 is in various new Web 2.0 sites (forums, videos, blogs, virtual world, etc.) in different languages (i.e., for home-grown groups, particularly in Europe). We have found significant terrorism content in more than 15 languages.
Testbed: We collect (using computer programs) various web contents every 2 to 3 months; we started spidering in 2002. Currently we only collect the complete contents of about 1,000 sites, in Arabic, Spanish, and English languages. We also have partial contents of about another 10,000 sites. In total, our collection is about 2 TBs in size, with close to 500,000,000 pages/files/postings from more than 10,000 sites.
We believe our Dark Web collection is the largest open-source extremist and terrorist collection in the academic world. (We have no way of knowing what the intelligence, justice, and defense agencies are doing.) Researchers can have graded access to our collection by contacting our research center.
Now, that is impressive. Additionally, the Dark Web researchers perform Social Network Analysis on the data gathered to determine the relationships between online content authors. It is important to realize that these researchers are mathematicians, not counter terrorism agents; they are applying science to the issue of online terrorism in an attempt to understand the phenomenon.
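To make “Social Network Analysis” a little more concrete, here is a minimal sketch of one of its most basic measures, degree centrality, computed over a forum reply graph. This is my own illustration, not the Dark Web project’s method. The reply records, the author names (`user_a` and so on) and the function names are all hypothetical. Each author is a node, each reply links two authors, and an author’s score is the fraction of the other authors they are directly connected to.

```python
from collections import defaultdict

# Hypothetical reply records: (author, author_being_replied_to).
# Illustrative only; this is NOT data from the Dark Web project.
REPLIES = [
    ("user_a", "user_b"),
    ("user_c", "user_b"),
    ("user_d", "user_b"),
    ("user_a", "user_c"),
    ("user_e", "user_d"),
]


def build_author_graph(replies):
    """Build an undirected graph: authors are nodes, replies are edges."""
    graph = defaultdict(set)
    for author, target in replies:
        if author == target:
            continue  # ignore self-replies
        graph[author].add(target)
        graph[target].add(author)
    return graph


def degree_centrality(graph):
    """Fraction of the other authors each author is directly linked to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


if __name__ == "__main__":
    graph = build_author_graph(REPLIES)
    scores = degree_centrality(graph)
    # Print authors from most to least connected (ties broken by name).
    for author, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{author}: {score:.2f}")
```

Running it ranks `user_b`, the author everyone else replies to, highest (0.75). Real studies typically go further with directed or weighted edges and measures such as betweenness, but simple degree counts are a common starting point.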
They describe themselves thusly:
A Few Words about Civil Liberties and Human Rights: The Dark Web project is NOT like Total Information Awareness (TIA) (at least we try very hard not to be like it). This is not a secretive government project conducted by spooks. We perform scientific, longitudinal hypothesis-guided terrorism research like other terrorism researchers (who have done such research for 30+ years). However we are clearly more computationally-oriented; unlike other traditional terrorism research that relies on sociology, communications, and policy based methodologies. Our contents are open source in nature (similar to Google’s contents) and our major research targets are international, Jihadist groups, not regular citizens. Our researchers are primarily computer and information scientists from all over the world. We develop computer algorithms, tools, and systems. Our research goal is to study and understand the international extremism and terrorism phenomena. Some people may refer to this as understanding the “root cause of terrorism.”
There is much, much more in-depth information at the Dark Web Project site. Pay special attention to the Journal Articles, Conference Papers and Presentations links at the bottom of the page, and you should stay busy for quite some time.
In closing I’ll quote the following:
As an NSF-funded research project, our research team has generated significant findings and publications in major computer science and information systems journals and conferences. However, we have taken great care not to reveal sensitive group information or technical implementation details (specifics). We hope our research will help educate the next generation of cyber/Internet savvy analysts and agents in the intelligence, justice, and defense communities.
It does indeed.