Probabilistic Redaction

Joe Loughry

Abstract

A prototype automated, interactive redaction assistant was developed. The CLOAK system is based on the Web 1T 5-gram corpus, a list of every unique English word and phrase, up to five words in length, that was observed on the World Wide Web and collected by Google, Inc. in 2006. CLOAK automatically flags candidate words, phrases, sentences, and paragraphs in documents under review that are likely classified, and suggests redactions to make the document unclassified. Each suggested redaction takes into account security classification guidance from more than one guide at a time. The probabilistic aspect of operation is in the way the system prioritizes its suggestions according to the measured rate of occurrence of words and phrases observed

Address
Rome, New York
Institution
United States Air Force Research Laboratory (AFRL)
Month
31 October
Note
In press
Number
FA8750-09-C-0006
Year
2011