author jvoisin 2018-04-01 15:36:45 +0200
committer jvoisin 2018-04-01 15:36:45 +0200
commit 7992cd0d51c3b858f36e74abd76ceef986b51df8
tree aecc6a3501199cf2418f8d820432fe4c50d17b9f
parent 9e7a4bd217c314a0a86bf9e794f0fda4392a19d9
Add some documentation
-rw-r--r-- doc/implementation_notes.md 33
-rw-r--r-- doc/threat_model.md 85
2 files changed, 118 insertions, 0 deletions
diff --git a/doc/implementation_notes.md b/doc/implementation_notes.md
new file mode 100644
index 0000000..bc83671
--- /dev/null
+++ b/doc/implementation_notes.md
@@ -0,0 +1,33 @@
Implementation notes
====================

Symlink attacks
---------------

MAT2 outputs predictable filenames (like yourfile.jpg.cleaned).
This may lead to symlink attacks. Please check whether your OS protects
against them.

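
The risk described above can be reduced at the application level as well. The following is a minimal sketch, not MAT2's actual code: `open_cleaned_output` is a hypothetical helper and the `.cleaned` suffix is taken from the example filename above. It assumes a POSIX-like system, where creating a file with `O_CREAT | O_EXCL` fails if the path already exists, including when an attacker has planted a symlink there. Refusing to reuse an existing path means the caller has to decide what to do on failure (for instance, pick another name), which is a design choice rather than a requirement from these notes.

```python
import os


def open_cleaned_output(source_path, suffix=".cleaned"):
    """Create source_path + suffix for writing, refusing pre-existing paths.

    Hypothetical helper for illustration only.
    """
    target = source_path + suffix
    # O_CREAT | O_EXCL makes the call fail if anything already exists at
    # `target`, even a dangling symlink, so a planted link is never followed.
    # O_NOFOLLOW is added where the platform provides it.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(target, flags, 0o600)
    return os.fdopen(fd, "wb")
```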
Archives handling
-----------------

MAT2 doesn't support archives yet, because we haven't found a usable way to ask the user
what to do when unsupported files are encountered.

PDF handling
------------

MAT rendered PDF files on a cairo surface, then
printed the result to a file. This kept the text selectable, but unfortunately, it
didn't remove any *deep metadata*, like the metadata in embedded pictures. This was
one of the reasons MAT was abandoned: the absence of a satisfying solution to
handle PDF. But apparently, people are OK with [pdf redact
tools](https://github.com/firstlookmedia/pdf-redact-tools), which simply
transform the PDF into images. So this is what MAT2 does too.

Images handling
---------------

When possible, images are handled like PDF: rendered on a surface, then saved
to the filesystem. This ensures that all metadata is removed.

diff --git a/doc/threat_model.md b/doc/threat_model.md
new file mode 100644
index 0000000..6d14ca6
--- /dev/null
+++ b/doc/threat_model.md
@@ -0,0 +1,85 @@
Threat Model
============
The Metadata Anonymisation Toolkit 2 adversary has a number
of goals, capabilities, and counter-attack types that can be
used to guide us towards a set of requirements for MAT2.

This is an overhaul of the threat model of MAT, the first iteration of the software.

Warnings
--------

MAT only removes standard metadata from your files; it does _not_:

 - anonymise their content
 - handle watermarking
 - handle steganography
 - handle any non-standard metadata field/system

If you really want to be anonymous, use a format that does not contain any
metadata, or better: use plain-text. And as usual, think before clicking.


Adversary
------------

* Goals:

 - Identifying the source of the document, since a document
 always has one. Who/where/when/how was a picture
 taken, where was the document leaked from and by
 whom, ...

 - Identifying the author; in some cases documents may be
 anonymously authored or created. In these cases,
 identifying the author is the goal.

 - Identifying the equipment/software used. If the attacker fails
 to directly identify the author and/or source, their next
 goal is to determine the source of the equipment used
 to produce, copy, and transmit the document. This can
 include the model of camera used to take a photo, or
 which software was used to produce an office document.


* Adversary Capabilities - Positioning
 - The adversary created the document specifically for this
 user. This is the strongest position for the adversary to
 have. In this case, the adversary is capable of inserting
 arbitrary, custom watermarks specifically for tracking
 the user. In general, MAT cannot defend against this
 adversary, but we list it for completeness.

 - The adversary created the document for a group of users.
 In this case, the adversary knows that they attempted to
 limit distribution to a specific group of users. They may
 or may not have watermarked the document for these
 users, but they certainly know the format used.

 - The adversary did not create the document. This is the weakest
 position for the adversary to have. The file format is (most of the time)
 standard and nothing custom is added: MAT
 should be able to remove all meta-information from the
 file.

Requirements
---------------

* Processing
 - MAT2 *should* avoid interacting with the information (the content) of a file.
 Its goal is to remove metadata, and the user is solely
 responsible for the information in the file.

 - MAT2 *must* warn when encountering an unknown
 format. For example, in a zipfile, if MAT2 encounters an
 unknown format, it should warn the user, and ask if the
 file should be added to the anonymised archive that is
 produced.

 - MAT2 *must* not add metadata, since its purpose is to
 anonymise files: every added item of metadata decreases
 anonymity.

 - MAT2 *should* handle unknown/hidden metadata fields,
 like proprietary extensions of open formats.

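
The zipfile requirement above could look roughly like the sketch below. It is an illustration under stated assumptions, not MAT2's implementation: `split_members` and `SUPPORTED_EXTENSIONS` are hypothetical names, the extension list is a stand-in, and a format is guessed from the file extension alone, which a real tool would probably not rely on.

```python
import os
import warnings
import zipfile

# Assumption: a stand-in list; the formats MAT2 really supports may differ.
SUPPORTED_EXTENSIONS = {".jpg", ".png", ".pdf", ".txt"}


def split_members(archive_path, supported=SUPPORTED_EXTENSIONS):
    """Return (known, unknown) member names, warning about each unknown one."""
    known, unknown = [], []
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories carry no content to clean
            extension = os.path.splitext(info.filename)[1].lower()
            if extension in supported:
                known.append(info.filename)
            else:
                # The requirement is to warn; the decision stays with the user.
                warnings.warn(f"unknown format, not cleaned: {info.filename}")
                unknown.append(info.filename)
    return known, unknown
```

A caller would then ask the user, for each name in the `unknown` list, whether that file should be copied into the anonymised archive or left out.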