-rw-r--r--  README           251
-rw-r--r--  README.security   90
-rwxr-xr-x  setup.py           2
3 files changed, 172 insertions, 171 deletions
diff --git a/README b/README
index 6c29d86..933df92 100644
--- a/README
+++ b/README
@@ -1,189 +1,100 @@
-METADATA:
- Metadata consist of information that characterizes data.
- Metadata are used to provide documentation for data products.
- In essence, metadata answer who, what, when, where, why, and how about
- every facet of the data that are being documented.
+METADATA
+========
+Metadata consist of information that characterizes data.
+Metadata are used to provide documentation for data products.
+In essence, metadata answer who, what, when, where, why, and how about
+every facet of the data that are being documented.
 
+METADATA AND PRIVACY
+====================
+Metadata within a file can tell a lot about you.
+Cameras record data about when a picture was taken and what
+camera was used. Office documents like PDF or Office automatically adds
+author and company information to documents and spreadsheets.
+Maybe you don't want to disclose those information on the web.
 
-METADATA AND PRIVACY:
- Metadata within a file can tell a lot about you.
- Cameras record data about when a picture was taken and what
- camera was used. Office documents like PDF or Office automatically adds
- author and company information to documents and spreadsheets.
- Maybe you don't want to disclose those information on the web.
+WARNINGS
+========
+See README.security
 
+DEPENDENCIES
+============
+ * python2.7 (at least)
+ * python-hachoir-core and python-hachoir-parser
+ * python-pdfrw, python-gi-cairo for full PDF support
+ * python-gi, python-gobject for the GUI
+ * shred (should be already installed)
 
-WARNING :
- Mat only removes metadata from your files, it does not anonymise their
- content, nor can it handle watermarking, steganography, or any too custom
- metadata field/system.
+OPTIONALS DEPENDENCIES
+======================
+ * python-mutagen : for massive audio format support
+ * exiftool : for _massive_ image format support
 
- If you really want to be anonym, use format that does not contain any
- metadata, or better : use plain-text.
+USAGE
+=====
+ mat --help
+or
 
+ mat-gui
 
-DEPENDENCIES:
- python2.7 (at least)
- python-hachoir-core and python-hachoir-parser
- python-pdfrw for full PDF support
- python-gi, python-gi-cairo, python-gobject for the GUI
- shred (should be already installed)
+SUPPORTED FORMAT
+================
+See FORMATS
 
+HOW TO IMPLEMENT NEW FORMATS
+============================
+1. Add the format's mimetype to the STRIPPER list in mat.py
+2. Inherit the GenericParser class (parser.py)
+3. Read the parser.py module
+4. Implement at least these three methods:
+ - is_clean(self)
+ - remove_all(self)
+ - get_meta(self)
+5. Don't forget to call the do_backup() method when necessary
 
-OPTIONALS DEPENDENCIES:
- python-mutagen : for massive audio format support
- exiftool : for _massive_ image format support
+HOW TO LAUNCH THE TESTSUITE
+===========================
+ cd ./test
+ python test.py
 
+LINKS
+=====
+* Official website: https://mat.boum.org
+* Bugtracker : https://labs.riseup.net/code/projects/mat
+* Git repo: https://gitweb.torproject.org/user/jvoisin/mat.git
 
-USAGE:
- mat --help
- or
- mat-gui
+CONTACT
+=======
+If you have question, patches, bug reports, or simply want to talk about this project,
+please use the mailing list (https://mailman.boum.org/listinfo/mat-dev).
+You can also contact contact jvoisin
+on irc.oftc.net or at julien.voisin@dustri.org.
 
+LICENSE
+=======
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License version 2 as
+published by the Free Software Foundation.
 
-SUPPORTED FORMAT:
- Portable Network Graphics (.png)
- support : full
- metadata : textual metadata + date
- method : removal of harmful fields is done with hachoir
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
 
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+MA 02110-1301, USA.
 
- Jpeg (.jpeg, .jpg)
- support : full
- metadata : comment + exif/photoshop/adobe
- method : removal of harmful fields is done with hachoir
+Copyright 2011-2013 Julien Voisin <julien.voisin@dustri.org>
 
 
- Open Document (.odt, .odx, .ods, ...)
- support : full
- metadata : a meta.xml file
- method : removal of the meta.xml file
+THANKS
+======
+Mat would not exist without :
 
+ * the Google Summer of Code,
+ * the hachoir library,
+ * people on #tails@oftc
 
- Office Openxml (.docx, .pptx, .xlsx, ...)
- support : full
- metadata : a docProps folder containings xml metadata files
- method : removal of the docProps folder
-
-
- Portable Document Fileformat (.pdf)
- support : full
- metadata : a lot
- method : rendering of the PDF file on a cairo surface with the help of
- poppler in order to remove all the internal metadata.
- For now, cairo create some metadata.
- They can be remove if you install either exiftool, or python-pdfrw.
- The next version of python-cairo will support PDF metadata.
-
-
- Tape ARchive (.tar, .tar.bz2, .tar.gz)
- support : full
- metadata : metadata from the file itself, metadata from the file contained
- into the archive, and metadata added by tar to the file at then
- creation of the archive
- method : extraction of each file, treatement of the file, add treated file
- to a new archive, right before the add, remove the metadata added by tar
- itself. When the new archive is complete, remove all his metadata.
-
-
- Zip (.zip)
- support : .partial
- metadata : metadata from the file itself, metadata from the file contained
- into the archive, and metadata added by zip to the file when added to
- the archive.
-
- method : extraction of each file, treatement of the file, add treated file
- to a new archive. When the new archive is complete, remove all his metadata
-
-
- MPEG Audio (.mp3, .mp2, .mp1)
- support : full
- metadata : id3
- method : removal of harmful fields is done with hachoir
-
-
- Ogg Vorbis (.ogg)
- support : full
- metadata : Vorbis
- method : removal of harmful fields is done with mutagen
-
-
- Free Lossless Audio Codec (.flac)
- support : full
- metadata : Flac, Vorbis
- method : removal of harmful fields is done with mutagen
-
- Torrent (.torrent)
- support : full
- metadata : torrent
- method : using the nice bencode lib by Petru Paler,
- heavily tuned/rewritten.
-
-
-HOW TO IMPLEMENT NEW FORMATS:
- 1. add the format's mimetype to the STRIPPER list in mat.py
- 2. inherit the GenericParser class (parser.py)
- 3. read the parser.py module
- 4. implement at least these three methods:
- - is_clean(self)
- - remove_all(self)
- - get_meta(self)
- 5. don't forget to call the do_backup() method when necessary
-
-
-HOW TO LAUNCH THE TESTSUITE:
- 1. cd ./test
- 2. python test.py : launch all testsuites
- 3. python clitest.py : launch the testsuite for the CLI
- 4. python libtest.py : launch the testsuite for the mat internal library
-
-
-ALTERNATIVES AND COMPLEMENTS:
-for images:
- exiftool (perl) : metadata manipulation
- exiv2 (C++) : metadata manipulation
- graphicsmagick (a fork from imagemagick) : cli image manipulation
-
-for PDF:
- pdfminer (python) : PDF manipulation
-
-other tools:
- an hexadecimal editor
-
-
-NOTES:
- Formats that are not in the test suite are not well-tested,
- please don't trust the MAT about them !
-
-
-LICENSE:
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License version 2 as
- published by the Free Software Foundation.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program; if not, write to the Free Software
- Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
- MA 02110-1301, USA.
-
- Copyright © 2011-2013 Julien Voisin <julien.voisin@dustri.org>
-
-
-THANKS:
- Mat would not exist without :
- - the Google Summer of Code,
- - the Python language
- - the amazing (and messy) hachoir library,
- - poppler and cairo's python bindings,
- - and the mutagen library
- - people on #tails@freenode
- many thanks to them !
-
-
-KNOWN BUGS:
- Zipfiles are not totally cleaned, I know.
+Many thanks to them !
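Note on the "HOW TO IMPLEMENT NEW FORMATS" steps in the README hunk above: a minimal sketch of what a new stripper might look like. Only the names GenericParser, STRIPPER, is_clean, remove_all, get_meta and do_backup come from the README. Everything else is an assumption made for illustration: the stand-in base class, the dict shape of STRIPPER, the `text/x-note` mimetype and the toy `#meta:` line format. The real base class lives in parser.py and should be read first, as step 3 says.

```python
import shutil


class GenericParser(object):
    """Hypothetical stand-in for the GenericParser class in parser.py."""

    def __init__(self, filename, backup=True):
        self.filename = filename
        self.backup = backup

    def do_backup(self):
        # Keep a copy of the original file before it is modified.
        if self.backup:
            shutil.copy2(self.filename, self.filename + '.bak')


class NoteStripper(GenericParser):
    """Toy format: lines starting with '#meta:' carry 'key=value' metadata."""

    PREFIX = '#meta:'

    def _lines(self):
        with open(self.filename) as handle:
            return handle.readlines()

    def get_meta(self):
        # Collect every metadata field found in the file.
        meta = {}
        for line in self._lines():
            if line.startswith(self.PREFIX):
                body = line[len(self.PREFIX):].strip()
                key, _, value = body.partition('=')
                meta[key] = value
        return meta

    def is_clean(self):
        # The file is clean when no metadata field is left.
        return not self.get_meta()

    def remove_all(self):
        # Step 5: back up first, then rewrite the file without metadata lines.
        self.do_backup()
        kept = [l for l in self._lines() if not l.startswith(self.PREFIX)]
        with open(self.filename, 'w') as handle:
            handle.writelines(kept)
        return True


# Step 1: register the mimetype (the dict shape is an assumption).
STRIPPER = {'text/x-note': NoteStripper}
```

A caller would look the class up by mimetype, for example `STRIPPER['text/x-note'](path).remove_all()`.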
diff --git a/README.security b/README.security
new file mode 100644
index 0000000..335c537
--- /dev/null
+++ b/README.security
@@ -0,0 +1,90 @@
+Warning
+=======
+Mat only removes metadata from your files, it does not anonymise their
+content, nor can it handle watermarking, steganography, or any too custom
+metadata field/system.
+
+If you really want to be anonymous format that does not contain any
+metadata, or better : use plain-text.
+
+Implementation notes
+======================
+Symlink attacks
+---------------
+MAT output predictable filenames (like yourfile.jpg.bak).
+This may lead to symlink attack. Please check if you OS prevent
+against them
+
+Test suite
+----------
+Formats that are not in the test suite are not well-tested,
+please do not trust the MAT about them!
+
+Threat Model
+============
+The Metadata Anonymisation Toolkit adversary has a number
+of goals, capabilities, and counter-attack types that can be
+used to guide us towards a set of requirements for the MAT.
+
+Adversary
+------------
+
+* Goals:
+
+ - Identifying the source of the document, since a document
+ always has one. Who/where/when/how was a picture
+ taken, where was the document leaked from and by
+ whom, ...
+
+ - Identify the author; in some cases documents may be
+ anonymously authored or created. In these cases,
+ identifying the author is the goal.
+
+ - Identify the equipment/software used. If the attacker fails
+ to directly identify the author and/or source, his next
+ goal is to determine the source of the equipment used
+ to produce, copy, and transmit the document. This can
+ include the model of camera used to take a photo, or
+ which software was used to produce an office document.
+
+
+* Adversary Capabilities - Positioning
+ - The adversary created the document specifically for this
+ user. This is the strongest position for the adversary to
+ have. In this case, the adversary is capable of inserting
+ arbitrary, custom watermarks specifically for tracking
+ the user. In general, MAT cannot defend against this
+ adversary, but we list it for completeness.
+
+ - The adversary created the document for a group of users.
+ In this case, the adversary knows that they attempted to
+ limit distribution to a specific group of users. They may
+ or may not have watermarked the document for these
+ users, but they certainly know the format used.
+
+ - The adversary did not create the document, the weakest
+ position for the adversary to have. The file format is (most of the time)
+ standard, nothing custom is added: MAT
+ should be able to remove all meta-information from the
+ file.
+
+Requirements
+---------------
+
+* Processing
+ - The MAT *should* avoid interactions with information.
+ Its goal is to remove metadata, and the user is solely
+ responsible for the information of the file.
+
+ - The MAT *must* warn when encountering an unknown
+ format. For example, in a zipfile, if MAT encounters an
+ unknown format, it should warn the user, and ask if the
+ file should be added to the anonymised archive that is
+ produced.
+
+ - The MAT *must* not add metadata, since its purpose is to
+ anonymise files: every added items of metadata decreases
+ anonymity.
+
+ - The MAT *must* handle unknown/hidden metadata fields,
+ like proprietary extensions of open formats.
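Note on the "Symlink attacks" paragraph in README.security: one common way to avoid writing through a pre-planted symlink at a predictable name such as yourfile.jpg.bak is to create the backup with an exclusive open. The sketch below is an illustration, not the way MAT itself does it; the function name `safe_backup` and its parameters are hypothetical.

```python
import os
import shutil


def safe_backup(path, suffix='.bak'):
    """Copy path to path + suffix, refusing to reuse an existing name."""
    backup_path = path + suffix
    # O_CREAT | O_EXCL makes the open fail if anything (file or symlink,
    # even a dangling one) already exists at backup_path.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    # O_NOFOLLOW is not defined on every platform, so add it only if present.
    flags |= getattr(os, 'O_NOFOLLOW', 0)
    fd = os.open(backup_path, flags, 0o600)
    with os.fdopen(fd, 'wb') as dst, open(path, 'rb') as src:
        shutil.copyfileobj(src, dst)
    return backup_path
```

If the backup name is already taken, `os.open` raises `FileExistsError`, and the caller can then decide whether to abort or pick another name.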
diff --git a/setup.py b/setup.py
index 439c99c..48501b4 100755
--- a/setup.py
+++ b/setup.py
@@ -29,7 +29,7 @@ setup(
  ( 'share/applications', ['mat.desktop'] ),
  ( 'share/mat', ['data/FORMATS', 'data/mat.ui'] ),
  ( 'share/pixmaps', ['data/mat.png'] ),
- ( 'share/doc/mat', ['README', 'TODO'] ),
+ ( 'share/doc/mat', ['README', 'TODO', 'README.security'] ),
  ( 'share/man/man1', ['mat.1', 'mat-gui.1'] ),
  ( 'share/nautilus-python/extensions', ['nautilus/nautilus-mat.py'])
  ],