| author | jvoisin | 2013-07-17 14:41:58 +0200 |
|---|---|---|
| committer | jvoisin | 2013-07-17 14:41:58 +0200 |
| commit | b9cb97f2dc6d84156e93cbcfce768340db862955 (patch) | |
| tree | 4bc9cbb2350d7ef2ec2b73051f7f70f9f3602ca4 | |
| parent | 6cec127a5defcb395b855e7f3241462eb3d4e7dc (diff) | |
Split (and update) the README
| -rw-r--r-- | README | 251 | ||||
| -rw-r--r-- | README.security | 90 | ||||
| -rwxr-xr-x | setup.py | 2 |
3 files changed, 172 insertions, 171 deletions
| @@ -1,189 +1,100 @@ | |||
| 1 | METADATA: | 1 | METADATA |
| 2 | Metadata consist of information that characterizes data. | 2 | ======== |
| 3 | Metadata are used to provide documentation for data products. | 3 | Metadata consist of information that characterizes data. |
| 4 | In essence, metadata answer who, what, when, where, why, and how about | 4 | Metadata are used to provide documentation for data products. |
| 5 | every facet of the data that are being documented. | 5 | In essence, metadata answer who, what, when, where, why, and how about |
| 6 | every facet of the data that are being documented. | ||
| 6 | 7 | ||
| 8 | METADATA AND PRIVACY | ||
| 9 | ==================== | ||
| 10 | Metadata within a file can tell a lot about you. | ||
| 11 | Cameras record data about when a picture was taken and what | ||
| 12 | camera was used. Document formats such as PDF or Office automatically add | ||
| 13 | author and company information to documents and spreadsheets. | ||
| 14 | Maybe you don't want to disclose this information on the web. | ||
| 7 | 15 | ||
| 8 | METADATA AND PRIVACY: | 16 | WARNINGS |
| 9 | Metadata within a file can tell a lot about you. | 17 | ======== |
| 10 | Cameras record data about when a picture was taken and what | 18 | See README.security |
| 11 | camera was used. Office documents like PDF or Office automatically adds | ||
| 12 | author and company information to documents and spreadsheets. | ||
| 13 | Maybe you don't want to disclose those information on the web. | ||
| 14 | 19 | ||
| 20 | DEPENDENCIES | ||
| 21 | ============ | ||
| 22 | * python2.7 (at least) | ||
| 23 | * python-hachoir-core and python-hachoir-parser | ||
| 24 | * python-pdfrw, python-gi-cairo for full PDF support | ||
| 25 | * python-gi, python-gobject for the GUI | ||
| 26 | * shred (should be already installed) | ||
| 15 | 27 | ||
| 16 | WARNING : | 28 | OPTIONAL DEPENDENCIES |
| 17 | Mat only removes metadata from your files, it does not anonymise their | 29 | ===================== |
| 18 | content, nor can it handle watermarking, steganography, or any too custom | 30 | * python-mutagen : for massive audio format support |
| 19 | metadata field/system. | 31 | * exiftool : for _massive_ image format support |
| 20 | 32 | ||
| 21 | If you really want to be anonym, use format that does not contain any | 33 | USAGE |
| 22 | metadata, or better : use plain-text. | 34 | ===== |
| 35 | mat --help | ||
| 36 | or | ||
| 23 | 37 | ||
| 38 | mat-gui | ||
| 24 | 39 | ||
| 25 | DEPENDENCIES: | 40 | SUPPORTED FORMAT |
| 26 | python2.7 (at least) | 41 | ================ |
| 27 | python-hachoir-core and python-hachoir-parser | 42 | See FORMATS |
| 28 | python-pdfrw for full PDF support | ||
| 29 | python-gi, python-gi-cairo, python-gobject for the GUI | ||
| 30 | shred (should be already installed) | ||
| 31 | 43 | ||
| 44 | HOW TO IMPLEMENT NEW FORMATS | ||
| 45 | ============================ | ||
| 46 | 1. Add the format's mimetype to the STRIPPER list in mat.py | ||
| 47 | 2. Inherit the GenericParser class (parser.py) | ||
| 48 | 3. Read the parser.py module | ||
| 49 | 4. Implement at least these three methods: | ||
| 50 | - is_clean(self) | ||
| 51 | - remove_all(self) | ||
| 52 | - get_meta(self) | ||
| 53 | 5. Don't forget to call the do_backup() method when necessary | ||
| 32 | 54 | ||
| 33 | OPTIONALS DEPENDENCIES: | 55 | HOW TO LAUNCH THE TESTSUITE |
| 34 | python-mutagen : for massive audio format support | 56 | =========================== |
| 35 | exiftool : for _massive_ image format support | 57 | cd ./test |
| 58 | python test.py | ||
| 36 | 59 | ||
| 60 | LINKS | ||
| 61 | ===== | ||
| 62 | * Official website: https://mat.boum.org | ||
| 63 | * Bugtracker : https://labs.riseup.net/code/projects/mat | ||
| 64 | * Git repo: https://gitweb.torproject.org/user/jvoisin/mat.git | ||
| 37 | 65 | ||
| 38 | USAGE: | 66 | CONTACT |
| 39 | mat --help | 67 | ======= |
| 40 | or | 68 | If you have questions, patches, bug reports, or simply want to talk about this project, |
| 41 | mat-gui | 69 | please use the mailing list (https://mailman.boum.org/listinfo/mat-dev). |
| 70 | You can also contact jvoisin | ||
| 71 | on irc.oftc.net or at julien.voisin@dustri.org. | ||
| 42 | 72 | ||
| 73 | LICENSE | ||
| 74 | ======= | ||
| 75 | This program is free software; you can redistribute it and/or modify | ||
| 76 | it under the terms of the GNU General Public License version 2 as | ||
| 77 | published by the Free Software Foundation. | ||
| 43 | 78 | ||
| 44 | SUPPORTED FORMAT: | 79 | This program is distributed in the hope that it will be useful, |
| 45 | Portable Network Graphics (.png) | 80 | but WITHOUT ANY WARRANTY; without even the implied warranty of |
| 46 | support : full | 81 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
| 47 | metadata : textual metadata + date | 82 | GNU General Public License for more details. |
| 48 | method : removal of harmful fields is done with hachoir | ||
| 49 | 83 | ||
| 84 | You should have received a copy of the GNU General Public License | ||
| 85 | along with this program; if not, write to the Free Software | ||
| 86 | Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, | ||
| 87 | MA 02110-1301, USA. | ||
| 50 | 88 | ||
| 51 | Jpeg (.jpeg, .jpg) | 89 | Copyright 2011-2013 Julien Voisin <julien.voisin@dustri.org> |
| 52 | support : full | ||
| 53 | metadata : comment + exif/photoshop/adobe | ||
| 54 | method : removal of harmful fields is done with hachoir | ||
| 55 | 90 | ||
| 56 | 91 | ||
| 57 | Open Document (.odt, .odx, .ods, ...) | 92 | THANKS |
| 58 | support : full | 93 | ====== |
| 59 | metadata : a meta.xml file | 94 | Mat would not exist without : |
| 60 | method : removal of the meta.xml file | ||
| 61 | 95 | ||
| 96 | * the Google Summer of Code, | ||
| 97 | * the hachoir library, | ||
| 98 | * people on #tails@oftc | ||
| 62 | 99 | ||
| 63 | Office Openxml (.docx, .pptx, .xlsx, ...) | 100 | Many thanks to them ! |
| 64 | support : full | ||
| 65 | metadata : a docProps folder containings xml metadata files | ||
| 66 | method : removal of the docProps folder | ||
| 67 | |||
| 68 | |||
| 69 | Portable Document Fileformat (.pdf) | ||
| 70 | support : full | ||
| 71 | metadata : a lot | ||
| 72 | method : rendering of the PDF file on a cairo surface with the help of | ||
| 73 | poppler in order to remove all the internal metadata. | ||
| 74 | For now, cairo create some metadata. | ||
| 75 | They can be remove if you install either exiftool, or python-pdfrw. | ||
| 76 | The next version of python-cairo will support PDF metadata. | ||
| 77 | |||
| 78 | |||
| 79 | Tape ARchive (.tar, .tar.bz2, .tar.gz) | ||
| 80 | support : full | ||
| 81 | metadata : metadata from the file itself, metadata from the file contained | ||
| 82 | into the archive, and metadata added by tar to the file at then | ||
| 83 | creation of the archive | ||
| 84 | method : extraction of each file, treatement of the file, add treated file | ||
| 85 | to a new archive, right before the add, remove the metadata added by tar | ||
| 86 | itself. When the new archive is complete, remove all his metadata. | ||
| 87 | |||
| 88 | |||
| 89 | Zip (.zip) | ||
| 90 | support : .partial | ||
| 91 | metadata : metadata from the file itself, metadata from the file contained | ||
| 92 | into the archive, and metadata added by zip to the file when added to | ||
| 93 | the archive. | ||
| 94 | |||
| 95 | method : extraction of each file, treatement of the file, add treated file | ||
| 96 | to a new archive. When the new archive is complete, remove all his metadata | ||
| 97 | |||
| 98 | |||
| 99 | MPEG Audio (.mp3, .mp2, .mp1) | ||
| 100 | support : full | ||
| 101 | metadata : id3 | ||
| 102 | method : removal of harmful fields is done with hachoir | ||
| 103 | |||
| 104 | |||
| 105 | Ogg Vorbis (.ogg) | ||
| 106 | support : full | ||
| 107 | metadata : Vorbis | ||
| 108 | method : removal of harmful fields is done with mutagen | ||
| 109 | |||
| 110 | |||
| 111 | Free Lossless Audio Codec (.flac) | ||
| 112 | support : full | ||
| 113 | metadata : Flac, Vorbis | ||
| 114 | method : removal of harmful fields is done with mutagen | ||
| 115 | |||
| 116 | Torrent (.torrent) | ||
| 117 | support : full | ||
| 118 | metadata : torrent | ||
| 119 | method : using the nice bencode lib by Petru Paler, | ||
| 120 | heavily tuned/rewritten. | ||
| 121 | |||
| 122 | |||
| 123 | HOW TO IMPLEMENT NEW FORMATS: | ||
| 124 | 1. add the format's mimetype to the STRIPPER list in mat.py | ||
| 125 | 2. inherit the GenericParser class (parser.py) | ||
| 126 | 3. read the parser.py module | ||
| 127 | 4. implement at least these three methods: | ||
| 128 | - is_clean(self) | ||
| 129 | - remove_all(self) | ||
| 130 | - get_meta(self) | ||
| 131 | 5. don't forget to call the do_backup() method when necessary | ||
| 132 | |||
| 133 | |||
| 134 | HOW TO LAUNCH THE TESTSUITE: | ||
| 135 | 1. cd ./test | ||
| 136 | 2. python test.py : launch all testsuites | ||
| 137 | 3. python clitest.py : launch the testsuite for the CLI | ||
| 138 | 4. python libtest.py : launch the testsuite for the mat internal library | ||
| 139 | |||
| 140 | |||
| 141 | ALTERNATIVES AND COMPLEMENTS: | ||
| 142 | for images: | ||
| 143 | exiftool (perl) : metadata manipulation | ||
| 144 | exiv2 (C++) : metadata manipulation | ||
| 145 | graphicsmagick (a fork from imagemagick) : cli image manipulation | ||
| 146 | |||
| 147 | for PDF: | ||
| 148 | pdfminer (python) : PDF manipulation | ||
| 149 | |||
| 150 | other tools: | ||
| 151 | an hexadecimal editor | ||
| 152 | |||
| 153 | |||
| 154 | NOTES: | ||
| 155 | Formats that are not in the test suite are not well-tested, | ||
| 156 | please don't trust the MAT about them ! | ||
| 157 | |||
| 158 | |||
| 159 | LICENSE: | ||
| 160 | This program is free software; you can redistribute it and/or modify | ||
| 161 | it under the terms of the GNU General Public License version 2 as | ||
| 162 | published by the Free Software Foundation. | ||
| 163 | |||
| 164 | This program is distributed in the hope that it will be useful, | ||
| 165 | but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
| 166 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
| 167 | GNU General Public License for more details. | ||
| 168 | |||
| 169 | You should have received a copy of the GNU General Public License | ||
| 170 | along with this program; if not, write to the Free Software | ||
| 171 | Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, | ||
| 172 | MA 02110-1301, USA. | ||
| 173 | |||
| 174 | Copyright © 2011-2013 Julien Voisin <julien.voisin@dustri.org> | ||
| 175 | |||
| 176 | |||
| 177 | THANKS: | ||
| 178 | Mat would not exist without : | ||
| 179 | - the Google Summer of Code, | ||
| 180 | - the Python language | ||
| 181 | - the amazing (and messy) hachoir library, | ||
| 182 | - poppler and cairo's python bindings, | ||
| 183 | - and the mutagen library | ||
| 184 | - people on #tails@freenode | ||
| 185 | many thanks to them ! | ||
| 186 | |||
| 187 | |||
| 188 | KNOWN BUGS: | ||
| 189 | Zipfiles are not totally cleaned, I know. | ||
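The "HOW TO IMPLEMENT NEW FORMATS" steps in the README above can be sketched roughly as follows. This is a minimal illustration, not MAT's actual code: the `GenericParser` below is a hypothetical stand-in for the class in parser.py (its real constructor and `do_backup()` signature may differ), and `CommentStripper` with its `#meta:` line convention is an invented toy format. Registering the mimetype in the STRIPPER list of mat.py is not shown, because the shape of that list is not given here.

```python
import shutil


class GenericParser(object):
    """Hypothetical stand-in for parser.GenericParser (assumed API)."""

    def __init__(self, filename, backup=True):
        self.filename = filename
        self.backup = backup

    def do_backup(self):
        # Keep a copy of the original next to it (yourfile.ext.bak).
        if self.backup:
            shutil.copy2(self.filename, self.filename + '.bak')


class CommentStripper(GenericParser):
    """Toy format: every line starting with '#meta:' is metadata."""

    PREFIX = '#meta:'

    def _lines(self):
        with open(self.filename) as handle:
            return handle.readlines()

    def get_meta(self):
        # Collect 'key=value' pairs from the metadata lines.
        meta = {}
        for line in self._lines():
            if line.startswith(self.PREFIX):
                key, _, value = line[len(self.PREFIX):].partition('=')
                meta[key.strip()] = value.strip()
        return meta

    def is_clean(self):
        # The file is clean when no metadata is left.
        return not self.get_meta()

    def remove_all(self):
        # Back up first, then rewrite the file without metadata lines.
        self.do_backup()
        kept = [l for l in self._lines() if not l.startswith(self.PREFIX)]
        with open(self.filename, 'w') as handle:
            handle.writelines(kept)
        return True
```

The three required methods build on one another here: `is_clean()` is defined in terms of `get_meta()`, so a format only needs one place that knows how to locate its metadata.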
diff --git a/README.security b/README.security new file mode 100644 index 0000000..335c537 --- /dev/null +++ b/README.security | |||
| @@ -0,0 +1,90 @@ | |||
| 1 | Warning | ||
| 2 | ======= | ||
| 3 | Mat only removes metadata from your files. It does not anonymise their | ||
| 4 | content, nor can it handle watermarking, steganography, or overly custom | ||
| 5 | metadata fields/systems. | ||
| 6 | |||
| 7 | If you really want to be anonymous, use a format that does not contain any | ||
| 8 | metadata, or better: use plain text. | ||
| 9 | |||
| 10 | Implementation notes | ||
| 11 | ====================== | ||
| 12 | Symlink attacks | ||
| 13 | --------------- | ||
| 14 | MAT outputs predictable filenames (like yourfile.jpg.bak). | ||
| 15 | This may lead to symlink attacks. Please check whether your OS protects | ||
| 16 | against them. | ||
| 17 | |||
| 18 | Test suite | ||
| 19 | ---------- | ||
| 20 | Formats that are not in the test suite are not well-tested, | ||
| 21 | please do not trust the MAT about them! | ||
| 22 | |||
| 23 | Threat Model | ||
| 24 | ============ | ||
| 25 | The Metadata Anonymisation Toolkit adversary has a number | ||
| 26 | of goals, capabilities, and counter-attack types that can be | ||
| 27 | used to guide us towards a set of requirements for the MAT. | ||
| 28 | |||
| 29 | Adversary | ||
| 30 | ------------ | ||
| 31 | |||
| 32 | * Goals: | ||
| 33 | |||
| 34 | - Identify the source of the document, since a document | ||
| 35 | always has one: who took a picture, where, when, and how; | ||
| 36 | where the document was leaked from and by | ||
| 37 | whom, ... | ||
| 38 | |||
| 39 | - Identify the author; in some cases documents may be | ||
| 40 | anonymously authored or created. In these cases, | ||
| 41 | identifying the author is the goal. | ||
| 42 | |||
| 43 | - Identify the equipment/software used. If the attacker fails | ||
| 44 | to directly identify the author and/or source, his next | ||
| 45 | goal is to determine the source of the equipment used | ||
| 46 | to produce, copy, and transmit the document. This can | ||
| 47 | include the model of camera used to take a photo, or | ||
| 48 | which software was used to produce an office document. | ||
| 49 | |||
| 50 | |||
| 51 | * Adversary Capabilities - Positioning | ||
| 52 | - The adversary created the document specifically for this | ||
| 53 | user. This is the strongest position for the adversary to | ||
| 54 | have. In this case, the adversary is capable of inserting | ||
| 55 | arbitrary, custom watermarks specifically for tracking | ||
| 56 | the user. In general, MAT cannot defend against this | ||
| 57 | adversary, but we list it for completeness. | ||
| 58 | |||
| 59 | - The adversary created the document for a group of users. | ||
| 60 | In this case, the adversary knows that they attempted to | ||
| 61 | limit distribution to a specific group of users. They may | ||
| 62 | or may not have watermarked the document for these | ||
| 63 | users, but they certainly know the format used. | ||
| 64 | |||
| 65 | - The adversary did not create the document. This is the weakest | ||
| 66 | position for the adversary to have. The file format is (most of the time) | ||
| 67 | standard and nothing custom is added, so MAT | ||
| 68 | should be able to remove all meta-information from the | ||
| 69 | file. | ||
| 70 | |||
| 71 | Requirements | ||
| 72 | --------------- | ||
| 73 | |||
| 74 | * Processing | ||
| 75 | - The MAT *should* avoid interactions with information. | ||
| 76 | Its goal is to remove metadata, and the user is solely | ||
| 77 | responsible for the information of the file. | ||
| 78 | |||
| 79 | - The MAT *must* warn when encountering an unknown | ||
| 80 | format. For example, in a zipfile, if MAT encounters an | ||
| 81 | unknown format, it should warn the user, and ask if the | ||
| 82 | file should be added to the anonymised archive that is | ||
| 83 | produced. | ||
| 84 | |||
| 85 | - The MAT *must not* add metadata, since its purpose is to | ||
| 86 | anonymise files: every added item of metadata decreases | ||
| 87 | anonymity. | ||
| 88 | |||
| 89 | - The MAT *must* handle unknown/hidden metadata fields, | ||
| 90 | like proprietary extensions of open formats. | ||
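The "Symlink attacks" note in README.security above concerns predictable backup names such as yourfile.jpg.bak. One common mitigation on POSIX systems, shown here only as a sketch and not as what MAT itself does, is to create the backup with `O_CREAT | O_EXCL`, so that the open fails if anything (including a symlink planted by an attacker) already exists at that path. The helper name `safe_backup` and its parameters are assumptions made for this example.

```python
import os
import shutil


def safe_backup(path, suffix='.bak'):
    """Copy `path` to `path + suffix` without following or overwriting
    anything that already exists at the destination (hypothetical helper)."""
    destination = path + suffix
    # O_CREAT | O_EXCL makes open() fail if the destination already exists,
    # even when it is a (possibly dangling) symbolic link.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    # O_NOFOLLOW is not defined on every platform, so add it only if present.
    flags |= getattr(os, 'O_NOFOLLOW', 0)
    fd = os.open(destination, flags, 0o600)
    with os.fdopen(fd, 'wb') as out, open(path, 'rb') as src:
        shutil.copyfileobj(src, out)
    return destination
```

If the destination already exists, the call raises an `OSError` and no data is written, which leaves the decision about how to proceed to the caller.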
| @@ -29,7 +29,7 @@ setup( | |||
| 29 | ( 'share/applications', ['mat.desktop'] ), | 29 | ( 'share/applications', ['mat.desktop'] ), |
| 30 | ( 'share/mat', ['data/FORMATS', 'data/mat.ui'] ), | 30 | ( 'share/mat', ['data/FORMATS', 'data/mat.ui'] ), |
| 31 | ( 'share/pixmaps', ['data/mat.png'] ), | 31 | ( 'share/pixmaps', ['data/mat.png'] ), |
| 32 | ( 'share/doc/mat', ['README', 'TODO'] ), | 32 | ( 'share/doc/mat', ['README', 'TODO', 'README.security'] ), |
| 33 | ( 'share/man/man1', ['mat.1', 'mat-gui.1'] ), | 33 | ( 'share/man/man1', ['mat.1', 'mat-gui.1'] ), |
| 34 | ( 'share/nautilus-python/extensions', ['nautilus/nautilus-mat.py']) | 34 | ( 'share/nautilus-python/extensions', ['nautilus/nautilus-mat.py']) |
| 35 | ], | 35 | ], |
