author georg 2019-11-28 02:15:20 +0000
committer jvoisin 2019-11-30 01:14:41 -0800
commit 697cb36b814d7e01da336c43b1932264302a2528
tree c14075024469adf5aa30614c95173cada156229c /doc/implementation_notes.md
parent 6e52661cfb4e79a76a6ff80637d5adf495a15479
This is mat2, not MAT2
Closes #131
Diffstat (limited to '')
-rw-r--r-- doc/implementation_notes.md 18
1 files changed, 9 insertions, 9 deletions
diff --git a/doc/implementation_notes.md b/doc/implementation_notes.md
index 7555d2e..e298646 100644
--- a/doc/implementation_notes.md
+++ b/doc/implementation_notes.md
@@ -4,7 +4,7 @@ Implementation notes
 Lightweight cleaning mode
 -------------------------
 
-Due to *popular* request, MAT2 is providing a *lightweight* cleaning mode,
+Due to *popular* request, mat2 is providing a *lightweight* cleaning mode,
 that only cleans the superficial metadata of your file, but not
 the ones that might be in **embedded** resources. Like for example,
 images in a PDF or an office document.
@@ -19,7 +19,7 @@ are entirely removed.
  deleted. For example journalists that are editing a document to erase
  mentions sources mentions.
 
-- Or they are aware of it, and will likely not expect MAT2 to be able to keep
+- Or they are aware of it, and will likely not expect mat2 to be able to keep
  the revisions, that are basically traces about how, when and who edited the
  document.
 
@@ -27,15 +27,15 @@ are entirely removed.
 Race conditions
 ---------------
 
-MAT2 does its very best to avoid crashing at runtime. This is why it's checking
-if the file is valid __at parser creation__. MAT2 doesn't take any measure to
+mat2 does its very best to avoid crashing at runtime. This is why it's checking
+if the file is valid __at parser creation__. mat2 doesn't take any measure to
 ensure that the file is not changed between the time the parser is
 instantiated, and the call to clean or show the metadata.
 
 Symlink attacks
 ---------------
 
-MAT2 output predictable filenames (like yourfile.jpg.cleaned).
+mat2 output predictable filenames (like yourfile.jpg.cleaned).
 This may lead to symlink attack. Please check if you OS prevent
 against them
 
@@ -65,10 +65,10 @@ didn't remove any *deep metadata*, like the ones in embedded pictures. This was
 on of the reason MAT was abandoned: the absence of satisfying solution to
 handle PDF. But apparently, people are ok with [pdf redact
 tools](https://github.com/firstlookmedia/pdf-redact-tools), that simply
-transform the PDF into images. So this is what's MAT2 is doing too.
+transform the PDF into images. So this is what's mat2 is doing too.
 
 Of course, it would be possible to detect images in PDf file, and process them
-with MAT2, but since a PDF can contain a lot of things, like images, videos,
+with mat2, but since a PDF can contain a lot of things, like images, videos,
 javascript, pdf, blobs, … this is the easiest and safest way to clean them.
 
 Images handling
@@ -81,7 +81,7 @@ XML attacks
 -----------
 
 Since our threat model conveniently excludes files crafted to specifically
-bypass MAT2, fileformats containing harmful XML are out of our scope.
-But since MAT2 is using [etree](https://docs.python.org/3/library/xml.html#xml-vulnerabilities)
+bypass mat2, fileformats containing harmful XML are out of our scope.
+But since mat2 is using [etree](https://docs.python.org/3/library/xml.html#xml-vulnerabilities)
 to process XML, it's "only" vulnerable to DoS, and not memory corruption:
 odds are that the user will notice that the cleaning didn't succeed.
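
Note on the "Symlink attacks" section touched by this diff: the notes say that predictable output names such as yourfile.jpg.cleaned may lead to a symlink attack, and leave the protection to the OS. As an illustration only, here is a minimal Python sketch of one common mitigation, exclusive file creation. It is not taken from mat2's code, and the helper name `open_output_exclusive` is hypothetical. It assumes a POSIX-like system, where `O_CREAT | O_EXCL` makes `open` fail if the path already exists, including when it is a symlink.

```python
import os


def open_output_exclusive(path, mode=0o600):
    """Create `path` for binary writing; fail if anything already exists there.

    Hypothetical helper for illustration, not part of mat2.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    # These flags are not defined on every platform, so fall back to 0.
    flags |= getattr(os, "O_NOFOLLOW", 0)
    flags |= getattr(os, "O_BINARY", 0)
    # Raises FileExistsError if `path` exists, even as a dangling symlink.
    fd = os.open(path, flags, mode)
    return os.fdopen(fd, "wb")
```

A caller would write the cleaned data through the returned file object and treat `FileExistsError` as a signal to abort rather than overwrite. This only covers the final path component; a hostile parent directory is a separate problem.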