Commit bc4e0cdb by Wesley Shields, committed by GitHub

Handle invalid unicode in metadata values. (#136)

* Handle invalid unicode in metadata values.

In #135 it was reported that invalid Unicode in a metadata value can crash the
Python interpreter. This is my attempt to fix that by first trying to create a
string and, if that fails, falling back to a bytes object. On the off chance
that the bytes object also fails to create, I added a safety check so that we
don't add a NULL pointer to the dictionary (this is how the crash was
manifesting).
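
The flow described above can be sketched in pure Python. This is only a model of the C-level logic, not the code from this commit: `meta_value_to_py` and `build_meta` are hypothetical names, and `None` stands in for a NULL result from a failed object creation.

```python
from typing import Iterable, Optional, Tuple, Union


def meta_value_to_py(raw: Optional[bytes]) -> Optional[Union[str, bytes]]:
    """Hypothetical analogue: try str first, fall back to bytes."""
    if raw is None:
        # Models a failed object creation (NULL in C).
        return None
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid UTF-8: keep the raw bytes instead of failing.
        return bytes(raw)


def build_meta(pairs: Iterable[Tuple[str, Optional[bytes]]]) -> dict:
    """Build a metadata dict, never storing a missing value."""
    meta = {}
    for key, raw in pairs:
        value = meta_value_to_py(raw)
        if value is None:
            # Analogue of the NULL safety check: skip instead of inserting.
            continue
        meta[key] = value
    return meta
```

With this sketch, a valid value becomes a `str`, an invalid one stays `bytes`, and a missing one is skipped rather than inserted.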

It's debatable whether we want to add ONLY strings as metadata and NOT fall
back to bytes. If we don't fall back to bytes, the only other option I see is to
silently drop that metadata on the floor. The tradeoff here is that you may now
end up with either a string or a bytes object in your metadata dictionary, which
is less than ideal IMO.
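
To illustrate the cost of that tradeoff, here is a minimal sketch of what a caller would need if metadata values could be either type. `meta_as_text` is a hypothetical helper, not part of yara-python, and the choice of the `replace` error handler is an assumption.

```python
def meta_as_text(value, errors="replace"):
    """Hypothetical caller-side helper: normalize a metadata value to text.

    bytes values are decoded leniently; every other value (str, int, bool)
    is returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode("utf-8", errors=errors)
    return value
```

Every consumer of the dictionary would need a check like this, which is the "less than ideal" part.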

I'm open to suggestions on this one.

Fixes #135

* Add error handling to conversion to Unicode
Metadata test accepts stripped or original characters

* Remove 'or' clause from tests and add another NULL test check.

Co-authored-by: malvidin <malvidin@gmail.com>
@@ -692,6 +692,25 @@ class TestYara(unittest.TestCase):
           'rule test { condition: entrypoint >= 0 }',
         ])
 
+    def testMeta(self):
+        r = yara.compile(source=r'rule test { meta: a = "foo\x80bar" condition: true }')
+        self.assertTrue(list(r)[0].meta['a'] == 'foobar')
+
+    # This test ensures that anything after the NULL character is stripped.
+    def testMetaNull(self):
+        r = yara.compile(source=r'rule test { meta: a = "foo\x00bar\x80" condition: true }')
+        self.assertTrue(list(r)[0].meta['a'] == 'foo')
+
+    # This test is similar to testMeta but it tests the meta data generated
+    # when a Match object is created.
+    def testScanMeta(self):
+        r = yara.compile(source=r'rule test { meta: a = "foo\x80bar" condition: true }')
+        m = r.match(data='dummy')
+        self.assertTrue(list(m)[0].meta['a'] == 'foobar')
+
     def testFilesize(self):
         self.assertTrueRules([
@@ -46,7 +46,7 @@ typedef long Py_hash_t;
 #endif
 #if PY_MAJOR_VERSION >= 3
-#define PY_STRING(x) PyUnicode_FromString(x)
+#define PY_STRING(x) PyUnicode_DecodeUTF8(x, strlen(x), "ignore" )
 #define PY_STRING_TO_C(x) PyUnicode_AsUTF8(x)
 #define PY_STRING_CHECK(x) PyUnicode_Check(x)
 #else
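
The behavior the tests expect from the `PyUnicode_DecodeUTF8(x, strlen(x), "ignore")` form of the macro can be modeled in pure Python. This is a rough sketch, not the extension's code. `py_string_model` is a hypothetical name, and it assumes the metadata buffer holds the escaped bytes verbatim, so that `strlen` stops at the first NUL byte.

```python
def py_string_model(c_buffer: bytes) -> str:
    """Hypothetical model of PyUnicode_DecodeUTF8(x, strlen(x), "ignore")."""
    # strlen() counts bytes up to, but not including, the first NUL.
    visible = c_buffer.split(b"\x00", 1)[0]
    # The "ignore" error handler drops bytes that are not valid UTF-8.
    return visible.decode("utf-8", "ignore")


# Mirrors testMeta / testScanMeta: the stray 0x80 byte is dropped.
meta_plain = py_string_model(b"foo\x80bar")

# Mirrors testMetaNull: everything after the NUL byte is cut off.
meta_null = py_string_model(b"foo\x00bar\x80")
```

Under those assumptions, the first value is `'foobar'` and the second is `'foo'`, matching the assertions in the test hunk.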