Secure Coding Technique: Processing XML data, part 1
Extensible Markup Language (XML) is a markup language used for encoding documents in a format that is both easy for machines to handle and human-readable. However, this widely used format exposes applications to several security flaws. In this first XML-related blog post, I will explain the basics of handling XML documents securely by using a schema.
OWASP divides the vulnerabilities related to XML and XML schemas into two categories.
Malformed XML documents
Malformed XML documents are documents that do not follow the W3C XML specification. Examples include omitting an end tag, rearranging elements so that tags no longer nest properly, or using forbidden characters. Each of these errors should result in a fatal error, and the document should not undergo any further processing.
In order to avoid vulnerabilities caused by malformed documents, you should use a well-tested XML parser that follows W3C specifications and does not take significantly longer to process malformed documents.
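As a minimal sketch of treating well-formedness errors as fatal, the snippet below uses Python's standard-library parser. The function name parse_or_reject and the sample documents are made up for illustration; for untrusted input, a hardened parser such as defusedxml is commonly recommended instead.

```python
import xml.etree.ElementTree as ET


def parse_or_reject(xml_text: str):
    """Return the root element, or None if the document is not well formed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        # A well-formedness error is fatal: stop here and do not try to
        # repair or partially process the document.
        return None


# Illustrative inputs (assumed, not taken from a real system).
well_formed = "<purchase><id>123</id><price>200</price></purchase>"
missing_end_tag = "<purchase><id>123</id><price>200</price>"
bad_nesting = "<purchase><id>123<price></id>200</price></purchase>"
forbidden_char = "<purchase><id>1 < 2</id></purchase>"

for name, doc in [
    ("well formed", well_formed),
    ("missing end tag", missing_end_tag),
    ("bad nesting", bad_nesting),
    ("forbidden character", forbidden_char),
]:
    status = "accepted" if parse_or_reject(doc) is not None else "rejected"
    print(f"{name}: {status}")
```

Only the first document is accepted; the other three are rejected before any application logic sees them.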
Invalid XML documents
Invalid XML documents are well formed but contain unexpected values. Here an attacker may take advantage of applications that do not properly define an XML schema to identify whether documents are valid. Below you can find a simple example of a document that, if not validated correctly, might have unintended consequences.
Consider a web store that stores its transactions as XML data:
<purchase>
  <id>123</id>
  <price>200</price>
</purchase>
Suppose the user controls only the <id> value. Without the right countermeasures, an attacker can then submit markup instead of a plain identifier, for example 123</id><price>0</price><id>, so that the stored document becomes:
<purchase>
  <id>123</id>
  <price>0</price>
  <id></id>
  <price>200</price>
</purchase>
If the parser that processes this document reads only the first instance of the <id> and <price> tags, the purchase is recorded with a price of 0 instead of 200.
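The effect can be reproduced with a short sketch. The two builder functions are hypothetical stand-ins for application code that assembles the transaction, and escaping the user input is shown only as one possible countermeasure, not as the method used by any particular store.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_purchase_unsafely(user_id: str) -> str:
    # Hypothetical vulnerable code: user input is pasted into the markup as-is.
    return f"<purchase><id>{user_id}</id><price>200</price></purchase>"


def build_purchase_escaped(user_id: str) -> str:
    # escape() turns &, < and > into entities, so the input stays plain text.
    return f"<purchase><id>{escape(user_id)}</id><price>200</price></purchase>"


payload = "123</id><price>0</price><id>"

# The injected document is still well formed, so the parser accepts it.
root = ET.fromstring(build_purchase_unsafely(payload))
first_price = root.find("price").text      # find() returns the first match
price_count = len(root.findall("price"))   # two <price> elements now exist

# With escaping, the payload ends up as the text of a single <id> element.
safe_root = ET.fromstring(build_purchase_escaped(payload))

print(first_price, price_count)
print(len(safe_root.findall("price")), safe_root.find("price").text)
```

Reading the first <price> yields the attacker's value, while the escaped variant keeps exactly one <price> element with the original amount.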
It is also possible that the schema is not restrictive enough or that other input validation is insufficient, so that negative numbers, special decimals (like NaN or Infinity) or exceedingly big values can be entered where they are not expected, leading to similar unintended behavior.
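The kind of value restriction described above can be sketched in a few lines. The pattern, the MAX_PRICE limit and the function name parse_price are assumptions chosen for illustration; a real schema would express similar limits with its own type and range restrictions.

```python
import re
from decimal import Decimal

# Assumed business rules, for illustration only.
PRICE_PATTERN = re.compile(r"[0-9]+(\.[0-9]{1,2})?")
MAX_PRICE = Decimal("100000")


def parse_price(text: str) -> Decimal:
    """Return the price as a Decimal, or raise ValueError if it is not allowed."""
    # Plain digits with at most two decimals: this excludes signs, exponents,
    # and special values such as NaN or Infinity.
    if not PRICE_PATTERN.fullmatch(text):
        raise ValueError(f"not a plain non-negative decimal: {text!r}")
    value = Decimal(text)
    # An upper bound rejects exceedingly large amounts.
    if value > MAX_PRICE:
        raise ValueError(f"price exceeds the allowed maximum: {text!r}")
    return value


for candidate in ["200", "19.99", "-5", "NaN", "Infinity", "1e400", "9" * 50]:
    try:
        print(candidate[:12], "->", parse_price(candidate))
    except ValueError as err:
        print(candidate[:12], "-> rejected:", err)
```

Only the first two candidates pass; everything else is rejected before it can reach the business logic.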
To avoid vulnerabilities related to invalid XML documents, define a precise and restrictive XML schema and validate every document against it, so that improperly structured or out-of-range data is rejected.
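To make concrete what such a restrictive schema would enforce for the purchase example, the sketch below hand-codes the structural rule "exactly one <id> followed by exactly one <price>, and nothing else". This is a standard-library stand-in for illustration, not a replacement for a real schema validator; the expected layout and the function name has_expected_structure are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed layout of a valid transaction: <purchase> with <id>, then <price>.
EXPECTED_CHILDREN = ["id", "price"]


def has_expected_structure(xml_text: str) -> bool:
    """Check the element structure a restrictive schema would require."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not even well formed
    if root.tag != "purchase" or root.attrib:
        return False
    children = list(root)
    # Each expected element must occur exactly once, in this order.
    if [child.tag for child in children] != EXPECTED_CHILDREN:
        return False
    # Leaf elements only: no attributes and no nested elements.
    return all(len(child) == 0 and not child.attrib for child in children)


valid = "<purchase><id>123</id><price>200</price></purchase>"
injected = (
    "<purchase><id>123</id><price>0</price>"
    "<id></id><price>200</price></purchase>"
)

print(has_expected_structure(valid))
print(has_expected_structure(injected))
```

The injected document from the earlier example fails this check because <id> and <price> each appear twice.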
In the next blog post, we will cover some more advanced attacks on XML documents, such as jumbo payloads and the feared XXE, number four in the OWASP Top Ten 2017.
In the meantime you can hone or challenge your skills on XML input validation on our portal.
Specifications for XML and XML schemas include multiple security flaws. At the same time, these specifications provide the tools required to protect XML applications. Even though we use XML schemas to define the security of XML documents, they can be used to perform a variety of attacks: file retrieval, server side request forgery, port scanning, or brute forcing.
Application Security Researcher - R&D Engineer - PhD Candidate