Java Language => XML Parsing using the JAXP APIs

Remarks

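XML parsing is the interpretation of XML documents in order to manipulate their content using sensible constructs such as "nodes", "attributes", "documents", "namespaces", or events related to these constructs.

Java has a native API for handling XML documents, called JAXP (Java API for XML Processing). JAXP and a reference implementation have been bundled with every Java release since Java 1.4 (JAXP v1.1), and the API has evolved since then. Java 8 shipped with JAXP version 1.6.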

The API provides several ways of interacting with XML documents:

The DOM interface (Document Object Model)
The SAX interface (Simple API for XML)
The StAX interface (Streaming API for XML)

Principles of the DOM interface

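The DOM interface aims to provide a W3C DOM compliant way of interpreting XML. Various versions of JAXP have supported various DOM levels of the specification (up to Level 3).

Under the Document Object Model interface, an XML document is represented as a tree, starting with the "Document Element". The base type of the API is the Node type, which lets you navigate from a Node to its parent, its children, or its siblings. Not every Node can have children: Text nodes, for example, are leaves of the tree and never have children. XML tags are represented as Elements, which notably extend Node with attribute-related methods.

The DOM interface is very useful because it allows "one line" parsing of an XML document into a tree, easy modification of the constructed tree (node addition, removal, copying, and so on), and finally serialization of the modified tree (back to disk). This comes at a price, though: the whole tree resides in memory, so DOM trees are not always practical for huge XML documents. Furthermore, building the tree is not always the fastest way of dealing with XML content, especially if you are not interested in every part of the document.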
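The parse, modify, serialize cycle described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name DomModifyDemo and the helper addBook are made up for this example, and serialization goes through the standard javax.xml.transform identity Transformer, which is one common option among several.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringReader;
import java.io.StringWriter;

public class DomModifyDemo {

    // Hypothetical helper: parses the XML, appends one <book> element to the
    // root, and serializes the modified tree back to a String.
    static String addBook(String xml, String id, String title) throws Exception {
        // "One line" parsing: the whole document becomes an in-memory tree.
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Modify the tree: create a new element and attach it to the root.
        Element book = document.createElement("book");
        book.setAttribute("id", id);
        book.setTextContent(title);
        document.getDocumentElement().appendChild(book);

        // Serialize with an identity transformation (no stylesheet).
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book id='1'>Effective Java</book></library>";
        System.out.println(addBook(xml, "2", "Java Concurrency In Practice"));
    }
}
```

To write to disk instead of a String, a StreamResult built from a File or an OutputStream could be passed to transform().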

Principles of the SAX interface

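The SAX API is an event-oriented API for dealing with XML documents. Under this model, the components of an XML document are interpreted as events (e.g. "a tag has been opened", "a tag has been closed", "a text node has been encountered", "a comment has been encountered").

The SAX API uses a "push parsing" approach: a SAX Parser is responsible for interpreting the XML document, and it invokes methods on a delegate (a ContentHandler) to deal with each event found in the document. Usually you never write a parser yourself; instead you provide a handler that gathers the information you need from the XML document.

The SAX interface overcomes the limitations of the DOM interface by keeping only the minimum necessary data at the parser level (e.g. namespace contexts, validation state). The only other information held in memory is whatever the ContentHandler keeps, and you, the developer, are responsible for that. The tradeoff is that there is no way of "going back in time" (or back in the XML document) with this approach: DOM allows a Node to go back to its parent, but SAX offers no such possibility.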
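As a rough sketch of the push model, the handler below collects book titles by id and remembers nothing else. The class names SaxDemo and BookHandler and the helper parse are assumptions made for this example; DefaultHandler is the standard convenience base class that implements ContentHandler with empty methods.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class SaxDemo {

    // Hypothetical handler: the parser calls these methods as it meets events.
    static class BookHandler extends DefaultHandler {
        final Map<Integer, String> titlesById = new LinkedHashMap<>();
        private Integer currentId;                       // non-null only inside a <book>
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // The factory below is not namespace-aware, so we match on qName.
            if ("book".equals(qName)) {
                currentId = Integer.valueOf(attributes.getValue("id"));
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A single text node may be delivered in several chunks, so append.
            if (currentId != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("book".equals(qName)) {
                titlesById.put(currentId, text.toString());
                currentId = null;
            }
        }
    }

    // Hypothetical helper: runs the parser and returns what the handler kept.
    static Map<Integer, String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BookHandler handler = new BookHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.titlesById;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library>"
                + "<book id='1'>Effective Java</book>"
                + "<book id='2'>Java Concurrency In Practice</book>"
                + "</library>";
        System.out.println(parse(xml));
    }
}
```

Note that the handler has to track its own state (currentId) between callbacks, since the parser never lets it look back at an earlier event.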

Principles of the StAX interface

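The StAX API takes an approach to processing XML similar to that of the SAX API (that is, event driven). The one significant difference is that StAX is a pull parser, whereas SAX is a push parser. In SAX, the Parser is in control and uses callbacks on the ContentHandler. In StAX, you call the parser, and you control when (and whether) to obtain the next XML "event".

The API starts with XMLStreamReader (or XMLEventReader), the gateways through which the developer asks for the next event in an iterator-style way: next() on XMLStreamReader, nextEvent() on XMLEventReader.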
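To illustrate the iterator style with XMLEventReader, here is a small sketch that lists the names of all start tags in a document. The class name EventReaderDemo and the helper elementNames are made up for this example.

```java
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.XMLEvent;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class EventReaderDemo {

    // Hypothetical helper: pulls events one by one and keeps start-tag names.
    static List<String> elementNames(String xml) throws Exception {
        XMLEventReader reader = XMLInputFactory.newFactory()
                .createXMLEventReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        // The caller drives the loop: nothing is parsed until nextEvent() is called.
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                names.add(event.asStartElement().getName().getLocalPart());
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book id='1'>Effective Java</book></library>";
        System.out.println(elementNames(xml));
    }
}
```

XMLEventReader hands out immutable event objects, which is convenient when events need to be stored or passed around; XMLStreamReader is the lower-level cursor and avoids allocating an object per event.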

Parsing and navigating a document using the DOM API

Consider the following document:

<?xml version='1.0' encoding='UTF-8' ?>
<library>
   <book id='1'>Effective Java</book>
   <book id='2'>Java Concurrency In Practice</book>
</library>

The following code builds a DOM tree out of a String:

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

public class DOMDemo {

    public static void main(String[] args) throws Exception {
        String xmlDocument = "<?xml version='1.0' encoding='UTF-8' ?>"
                + "<library>"
                + "<book id='1'>Effective Java</book>"
                + "<book id='2'>Java Concurrency In Practice</book>"
                + "</library>";

        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        // The XML declares no namespaces, but this option is still worth knowing:
        // with a parser that is not namespace-aware, getLocalName() typically returns null.
        documentBuilderFactory.setNamespaceAware(true);
        DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
        // There are various options here, to read from an InputStream, from a file, ...
        Document document = documentBuilder.parse(new InputSource(new StringReader(xmlDocument)));

        // Root of the document
        System.out.println("Root of the XML Document: " + document.getDocumentElement().getLocalName());

        // Iterate over the children of the root. The string above has no whitespace
        // between elements, so every child here is a <book> element.
        NodeList firstLevelChildren = document.getDocumentElement().getChildNodes();
        for (int i = 0; i < firstLevelChildren.getLength(); i++) {
            Node item = firstLevelChildren.item(i);
            System.out.println("First level child found, XML tag name is: " + item.getLocalName());
            System.out.println("\tid attribute of this tag is : " + item.getAttributes().getNamedItem("id").getTextContent());
        }

        // Another way would have been to select the elements by tag name
        NodeList allBooks = document.getDocumentElement().getElementsByTagName("book");
    }
}

This code yields the following:

Root of the XML Document: library
First level child found, XML tag name is: book
	id attribute of this tag is : 1
First level child found, XML tag name is: book
	id attribute of this tag is : 2
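The string parsed in this example has no whitespace between elements. A pretty-printed document, like the one shown at the start of this example, also has whitespace-only Text nodes among the children of <library>, and getAttributes() returns null for those. Below is a minimal sketch of one way to skip them, assuming a hypothetical helper named bookIds in a class named DomElementsDemo:

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class DomElementsDemo {

    // Hypothetical helper: returns the id attribute of every element child of the root.
    static List<String> bookIds(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document document = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        List<String> ids = new ArrayList<>();
        NodeList children = document.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace Text nodes, comments, etc.; only elements carry attributes.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                ids.add(((Element) child).getAttribute("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library>\n"
                + "   <book id='1'>Effective Java</book>\n"
                + "   <book id='2'>Java Concurrency In Practice</book>\n"
                + "</library>";
        System.out.println(bookIds(xml));
    }
}
```

Casting to Element gives access to getAttribute(String), which returns the attribute value directly instead of going through the attribute map.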

Parsing a document using the StAX API

Consider the following document:

<?xml version='1.0' encoding='UTF-8' ?>
<library>
   <book id='1'>Effective Java</book>
   <book id='2'>Java Concurrency In Practice</book>
   <notABook id='3'>This is not a book element</notABook>
</library>

The following code parses it and builds a map of book titles by book id.

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class StaxDemo {

    public static void main(String[] args) throws Exception {
        String xmlDocument = "<?xml version='1.0' encoding='UTF-8' ?>"
                + "<library>"
                    + "<book id='1'>Effective Java</book>"
                    + "<book id='2'>Java Concurrency In Practice</book>"
                    + "<notABook id='3'>This is not a book element</notABook>"
                + "</library>";

        XMLInputFactory xmlInputFactory = XMLInputFactory.newFactory();
        // Various flavors are possible, e.g. from an InputStream, a Source, ...
        XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(new StringReader(xmlDocument));

        Map<Integer, String> bookTitlesById = new HashMap<>();

        // We go through each event using a loop: handle the current event, then advance
        while (xmlStreamReader.hasNext()) {
            switch (xmlStreamReader.getEventType()) {
                case XMLStreamConstants.START_ELEMENT:
                    System.out.println("Found start of element: " + xmlStreamReader.getLocalName());
                    // Check if we are at the start of a <book> element
                    if ("book".equals(xmlStreamReader.getLocalName())) {
                        int bookId = Integer.parseInt(xmlStreamReader.getAttributeValue("", "id"));
                        // getElementText() leaves the reader on the matching END_ELEMENT
                        String bookTitle = xmlStreamReader.getElementText();
                        bookTitlesById.put(bookId, bookTitle);
                    }
                    break;
                // A bunch of other things are possible: comments, processing instructions, whitespace...
                default:
                    break;
            }
            xmlStreamReader.next();
        }

        System.out.println(bookTitlesById);
    }
}

This yields the following:

Found start of element: library
Found start of element: book
Found start of element: book
Found start of element: notABook
{1=Effective Java, 2=Java Concurrency In Practice}

There are a few things to watch out for in this sample:

The call to xmlStreamReader.getAttributeValue works because we first checked that the parser is in the START_ELEMENT state. In every other state (except ATTRIBUTE), the parser is required to throw IllegalStateException, because attributes can only appear at the start of an element.
The same goes for xmlStreamReader.getElementText(): it works because we are at a START_ELEMENT and we know that, in this document, the <book> element has no non-text child nodes.
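The second caveat can be seen in a small sketch: the XMLStreamReader documentation states that getElementText() throws XMLStreamException when it meets a child element. The class name ElementTextDemo and the helper rootTextOrNull are assumptions made for this example, and mapping the exception to null is only one possible way to react.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

public class ElementTextDemo {

    // Hypothetical helper: returns the text of the root element, or null when
    // getElementText() fails (for example because the root has child elements).
    static String rootTextOrNull(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        try {
            // Advance from START_DOCUMENT to the root START_ELEMENT.
            int event = reader.next();
            while (event != XMLStreamConstants.START_ELEMENT) {
                event = reader.next();
            }
            try {
                return reader.getElementText();
            } catch (XMLStreamException e) {
                // Text-only content was expected but something else was found.
                return null;
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(rootTextOrNull("<book>Effective Java</book>"));
        System.out.println(rootTextOrNull("<book>Effective <b>Java</b></book>"));
    }
}
```

For elements that may have mixed content, handling the CHARACTERS and START_ELEMENT events explicitly in the main loop is usually the safer route.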

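For parsing more complex documents (deeper, nested elements, and so on), it is good practice to "delegate" the parser to sub-methods or other objects. For example, have a BookParser class or method deal with every event from the START_ELEMENT to the END_ELEMENT of the book XML tag.

You can also use a Stack object to keep track of important data as you move up and down the tree.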
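Both ideas can be combined in a short sketch. Everything specific here is an assumption made for illustration: the nested <title> and <authors>/<author> elements are not part of the document used earlier, and the names DelegatingStaxDemo, Book, parseBook and parseLibrary are hypothetical. An ArrayDeque plays the role of the stack, as is common in current Java code.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class DelegatingStaxDemo {

    // Hypothetical value type for this sketch.
    static final class Book {
        final int id;
        final String title;
        final List<String> authors;

        Book(int id, String title, List<String> authors) {
            this.id = id;
            this.title = title;
            this.authors = authors;
        }
    }

    // Delegate: must be called on a <book> START_ELEMENT, and returns with the
    // reader positioned on the matching </book> END_ELEMENT.
    static Book parseBook(XMLStreamReader reader) throws XMLStreamException {
        int id = Integer.parseInt(reader.getAttributeValue(null, "id"));
        String title = null;
        List<String> authors = new ArrayList<>();
        Deque<String> path = new ArrayDeque<>();   // names of the currently open elements
        path.push(reader.getLocalName());
        while (!path.isEmpty()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if ("title".equals(name)) {
                    // getElementText() consumes the end tag too, so nothing is pushed.
                    title = reader.getElementText();
                } else if ("author".equals(name) && "authors".equals(path.peek())) {
                    authors.add(reader.getElementText());
                } else {
                    path.push(name);
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                path.pop();
            }
        }
        return new Book(id, title, authors);
    }

    // Top-level loop: only spots <book> elements and hands them to the delegate.
    static List<Book> parseLibrary(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        List<Book> books = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "book".equals(reader.getLocalName())) {
                books.add(parseBook(reader));
            }
        }
        reader.close();
        return books;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<library><book id='1'><title>Effective Java</title>"
                + "<authors><author>Joshua Bloch</author></authors></book></library>";
        for (Book book : parseLibrary(xml)) {
            System.out.println(book.id + ": " + book.title + " by " + book.authors);
        }
    }
}
```

The stack lets parseBook know where it is in the tree (an <author> only counts inside <authors>) and tells it when the enclosing <book> has been closed.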

Modified text is an extract of the original Stack Overflow Documentation

Licensed under CC BY-SA 3.0

Not affiliated with Stack Overflow
