XML TUTORIAL PDF FOR


This tutorial will teach you the basics of XML in simple steps, starting from basic concepts such as XML document syntax and moving on to more advanced ones.



Author:ANNALEE ZIESEMER
Language:English, German, Japanese
Country:Andorra
Genre:Children & Youth
Pages:279
Published (Last):23.01.2016
ISBN:406-6-18207-260-7


You can craft the rules for how the elements fit together based on your specific needs. You can be very specific or keep element names more generic. You can create rules for what each element is allowed to contain and make these rules strict, lax, or something in between.

Just be sure to create elements that identify the parts of your documents that you feel are important.

An XML file usually begins with an XML declaration. Because this declaration must be first in the file, if you plan to combine smaller XML files into a larger file, you might want to omit this optional information.

The root element's beginning and end tags surround your XML document's content. Only one root element is in the file, and you need this "wrapper" to contain it all. When you create your XML, be sure that your beginning and end tags match in case.

If the case doesn't match, you might get an error when you use or view the XML. Internet Explorer, for example, will not display the file content if the case is mismatched. Instead, it displays messages about the beginning and end tags not matching.

An XML document can have some empty tags that do not have anything inside and can be expressed as a single tag instead of as a set of beginning and end tags.

Nesting is the placement of elements inside other elements. These new elements are called child elements, and the elements that enclose them are their parent elements. Nesting can be many levels deep in an XML document.

A common syntax error is improper nesting of parent and child elements. Any child element must be completely enclosed between the starting and end tags of its parent element. Sibling elements must each end before the next sibling begins. The code in Listing 3 shows proper nesting. The tags begin and end without intermingling with other tags.
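The case and nesting rules above can be checked mechanically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` parser; the sample documents and the helper name `is_well_formed` are invented for this illustration and do not come from the tutorial's listings.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, invented for this illustration.
good = "<recipe><title>Ice Cream Sundae</title><note/></recipe>"
bad_case = "<recipe><Title>Ice Cream Sundae</title></recipe>"  # Title vs. title
bad_nesting = "<recipe><title><b>Sundae</title></b></recipe>"  # overlapping tags

for label, doc in [("good", good), ("bad_case", bad_case), ("bad_nesting", bad_nesting)]:
    print(label, is_well_formed(doc))
```

A parser rejects both faulty documents for the same underlying reason: an end tag must match the most recently opened start tag exactly, including case.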

Attributes are sometimes added to elements. An attribute consists of a name-value pair, with the value in double quotation marks ("). Attributes provide a way to store additional information each time you use an element, varying the attribute value as needed from one instance of an element to another within the same document.
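As a small illustration of name-value pairs, here is a sketch that reads attributes back with Python's `xml.etree.ElementTree`. The `ingredient` element and its `qty` and `unit` attributes are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical element; the names and values are illustrative only.
ingredient = ET.fromstring('<ingredient qty="2" unit="cups">flour</ingredient>')

# Attributes are exposed as a dictionary of name-value pairs.
attributes = dict(ingredient.attrib)

# Attribute values are always strings, even when they look like numbers.
quantity = ingredient.get("qty")

# A default can be supplied for an attribute that is absent.
missing = ingredient.get("brand", "n/a")

print(attributes, quantity, missing)
```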

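You type the attribute, or even multiple attributes, within the starting tag of an element. If you add multiple attributes, separate them with spaces. Listing 4 shows the XML file as it currently stands.

Before you build any kind of content management system (CMS), first you must gather information that defines the basic requirements for the project. The goal of the CMS is to make things easier for those who need to develop and run the site. And making things easier means having to do more homework beforehand! Although you may groan at the thought of this kind of exercise, a set of well-defined requirements can make the project run a lot more smoothly.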

What kind of requirements do we need to gather? Essentially, requirements fall into three major categories: What kind of content will the CMS handle? How is each type of content broken down? Who will be visiting the site, and what behaviors do these users expect to find?

For example, will they want to browse a hierarchical list of articles, search for articles by keyword, see links to related articles, or all three?

What do the site administrators need to do? For example, they may need to log in securely, create content, edit content, publish content, and delete content. If your CMS will provide different roles for administrative users — such as site administrators, editors, and writers — your system will become more complex. In the world of XML, each of these different types of content is, naturally enough, called a document type.

You also have to know how each of these content types will break out into its separate components, or metadata. Each article, for instance, will have various pieces of metadata, such as a headline, author name, and keywords, each of which the CMS needs to track.

The final challenge — to define various types of metadata — can be a blessing in disguise. In my experience, once people grasp the importance of metadata, they race off in every direction and collect every single piece of metadata they can find about a given content type.

For example, the client might start to track the date on which an article is first drafted. When was it first published? When should it automatically be removed from the site, or archived? How is this document uniquely identified in the system? Who holds the copyright to it?

What other content is it related to? Which keywords describe the content for indexing or search purposes (in other words, how do we find the content)? Who should have access to the content (the entire public, only site subscribers, or company staff)? Does the CMS view an article body as being separate from headings and paragraphs, or are all these items seen as one big lump of XML?

Gathering metadata can be very tricky.

At first glance, we could say that all of our articles should contain elements for author name and email address, and leave it at that. However, we may later decide that we want site visitors to search or browse articles by author. In this case, it would make more sense to have a centralized list of authors, each with his or her own unique ID.
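To make the idea of a centralized author list concrete, here is a minimal sketch in Python. The element names (`authors`, `author`, `authorid`, `headline`), the ID values, and both helper functions are assumptions made for illustration; the tutorial's own file layout may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical central author list: each author carries a unique id attribute.
AUTHORS_XML = """
<authors>
  <author id="a1"><name>Jane Doe</name><email>jane@example.com</email></author>
  <author id="a2"><name>John Roe</name><email>john@example.com</email></author>
</authors>
"""

# Hypothetical articles that refer to authors by ID instead of repeating details.
ARTICLES_XML = """
<articles>
  <article id="101"><authorid>a1</authorid><headline>First</headline></article>
  <article id="102"><authorid>a2</authorid><headline>Second</headline></article>
  <article id="103"><authorid>a1</authorid><headline>Third</headline></article>
</articles>
"""


def author_name(author_id):
    """Look up an author's display name in the central list."""
    authors = ET.fromstring(AUTHORS_XML)
    for author in authors.findall("author"):
        if author.get("id") == author_id:
            return author.findtext("name")
    return None


def headlines_by_author(author_id):
    """Return the headlines of every article written by the given author."""
    articles = ET.fromstring(ARTICLES_XML)
    return [
        article.findtext("headline")
        for article in articles.findall("article")
        if article.findtext("authorid") == author_id
    ]


print(author_name("a1"), headlines_by_author("a1"))
```

Because articles store only the ID, a changed name or email address is edited in one place, and browsing by author becomes a simple filter.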

Having a separate author listing would also allow us to easily set bylines for each author, in case someone decided they wanted to publish pieces under a pen name. It would also allow us to track author information across content types. Of course, agreeing on this approach means that we need to do other work later on, such as building administrative interfaces for author listings. The other two are site functionality and site design.

Every piece of metadata could potentially drive some kind of site behavior, but each piece of metadata also must be managed by the administration tools you set up.

Site Behavior

Site behavior should always be based on and driven by metadata. Typical site behavior for a CMS-powered Website includes browsing by content categories, browsing by author, searching on titles and keywords, dynamic news sidebars, and more.

Additionally, many XML- and database-powered sites feature homepages that boast dynamically updated content, such as Top Ten Downloads, latest news headlines, and so on.

Our CMS will need to have an administrative component for each content type. It will also have to administer pieces of information that have nothing to do with content types, such as which users are authorized to log in to the CMS, and the privileges each of them has.

It goes without saying that your administrative interface has to be secure; otherwise, anyone could click to your CMS and start deleting content, making unauthorized changes to existing content, or adding new content that you may not want to have on your site.

A workflow is simply a set of rules that allow you to define who does what, when, and how. For example, your workflow might stipulate that a user with writer privileges may create an article, but that only a production editor can approve that content for publication on the site.
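A workflow rule of this kind can be sketched as a simple lookup table. The role names below follow the example in the text, while the action names and the `can` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical role-to-permission table; the action names are assumptions.
PERMISSIONS = {
    "writer": {"create", "edit"},
    "production editor": {"create", "edit", "approve", "publish"},
}


def can(role, action):
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS.get(role, set())


print(can("writer", "create"))             # a writer may create an article
print(can("writer", "approve"))            # but may not approve it
print(can("production editor", "approve"))
```

Unknown roles fall back to an empty permission set, so anything not explicitly granted is denied.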

In many cases, CMS workflows emulate actual workflows that exist in publication and marketing departments.

Defining Your Content Types

We want to publish articles and news stories on our site.

We definitely want to keep track of authors and site administrators, and we also want to build a search engine. Whenever I build an XML-powered application, I try to define the content types first, because I find that all the other elements cascade from there.

Results of the validation will appear under the Results area, as illustrated in Figure 1. For most purposes, an online resource will do the job nicely. If you work in a company that has an established software development group, chances are that one of the XML-savvy developers has already set up a good validating parser.

This project will help ground your skills as you obtain firsthand experience with practical XML development techniques, issues, and processes.


The articles in our CMS will be the mainstay of our site. In addition to the article text, each of our articles will be endowed with several pieces of metadata, which the following paragraphs introduce.

Furthermore, because we need to identify each article in our system uniquely with an ID of some sort, it makes sense to add an id attribute to the root element that will contain this value. A unique identifier will ensure that no mistakes occur when we try to edit, delete, or view an existing article.

Now, each of our articles will have an author, so we need to reserve a spot for that information. Our article will need a headline, a short description, a publication date, and some keywords. The keyword listing can be handled in one of two ways. This approach will satisfy the structure nuts out there, but it turns out to be too complicated for the way we will eventually use these keywords. We also need to track status information on the article. However, you probably already see that status is very similar to keyword listings in that it has the potential to belong to many different content types.
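The pieces of metadata just described can be assembled into a single article document. This is a sketch under stated assumptions: the element names (`authorid`, `headline`, `description`, `pubdate`, `keywords`, `status`), the sample values, and the choice to store keywords as one space-separated string are all hypothetical, since the original listing is not shown here.

```python
import xml.etree.ElementTree as ET


def build_article(article_id, author_id, headline, description, pubdate, keywords, status):
    """Build an article element with an id attribute and one child per metadata item."""
    article = ET.Element("article", {"id": article_id})
    ET.SubElement(article, "authorid").text = author_id
    ET.SubElement(article, "headline").text = headline
    ET.SubElement(article, "description").text = description
    ET.SubElement(article, "pubdate").text = pubdate
    # Assumption: keywords are kept as a single space-separated string.
    ET.SubElement(article, "keywords").text = " ".join(keywords)
    ET.SubElement(article, "status").text = status
    return article


article = build_article(
    "123", "1", "Sample Headline", "A short description.",
    "2004-01-20", ["XML", "CMS"], "live",
)
xml_text = ET.tostring(article, encoding="unicode")
print(xml_text)
```

Keeping the unique identifier on the root element means any later edit, delete, or view operation can select the article by that one attribute.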

As such, it makes sense to centralize this information. As most of our content will be displayed in a Web browser, it makes sense to use as many tags as possible that a browser like IE or Firefox can already understand. But for the purposes of our article storage system, we want to treat all of the HTML tags and text that make up the document body as a simple text string, rather than having to handle every single HTML tag that could appear in the article body.
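One common way to store HTML as a plain text string inside XML is a CDATA section; whether the original project does exactly this is not stated here, so treat the following as a sketch. The `article` and `body` element names and the sample HTML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored article; the body holds HTML inside a CDATA section.
ARTICLE_XML = """<article id="123">
  <body><![CDATA[<h1>Hello</h1><p>Some <b>bold</b> text.</p>]]></body>
</article>"""

article = ET.fromstring(ARTICLE_XML)
body = article.find("body")

body_html = body.text        # the HTML arrives as one plain string
body_children = list(body)   # no child elements: the HTML tags were not parsed

# Going the other way, ElementTree escapes markup characters on output,
# so an HTML string assigned to .text survives a round trip unchanged.
copy = ET.Element("body")
copy.text = body_html
round_trip = ET.fromstring(ET.tostring(copy, encoding="unicode")).text

print(body_html)
```

Either form (CDATA or escaped characters) lets the storage layer ignore the individual HTML tags, which is the point made above.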

My goal for that chapter was to show you how flexible XML really is.

XSLT is both a style sheet specification and a kind of programming language that allows you to transform an XML document into the format of your choice. XPath is a language for locating and processing nodes in an XML document.
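As a rough illustration of locating nodes with path expressions, the sketch below uses the limited XPath subset built into Python's `xml.etree.ElementTree`. The `catalog` document is invented for the example, and full XPath offers considerably more than this subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only to demonstrate path expressions.
CATALOG = """<catalog>
  <book id="b1"><title>XML Basics</title><price>20</price></book>
  <book id="b2"><title>XSLT in Practice</title><price>35</price></book>
</catalog>"""

root = ET.fromstring(CATALOG)

# Step down the hierarchy: every title that is a child of a book child.
all_titles = [title.text for title in root.findall("./book/title")]

# Filter on an attribute value with a predicate.
second_title = root.find("./book[@id='b2']/title").text

# Search at any depth below the current node.
all_prices = [price.text for price in root.findall(".//price")]

print(all_titles, second_title, all_prices)
```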

Because each XML document is, by definition, a hierarchical structure, it becomes possible to navigate this structure in a logical, formal way.

A document type definition (DTD) is a set of rules that governs the order in which your elements can be used, and the kind of information each can contain.

While a DTD can provide only general control over element ordering and containment, schemas are a lot more specific. They can, for example, allow elements to appear only a certain number of times, or require that elements contain specific types of data such as dates and numbers. Both technologies allow you to set rules for the contents of your XML documents. If you need to share your XML documents with another group, or you must rely on receiving well-formed XML from someone else, these technologies can help ensure that your particular set of rules is properly followed.
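To show the kind of rule a schema can express (how many times an element may appear, and what type of data it holds), here is a hand-written check in Python. It is not a DTD or XML Schema validator, only an imitation of two such rules; the element names and the `check_article` helper are assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import date


def check_article(xml_text):
    """Return a list of rule violations (an empty list means the rules are met)."""
    problems = []
    article = ET.fromstring(xml_text)

    # Occurrence rule: the headline element must appear exactly once.
    if len(article.findall("headline")) != 1:
        problems.append("exactly one headline is required")

    # Data type rule: pubdate must hold an ISO 8601 date such as 2004-01-20.
    try:
        date.fromisoformat(article.findtext("pubdate", default=""))
    except ValueError:
        problems.append("pubdate must be an ISO 8601 date")

    return problems


valid = "<article><headline>Hi</headline><pubdate>2004-01-20</pubdate></article>"
invalid = "<article><headline>A</headline><headline>B</headline><pubdate>soon</pubdate></article>"

print(check_article(valid))
print(check_article(invalid))
```

In practice a validating parser enforces such rules from the DTD or schema itself, so both parties can rely on the same declared rule set instead of custom code.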

The ability of XML to allow you to define your own elements provides flexibility and scope. XML namespaces attempt to keep different semantic usages of the same XML elements separate and unambiguous. In our example, each person could define their own namespace and then prepend the name of their namespace to specific tags.

No one in their right mind could reasonably expect every existing Web page to switch to XML overnight. But we can expect that some of these pages, and a large percentage of the new pages that are being coded as you read this, will make the transition thanks to XHTML.

As you can see, the XML family of technologies is a pretty big group, and those XML family reunions are undoubtedly interesting!

Although this means that some ideas take quite a while to reach fruition, and tend to be built by committee, it also means that no single vendor is in total control of XML.

And this, as Martha Stewart might say, is a good thing. So, what do you say? Not sure? Well, put bluntly, the Web has reached a point at which just about anything will fly when it comes to HTML documents.

Take a look at the following snippet. Believe it or not, that snippet will render without a problem in most Web browsers. And so will this. But exactly what does this mean? Use this with CSS to minimize presentational clutter.

XML Namespaces were invented to rectify a common problem: Imagine you were running a bookstore and had an inventory file called inventory. A human being could probably figure out that one title has nothing to do with the other, but an application that tried to sort it out would go nuts. We need to have a way to distinguish between the two different semantic universes in which these identical terms exist. Your inventory file stores information about books on the shelf, but the sales file stores information about books that have been bought by customers.

In either situation, regardless of the chasm that lies between the contexts of these identical terms, we need a way to properly label each context. Namespaces to the rescue!

To use and declare a namespace, we must first tie the namespace to a URI. URIs can take the following forms:

Uniform Resource Locator (URL)

Uniform Resource Name (URN): For example, all published books have an ISBN. An ISBN identifies a book without saying where to find it. However, armed with the ISBN, you could walk into the store, ask an employee to search for you, and they could take you right to the book (provided, of course, that it was in stock).

We want to use our namespace throughout our XML documents, though, and the last thing we want to do is type out an entire URI every time we need to distinguish one context from another.

So, we define a prefix to represent our namespace to ease the strain on our typing fingers. The agreed way to do that is to declare the namespace in an attribute that begins with xmlns: and ends with the chosen prefix. At this point, we have something useful. If we needed to, we could add our prefix to appropriate elements to disambiguate (I love that term!) them. In most cases, placing your namespace declarations will be rather easy.
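The effect of prefixes can be seen by parsing a document in which two title elements belong to different namespaces. This is a sketch only: the URNs, the `inv` and `sales` prefixes, and the sample content are invented for illustration and are not taken from the bookstore files discussed above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two title elements from two different namespaces.
DOC = """<store xmlns:inv="urn:example:inventory" xmlns:sales="urn:example:sales">
  <inv:title>A Tale of Two Cities</inv:title>
  <sales:title>Receipt for order 42</sales:title>
</store>"""

# Map the prefixes used in our queries to the namespace URIs.
NS = {"inv": "urn:example:inventory", "sales": "urn:example:sales"}

root = ET.fromstring(DOC)
inventory_title = root.find("inv:title", NS).text
sales_title = root.find("sales:title", NS).text

# Internally the parser replaces the prefix with the full URI.
expanded_tag = root.find("inv:title", NS).tag

print(inventory_title, sales_title, expanded_tag)
```

The prefix itself is just shorthand: what distinguishes the two title elements is the URI each prefix is bound to.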

Please note, however, that namespaces have scope. Namespaces affect the element in which they are declared, as well as all the child elements of that element. However, if you want to limit your namespace scope to a certain part of a document, feel free to do so, remembering, of course, that this can get pretty tricky.

It would become pretty tiresome to have to type a prefix for every single element in a document, which is where a non-prefixed (default) namespace helps. On the other side of the coin, all XSLT elements must be given the xsl: prefix.

This document contains a root element, letter, that contains three other elements (to, from, and message), each of which contains text.
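A document with that shape can be sketched and inspected as follows. Only the element names come from the description above; the text inside each element is a placeholder invented for the example.

```python
import xml.etree.ElementTree as ET

# The element names follow the description; the text values are placeholders.
LETTER = """<letter>
  <to>Mom</to>
  <from>Tom</from>
  <message>Happy Mother's Day</message>
</letter>"""

root = ET.fromstring(LETTER)

# Collect each child element's name and text in document order.
parts = {child.tag: child.text for child in root}

print(root.tag, parts)
```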

When you display your XML document, you should see something similar to Figure 2. As you can see, CSS did a marvelous job of rendering a nicely shaded box around the entire letter, setting fonts, and even displaying things like margins and padding.

Strictly speaking, the CSS standard does allow for this sort of thing with the content property, which can produce generated text before and after document elements.

Think of XSLT as a tool that you can use to transform your XML documents into other documents.

XSLT is a rules-based, or functional, language. Because XSLT can be a little bewildering even for veteran programmers, the best way to tackle it is to walk through a series of examples. Keeping both these elements simple will give us the opportunity to step through the major concepts involved.

XSLT style sheets are themselves XML documents. They must therefore follow the rules that apply to all XML documents. On the stylesheet element, the version attribute is required, and the xmlns:xsl attribute declares the XSLT namespace. In our example, we will use an xsl prefix on all the stylesheet-related tags in our XSL documents to associate them with this namespace.

The next element will be the output element, which is used to define the type of output you want from the XSL file.

Now we come to the heart of XSLT: the template and apply-templates elements.

Together, these two elements make the transformations happen. Put simply, the XSLT processor for our immediate purposes, the browser starts reading the input document, looking for elements that match any of the template elements in our style sheet. When one is found, the contents of the corresponding template element tells the processor what to output before continuing its search. Where a template contains an apply-templates element, the XSLT processor will search for XML elements contained within the current element and apply templates associated with them.
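The interplay of matching rules and recursive processing can be imitated in a few lines of Python. To be clear, this is not XSLT and does not use an XSLT processor: it is a toy emulation of the template and apply-templates idea, it ignores XSLT's built-in default templates, and the sample letter, the labels, and the function names are all assumptions.

```python
import xml.etree.ElementTree as ET

# Placeholder input document for the emulation.
LETTER = """<letter>
  <to>Mom</to>
  <from>Tom</from>
  <message>Happy Mother's Day</message>
</letter>"""


def apply_templates(element, templates):
    """Process each child element with the template registered for its tag."""
    output = []
    for child in element:
        template = templates.get(child.tag)
        if template is not None:
            output.append(template(child, templates))
    return "".join(output)


# One "template" per element name; each returns the text to output.
TEMPLATES = {
    "letter": lambda el, t: apply_templates(el, t),
    "to": lambda el, t: "TO: " + el.text + "\n",
    "from": lambda el, t: "FROM: " + el.text + "\n",
    "message": lambda el, t: "MESSAGE: " + el.text + "\n",
}


def transform(xml_text, templates):
    """Start at the root element and let its template drive the rest."""
    root = ET.fromstring(xml_text)
    return templates[root.tag](root, templates)


result = transform(LETTER, TEMPLATES)
print(result)
```

The rule for letter produces no output of its own; it only hands its children on for matching, which mirrors the role described for apply-templates.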

The first thing we want to do is match the letter element that contains the rest of our document. This is fairly straightforward.

Were the match value simply letter, the template would match letter elements throughout the document. By default, apply-templates will match not only elements, but text and even whitespace between the elements as well.

XSLT processors have a set of default, or implicit, templates, one of which simply outputs any text or whitespace it encounters. To avoid that, we restrict what apply-templates selects with another XPath expression.

Each of the remaining templates matches one of the elements we expect to find inside the letter element (to, from, and message). In each case, we output a text label. The last thing we have to do in the XSL file is close off the stylesheet element that began the file. Left this way, the output would still contain unwanted whitespace.

Each of our three main templates begins with a line break and then some whitespace before the label, which is being carried through to the output.

But wait, what about the line break and whitespace that ends each template? Well, by default, the XSLT standard mandates that whenever there is only whitespace (including line breaks) between two tags, the whitespace should be ignored.

But when there is text between two tags, the whitespace around that text is kept as part of it.


The vast majority of XML books and tutorials out there completely ignore these whitespace treatment issues. Best to get a good grasp of it now, rather than waiting for insanity to set in when you least expect it.

The XSLT text element gives us a way out. All it does is output the text it contains, even if it is just whitespace.

Notice how each template now outputs its label. This gives us the fine control over formatting that we need when outputting a plain text file.

Are we done yet? Not quite. When you view the XML document in Firefox, you should see something similar to the result pictured in Figure 2. Internet Explorer interprets the result as HTML code, even when the style sheet clearly specifies that it will output text.

As a result, whitespace is collapsed and our whole document appears on one line. For this reason, it is not yet practical to rely on browser support for XSLT in a real-world website.

You should see something similar to Figure 2. What happens if you need to transform your own XML document into an XML document that meets the needs of another organization or person? Not to worry — XSLT will save the day! You see, Web browsers only supply collapsible tree formatting for XML documents without style sheets. XML documents that result from a style sheet transformation are displayed without any styling at all, or at best are treated as HTML — not at all the desired result.

There are several things that need to be added to your style sheet to signal to the browser that the document is more than a plain XML file, though. Here we have declared a default namespace for tags without prefixes in the style sheet.

Next up, we can flesh out the output element to more fully describe the output document type. In addition to the method and indent attributes, we have specified a number of new attributes here, one of which omits the XML declaration. Internet Explorer for Windows displays XHTML documents in Quirks Mode when this declaration is present, so by omitting it we can ensure that this browser will display it in the more desirable Standards Compliance mode. The rest of the style sheet is as it was for the HTML output example we saw above.

Now, we need to identify exactly what we need for our news items, binary files, and Web copy.

We must also manage and track site administrators using XML. Compared to our article content type, news will be fairly straightforward.

Applications of XML

Countless applications and platforms make use of XML. You can create content and mark it up with delimiting tags, making each word, phrase, or chunk into identifiable, sortable information.

How do you do that? XPath also has a number of useful functions built in.