with Lorelle and Brent VanFossen

Importing Into WordPress with the Import-mt

I finally got the import-mt.php import file that comes with the default WordPress installation to work. Whew! What a long and hard ride, but I learned a lot and I hope you will find some lessons here, too.

Some of the following information might be a little redundant, but if you found this page first and have not read through all my previous attempts yet, this article will be of more value than the others. I do recommend you take a peek at them, as you will learn a whole lot about this process.

The magic of making the Movable Type import work for static HTML pages is having consistent formatting on every page. On my original pages, every title is in an H2 heading, which is never used elsewhere on the page. The author is also in a unique div, as are the content and footer, and so on. The more uniform your page’s code layout, the easier this process will be.

First, copy your HTML code to a text editor. If you use Word, WordPerfect, or another word-processing program, you are asking for mistakes unless you are an expert at tweaking that program so it will not convert quote marks or hyphens into character codes and will not screw up your html tags. If you know what I’m talking about, you can use a word processor. If you don’t, don’t use one. Trust me. You are only asking for totally borked code. So get a powerful text or html editor and you are good to go.

The only criterion, and this makes things a little more difficult, is that the program should have extensive search and replace capabilities, specifically searching and replacing across multiple lines of text.

During this process, we recommend you make many backups along the way for you will make mistakes. It’s part of the process. Make many and frequent backups. Name the backups by the date and time so you can easily go back to the most recent one if you make a mistake.

During the extensive search and replace process that turns your html document into the form needed for import-mt, your goal is to match the example below. This import technique emulates the Movable Type import/export format, and the information MUST BE IN THIS EXACT ORDER AND LAYOUT.

--------
AUTHOR: Author Name
TITLE: Title of the Post or Article
STATUS: Publish
ALLOW COMMENTS: 0
CONVERT BREAKS: 0
ALLOW PINGS: 0
PRIMARY CATEGORY: Home
CATEGORY: About
DATE: 7/10/2000 03:10:03 PM
-----
BODY:
<p>Article text is here with html and all kinds of information including <i class="red">html using quotes</i> lots of lists and other information....on and on to the end.</p>
-----
EXTENDED BODY:
A rattling on of the article information.
-----
EXCERPT:
A summary of the information which is about nothing important.
-----
KEYWORDS:
fred, sally, nothing, important, but keywords, here
-----
COMMENT:
AUTHOR: ben
EMAIL: something@something.org
IP: 123-45-6789
URL: http://www.asite.com
DATE: 10/07/2002 06:58:26 PM
So...How did you do?
-----
COMMENT:
AUTHOR: fred smith
EMAIL: somethingelse@something.org
IP: 123-45-6789
URL: http://www.asite.com
DATE: 10/08/2002 08:58:34 AM
Comment here that rattles on about something important.
-----
--------
AUTHOR: Silly Person
TITLE: Some Fascinating Idea
STATUS: Publish
ALLOW COMMENTS: 2
CONVERT BREAKS: 0
ALLOW PINGS: 0
PRIMARY CATEGORY:
DATE: 10/05/2002 03:10:03 PM
-----
BODY:
.....and it continues on.
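If you would rather generate this layout than type it, the record structure above can be sketched in a few lines of Python. This is only an illustration of the layout shown in the example, not part of import-mt.php; the function name `format_record` and the `meta` dictionary are hypothetical names made up for this sketch.

```python
RECORD_SEP = "-" * 8   # divider between records (individual web pages)
FIELD_SEP = "-" * 5    # divider between fields inside one record

# Header lines in the order used by the example above.
HEADER_KEYS = ("AUTHOR", "TITLE", "STATUS", "ALLOW COMMENTS",
               "CONVERT BREAKS", "ALLOW PINGS", "PRIMARY CATEGORY",
               "CATEGORY", "DATE")


def format_record(meta, body):
    """Return one record: header lines, a field separator, then the body."""
    lines = [RECORD_SEP]
    for key in HEADER_KEYS:
        if key in meta:  # lines you do not need, such as CATEGORY, are skipped
            lines.append(f"{key}: {meta[key]}")
    lines += [FIELD_SEP, "BODY:", body.strip(), FIELD_SEP]
    return "\n".join(lines) + "\n"


record = format_record(
    {"AUTHOR": "Author Name",
     "TITLE": "Title of the Post or Article",
     "STATUS": "Publish",
     "ALLOW COMMENTS": 0,
     "DATE": "07/10/2000 03:10:03 PM"},
    "<p>Article text is here.</p>",
)
print(record)
```

Writing several of these records one after another into a single text file gives the same shape as the hand-built example.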

Begin The Process

Copy each html page into the editor and put an 8 dash line (--------) in between each html page, taking advantage of the doctype or <html> that begins every web page to use for your search and replace. This dashed line is the divider between your “records” (individual web pages).

Now, we are going to use that 8 dash line as our starting point for the next search and replace sequence. Search for the 8 dashes followed by a hard return (line break):

--------

and replace it with the 8 dashes, the hard return (line break) and the following:

--------
AUTHOR: Author Name
TITLE: Title of the Post or Article
STATUS: Publish
ALLOW COMMENTS: 2
CONVERT BREAKS: 0
ALLOW PINGS: 0
PRIMARY CATEGORY: Home
DATE: 7/10/2000 03:10:03 PM
-----
BODY:

Between BODY and the line above is a five dash line (-----). This is the separator between fields.
Adjust the information to your needs. Some of this information may need to be individually searched and replaced. For example, my articles have the author name in a unique DIV, so I searched and replaced:

<div id=author>Author Name</div>

with:

AUTHOR: Author Name

Replacing “Author Name” with the correct name.
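If your editor struggles with this kind of replacement, the same step can be sketched with a regular expression in Python. It assumes the author sits in a div with the id `author`, as in my pages; the function name `author_div_to_field` is hypothetical, and your own markup will likely need a different pattern.

```python
import re

# Matches <div id=author>...</div>, with or without quotes around the id.
AUTHOR_DIV = re.compile(r'<div id="?author"?>\s*(.*?)\s*</div>',
                        re.IGNORECASE | re.DOTALL)


def author_div_to_field(html):
    """Replace the author div with an 'AUTHOR: name' line."""
    def to_field(match):
        # Collapse any stray whitespace or line breaks inside the name.
        return "AUTHOR: " + " ".join(match.group(1).split())
    return AUTHOR_DIV.sub(to_field, html)


sample = '<div id=author>Lorelle VanFossen</div>\n<p>Text</p>'
converted = author_div_to_field(sample)
print(converted)
```

Because the name is captured from the page itself, there is no need to type the correct name into each record by hand.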

At the end of the post content, before comments and other information, search and replace the post ending code with a five dash line. For example, I had a DIV that states:

<div id="next"><p><a title="next article in the series" href="article42.html">Next Article: Article Name 42</a></p></div>

This is consistent (with a different file name) at the end of every article, so I could easily replace:

<div id="next">

with the 5 dash line as the field separator:

-----
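As a rough Python sketch of this step: the version below goes one step further than my search and replace and swaps the entire "next" div, link included, for the five dash line, so there is less to clean up afterward. It assumes the div contains no nested div, which was true of my pages; `end_post` is a hypothetical name.

```python
import re

# The whole "next article" block, from its opening tag to the first </div>.
NEXT_DIV = re.compile(r'<div id="next">.*?</div>', re.DOTALL)


def end_post(html):
    """Replace the trailing 'next article' div with the field separator."""
    return NEXT_DIV.sub("-----", html)


page_end = ('last paragraph.</p>\n'
            '<div id="next"><p><a title="next article in the series" '
            'href="article42.html">Next Article: Article Name 42</a></p></div>\n')
cleaned = end_post(page_end)
print(cleaned)
```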

Now, you will have a lot of excess information still left in your file: meta tags, sidebar information, CSS, and other lines of code that you won’t need any more. If they are consistent, search and replace to get rid of them. If they aren’t, get rid of as much as you can and then you will have to clean up the rest manually.

HTML Meets XHTML

To make sure that everything is XML compliant and ready for WordPress, I went through and checked all the code. Here is a list of my search and replaces:

  • <hr> --> <hr />
  • <br> --> <br />
  • Curly Quotes to text quotes (no character codes only quote marks)
  • Curly Apostrophes to plain apostrophes (no character codes only apostrophes)
  • [hyphen] --> - (hard encoded dash, usually formed by holding down the Ctrl+hyphen key)
  • Double Blank Lines --> Single Blank Lines
  • img tag endings from "> to " /> (inspect each one before changing as "> is found on hyperlinks)

These are the most common. Your html code may be different, so either have it converted using special software or manually inspect for your own needs.
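Most of the replacements in this list can be scripted. The following is a minimal Python sketch, assuming your pages use the same plain `<hr>`, `<br>`, and `<img>` tags mine did; the function name `to_xhtml` is hypothetical. It leaves the hard encoded hyphen alone, since which character that is depends on the program that produced it.

```python
import re

# Curly quotes and apostrophes mapped to their plain equivalents.
PLAIN_QUOTES = str.maketrans({
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes / apostrophes
})


def to_xhtml(text):
    """Apply the search-and-replace list above to a chunk of HTML."""
    text = re.sub(r"<hr\s*>", "<hr />", text, flags=re.IGNORECASE)
    text = re.sub(r"<br\s*>", "<br />", text, flags=re.IGNORECASE)
    # Close every <img ...> tag, whether or not it was already self-closed.
    text = re.sub(r"<img\b([^>]*?)\s*/?>", r"<img\1 />", text,
                  flags=re.IGNORECASE)
    text = text.translate(PLAIN_QUOTES)
    # Double (or more) blank lines become a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text


sample = ('Line one<br>\n<hr>\n\n\n<img src="a.jpg" alt="x">'
          '\u201cquoted\u201d it\u2019s')
print(to_xhtml(sample))
```

Because the img pattern only touches tags that begin with `<img`, hyperlinks ending in `">` are left alone, which is the reason for inspecting each one when you do this by hand.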

Manually Check the Data

Besides checking for XML compliance and characters that are not friendly to WordPress, I go through the data and clean up things that either might mess up the import, or that just need cleaning.

For instance, occasionally I would break a title into two lines with a line break. This won’t work, so I had to go through manually and clean those out. I also sometimes use an ID in a heading, such as <h2 id="information">, which would be missed in a simple <h2> search and replace. This has to be caught and corrected.
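Both of these title problems can be caught with one pattern. Here is a small Python sketch, assuming the title is the only H2 on the page as it was on mine; `h2_to_title` is a hypothetical name.

```python
import re

# <h2> with or without attributes, and a title that may span several lines.
H2 = re.compile(r"<h2\b[^>]*>(.*?)</h2>", re.IGNORECASE | re.DOTALL)


def h2_to_title(html):
    """Turn the H2 heading into a single-line 'TITLE: ...' field."""
    def to_field(match):
        # Drop <br> tags inside the title, then collapse all whitespace
        # (including hard returns) into single spaces.
        text = re.sub(r"<br\s*/?>", " ", match.group(1), flags=re.IGNORECASE)
        return "TITLE: " + " ".join(text.split())
    return H2.sub(to_field, html)


print(h2_to_title('<h2 id="information">Another Day<br>\nat the Office</h2>'))
```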

Other code, tags, and information that didn’t belong any more, and that wasn’t consistent across multiple pages, had to be deleted. This is time consuming, but it has to be removed.

When you think you have it all cleaned up, you probably don’t. Go through every bit of data and double check it. As you go, you will need to fill in the “missing” information from your earlier search and replaces. Check the following:

Author
Enter the name(s) of the article or post author(s). If these names deviate in any way from the spelling of the WordPress administrator, new Author User Profiles will be created and the maximum permission level these users will have is 9. Out of ignorance, I set my User Profile name to be “Lorelle” and yet my import listed “Lorelle VanFossen”. “Lorelle” has level 10 top administrator status, but “Lorelle VanFossen” is stuck at user level 9. Do check your name carefully if you are the only author so your posts will be put into the administrator’s name.
Title
Add the title for each article or post. Think about the post title as you add it. If you choose to use permalinks, these will become the new link titles for your post. If your title was “Another Day at the Office” the permalink would become:

http://example.com/another-day-at-the-office

If this is the point of the post, leave it. If the story of the post is about the copy machine breaking down and your attempts to confront its innards results in a burst of toner that covers you from head to toe and the panic that followed, maybe the title should be more fitting and be “Assaulted by Copier” or something more memorable.

STATUS
You have two choices for your post status: publish or draft. If draft is chosen, the post will be added to your draft post list and you can access and edit it from the Write Post screen from a link below the menu tabs. If you choose publish, the post will be immediately viewable after import on your website.
ALLOW COMMENTS
You have two choices here, too. If you put the number 1 here, comments will be open. If you use a number 0, comments for this post will be closed. You can open them later, and it’s recommended that you set them to be closed until you have finished all the editing and checks after the import has been made, or people might be writing comments about how messed up the post is and you will spend more time checking comments than cleaning up your site. It’s up to you.

CONVERT BREAKS
Again, you have two value choices. To not convert breaks, use 0, and to convert breaks, use 1.
ALLOW PINGS
To allow pings, use the value 1, and to turn them off on this post, use 0.

PRIMARY CATEGORY
You have two choices for your categories, a primary category and a subcategory, called “category”. WordPress uses parent categories and the subcategories are also known as children categories. If your post has only one category, list it here. The category can be one or more words with spaces or dashes in between, but no commas or other characters.
Category
If your post is also in a subcategory or is a child category, state the subcategory here. If the post isn’t in a subcategory, remove this line.
Date
Manually edit the date based upon the American format of month/day/year and include the time as follows:

07/21/2003 03:10:03 PM

If you want a different date format, that is controlled from within WordPress. WordPress sorts posts chronologically from most recent to oldest. If dates aren’t important to your content, they are important to creating an order to the posts in a series. Posts which run in series should be dated as follows:

Article 1 March 15
Article 2 March 14
Article 3 March 13
Article 4 March 12
Article 5 March 11
Article 6 March 10

This way, the order is preserved even if the dates are unimportant.
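Typing out a run of descending dates gets tedious for a long series. The following Python sketch generates them in the import format shown above. The starting date in the year 2003, the one-day spacing, and the function name `series_dates` are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# month/day/year with a 12-hour clock, e.g. 07/21/2003 03:10:03 PM
MT_DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def series_dates(first_article_date, count):
    """Return DATE values for articles 1..count, each one day older."""
    return [
        (first_article_date - timedelta(days=offset)).strftime(MT_DATE_FORMAT)
        for offset in range(count)
    ]


dates = series_dates(datetime(2003, 3, 15, 15, 10, 3), 6)
for number, date in enumerate(dates, start=1):
    print(f"Article {number}  DATE: {date}")
```

Article 1 gets the most recent date, so it appears first when WordPress sorts from most recent to oldest.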

Recheck Manually

Eyes get tired going through all this data, but take time to rest them by saving this information and then coming back to it at least six hours later. Overnight is even better. This way, you are refreshed and ready to look at all of this with new eyes.

Look for little details that might have gotten forgotten or missed in the first manual edits. Stray unwanted code might have been missed, a title or author forgotten, or some other detail. Go through it all with a mental magnifying glass to see if you can catch anything that you don’t want or that might get in the way of the import.

Make sure that every web page record has the record divider of the 8 dashes and the fields are separated by 5 dashes. If anything is blank, delete it. Leave only the barest essentials you need for the import.
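Tired eyes miss dividers, so a quick automated check can help before the final read-through. This Python sketch only looks at the structure described here: records split by 8 dash lines, fields split by 5 dash lines, and a BODY section in each record. Treating AUTHOR, TITLE, and DATE as required is my own assumption, and `check_records` is a hypothetical name. It is not a substitute for checking the content itself.

```python
import re

REQUIRED = ("AUTHOR", "TITLE", "DATE")  # assumption: always wanted


def check_records(text):
    """Return a list of structural problems found in the import text."""
    problems = []
    # Records are separated by lines made of exactly eight dashes.
    records = [r for r in re.split(r"^-{8}[ \t]*$", text, flags=re.MULTILINE)
               if r.strip()]
    for number, record in enumerate(records, start=1):
        # Fields are separated by lines made of exactly five dashes.
        sections = re.split(r"^-{5}[ \t]*$", record, flags=re.MULTILINE)
        header = sections[0]
        for key in REQUIRED:
            if not re.search(rf"^{key}: *\S", header, flags=re.MULTILINE):
                problems.append(f"record {number}: missing or empty {key}")
        if not any(s.strip().startswith("BODY:") for s in sections[1:]):
            problems.append(
                f"record {number}: no BODY: section after a five-dash line")
    return problems


good = ("--------\nAUTHOR: A\nTITLE: T\nDATE: 07/10/2000 03:10:03 PM\n"
        "-----\nBODY:\n<p>x</p>\n-----\n")
bad = ("--------\nAUTHOR: A\nTITLE:\nDATE: 07/10/2000 03:10:03 PM\n"
       "----\nBODY:\nx\n")  # empty title, and only four dashes before BODY
print(check_records(good))
print(check_records(bad))
```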

If you are really serious about this, spell check your post content and then do one more thorough edit to see if there is anything more that can be cleaned up.

With the HTML inside of the post content, make sure that all the tags are still there, and all open tags are closed and all self-closing tags are closed.

Things You Need To Know

A lot of questions come up during these imports and here are a few of them with the answers:

What about my intra-site links?
Intra-site links are links within a post or article that link to another one on your site. Leave them. They will import just fine and you can later either manually edit them to the correct URL address or redirect the links through the use of the .htaccess file rewrites and redirects.
Will quote marks or apostrophes halt my import?
Unlike an import directly into the MySQL database, WordPress’s import-mt process ignores quotes and apostrophes in the post content section so they will import without any problems. Just leave them alone.
What about links to my graphics and photographs? Will I lose them?
No, provided you leave your graphics and photographs in the same folders that they currently reside in and remove the relative links to them. Instead of the image link being:

../../photos/travel/spain/barcelona42.jpg

remove the dots and slashes so it becomes the following, if photos is in your site’s root folder:

/photos/travel/spain/barcelona42.jpg

If it isn’t, then add the parent folder before photos folder with a forward slash in front of it. If you keep all your graphics and photographs in specific folders, this can be easily changed later, after the import, if you have problems seeing the images.
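The path change can be made in one pass with a regular expression. This is a minimal Python sketch that only touches `src` attributes, so intra-site links in `href` are left as they are. The function name `root_relative` and its `parent` argument, used when photos is not in the root folder, are hypothetical.

```python
import re

# A src attribute whose value starts with one or more "../" steps.
RELATIVE_SRC = re.compile(r'''(\bsrc=["'])(?:\.\./)+''', re.IGNORECASE)


def root_relative(html, parent=""):
    """Replace leading ../ steps in image paths with a root-relative path.

    parent is an optional folder such as "/images" to put before the path.
    """
    return RELATIVE_SRC.sub(lambda m: m.group(1) + parent + "/", html)


tag = '<img src="../../photos/travel/spain/barcelona42.jpg" />'
print(root_relative(tag))
print(root_relative(tag, parent="/images"))
```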

What will happen to my styles listed in the head of each web page?
If you have any styles listed in the head of any web pages, this information must be moved into the core style.css for the WordPress Theme you are using, or set into another separate style sheet that you can add later using one of the many WordPress conditional tags within the WordPress PHP Loop. If it is not saved, it will be lost, as such information must be removed from the import file.

Any inline styles such as:

<p style="font-size:110%; color: green; margin: 10px">

will remain undisturbed so you can leave them.

Will it import duplicate posts?
By default, the WordPress import scripts will not import duplicate posts, saving you a lot of time and effort to track those down.

Begin the Import

Once you have triple checked your import document, and your concerns have been answered, then it is time to put WordPress’ import-mt.php to work.

Save the file as import.txt, making sure that the file is indeed a text file and not any other type. Upload the file to the wp-admin folder on your WordPress site. Then direct your browser to the following address, using your specific website information:

http://example.com/wp-admin/import-mt.php

The rest of the process is up to WordPress.

If you do get an error, it is usually very specific so you can track it down. Alternatively, not all of the posts will import, and WordPress will show you the list of what has been imported. If some haven’t been imported, this is usually because the 8 and 5 dash lines weren’t set right, or there is some other detail that is not right in the import file, like “category” being misspelled. Carefully check that post against the other ones to help find the error.

If you find the error, you can either copy and paste that page record into its own import.txt file and repeat the import, or reimport the fixed original import.txt file, as WordPress will not permit duplicate posts to be imported. The fixed post will be imported and the rest should be ignored.

If you are really having trouble with a particular post, then manually add it to WordPress through the Administration Write Post screen, copying each bit of information into the right spot.

If it worked great and everything imported, then it’s time to start checking the results in WordPress. To view the new material, type your site’s URL in the browser and crawl around looking. There may be a few little bits and pieces that aren’t right, but you can now go into the WordPress Admin area and edit your posts to clean up these details. As long as they are in the database, you can do anything you want with them.

Our goal has been achieved!
