Weird XML/HTML format

Topics: Developer Forum
Jun 26, 2013 at 2:55 PM
I'm having a problem with an oddly formatted Excel file. It opens fine in Excel on Windows (with a warning that the file is in an incorrect format), but I can't get PHPExcel to read it. I have tried automatic reader detection (which incorrectly identifies the file as CSV) as well as the HTML and Excel2003XML readers, but both report an invalid format. Has anyone seen anything like this before, and can you recommend a reader to use or a fix for it? I've only included the header of the file because the data is sensitive, but I can provide a copy with the data blanked out if required:
<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel">
<head>
  <meta name="generator" content="Traxsource Label System">
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta name="ProgId" content="Excel.Sheet">
  <meta name="Generator" content="Microsoft Excel 11">
  <link rel="File-List" href="traxsource_files/filelist.xml">
  <link rel="Edit-Time-Data" href="traxsource_files/editdata.mso">
  <link rel="OLE-Object-Data" href="traxsource_files/oledata.mso">
  <!--[if gte mso 9]>
    <xml>
     <o:DocumentProperties>
      <o:Author>Traxsource</o:Author>
      <o:LastAuthor>Marc</o:LastAuthor>
      <o:Created>2011-02-01T15:29:06Z</o:Created>
      <o:LastSaved>2011-02-01T15:30:45Z</o:LastSaved>
      <o:Version>11.9999</o:Version>
     </o:DocumentProperties>
    </xml>
  <![endif]-->

  <style type="text/css">
<!--table
  {mso-displayed-decimal-separator:"\.";
  mso-displayed-thousand-separator:"\,";}
  @page
  {margin:1.0in .75in 1.0in .75in;
  mso-header-margin:.5in;
  mso-footer-margin:.5in;}
  tr
  {mso-height-source:auto;}
  col
  {mso-width-source:auto;}
  br
  {mso-data-placement:same-cell;}
  .style0
  {mso-number-format:General;
  text-align:general;
  vertical-align:bottom;
  white-space:nowrap;
  mso-rotate:0;
  mso-background-source:auto;
  mso-pattern:auto;
  color:windowtext;
  font-size:10.0pt;
  font-weight:400;
  font-style:normal;
  text-decoration:none;
  font-family:Arial;
  mso-generic-font-family:auto;
  mso-font-charset:0;
  border:none;
  mso-protection:locked visible;
  mso-style-name:Normal;
  mso-style-id:0;}
  td
  {mso-style-parent:style0;
  padding-top:1px;
  padding-right:1px;
  padding-left:1px;
  mso-ignore:padding;
  color:windowtext;
  font-size:10.0pt;
  font-weight:400;
  font-style:normal;
  text-decoration:none;
  font-family:Arial;
  mso-generic-font-family:auto;
  mso-font-charset:0;
  mso-number-format:General;
  text-align:general;
  vertical-align:bottom;
  border:none;
  mso-background-source:auto;
  mso-pattern:auto;
  mso-protection:locked visible;
  white-space:nowrap;
  mso-rotate:0;}
  .xl24
  {mso-style-parent:style0;
  font-size:14.0pt;
  font-weight:700;
  font-family:Arial, sans-serif;
  mso-font-charset:0;
  text-align:left;}
  .xl25
  {mso-style-parent:style0;
  text-align:right;}
  .xxl24
  {mso-style-parent:style0;
  font-family:"";
  mso-generic-font-family:auto;
  mso-font-charset:128;
  text-align:center;
  border:.5pt solid black;
  background:silver;
  mso-pattern:auto none;}
  .xxl25
  {mso-style-parent:style0;
  font-family:"";
  mso-generic-font-family:auto;
  mso-font-charset:128;
  border:.5pt solid silver;
  mso-background-source:auto;
  mso-pattern:auto;
  white-space:nowrap;}
  .xxl26
  {mso-style-parent:style0;
  mso-number-format:"\0022\0022\#\,\#\#0\.00";
  font-family:"";
  mso-generic-font-family:auto;
  mso-font-charset:128;
  text-align:right;
  border:.5pt solid silver;
  mso-background-source:auto;
  mso-pattern:auto;
  white-space:nowrap;}
  .xx127
  {mso-style-parent:style0;
  font-family:"";
  mso-generic-font-family:auto;
  mso-font-charset:128;
  text-align:right;
  border:.5pt solid silver;
  mso-background-source:auto;
  mso-pattern:auto;
  white-space:nowrap;}
  
  -->
  </style>
<!--[if gte mso 9]>
<xml>
 <x:ExcelWorkbook>
  <x:ExcelWorksheets>
   <x:ExcelWorksheet>
    <x:Name>Traxsource Royalties</x:Name>
    <x:WorksheetOptions>
     <x:Print>
      <x:ValidPrinterInfo/>
      <x:HorizontalResolution>600</x:HorizontalResolution>
      <x:VerticalResolution>600</x:VerticalResolution>
     </x:Print>
     <x:Selected/>
     <x:Panes>
      <x:Pane>
       <x:Number>1</x:Number>
       <x:ActiveRow>1</x:ActiveRow>
      </x:Pane>
     </x:Panes>
     <x:ProtectContents>False</x:ProtectContents>
     <x:ProtectObjects>False</x:ProtectObjects>
     <x:ProtectScenarios>False</x:ProtectScenarios>
    </x:WorksheetOptions>
   </x:ExcelWorksheet>
  </x:ExcelWorksheets>
  <x:WindowHeight>12585</x:WindowHeight>
  <x:WindowWidth>15315</x:WindowWidth>
  <x:WindowTopX>360</x:WindowTopX>
  <x:WindowTopY>30</x:WindowTopY>
  <x:ProtectStructure>False</x:ProtectStructure>
  <x:ProtectWindows>False</x:ProtectWindows>
 </x:ExcelWorkbook>
</xml>
<![endif]-->

  <title>Traxsource Excel Export</title>
</head>

<body>
  
  <table x:str="" border="0" cellspacing="0" cellpadding="0" style="border-collapse: collapse;table-layout:fixed;width:500px">
    <tr>
    ... (table data here)
  </tr>
 </table>
</body>
</html>
   
I did try forcing the isValid check in the HTML reader to return true in the hope that it would work, but sadly no luck.
Coordinator
Jun 27, 2013 at 11:31 AM
Edited Jun 27, 2013 at 11:32 AM
It looks pretty arbitrary. I don't know what application was used to create this, but it's an odd mix of HTML and XML.
Does the section marked
... (table data here)
actually contain data, or that literal text?

If it comprises a series of tr/td elements and values, then I would expect it to load in PHPExcel with the HTML Reader; but none of the readers will pick up on the xml block, and neither did MS Excel when I tried it.
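To illustrate the point about the tr/td rows, here is a minimal sketch in Python (standard library only, not PHPExcel) that pulls cell text out of a file shaped like the one posted. The class name `TableRows` and the two data rows are made up for illustration, since the real table data was not included. The Office conditional-comment block is treated as an ordinary HTML comment, so it is skipped rather than interpreted.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def _close_cell(self):
        # Store the current cell (if any) in the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (including <style> content) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None


# Hypothetical stand-in for the posted file; the rows are invented.
sample = """
<html xmlns:x="urn:schemas-microsoft-com:office:excel">
<head>
<!--[if gte mso 9]><xml><x:ExcelWorkbook><x:Name>Ignored</x:Name></x:ExcelWorkbook></xml><![endif]-->
<style><!--table {mso-number-format:General;} --></style>
</head>
<body>
<table x:str="" border="0">
  <tr><td class="xl24">Title</td><td class="xl25">Amount</td></tr>
  <tr><td>Example track</td><td>1.50</td></tr>
</table>
</body></html>
"""

parser = TableRows()
parser.feed(sample)
parser.close()
rows = parser.rows
print(rows)
```

Every value comes back as a string; number formats such as the `mso-number-format` styles in the header are not applied in this sketch.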
Jun 20, 2014 at 8:19 AM
You can find some sample code and an easy way to read Excel files with PHPExcel in this post:
http://stackoverflow.com/questions/9695695/how-to-use-phpexcel-to-read-data-and-insert-into-database
Hope it helps.

Dec 16, 2014 at 2:38 AM
I have also run into a problem with Excel. I am running a test that creates Code 128 barcodes for MS Excel.

We're getting an .XLS file from a customer that Calc v4.0.1 can't handle. When I try to open it in Calc, it's treated as a text file for import. If I accept the import, Calc errors out.

I've tried opening the file in Excel 2003 and 2010, and it opens successfully, but if I try to open it in the latest version of MS Excel Viewer, it errors out, saying it can't open files of this type.

Below is the top portion of this file, which opens as readable code in a text editor. A normal .XLS file is binary and displays as gobbledygook in a text editor.
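The difference between a binary .XLS and a text file that only carries the .xls extension can be checked from the first few bytes. A minimal sketch in Python follows; the function name `sniff_spreadsheet` and its return labels are made up for illustration, while the two signatures are the well-known OLE2 compound-file and zip headers.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx and .ods are zip archives


def sniff_spreadsheet(head: bytes) -> str:
    """Guess the real container format from the first bytes of a file."""
    if head.startswith(OLE2_MAGIC):
        return "ole2-binary"
    if head.startswith(ZIP_MAGIC):
        return "zip-package"
    # Skip a UTF-8 byte-order mark and leading whitespace before looking at markup.
    text = head.lstrip(b"\xef\xbb\xbf").lstrip().lower()
    if text.startswith(b"<?xml"):
        return "xml"
    if text.startswith((b"<html", b"<!doctype html")):
        return "html"
    return "unknown"


print(sniff_spreadsheet(b'<html xmlns:o="urn:schemas-microsoft-com:office:office">'))
```

In practice the bytes would come from the file itself, for example by opening it in binary mode and reading the first 64 bytes. A result of "html" means the file is an HTML document that has only been given a spreadsheet extension.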

I'm guessing this is some kind of MS scripting language that isn't really an .XLS file but creates one when it is loaded into Excel.

Is there any way to get Calc to handle it?