VARUNA JAYASIRI

@vpj

Small data preparation

May 24, 2015

One of the biggest small data problems is that data is structured in different ways. Some legacy systems produce reports in formats like the one below.

SUPPLIER STATEMENT                         38 November 2011

Supplier   B0001   SupplierX Corp
                   51, AAA STREET, BBB
===========================================================
DATE     REF.No.        DESCRIPTION       DEBIT      CREDIT
===========================================================
01/04/10                Balance B/F                1,000.00

01/05/10 AAAA-00001     PAYMENT          500,00
01/06/10 AAAA-00002     PAYMENT          250.00

01/07/10 IIII-00001     INVOICE                    2,000.00
                                      ---------  ----------
                                         750.00    3,000.00

31/03/11                Balance C/F                2,250.00

Contd. ......


SUPPLIER STATEMENT                         38 November 2011

Supplier   B0002   SupplierY Corp
                   51, ACD STREET, BBB
===========================================================
DATE     REF.No.        DESCRIPTION       DEBIT      CREDIT
===========================================================
01/04/10                Balance B/F                1,500.00

01/05/10 AAAA-00003     PAYMENT        1,500,00

01/07/10 IIII-00002     INVOICE                    2,000.00
01/08/10 IIII-00003     INVOICE                    1,000.00
                                      ---------  ----------
                                       1,500.00    4,500.00

31/03/11                Balance C/F                3,500.00

Contd. ......

Getting this into a spreadsheet or a database table for analysis is not easy. So I started a project to help transform data in various report formats into simple tables and export them as comma-separated files.

The project is called cellular (http://vpj.github.io/cellular/), and it is still in its early stages. I'm writing about it to get ideas and suggestions. It has a simple user interface, with the table on the left and a side pane on the right.
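To make the goal concrete, here is a rough Python sketch of the kind of transformation described above: it turns a trimmed copy of the sample report into flat rows and then into CSV. This is not how cellular works internally; it is only an illustration. The names (`parse_report`, `parse_amount`, `to_csv`, `CREDIT_MIN_END`) are made up for this example, and it rests on two assumptions about the sample: every amount has exactly two decimal digits (so `500,00` is read as 500.00), and an amount that ends past column 50 belongs to the CREDIT column.

```python
import csv
import io
import re
from decimal import Decimal

# A trimmed copy of the report above (second statement shortened).
REPORT = """\
SUPPLIER STATEMENT                         38 November 2011

Supplier   B0001   SupplierX Corp
                   51, AAA STREET, BBB
===========================================================
DATE     REF.No.        DESCRIPTION       DEBIT      CREDIT
===========================================================
01/04/10                Balance B/F                1,000.00

01/05/10 AAAA-00001     PAYMENT          500,00
01/06/10 AAAA-00002     PAYMENT          250.00

01/07/10 IIII-00001     INVOICE                    2,000.00
                                      ---------  ----------
                                         750.00    3,000.00

31/03/11                Balance C/F                2,250.00

Supplier   B0002   SupplierY Corp
01/04/10                Balance B/F                1,500.00
01/05/10 AAAA-00003     PAYMENT        1,500,00
"""

FIELDS = ["supplier_code", "supplier_name", "date", "ref",
          "description", "debit", "credit"]

SUPPLIER = re.compile(r"^Supplier\s+(\S+)\s+(.+?)\s*$")
ROW = re.compile(r"^(\d{2}/\d{2}/\d{2})\s+(.*?)\s+([\d.,]+)\s*$")
REF = re.compile(r"[A-Z]+-\d+")

# Assumption: amounts whose last character lies past this column are credits.
CREDIT_MIN_END = 50


def parse_amount(text):
    """Read '1,000.00' or '500,00' as a Decimal.

    Assumption: amounts always carry two decimal digits, so separators
    are dropped and the last two digits are taken as the fraction.
    """
    digits = re.sub(r"\D", "", text)
    return Decimal(digits[:-2] + "." + digits[-2:])


def parse_report(text):
    """Yield one flat row (a list matching FIELDS) per transaction line."""
    supplier_code = supplier_name = ""
    for line in text.splitlines():
        supplier = SUPPLIER.match(line)
        if supplier:
            # Remember the supplier so it can be repeated on every row.
            supplier_code, supplier_name = supplier.groups()
            continue
        row = ROW.match(line)
        if not row:
            continue  # titles, rulers, subtotals and blank lines
        date, middle, amount = row.groups()
        ref, _, rest = middle.partition(" ")
        if REF.fullmatch(ref):
            description = rest.strip()
        else:
            ref, description = "", middle  # e.g. "Balance B/F" has no ref
        value = str(parse_amount(amount))
        is_credit = row.end(3) > CREDIT_MIN_END
        debit, credit = ("", value) if is_credit else (value, "")
        yield [supplier_code, supplier_name, date, ref,
               description, debit, credit]


def to_csv(rows):
    """Serialise the rows as comma-separated text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return out.getvalue()


print(to_csv(parse_report(REPORT)))
```

A hand-written parser like this has to be adjusted for every new report layout, which is the tedium an interactive tool is meant to remove.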

Here's a short screencast of using cellular to transform the report above.

Screencast: https://www.youtube.com/embed/AcSIzSQIDQ8

Demo: http://vpj.github.io/cellular/